class Args(
exp_name: Optional[str] = None,
seed: int = 1,
torch_deterministic: bool = True,
cuda: bool = True,
track: bool = False,
wandb_project_name: str = "ManiSkill",
wandb_entity: Optional[str] = None,
capture_video: bool = True,
save_model: bool = True,
evaluate: bool = False,
checkpoint: Optional[str] = None,
env_id: str = "PickCube-v1",
total_timesteps: int = 10000000,
learning_rate: float = 3e-4,
num_envs: int = 512,
num_eval_envs: int = 8,
partial_reset: bool = True,
eval_partial_reset: bool = False,
num_steps: int = 50,
num_eval_steps: int = 50,
reconfiguration_freq: Optional[int] = None,
eval_reconfiguration_freq: Optional[int] = 1,
control_mode: Optional[str] = "pd_joint_delta_pos",
anneal_lr: bool = False,
gamma: float = 0.8,
gae_lambda: float = 0.9,
num_minibatches: int = 32,
update_epochs: int = 4,
norm_adv: bool = True,
clip_coef: float = 0.2,
clip_vloss: bool = False,
ent_coef: float = 0.0,
vf_coef: float = 0.5,
max_grad_norm: float = 0.5,
target_kl: float = 0.1,
reward_scale: float = 1.0,
eval_freq: int = 25,
save_train_video_freq: Optional[int] = None,
finite_horizon_gae: bool = True,
batch_size: int = 0,
minibatch_size: int = 0,
num_iterations: int = 0)
The Args class configures and manages the hyperparameters and options of the PPO training run; every field can be set from the command line.
The class is declared with the @dataclass decorator, which makes the various training parameters easy to manage and access.
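For context, a minimal sketch of how such a dataclass is typically exposed on the command line. The use of tyro here is an assumption (it is the CLI library common in CleanRL-style baselines), and the field subset is abbreviated:

```python
from dataclasses import dataclass
from typing import Optional

import tyro  # assumption: Args is parsed with tyro, as in CleanRL-style baselines

@dataclass
class Args:
    exp_name: Optional[str] = None
    seed: int = 1
    env_id: str = "PickCube-v1"
    learning_rate: float = 3e-4
    # ... remaining fields as listed in Parameters below ...

if __name__ == "__main__":
    # Every field becomes a flag, e.g. `python ppo.py --env-id PushCube-v1 --seed 42`
    args = tyro.cli(Args)
    print(args)
```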
Parameters
- exp_name (Optional[str]) – Experiment name; if None, the script's filename is used as the experiment name. (default: None)
- seed (int) – Random seed, used to make experiments reproducible. (default: 1)
- torch_deterministic (bool) – Whether to enable deterministic computation in Torch; if True, cudnn.deterministic=True is set. (default: True)
- cuda (bool) – Whether to train with CUDA; if True and a GPU is available, training runs on the GPU. (default: True)
- track (bool) – Whether to track the experiment with Weights & Biases (W&B); if True, the training run is logged to the W&B platform. (default: False)
- env_id (str) – ID of the training environment; specifies which ManiSkill environment to use. (default: "PickCube-v1")
- total_timesteps (int) – Total number of training timesteps; determines the overall length of training. (default: 10000000)
- learning_rate (float) – Learning rate used by the optimizer for parameter updates. (default: 3e-4)
- num_envs (int) – Number of parallel environments; the number of environment copies collecting training data simultaneously. (default: 512)
- num_steps (int) – Number of steps collected in each environment before every policy update. (default: 50)
- gamma (float) – Discount factor, used to decay future rewards. (default: 0.8)
- gae_lambda (float) – The λ parameter of GAE (Generalized Advantage Estimation). (default: 0.9)
- clip_coef (float) – Clipping coefficient of the PPO objective; limits the size of each policy update (see the sketch after this list for how the loss terms combine). (default: 0.2)
- ent_coef (float) – Entropy coefficient; a regularization term that encourages exploration. (default: 0.0)
- vf_coef (float) – Value-function coefficient; balances the policy loss against the value loss. (default: 0.5)
- max_grad_norm (float) – Maximum norm for gradient clipping; prevents exploding gradients. (default: 0.5)
- target_kl (float) – Target KL-divergence threshold. (default: 0.1)
- reward_scale (float) – Reward scaling factor. (default: 1.0)
- eval_freq (int) – Evaluation frequency, measured in iterations. (default: 25)
- save_train_video_freq (Optional[int]) – Frequency at which training videos are saved; if None, no training videos are saved. (default: None)
- finite_horizon_gae (bool) – Whether to use finite-horizon GAE. (default: True)
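For orientation, here is a minimal sketch of how clip_coef, norm_adv, ent_coef, vf_coef, max_grad_norm and target_kl typically enter a CleanRL-style PPO update. The function and variable names (ppo_update_step, new_logprob, old_logprob, agent, ...) are illustrative assumptions, not the exact identifiers used by the ManiSkill training script:

```python
import torch

def ppo_update_step(args, optimizer, agent, new_logprob, old_logprob,
                    entropy, new_value, returns, advantages):
    # Probability ratio between the current and the rollout policy.
    logratio = new_logprob - old_logprob
    ratio = logratio.exp()

    if args.norm_adv:
        advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

    # Clipped surrogate objective: take the pessimistic (max) of the two losses.
    pg_loss1 = -advantages * ratio
    pg_loss2 = -advantages * torch.clamp(ratio, 1 - args.clip_coef, 1 + args.clip_coef)
    pg_loss = torch.max(pg_loss1, pg_loss2).mean()

    v_loss = 0.5 * ((new_value - returns) ** 2).mean()
    entropy_loss = entropy.mean()

    # ent_coef rewards exploration; vf_coef weighs the value loss.
    loss = pg_loss - args.ent_coef * entropy_loss + args.vf_coef * v_loss

    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(agent.parameters(), args.max_grad_norm)
    optimizer.step()

    # Early stopping: signal the caller to stop this epoch once KL exceeds target_kl.
    approx_kl = ((ratio - 1) - logratio).mean()
    return approx_kl.item() > args.target_kl
```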
NOTE
batch_size, minibatch_size and num_iterations are computed automatically at runtime from the other parameters and do not need to be set directly.
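As a point of reference, the usual CleanRL-style derivation of these three fields looks like the following sketch (an assumption; the exact expressions in the ManiSkill script may differ slightly):

```python
# Hedged sketch of the runtime derivation (CleanRL convention, assumed here):
args.batch_size = int(args.num_envs * args.num_steps)               # samples gathered per rollout
args.minibatch_size = int(args.batch_size // args.num_minibatches)  # samples per gradient step
args.num_iterations = args.total_timesteps // args.batch_size       # rollout/update cycles in total
```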