module moplayground.learning.startup
function train_policy
train_policy(config, env, eval_env, run)
Train a policy on the given environment.
Sets up the GPU, builds the PPO and network parameters from config, and saves the resolved config alongside the run. It then dispatches to one of two trainers: when config.mo2so.enabled is True, env and eval_env are wrapped with Multi2SingleObjective and passed to the standard single-objective trainer; otherwise training runs through the multi-objective mo_train loop.
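The dispatch described above can be sketched roughly as follows. This is an illustrative sketch, not the library's implementation: the Multi2SingleObjective class here is a hypothetical stand-in that assumes the wrapper reduces a reward vector to a weighted sum, the env.step return shape is simplified, and so_train and mo_train are placeholder callables passed in by the caller.

```python
from types import SimpleNamespace


class Multi2SingleObjective:
    """Hypothetical stand-in: collapses a reward vector into a weighted sum.

    The real wrapper's constructor and behaviour may differ.
    """

    def __init__(self, env, weighting):
        self.env = env
        self.weighting = list(weighting)

    def step(self, action):
        # Assumed toy interface: the wrapped env returns (obs, reward_vector).
        obs, rewards = self.env.step(action)
        scalar = sum(w * r for w, r in zip(self.weighting, rewards))
        return obs, scalar


def dispatch(config, env, eval_env, run, so_train, mo_train):
    """Pick a trainer based on config.mo2so.enabled (placeholder trainers)."""
    if config.mo2so.enabled:
        weighting = config.mo2so.weighting
        env = Multi2SingleObjective(env, weighting)
        eval_env = Multi2SingleObjective(eval_env, weighting)
        # `run` is not forwarded on the single-objective path.
        return so_train(env, eval_env)
    return mo_train(env, eval_env, run)
```

Passing the trainers in as arguments keeps the sketch self-contained; in the actual module they are presumably imported rather than injected.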
Args:
config: Training config (ConfigDict). Must include save_dir, name, mo2so (with enabled and, if enabled, weighting), and learning_params.
env: Training environment.
eval_env: Evaluation environment used for periodic rollouts.
run: Experiment-tracking handle (e.g. a wandb run) forwarded to the multi-objective trainer; ignored on the single-objective path.
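As a rough illustration of the fields config is expected to carry, the sketch below checks a config object for the required names. The helper missing_fields is hypothetical and not part of the module, SimpleNamespace stands in for a ConfigDict, and all field values are placeholders.

```python
from types import SimpleNamespace


def missing_fields(config):
    """Return the names of required fields that `config` lacks."""
    missing = [k for k in ("save_dir", "name", "mo2so", "learning_params")
               if not hasattr(config, k)]
    mo2so = getattr(config, "mo2so", None)
    if mo2so is not None:
        if not hasattr(mo2so, "enabled"):
            missing.append("mo2so.enabled")
        elif mo2so.enabled and not hasattr(mo2so, "weighting"):
            # weighting is only required when mo2so is enabled.
            missing.append("mo2so.weighting")
    return missing


# Placeholder values only; real configs carry project-specific settings.
config = SimpleNamespace(
    save_dir="runs",
    name="example",
    mo2so=SimpleNamespace(enabled=True, weighting=[0.5, 0.5]),
    learning_params=SimpleNamespace(),
)
print(missing_fields(config))
```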
Returns:
Tuple (make_inference_fn, params): a factory that builds an inference function, together with the trained policy parameters.
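A minimal sketch of how the returned pair might be consumed, assuming make_inference_fn accepts the trained params and returns a callable policy. The function fake_train_policy is a hypothetical stand-in that only mimics the documented return shape; the real policy's call signature (for example, whether it also takes a random key) is not specified here.

```python
def fake_train_policy():
    """Hypothetical stand-in that mimics the documented return shape."""
    params = {"gain": 2.0}

    def make_inference_fn(params):
        # Builds a policy closed over the given parameters.
        def policy(observation):
            return [params["gain"] * x for x in observation]
        return policy

    return make_inference_fn, params


make_inference_fn, params = fake_train_policy()
policy = make_inference_fn(params)  # build the inference function
action = policy([0.5, -1.0])        # query it with a toy observation
```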