`module` `moplayground.learning.startup`

`function` `train_policy`

train_policy(config, env, eval_env, run)

Train a policy on the given environment.

Sets up the GPU, builds PPO and network parameters from config, and saves the resolved config alongside the run. It then dispatches to one of two trainers. When config.mo2so.enabled is True, env and eval_env are wrapped with Multi2SingleObjective and passed to the standard single-objective trainer. Otherwise, the multi-objective mo_train loop is used.
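The dispatch described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code: `wrap_single_objective` is a hypothetical stand-in for Multi2SingleObjective, `dispatch` is a hypothetical helper, and the returned labels only name which trainer would be chosen.

```python
from types import SimpleNamespace


def wrap_single_objective(env, weighting):
    """Hypothetical stand-in for Multi2SingleObjective: records the weights."""
    return {"base_env": env, "weighting": tuple(weighting)}


def dispatch(config, env, eval_env, run=None):
    """Pick a training path from config.mo2so.enabled (illustrative only)."""
    if config.mo2so.enabled:
        # Single-objective path: both environments are wrapped, run is unused.
        env = wrap_single_objective(env, config.mo2so.weighting)
        eval_env = wrap_single_objective(eval_env, config.mo2so.weighting)
        return "single_objective", env, eval_env, None
    # Multi-objective path: environments are passed through, run is forwarded.
    return "multi_objective", env, eval_env, run


so_config = SimpleNamespace(mo2so=SimpleNamespace(enabled=True, weighting=[0.7, 0.3]))
mo_config = SimpleNamespace(mo2so=SimpleNamespace(enabled=False))

so_result = dispatch(so_config, "train_env", "eval_env", run="tracker")
mo_result = dispatch(mo_config, "train_env", "eval_env", run="tracker")
```

Note that the tracking handle is dropped on the single-objective branch, matching the description of run as ignored on that path.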

Args:

config: Training config (ConfigDict). Must include save_dir, name, mo2so (with enabled and, if enabled, weighting), and learning_params.
env: Training environment.
eval_env: Evaluation environment used for periodic rollouts.
run: Experiment-tracking handle (e.g. a wandb run) forwarded to the multi-objective trainer; ignored on the single-objective path.
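As a rough illustration of the fields listed for config, the sketch below builds a config-like object and checks it for the required entries. It uses `types.SimpleNamespace` in place of a ConfigDict so that it runs without extra dependencies. `missing_config_fields` is a hypothetical helper, not part of moplayground, and all field values are placeholders.

```python
from types import SimpleNamespace

# Top-level fields the documentation says the config must include.
REQUIRED_TOP_LEVEL = ("save_dir", "name", "mo2so", "learning_params")


def missing_config_fields(config):
    """Return dotted names of required fields that are absent (hypothetical helper)."""
    missing = [field for field in REQUIRED_TOP_LEVEL if not hasattr(config, field)]
    mo2so = getattr(config, "mo2so", None)
    if mo2so is not None:
        if not hasattr(mo2so, "enabled"):
            missing.append("mo2so.enabled")
        elif mo2so.enabled and not hasattr(mo2so, "weighting"):
            # weighting is only required when mo2so is enabled.
            missing.append("mo2so.weighting")
    return missing


config = SimpleNamespace(
    save_dir="runs",                    # placeholder value
    name="demo",                        # placeholder value
    mo2so=SimpleNamespace(enabled=True, weighting=[0.5, 0.5]),
    learning_params=SimpleNamespace(),  # contents are not specified in this page
)

incomplete = SimpleNamespace(
    save_dir="runs",
    name="demo",
    mo2so=SimpleNamespace(enabled=True),  # weighting deliberately omitted
    learning_params=SimpleNamespace(),
)
```

Here `missing_config_fields(config)` reports nothing missing, while `missing_config_fields(incomplete)` flags the absent weighting.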

Returns: Tuple (make_inference_fn, params), where make_inference_fn is a factory that builds an inference function and params holds the trained policy parameters.
