module moplayground.learning.startup


function train_policy

train_policy(config, env, eval_env, run)

Train a policy on the given environment.

Sets up the GPU, builds the PPO and network parameters from config, and saves the resolved config alongside the run. It then dispatches to one of two trainers: if config.mo2so.enabled is True, env and eval_env are wrapped with Multi2SingleObjective and passed to the standard single-objective trainer; otherwise, training runs through the multi-objective mo_train loop.
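The branching described above can be sketched as follows. This is a minimal sketch, not the module's actual code. GPU setup, parameter construction and config saving are omitted, and the trainers and the wrapper are passed in as hypothetical callables (`so_train`, `mo_train`, `wrap_mo2so`) so that the control flow can be read on its own.

```python
from typing import Any, Callable, Tuple


def dispatch_training(
    config: Any,
    env: Any,
    eval_env: Any,
    run: Any,
    so_train: Callable[..., Tuple[Any, Any]],
    mo_train: Callable[..., Tuple[Any, Any]],
    wrap_mo2so: Callable[[Any, Any], Any],
) -> Tuple[Any, Any]:
    """Hypothetical sketch of the branch inside train_policy.

    so_train, mo_train and wrap_mo2so are injected stand-ins for the real
    single-objective trainer, mo_train, and the Multi2SingleObjective
    wrapper; their signatures here are assumptions.
    """
    if config.mo2so.enabled:
        # Assumption: the wrapper is configured with config.mo2so.weighting.
        weighting = config.mo2so.weighting
        wrapped_env = wrap_mo2so(env, weighting)
        wrapped_eval_env = wrap_mo2so(eval_env, weighting)
        # The tracking handle `run` is not forwarded on this path.
        return so_train(wrapped_env, wrapped_eval_env)
    # Multi-objective path: the tracking handle is forwarded.
    return mo_train(env, eval_env, run)
```

Injecting the callables keeps the sketch testable with stubs; the real function calls its trainers directly.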

Args:

  • config: Training config (ConfigDict). Must include save_dir, name, mo2so (with enabled and, if enabled, weighting), and learning_params.
  • env: Training environment.
  • eval_env: Evaluation environment used for periodic rollouts.
  • run: Experiment-tracking handle (e.g. a wandb run) forwarded to the multi-objective trainer; ignored on the single-objective path.
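To illustrate the required config layout, the sketch below uses a plain nested dict in place of a ConfigDict, together with a hypothetical helper, `missing_config_keys`, that reports which documented keys are absent. The values in `example_config` are placeholders: the path, the run name and the list format for `weighting` are assumptions, and the contents of `learning_params` are not documented here.

```python
from typing import Any, List, Mapping

# Top-level keys that train_policy documents as required.
REQUIRED_TOP_LEVEL = ("save_dir", "name", "mo2so", "learning_params")


def missing_config_keys(config: Mapping[str, Any]) -> List[str]:
    """Hypothetical check: list documented keys that are missing from config."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in config]
    if "mo2so" in config:
        mo2so = config["mo2so"]
        if "enabled" not in mo2so:
            missing.append("mo2so.enabled")
        elif mo2so["enabled"] and "weighting" not in mo2so:
            # weighting is only required when the mo2so path is enabled.
            missing.append("mo2so.weighting")
    return missing


example_config = {
    "save_dir": "runs/",            # placeholder path
    "name": "example_run",          # placeholder run name
    "mo2so": {
        "enabled": True,
        "weighting": [0.5, 0.5],    # assumed format: one weight per objective
    },
    "learning_params": {},          # PPO/network settings; contents not documented here
}
```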

Returns: Tuple (make_inference_fn, params): a factory that builds an inference function, and the trained policy parameters.
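One way to consume the returned pair is sketched below, assuming the factory follows the Brax convention in which `make_inference_fn(params, deterministic=...)` returns a policy `policy(obs, key) -> (action, extras)`. This page does not confirm that convention, and the helper name `act_on_observations` is hypothetical.

```python
from typing import Any, Callable, List, Sequence


def act_on_observations(
    make_inference_fn: Callable[..., Callable[[Any, Any], Any]],
    params: Any,
    observations: Sequence[Any],
    rng_keys: Sequence[Any],
) -> List[Any]:
    """Build the policy once, then query it for each observation.

    Assumes a Brax-style factory whose policy returns (action, extras).
    """
    policy = make_inference_fn(params, deterministic=True)
    actions = []
    for obs, key in zip(observations, rng_keys):
        action, _extras = policy(obs, key)  # extras are discarded here
        actions.append(action)
    return actions
```

With real JAX arrays and PRNG keys, Brax-based code commonly wraps the policy in `jax.jit` before stepping an environment.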

