module moplayground.envs.generic.mobase
class MultiObjectiveBase
Base environment that emits a vector-valued reward.
Extends minimal_mjx.envs.generic.base.SwappableBase so concrete environments can be constructed against either NumPy or JAX backends. Per-step reward components are bucketed into objective groups defined by env_params.reward.optimization.objectives (one entry per output dimension of the reward vector) and an optional set of shared_objectives that are added to every dimension.
Args:
xml_path: Path to the MuJoCo XML model.
env_params: ConfigDict with at least reward.weights, reward.optimization.objectives (list of lists of reward keys, one list per objective dimension), and reward.optimization.shared_objectives (list of reward keys added to every objective).
backend: 'jnp' for JAX (training), 'np' for NumPy (eval).
num_free: Number of free joints in the model. Forwarded to SwappableBase.
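A minimal sketch of the expected env_params layout, assuming hypothetical reward keys ('forward', 'energy', 'upright'); only the structure mirrors the fields documented above:

```python
import ml_collections

# Hypothetical reward keys; only the structure follows the documented fields.
env_params = ml_collections.ConfigDict()
env_params.reward = ml_collections.ConfigDict()
env_params.reward.weights = {'forward': 1.0, 'energy': 0.1, 'upright': 0.5}
env_params.reward.optimization = ml_collections.ConfigDict()
# One list of reward keys per output dimension of the reward vector.
env_params.reward.optimization.objectives = [['forward'], ['energy']]
# Reward keys added to every objective dimension.
env_params.reward.optimization.shared_objectives = ['upright']
```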
method MultiObjectiveBase.__init__
__init__(
xml_path: pathlib.Path,
env_params: ml_collections.config_dict.config_dict.ConfigDict,
backend: str = 'jnp',
num_free: int = 3
)
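An illustrative construction sketch using the env_params from the snippet above; the XML path is a placeholder, and concrete setups would typically subclass MultiObjectiveBase rather than instantiate it directly:

```python
from pathlib import Path

from moplayground.envs.generic.mobase import MultiObjectiveBase

# 'assets/robot.xml' is a placeholder path; env_params is the ConfigDict
# sketched above.
train_env = MultiObjectiveBase(
    xml_path=Path('assets/robot.xml'),
    env_params=env_params,
    backend='jnp',  # JAX backend for training
)
eval_env = MultiObjectiveBase(
    xml_path=Path('assets/robot.xml'),
    env_params=env_params,
    backend='np',   # NumPy backend for eval
)
```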
property MultiObjectiveBase.action_size
Required action size for the environment.
property MultiObjectiveBase.dt
Control timestep for the environment.
property MultiObjectiveBase.mj_model
The underlying MuJoCo model (mujoco.MjModel).
property MultiObjectiveBase.mjx_model
The MJX model used by the JAX backend.
property MultiObjectiveBase.n_substeps
Number of sim steps per control step.
property MultiObjectiveBase.observation_size
Observation size for the environment.
property MultiObjectiveBase.sim_dt
Simulation timestep for the environment.
property MultiObjectiveBase.unwrapped
The innermost environment, with any wrappers removed.
property MultiObjectiveBase.xml_path
Path to the MuJoCo XML model.
method MultiObjectiveBase.get_reward_and_metrics
get_reward_and_metrics(
rewards: dict[str, jax.Array],
metrics: dict
) → tuple[jax.Array, dict[str, jax.Array]]
Combine per-key rewards into a vector reward plus updated metrics.
Each entry of self.objectives maps to one component of the returned reward vector, computed as a weighted sum of the listed per-key rewards. Shared objectives are then added to every component.
Args:
rewards: Mapping from reward key to scalar reward for the current step.
metrics: Existing metrics dict to extend.
Returns:
(reward, metrics) where reward has shape (len(self.objectives),) and metrics is the updated metrics dict.
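A sketch of the combination logic described above. self.objectives is named in this reference; self.weights and self.shared_objectives are inferred from the Args section, and the metric naming is an assumption:

```python
import jax.numpy as jnp

def get_reward_and_metrics(self, rewards, metrics):
    # One weighted sum of per-key rewards per objective dimension.
    components = [
        sum(self.weights[k] * rewards[k] for k in group)
        for group in self.objectives
    ]
    # Shared objectives contribute to every dimension (weighting assumed).
    shared = sum(self.weights[k] * rewards[k] for k in self.shared_objectives)
    reward = jnp.stack([c + shared for c in components])
    # Record per-key rewards in metrics; exact metric naming is an assumption.
    metrics = {**metrics, **rewards}
    return reward, metrics
```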
class Multi2SingleObjective
Wrap a multi-objective env to expose a scalar reward.
Replaces the vector reward from the wrapped environment with the inner product reward · weighting, so the wrapped env can be plugged into standard single-objective PPO. All other attributes/methods are delegated to the underlying env via __getattr__.
Args:
env: A MultiObjectiveBase (or compatible) environment whose reset/step return states with vector rewards.
weighting: Per-objective weights; length must match the env's reward dimension.
method Multi2SingleObjective.__init__
__init__(env, weighting)
method Multi2SingleObjective.reset
reset(rng)
Reset the wrapped environment and scalarize its vector reward.
method Multi2SingleObjective.step
step(state, action)
Step the wrapped environment and scalarize its vector reward via the inner product with weighting.
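A hedged usage sketch, assuming a concrete MultiObjectiveBase subclass named MyMOEnv (a placeholder) with a two-dimensional reward vector and a brax-style state:

```python
import jax
import jax.numpy as jnp

from moplayground.envs.generic.mobase import Multi2SingleObjective

# MyMOEnv is a placeholder for a concrete MultiObjectiveBase subclass
# whose reward vector has two dimensions.
env = Multi2SingleObjective(MyMOEnv(), weighting=jnp.array([0.7, 0.3]))

state = env.reset(jax.random.PRNGKey(0))             # reward scalarized on reset
state = env.step(state, jnp.zeros(env.action_size))  # action_size delegated via __getattr__
# Assuming a brax-style State, state.reward is now the scalar inner product
# reward · weighting, ready for standard single-objective PPO.
```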