module moplayground.envs.generic.mobase


class MultiObjectiveBase

Base environment that emits a vector-valued reward.

Extends minimal_mjx.envs.generic.base.SwappableBase so concrete environments can be constructed against either NumPy or JAX backends. Per-step reward components are bucketed into objective groups defined by env_params.reward.optimization.objectives (one entry per output dimension of the reward vector) and an optional set of shared_objectives that are added to every dimension.

Args:

  • xml_path: Path to the MuJoCo XML model.
  • env_params: ConfigDict with at least reward.weights, reward.optimization.objectives (list of lists of reward keys, one list per objective dimension), and reward.optimization.shared_objectives (list of reward keys added to every objective).
  • backend: 'jnp' for JAX (training), 'np' for NumPy (eval).
  • num_free: Number of free joints in the model. Forwarded to SwappableBase.
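For concreteness, a hypothetical `env_params` fragment with the structure described above might look like the following. The reward keys (`forward_velocity`, `energy`, `alive`) are illustrative placeholders, not the environment's actual keys:

```python
import ml_collections

# Illustrative config sketch: two objective dimensions (task progress and
# efficiency) plus one reward key shared across every dimension.
env_params = ml_collections.ConfigDict()
env_params.reward = ml_collections.ConfigDict()
env_params.reward.weights = {
    "forward_velocity": 1.0,
    "energy": 0.05,
    "alive": 1.0,
}
env_params.reward.optimization = ml_collections.ConfigDict()
# One list of reward keys per output dimension of the reward vector.
env_params.reward.optimization.objectives = [
    ["forward_velocity"],
    ["energy"],
]
# Reward keys added to every objective dimension.
env_params.reward.optimization.shared_objectives = ["alive"]
```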

method MultiObjectiveBase.__init__

__init__(
    xml_path: pathlib.Path,
    env_params: ml_collections.config_dict.config_dict.ConfigDict,
    backend: str = 'jnp',
    num_free: int = 3
)

property MultiObjectiveBase.action_size

Required action size for the environment.


property MultiObjectiveBase.dt

Control timestep for the environment.


property MultiObjectiveBase.mj_model


property MultiObjectiveBase.mjx_model


property MultiObjectiveBase.n_substeps

Number of sim steps per control step.


property MultiObjectiveBase.observation_size


property MultiObjectiveBase.sim_dt

Simulation timestep for the environment.


property MultiObjectiveBase.unwrapped


property MultiObjectiveBase.xml_path


method MultiObjectiveBase.get_reward_and_metrics

get_reward_and_metrics(
    rewards: dict[str, jax.Array],
    metrics: dict
) -> tuple[jax.Array, dict[str, jax.Array]]

Combine per-key rewards into a vector reward plus updated metrics.

Each entry of self.objectives maps to one component of the returned reward vector, computed as a weighted sum of the listed per-key rewards. Shared objectives are then added to every component.

Args:

  • rewards: Mapping from reward key to scalar reward for the current step.
  • metrics: Existing metrics dict to extend.

Returns: (reward, metrics) where reward has shape (len(self.objectives),) and metrics is the updated metrics dict.
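As a rough sketch of this combination logic (the helper name, the metrics-logging convention, and whether shared objectives are re-weighted are assumptions, not confirmed by the source; shown backend-agnostically with NumPy, since the base class supports both np and jnp backends):

```python
import numpy as np

def combine_rewards(rewards, metrics, objectives, shared_objectives, weights):
    """Sketch of get_reward_and_metrics: weighted sum per objective dimension,
    with shared objectives added to every dimension."""
    components = []
    for keys in objectives:
        # Weighted sum of the reward keys listed for this objective dimension.
        total = sum(weights[k] * rewards[k] for k in keys)
        # Shared objectives contribute to every dimension.
        total += sum(weights[k] * rewards[k] for k in shared_objectives)
        components.append(total)
    # Record each per-key reward in the metrics dict for logging.
    metrics = {**metrics, **{f"reward/{k}": v for k, v in rewards.items()}}
    return np.stack(components), metrics
```

The returned reward therefore has shape `(len(objectives),)`, matching the documented contract.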


class Multi2SingleObjective

Wrap a multi-objective env to expose a scalar reward.

Replaces the vector reward from the wrapped environment with the inner product reward · weighting, so the wrapped env can be plugged into standard single-objective PPO. All other attributes/methods are delegated to the underlying env via __getattr__.

Args:

  • env: A MultiObjectiveBase (or compatible) environment whose reset/step return states with vector rewards.
  • weighting: Per-objective weights, length must match the env’s reward dimension.

method Multi2SingleObjective.__init__

__init__(env, weighting)

method Multi2SingleObjective.reset

reset(rng)

method Multi2SingleObjective.step

step(state, action)
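A minimal sketch of this scalarization pattern (the class name here, the `state.replace`-style immutable-state API, and the exact delegation details are assumptions; the real wrapper is `Multi2SingleObjective`):

```python
import numpy as np

class ScalarizeWrapper:
    """Sketch: replace a vector reward with its inner product against
    a fixed per-objective weighting."""

    def __init__(self, env, weighting):
        self._env = env
        self._weighting = np.asarray(weighting)

    def reset(self, rng):
        state = self._env.reset(rng)
        return state.replace(reward=state.reward @ self._weighting)

    def step(self, state, action):
        state = self._env.step(state, action)
        return state.replace(reward=state.reward @ self._weighting)

    def __getattr__(self, name):
        # Delegate every other attribute/method to the wrapped env.
        return getattr(self._env, name)
```

Because the reward becomes a scalar, the wrapped environment can be passed to a standard single-objective PPO loop unchanged.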
