Environment Synchronization#

The environment synchronization wrappers collect and send the data that is eventually used to call the Curriculum's update_on_step, update_on_episode, and update_on_progress methods. In reset, they also request a new task at the start of each episode before calling the wrapped environment's reset method. Because curriculum updates happen outside of the main training process, updates can be batched for more efficient transfer between processes without slowing down training. Episode and task updates are sent immediately, while step updates are batched; the size of these batches is controlled by the batch_size argument of the synchronization wrapper initializer. Additionally, the wrapper sends no step updates at all if the curriculum's requires_step_updates method is not implemented or returns False.
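For example, a curriculum that only learns from episodic returns can opt out of step updates entirely. Below is a minimal sketch, assuming the Curriculum base class is importable from syllabus.core; the class itself is purely illustrative.

```python
from syllabus.core import Curriculum  # assumption: base class import path


class EpisodicCurriculum(Curriculum):
    """Illustrative curriculum that only consumes episode-level updates."""

    def requires_step_updates(self) -> bool:
        # Returning False tells the synchronization wrapper not to collect or
        # send per-step data, so only episode and task updates cross processes.
        return False
```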

The environment synchronization wrapper also has a buffer_size argument, which controls how many tasks are kept in the task queue at any given time. Increasing it can reduce the time the environment spends waiting for a new task: if the buffer is too small, the environment may run out of tasks and have to wait for the curriculum to send more. However, a larger buffer_size also makes the environment less responsive to changes in the curriculum. Each increment of buffer_size effectively delays the task distribution the environment samples from by a number of episodes equal to the number of parallel environments. We have seen some evidence that this delay can hurt the performance of methods like PLR, so we recommend setting this value to at most 2. If the environment is frequently waiting for tasks, check whether your update queue size is spiking; if it is, you may need to optimize your curriculum, since it is struggling to keep up with the volume of updates it receives.
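Putting both parameters together, here is a minimal sketch of wiring a curriculum to vectorized environments with the GymnasiumSyncWrapper documented below. The make_multiprocessing_curriculum helper, the DomainRandomization curriculum, the TaskSpace constructor arguments, and the curriculum.components attribute are assumptions based on typical Syllabus usage; substitute the names from your version of the library and your own task-configurable environment.

```python
import gymnasium as gym

from syllabus.core import make_multiprocessing_curriculum  # assumption: helper name
from syllabus.core.environment_sync_wrapper import GymnasiumSyncWrapper
from syllabus.curricula import DomainRandomization          # assumption: curriculum name
from syllabus.task_space import TaskSpace

# Task space shared by the curriculum and every environment copy.
task_space = TaskSpace(200)  # assumption: constructor accepts a task count

# Created once on the main process; the helper attaches the multiprocessing
# components used to exchange tasks and updates with the environments.
curriculum = make_multiprocessing_curriculum(DomainRandomization(task_space))


def make_env():
    # Called in each worker process by the vectorized environment.
    env = gym.make("CartPole-v1")  # placeholder for a task-configurable env
    return GymnasiumSyncWrapper(
        env,
        task_space,
        curriculum.components,  # assumption: components exposed as an attribute
        batch_size=100,         # step updates sent in batches of 100
        buffer_size=2,          # keep at most 2 queued tasks per environment
    )


envs = gym.vector.AsyncVectorEnv([make_env for _ in range(8)])
```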

class syllabus.core.environment_sync_wrapper.GymnasiumSyncWrapper(env, task_space: TaskSpace, components: MultiProcessingComponents, batch_size: int = 100, buffer_size: int = 2, remove_keys: list | None = None, change_task_on_completion: bool = False, global_task_completion: Callable[[Curriculum, ndarray, float, bool, Dict[str, Any]], bool] | None = None)[source]#

Bases: Wrapper

This wrapper is used to set the task on reset for Gymnasium environments running on parallel processes created using multiprocessing.Process. Meant to be used with a QueueLearningProgressCurriculum running on the main process.

get_task()[source]#
reset(*args, **kwargs)[source]#

Uses the reset() of the env, which can be overridden to change the returned data.

step(action)[source]#

Uses the step() of the env, which can be overridden to change the returned data.

class syllabus.core.environment_sync_wrapper.PettingZooSyncWrapper(env, task_space: TaskSpace, components: MultiProcessingComponents, batch_size: int = 100, buffer_size: int = 2, remove_keys: list | None = None, change_task_on_completion: bool = False, global_task_completion: Callable[[Curriculum, ndarray, float, bool, Dict[str, Any]], bool] | None = None)[source]#

Bases: BaseParallelWrapper

This wrapper is used to set the task on reset for PettingZoo environments running on parallel processes created using multiprocessing.Process. Meant to be used with a QueueLearningProgressCurriculum running on the main process.

get_task()[source]#
reset(*args, **kwargs)[source]#

Resets the environment and returns a dictionary of observations keyed by agent name.

step(actions)[source]#

Receives a dictionary of actions keyed by the agent name.

Returns the observation, reward, terminated, truncated, and info dictionaries, each keyed by agent name.
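Construction mirrors the Gymnasium wrapper, except that the wrapped environment is a PettingZoo parallel environment. A minimal sketch, reusing the hypothetical task_space and curriculum from the earlier example:

```python
from pettingzoo.butterfly import pistonball_v6  # placeholder multi-agent env

from syllabus.core.environment_sync_wrapper import PettingZooSyncWrapper


def make_marl_env():
    env = pistonball_v6.parallel_env()
    return PettingZooSyncWrapper(
        env,
        task_space,              # same task space as the single-agent example
        curriculum.components,   # assumption: shared multiprocessing components
        batch_size=100,
        buffer_size=2,
    )
```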

class syllabus.core.environment_sync_wrapper.RayGymnasiumSyncWrapper(env, update_on_step: bool = True, task_space: Space | None = None, global_task_completion: Callable[[Curriculum, ndarray, float, bool, Dict[str, Any]], bool] | None = None)[source]#

Bases: Wrapper

This wrapper is used to set the task on reset for Gymnasium environments running on parallel processes created using Ray. Meant to be used with a RayLearningProgressCurriculum running on the main process.

change_task(new_task)[source]#

Changes the task of the existing environment to the new_task.

Each environment will implement tasks differently. The easiest system would be to call a function or set an instance variable to change the task.

Some environments may need to be reset or even reinitialized to change the task. If you need to reset or re-init the environment here, make sure to check that it is not in the middle of an episode to avoid unexpected behavior.
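The wrapper delegates the actual task change to the environment, so the environment has to expose some way of applying it. One common pattern, sketched below with a hypothetical MyTaskEnv, is to store the new task in an instance variable and apply it on the next reset so that an in-progress episode is never disturbed:

```python
import gymnasium as gym


class MyTaskEnv(gym.Env):
    """Hypothetical environment configured by an integer task id."""

    def __init__(self):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,))
        self.action_space = gym.spaces.Discrete(2)
        self.task = 0  # updated through change_task between episodes

    def change_task(self, new_task):
        # Record the task; it takes effect on the next reset rather than
        # mid-episode, avoiding the unexpected behavior mentioned above.
        self.task = new_task

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Reconfigure the episode from self.task here (pick a level, map, seed, ...).
        return self.observation_space.sample(), {"task": self.task}

    def step(self, action):
        obs = self.observation_space.sample()
        return obs, 0.0, True, False, {}
```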

reset(*args, **kwargs)[source]#

Uses the reset() of the env, which can be overridden to change the returned data.

step(action)[source]#

Uses the step() of the env, which can be overridden to change the returned data.
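When environments run as Ray actors, no components object is passed to the wrapper; it instead communicates with a curriculum actor created on the main process. A minimal sketch, assuming a make_ray_curriculum helper in syllabus.core and reusing the hypothetical task_space from the earlier example (check the library's Ray quick-start for the exact names):

```python
import gymnasium as gym

from syllabus.core import make_ray_curriculum  # assumption: helper name
from syllabus.core.environment_sync_wrapper import RayGymnasiumSyncWrapper
from syllabus.curricula import DomainRandomization  # assumption: curriculum name

# On the main process: registers the curriculum as a named Ray actor.
curriculum = make_ray_curriculum(DomainRandomization(task_space))


def make_env():
    # Inside each Ray worker: the wrapper locates the curriculum actor itself,
    # so only the environment and task space are needed here.
    env = gym.make("CartPole-v1")  # placeholder for a task-configurable env
    return RayGymnasiumSyncWrapper(env, task_space=task_space)
```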

class syllabus.core.environment_sync_wrapper.RayPettingZooSyncWrapper(env, task_space: TaskSpace, update_on_step: bool = True, global_task_completion: Callable[[Curriculum, ndarray, float, bool, Dict[str, Any]], bool] | None = None)[source]#

Bases: BaseParallelWrapper

This wrapper is used to set the task on reset for PettingZoo environments running on parallel processes created using Ray. Meant to be used with a RayLearningProgressCurriculum running on the main process.

change_task(new_task)[source]#

Changes the task of the existing environment to the new_task.

Each environment will implement tasks differently. The easiest system would be to call a function or set an instance variable to change the task.

Some environments may need to be reset or even reinitialized to change the task. If you need to reset or re-init the environment here, make sure to check that it is not in the middle of an episode to avoid unexpected behavior.

reset(*args, **kwargs)[source]#

Resets the environment and returns a dictionary of observations keyed by agent name.

step(action)[source]#

Receives a dictionary of actions keyed by the agent name.

Returns the observation, reward, terminated, truncated, and info dictionaries, each keyed by agent name.