Curriculum
Syllabus’s Curriculum API is a unified interface for curriculum learning methods. Curricula following this API can be used with all of Syllabus’s infrastructure. We hope that future curriculum learning research will provide implementations following this API to encourage reproducibility and ease of use.
The full documentation for the Curriculum class can be found in the API reference below.
The Curriculum class has three main jobs:
- Maintain a sampling distribution over the task space.
- Incorporate feedback from the environments or training process to update the sampling distribution.
- Provide a sampling interface for the environment to draw tasks from.
In practice, the sampling distribution can take whatever form the curriculum learning method requires, such as a uniform distribution, a deterministic sequence of tasks, or a single constant task.
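The three kinds of distribution mentioned above can be sketched as follows. This is a minimal illustration in plain Python, not Syllabus code: the class names `UniformSketch`, `SequentialSketch`, and `ConstantSketch` are hypothetical, and they only mirror the `sample(k)` behavior documented below (a single task when k=1, a list of k tasks otherwise).

```python
import random
from typing import Any, List, Union


class UniformSketch:
    """Hypothetical example: every task is equally likely."""

    def __init__(self, tasks: List[Any], seed: int = 0):
        self.tasks = list(tasks)
        self.rng = random.Random(seed)

    def sample(self, k: int = 1) -> Union[List[Any], Any]:
        drawn = [self.rng.choice(self.tasks) for _ in range(k)]
        return drawn[0] if k == 1 else drawn


class SequentialSketch:
    """Hypothetical example: cycle through the tasks in a fixed order."""

    def __init__(self, tasks: List[Any]):
        self.tasks = list(tasks)
        self.position = 0

    def sample(self, k: int = 1) -> Union[List[Any], Any]:
        drawn = []
        for _ in range(k):
            drawn.append(self.tasks[self.position % len(self.tasks)])
            self.position += 1
        return drawn[0] if k == 1 else drawn


class ConstantSketch:
    """Hypothetical example: always return the same task."""

    def __init__(self, task: Any):
        self.task = task

    def sample(self, k: int = 1) -> Union[List[Any], Any]:
        return self.task if k == 1 else [self.task] * k
```

All three expose the same sampling interface, so an environment wrapper that only calls `sample` would not need to know which distribution sits behind it.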
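To incorporate feedback from the environment, the API provides several methods: update_on_step, update_on_step_batch, update_on_episode, and update_task_progress, each documented in the reference below.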
Curriculum
- class syllabus.core.curriculum_base.Curriculum(task_space: TaskSpace, random_start_tasks: int = 0, task_names: Callable | None = None, record_stats: bool = False)
Bases: object
Base class and API for defining curricula to interface with Gym environments.
- add_agent(agent: Agent)
Add an agent to the curriculum.
- Parameters:
agent – Agent to add to the curriculum
- Return agent_id:
Identifier of the added agent
- get_agent(agent_id: int) -> Agent
Load an agent from the buffer of saved agents.
- Parameters:
agent_id – Identifier of the agent to load
- Returns:
Loaded agent
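As a rough illustration of the add_agent / get_agent pair, the sketch below keeps agents in an in-memory list and uses the list index as the identifier. Both the class name `AgentBufferSketch` and the in-memory storage are assumptions made for this example; the actual Curriculum class may store and load agents differently.

```python
import copy
from typing import Any, List


class AgentBufferSketch:
    """Hypothetical in-memory buffer of saved agents."""

    def __init__(self) -> None:
        self._agents: List[Any] = []

    def add_agent(self, agent: Any) -> int:
        # Store a snapshot so later training does not change the saved agent.
        self._agents.append(copy.deepcopy(agent))
        return len(self._agents) - 1  # identifier of the added agent

    def get_agent(self, agent_id: int) -> Any:
        return self._agents[agent_id]
```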
- log_metrics(writer, logs: List[Dict], step: int | None = None, log_n_tasks: int = 1)
Log the task distribution to the provided writer.
- Parameters:
writer – Tensorboard summary writer or wandb object
logs – Cumulative list of logs to write
step – Global step number
log_n_tasks – Maximum number of tasks to log, defaults to 1. Use -1 to log all tasks.
- Returns:
Updated logs list
- normalize(reward: float, task: Any) -> float
Normalize reward by task.
- Parameters:
reward – Reward to normalize
task – Task for which the reward was received
- Returns:
Normalized reward
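One plausible way to normalize rewards by task is to keep running statistics per task and standardize each reward against them. The sketch below does this with Welford's online algorithm. It is an assumption for illustration only: the class name `PerTaskNormalizer`, the `eps` parameter, and the choice of standardization are not taken from the Syllabus source.

```python
import math
from collections import defaultdict
from typing import Any


class PerTaskNormalizer:
    """Hypothetical per-task reward standardizer using running mean and variance."""

    def __init__(self, eps: float = 1e-8) -> None:
        self.count = defaultdict(int)
        self.mean = defaultdict(float)
        self.m2 = defaultdict(float)  # running sum of squared deviations
        self.eps = eps

    def normalize(self, reward: float, task: Any) -> float:
        # Welford update of this task's running statistics.
        self.count[task] += 1
        delta = reward - self.mean[task]
        self.mean[task] += delta / self.count[task]
        self.m2[task] += delta * (reward - self.mean[task])
        if self.count[task] < 2:
            return 0.0  # no variance estimate yet
        std = math.sqrt(self.m2[task] / (self.count[task] - 1))
        return (reward - self.mean[task]) / (std + self.eps)
```

Keeping separate statistics per task means a task with large raw rewards does not dominate one with small rewards.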
- property num_tasks: int
Counts the number of tasks in the task space.
- Returns:
Returns the number of tasks in the task space if it is countable, TODO: -1 otherwise
- property requires_step_updates: bool
Returns whether the curriculum requires step updates from the environment.
- Returns:
True if the curriculum requires step updates, False otherwise
- sample(k: int = 1) -> List | Any
Sample k tasks from the curriculum.
- Parameters:
k – Number of tasks to sample, defaults to 1
- Returns:
A single task if k=1, otherwise a list of k tasks
- property tasks: List[tuple]
List all of the tasks in the task space.
- Returns:
List of tasks if task space is enumerable, TODO: empty list otherwise?
- update_on_episode(episode_return: float, length: int, task: Any, progress: float | bool, env_id: int | None = None) -> None
Update the curriculum with episode results from the environment.
- Parameters:
episode_return – Episodic return
length – Length of the episode
task – Task for which the episode was completed
progress – Progress toward completion or success rate of the given task. 1.0 or True typically indicates a complete task.
env_id – Environment identifier
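To show how episode feedback might reshape the sampling distribution, the sketch below tracks a per-task success rate from the progress argument and samples low-success tasks more often. The class `LowSuccessFirstSketch`, its weighting rule, and the 0.05 floor are all hypothetical choices for this example, not a method provided by Syllabus.

```python
import random
from typing import Any, List, Optional, Union


class LowSuccessFirstSketch:
    """Hypothetical curriculum that favors tasks with a low success rate."""

    def __init__(self, tasks: List[Any], seed: int = 0) -> None:
        self.tasks = list(tasks)
        self.episodes = {t: 0 for t in self.tasks}
        self.successes = {t: 0.0 for t in self.tasks}
        self.rng = random.Random(seed)

    def update_on_episode(self, episode_return: float, length: int, task: Any,
                          progress: Union[float, bool],
                          env_id: Optional[int] = None) -> None:
        # Only progress is used here; return and length are ignored in this sketch.
        self.episodes[task] += 1
        self.successes[task] += float(progress)

    def _weights(self) -> List[float]:
        weights = []
        for t in self.tasks:
            n = self.episodes[t]
            rate = self.successes[t] / n if n else 0.0
            # Small floor so solved tasks can still be sampled occasionally.
            weights.append(1.0 - rate + 0.05)
        return weights

    def sample(self, k: int = 1) -> Union[List[Any], Any]:
        drawn = self.rng.choices(self.tasks, weights=self._weights(), k=k)
        return drawn[0] if k == 1 else drawn
```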
- update_on_step(task: Any, obs: Any, rew: float, term: bool, trunc: bool, info: dict, progress: float | bool, env_id: int | None = None) -> None
Update the curriculum with the current step results from the environment.
- Parameters:
task – Task associated with the current step
obs – Observation from the environment
rew – Reward from the environment
term – True if the episode ended on this step, False otherwise
trunc – True if the episode was truncated on this step, False otherwise
info – Extra information from the environment
progress – Progress toward completion or success rate of the given task. 1.0 or True typically indicates a complete task.
env_id – Environment identifier
- Raises:
NotImplementedError –
- update_on_step_batch(step_results: Tuple[List[Any], List[Any], List[int], List[bool], List[bool], List[Dict], List[int]], env_id: int | None = None) -> None
Update the curriculum with a batch of step results from the environment.
This method can be overridden to provide a more efficient implementation. It is used as a convenience function and to optimize the multiprocessing message passing throughput.
- Parameters:
step_results – Tuple of lists of step results, one list per field
env_id – Environment identifier
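A straightforward, unoptimized batch update simply unpacks the tuple of lists and forwards each step to update_on_step. The sketch below assumes the seven lists follow the parameter order of update_on_step (tasks, observations, rewards, terminations, truncations, infos, progress values). That ordering is inferred from the signatures above rather than confirmed, and `StepCounterSketch` is a hypothetical class.

```python
from typing import Any, Dict, List, Optional, Tuple


class StepCounterSketch:
    """Hypothetical curriculum that counts steps taken per task."""

    def __init__(self) -> None:
        self.steps_per_task: Dict[Any, int] = {}

    def update_on_step(self, task: Any, obs: Any, rew: float, term: bool,
                       trunc: bool, info: dict, progress: Any,
                       env_id: Optional[int] = None) -> None:
        self.steps_per_task[task] = self.steps_per_task.get(task, 0) + 1

    def update_on_step_batch(self, step_results: Tuple[List, ...],
                             env_id: Optional[int] = None) -> None:
        # Assumed field order: matches the parameters of update_on_step.
        tasks, obs, rews, terms, truncs, infos, progresses = step_results
        for step in zip(tasks, obs, rews, terms, truncs, infos, progresses):
            self.update_on_step(*step, env_id=env_id)
```

A subclass with vectorized statistics could override the batch method to process all lists at once instead of looping.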
- update_task_progress(task: Any, progress: float | bool, env_id: int | None = None) -> None
Update the curriculum with a task and its progress. This is used for binary tasks that can be completed mid-episode.
- Parameters:
task – Task for which progress is being updated.
progress – Progress toward completion or success rate of the given task. 1.0 or True typically indicates a complete task.
env_id – Environment identifier
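As an example of consuming mid-episode progress updates, the sketch below keeps an exponential moving average of progress per task, treating True as 1.0 and False as 0.0. The class `ProgressTrackerSketch` and its `alpha` smoothing parameter are hypothetical and only illustrate one way a curriculum could use this method.

```python
from typing import Any, Dict, Optional, Union


class ProgressTrackerSketch:
    """Hypothetical tracker of a smoothed success rate per task."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha
        self.success_rate: Dict[Any, float] = {}

    def update_task_progress(self, task: Any, progress: Union[float, bool],
                             env_id: Optional[int] = None) -> None:
        value = float(progress)  # True -> 1.0, False -> 0.0
        previous = self.success_rate.get(task)
        if previous is None:
            self.success_rate[task] = value
        else:
            # Exponential moving average weights recent outcomes more heavily.
            self.success_rate[task] = (1 - self.alpha) * previous + self.alpha * value
```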