Learning Progress

This is an implementation of the curriculum introduced in the paper Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft (Kanitscheider et al., 2021). It has been used to achieve strong performance in Minecraft without offline data. The curriculum maintains a fast and a slow exponential moving average (EMA) of the task completion rate for each task in a set of discrete tasks. By taking the difference between the fast and slow EMAs and reweighting it to adjust for the time delay introduced by the EMA, the method estimates the learning progress on each task. If the difference is positive, the agent is learning to solve the task. If the difference is negative, the agent is forgetting how to solve the task. To improve performance in both cases, the curriculum samples tasks according to the magnitude of the learning progress. See the paper for more details. Syllabus's implementation is based on the open-source implementation used for OMNI, which can be found here: https://github.com/jennyzzt/omni.
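The sampling idea described above can be sketched in a few lines of plain Python. This is a simplified illustration, not Syllabus's code: the function name `sampling_weights`, the `eps` threshold, and the uniform fallback are assumptions made for the example, and the time-delay reweighting step from the paper is deliberately left out.

```python
import random


def sampling_weights(p_fast, p_slow, eps=1e-8):
    """Hypothetical helper: map fast/slow success-rate EMAs to sampling probabilities."""
    # Magnitude of learning progress per task. A positive gap (learning) and a
    # negative gap (forgetting) both raise the task's sampling weight.
    magnitudes = [abs(fast - slow) for fast, slow in zip(p_fast, p_slow)]
    total = sum(magnitudes)
    if total < eps:
        # Assumption for this sketch: with no measurable progress, sample uniformly.
        return [1.0 / len(magnitudes)] * len(magnitudes)
    return [m / total for m in magnitudes]


# Task 0 is improving, task 1 is being forgotten, task 2 is stable.
weights = sampling_weights([0.5, 0.2, 0.9], [0.3, 0.4, 0.9])
task = random.Random(0).choices(range(len(weights)), weights=weights, k=1)[0]
```

In this example the stable task receives zero weight, while the improving and the forgetting task are sampled equally often because their gaps have the same magnitude.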

class syllabus.curricula.learning_progress.LearningProgress(eval_envs, evaluator, *args, ema_alpha=0.1, eval_interval=None, eval_interval_steps=None, **kwargs)

Bases: Curriculum

Provides an interface for tracking the success rates of discrete tasks and sampling tasks based on those success rates, using the method from https://arxiv.org/abs/2106.14876. TODO: Support task spaces other than Discrete.

update_on_episode(episode_return: float, length: int, task: Any, progress: float | bool, env_id: int | None = None) → None

Update the curriculum with episode results from the environment.

Parameters:

episode_return – Episodic return
length – Length of the episode
task – Task for which the episode was completed
progress – Progress toward completion or success rate of the given task. 1.0 or True typically indicates a complete task.
env_id – Environment identifier

update_task_progress(task: int, progress: float | bool, env_id: int | None = None)

Update the success rate for the given task using a fast and slow exponential moving average.
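As a rough illustration of what a fast and slow EMA update can look like, here is a minimal sketch. The `EmaTracker` class is hypothetical and not part of Syllabus. It assumes a single smoothing factor `ema_alpha`, with the slow average obtained by smoothing the fast average a second time. The library's actual update rule may differ.

```python
class EmaTracker:
    """Hypothetical per-task tracker of fast and slow success-rate EMAs."""

    def __init__(self, num_tasks, ema_alpha=0.1):
        self.ema_alpha = ema_alpha
        self.p_fast = [0.0] * num_tasks
        self.p_slow = [0.0] * num_tasks

    def update(self, task, progress):
        # Booleans are accepted: True becomes 1.0 and False becomes 0.0.
        value = float(progress)
        alpha = self.ema_alpha
        # Fast EMA follows the newest result directly.
        self.p_fast[task] = alpha * value + (1 - alpha) * self.p_fast[task]
        # Assumption: the slow EMA smooths the fast EMA again, so it lags behind it.
        self.p_slow[task] = alpha * self.p_fast[task] + (1 - alpha) * self.p_slow[task]
        # Signed gap: positive while improving, negative while forgetting.
        return self.p_fast[task] - self.p_slow[task]


tracker = EmaTracker(num_tasks=2, ema_alpha=0.1)
for _ in range(20):
    gap = tracker.update(task=0, progress=True)
```

After a run of successes on task 0, the fast average sits above the slow one, so `gap` is positive. Task 1 was never updated and keeps both averages at zero.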