Task Wrappers

CartPole Task Wrapper

class syllabus.examples.task_wrappers.cartpole_task_wrapper.CartPoleTaskWrapper(env, discretize=True)[source]

Bases: TaskWrapper

reset(**kwargs)[source]

Calls the wrapped environment's reset(). Override this method to change the returned data.

Minigrid Task Wrapper

Task wrapper that can select a new MiniGrid task on reset.

class syllabus.examples.task_wrappers.minigrid_task_wrapper.MinigridTaskWrapper(env: Env)[source]

Bases: TaskWrapper

This wrapper allows you to change the task of a MiniGrid environment.

change_task(new_task: int)[source]

Change task by directly editing environment class.

Ignores requests for unknown tasks or task changes outside of a reset.

observation(obs)[source]

Adds the goal encoding to the observation. Override to add additional task-specific observations. Returns a modified observation. TODO: Complete this implementation and find a way to support centralized encodings

reset(new_task=None, **kwargs)[source]

Resets the environment along with all available tasks, and changes the current task.

This ensures that all instance variables are reset, not just the ones for the current task. We do this efficiently by keeping track of which reset functions have already been called, since very few tasks override reset. If new_task is provided, we change the task before calling the final reset.
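The bookkeeping described above, calling each distinct reset() implementation only once, can be sketched in plain Python. The class names here are illustrative stand-ins, not the actual syllabus classes:

```python
calls = []

class BaseTask:
    def reset(self):
        calls.append("BaseTask.reset")
        self.steps = 0

class TaskA(BaseTask):
    pass  # inherits reset() unchanged, as most tasks do

class TaskB(BaseTask):
    def reset(self):
        calls.append("TaskB.reset")
        self.steps = 0
        self.goal_reached = False

def reset_all_tasks(env, task_classes):
    """Call each distinct reset() implementation exactly once.

    Tracking the underlying function objects skips tasks that
    inherit reset() unchanged from a shared base class.
    """
    seen = set()
    for cls in task_classes:
        fn = cls.reset  # the function object, not a bound method
        if fn not in seen:
            seen.add(fn)
            fn(env)

env = TaskB()
reset_all_tasks(env, [BaseTask, TaskA, TaskB])
```

Because TaskA inherits BaseTask.reset unchanged, only two reset functions run, yet every instance variable used by any task is initialized.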

step(action)[source]

Step through environment and update task completion.

NetHack Task Wrapper

Task wrapper for NLE that can change tasks at reset using the NLE’s task definition format.

class syllabus.examples.task_wrappers.nethack_wrappers.NetHackCollect(*args, **kwargs)[source]

Bases: NetHackGold

Environment for “staircase” task.

This task requires the agent to get on top of a staircase down (>). The reward function is \(I + \text{TP}\), where \(I\) is 1 if the task is successful, and 0 otherwise, and \(\text{TP}\) is the time step penalty as defined by NetHackScore.

reset(wizkit_items=None)[source]

Resets the environment.

Note

We attempt to manually navigate the first few menus so that the first seen state is ready to be acted upon by the user. This might fail if NetHack is initialized with uncommon options.

Returns:

(tuple) – Observation of the state as defined by self.observation_space, and extra game state information.

class syllabus.examples.task_wrappers.nethack_wrappers.NetHackDescend(*args, penalty_mode='constant', penalty_step: float = -0.01, penalty_time: float = -0.0, **kwargs)[source]

Bases: NetHackScore

Environment for “staircase” task.

This task requires the agent to get on top of a staircase down (>). The reward function is \(I + \text{TP}\), where \(I\) is 1 if the task is successful, and 0 otherwise, and \(\text{TP}\) is the time step penalty as defined by NetHackScore.
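The reward above can be sketched directly; the time penalty value here is a stand-in for whatever NetHackScore actually computes:

```python
def staircase_reward(task_successful: bool, time_penalty: float) -> float:
    """Reward I + TP: the success indicator (1.0 or 0.0) plus the
    (typically non-positive) time penalty from NetHackScore."""
    indicator = 1.0 if task_successful else 0.0
    return indicator + time_penalty

# e.g. with the default constant per-step penalty of -0.01
r_success = staircase_reward(True, -0.01)
r_fail = staircase_reward(False, -0.01)
```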

reset(wizkit_items=None)[source]

Resets the environment.

Note

We attempt to manually navigate the first few menus so that the first seen state is ready to be acted upon by the user. This might fail if NetHack is initialized with uncommon options.

Returns:

(tuple) – Observation of the state as defined by self.observation_space, and extra game state information.

class syllabus.examples.task_wrappers.nethack_wrappers.NetHackExtendedActionEnv(*args, no_progress_timeout: int = 10000, **kwargs)[source]

Bases: NLE

get_scout_score(last_observation)[source]
reset(*args, **kwargs)[source]

Resets the environment.

Note

We attempt to manually navigate the first few menus so that the first seen state is ready to be acted upon by the user. This might fail if NetHack is initialized with uncommon options.

Returns:

(tuple) – Observation of the state as defined by self.observation_space, and extra game state information.

step(action: int)[source]

Steps the environment.

Parameters:

action (int) – action integer as defined by self.action_space.

Returns:

a tuple containing
  • (dict): an observation of the state; this will contain the keys specified by self.observation_space.

  • (float): a reward; see self._reward_fn to see how it is specified.

  • (bool): True if the state is terminal, False otherwise.

  • (dict): a dictionary of extra information (such as end_status, i.e. a status info – death, task win, etc. – for the terminal state).

Return type:

(dict, float, bool, dict)

class syllabus.examples.task_wrappers.nethack_wrappers.NetHackSatiate(*args, penalty_mode='constant', penalty_step: float = -0.01, penalty_time: float = -0.0, **kwargs)[source]

Bases: NetHackScore

Environment for the “eat” task.

The task is similar to the one defined by NetHackScore, but the reward uses positive changes in the character’s hunger level (e.g. by consuming comestibles or monster corpses), rather than the score.

class syllabus.examples.task_wrappers.nethack_wrappers.NetHackScore(*args, penalty_mode='constant', penalty_step: float = -0.01, penalty_time: float = -0.0, **kwargs)[source]

Bases: NLE

Environment for “score” task.

The task is an augmentation of the standard NLE task. The return function is defined as \(\text{score}_t - \text{score}_{t-1} + \text{TP}\), where \(\text{TP}\) is a time penalty that grows with the number of environment steps that do not change the state (such as navigating menus).

Parameters:
  • penalty_mode (str) – name of the mode for calculating the time step penalty. Can be constant, exp, square, linear, or always. Defaults to constant.

  • penalty_step (float) – constant applied to amount of frozen steps. Defaults to -0.01.

  • penalty_time (float) – constant applied to the amount of in-game time elapsed each step. Defaults to -0.0.
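A simplified sketch of how the penalty modes listed above could combine with the score delta. The exact formulas live in NLE's task definitions; frozen-step counting and the penalty_time term are omitted here:

```python
def time_penalty(frozen_steps: int, penalty_mode: str = "constant",
                 penalty_step: float = -0.01) -> float:
    """Sketch of an NLE-style time penalty.

    frozen_steps counts consecutive environment steps that did not
    advance in-game time (e.g. navigating menus).
    """
    if penalty_mode == "constant":
        return penalty_step if frozen_steps > 0 else 0.0
    if penalty_mode == "exp":
        return (2 ** frozen_steps) * penalty_step
    if penalty_mode == "square":
        return (frozen_steps ** 2) * penalty_step
    if penalty_mode == "linear":
        return frozen_steps * penalty_step
    if penalty_mode == "always":
        return penalty_step
    raise ValueError(f"unknown penalty_mode: {penalty_mode}")

# Per-step return for NetHackScore: score delta plus the time penalty.
# The scores here are made up for illustration.
reward = (1250 - 1200) + time_penalty(0)
```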

get_scout_score(last_observation)[source]
step(action: int)[source]

Steps the environment.

Parameters:

action (int) – action integer as defined by self.action_space.

Returns:

a tuple containing
  • (dict): an observation of the state; this will contain the keys specified by self.observation_space.

  • (float): a reward; see self._reward_fn to see how it is specified.

  • (bool): True if the state is terminal, False otherwise.

  • (dict): a dictionary of extra information (such as end_status, i.e. a status info – death, task win, etc. – for the terminal state).

Return type:

(dict, float, bool, dict)

class syllabus.examples.task_wrappers.nethack_wrappers.NetHackScoreExtendedActions(*args, **kwargs)[source]

Bases: NetHackExtendedActionEnv, NetHackScore

class syllabus.examples.task_wrappers.nethack_wrappers.NetHackScoutClipped(*args, penalty_mode='constant', penalty_step: float = -0.01, penalty_time: float = -0.0, **kwargs)[source]

Bases: NetHackScore

Environment for the “scout” task.

The task is similar to the one defined by NetHackScore, but the score is defined by the changes in glyphs discovered by the agent.

reset(*args, **kwargs)[source]

Resets the environment.

Note

We attempt to manually navigate the first few menus so that the first seen state is ready to be acted upon by the user. This might fail if NetHack is initialized with uncommon options.

Returns:

(tuple) – Observation of the state as defined by self.observation_space, and extra game state information.

class syllabus.examples.task_wrappers.nethack_wrappers.NetHackSeed(*args, character='@', allow_all_yn_questions=True, allow_all_modes=True, penalty_mode='constant', penalty_step: float = -0.0, penalty_time: float = -0.0, max_episode_steps: int = 1000000.0, observation_keys=('glyphs', 'chars', 'colors', 'specials', 'blstats', 'message', 'inv_glyphs', 'inv_strs', 'inv_letters', 'inv_oclasses', 'tty_chars', 'tty_colors', 'tty_cursor', 'misc'), no_progress_timeout: int = 10000, **kwargs)[source]

Bases: NetHackScore

Environment for the NetHack Challenge.

The task is an augmentation of the standard NLE task. This is the NLE Score Task but with some subtle differences:
  • the action space is fixed to include the full keyboard

  • menus and “<More>” tokens are not skipped

  • the starting character is randomly assigned

reset(*args, **kwargs)[source]

Resets the environment.

Note

We attempt to manually navigate the first few menus so that the first seen state is ready to be acted upon by the user. This might fail if NetHack is initialized with uncommon options.

Returns:

(tuple) – Observation of the state as defined by self.observation_space, and extra game state information.

class syllabus.examples.task_wrappers.nethack_wrappers.NethackDummyWrapper(env: Env, num_seeds: int = 200)[source]

Bases: TaskWrapper

class syllabus.examples.task_wrappers.nethack_wrappers.NethackSeedWrapper(env: Env, seed: int = 0, num_seeds: int = 200)[source]

Bases: TaskWrapper

This wrapper allows you to change the task of an NLE environment.

This wrapper was designed to meet two goals.
  1. Allow us to change the task of the NLE environment at the start of an episode

  2. Allow us to use the predefined NLE task definitions without copying/modifying their code. This makes it easier to integrate with other work on nethack tasks or curricula.

Each task is defined as a subclass of the NLE, so the environment must be cast and reinitialized to change its task. This wrapper manipulates the __class__ attribute to achieve this, but does so in a safe way. Specifically, we ensure that the instance variables needed for each task are available and reset at the start of the episode regardless of which task is active.
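The __class__ swap described above can be illustrated with a toy example. The classes here are stand-ins, not actual NLE tasks:

```python
class BaseTask:
    """Stand-in for an NLE task base class."""
    def reward(self, score_delta):
        return score_delta

class GoldTask(BaseTask):
    """Stand-in for a task subclass with a different reward."""
    def reward(self, score_delta):
        return 2 * score_delta  # toy reward, not NLE's actual gold reward

env = BaseTask()
before = env.reward(1)   # dispatches to BaseTask.reward

# Swap the active task by reassigning __class__, as the wrapper does.
# This is safe only when both classes share a compatible instance
# layout and task-specific instance variables are reset afterwards.
env.__class__ = GoldTask
after = env.reward(1)    # now dispatches to GoldTask.reward
```

The same object now resolves methods through the new class, which is why the wrapper can reuse one environment instance across tasks.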

change_task(new_task: int)[source]

Change task by setting the seed.

observation(observation)[source]

Returns a modified observation.

reset(new_task=None, **kwargs)[source]

Resets the environment along with all available tasks, and changes the current task.

This ensures that all instance variables are reset, not just the ones for the current task. We do this efficiently by keeping track of which reset functions have already been called, since very few tasks override reset. If new_task is provided, we change the task before calling the final reset.

seed(seed)[source]
step(action)[source]

Step through environment and update task completion.

class syllabus.examples.task_wrappers.nethack_wrappers.NethackTaskWrapper(env: Env, additional_tasks: List[NLE] = None, use_default_tasks: bool = True, env_kwargs: Dict[str, Any] = {}, wrappers: List[Tuple[Wrapper, List[Any], Dict[str, Any]]] = None, seed: int = None)[source]

Bases: TaskWrapper

This wrapper allows you to change the task of an NLE environment.

This wrapper was designed to meet two goals.
  1. Allow us to change the task of the NLE environment at the start of an episode

  2. Allow us to use the predefined NLE task definitions without copying/modifying their code. This makes it easier to integrate with other work on nethack tasks or curricula.

Each task is defined as a subclass of the NLE, so the environment must be cast and reinitialized to change its task. This wrapper manipulates the __class__ attribute to achieve this, but does so in a safe way. Specifically, we ensure that the instance variables needed for each task are available and reset at the start of the episode regardless of which task is active.

change_task(new_task: int)[source]

Change task by directly editing environment class.

Ignores requests for unknown tasks or task changes outside of a reset.

observation(observation)[source]

Parses current inventory and new items gained this timestep from the observation. Returns a modified observation.

reset(new_task=None, **kwargs)[source]

Resets the environment along with all available tasks, and changes the current task.

This ensures that all instance variables are reset, not just the ones for the current task. We do this efficiently by keeping track of which reset functions have already been called, since very few tasks override reset. If new_task is provided, we change the task before calling the final reset.

seed(seed)[source]
step(action)[source]

Step through environment and update task completion.

Pistonball Task Wrapper

Task wrapper for the Pistonball PettingZoo environment.

class syllabus.examples.task_wrappers.pistonball_task_wrapper.PistonballTaskWrapper(env: ParallelEnv)[source]

Bases: PettingZooTaskWrapper

This wrapper simply changes the seed of a Pistonball environment.

reset(new_task: int = None, **kwargs)[source]

Resets the environment.

And returns a dictionary of observations (keyed by the agent name)
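In PettingZoo's parallel API, reset() returns one observation per agent, keyed by agent name. A minimal sketch of that contract (agent names and observation contents here are illustrative):

```python
def parallel_reset(agents, make_obs):
    """Return a {agent_name: observation} dict, mirroring the
    PettingZoo parallel-env reset() contract."""
    return {agent: make_obs(agent) for agent in agents}

obs = parallel_reset(["piston_0", "piston_1"],
                     lambda agent: {"position": 0.0})
```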

Procgen Task Wrapper

class syllabus.examples.task_wrappers.procgen_task_wrapper.ProcgenTaskWrapper(env: Env, env_id, seed=0)[source]

Bases: TaskWrapper

This wrapper allows you to change the task of a Procgen environment.

change_task(new_task: int)[source]

Change task by directly editing environment class.

Ignores requests for unknown tasks or task changes outside of a reset.

observation(observation)[source]

Adds the goal encoding to the observation. Override to add additional task-specific observations. Returns a modified observation. TODO: Complete this implementation and find a way to support centralized encodings

reset(new_task=None, **kwargs)[source]

Resets the environment along with all available tasks, and changes the current task.

This ensures that all instance variables are reset, not just the ones for the current task. We do this efficiently by keeping track of which reset functions have already been called, since very few tasks override reset. If new_task is provided, we change the task before calling the final reset.

seed(seed)[source]
step(action)[source]

Step through environment and update task completion.