Example Utils#


syllabus.examples.utils.vecenv module#

class syllabus.examples.utils.vecenv.RunningMeanStd(epsilon=0.0001, shape=())[source]#

Bases: object

update_from_moments(batch_mean, batch_var, batch_count)[source]#
class syllabus.examples.utils.vecenv.VecEnv(num_envs, observation_space, action_space)[source]#

Bases: object

An abstract asynchronous, vectorized environment. Used to batch data from multiple copies of an environment, so that each observation becomes a batch of observations, and the expected action is a batch of actions to be applied per environment.


Clean up the extra resources, beyond what’s in this base class. Only runs when not self.closed.

closed = False#

Return RGB images from each environment

metadata = {'render.modes': ['human', 'rgb_array']}#

Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.


Step the environments synchronously.

This is available for backwards compatibility.


Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

step_env(actions, reset_random=False)[source]#

Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations.

  • rews: an array of rewards

  • dones: an array of “episode done” booleans

  • infos: a sequence of info objects
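The calling order described above (step_async(), then step_wait(), with reset() cancelling pending work) can be illustrated with a small stand-in. This is only a sketch: `ToyVecEnv` and its counter dynamics are hypothetical and not part of syllabus; only the method names and the `(obs, rews, dones, infos)` return shape come from the descriptions above.

```python
import numpy as np


class ToyVecEnv:
    """Hypothetical stand-in that mimics the step_async()/step_wait() protocol."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self._state = np.zeros(num_envs, dtype=np.int64)
        self._pending = None  # actions stored by step_async()

    def reset(self):
        # Any pending step is cancelled by a reset.
        self._pending = None
        self._state[:] = 0
        return self._state.copy()

    def step_async(self, actions):
        # Must not be called while a previous step_async run is pending.
        assert self._pending is None, "a step_async run is already pending"
        self._pending = np.asarray(actions)

    def step_wait(self):
        assert self._pending is not None, "call step_async() first"
        self._state += self._pending  # toy dynamics: add the action to a counter
        self._pending = None
        obs = self._state.copy()
        rews = obs.astype(np.float64)
        dones = obs >= 3
        infos = [{} for _ in range(self.num_envs)]
        return obs, rews, dones, infos

    def step(self, actions):
        # Synchronous convenience wrapper around the two calls.
        self.step_async(actions)
        return self.step_wait()


venv = ToyVecEnv(num_envs=2)
venv.reset()
venv.step_async([1, 2])
obs, rews, dones, infos = venv.step_wait()
```

Each returned value has one entry per environment, so a policy can act on the whole batch at once.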

property unwrapped#
viewer = None#
class syllabus.examples.utils.vecenv.VecEnvObservationWrapper(venv, observation_space=None, action_space=None)[source]#

Bases: VecEnvWrapper


Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.


Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations.

  • rews: an array of rewards

  • dones: an array of “episode done” booleans

  • infos: a sequence of info objects

class syllabus.examples.utils.vecenv.VecEnvWrapper(venv, observation_space=None, action_space=None)[source]#

Bases: VecEnv

An environment wrapper that applies to an entire batch of environments at once.


Return RGB images from each environment


Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.


Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.


Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations.

  • rews: an array of rewards

  • dones: an array of “episode done” booleans

  • infos: a sequence of info objects

class syllabus.examples.utils.vecenv.VecExtractDictObs(venv, key)[source]#

Bases: VecEnvObservationWrapper

class syllabus.examples.utils.vecenv.VecMonitor(venv, filename=None, keep_buf=0, info_keywords=())[source]#

Bases: VecEnvWrapper

reset(seed=None, options=None)[source]#

Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.


Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations.

  • rews: an array of rewards

  • dones: an array of “episode done” booleans

  • infos: a sequence of info objects

class syllabus.examples.utils.vecenv.VecNormalize(venv, ob=True, ret=True, clipob=10.0, cliprew=10.0, gamma=0.99, epsilon=1e-08, use_tf=False)[source]#

Bases: VecEnvWrapper

A vectorized wrapper that normalizes the observations and returns from an environment.
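As a rough illustration of what observation normalization with the `clipob` and `epsilon` parameters usually means, the sketch below standardizes a batch with running statistics and clips the result. The helper `normalize_obs` is hypothetical, and the exact formula is an assumption based on the common convention for such wrappers, not a statement of what this class does internally.

```python
import numpy as np


def normalize_obs(obs, mean, var, clipob=10.0, epsilon=1e-8):
    """Hypothetical helper: standardize observations, then clip to [-clipob, clipob]."""
    return np.clip((obs - mean) / np.sqrt(var + epsilon), -clipob, clipob)


# Two environments, two observation features each.
obs = np.array([[0.0, 100.0], [2.0, -100.0]])
mean = np.array([1.0, 0.0])  # running mean per feature (assumed given)
var = np.array([1.0, 4.0])   # running variance per feature (assumed given)
normed = normalize_obs(obs, mean, var)
```

Clipping keeps rare, extreme observations from dominating the inputs to the policy network.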

reset(seed=None, options=None)[source]#

Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.


Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations.

  • rews: an array of rewards

  • dones: an array of “episode done” booleans

  • infos: a sequence of info objects

syllabus.examples.utils.vecenv.update_mean_var_count_from_moments(mean, var, count, batch_mean, batch_var, batch_count)[source]#
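The signature above suggests that running statistics are merged with the moments of a new batch. A minimal sketch of one standard way to do this, the parallel mean/variance combination, follows. The function name `combine_moments` is hypothetical, and whether the library uses exactly this formula is an assumption.

```python
import numpy as np


def combine_moments(mean, var, count, batch_mean, batch_var, batch_count):
    """Merge (mean, var, count) of two samples using population variances."""
    delta = batch_mean - mean
    tot_count = count + batch_count
    new_mean = mean + delta * batch_count / tot_count
    # Sum of squared deviations of each part, plus a between-parts correction.
    m2 = var * count + batch_var * batch_count
    m2 = m2 + delta ** 2 * count * batch_count / tot_count
    new_var = m2 / tot_count
    return new_mean, new_var, tot_count


a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0, 7.0])
mean, var, count = combine_moments(a.mean(), a.var(), a.size,
                                   b.mean(), b.var(), b.size)
full = np.concatenate([a, b])
```

The merged statistics match those computed over the concatenated data, so batches can be folded in one at a time without storing past observations.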

syllabus.examples.utils.vtrace module#

Functions to compute V-trace off-policy actor critic targets.

For details and theory see:

“IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures” by Espeholt, Soyer, Munos et al.

See https://arxiv.org/abs/1802.01561 for the full paper.

class syllabus.examples.utils.vtrace.VTraceFromLogitsReturns(vs, pg_advantages, log_rhos, behavior_action_log_probs, target_action_log_probs)#

Bases: tuple


behavior_action_log_probs#

Alias for field number 3

log_rhos#

Alias for field number 2

pg_advantages#

Alias for field number 1

target_action_log_probs#

Alias for field number 4

vs#

Alias for field number 0

class syllabus.examples.utils.vtrace.VTraceReturns(vs, pg_advantages)#

Bases: tuple


pg_advantages#

Alias for field number 1

vs#

Alias for field number 0

syllabus.examples.utils.vtrace.action_log_probs(policy_logits, actions)[source]#
syllabus.examples.utils.vtrace.from_importance_weights(log_rhos, discounts, rewards, values, bootstrap_value, clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0)[source]#

V-trace from log importance weights.
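To make the computation concrete, here is a NumPy sketch of the V-trace recursion as described in the IMPALA paper. It is not the library's implementation: the function name `vtrace_sketch` is hypothetical, arrays are assumed to be time-major with shape `[T, B]` (and `bootstrap_value` with shape `[B]`), and the trace-cutting coefficient is assumed to be clipped at 1.

```python
import numpy as np


def vtrace_sketch(log_rhos, discounts, rewards, values, bootstrap_value,
                  clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0):
    """Hypothetical NumPy sketch of V-trace targets and policy-gradient advantages."""
    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(clip_rho_threshold, rhos)
    cs = np.minimum(1.0, rhos)  # trace-cutting coefficients

    # V(x_{t+1}) for every step, using the bootstrap value after the last step.
    values_tp1 = np.concatenate([values[1:], bootstrap_value[None]], axis=0)
    deltas = clipped_rhos * (rewards + discounts * values_tp1 - values)

    # Backward recursion: (vs_t - V_t) = delta_t + discount_t * c_t * (vs_{t+1} - V_{t+1}).
    acc = np.zeros_like(bootstrap_value)
    vs_minus_v = np.zeros_like(values)
    for t in reversed(range(values.shape[0])):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        vs_minus_v[t] = acc
    vs = values + vs_minus_v

    # Advantages use vs_{t+1} as the target for the next state.
    vs_tp1 = np.concatenate([vs[1:], bootstrap_value[None]], axis=0)
    pg_rhos = np.minimum(clip_pg_rho_threshold, rhos)
    pg_advantages = pg_rhos * (rewards + discounts * vs_tp1 - values)
    return vs, pg_advantages


# On-policy check (log_rhos = 0): vs reduces to discounted returns.
T, B = 3, 1
vs, pg_advantages = vtrace_sketch(
    log_rhos=np.zeros((T, B)),
    discounts=np.full((T, B), 0.5),
    rewards=np.ones((T, B)),
    values=np.zeros((T, B)),
    bootstrap_value=np.zeros(B),
)
```

With zero value estimates and importance weights of 1, the targets are the plain discounted returns, which is a quick sanity check for any implementation.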

syllabus.examples.utils.vtrace.from_logits(behavior_policy_logits, target_policy_logits, actions, discounts, rewards, values, bootstrap_value, clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0)[source]#

V-trace for softmax policies.
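For softmax policies, the log importance weights can be derived from the two sets of logits and the actions taken. The sketch below shows that step only. The helper names are hypothetical, the `[T, B, num_actions]` logits layout is an assumption, and the library's own `action_log_probs` may differ in details.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def action_log_probs_sketch(policy_logits, actions):
    """Log-probability of each taken action under a softmax policy."""
    logp = log_softmax(policy_logits)
    return np.take_along_axis(logp, actions[..., None], axis=-1).squeeze(-1)


# T=2 steps, B=1 environment, 2 discrete actions.
behavior_logits = np.zeros((2, 1, 2))  # uniform behavior policy
target_logits = np.log(np.array([[[0.25, 0.75]], [[0.5, 0.5]]]))
actions = np.array([[1], [0]])

target_logp = action_log_probs_sketch(target_logits, actions)
behavior_logp = action_log_probs_sketch(behavior_logits, actions)
log_rhos = target_logp - behavior_logp  # log importance weights
```

These `log_rhos` are the quantity that the importance-weight form of V-trace takes as input.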

Module contents#