Example Utils#

Submodules#

syllabus.examples.utils.vecenv module#

class syllabus.examples.utils.vecenv.RunningMeanStd(epsilon=0.0001, shape=())#

Bases: object

update(x)#
update_from_moments(batch_mean, batch_var, batch_count)#
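RunningMeanStd maintains streaming estimates of the mean and variance of its inputs. A minimal usage sketch, assuming the instance exposes mean and var attributes as in the baselines-style implementation this mirrors:

    import numpy as np

    rms = RunningMeanStd(shape=(4,))    # track statistics for 4-dimensional observations
    batch = np.random.randn(32, 4)      # a batch of 32 observations
    rms.update(batch)                   # fold the batch's mean/var/count into the running stats

    # Typical downstream use: whiten observations with the running statistics.
    normalized = (batch - rms.mean) / np.sqrt(rms.var + 1e-8)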
class syllabus.examples.utils.vecenv.VecEnv(num_envs, observation_space, action_space)#

Bases: object

An abstract asynchronous, vectorized environment. Used to batch data from multiple copies of an environment, so that each observation becomes a batch of observations and each expected action is a batch of actions, one per environment.

close()#
close_extras()#

Clean up the extra resources, beyond what’s in this base class. Only runs when not self.closed.

closed = False#
get_images()#

Return RGB images from each environment

get_viewer()#
metadata = {'render.modes': ['human', 'rgb_array']}#
render(mode='human')#
reset()#

Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

step(actions)#

Step the environments synchronously.

This is available for backwards compatibility.

step_async(actions)#

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

step_env(actions, reset_random=False)#
step_wait()#

Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations
  • rews: an array of rewards
  • dones: an array of “episode done” booleans
  • infos: a sequence of info objects

property unwrapped#
viewer = None#
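The asynchronous API is meant to be used as a step_async call followed by step_wait, with step bundling the two for convenience. A rough calling sketch, where MyVecEnv, obs_space, and act_space are hypothetical placeholders for a concrete subclass and its spaces:

    import numpy as np

    venv = MyVecEnv(num_envs=8, observation_space=obs_space, action_space=act_space)

    obs = venv.reset()                                  # batched observations
    for _ in range(100):
        actions = np.stack([venv.action_space.sample() for _ in range(venv.num_envs)])
        venv.step_async(actions)                        # dispatch one action per environment
        obs, rews, dones, infos = venv.step_wait()      # collect the batched results
        # equivalently: obs, rews, dones, infos = venv.step(actions)
    venv.close()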
class syllabus.examples.utils.vecenv.VecEnvObservationWrapper(venv, observation_space=None, action_space=None)#

Bases: VecEnvWrapper

process(obs)#
reset()#

Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

step_wait()#

Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations
  • rews: an array of rewards
  • dones: an array of “episode done” booleans
  • infos: a sequence of info objects

class syllabus.examples.utils.vecenv.VecEnvWrapper(venv, observation_space=None, action_space=None)#

Bases: VecEnv

An environment wrapper that applies to an entire batch of environments at once.

close()#
get_images()#

Return RGB images from each environment

render(mode='human')#
reset()#

Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

step_async(actions)#

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

step_wait()#

Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations
  • rews: an array of rewards
  • dones: an array of “episode done” booleans
  • infos: a sequence of info objects
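A wrapper subclass typically forwards to the wrapped venv and post-processes the batched results. A hypothetical sketch (not a class provided by this module), assuming the wrapper stores the inner environment as self.venv as in baselines-style implementations:

    import numpy as np

    class ClipRewardVecWrapper(VecEnvWrapper):
        """Clip every reward in the batch to [-1, 1]."""

        def reset(self):
            return self.venv.reset()

        def step_wait(self):
            obs, rews, dones, infos = self.venv.step_wait()
            return obs, np.clip(rews, -1.0, 1.0), dones, infos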

class syllabus.examples.utils.vecenv.VecExtractDictObs(venv, key)#

Bases: VecEnvObservationWrapper

process(obs)#
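VecExtractDictObs is useful when the underlying environments emit dict observations and downstream code expects plain arrays; process simply selects one entry from each observation. A short usage sketch (the "rgb" key is illustrative only):

    # venv produces dict observations such as {"rgb": ..., "state": ...}
    venv = VecExtractDictObs(venv, key="rgb")
    obs = venv.reset()   # now a plain batched array of the "rgb" entries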
class syllabus.examples.utils.vecenv.VecMonitor(venv, filename=None, keep_buf=0, info_keywords=())#

Bases: VecEnvWrapper

reset()#

Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

step_wait()#

Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations
  • rews: an array of rewards
  • dones: an array of “episode done” booleans
  • infos: a sequence of info objects

class syllabus.examples.utils.vecenv.VecNormalize(venv, ob=True, ret=True, clipob=10.0, cliprew=10.0, gamma=0.99, epsilon=1e-08, use_tf=False)#

Bases: VecEnvWrapper

A vectorized wrapper that normalizes the observations and returns (cumulative discounted rewards) of an environment.

reset()#

Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

step_wait()#

Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations
  • rews: an array of rewards
  • dones: an array of “episode done” booleans
  • infos: a sequence of info objects
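A sketch of typical VecNormalize usage, assuming the defaults above behave as in the baselines-style implementation (running observation statistics, return scaling, and clipping):

    venv = VecNormalize(venv, ob=True, ret=True, clipob=10.0, cliprew=10.0, gamma=0.99)

    obs = venv.reset()                              # observations normalized and clipped to [-10, 10]
    obs, rews, dones, infos = venv.step(actions)    # rewards scaled by running return statistics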

syllabus.examples.utils.vecenv.update_mean_var_count_from_moments(mean, var, count, batch_mean, batch_var, batch_count)#
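This helper presumably combines the running moments with a batch's moments using the standard parallel-variance formula; a sketch of that computation for reference (not necessarily the exact implementation):

    def combine_moments(mean, var, count, batch_mean, batch_var, batch_count):
        # Standard parallel combination of (mean, var, count) with a batch's moments.
        delta = batch_mean - mean
        tot_count = count + batch_count

        new_mean = mean + delta * batch_count / tot_count
        m_a = var * count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta ** 2 * count * batch_count / tot_count
        new_var = m2 / tot_count

        return new_mean, new_var, tot_count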

syllabus.examples.utils.vtrace module#

Functions to compute V-trace off-policy actor critic targets.

For details and theory see:

“IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures” by Espeholt, Soyer, Munos et al.

See https://arxiv.org/abs/1802.01561 for the full paper.

class syllabus.examples.utils.vtrace.VTraceFromLogitsReturns(vs, pg_advantages, log_rhos, behavior_action_log_probs, target_action_log_probs)#

Bases: tuple

behavior_action_log_probs#

Alias for field number 3

log_rhos#

Alias for field number 2

pg_advantages#

Alias for field number 1

target_action_log_probs#

Alias for field number 4

vs#

Alias for field number 0

class syllabus.examples.utils.vtrace.VTraceReturns(vs, pg_advantages)#

Bases: tuple

pg_advantages#

Alias for field number 1

vs#

Alias for field number 0

syllabus.examples.utils.vtrace.action_log_probs(policy_logits, actions)#
syllabus.examples.utils.vtrace.from_importance_weights(log_rhos, discounts, rewards, values, bootstrap_value, clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0)#

V-trace from log importance weights.
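For reference, the V-trace targets in the IMPALA paper are built from clipped importance weights rho_t = min(rho_bar, pi/mu) and c_t = min(c_bar, pi/mu) via the backward recursion v_s = V(x_s) + delta_s + gamma_s * c_s * (v_{s+1} - V(x_{s+1})), with delta_s = rho_s * (r_s + gamma_s * V(x_{s+1}) - V(x_s)). A non-vectorized NumPy sketch of that recursion for a single trajectory (not this module's actual implementation, which likely operates on [T, B] tensors):

    import numpy as np

    def vtrace_sketch(log_rhos, discounts, rewards, values, bootstrap_value,
                      clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0):
        """V-trace targets for one trajectory of length T (illustrative sketch)."""
        rhos = np.exp(log_rhos)
        clipped_rhos = np.minimum(clip_rho_threshold, rhos)
        cs = np.minimum(1.0, rhos)

        values_t_plus_1 = np.append(values[1:], bootstrap_value)
        deltas = clipped_rhos * (rewards + discounts * values_t_plus_1 - values)

        # Backward recursion: acc_s = delta_s + gamma_s * c_s * acc_{s+1}
        vs_minus_v = np.zeros_like(values)
        acc = 0.0
        for t in reversed(range(len(values))):
            acc = deltas[t] + discounts[t] * cs[t] * acc
            vs_minus_v[t] = acc
        vs = vs_minus_v + values

        # Policy-gradient advantages bootstrap from v_{s+1}.
        vs_t_plus_1 = np.append(vs[1:], bootstrap_value)
        clipped_pg_rhos = np.minimum(clip_pg_rho_threshold, rhos)
        pg_advantages = clipped_pg_rhos * (rewards + discounts * vs_t_plus_1 - values)
        return vs, pg_advantages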

syllabus.examples.utils.vtrace.from_logits(behavior_policy_logits, target_policy_logits, actions, discounts, rewards, values, bootstrap_value, clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0)#

V-trace for softmax policies.
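A sketch of how from_logits might be called on rollout data, assuming time-major PyTorch tensors shaped [T, B] as in IMPALA-style learners (the exact tensor layout expected by this module is an assumption):

    import torch

    T, B, num_actions = 20, 8, 6
    behavior_logits = torch.randn(T, B, num_actions)   # logits recorded by the actors
    target_logits = torch.randn(T, B, num_actions)     # logits from the current learner policy
    actions = torch.randint(num_actions, (T, B))
    discounts = torch.full((T, B), 0.99)                # gamma * (1 - done); no episode ends here
    rewards = torch.randn(T, B)
    values = torch.randn(T, B)                          # learner value estimates V(x_t)
    bootstrap_value = torch.randn(B)                    # value estimate for the step after the rollout

    returns = from_logits(
        behavior_policy_logits=behavior_logits,
        target_policy_logits=target_logits,
        actions=actions,
        discounts=discounts,
        rewards=rewards,
        values=values,
        bootstrap_value=bootstrap_value,
    )
    # returns.vs are the value targets; returns.pg_advantages feed the policy-gradient loss.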

Module contents#