Example Utils#

Submodules#

syllabus.examples.utils.vecenv module#

class syllabus.examples.utils.vecenv.RunningMeanStd(epsilon=0.0001, shape=())#

Bases: object

update(x)#
update_from_moments(batch_mean, batch_var, batch_count)#
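RunningMeanStd maintains streaming estimates of the mean and variance of its inputs. A minimal usage sketch, assuming the instance exposes mean and var attributes as in the baselines-style implementation this mirrors:

    import numpy as np

    rms = RunningMeanStd(shape=(4,))    # track statistics for 4-dimensional observations
    batch = np.random.randn(32, 4)      # a batch of 32 observations
    rms.update(batch)                   # fold the batch's mean/var/count into the running stats

    # Typical downstream use: whiten observations with the running statistics.
    normalized = (batch - rms.mean) / np.sqrt(rms.var + 1e-8)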
class syllabus.examples.utils.vecenv.VecEnv(num_envs, observation_space, action_space)#

Bases: object

An abstract asynchronous, vectorized environment. Used to batch data from multiple copies of an environment, so that each observation becomes a batch of observations and each expected action is a batch of actions, one per environment.

close()#
close_extras()#

Clean up the extra resources, beyond what’s in this base class. Only runs when not self.closed.

closed = False#
get_images()#

Return RGB images from each environment

get_viewer()#
metadata = {'render.modes': ['human', 'rgb_array']}#
render(mode='human')#
reset()#

Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

step(actions)#

Step the environments synchronously.

This is available for backwards compatibility.

step_async(actions)#

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

step_env(actions, reset_random=False)#
step_wait()#

Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations
  • rews: an array of rewards
  • dones: an array of “episode done” booleans
  • infos: a sequence of info objects

property unwrapped#
viewer = None#
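The asynchronous API is meant to be used as a step_async call followed by step_wait, with step bundling the two for convenience. A rough calling sketch, where MyVecEnv, obs_space, and act_space are hypothetical placeholders for a concrete subclass and its spaces:

    import numpy as np

    venv = MyVecEnv(num_envs=8, observation_space=obs_space, action_space=act_space)

    obs = venv.reset()                                  # batched observations
    for _ in range(100):
        actions = np.stack([venv.action_space.sample() for _ in range(venv.num_envs)])
        venv.step_async(actions)                        # dispatch one action per environment
        obs, rews, dones, infos = venv.step_wait()      # collect the batched results
        # equivalently: obs, rews, dones, infos = venv.step(actions)
    venv.close()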
class syllabus.examples.utils.vecenv.VecEnvObservationWrapper(venv, observation_space=None, action_space=None)#

Bases: VecEnvWrapper

process(obs)#
reset()#

Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

step_wait()#

Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations
  • rews: an array of rewards
  • dones: an array of “episode done” booleans
  • infos: a sequence of info objects

class syllabus.examples.utils.vecenv.VecEnvWrapper(venv, observation_space=None, action_space=None)#

Bases: VecEnv

An environment wrapper that applies to an entire batch of environments at once.

close()#
get_images()#

Return RGB images from each environment

render(mode='human')#
reset()#

Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

step_async(actions)#

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

step_wait()#

Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations
  • rews: an array of rewards
  • dones: an array of “episode done” booleans
  • infos: a sequence of info objects
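A wrapper subclass typically forwards to the wrapped venv and post-processes the batched results. A hypothetical sketch (not a class provided by this module), assuming the wrapper stores the inner environment as self.venv as in baselines-style implementations:

    import numpy as np

    class ClipRewardVecWrapper(VecEnvWrapper):
        """Clip every reward in the batch to [-1, 1]."""

        def reset(self):
            return self.venv.reset()

        def step_wait(self):
            obs, rews, dones, infos = self.venv.step_wait()
            return obs, np.clip(rews, -1.0, 1.0), dones, infos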

class syllabus.examples.utils.vecenv.VecExtractDictObs(venv, key)#

Bases: VecEnvObservationWrapper

process(obs)#
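VecExtractDictObs is useful when the underlying environments emit dict observations and downstream code expects plain arrays; process simply selects one entry from each observation. A short usage sketch (the "rgb" key is illustrative only):

    # venv produces dict observations such as {"rgb": ..., "state": ...}
    venv = VecExtractDictObs(venv, key="rgb")
    obs = venv.reset()   # now a plain batched array of the "rgb" entries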
class syllabus.examples.utils.vecenv.VecMonitor(venv, filename=None, keep_buf=0, info_keywords=())#

Bases: VecEnvWrapper

reset()#

Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

step_wait()#

Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations
  • rews: an array of rewards
  • dones: an array of “episode done” booleans
  • infos: a sequence of info objects

class syllabus.examples.utils.vecenv.VecNormalize(venv, ob=True, ret=True, clipob=10.0, cliprew=10.0, gamma=0.99, epsilon=1e-08, use_tf=False)#

Bases: VecEnvWrapper

A vectorized wrapper that normalizes the observations and returns (cumulative discounted rewards) of an environment.

reset()#

Reset all the environments and return an array of observations, or a dict of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

step_wait()#

Wait for the step taken with step_async().

Returns (obs, rews, dones, infos):
  • obs: an array of observations, or a dict of arrays of observations
  • rews: an array of rewards
  • dones: an array of “episode done” booleans
  • infos: a sequence of info objects
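A sketch of typical VecNormalize usage, assuming the defaults above behave as in the baselines-style implementation (running observation statistics, return scaling, and clipping):

    venv = VecNormalize(venv, ob=True, ret=True, clipob=10.0, cliprew=10.0, gamma=0.99)

    obs = venv.reset()                              # observations normalized and clipped to [-10, 10]
    obs, rews, dones, infos = venv.step(actions)    # rewards scaled by running return statistics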

syllabus.examples.utils.vecenv.update_mean_var_count_from_moments(mean, var, count, batch_mean, batch_var, batch_count)#
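This helper presumably combines the running moments with a batch's moments using the standard parallel-variance formula; a sketch of that computation for reference (not necessarily the exact implementation):

    def combine_moments(mean, var, count, batch_mean, batch_var, batch_count):
        # Standard parallel combination of (mean, var, count) with a batch's moments.
        delta = batch_mean - mean
        tot_count = count + batch_count

        new_mean = mean + delta * batch_count / tot_count
        m_a = var * count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta ** 2 * count * batch_count / tot_count
        new_var = m2 / tot_count

        return new_mean, new_var, tot_count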

syllabus.examples.utils.vtrace module#

Functions to compute V-trace off-policy actor critic targets.

For details and theory see:

“IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures” by Espeholt, Soyer, Munos et al.

See https://arxiv.org/abs/1802.01561 for the full paper.

class syllabus.examples.utils.vtrace.VTraceFromLogitsReturns(vs, pg_advantages, log_rhos, behavior_action_log_probs, target_action_log_probs)#

Bases: tuple

behavior_action_log_probs#

Alias for field number 3

log_rhos#

Alias for field number 2

pg_advantages#

Alias for field number 1

target_action_log_probs#

Alias for field number 4

vs#

Alias for field number 0

class syllabus.examples.utils.vtrace.VTraceReturns(vs, pg_advantages)#

Bases: tuple

pg_advantages#

Alias for field number 1

vs#

Alias for field number 0

syllabus.examples.utils.vtrace.action_log_probs(policy_logits, actions)#
syllabus.examples.utils.vtrace.from_importance_weights(log_rhos, discounts, rewards, values, bootstrap_value, clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0)#

V-trace from log importance weights.
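For reference, the V-trace targets in the IMPALA paper are built from clipped importance weights rho_t = min(rho_bar, pi/mu) and c_t = min(c_bar, pi/mu) via the backward recursion v_s = V(x_s) + delta_s + gamma_s * c_s * (v_{s+1} - V(x_{s+1})), with delta_s = rho_s * (r_s + gamma_s * V(x_{s+1}) - V(x_s)). A non-vectorized NumPy sketch of that recursion for a single trajectory (not this module's actual implementation, which likely operates on [T, B] tensors):

    import numpy as np

    def vtrace_sketch(log_rhos, discounts, rewards, values, bootstrap_value,
                      clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0):
        """V-trace targets for one trajectory of length T (illustrative sketch)."""
        rhos = np.exp(log_rhos)
        clipped_rhos = np.minimum(clip_rho_threshold, rhos)
        cs = np.minimum(1.0, rhos)

        values_t_plus_1 = np.append(values[1:], bootstrap_value)
        deltas = clipped_rhos * (rewards + discounts * values_t_plus_1 - values)

        # Backward recursion: acc_s = delta_s + gamma_s * c_s * acc_{s+1}
        vs_minus_v = np.zeros_like(values)
        acc = 0.0
        for t in reversed(range(len(values))):
            acc = deltas[t] + discounts[t] * cs[t] * acc
            vs_minus_v[t] = acc
        vs = vs_minus_v + values

        # Policy-gradient advantages bootstrap from v_{s+1}.
        vs_t_plus_1 = np.append(vs[1:], bootstrap_value)
        clipped_pg_rhos = np.minimum(clip_pg_rho_threshold, rhos)
        pg_advantages = clipped_pg_rhos * (rewards + discounts * vs_t_plus_1 - values)
        return vs, pg_advantages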

syllabus.examples.utils.vtrace.from_logits(behavior_policy_logits, target_policy_logits, actions, discounts, rewards, values, bootstrap_value, clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0)#

V-trace for softmax policies.
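A sketch of how from_logits might be called on rollout data, assuming time-major PyTorch tensors shaped [T, B] as in IMPALA-style learners (the exact tensor layout expected by this module is an assumption):

    import torch

    T, B, num_actions = 20, 8, 6
    behavior_logits = torch.randn(T, B, num_actions)   # logits recorded by the actors
    target_logits = torch.randn(T, B, num_actions)     # logits from the current learner policy
    actions = torch.randint(num_actions, (T, B))
    discounts = torch.full((T, B), 0.99)                # gamma * (1 - done); no episode ends here
    rewards = torch.randn(T, B)
    values = torch.randn(T, B)                          # learner value estimates V(x_t)
    bootstrap_value = torch.randn(B)                    # value estimate for the step after the rollout

    returns = from_logits(
        behavior_policy_logits=behavior_logits,
        target_policy_logits=target_logits,
        actions=actions,
        discounts=discounts,
        rewards=rewards,
        values=values,
        bootstrap_value=bootstrap_value,
    )
    # returns.vs are the value targets; returns.pg_advantages feed the policy-gradient loss.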

Module contents#