Example Utils#
Submodules#
syllabus.examples.utils.vecenv module#
- class syllabus.examples.utils.vecenv.RunningMeanStd(epsilon=0.0001, shape=())#
Bases:
object
- update(x)#
- update_from_moments(batch_mean, batch_var, batch_count)#
- class syllabus.examples.utils.vecenv.VecEnv(num_envs, observation_space, action_space)#
Bases:
object
An abstract asynchronous, vectorized environment. Used to batch data from multiple copies of an environment, so that each observation becomes a batch of observations and the expected action is a batch of actions, one per environment.
- close()#
- close_extras()#
Clean up the extra resources, beyond what’s in this base class. Only runs when not self.closed.
- closed = False#
- get_images()#
Return RGB images from each environment
- get_viewer()#
- metadata = {'render.modes': ['human', 'rgb_array']}#
- render(mode='human')#
- reset()#
Reset all the environments and return an array of observations, or a dict of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- step(actions)#
Step the environments synchronously.
This is available for backwards compatibility.
- step_async(actions)#
Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.
You should not call this if a step_async run is already pending.
- step_env(actions, reset_random=False)#
- step_wait()#
Wait for the step taken with step_async().
- Returns (obs, rews, dones, infos):
  - obs: an array of observations, or a dict of arrays of observations
  - rews: an array of rewards
  - dones: an array of “episode done” booleans
  - infos: a sequence of info objects
- property unwrapped#
- viewer = None#
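The split between step_async() and step_wait() is the core contract of VecEnv: submit a batch of actions, then collect the batched results. The toy class below is hypothetical (it is not part of syllabus and does not subclass the real VecEnv); it only sketches how a subclass might honor that protocol, including the rule that step_async() must not be called while a step is pending and that step() is a synchronous convenience wrapper.

```python
import numpy as np

class ToyVecEnv:
    """Hypothetical minimal illustration of the step_async/step_wait
    protocol. Each "environment" is just a counter that adds the action."""

    def __init__(self, num_envs):
        self.num_envs = num_envs
        self.counts = np.zeros(num_envs, dtype=np.int64)
        self._actions = None  # pending batch of actions, if any

    def reset(self):
        # Reset all environments and return a batch of observations.
        self.counts[:] = 0
        return self.counts.copy()

    def step_async(self, actions):
        # Start a step; a second call before step_wait() is an error.
        assert self._actions is None, "previous step_async still pending"
        self._actions = np.asarray(actions)

    def step_wait(self):
        # Finish the pending step and return (obs, rews, dones, infos).
        self.counts += self._actions
        obs = self.counts.copy()
        rews = self._actions.astype(np.float64)
        dones = np.zeros(self.num_envs, dtype=bool)
        infos = [{} for _ in range(self.num_envs)]
        self._actions = None
        return obs, rews, dones, infos

    def step(self, actions):
        # Synchronous wrapper kept for backwards compatibility.
        self.step_async(actions)
        return self.step_wait()
```

Asynchronous implementations would dispatch actions to worker processes in step_async() and block for their replies in step_wait(); the calling code stays identical.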
- class syllabus.examples.utils.vecenv.VecEnvObservationWrapper(venv, observation_space=None, action_space=None)#
Bases:
VecEnvWrapper
- process(obs)#
- reset()#
Reset all the environments and return an array of observations, or a dict of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- step_wait()#
Wait for the step taken with step_async().
- Returns (obs, rews, dones, infos):
  - obs: an array of observations, or a dict of arrays of observations
  - rews: an array of rewards
  - dones: an array of “episode done” booleans
  - infos: a sequence of info objects
- class syllabus.examples.utils.vecenv.VecEnvWrapper(venv, observation_space=None, action_space=None)#
Bases:
VecEnv
An environment wrapper that applies to an entire batch of environments at once.
- close()#
- get_images()#
Return RGB images from each environment
- render(mode='human')#
- reset()#
Reset all the environments and return an array of observations, or a dict of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- step_async(actions)#
Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.
You should not call this if a step_async run is already pending.
- step_wait()#
Wait for the step taken with step_async().
- Returns (obs, rews, dones, infos):
  - obs: an array of observations, or a dict of arrays of observations
  - rews: an array of rewards
  - dones: an array of “episode done” booleans
  - infos: a sequence of info objects
- class syllabus.examples.utils.vecenv.VecExtractDictObs(venv, key)#
Bases:
VecEnvObservationWrapper
- process(obs)#
- class syllabus.examples.utils.vecenv.VecMonitor(venv, filename=None, keep_buf=0, info_keywords=())#
Bases:
VecEnvWrapper
- reset()#
Reset all the environments and return an array of observations, or a dict of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- step_wait()#
Wait for the step taken with step_async().
- Returns (obs, rews, dones, infos):
  - obs: an array of observations, or a dict of arrays of observations
  - rews: an array of rewards
  - dones: an array of “episode done” booleans
  - infos: a sequence of info objects
- class syllabus.examples.utils.vecenv.VecNormalize(venv, ob=True, ret=True, clipob=10.0, cliprew=10.0, gamma=0.99, epsilon=1e-08, use_tf=False)#
Bases:
VecEnvWrapper
A vectorized wrapper that normalizes the observations and returns from an environment.
- reset()#
Reset all the environments and return an array of observations, or a dict of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- step_wait()#
Wait for the step taken with step_async().
- Returns (obs, rews, dones, infos):
  - obs: an array of observations, or a dict of arrays of observations
  - rews: an array of rewards
  - dones: an array of “episode done” booleans
  - infos: a sequence of info objects
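The observation side of this normalization is conventionally a standardize-then-clip transform using running mean/variance statistics. The helper below is a sketch of that transform only (the exact internals of VecNormalize are not shown in this reference); the parameter names mirror the constructor arguments clipob and epsilon above.

```python
import numpy as np

def normalize_obs(obs, mean, var, clipob=10.0, epsilon=1e-8):
    # Standardize observations with running statistics, then clip to
    # [-clipob, clipob] so outliers cannot blow up the policy inputs.
    return np.clip((obs - mean) / np.sqrt(var + epsilon), -clipob, clipob)
```

Returns are typically handled analogously, dividing rewards by the standard deviation of a running estimate of the discounted return (the gamma argument) and clipping with cliprew.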
- syllabus.examples.utils.vecenv.update_mean_var_count_from_moments(mean, var, count, batch_mean, batch_var, batch_count)#
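This function signature matches the standard parallel-moments combination (Chan et al.) used to merge a batch's mean/variance into running statistics, as in RunningMeanStd.update_from_moments above. The following is a sketch of that algorithm, not a copy of the module's source:

```python
import numpy as np

def update_mean_var_count_from_moments(mean, var, count,
                                       batch_mean, batch_var, batch_count):
    # Merge the moments of a new batch into running moments using the
    # parallel variance formula, so statistics can be updated in batches.
    delta = batch_mean - mean
    tot_count = count + batch_count
    new_mean = mean + delta * batch_count / tot_count
    m_a = var * count
    m_b = batch_var * batch_count
    m2 = m_a + m_b + delta ** 2 * count * batch_count / tot_count
    new_var = m2 / tot_count
    return new_mean, new_var, tot_count
```

Merging the moments of two halves of a dataset this way reproduces the moments of the whole dataset, which is what makes streaming normalization (e.g. in VecNormalize) exact rather than approximate.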
syllabus.examples.utils.vtrace module#
Functions to compute V-trace off-policy actor critic targets.
For details and theory see:
“IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures” by Espeholt, Soyer, Munos et al.
See https://arxiv.org/abs/1802.01561 for the full paper.
- class syllabus.examples.utils.vtrace.VTraceFromLogitsReturns(vs, pg_advantages, log_rhos, behavior_action_log_probs, target_action_log_probs)#
Bases:
tuple
- behavior_action_log_probs#
Alias for field number 3
- log_rhos#
Alias for field number 2
- pg_advantages#
Alias for field number 1
- target_action_log_probs#
Alias for field number 4
- vs#
Alias for field number 0
- class syllabus.examples.utils.vtrace.VTraceReturns(vs, pg_advantages)#
Bases:
tuple
- pg_advantages#
Alias for field number 1
- vs#
Alias for field number 0
- syllabus.examples.utils.vtrace.action_log_probs(policy_logits, actions)#
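A function with this signature conventionally returns log π(a|x) for the taken actions under a softmax policy. The NumPy sketch below is a hypothetical stand-in for illustration (the module itself operates on PyTorch tensors), using a numerically stable log-softmax:

```python
import numpy as np

def action_log_probs(policy_logits, actions):
    # Log-probability of each taken action under a softmax policy.
    # Subtract the max logit before exponentiating for numerical stability.
    logits = policy_logits - policy_logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # Gather the log-prob of the action actually taken at each step.
    return np.take_along_axis(log_probs, actions[..., None], axis=-1).squeeze(-1)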
- syllabus.examples.utils.vtrace.from_importance_weights(log_rhos, discounts, rewards, values, bootstrap_value, clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0)#
V-trace from log importance weights.
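The V-trace targets from the IMPALA paper can be computed with a single backward pass over the trajectory. The sketch below follows the paper's equations for one time-major trajectory of length T (the real implementation is batched and in PyTorch; this NumPy version is only illustrative). The importance ratios ρ are clipped by clip_rho_threshold for the value targets and by clip_pg_rho_threshold for the policy-gradient advantages, while the trace coefficients c are clipped at 1:

```python
import numpy as np

def vtrace_from_importance_weights(log_rhos, discounts, rewards, values,
                                   bootstrap_value,
                                   clip_rho_threshold=1.0,
                                   clip_pg_rho_threshold=1.0):
    # All trajectory inputs have shape [T]; bootstrap_value is V(x_T).
    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(clip_rho_threshold, rhos)
    cs = np.minimum(1.0, rhos)

    # Temporal differences weighted by the clipped importance ratios.
    values_t_plus_1 = np.concatenate([values[1:], [bootstrap_value]])
    deltas = clipped_rhos * (rewards + discounts * values_t_plus_1 - values)

    # Backward recursion:
    # vs_t - V(x_t) = delta_t + gamma_t * c_t * (vs_{t+1} - V(x_{t+1}))
    acc = 0.0
    vs_minus_v = np.zeros_like(values)
    for t in reversed(range(len(values))):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        vs_minus_v[t] = acc
    vs = vs_minus_v + values

    # Advantages for the policy gradient use vs_{t+1} as the bootstrap.
    vs_t_plus_1 = np.concatenate([vs[1:], [bootstrap_value]])
    clipped_pg_rhos = np.minimum(clip_pg_rho_threshold, rhos)
    pg_advantages = clipped_pg_rhos * (rewards + discounts * vs_t_plus_1 - vs)
    return vs, pg_advantages
```

On-policy data (log_rhos = 0, so ρ = c = 1) reduces V-trace to the usual n-step bootstrapped targets, which is a useful sanity check.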
- syllabus.examples.utils.vtrace.from_logits(behavior_policy_logits, target_policy_logits, actions, discounts, rewards, values, bootstrap_value, clip_rho_threshold=1.0, clip_pg_rho_threshold=1.0)#
V-trace for softmax policies.