Example Utils#
Submodules#
syllabus.examples.utils.vecenv module#
- class syllabus.examples.utils.vecenv.RunningMeanStd(epsilon=0.0001, shape=())[source]#
Bases: object
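A minimal sketch of how a running mean/variance tracker of this kind is typically updated, using the standard parallel-variance combination rule. The method name `update` and the exact update rule are assumptions for illustration, not taken from this page.

```python
import numpy as np

# Sketch of a RunningMeanStd-style tracker (epsilon=1e-4, shape=()).
# The update rule is the standard parallel-variance formula; the real
# class may differ in method names and details.
class RunningMeanStdSketch:
    def __init__(self, epsilon=1e-4, shape=()):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = epsilon  # small prior count avoids division by zero

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]

        delta = batch_mean - self.mean
        total = self.count + batch_count

        # Combine the stored moments with the batch moments.
        new_mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta ** 2 * self.count * batch_count / total

        self.mean, self.var, self.count = new_mean, m2 / total, total
```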
- class syllabus.examples.utils.vecenv.VecEnv(num_envs, observation_space, action_space)[source]#
Bases: object
An abstract asynchronous, vectorized environment. Used to batch data from multiple copies of an environment, so that each observation becomes a batch of observations and the expected action is a batch of actions, one per environment (see the usage sketch after this class entry).
- close_extras()[source]#
Clean up the extra resources, beyond what’s in this base class. Only runs when not self.closed.
- closed = False#
- metadata = {'render.modes': ['human', 'rgb_array']}#
- reset()[source]#
Reset all the environments and return an array of observations, or a dict of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- step(actions)[source]#
Step the environments synchronously.
This is available for backwards compatibility.
- step_async(actions)[source]#
Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.
You should not call this if a step_async run is already pending.
- step_wait()[source]#
Wait for the step taken with step_async().
- Returns (obs, rews, dones, infos):
  - obs: an array of observations, or a dict of arrays of observations
  - rews: an array of rewards
  - dones: an array of “episode done” booleans
  - infos: a sequence of info objects
- property unwrapped#
- viewer = None#
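A usage sketch of the asynchronous step API described above. `CounterVecEnv` is a toy concrete subclass invented for illustration; the example assumes the base constructor stores `num_envs` and `action_space` as attributes and uses `gym` spaces, none of which is stated on this page beyond the method signatures listed.

```python
import numpy as np
from gym import spaces
from syllabus.examples.utils.vecenv import VecEnv

# Toy concrete subclass, purely to show the calling pattern:
# each "environment" is a counter whose episode ends after 10 steps.
class CounterVecEnv(VecEnv):
    def __init__(self, num_envs):
        obs_space = spaces.Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32)
        act_space = spaces.Discrete(2)
        super().__init__(num_envs, obs_space, act_space)
        self.counts = np.zeros(num_envs, dtype=np.float32)

    def reset(self):
        self.counts[:] = 0
        return self.counts[:, None].copy()

    def step_async(self, actions):
        self._actions = actions

    def step_wait(self):
        self.counts += 1
        dones = self.counts >= 10
        self.counts[dones] = 0
        obs = self.counts[:, None].copy()
        rews = np.asarray(self._actions, dtype=np.float32)
        infos = [{} for _ in range(len(self.counts))]
        return obs, rews, dones, infos

num_envs = 4
venv = CounterVecEnv(num_envs)
obs = venv.reset()                                  # batch of observations, shape (4, 1)
actions = np.array([venv.action_space.sample() for _ in range(num_envs)])

venv.step_async(actions)                            # submit a batch of actions
obs, rews, dones, infos = venv.step_wait()          # collect the batched results
# obs, rews, dones, infos = venv.step(actions)      # synchronous shorthand
```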
- class syllabus.examples.utils.vecenv.VecEnvObservationWrapper(venv, observation_space=None, action_space=None)[source]#
Bases: VecEnvWrapper
- class syllabus.examples.utils.vecenv.VecEnvWrapper(venv, observation_space=None, action_space=None)[source]#
Bases: VecEnv
An environment wrapper that applies to an entire batch of environments at once (a subclassing sketch follows this entry).
- reset()[source]#
Reset all the environments and return an array of observations, or a dict of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
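A sketch of what a batch-level wrapper can look like: a hypothetical `VecClipReward` that clips every reward in the batch. It assumes the usual pattern in which subclasses override `reset()` and `step_wait()` and reach the wrapped environment through `self.venv`; that attribute name is an assumption, not taken from this page.

```python
import numpy as np
from syllabus.examples.utils.vecenv import VecEnvWrapper

# Illustrative wrapper that clips every reward in the batch and delegates
# everything else to the wrapped vectorized environment.
class VecClipReward(VecEnvWrapper):
    def __init__(self, venv, bound=1.0):
        super().__init__(venv)
        self.bound = bound

    def reset(self):
        return self.venv.reset()

    def step_wait(self):
        obs, rews, dones, infos = self.venv.step_wait()
        return obs, np.clip(rews, -self.bound, self.bound), dones, infos
```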
- class syllabus.examples.utils.vecenv.VecExtractDictObs(venv, key)[source]#
Bases: VecEnvObservationWrapper
- class syllabus.examples.utils.vecenv.VecMonitor(venv, filename=None, keep_buf=0, info_keywords=())[source]#
Bases: VecEnvWrapper
- class syllabus.examples.utils.vecenv.VecNormalize(venv, ob=True, ret=True, clipob=10.0, cliprew=10.0, gamma=0.99, epsilon=1e-08, use_tf=False)[source]#
Bases: VecEnvWrapper
A vectorized wrapper that normalizes the observations and the returns (discounted rewards) of an environment.
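A sketch of how these wrappers are commonly chained, using the constructor signatures listed above. `raw_venv` is a hypothetical vectorized environment with dict observations, and the key `"rgb"` is illustrative.

```python
from syllabus.examples.utils.vecenv import (
    VecExtractDictObs, VecMonitor, VecNormalize,
)

# `raw_venv` is a hypothetical vectorized environment whose observations
# are dicts; "rgb" is an illustrative key.
venv = VecExtractDictObs(raw_venv, key="rgb")          # pull one array out of the dict obs
venv = VecMonitor(venv, filename=None, keep_buf=100)   # track episode statistics
venv = VecNormalize(venv, ob=True, ret=True,           # running-stat normalization with clipping
                    clipob=10.0, cliprew=10.0, gamma=0.99)
```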
syllabus.examples.utils.vtrace module#
Functions to compute V-trace off-policy actor-critic targets.
For details and theory see:
“IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures” by Espeholt, Soyer, Munos et al.
See https://arxiv.org/abs/1802.01561 for the full paper.
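For orientation, a compact NumPy sketch of the V-trace recursion from the paper; it is not this module's API. Array shapes are assumed time-major `[T, B]`, `log_rhos` are log importance ratios of the taken actions, and the clipping thresholds correspond to the paper's ρ̄ and c̄.

```python
import numpy as np

def vtrace_sketch(log_rhos, discounts, rewards, values, bootstrap_value,
                  clip_rho_threshold=1.0, clip_c_threshold=1.0):
    """Compute V-trace value targets vs and policy-gradient advantages.

    All per-step arrays have shape [T, B]; bootstrap_value has shape [B].
    """
    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(clip_rho_threshold, rhos)
    cs = np.minimum(clip_c_threshold, rhos)

    # V(x_{t+1}) at every step, using the bootstrap value at the end.
    values_tp1 = np.concatenate([values[1:], bootstrap_value[None]], axis=0)
    deltas = clipped_rhos * (rewards + discounts * values_tp1 - values)

    # Backward recursion:
    # v_s - V(x_s) = delta_s + gamma_s * c_s * (v_{s+1} - V(x_{s+1})).
    vs_minus_v = np.zeros_like(values)
    acc = np.zeros_like(bootstrap_value)
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        vs_minus_v[t] = acc
    vs = vs_minus_v + values

    # Policy-gradient advantage bootstraps from v_{s+1}.
    vs_tp1 = np.concatenate([vs[1:], bootstrap_value[None]], axis=0)
    pg_advantages = clipped_rhos * (rewards + discounts * vs_tp1 - values)
    return vs, pg_advantages
```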
- class syllabus.examples.utils.vtrace.VTraceFromLogitsReturns(vs, pg_advantages, log_rhos, behavior_action_log_probs, target_action_log_probs)#
Bases: tuple
- behavior_action_log_probs#
Alias for field number 3
- log_rhos#
Alias for field number 2
- pg_advantages#
Alias for field number 1
- target_action_log_probs#
Alias for field number 4
- vs#
Alias for field number 0
- class syllabus.examples.utils.vtrace.VTraceReturns(vs, pg_advantages)#
Bases: tuple
- pg_advantages#
Alias for field number 1
- vs#
Alias for field number 0
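For orientation, a self-contained sketch of how the two fields of a `VTraceReturns`-style result are typically consumed in an IMPALA-style loss. The arrays are dummy data and the namedtuple is redefined locally for the example rather than imported; only the field order matches the aliases above.

```python
import collections
import numpy as np

# Stand-in for the namedtuple described above; field order matches the
# aliases (vs = field 0, pg_advantages = field 1).
VTraceReturns = collections.namedtuple("VTraceReturns", ["vs", "pg_advantages"])

# Dummy learner outputs of shape [T, B] = [5, 4], purely for illustration.
rng = np.random.default_rng(0)
values = rng.normal(size=(5, 4))      # learner value predictions V(x_s)
log_probs = rng.normal(size=(5, 4))   # log pi(a_s | x_s) of the taken actions
returns = VTraceReturns(vs=rng.normal(size=(5, 4)),
                        pg_advantages=rng.normal(size=(5, 4)))

# Typical usage: regress V(x_s) toward v_s, and weight the policy gradient
# by the V-trace advantage (treated as a constant target).
value_loss = 0.5 * np.mean((returns.vs - values) ** 2)
policy_loss = -np.mean(log_probs * returns.pg_advantages)
```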