tfmdp.policy package
Subpackages
Submodules
tfmdp.policy.drp module

class tfmdp.policy.drp.DeepReactivePolicy(compiler: rddl2tf.compiler.Compiler, config: Dict)
Bases: object
DeepReactivePolicy abstract base class.
It defines the basic API for building, saving and restoring reactive policies implemented as deep neural nets.
A reactive policy defines a mapping from current state fluents to action fluents.
Parameters:
- compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- config (Dict) – The reactive policy configuration parameters.

__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor]
Returns action fluents for the current state and timestep.
Parameters:
- state (Sequence[tf.Tensor]) – A tuple of state fluents.
- timestep (tf.Tensor) – The current timestep.
Returns: A tuple of action fluents.
Return type: Sequence[tf.Tensor]
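The call contract above can be illustrated with a minimal, framework-free sketch. It uses plain Python lists as a stand-in for tf.Tensor, and the names ToyReactivePolicy and ScaledStatePolicy are hypothetical; they are not part of tfmdp and only mirror the documented signature.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence

Fluent = List[float]  # assumption: a plain list stands in for tf.Tensor


class ToyReactivePolicy(ABC):
    """Hypothetical stand-in for the DeepReactivePolicy call contract."""

    @abstractmethod
    def __call__(self, state: Sequence[Fluent], timestep: int) -> Sequence[Fluent]:
        """Maps the current state fluents and timestep to action fluents."""


class ScaledStatePolicy(ToyReactivePolicy):
    """Toy policy: each action fluent is the matching state fluent times a gain."""

    def __init__(self, gain: float) -> None:
        self.gain = gain

    def __call__(self, state: Sequence[Fluent], timestep: int) -> Sequence[Fluent]:
        # Reactive: the output depends only on the current state and timestep,
        # never on earlier states.
        return tuple([self.gain * x for x in fluent] for fluent in state)


policy = ScaledStatePolicy(gain=0.5)
state = ([2.0, 4.0], [10.0])        # a tuple of two state fluents
action = policy(state, timestep=0)  # a tuple of two action fluents
print(action)
```

In a real DRP the action fluents need not have the same shapes as the state fluents; the one-to-one scaling here is only a convenience of the toy example.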

build() → None
Creates the DRP layers and trainable weights.

classmethod from_json(compiler: rddl2tf.compiler.Compiler, json_config: str) → tfmdp.policy.drp.DeepReactivePolicy
Instantiates a DRP from a json_config string.
Parameters:
- compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- json_config (str) – A DRP configuration encoded in JSON format.
Returns: A DRP object.
Return type: tfmdp.policy.drp.DeepReactivePolicy

graph

name
Returns the canonical DRP name.

restore(sess: tensorflow.python.client.session.Session, path: Optional[str] = None) → None
Restores previously saved DRP trainable variables.
If path is not provided, restores from the last saved checkpoint.
Parameters:
- sess (tf.Session) – A running session.
- path (Optional[str]) – An optional path to a checkpoint directory.

save(sess: tensorflow.python.client.session.Session, path: str) → str
Serializes all DRP trainable variables into a checkpoint file.
Parameters:
- sess (tf.Session) – A running session.
- path (str) – The path to a checkpoint directory.
Returns: The path prefix of the newly created checkpoint file.
Return type: str

size
Returns the number of trainable parameters.

summary() → None
Prints a string summary of the DRP.

to_json() → str
Returns the policy configuration parameters serialized in JSON format.
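The relationship between to_json and from_json can be sketched as a JSON round trip using only the standard library. This assumes the configuration is a plain dictionary of JSON-serializable values; the values below are made up, and whether tfmdp serializes its configuration exactly this way is an assumption.

```python
import json

# Hypothetical configuration; the keys mirror the 'layers' and 'activation'
# parameters documented for FeedforwardPolicy, the values are illustrative.
config = {"layers": [256, 128], "activation": "elu"}

# What a to_json()-style method could return: a JSON-encoded string.
json_config = json.dumps(config, sort_keys=True)

# What a from_json()-style classmethod could parse before building the policy.
restored = json.loads(json_config)

print(json_config)
print(restored == config)
```

Note that this only works if every configuration value is JSON-serializable, which is why the activation is written here as a string name rather than a function object.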

vars
Returns a list of the trainable variables.
tfmdp.policy.feedforward module

class tfmdp.policy.feedforward.FeedforwardPolicy(compiler: rddl2tf.compiler.Compiler, config: dict)
Bases: tfmdp.policy.drp.DeepReactivePolicy
FeedforwardPolicy implements a DRP as a multi-layer perceptron.
It is parameterized by the following configuration params:
- config['layers']: a list with the number of units in each hidden layer; and
- config['activation']: an activation function.
Parameters:
- compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- config (Dict) – The policy configuration parameters.
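To give a feel for how config['layers'] drives the number of trainable parameters, here is a rough sketch that counts the weights and biases of a plain fully connected network. It assumes a single flat input vector and a single dense output layer; the helper mlp_param_count and the sizes used are hypothetical, and the real input and output layers are built by StateLayer and ActionLayer, so the result need not match what size reports.

```python
from typing import List


def mlp_param_count(input_size: int, layers: List[int], output_size: int) -> int:
    """Counts weights and biases of a fully connected network (hypothetical helper)."""
    sizes = [input_size] + list(layers) + [output_size]
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus bias vector
    return total


# Illustrative configuration: two hidden layers with 8 units each.
config = {"layers": [8, 8], "activation": "elu"}

# Assumed sizes: 4 state inputs and 2 action outputs.
n_params = mlp_param_count(input_size=4, layers=config["layers"], output_size=2)
print(n_params)  # (4*8 + 8) + (8*8 + 8) + (8*2 + 2) = 130
```

The activation function does not add parameters, so only the layer sizes affect the count in this sketch.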

__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor]
Returns action fluents for the current state and timestep.
Parameters:
- state (Sequence[tf.Tensor]) – A tuple of state fluents.
- timestep (tf.Tensor) – The current timestep.
Returns: A tuple of action fluents.
Return type: Sequence[tf.Tensor]
Builds all hidden layers as tf.layers.Dense layers.

_build_input_layer() → None
Builds the DRP input layer using a tfmdp.policy.layers.state_layer.StateLayer.

_build_output_layer() → None
Builds the DRP output layer using a tfmdp.policy.layers.action_layer.ActionLayer.

build() → None
Creates the DRP layers and trainable weights.

name
Returns the canonical DRP name.

size
Returns the number of trainable parameters.

vars
Returns a list of the trainable variables.