tfmdp.policy package¶
Subpackages¶
Submodules¶
tfmdp.policy.drp module¶
class tfmdp.policy.drp.DeepReactivePolicy(compiler: rddl2tf.compiler.Compiler, config: Dict)¶
Bases: object
DeepReactivePolicy abstract base class.
It defines the basic API for building, saving, and restoring reactive policies implemented as deep neural nets. A reactive policy defines a mapping from current state fluents to action fluents.
Parameters:
- compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- config (Dict) – The reactive policy configuration parameters.
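The API pattern can be illustrated with a minimal plain-Python sketch. The class and method names below mirror the documented API, but the bodies (and the toy `ConstantPolicy` subclass) are illustrative stand-ins; the real class wires TensorFlow graphs through a rddl2tf compiler.

```python
from abc import ABC, abstractmethod
from typing import Dict, Sequence


class DeepReactivePolicy(ABC):
    """Illustrative stand-in for tfmdp.policy.drp.DeepReactivePolicy."""

    def __init__(self, compiler, config: Dict):
        self.compiler = compiler  # a rddl2tf.compiler.Compiler in tfmdp
        self.config = config

    @abstractmethod
    def build(self) -> None:
        """Create the policy layers and trainable weights."""

    @abstractmethod
    def __call__(self, state: Sequence, timestep) -> Sequence:
        """Map current state fluents (and timestep) to action fluents."""


class ConstantPolicy(DeepReactivePolicy):
    """Toy subclass: always returns the same action fluents."""

    def build(self) -> None:
        self.action = (0.0,)

    def __call__(self, state, timestep):
        return self.action


policy = ConstantPolicy(compiler=None, config={})
policy.build()
actions = policy((1.0, 2.0), 0)  # a tuple of action fluents
```

Concrete policies such as FeedforwardPolicy follow this pattern: construct with a compiler and a config dict, call build(), then apply the policy to state fluents.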
__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor]¶
Returns action fluents for the current state and timestep.
Parameters:
- state (Sequence[tf.Tensor]) – A tuple of state fluents.
- timestep (tf.Tensor) – The current timestep.
Returns: A tuple of action fluents.
Return type: Sequence[tf.Tensor]
build() → None¶
Creates the DRP layers and trainable weights.
classmethod from_json(compiler: rddl2tf.compiler.Compiler, json_config: str) → tfmdp.policy.drp.DeepReactivePolicy¶
Instantiates a DRP from a json_config string.
Parameters:
- compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- json_config (str) – A DRP configuration encoded in JSON format.
Returns: A DRP object.
Return type: tfmdp.policy.drp.DeepReactivePolicy
graph¶
name¶
Returns the canonical DRP name.
restore(sess: tensorflow.python.client.session.Session, path: Optional[str] = None) → None¶
Restores previously saved DRP trainable variables. If path is not provided, restores from the last saved checkpoint.
Parameters:
- sess (tf.Session) – A running session.
- path (Optional[str]) – An optional path to a checkpoint directory.
save(sess: tensorflow.python.client.session.Session, path: str) → str¶
Serializes all DRP trainable variables into a checkpoint file.
Parameters:
- sess (tf.Session) – A running session.
- path (str) – The path to a checkpoint directory.
Returns: The path prefix of the newly created checkpoint file.
Return type: str
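The save/restore contract can be sketched in plain Python. This is a stand-in only: the real methods take a running tf.Session and use TensorFlow checkpoint files, not pickle; the `CheckpointedPolicy` class below is hypothetical. The key behavior shown is restore's fallback to the last saved checkpoint when path is None.

```python
import os
import pickle
import tempfile
from typing import Optional


class CheckpointedPolicy:
    """Stand-in showing the save/restore workflow; real DRPs
    serialize TensorFlow variables, not a plain weights dict."""

    def __init__(self):
        self.weights = {"w": [0.1, 0.2]}
        self._last_checkpoint: Optional[str] = None

    def save(self, path: str) -> str:
        # Serializes all trainable variables into a checkpoint file
        # and returns the path of the newly created checkpoint.
        fname = os.path.join(path, "policy.ckpt")
        with open(fname, "wb") as f:
            pickle.dump(self.weights, f)
        self._last_checkpoint = fname
        return fname

    def restore(self, path: Optional[str] = None) -> None:
        # If path is not provided, restores from the last saved checkpoint.
        fname = path or self._last_checkpoint
        with open(fname, "rb") as f:
            self.weights = pickle.load(f)


policy = CheckpointedPolicy()
with tempfile.TemporaryDirectory() as tmpdir:
    ckpt = policy.save(tmpdir)
    policy.weights = {"w": [9.9, 9.9]}  # perturb in-memory weights
    policy.restore()                    # falls back to the last checkpoint
```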
size¶
Returns the number of trainable parameters.
summary() → None¶
Prints a string summary of the DRP.
to_json() → str¶
Returns the policy configuration parameters serialized in JSON format.
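Since to_json() and from_json() exchange the configuration as a JSON string, the round trip can be shown with the standard json module. The config keys below are those used by FeedforwardPolicy; this sketch only covers the serialization step, as the real from_json() classmethod also requires a rddl2tf compiler to instantiate the policy.

```python
import json

# A policy configuration dict, e.g. for a FeedforwardPolicy.
config = {"layers": [64, 64], "activation": "relu"}

# to_json(): serialize the configuration parameters to a JSON string.
json_config = json.dumps(config)

# from_json(): recover the configuration dict from the JSON string
# before instantiating the DRP.
recovered = json.loads(json_config)
```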
vars¶
Returns a list of the trainable variables.
tfmdp.policy.feedforward module¶
class tfmdp.policy.feedforward.FeedforwardPolicy(compiler: rddl2tf.compiler.Compiler, config: dict)¶
Bases: tfmdp.policy.drp.DeepReactivePolicy
FeedforwardPolicy implements a DRP as a multi-layer perceptron.
It is parameterized by the following configuration parameters:
- config['layers']: a list of numbers of units per hidden layer; and
- config['activation']: an activation function.
Parameters:
- compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
- config (Dict) – The policy configuration parameters.
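How config['layers'] and config['activation'] shape the network can be sketched with a toy pure-Python forward pass. All weights are fixed at 1.0 and biases at 0.0 purely for illustration; the real policy builds trainable tf.layers.Dense layers between its state input layer and action output layer.

```python
import math
from typing import List


def mlp_forward(inputs: List[float], config: dict) -> List[float]:
    """Toy dense forward pass driven by a FeedforwardPolicy-style config."""
    activation = {
        "relu": lambda x: max(0.0, x),
        "tanh": math.tanh,
    }[config["activation"]]
    hidden = inputs
    for units in config["layers"]:
        # Each unit sums all of its inputs (weight 1.0, bias 0.0)
        # and then applies the configured activation function.
        s = sum(hidden)
        hidden = [activation(s) for _ in range(units)]
    return hidden


config = {"layers": [3, 2], "activation": "relu"}
out = mlp_forward([0.5, -0.2], config)  # two units in the final layer
```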
__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor]¶
Returns action fluents for the current state and timestep.
Parameters:
- state (Sequence[tf.Tensor]) – A tuple of state fluents.
- timestep (tf.Tensor) – The current timestep.
Returns: A tuple of action fluents.
Return type: Sequence[tf.Tensor]
_build_hidden_layers() → None¶
Builds all hidden layers as tf.layers.Dense layers.
_build_input_layer() → None¶
Builds the DRP input layer using a tfmdp.policy.layers.state_layer.StateLayer.
_build_output_layer() → None¶
Builds the DRP output layer using a tfmdp.policy.layers.action_layer.ActionLayer.
build() → None¶
Creates the DRP layers and trainable weights.
name¶
Returns the canonical DRP name.
size¶
Returns the number of trainable parameters.
vars¶
Returns a list of the trainable variables.