tfmdp.policy package

Submodules

tfmdp.policy.drp module

class tfmdp.policy.drp.DeepReactivePolicy(compiler: rddl2tf.compiler.Compiler, config: Dict)

Bases: object

DeepReactivePolicy abstract base class.

It defines the basic API for building, saving and restoring reactive policies implemented as deep neural nets.

A reactive policy defines a mapping from current state fluents to action fluents.

Parameters:
  • compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
  • config (Dict) – The reactive policy configuration parameters.
__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor]

Returns action fluents for the current state and timestep.

Parameters:
  • state (Sequence[tf.Tensor]) – A tuple of state fluents.
  • timestep (tf.Tensor) – The current timestep.
Returns:

A tuple of action fluents.

Return type:

Sequence[tf.Tensor]
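The mapping contract above can be sketched with a pure-Python stand-in (no TensorFlow required): a callable object that takes a tuple of state fluents and a timestep and returns a tuple of action fluents. The class and its clipping rule are hypothetical, chosen only to illustrate the `__call__` interface.

```python
from typing import Sequence, Tuple

class ToyReactivePolicy:
    """Hypothetical stand-in for a DeepReactivePolicy: implements the
    state -> action mapping via __call__, using plain floats instead of
    tf.Tensor so the sketch runs without TensorFlow."""

    def __init__(self, action_bounds: Sequence[float]) -> None:
        self.action_bounds = action_bounds

    def __call__(self, state: Sequence[float], timestep: int) -> Tuple[float, ...]:
        # A real DRP would feed `state` through its neural-net layers;
        # here we just scale each state fluent and clip to the bounds.
        return tuple(min(s * 0.5, b) for s, b in zip(state, self.action_bounds))

policy = ToyReactivePolicy(action_bounds=[1.0, 1.0])
actions = policy(state=[0.8, 4.0], timestep=0)  # -> (0.4, 1.0)
```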

build() → None

Creates the DRP layers and trainable weights.

classmethod from_json(compiler: rddl2tf.compiler.Compiler, json_config: str) → tfmdp.policy.drp.DeepReactivePolicy

Instantiates a DRP from a JSON configuration string.

Parameters:
  • compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
  • json_config (str) – A DRP configuration encoded in JSON format.
Returns:

A DRP object.

Return type:

tfmdp.policy.drp.DeepReactivePolicy
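The `from_json` / `to_json` round trip follows the usual classmethod-constructor pattern: decode the JSON string into a config dict and pass it to the constructor. The sketch below is a hypothetical minimal class (it omits the `rddl2tf` compiler argument the real constructor takes) meant only to show that pattern.

```python
import json

class TinyPolicy:
    """Hypothetical minimal policy illustrating the from_json/to_json
    round trip; the real DeepReactivePolicy also takes a compiler."""

    def __init__(self, config: dict) -> None:
        self.config = config

    @classmethod
    def from_json(cls, json_config: str) -> "TinyPolicy":
        # Decode the JSON string into the config dict the constructor expects.
        return cls(json.loads(json_config))

    def to_json(self) -> str:
        # Serialize with sorted keys so the output is deterministic.
        return json.dumps(self.config, sort_keys=True)

policy = TinyPolicy.from_json('{"layers": [64, 64], "activation": "relu"}')
round_trip = TinyPolicy.from_json(policy.to_json())
```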

graph
name

Returns the canonical DRP name.

restore(sess: tensorflow.python.client.session.Session, path: Optional[str] = None) → None

Restores previously saved DRP trainable variables.

If path is not provided, restores from the last saved checkpoint.

Parameters:
  • sess (tf.Session) – A running session.
  • path (Optional[str]) – An optional path to a checkpoint directory.
save(sess: tensorflow.python.client.session.Session, path: str) → str

Serializes all DRP trainable variables into a checkpoint file.

Parameters:
  • sess (tf.Session) – A running session.
  • path (str) – The path to a checkpoint directory.
Returns:

The path prefix of the newly created checkpoint file.

Return type:

str
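The save/restore contract documented above (save returns the checkpoint path; restore with `path=None` falls back to the last saved checkpoint) can be sketched without TensorFlow. The class below is a hypothetical stand-in that persists its variables as JSON files rather than TF checkpoints; only the behavioural contract is meant to match.

```python
import json
import os
import tempfile

class CheckpointedPolicy:
    """Hypothetical stand-in illustrating the save/restore contract:
    save() writes variables under a directory and returns the checkpoint
    path; restore() with path=None uses the last checkpoint written."""

    def __init__(self) -> None:
        self.weights = {"w": [0.1, 0.2]}
        self._last_checkpoint = None

    def save(self, path: str) -> str:
        os.makedirs(path, exist_ok=True)
        ckpt = os.path.join(path, "policy.json")
        with open(ckpt, "w") as f:
            json.dump(self.weights, f)
        self._last_checkpoint = ckpt
        return ckpt

    def restore(self, path=None) -> None:
        # Default to the most recently saved checkpoint, mirroring the
        # documented behaviour when `path` is not provided.
        ckpt = path or self._last_checkpoint
        with open(ckpt) as f:
            self.weights = json.load(f)

policy = CheckpointedPolicy()
ckpt = policy.save(tempfile.mkdtemp())
policy.weights = {"w": [9.9, 9.9]}  # clobber the variables...
policy.restore()                    # ...then restore from the last checkpoint
```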

size

Returns the number of trainable parameters.
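For an MLP-based policy, the number returned by `size` is the usual fully-connected parameter count: each layer contributes a weight matrix plus a bias vector. The helper below is a hypothetical illustration of that arithmetic, not part of the tfmdp API.

```python
def mlp_size(input_dim: int, layers, output_dim: int) -> int:
    """Count trainable parameters (weights + biases) of a fully connected
    network, as a DRP's `size` property would report for an MLP policy.
    Hypothetical helper; hidden sizes would come from config['layers']."""
    total, fan_in = 0, input_dim
    for units in list(layers) + [output_dim]:
        total += fan_in * units + units  # weight matrix plus bias vector
        fan_in = units
    return total

# 3 state fluents, two hidden layers of 64 units, 2 action fluents:
n_params = mlp_size(input_dim=3, layers=[64, 64], output_dim=2)  # -> 4546
```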

summary() → None

Prints a string summary of the DRP.

to_json() → str

Returns the policy configuration parameters serialized in JSON format.

vars

Returns a list of the trainable variables.

tfmdp.policy.feedforward module

class tfmdp.policy.feedforward.FeedforwardPolicy(compiler: rddl2tf.compiler.Compiler, config: dict)

Bases: tfmdp.policy.drp.DeepReactivePolicy

FeedforwardPolicy implements a DRP as a multi-layer perceptron.

It is parameterized by the following configuration parameters:
  • config['layers']: a list with the number of units in each hidden layer; and
  • config['activation']: an activation function.
Parameters:
  • compiler (rddl2tf.compiler.Compiler) – RDDL2TensorFlow compiler.
  • config (Dict) – The policy configuration parameters.
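A concrete configuration dict might look like the following. The key names match the parameters listed above; the specific values (two hidden layers of 256 units, ReLU) are a hypothetical example, and the set of accepted activation names depends on the tfmdp version.

```python
import json

# Hypothetical FeedforwardPolicy configuration: two hidden layers of
# 256 units each, with ReLU activations.
config = {
    "layers": [256, 256],
    "activation": "relu",
}

# The same dict, encoded as a string, is what from_json would consume.
json_config = json.dumps(config)
```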
__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor]

Returns action fluents for the current state and timestep.

Parameters:
  • state (Sequence[tf.Tensor]) – A tuple of state fluents.
  • timestep (tf.Tensor) – The current timestep.
Returns:

A tuple of action fluents.

Return type:

Sequence[tf.Tensor]

_build_hidden_layers() → None

Builds all hidden layers as tf.layers.Dense layers.

_build_input_layer() → None

Builds the DRP input layer using a tfmdp.policy.layers.state_layer.StateLayer.

_build_output_layer() → None

Builds the DRP output layer using a tfmdp.policy.layers.action_layer.ActionLayer.

build() → None

Creates the DRP layers and trainable weights.

name

Returns the canonical DRP name.

size

Returns the number of trainable parameters.

vars

Returns a list of the trainable variables.

Module contents