tfmdp.policy package

Subpackages

tfmdp.policy.layers package

Submodules

tfmdp.policy.drp module

class tfmdp.policy.drp.DeepReactivePolicy(compiler: rddl2tf.compiler.Compiler, config: Dict)

Bases: object

DeepReactivePolicy abstract base class.

It defines the basic API for building, saving and restoring reactive policies implemented as deep neural nets.

A reactive policy defines a mapping from current state fluents to action fluents.

Parameters:
	compiler (`rddl2tf.compiler.Compiler`) – RDDL2TensorFlow compiler.
	config (Dict) – The reactive policy configuration parameters.
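
The idea of a reactive policy as a mapping from state fluents to action fluents can be illustrated without TensorFlow. The following is a minimal sketch, not tfmdp code: `ReactivePolicySketch` and `ProportionalPolicy` are hypothetical names, and plain floats stand in for the tensors used by the real class.

```python
from abc import ABC, abstractmethod
from typing import Sequence, Tuple


class ReactivePolicySketch(ABC):
    """Hypothetical stand-in for the DRP call interface (not tfmdp code)."""

    @abstractmethod
    def __call__(self, state: Sequence[float], timestep: int) -> Tuple[float, ...]:
        """Maps the current state fluents (and timestep) to action fluents."""


class ProportionalPolicy(ReactivePolicySketch):
    """Toy policy: each action is a fixed gain times the matching state fluent."""

    def __init__(self, gain: float) -> None:
        self.gain = gain

    def __call__(self, state: Sequence[float], timestep: int) -> Tuple[float, ...]:
        # The action depends only on the current state, which is what makes
        # the policy reactive; the timestep is accepted but unused here.
        return tuple(self.gain * s for s in state)


policy = ProportionalPolicy(gain=0.5)
actions = policy(state=(2.0, 4.0), timestep=0)
print(actions)
```

The real `DeepReactivePolicy` takes and returns sequences of `tf.Tensor` objects and computes the mapping with a neural network rather than a fixed gain.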

__call__(state: Sequence[tensorflow.python.framework.ops.Tensor], timestep: tensorflow.python.framework.ops.Tensor) → Sequence[tensorflow.python.framework.ops.Tensor]

Returns action fluents for the current state and timestep.

Parameters:
	state (Sequence[tf.Tensor]) – A tuple of state fluents.
	timestep (tf.Tensor) – The current timestep.
Returns:	A tuple of action fluents.
Return type:	Sequence[tf.Tensor]

build() → None: Creates the DRP layers and trainable weights.

classmethod from_json(compiler: rddl2tf.compiler.Compiler, json_config: str) → tfmdp.policy.drp.DeepReactivePolicy

Instantiates a DRP from a json_config string.

Parameters:
	compiler (`rddl2tf.compiler.Compiler`) – RDDL2TensorFlow compiler.
	json_config (str) – A DRP configuration encoded in JSON format.
Returns:	A DRP object.
Return type:	`tfmdp.policy.drp.DeepReactivePolicy`
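
As a rough illustration of what a `json_config` string might look like, the sketch below builds one with the standard `json` module. The key names follow the `FeedforwardPolicy` description in this package; the values, including the use of a string for the activation, are assumptions chosen for illustration.

```python
import json

# Hypothetical configuration: the key names come from the FeedforwardPolicy
# description, the values are illustrative assumptions.
config = {"layers": [64, 32], "activation": "elu"}

# Encode the configuration as the JSON string that from_json expects.
json_config = json.dumps(config)

# Decoding gives back an equal dictionary, so the string is a faithful
# serialization of the configuration.
restored = json.loads(json_config)
print(json_config)
```

A string built this way would be passed as `json_config` together with an existing `rddl2tf` compiler instance.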

graph

name: Returns the canonical DRP name.

restore(sess: tensorflow.python.client.session.Session, path: Optional[str] = None) → None

Restores previously saved DRP trainable variables.

If path is not provided, restores from the last saved checkpoint.

Parameters:
	sess (`tf.Session`) – A running session.
	path (Optional[str]) – An optional path to a checkpoint directory.

save(sess: tensorflow.python.client.session.Session, path: str) → str

Serializes all DRP trainable variables into a checkpoint file.

Parameters:
	sess (`tf.Session`) – A running session.
	path (str) – The path to a checkpoint directory.
Returns:	The path prefix of the newly created checkpoint file.
Return type:	str
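
The contract between `save` and `restore` (save returns a path prefix, restore falls back to the last saved checkpoint when no path is given) can be sketched with the standard library alone. This is a hypothetical stand-in, not tfmdp code: `CheckpointSketch`, the `model.ckpt` file name, and the JSON file format are assumptions, and the `sess` argument of the real methods is omitted.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class CheckpointSketch:
    """Hypothetical stand-in mirroring the save/restore contract."""

    def __init__(self, weights: Dict[str, float]) -> None:
        self.weights = dict(weights)
        self._last_prefix: Optional[str] = None

    def save(self, path: str) -> str:
        # Write the variables under the checkpoint directory and return
        # the path prefix of the file that was created.
        os.makedirs(path, exist_ok=True)
        prefix = os.path.join(path, "model.ckpt")  # assumed file name
        with open(prefix + ".json", "w") as f:
            json.dump(self.weights, f)
        self._last_prefix = prefix
        return prefix

    def restore(self, path: Optional[str] = None) -> None:
        # Without an explicit path, fall back to the last saved checkpoint.
        prefix = os.path.join(path, "model.ckpt") if path else self._last_prefix
        if prefix is None:
            raise ValueError("no checkpoint has been saved yet")
        with open(prefix + ".json") as f:
            self.weights = json.load(f)


with tempfile.TemporaryDirectory() as ckpt_dir:
    policy = CheckpointSketch({"w": 1.0})
    prefix = policy.save(ckpt_dir)
    policy.weights["w"] = 0.0  # simulate further training
    policy.restore()           # no path: uses the last saved checkpoint
    restored_w = policy.weights["w"]
```

With the real class, both calls additionally take a running `tf.Session` as their first argument.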

size: Returns the number of trainable parameters.

summary() → None: Prints a string summary of the DRP.

to_json() → str: Returns the policy configuration parameters serialized in JSON format.

vars: Returns a list of the trainable variables.

tfmdp.policy.feedforward module

class tfmdp.policy.feedforward.FeedforwardPolicy(compiler: rddl2tf.compiler.Compiler, config: dict)

Bases: tfmdp.policy.drp.DeepReactivePolicy

FeedforwardPolicy implements a DRP as a multi-layer perceptron.

It is parameterized by the following configuration parameters:

config['layers']: a list with the number of units in each layer; and
config['activation']: an activation function.

Parameters:
	compiler (`rddl2tf.compiler.Compiler`) – RDDL2TensorFlow compiler.
	config (Dict) – The policy configuration parameters.
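
To make the role of `config['layers']` concrete, the sketch below derives the weight shapes of a stack of fully connected hidden layers from a list of unit counts and counts their parameters. It is an illustration under stated assumptions, not the tfmdp implementation: the helper names and the input size of 10 are hypothetical, and any input or output layers the real policy adds are ignored, so the total need not match the value reported by `size`.

```python
from typing import List, Tuple


def hidden_layer_shapes(input_size: int, layers: List[int]) -> List[Tuple[int, int]]:
    """Returns the (inputs, units) weight shape of each hidden layer."""
    shapes = []
    fan_in = input_size
    for units in layers:
        shapes.append((fan_in, units))
        fan_in = units  # the next layer consumes this layer's outputs
    return shapes


def count_parameters(shapes: List[Tuple[int, int]]) -> int:
    """Counts weights plus one bias per unit for fully connected layers."""
    return sum(rows * cols + cols for rows, cols in shapes)


config = {"layers": [64, 32], "activation": "elu"}  # illustrative values
shapes = hidden_layer_shapes(input_size=10, layers=config["layers"])
total = count_parameters(shapes)
print(shapes, total)  # [(10, 64), (64, 32)] 2784
```

Each entry in `config['layers']` adds one hidden layer, so a longer list gives a deeper network and larger entries give wider layers.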