Sensor Managers
- class stonesoup.sensormanager.base.SensorManager(sensors: Set[Sensor] = None, platforms: Set[Platform] = None, reward_function: Callable = None, take_sensors_from_platforms: bool = True)[source]
The sensor manager base class.
The purpose of a sensor manager is to return a mapping of sensors and sensor actions appropriate to a specific scenario and with a particular objective, or objectives, in mind. This involves using estimates of the situation and knowledge of the sensor system to calculate metrics associated with actions, and then determine optimal, or near optimal, actions to take.
There is considerable freedom in both the theory and practice of sensor management and these classes do not enforce a particular solution. A sensor manager may be ‘centralised’ in that it controls the actions of multiple sensors, or individual sensors may have their own managers which communicate with other sensor managers in a networked fashion.
- Parameters:
sensors (Set[Sensor], optional) – The sensor(s) which the sensor manager is managing.
platforms (Set[Platform], optional) – The platform(s) which the sensor manager is managing.
reward_function (Callable, optional) – A function or class designed to work out the reward associated with an action or set of actions. For an example see RewardFunction. This may also incorporate a notion of the cost of making a measurement. The values returned may be scalar or vector in the case of multi-objective optimisation. Metrics may be of any type and in any units.
take_sensors_from_platforms (bool, optional) – Whether to include sensors that are on the platform(s) but not explicitly passed to the sensor manager. Any sensors not added will not be considered by the sensor manager or reward function.
- reward_function: Callable
A function or class designed to work out the reward associated with an action or set of actions. For an example see RewardFunction. This may also incorporate a notion of the cost of making a measurement. The values returned may be scalar or vector in the case of multi-objective optimisation. Metrics may be of any type and in any units.
- take_sensors_from_platforms: bool
Whether to include sensors that are on the platform(s) but not explicitly passed to the sensor manager. Any sensors not added will not be considered by the sensor manager or reward function.
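All of the managers below follow the same tasking pattern: request a configuration of actions from choose_actions(), queue the returned actions on each sensor, then let the sensors act. The sketch below is illustrative only; it assumes a sensor, a set of tracks and a timestamp have already been created elsewhere in the scenario, and uses RandomSensorManager (documented below) simply because it needs no reward function.

```python
# Illustrative tasking loop; `sensor`, `tracks` and `timestamp` are assumed
# to exist already in the wider scenario.
from stonesoup.sensormanager.base import RandomSensorManager

sensor_manager = RandomSensorManager(sensors={sensor})

# One dict of {sensor: [actions]} is returned per requested choice (nchoose)
chosen_configs = sensor_manager.choose_actions(tracks, timestamp)

for chosen_config in chosen_configs:
    for chosen_sensor, actions in chosen_config.items():
        chosen_sensor.add_actions(actions)
        chosen_sensor.act(timestamp)
```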
- class stonesoup.sensormanager.base.RandomSensorManager(sensors: Set[Sensor] = None, platforms: Set[Platform] = None, reward_function: Callable = None, take_sensors_from_platforms: bool = True)[source]
Bases: SensorManager
As the name suggests, a sensor manager which returns a random choice of action or actions from the list available. Its practical purpose is to serve as a baseline to test against.
- Parameters:
sensors (
Set[ForwardRef('Sensor')]
, optional) – The sensor(s) which the sensor manager is managing.platforms (
Set[ForwardRef('Platform')]
, optional) – The platform(s) which the sensor manager is managing.reward_function (
Callable
, optional) – A function or class designed to work out the reward associated with an action or set of actions. For an example seeRewardFunction
. This may also incorporate a notion of the cost of making a measurement. The values returned may be scalar or vector in the case of multi-objective optimisation. Metrics may be of any type and in any units.take_sensors_from_platforms (
bool
, optional) – Whether to include sensors that are on the platform(s) but not explicitly passed to the sensor manager. Any sensors not added will not be considered by the sensor manager or reward function.
- choose_actions(tracks, timestamp, nchoose=1, **kwargs)[source]
Returns a randomly chosen [list of] action(s) from the action set for each sensor.
- Parameters:
tracks (set of Track) – Set of tracks at given time. Used in reward function.
timestamp (datetime.datetime) – Time until which the actions are carried out.
nchoose (int) – Number of actions from the set to choose (default is 1)
- Returns:
- Return type:
- class stonesoup.sensormanager.base.BruteForceSensorManager(sensors: Set[Sensor] = None, platforms: Set[Platform] = None, reward_function: Callable = None, take_sensors_from_platforms: bool = True)[source]
Bases: SensorManager
A sensor manager which returns a choice of action from those available. The sensor manager iterates through every possible configuration of sensors and actions and selects the configuration which returns the maximum reward as calculated by a reward function.
- Parameters:
sensors (
Set[ForwardRef('Sensor')]
, optional) – The sensor(s) which the sensor manager is managing.platforms (
Set[ForwardRef('Platform')]
, optional) – The platform(s) which the sensor manager is managing.reward_function (
Callable
, optional) – A function or class designed to work out the reward associated with an action or set of actions. For an example seeRewardFunction
. This may also incorporate a notion of the cost of making a measurement. The values returned may be scalar or vector in the case of multi-objective optimisation. Metrics may be of any type and in any units.take_sensors_from_platforms (
bool
, optional) – Whether to include sensors that are on the platform(s) but not explicitly passed to the sensor manager. Any sensors not added will not be considered by the sensor manager or reward function.
- choose_actions(tracks, timestamp, nchoose=1, return_reward=False, **kwargs)[source]
Returns a chosen [list of] action(s) from the action set for each sensor. The chosen action(s) are selected by finding the configuration of sensors and actions which returns the maximum reward, as calculated by a reward function.
- Parameters:
tracks (set of Track) – Set of tracks at given time. Used in reward function.
timestamp (datetime.datetime) – Time until which the actions are carried out.
nchoose (int) – Number of actions from the set to choose (default is 1)
return_reward (bool) – Whether to return the reward for the chosen actions (default is False). When True, returns a tuple of 1d arrays: (dictionaries of chosen actions, rewards).
- Returns:
The pairs of Sensor: [Action] selected; when return_reward is True, the accompanying array contains the corresponding rewards.
- Return type:
list(dict) or (list(dict), numpy.ndarray)
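A minimal sketch of pairing this manager with a reward function is shown below. It assumes a predictor, updater, sensor, track set and timestamp already exist for the scenario; UncertaintyRewardFunction is described in the Reward Functions section.

```python
# Sketch only: `predictor`, `updater`, `sensor`, `tracks` and `timestamp`
# are assumed to have been built elsewhere in the scenario.
from stonesoup.sensormanager.base import BruteForceSensorManager
from stonesoup.sensormanager.reward import UncertaintyRewardFunction

reward_function = UncertaintyRewardFunction(predictor=predictor, updater=updater)
sensor_manager = BruteForceSensorManager(sensors={sensor},
                                         reward_function=reward_function)

# Exhaustively evaluate every sensor/action configuration for this time step
chosen_configs = sensor_manager.choose_actions(tracks, timestamp)

# Optionally also return the reward achieved by the chosen configuration(s)
chosen_configs, rewards = sensor_manager.choose_actions(
    tracks, timestamp, return_reward=True)
```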
- class stonesoup.sensormanager.base.GreedySensorManager(sensors: Set[Sensor] = None, platforms: Set[Platform] = None, reward_function: Callable = None, take_sensors_from_platforms: bool = True)[source]
Bases: SensorManager
A sensor manager that returns a choice of actions from those available. Calculates a reward function for each sensor in isolation. Selects the action that maximises reward for each sensor.
- Parameters:
sensors (Set[Sensor], optional) – The sensor(s) which the sensor manager is managing.
platforms (Set[Platform], optional) – The platform(s) which the sensor manager is managing.
reward_function (Callable, optional) – A function or class designed to work out the reward associated with an action or set of actions. For an example see RewardFunction. This may also incorporate a notion of the cost of making a measurement. The values returned may be scalar or vector in the case of multi-objective optimisation. Metrics may be of any type and in any units.
take_sensors_from_platforms (bool, optional) – Whether to include sensors that are on the platform(s) but not explicitly passed to the sensor manager. Any sensors not added will not be considered by the sensor manager or reward function.
- choose_actions(tracks, timestamp, nchoose=1, **kwargs)[source]
Returns a chosen [list of] action(s) from the action set for each sensor. The chosen action(s) are selected by finding the configuration of sensors and actions which returns the maximum reward, as calculated by a reward function.
- Parameters:
tracks (set of Track) – Set of tracks at given time. Used in reward function.
timestamp (datetime.datetime) – Time until which the actions are carried out.
nchoose (int) – Number of actions from the set to choose (default is 1)
- Returns:
- Return type:
- class stonesoup.sensormanager.optimise.OptimizeBruteSensorManager(sensors: Set[Sensor] = None, platforms: Set[Platform] = None, reward_function: Callable = None, take_sensors_from_platforms: bool = True, n_grid_points: int = 10, generate_full_output: bool = False, finish: bool = False, disp: bool = False)[source]
Bases: _OptimizeSensorManager
A sensor manager built around the SciPy brute() method. The sensor manager takes all possible configurations of sensors and actions and uses the optimising function to optimise a given reward function, returning the optimal configuration.
SciPy optimize provides functions which can minimize or maximize functions using a variety of algorithms. brute() minimizes a function over a given range using a brute force method: it computes the function’s value at each point of a multidimensional grid of points to find the global minimum.
A default version of the optimiser is used, or on initialisation the sensor manager can be passed parameters to alter the configuration of the optimiser. Please see the SciPy documentation for full details on what each parameter does.
- Parameters:
sensors (Set[Sensor], optional) – The sensor(s) which the sensor manager is managing.
platforms (Set[Platform], optional) – The platform(s) which the sensor manager is managing.
reward_function (Callable, optional) – A function or class designed to work out the reward associated with an action or set of actions. For an example see RewardFunction. This may also incorporate a notion of the cost of making a measurement. The values returned may be scalar or vector in the case of multi-objective optimisation. Metrics may be of any type and in any units.
take_sensors_from_platforms (bool, optional) – Whether to include sensors that are on the platform(s) but not explicitly passed to the sensor manager. Any sensors not added will not be considered by the sensor manager or reward function.
n_grid_points (int, optional) – Number of grid points to search along axis. See Ns in brute(). Default is 10.
generate_full_output (bool, optional) – If True, returns the evaluation grid and the objective function’s values on it.
finish (bool, optional) – A polishing function can be applied to the result of brute force minimisation. If True this is set as fmin(), which minimizes a function using the downhill simplex algorithm. As a default no polishing function is applied.
disp (bool, optional) – Set to True to print convergence messages from the finish callable.
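The sketch below illustrates passing optimiser parameters at construction; the values shown are arbitrary, and the sensor, reward function, tracks and timestamp are assumed to come from the wider scenario.

```python
# Illustrative configuration of the brute-force optimiser; values are arbitrary.
from stonesoup.sensormanager.optimise import OptimizeBruteSensorManager

sensor_manager = OptimizeBruteSensorManager(
    sensors={sensor},
    reward_function=reward_function,
    n_grid_points=20,           # finer grid than the default of 10
    finish=True,                # polish the result with fmin()
    generate_full_output=True)  # keep the evaluation grid for inspection

chosen_configs = sensor_manager.choose_actions(tracks, timestamp)

# Evaluation grid and reward values from the most recent call; see SciPy's
# brute() documentation for the layout of this output.
full_output = sensor_manager.get_full_output()
```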
- generate_full_output: bool
If True, returns the evaluation grid and the objective function’s values on it.
- finish: bool
A polishing function can be applied to the result of brute force minimisation. If True this is set as fmin(), which minimizes a function using the downhill simplex algorithm. As a default no polishing function is applied.
- get_full_output()[source]
Returns the output generated when generate_full_output=True for the most recent time step. This returns the evaluation grid and the reward function’s values on it, as generated by the optimize.brute() method. See the SciPy documentation for full details.
- Returns:
full_output
- Return type:
- choose_actions(tracks, timestamp, nchoose=1, return_reward=False, **kwargs)
Returns a chosen [list of] action(s) from the action set for each sensor. The chosen action(s) are selected by finding the configuration of sensors and actions which returns the maximum reward, as calculated by a reward function.
- Parameters:
tracks (set of Track) – Set of tracks at given time. Used in reward function.
timestamp (datetime.datetime) – Time until which the actions are carried out.
nchoose (int) – Number of actions from the set to choose (default is 1)
return_reward (bool) – Whether to return the reward for the chosen actions (default is False). When True, returns a tuple of 1d arrays: (dictionaries of chosen actions, rewards).
- Returns:
The pairs of Sensor: [Action] selected; when return_reward is True, the accompanying array contains the corresponding rewards.
- Return type:
list(dict) or (list(dict), numpy.ndarray)
- platforms: Set['Platform']
The platform(s) which the sensor manager is managing.
- reward_function: Callable
A function or class designed to work out the reward associated with an action or set of actions. For an example see RewardFunction. This may also incorporate a notion of the cost of making a measurement. The values returned may be scalar or vector in the case of multi-objective optimisation. Metrics may be of any type and in any units.
- sensors: Set['Sensor']
The sensor(s) which the sensor manager is managing.
- class stonesoup.sensormanager.optimise.OptimizeBasinHoppingSensorManager(sensors: Set[Sensor] = None, platforms: Set[Platform] = None, reward_function: Callable = None, take_sensors_from_platforms: bool = True, n_iter: int = 100, T: float = 1.0, stepsize: float = 0.5, interval: int = 50, disp: bool = False, niter_success: int = None)[source]
Bases: _OptimizeSensorManager
A sensor manager built around the SciPy optimize.basinhopping() method. The sensor manager takes all possible configurations of sensors and actions and uses the optimising function to optimise a given reward function, returning the optimal configuration for the sensing system.
basinhopping() finds the global minimum of a function using the basin-hopping algorithm, a combination of a global stepping algorithm and local minimization at each step.
A default version of the optimiser is used, or on initialisation the sensor manager can be passed parameters to alter the configuration of the optimiser. Please see the SciPy documentation for full details on what each parameter does.
- Parameters:
sensors (Set[Sensor], optional) – The sensor(s) which the sensor manager is managing.
platforms (Set[Platform], optional) – The platform(s) which the sensor manager is managing.
reward_function (Callable, optional) – A function or class designed to work out the reward associated with an action or set of actions. For an example see RewardFunction. This may also incorporate a notion of the cost of making a measurement. The values returned may be scalar or vector in the case of multi-objective optimisation. Metrics may be of any type and in any units.
take_sensors_from_platforms (bool, optional) – Whether to include sensors that are on the platform(s) but not explicitly passed to the sensor manager. Any sensors not added will not be considered by the sensor manager or reward function.
n_iter (int, optional) – The number of basin hopping iterations.
T (float, optional) – The “temperature” parameter for the accept or reject criterion. Higher temperatures mean larger jumps in function value will be accepted.
stepsize (float, optional) – Maximum step size for use in the random displacement.
interval (int, optional) – Interval for how often to update the stepsize.
disp (bool, optional) – Set to True to print status messages.
niter_success (int, optional) – Stop the run if the global minimum candidate remains the same for this number of iterations.
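As with the brute-force variant, optimiser parameters can be overridden at construction. The values below are arbitrary illustrations, and the sensor, reward function, tracks and timestamp are assumed to exist already.

```python
# Illustrative configuration of the basin-hopping optimiser; values are arbitrary.
from stonesoup.sensormanager.optimise import OptimizeBasinHoppingSensorManager

sensor_manager = OptimizeBasinHoppingSensorManager(
    sensors={sensor},
    reward_function=reward_function,
    n_iter=200,        # more basin-hopping iterations than the default 100
    T=2.0,             # accept larger jumps in reward between hops
    stepsize=0.25,     # smaller random displacements
    niter_success=20)  # stop early if the best candidate stops improving

chosen_configs = sensor_manager.choose_actions(tracks, timestamp)
```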
- T: float
The “temperature” parameter for the accept or reject criterion. Higher temperatures mean larger jumps in function value will be accepted.
- niter_success: int
Stop the run if the global minimum candidate remains the same for this number of iterations.
- choose_actions(tracks, timestamp, nchoose=1, return_reward=False, **kwargs)
Returns a chosen [list of] action(s) from the action set for each sensor. The chosen action(s) are selected by finding the configuration of sensors and actions which returns the maximum reward, as calculated by a reward function.
- Parameters:
tracks (set of Track) – Set of tracks at given time. Used in reward function.
timestamp (datetime.datetime) – Time until which the actions are carried out.
nchoose (int) – Number of actions from the set to choose (default is 1)
return_reward (bool) – Whether to return the reward for the chosen actions (default is False). When True, returns a tuple of 1d arrays: (dictionaries of chosen actions, rewards).
- Returns:
The pairs of Sensor: [Action] selected; when return_reward is True, the accompanying array contains the corresponding rewards.
- Return type:
list(dict) or (list(dict), numpy.ndarray)
- platforms: Set['Platform']
The platform(s) which the sensor manager is managing.
- reward_function: Callable
A function or class designed to work out the reward associated with an action or set of actions. For an example see RewardFunction. This may also incorporate a notion of the cost of making a measurement. The values returned may be scalar or vector in the case of multi-objective optimisation. Metrics may be of any type and in any units.
- sensors: Set['Sensor']
The sensor(s) which the sensor manager is managing.
Tree Search Managers
- class stonesoup.sensormanager.tree_search.MCTSBestChildPolicyEnum(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases: Enum
Best child policy Enum class for specifying which policy to use when selecting the best child at the end of the MCTS process.
- class stonesoup.sensormanager.tree_search.MonteCarloTreeSearchSensorManager(sensors: Set[Sensor] = None, platforms: Set[Platform] = None, reward_function: Callable = None, take_sensors_from_platforms: bool = True, niterations: int = 100, time_step: timedelta = datetime.timedelta(seconds=1), exploration_factor: float = 1.0, best_child_policy: MCTSBestChildPolicyEnum = MCTSBestChildPolicyEnum.MAXCREWARD)[source]
Bases: SensorManager
A Monte Carlo tree search based sensor management algorithm implementing simple value estimation.
Monte Carlo tree search works by simultaneously constructing and evaluating a search tree of states and actions through an iterative process. The process consists of 4 stages: Selection, Expansion, Simulation and Backpropagation. The purpose of the algorithm is to arrive at the optimal action policy by sequentially estimating the action value function, \(Q\), and returning the maximum argument to this at the end of the process.
Starting from the root node (current state or estimated state) the best child node is selected. The most common way, and the way implemented here, is to select this node according to the upper confidence bound (UCB) for trees. This is given by
\[\text{argmax}_{a} \frac{Q(h, a)}{N(h, a)}+c\sqrt{\frac{\log N(h)}{N(h,a)}},\]
where \(a\) is the action, \(h\) is the history (for POMDP problems a history or belief is commonly used, but in MDP problems \(h\) would be replaced with a state), \(Q(h, a)\) is the current cumulative action value estimate, \(N(h, a)\) is the number of visits or simulations of this node, \(N(h)\) is the number of visits to the parent node and \(c\) is the exploration factor, defined with exploration_factor. The purpose of the UCB is to trade off between exploitation of the most rewarding nodes in the tree and exploration of those that have been visited fewer times, as the second term in the above expression grows as the ratio of parent visits to child visits increases.
Once the best child node has been selected, this becomes a parent node and a new child node is added according to the available set of unvisited actions. This selection happens at random. This node is then simulated by predicting the current state estimate in the parent node and updating this estimate with a generated detection after applying the candidate action. This provides a predicted future state which is used to calculate the action value of this node. This is done by providing a reward_function. Finally, this reward is added to the current action value estimated in each node on the search tree branch that was descended during selection. This creates a tradeoff between future and immediate rewards during the next iteration of the search process.
Once a predefined computational budget has been reached, which in this implementation is the niterations attribute, the best child of the root node in the tree is determined and returned from choose_actions(). The user can select the criteria used to select this best action by defining best_child_policy. Further detail on this particular implementation, including the rollout process in MCTSRolloutSensorManager, can be seen in work by Glover et al. [1]. Further detail on MCTS and its variations can also be seen in [2].
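The snippet below is a standalone, illustrative evaluation of the UCB expression above on hypothetical node statistics; it is not the manager's internal implementation.

```python
import numpy as np

def ucb_select(cumulative_rewards, child_visits, parent_visits, exploration_factor=1.0):
    """Index of the child maximising Q(h,a)/N(h,a) + c*sqrt(log N(h) / N(h,a))."""
    q = np.asarray(cumulative_rewards, dtype=float)
    n = np.asarray(child_visits, dtype=float)
    exploitation = q / n
    exploration = exploration_factor * np.sqrt(np.log(parent_visits) / n)
    return int(np.argmax(exploitation + exploration))

# Hypothetical statistics for three child nodes: the third has a high average
# reward but few visits, so both UCB terms favour it here.
best = ucb_select(cumulative_rewards=[10.0, 4.0, 3.5],
                  child_visits=[20, 10, 2],
                  parent_visits=32)
# best == 2
```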
- Parameters:
sensors (Set[Sensor], optional) – The sensor(s) which the sensor manager is managing.
platforms (Set[Platform], optional) – The platform(s) which the sensor manager is managing.
reward_function (Callable, optional) – A function or class designed to work out the reward associated with an action or set of actions. This will be implemented to evaluate each action within the rollout, with the discounted sum being stored at the node representing the first action.
take_sensors_from_platforms (bool, optional) – Whether to include sensors that are on the platform(s) but not explicitly passed to the sensor manager. Any sensors not added will not be considered by the sensor manager or reward function.
niterations (int, optional) – The number of iterations of the tree search process to be carried out.
time_step (datetime.timedelta, optional) – The sample time between steps in the horizon.
exploration_factor (float, optional) – The exploration factor used in the upper confidence bound for trees.
best_child_policy (MCTSBestChildPolicyEnum, optional) – The policy for selecting the best child. Options are 'max_average_reward' for the maximum reward per visit to a node, 'max_cumulative_reward' for the maximum total reward after all simulations and 'max_visits' for the node with the maximum number of visits. Default is 'max_cumulative_reward'.
- reward_function: Callable
A function or class designed to work out the reward associated with an action or set of actions. This will be implemented to evaluate each action within the rollout with the discounted sum being stored at the node representing the first action.
- best_child_policy: MCTSBestChildPolicyEnum
The policy for selecting the best child. Options are 'max_average_reward' for the maximum reward per visit to a node, 'max_cumulative_reward' for the maximum total reward after all simulations and 'max_visits' for the node with the maximum number of visits. Default is 'max_cumulative_reward'.
- choose_actions(tracks, timestamp, nchoose=1, **kwargs)[source]
Returns a list of actions that reflect the best child nodes of the root node in the tree.
- Parameters:
tracks (set of Track) – Set of tracks at given time. Used in reward function.
timestamp (datetime.datetime) – Time until which the actions are carried out.
nchoose (int) – Number of actions from the set to choose (default is 1)
- Returns:
- Return type:
- tree_policy(nodes, node_indx)[source]
Implements the upper confidence bound for trees, which balances exploitation of highly rewarding actions and exploration of actions that have been visited fewer times.
- select_best_child(nodes)[source]
Selects the best child node of the root node in the tree according to the maximum number of visits.
- simulate_action(node, parent_node)[source]
Simulates the expected reward that would be received by executing the candidate action.
- platforms: Set['Platform']
The platform(s) which the sensor manager is managing.
- sensors: Set['Sensor']
The sensor(s) which the sensor manager is managing.
- class stonesoup.sensormanager.tree_search.MCTSRolloutSensorManager(sensors: Set[Sensor] = None, platforms: Set[Platform] = None, reward_function: Callable = None, take_sensors_from_platforms: bool = True, niterations: int = 100, time_step: timedelta = datetime.timedelta(seconds=1), exploration_factor: float = 1.0, best_child_policy: MCTSBestChildPolicyEnum = MCTSBestChildPolicyEnum.MAXCREWARD, rollout_depth: int = 1, discount_factor: float = 0.9)[source]
Bases: MonteCarloTreeSearchSensorManager
A Monte Carlo Tree Search based sensor management algorithm that implements Monte Carlo rollout for more robust action simulation. All other details are consistent with MonteCarloTreeSearchSensorManager.
- Parameters:
sensors (Set[Sensor], optional) – The sensor(s) which the sensor manager is managing.
platforms (Set[Platform], optional) – The platform(s) which the sensor manager is managing.
reward_function (Callable, optional) – A function or class designed to work out the reward associated with an action or set of actions. This will be implemented to evaluate each action within the rollout, with the discounted sum being stored at the node representing the first action.
take_sensors_from_platforms (bool, optional) – Whether to include sensors that are on the platform(s) but not explicitly passed to the sensor manager. Any sensors not added will not be considered by the sensor manager or reward function.
niterations (int, optional) – The number of iterations of the tree search process to be carried out.
time_step (datetime.timedelta, optional) – The sample time between steps in the horizon.
exploration_factor (float, optional) – The exploration factor used in the upper confidence bound for trees.
best_child_policy (MCTSBestChildPolicyEnum, optional) – The policy for selecting the best child. Options are 'max_average_reward' for the maximum reward per visit to a node, 'max_cumulative_reward' for the maximum total reward after all simulations and 'max_visits' for the node with the maximum number of visits. Default is 'max_cumulative_reward'.
rollout_depth (int, optional) – The depth of rollout to conduct for each node.
discount_factor (float, optional) – The discount factor is applied to each action evaluated in the tree to assign an incrementally lower multiplier to future actions in the tree.
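Construction follows the same pattern as the other managers, with the additional rollout parameters. The sketch below uses arbitrary values and assumes a sensor, tracks, a timestamp and a suitable reward function (for example ExpectedKLDivergence, described under Reward Functions) already exist.

```python
# Illustrative construction; parameter values are arbitrary.
import datetime
from stonesoup.sensormanager.tree_search import (
    MCTSRolloutSensorManager, MCTSBestChildPolicyEnum)

sensor_manager = MCTSRolloutSensorManager(
    sensors={sensor},
    reward_function=reward_function,
    niterations=200,                          # tree-search budget per call
    time_step=datetime.timedelta(seconds=1),  # horizon sample time
    exploration_factor=1.0,                   # c in the UCB expression
    best_child_policy=MCTSBestChildPolicyEnum.MAXCREWARD,  # the default policy
    rollout_depth=3,      # evaluate each new node with a 3-step rollout
    discount_factor=0.9)  # down-weight rewards from later rollout steps

chosen_configs = sensor_manager.choose_actions(tracks, timestamp)
```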
- discount_factor: float
The discount factor is applied to each action evaluated in the tree to assign an incrementally lower multiplier to future actions in the tree.
- best_child_policy: MCTSBestChildPolicyEnum
The policy for selecting the best child. Options are 'max_average_reward' for the maximum reward per visit to a node, 'max_cumulative_reward' for the maximum total reward after all simulations and 'max_visits' for the node with the maximum number of visits. Default is 'max_cumulative_reward'.
- choose_actions(tracks, timestamp, nchoose=1, **kwargs)
Returns a list of actions that reflect the best child nodes of the root node in the tree.
- Parameters:
tracks (set of Track) – Set of tracks at given time. Used in reward function.
timestamp (datetime.datetime) – Time until which the actions are carried out.
nchoose (int) – Number of actions from the set to choose (default is 1)
- Returns:
- Return type:
- platforms: Set['Platform']
The platform(s) which the sensor manager is managing.
- reward_function: Callable
A function or class designed to work out the reward associated with an action or set of actions. This will be implemented to evaluate each action within the rollout with the discounted sum being stored at the node representing the first action.
- select_best_child(nodes)
Selects the best child node of the root node in the tree according to the maximum number of visits.
- sensors: Set['Sensor']
The sensor(s) which the sensor manager is managing.
- simulate_action(node, parent_node)[source]
Simulates the expected reward that would be received by executing the candidate action.
- take_sensors_from_platforms: bool
Whether to include sensors that are on the platform(s) but not explicitly passed to the sensor manager. Any sensors not added will not be considered by the sensor manager or reward function.
- time_step: timedelta
The sample time between steps in the horizon.
- tree_policy(nodes, node_indx)
Implements the upper confidence bound for trees, which balances exploitation of highly rewarding actions and exploration of actions that have been visited fewer times.
Reward Functions
- class stonesoup.sensormanager.reward.RewardFunction[source]
The reward function base class.
A reward function is a callable used by a sensor manager to determine the best choice of action(s) for a sensor or group of sensors to take. For a given configuration of sensors and actions the reward function calculates a metric to evaluate how useful that choice of actions would be with a particular objective or objectives in mind. The sensor manager algorithm compares this metric for different possible configurations and chooses the appropriate sensing configuration to use at that time step.
- __call__(config: Mapping[Sensor, Sequence[Action]], tracks: Set[Track], metric_time: datetime, *args, **kwargs)[source]
A method which returns a reward metric based on information about the state of the system, sensors and possible actions they can take. This requires a mapping of sensors to action(s) to be evaluated by reward function, a set of tracks at given time and the time at which the actions would be carried out until.
- Returns:
Calculated metric
- Return type:
- __init__()
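A custom reward function only needs to subclass RewardFunction and implement __call__ with the signature above. The example below is a deliberately trivial sketch (it just counts how many sensors are tasked) to show the shape of the interface; the reward functions documented below compute meaningful metrics from the tracks and candidate actions.

```python
from stonesoup.sensormanager.reward import RewardFunction

class CountTaskedSensorsReward(RewardFunction):
    """Toy reward: prefer configurations that task more sensors.

    Illustrative only; real reward functions use the tracks and the candidate
    actions to score the configuration, as the classes below do.
    """

    def __call__(self, config, tracks, metric_time, *args, **kwargs):
        # `config` maps each Sensor to the sequence of Actions it would take
        return sum(1 for actions in config.values() if actions)
```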
- class stonesoup.sensormanager.reward.UncertaintyRewardFunction(predictor: KalmanPredictor, updater: ExtendedKalmanUpdater, method_sum: bool = True, return_tracks: bool = False, measurement_noise: bool = False)[source]
Bases: RewardFunction
A reward function which calculates the potential reduction in the uncertainty of track estimates if a particular action is taken by a sensor or group of sensors.
Given a configuration of sensors and actions, a metric is calculated for the potential reduction in the uncertainty of the tracks that would occur if the sensing configuration were used to make an observation. A larger value indicates a greater reduction in uncertainty.
- Parameters:
predictor (KalmanPredictor) – Predictor used to predict the track to a new state.
updater (ExtendedKalmanUpdater) – Updater used to update the track to the new state.
method_sum (bool, optional) – Determines method of calculating reward. Default calculates sum across all targets. Otherwise calculates mean of all targets.
return_tracks (bool, optional) – A flag for allowing the predicted track, used to calculate the reward, to be returned.
measurement_noise (bool, optional) – Decide whether or not to apply measurement model noise to the predicted measurements for sensor management.
- predictor: KalmanPredictor
Predictor used to predict the track to a new state
- updater: ExtendedKalmanUpdater
Updater used to update the track to the new state.
- method_sum: bool
Determines method of calculating reward. Default calculates sum across all targets. Otherwise calculates mean of all targets.
- return_tracks: bool
A flag for allowing the predicted track, used to calculate the reward, to be returned.
- measurement_noise: bool
Decide whether or not to apply measurement model noise to the predicted measurements for sensor management.
- __call__(config: Mapping[Sensor, Sequence[Action]], tracks: Set[Track], metric_time: datetime, *args, **kwargs)[source]
For a given configuration of sensors and actions this reward function calculates the potential uncertainty reduction of each track by computing the difference between the covariance matrix norms of the prediction and the posterior assuming a predicted measurement corresponding to that prediction.
This requires a mapping of sensors to action(s) to be evaluated by reward function, a set of tracks at given time and the time at which the actions would be carried out until.
The metric returned is the total potential reduction in uncertainty across all tracks.
- Returns:
Metric of uncertainty for given configuration
- Return type:
- __init__(predictor: KalmanPredictor, updater: ExtendedKalmanUpdater, method_sum: bool = True, return_tracks: bool = False, measurement_noise: bool = False)
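A typical construction uses the Kalman components from the wider tracking pipeline. The sketch below assumes a transition model has been defined for the scenario; the measurement model is left as None so that it is taken from each sensor's detections.

```python
# Sketch: `transition_model` is assumed to be defined for the scenario.
from stonesoup.predictor.kalman import KalmanPredictor
from stonesoup.updater.kalman import ExtendedKalmanUpdater
from stonesoup.sensormanager.reward import UncertaintyRewardFunction

predictor = KalmanPredictor(transition_model)
updater = ExtendedKalmanUpdater(measurement_model=None)  # model taken from detections

reward_function = UncertaintyRewardFunction(predictor=predictor, updater=updater)

# The reward function can be passed to a sensor manager, or called directly on
# a candidate configuration (a mapping of Sensor -> sequence of Actions):
# reward = reward_function(config, tracks, metric_time)
```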
- class stonesoup.sensormanager.reward.ExpectedKLDivergence(predictor: Predictor = None, updater: Updater = None, method_sum: bool = True, data_associator: DataAssociator = None, return_tracks: bool = False, measurement_noise: bool = False)[source]
Bases: RewardFunction
A reward function that implements the Kullback-Leibler divergence for quantifying relative information gain between actions taken by a sensor or group of sensors.
From a configuration of sensors and actions, an expected measurement is generated based on the predicted distribution and an action being taken. An update is generated based on this measurement. The Kullback-Leibler divergence is then calculated between the predicted and updated target distribution that resulted from the measurement. A larger divergence between these distributions equates to more information gained from the action and resulting measurement from that action.
- Parameters:
predictor (Predictor, optional) – Predictor used to predict the track to a new state. This reward function is only compatible with ParticlePredictor types.
updater (Updater, optional) – Updater used to update the track to the new state. This reward function is only compatible with ParticleUpdater types.
method_sum (bool, optional) – Determines method of calculating reward. Default calculates sum across all targets. Otherwise calculates mean of all targets.
data_associator (DataAssociator, optional) – Data associator for associating detections to tracks when multiple sensors are managed.
return_tracks (bool, optional) – A flag for allowing the predicted track, used to calculate the reward, to be returned.
measurement_noise (bool, optional) – Decide whether or not to apply measurement model noise to the predicted measurements for sensor management.
- predictor: Predictor
Predictor used to predict the track to a new state. This reward function is only compatible with ParticlePredictor types.
- updater: Updater
Updater used to update the track to the new state. This reward function is only compatible with ParticleUpdater types.
- method_sum: bool
Determines method of calculating reward. Default calculates sum across all targets. Otherwise calculates mean of all targets.
- data_associator: DataAssociator
Data associator for associating detections to tracks when multiple sensors are managed.
- return_tracks: bool
A flag for allowing the predicted track, used to calculate the reward, to be returned.
- measurement_noise: bool
Decide whether or not to apply measurement model noise to the predicted measurements for sensor management.
- __init__(predictor: Predictor = None, updater: Updater = None, method_sum: bool = True, data_associator: DataAssociator = None, return_tracks: bool = False, measurement_noise: bool = False)[source]
- __call__(config: Mapping[Sensor, Sequence[Action]], tracks: Set[Track], metric_time: datetime, *args, **kwargs)[source]
For a given configuration of sensors and actions this reward function calculates the expected Kullback-Leibler divergence of each track. It is calculated between the prediction and the posterior assuming an expected update based on a predicted measurement.
This requires a mapping of sensors to action(s) to be evaluated by the reward function, a set of tracks at given time and the time at which the actions would be carried out until.
The metric returned is the total expected Kullback-Leibler divergence across all tracks.
- Returns:
float – Kullback-Leibler divergence for given configuration
Set[Track] (if defined) – Set of tracks that have been predicted and updated in reward calculation if return_tracks is True
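The divergence can be evaluated directly for a candidate configuration, as sketched below with particle-filter components. The transition model, configuration, tracks and evaluation time are assumed to come from the wider scenario, and the exact ParticleUpdater arguments (for example a resampler) may need adjusting for a given setup.

```python
# Sketch: `transition_model`, `config`, `tracks` and `metric_time` are assumed
# to exist; ParticleUpdater arguments may need adjusting for a given setup.
from stonesoup.predictor.particle import ParticlePredictor
from stonesoup.updater.particle import ParticleUpdater
from stonesoup.sensormanager.reward import ExpectedKLDivergence

predictor = ParticlePredictor(transition_model)
updater = ParticleUpdater(measurement_model=None)

kld_reward = ExpectedKLDivergence(predictor=predictor, updater=updater)

# `config` is a mapping of Sensor -> sequence of candidate Actions
reward = kld_reward(config, tracks, metric_time)
```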
- class stonesoup.sensormanager.reward.MultiUpdateExpectedKLDivergence(predictor: ParticlePredictor = None, updater: ParticleUpdater = None, method_sum: bool = True, data_associator: DataAssociator = None, return_tracks: bool = False, measurement_noise: bool = True, updates_per_track: int = 2)[source]
Bases: ExpectedKLDivergence
A reward function that implements the Kullback-Leibler divergence for quantifying relative information gain between actions taken by a sensor or group of sensors.
From a configuration of sensors and actions, multiple expected measurements per track are generated based on the predicted distribution and an action being taken. The measurements are generated by resampling the particle state down to a subsample with length specified by the user. Updates are generated for each of these measurements and the Kullback-Leibler divergence calculated for each of them.
- Parameters:
predictor (ParticlePredictor, optional) – Predictor used to predict the track to a new state. This reward function is only compatible with ParticlePredictor types.
updater (ParticleUpdater, optional) – Updater used to update the track to the new state. This reward function is only compatible with ParticleUpdater types.
method_sum (bool, optional) – Determines method of calculating reward. Default calculates sum across all targets. Otherwise calculates mean of all targets.
data_associator (DataAssociator, optional) – Data associator for associating detections to tracks when multiple sensors are managed.
return_tracks (bool, optional) – A flag for allowing the predicted track, used to calculate the reward, to be returned.
measurement_noise (bool, optional)
updates_per_track (int, optional) – Number of measurements to generate from each track prediction. This should be > 1.
- predictor: ParticlePredictor
Predictor used to predict the track to a new state. This reward function is only compatible with ParticlePredictor types.
- updater: ParticleUpdater
Updater used to update the track to the new state. This reward function is only compatible with ParticleUpdater types.
- updates_per_track: int
Number of measurements to generate from each track prediction. This should be > 1.
- __init__(predictor: ParticlePredictor = None, updater: ParticleUpdater = None, method_sum: bool = True, data_associator: DataAssociator = None, return_tracks: bool = False, measurement_noise: bool = True, updates_per_track: int = 2)[source]
Actionables
- class stonesoup.sensormanager.action.Action(end_time: datetime, target_value: Any, generator: Any = None)[source]
Bases: Base
The base class for an action that can be taken by a sensor or platform with an ActionableProperty.
- Parameters:
end_time (datetime.datetime) – Time at which modification of the attribute ends.
target_value (Any) – Target value.
generator (Any, optional) – Action generator that created the action.
- act(current_time, timestamp, init_value, **kwargs)[source]
Return the attribute modified.
- Parameters:
current_time (datetime.datetime) – Current time
timestamp (datetime.datetime) – Modification of attribute ends at this time stamp
init_value (Any) – Current value of the modifiable attribute
- Returns:
The new value of the attribute
- Return type:
Any
- class stonesoup.sensormanager.action.ActionGenerator(owner: object, attribute: str, start_time: datetime, end_time: datetime, resolution: float = None)[source]
Bases: Base
The base class for an action generator.
- Parameters:
owner (object) – Actionable object that has the attribute to be modified.
attribute (str) – The name of the attribute to be modified.
start_time (datetime.datetime) – Start time of action.
end_time (datetime.datetime) – End time of action.
resolution (float, optional) – Resolution of action space.
- property current_value
Return the current value of the owner’s attribute.
- property default_action
The default action to modify the property if there is no given action.
- class stonesoup.sensormanager.action.RealNumberActionGenerator(owner: object, attribute: str, start_time: datetime, end_time: datetime, resolution: float = None)[source]
Bases: ActionGenerator
Action generator where action is a choice of a real number.
- Parameters:
owner (object) – Actionable object that has the attribute to be modified.
attribute (str) – The name of the attribute to be modified.
start_time (datetime.datetime) – Start time of action.
end_time (datetime.datetime) – End time of action.
resolution (float, optional) – Resolution of action space.
- class stonesoup.sensormanager.action.ActionableProperty(generator_cls, generator_kwargs_mapping=None, cls=None, *, default, doc=None, readonly=False)[source]
Bases: Property
Property that is modified via an Action with defined, non-equal start and end times.
- class stonesoup.sensormanager.action.Actionable[source]
Base Actionable type.
Contains the core methods of an actionable sensor/platform type.
Notes
An Actionable is required to have a timestamp attribute, in order to validate actions and act. This is an abstract base class, and not intended for direct use. Attaining a timestamp is left to the inheriting type.
- actions(timestamp: datetime, start_timestamp: datetime = None) Set[ActionGenerator] [source]
Method to return a set of action generators available up to a provided timestamp.
A generator is returned for each actionable property that the sensor has.
- Parameters:
timestamp (datetime.datetime) – Time of action finish.
start_timestamp (datetime.datetime, optional) – Time of action start.
- Returns:
Set of action generators, that describe the bounds of each action space.
- Return type:
set of ActionGenerator
- add_actions(actions: Sequence[Action]) bool [source]
Add actions to the sensor
- Parameters:
actions (sequence of Action) – Sequence of actions that will be executed in order
- Returns:
Returns True if actions accepted, False if rejected. Returns neither if timestamp is invalid.
- Return type:
bool
- Raises:
NotImplementedError – If sensor cannot be tasked.
Notes
Base class returns True
- act(timestamp: datetime, **kwargs)[source]
Carry out actions up to a timestamp.
- Parameters:
timestamp (datetime.datetime) – Carry out actions up to this timestamp.
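The methods above combine into a simple tasking workflow, sketched below. It assumes sensor is an actionable sensor (one with at least one ActionableProperty) and timestamp is the end of the next tasking interval; here each generator's default action is queued rather than an optimised choice.

```python
# Sketch of the actionable workflow; `sensor` and `timestamp` are assumed.

# One ActionGenerator is returned per actionable property of the sensor
generators = sensor.actions(timestamp)

for generator in generators:
    # Each generator describes the action space for one property; for brevity
    # we queue its default action instead of choosing one from the space.
    sensor.add_actions([generator.default_action])

# Carry out the queued actions up to the given timestamp
sensor.act(timestamp)
```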