E Grid World With Tokens Reward Manager

Overview

The aim of this reward manager is to induce cooperative behavior among agents in order to coordinate their movements on a grid. The environment related to this reward manager is similar to the Grid world with objects, except that here an "object" is a token laid on a certain cell of the grid (it typically has a positive value in <math>\mathbb{R}</math>). Each agent moves in the grid according to its policy and, whenever it lands on a cell with a token, it takes that token, which then becomes unavailable for the other agents.
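
As an informal illustration of the pick-up rule, the following sketch shows what could happen when an agent lands on a cell; the function and variable names (pickUpTokens, token_status, agent_number and so on) are only assumptions and do not refer to the actual PRLT interface.

// A token lying on the agent's cell and still available (status -1) is
// assigned to that agent and becomes unavailable for the other agents.
void pickUpTokens(int tokens, int token_row[], int token_column[], int token_status[],
                  int agent_row, int agent_column, int agent_number)
{
  for (int t = 0; t < tokens; t++)
  {
    if (token_status[t] == -1 &&
        token_row[t] == agent_row && token_column[t] == agent_column)
    {
      token_status[t] = agent_number;
    }
  }
}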

State Space Configuration

The state space consists of the position of each agent on the grid (row and column), the position of each token (token_row, token_column), the value of each token (token_value) and, finally, the status of each token (token_status). The latter state variable equals -1 when the related token has not yet been taken by an agent; otherwise it equals the number of the agent that took that token.
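
For concreteness, the state variables listed above could be laid out as plain arrays, as in the sketch below; the array names and example sizes are assumptions chosen to match the prose, not the exact PRLT state encoding.

const int agents = 2;          // example number of agents
const int tokens = 3;          // example number of tokens

int   agent_row[agents];       // row of each agent
int   agent_column[agents];    // column of each agent
int   token_row[tokens];       // row of each token
int   token_column[tokens];    // column of each token
float token_value[tokens];     // value of each token (typically positive)
int   token_status[tokens];    // -1 = not yet taken, otherwise the number of
                               // the agent that took the token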

Given the state space configuration described above, it is easy to compute the total value of the tokens taken at the current time step. Given the previous token configuration (let us suppose it is stored in the array prev_token_conf), the current one (token_conf) and the token values (token_values), we can find the tokens just taken with the following algorithm (notice that the token configuration is changed by the environment, not by the reward manager):

float sum = 0;
for (int t = 0; t < tokens; t++)
{
  // A token was taken during the last step if it was still available
  // (status -1) before and is now assigned to some agent (status != -1).
  if (prev_token_conf[t] == -1 && token_conf[t] != -1)
  {
    sum = sum + token_values[t];
  }
}

This is a simple algorithm; obviously, it must be adapted according to the type of reward manager you need to implement.
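
As an example of such an adaptation, the sketch below shows a possible cooperative variant (an assumption, not necessarily a reward manager shipped with PRLT) in which every agent receives the same reward, equal to the total value of the tokens taken during the last step; the function and parameter names are illustrative.

// Cooperative reward: each agent gets the whole value collected in the step.
void computeReward(int agents, int tokens,
                   const int prev_token_conf[], const int token_conf[],
                   const float token_values[], float reward[])
{
  // Value of the tokens that changed from "not taken" to "taken".
  float sum = 0;
  for (int t = 0; t < tokens; t++)
  {
    if (prev_token_conf[t] == -1 && token_conf[t] != -1)
    {
      sum = sum + token_values[t];
    }
  }

  // Every agent receives the same (team) reward.
  for (int a = 0; a < agents; a++)
  {
    reward[a] = sum;
  }
}

A competitive variant could instead assign token_values[t] only to the agent whose number is stored in token_conf[t].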