Simulation Loop
The simulation loop in dojo ties everything together by iterating on a per-block basis.
At each step, the environment emits an observation and the agents' rewards. The policy processes these and generates a sequence of actions, which is passed back to the environment. The environment then emits a new observation reflecting the updated state of the protocol, along with the agents' rewards for taking those actions.
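To make the data flow of a single step concrete, here is a minimal Python sketch. `ToyEnvironment`, `Observation` and `buy_one_token_policy` are hypothetical stand-ins written for illustration, not dojo's actual classes or API. The toy token price simply grows 1% per block, and the reward is the change in the agent's wealth.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical snapshot of the protocol state at one block."""
    block: int
    price: float


class ToyEnvironment:
    """Hypothetical stand-in for a dojo environment (not dojo's API).

    A single agent holds cash and tokens; the token price grows by a
    fixed rate every block.
    """

    def __init__(self, price: float = 100.0, growth: float = 0.01) -> None:
        self.block = 0
        self.price = price
        self.growth = growth
        self.cash = 1_000.0
        self.tokens = 0.0

    def observe(self) -> Observation:
        return Observation(block=self.block, price=self.price)

    def wealth(self) -> float:
        return self.cash + self.tokens * self.price

    def step(self, actions: list[float]) -> tuple[Observation, float]:
        """Execute buy orders (token quantities), advance one block,
        and return the new observation plus the agent's reward."""
        before = self.wealth()
        for qty in actions:
            qty = min(qty, self.cash / self.price)  # never spend more cash than is held
            self.cash -= qty * self.price
            self.tokens += qty
        self.block += 1
        self.price *= 1 + self.growth
        return self.observe(), self.wealth() - before


def buy_one_token_policy(obs: Observation, reward: float) -> list[float]:
    """Hypothetical policy: buy one token per block regardless of its inputs."""
    return [1.0]


env = ToyEnvironment()
obs, reward = env.observe(), 0.0             # state before any action is taken
actions = buy_one_token_policy(obs, reward)  # policy turns observation + reward into actions
obs, reward = env.step(actions)              # environment executes them and emits the new state
```

Here the environment computes the reward itself, matching the description of a single step; the basic pattern below instead has the agent compute it from the observation. Either placement works in a sketch like this.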
Basic Pattern
This is the basic pattern of the simulation loop:
- First, the environment emits an initial observation to the agent; this observation represents the state of the environment.
- The agent then takes in the observation and makes decisions based on its policy. It also computes its reward from the observation.
- If you are testing your strategy, this reward is simply a measure of your strategy's performance.
- If you are training your strategy, the agent uses the reward function to optimize the policy's parameters based on the state-action-reward transitions.
- The environment executes the actions and moves forward in time to the next block.
- At each step in the loop, a termination condition is checked. This could be a terminal state, for example when the agent runs out of money.
- The simulation loop keeps repeating this cycle until a predefined stopping condition is met.
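The full cycle, including the termination check and a stopping condition, can be sketched in Python as follows. Everything here (`ToyEnvironment`, `BuyOneTokenAgent`, `run`, `max_blocks`) is hypothetical and only illustrates the pattern; it is not dojo's API. It shows the testing case, where the reward only measures performance; a training setup would additionally use the rewards to update the policy's parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Hypothetical snapshot of the state the agent can see at one block."""
    block: int
    price: float
    cash: float
    tokens: float


class ToyEnvironment:
    """Hypothetical environment (not dojo's API): the price grows 1% per block."""

    def __init__(self, price: float = 100.0, cash: float = 1_000.0) -> None:
        self.block, self.price, self.cash, self.tokens = 0, price, cash, 0.0

    def observe(self) -> Observation:
        return Observation(self.block, self.price, self.cash, self.tokens)

    def step(self, actions: list[float]) -> Observation:
        """Execute buy orders, then move forward in time to the next block."""
        for qty in actions:
            qty = min(qty, self.cash / self.price)  # never spend more cash than is held
            self.cash -= qty * self.price
            self.tokens += qty
        self.block += 1
        self.price *= 1.01
        return self.observe()


class BuyOneTokenAgent:
    """Hypothetical agent: buys one token per block; reward = change in wealth."""

    def __init__(self) -> None:
        self.last_wealth: Optional[float] = None

    def reward(self, obs: Observation) -> float:
        wealth = obs.cash + obs.tokens * obs.price
        r = 0.0 if self.last_wealth is None else wealth - self.last_wealth
        self.last_wealth = wealth
        return r

    def act(self, obs: Observation) -> list[float]:
        return [1.0]


def run(env: ToyEnvironment, agent: BuyOneTokenAgent, max_blocks: int) -> tuple[int, float]:
    """Loop until a terminal state or the block limit; return (last block, total reward)."""
    obs = env.observe()                    # initial observation of the environment
    total_reward = 0.0
    for _ in range(max_blocks):            # predefined stopping condition: block limit
        total_reward += agent.reward(obs)  # testing: reward only measures performance
        if obs.cash < obs.price:           # terminal state: agent can no longer afford a token
            break
        actions = agent.act(obs)           # agent decides based on its policy
        obs = env.step(actions)            # environment executes and advances one block
    return obs.block, total_reward


last_block, total = run(ToyEnvironment(), BuyOneTokenAgent(), max_blocks=100)
print(f"stopped at block {last_block}, total reward {total:.2f}")
```

With these toy numbers the agent runs out of money at block 9, well before the 100-block limit, so the terminal state rather than the stopping condition ends the run.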
If you want a reminder of some of the concepts here, take a closer look at the environment, the agent or the policy as you see fit.
Example
This is a visualization of the above simulation:
Saving data
For the moment, the most comprehensive way to store data is through the dashboard. The dashboard lets you save all data in JSON format on a per-block basis. You can later load it into an empty dashboard, or read the JSON for further processing.
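If you read the exported JSON for further processing, a small Python sketch like the following may help. The dashboard's actual export schema is not described here, so the layout assumed below (a list of objects with `block` and `rewards` keys, and an agent name such as `agent_0`) is purely hypothetical; adapt the key names to what your export actually contains.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def load_blocks(path: Path) -> list[dict[str, Any]]:
    """Load per-block records and order them by block number.

    Assumes (hypothetically) that the file is a JSON list of objects with a
    "block" key and a "rewards" mapping from agent name to reward.
    """
    with path.open() as f:
        records = json.load(f)
    return sorted(records, key=lambda r: r["block"])


def cumulative_rewards(records: list[dict[str, Any]], agent: str) -> list[float]:
    """Running total of one agent's reward, block by block."""
    total, series = 0.0, []
    for record in records:
        total += record["rewards"][agent]
        series.append(total)
    return series


# Usage with a small hand-written file in the assumed (hypothetical) format.
sample = [
    {"block": 2, "rewards": {"agent_0": 1.5}},
    {"block": 1, "rewards": {"agent_0": 0.5}},
]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "blocks.json"
    path.write_text(json.dumps(sample))
    series = cumulative_rewards(load_blocks(path), "agent_0")
print(series)
```

Sorting by block number first keeps the running total correct even if records are not stored in block order.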