Basic Concepts

Dojo runs a simulation loop, combining principles from reinforcement learning and agent-based modelling. This flexible framework enables you to simulate and explore a diverse range of scenarios that reflect DeFi dynamics.

The schematic below shows a simplified representation of the agent-environment loop.

agent-environment loop

To make things easier, let’s consider an example of a trader trading on Uniswap V3:

  1. Environment creation: as a first step in the simulation, an environment is created, in this case Uniswap V3. An environment can have one or more pools.
  2. Agent creation: you define the agents that act upon the environment. Crucially, you have to define a reward function for each agent, mapping the pool observations to a single value that measures the agent's success.
  3. Run policies: observations provide information on the state of the environment. Your policies turn these observations into actions. Think of the trader submitting a trade based on the current price in a pool.
  4. Process actions: the environment executes these actions on the blockchain. This can be imagined as Uniswap processing the trader's trade. The observations change, and the agents compute the rewards for the actions they took.
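The four steps above can be sketched as a small, self-contained loop. This is only an illustration of the pattern, not Dojo's API: `ToyPool`, `ToyAgent`, `threshold_policy`, the token names, and the price series are all hypothetical, and trades execute against an in-memory price list rather than a forked chain.

```python
from dataclasses import dataclass


@dataclass
class ToyPool:
    """Hypothetical environment: replays a fixed price series, one price per block."""
    prices: list
    block: int = 0

    def observe(self) -> dict:
        return {"block": self.block, "price": self.prices[self.block]}

    def step(self, agent, actions) -> dict:
        # Execute each action at the current price, then advance one block.
        price = self.prices[self.block]
        for side, amount in actions:
            if side == "buy" and agent.usdc >= amount * price:
                agent.usdc -= amount * price
                agent.eth += amount
            elif side == "sell" and agent.eth >= amount:
                agent.eth -= amount
                agent.usdc += amount * price
        self.block += 1
        return self.observe()


@dataclass
class ToyAgent:
    """Hypothetical agent: tracks balances and scores itself, but makes no decisions."""
    usdc: float = 1000.0
    eth: float = 0.0

    def reward(self, obs: dict) -> float:
        # Map the observation to a single number: portfolio value in USDC.
        return self.usdc + self.eth * obs["price"]


def threshold_policy(obs: dict) -> list:
    """Hypothetical static policy: buy below 100, sell above 110, otherwise wait."""
    if obs["price"] < 100:
        return [("buy", 1.0)]
    if obs["price"] > 110:
        return [("sell", 1.0)]
    return []


env = ToyPool(prices=[100.0, 95.0, 105.0, 120.0, 115.0])  # 1. environment creation
agent = ToyAgent()                                        # 2. agent creation
obs = env.observe()
rewards = []
while env.block < len(env.prices) - 1:
    actions = threshold_policy(obs)                       # 3. run policies
    obs = env.step(agent, actions)                        # 4. process actions
    rewards.append(agent.reward(obs))

print(rewards)
```

Note how the decision logic lives entirely in the policy function, while the agent only holds balances and computes its reward.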

The agent rewards are useful in two cases: measuring the performance of a static policy and optimizing a policy.

Dojo is still in Beta. Become an early adopter and help us shape the product in the way you need.

Static Policy Performance

If your policy is static (i.e. it has no parameters to fit), the agent’s reward function is simply a measure of performance at every step in the simulation loop.

agent-environment loop

Optimizing a Policy

If you want to optimize model parameters to improve your policy's performance, Dojo’s feedback loop feeds the reward back into the policy. The policy can then use it to iteratively improve its decision-making and generate better actions for your agent.

agent-environment loop
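As a rough illustration of using rewards to tune a policy, the sketch below scores a one-parameter policy on a toy price series and keeps the best-scoring parameter. It assumes a plain exhaustive search over a handful of candidates, which is a stand-in for whatever optimizer you would actually use. `run_episode`, `buy_below`, and the prices are hypothetical and not part of Dojo.

```python
def run_episode(buy_below: float, prices: list) -> float:
    """Run a toy simulation and return the final reward (portfolio value in USDC).

    Hypothetical policy: buy 1 ETH when the price is below `buy_below`,
    otherwise sell everything held.
    """
    usdc, eth = 1000.0, 0.0
    for price in prices:
        if price < buy_below and usdc >= price:
            usdc -= price
            eth += 1.0
        elif price >= buy_below and eth > 0:
            usdc += eth * price
            eth = 0.0
    return usdc + eth * prices[-1]


prices = [100.0, 90.0, 110.0, 80.0, 120.0]
candidates = [85.0, 95.0, 105.0]

# Feedback loop: evaluate each parameter value by its reward, keep the best one.
scores = {c: run_episode(c, prices) for c in candidates}
best = max(scores, key=scores.get)

print(scores)
print("best buy_below:", best)
```

A parameter chosen this way is fitted to one price history only, so it is worth checking it against data that was not used for the search.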



Environments represent DeFi protocols (e.g. Uniswap V3 or AAVE).

Environments represent the setting in which an agent interacts and possibly learns. At every simulation step (every block on the blockchain), the environment emits an observation, along with the rewards of the agents in the simulation. The observation provides information on the state of the environment at the current simulation block.

The agents' policies process the observation and rewards to produce a sequence of actions. The actions are executed inside the environment. In practice, this means Dojo makes the appropriate smart contract calls to a local fork of the DeFi protocol, and the environment emits new observations, i.e. the updated state of the chain, back to the simulation.
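To make the "execute actions, then emit a new observation" cycle concrete, here is a minimal sketch of an environment. It is an assumption-laden toy: `ToyConstantProductEnv` is a hypothetical name, the state lives in memory instead of on a forked chain, and it uses a fee-free constant-product rule (x * y = k) rather than Uniswap V3's concentrated-liquidity maths.

```python
class ToyConstantProductEnv:
    """Hypothetical in-memory environment; not Dojo's API and not Uniswap V3 maths."""

    def __init__(self, reserve_eth: float, reserve_usdc: float):
        self.reserve_eth = reserve_eth
        self.reserve_usdc = reserve_usdc
        self.block = 0

    def observation(self) -> dict:
        # State of the environment at the current simulation block.
        return {"block": self.block, "price": self.reserve_usdc / self.reserve_eth}

    def step(self, actions: list) -> dict:
        # Execute every action, advance one block, emit the updated state.
        for action in actions:
            self._swap_usdc_for_eth(action["usdc_in"])
        self.block += 1
        return self.observation()

    def _swap_usdc_for_eth(self, usdc_in: float) -> float:
        # Keep reserve_eth * reserve_usdc constant (no fees in this toy).
        k = self.reserve_eth * self.reserve_usdc
        self.reserve_usdc += usdc_in
        eth_out = self.reserve_eth - k / self.reserve_usdc
        self.reserve_eth -= eth_out
        return eth_out


env = ToyConstantProductEnv(reserve_eth=10.0, reserve_usdc=1000.0)
before = env.observation()
after = env.step([{"usdc_in": 1000.0}])
print(before, after)
```

Buying ETH with USDC shrinks the ETH reserve, so the price reported in the next observation is higher than before the trade.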

See here for more details.


Agents represent the state of the actors interacting with the environment (e.g. traders).

The environment holds a reference to all agents, each of which keeps track of its own cryptocurrency quantities. Each agent also implements its own reward function for metric tracking and, optionally, policy training.

Agents are NOT responsible for making decisions on how to act in the environment.
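A minimal sketch of this split of responsibilities, under stated assumptions: `ToyAgent`, its `balances` dictionary, the token symbols, and the shape of the observation are all hypothetical and do not reflect Dojo's actual classes. The agent tracks quantities and computes a reward, and deliberately has no method for choosing actions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical agent: state and reward only, no decision logic."""
    balances: dict = field(default_factory=dict)
    reward_history: list = field(default_factory=list)

    def quantity(self, token: str) -> float:
        # Tokens the agent has never held count as zero.
        return self.balances.get(token, 0.0)

    def reward(self, obs: dict) -> float:
        # Portfolio value in USDC at the observed price, recorded for metric tracking.
        value = self.quantity("USDC") + self.quantity("WETH") * obs["price"]
        self.reward_history.append(value)
        return value


agent = ToyAgent(balances={"USDC": 500.0, "WETH": 2.0})
value = agent.reward({"price": 1500.0})
print(value, agent.reward_history)
```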

See here for more details.


Policies determine the behavior of the agents.

This is where you can get creative by implementing your own policy for interacting with the environment! For example, a basic policy could be a moving-average trading strategy.

At every simulation step (i.e., every block), the policy receives an observation from the environment, with which it generates a sequence of actions to pass back. Optionally, it also receives a reward which can be used to train or fit a model contained within it.
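The moving-average strategy mentioned above could look roughly like the sketch below. It assumes the observation is a dictionary with a `price` key and that actions are plain dictionaries. The class name, the `predict` method, and the window sizes are illustrative choices, not Dojo's interface.

```python
from collections import deque


class MovingAveragePolicy:
    """Hypothetical policy: compare a short and a long moving average of the price."""

    def __init__(self, short_window: int = 2, long_window: int = 4):
        if short_window >= long_window:
            raise ValueError("short_window must be smaller than long_window")
        self.short_window = short_window
        # Only the last `long_window` prices are kept.
        self.prices = deque(maxlen=long_window)

    def predict(self, obs: dict) -> list:
        """Turn one observation into a (possibly empty) sequence of actions."""
        self.prices.append(obs["price"])
        if len(self.prices) < self.prices.maxlen:
            return []  # not enough history yet
        recent = list(self.prices)
        short_avg = sum(recent[-self.short_window:]) / self.short_window
        long_avg = sum(recent) / len(recent)
        if short_avg > long_avg:
            return [{"type": "buy"}]
        if short_avg < long_avg:
            return [{"type": "sell"}]
        return []


policy = MovingAveragePolicy(short_window=2, long_window=4)
outputs = [policy.predict({"price": p}) for p in [10.0, 10.0, 10.0, 14.0, 6.0, 2.0]]
print(outputs)
```

The policy keeps its own price history, so the environment only needs to supply the current observation at each block.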

See here for more details.