In this environment, the agents must collaborate to make a soup containing an onion and a tomato within 7 timesteps. The recipe requires placing the onion into the pot exactly one timestep after the tomato: doing so yields a reward of +1, while any other ordering or timing ruins the soup and yields a reward of 0. The agents’ action space consists of three actions: rotate left, rotate right, and interact.
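The reward rule above can be sketched as a small function. This is a minimal illustration, not the environment's actual implementation: the function name `soup_reward`, the 0-indexed timestep convention (steps 0 through 6), and the use of `None` for an ingredient that was never placed are all assumptions made for the example.

```python
from typing import Optional

HORIZON = 7  # episode length in timesteps, as stated in the text


def soup_reward(
    tomato_step: Optional[int],
    onion_step: Optional[int],
    horizon: int = HORIZON,
) -> int:
    """Return 1 if the onion enters the pot exactly one step after the tomato.

    Steps are assumed to be 0-indexed (0 .. horizon - 1); None means the
    ingredient was never placed. Both conventions are illustrative assumptions.
    """
    # A missing ingredient means no soup was made.
    if tomato_step is None or onion_step is None:
        return 0
    # Both placements must happen inside the episode horizon.
    if not (0 <= tomato_step < horizon and 0 <= onion_step < horizon):
        return 0
    # The onion must follow the tomato by exactly one timestep.
    return 1 if onion_step == tomato_step + 1 else 0
```

For instance, under these assumptions `soup_reward(3, 4)` gives 1, while placing both ingredients on the same step, placing the onion first, or placing the onion after the horizon all give 0.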

This environment has two strategic equivalence classes: one class has the left agent collecting an onion and the right agent collecting a tomato, while the other class has the roles switched.
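To make the two classes concrete, the sketch below enumerates the possible role assignments and checks whether two conventions are compatible when paired. It is only an illustration under stated assumptions: the names `role_assignments` and `compatible` are hypothetical, and the check looks only at which ingredient each agent collects, ignoring timing.

```python
from itertools import permutations

AGENTS = ("left", "right")
INGREDIENTS = ("onion", "tomato")


def role_assignments():
    """Enumerate every way to assign one distinct ingredient to each agent."""
    return [dict(zip(AGENTS, perm)) for perm in permutations(INGREDIENTS)]


def compatible(left_convention, right_convention):
    """Check whether a mixed pairing still collects both ingredients.

    The left agent acts according to left_convention and the right agent
    according to right_convention. Timing is deliberately ignored here.
    """
    collected = {left_convention["left"], right_convention["right"]}
    return collected == set(INGREDIENTS)


classes = role_assignments()
# classes[0]: left collects the onion, right collects the tomato (Class 1)
# classes[1]: the roles are switched (Class 2)
```

Under these assumptions, pairing two agents that follow the same class collects both ingredients, whereas pairing a left agent from one class with a right agent from the other has both agents fetch the same ingredient, so no valid soup can be made.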


[Figure: Full Information. Panels: Class 1, Class 2]


In the partially observed version of the environment, the central pot blocks the agents’ vision, preventing each agent from observing the position or actions of its co-player. This modification greatly increases the degree of coordination required, because the agents cannot rely on observing and adapting to their co-player’s behavior.

[Figure: Partial Information. Panels: Class 1, Class 2, Class 3, Class 4, Class 5, Class 6]