In this environment, agents must collaborate to make a soup containing an onion and a tomato within 7 timesteps. The recipe requires the onion to be placed in the pot exactly one timestep after the tomato: doing so yields a reward of +1, while any other timing ruins the soup and yields a reward of 0. The agents’ action space consists of three actions: rotate left, rotate right, and interact.
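The timing rule above can be sketched as a small reward function. This is a minimal illustration, not the environment's actual implementation: the name `soup_reward`, its parameters, and the convention that timesteps are indexed from 0 to 6 are all assumptions made here for the example.

```python
from typing import Optional

HORIZON = 7  # episode length stated in the text


def soup_reward(
    tomato_step: Optional[int],
    onion_step: Optional[int],
    horizon: int = HORIZON,
) -> int:
    """Return 1 if the onion enters the pot exactly one timestep after the tomato.

    Hypothetical helper: `tomato_step` and `onion_step` are the timesteps at which
    each ingredient is placed in the pot (None if never placed). Timesteps are
    assumed to be indexed 0 .. horizon - 1.
    """
    # A missing ingredient means no soup was made.
    if tomato_step is None or onion_step is None:
        return 0
    # Both placements must happen inside the episode.
    if not (0 <= tomato_step < horizon and 0 <= onion_step < horizon):
        return 0
    # Any timing other than "onion exactly one step after tomato" ruins the soup.
    return 1 if onion_step == tomato_step + 1 else 0
```

Under these assumptions, `soup_reward(3, 4)` gives the +1 reward, while placing the onion too late (`soup_reward(3, 5)`) or before the tomato (`soup_reward(4, 3)`) gives 0.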
This environment has two strategic equivalence classes: one class has the left agent collecting an onion and the right agent collecting a tomato, while the other class has the roles switched.
[Figure: Full Information version of the environment; panels: Class 1, Class 2]
In the partially-observed version of the environment, the central pot blocks the agents’ vision, preventing them from observing the position or actions of their co-player. This modification greatly increases the degree of coordination required between the agents because they cannot rely on observing and adapting to their co-player’s behavior.
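The difference between the two versions can be sketched as an observation function that hides the co-player when the pot blocks the view. This is only an illustrative sketch: the types `AgentState` and `Observation`, the function `observe`, and the representation of positions and actions are hypothetical and are not taken from the environment's code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AgentState:
    """Hypothetical per-agent state: grid position and most recent action."""
    position: Tuple[int, int]
    last_action: Optional[str]  # "rotate_left", "rotate_right", "interact", or None


@dataclass(frozen=True)
class Observation:
    """What one agent sees: its own state, plus the co-player's state if visible."""
    own: AgentState
    co_player: Optional[AgentState]


def observe(own: AgentState, co_player: AgentState, fully_observed: bool) -> Observation:
    # In the partially-observed version the central pot blocks vision, so the
    # co-player's position and actions are masked out entirely.
    visible = co_player if fully_observed else None
    return Observation(own=own, co_player=visible)
```

With `fully_observed=False`, each agent's observation carries no information about its co-player, so a policy cannot condition on the partner's behavior and must commit to a role in advance.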