The Craftax Benchmark

Michael Matthews¹
Michael Beukman¹
Benjamin Ellis¹,²
Mikayel Samvelyan³
Matthew Jackson¹,²
Samuel Coward¹
Jakob Foerster¹

¹ FLAIR, University of Oxford    ² WhiRL, University of Oxford    ³ DARK, University College London

TL;DR

Existing benchmarks for open-ended learning are either too slow or too simple. Craftax is both fast and complicated. We hope that this will allow researchers without access to industrial compute to investigate learning in an open-ended environment with an ease that was not previously possible.

Introduction

Progress in reinforcement learning (RL) algorithms is driven in large part by the development and adoption of suitable benchmarks. In the effort towards increasingly general agents, a community has arisen that focuses on benchmarks with more open-ended dynamics, in the form of procedural world generation, skill acquisition and reuse, long-term dependencies and continual learning. This has motivated the development of environments like MALMO (Minecraft), The NetHack Learning Environment, MiniHack and Crafter. However, their slow runtime makes them inaccessible to current methods without large-scale computational resources, which limits their practicality for the research community.

We present Craftax, a JAX-based benchmark that combines elements from Crafter and NetHack while running orders of magnitude faster. We also present Craftax-Classic, a reimplementation of Crafter in JAX that is significantly simpler than the full Craftax environment and provides a starting point for those familiar with Crafter.

Craftax is Fast

Craftax is significantly faster than comparable open-ended environments.

Craftax-Classic and Craftax run 257x and 169x faster than Crafter respectively when running the PureJaxRL PPO implementation. All experiments were run on a single machine with an RTX 4090 and i9-13900K. Looking deeper into the results, the main reason for the speed of Craftax is that it can be massively parallelised. This is made possible by JAX, which lets us compile Craftax to run on the GPU and vmap it over many workers.
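To make the parallelisation idea concrete, here is a minimal sketch of the pattern on a toy grid world. It is not the Craftax API: the `reset` and `step` functions, the `GRID_SIZE` constant and the reward rule are all hypothetical stand-ins. Only `jax.vmap` and `jax.jit` are the real JAX transformations that the text refers to.

```python
import jax
import jax.numpy as jnp

GRID_SIZE = 8  # hypothetical toy world, not the Craftax map


def reset(key):
    """Sample a random start position for a single environment."""
    return jax.random.randint(key, (2,), 0, GRID_SIZE)


def step(pos, action):
    """Move one cell (0=up, 1=down, 2=left, 3=right), clipped to the grid."""
    moves = jnp.array([[-1, 0], [1, 0], [0, -1], [0, 1]])
    new_pos = jnp.clip(pos + moves[action], 0, GRID_SIZE - 1)
    # Toy reward: 1.0 when the agent stands on the far corner of the grid.
    reward = jnp.all(new_pos == GRID_SIZE - 1).astype(jnp.float32)
    return new_pos, reward


# vmap turns the single-environment functions into batched ones;
# jit compiles each batched function into one accelerator program.
batched_reset = jax.jit(jax.vmap(reset))
batched_step = jax.jit(jax.vmap(step))

num_envs = 1024
key = jax.random.PRNGKey(0)
reset_key, action_key = jax.random.split(key)

positions = batched_reset(jax.random.split(reset_key, num_envs))
actions = jax.random.randint(action_key, (num_envs,), 0, 4)
positions, rewards = batched_step(positions, actions)

print(positions.shape, rewards.shape)  # (1024, 2) (1024,)
```

Because the environment logic is written once for a single world and batched afterwards, the number of parallel workers is just the leading array dimension, which is what lets a single GPU step thousands of environments at once.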

Craftax is Hard

Craftax contains 65 achievements split into 4 difficulty levels. Achievements with higher difficulty give more reward. Current RL methods failed to make significant progress when given a budget of 1 billion environment steps, and never reached any achievement from the two hardest classes. For perspective, it took one of the authors (with extensive knowledge of the game mechanics) roughly 5 hours of gameplay to first achieve a 'perfect' run where every achievement was completed.
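A tiered reward of this kind could be sketched as follows. Everything numeric here is an assumption for illustration: the example uses 6 achievements rather than 65, the per-tier reward values are placeholders, and it assumes (following the Crafter convention) that reward is paid when an achievement is newly unlocked. Only the rule that a higher tier gives more reward comes from the text above.

```python
import jax.numpy as jnp

# Hypothetical: 6 achievements (Craftax has 65), each tagged with a tier 0..3.
ACHIEVEMENT_TIER = jnp.array([0, 0, 1, 1, 2, 3])
# Placeholder per-tier rewards; only "higher tier, more reward" is from the text.
TIER_REWARD = jnp.array([1.0, 2.0, 3.0, 4.0])


def achievement_reward(unlocked_before, unlocked_after):
    """Sum the tier rewards of achievements newly unlocked in this step."""
    newly_unlocked = unlocked_after & ~unlocked_before
    return jnp.sum(newly_unlocked * TIER_REWARD[ACHIEVEMENT_TIER])


before = jnp.array([True, False, False, False, False, False])
after = jnp.array([True, True, False, False, True, False])

# Newly unlocked: one tier-0 achievement (1.0) and one tier-2 achievement (3.0).
reward = achievement_reward(before, after)
print(reward)  # 4.0
```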

Craftax is Open-Ended

Craftax requires a diverse range of skills and tasks, including:

Mining
Farming
Dungeon Crawling
Archery
Building
Magic

This makes it an excellent testbed for research on exploration, continual learning and unsupervised skill discovery.

Craftax can also be used for unsupervised environment design, for example by curating promising seeds with Prioritized Level Replay (PLR).

PLR Levels
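As a rough illustration of seed curation, the sketch below turns per-seed scores into replay probabilities using rank prioritisation, the score-based part of PLR's sampling rule. It is a simplified sketch, not the Craftax or PLR reference code: the scores are made up, the `temperature` value is an arbitrary assumption, and PLR's staleness term is omitted.

```python
import jax
import jax.numpy as jnp


def rank_prioritized_probs(scores, temperature=0.3):
    """Replay probabilities proportional to (1 / rank) ** (1 / temperature)."""
    num_seeds = scores.shape[0]
    order = jnp.argsort(-scores)  # indices from highest to lowest score
    # Rank 1 goes to the highest-scoring seed.
    ranks = jnp.zeros_like(order).at[order].set(jnp.arange(1, num_seeds + 1))
    weights = (1.0 / ranks) ** (1.0 / temperature)
    return weights / weights.sum()


# Hypothetical learning-potential scores for five level seeds.
seeds = jnp.arange(5)
scores = jnp.array([0.1, 0.9, 0.4, 0.0, 0.7])

probs = rank_prioritized_probs(scores)
key = jax.random.PRNGKey(0)
# Draw the next seed to replay, favouring high-scoring seeds.
next_seed = jax.random.choice(key, seeds, p=probs)
```

A lower temperature concentrates replay on the top-ranked seeds, while a higher one moves the distribution towards uniform sampling.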

Alternatively, levels can be evolved with ACCEL.

ACCEL (unrestricted swap editing)