DAOS Breaks 20 TiB/s!

The Aurora supercomputer topped the IO500 production list, breaking the 20 TiB/s barrier with only a subset of its storage and client nodes available for the run.

Some interesting facts about this feat:

Number of MPI tasks/processes: 63k
Number of compute endpoints: 24k
Number of storage nodes: 642
Number of DAOS engines: 1284
DAOS pool size: 160 PiB
Largest file size: 8.5 PiB
Total number of files: 177 billion
Number of files in a single directory: 33 billion
Data protection schemes: 16+1 erasure code, 2-way replication
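
A couple of ratios follow directly from the figures above. The short Python sketch below works them out. It assumes that "16+1" means 16 data cells plus 1 parity cell per stripe, which the text does not spell out, and the variable names are illustrative only.

```python
# Figures quoted in the list above.
storage_nodes = 642
daos_engines = 1284

# Assumption: "16+1" means 16 data cells plus 1 parity cell per stripe.
ec_data_cells = 16
ec_parity_cells = 1
replicas = 2  # 2-way replication

# How many DAOS engines run on each storage node.
engines_per_node = daos_engines / storage_nodes

# Raw bytes written to storage for every byte of user data.
ec_raw_per_user_byte = (ec_data_cells + ec_parity_cells) / ec_data_cells
replication_raw_per_user_byte = float(replicas)

print(f"DAOS engines per storage node: {engines_per_node:g}")
print(f"Raw bytes per user byte, 16+1 erasure code: {ec_raw_per_user_byte:.4f}")
print(f"Raw bytes per user byte, 2-way replication: {replication_raw_per_user_byte:.4f}")
```

Under that assumption, the erasure-coded data costs about 6% extra capacity, while 2-way replication doubles it.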

Unlike the IO500 research list, the production list imposes strict requirements on the reproducibility and redundancy of the storage system, which must have no single point of failure.

The second entry in the production list is the DAOS deployment on the SuperMUC-NG Phase 2 system at LRZ in Germany, which demonstrated incredible efficiency with 59 score points per storage node (vs. 50 for Aurora, although Aurora's efficiency should rise once the system uses more compute nodes). No other parallel filesystem on the production list comes anywhere close to this efficiency.
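
To put the per-node comparison in relative terms, here is a minimal Python sketch using only the two rounded figures quoted above. The variable names are illustrative, and because the inputs are rounded the result is approximate.

```python
# Score points per storage node, as quoted in the text (rounded values).
supermuc_ng_points_per_node = 59
aurora_points_per_node = 50

# Relative per-node advantage of SuperMUC-NG Phase 2 over Aurora.
relative_advantage = supermuc_ng_points_per_node / aurora_points_per_node - 1

print(f"SuperMUC-NG Phase 2 vs Aurora, per storage node: {relative_advantage:+.0%}")
```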

We can’t wait to see how far the Aurora DAOS deployment will go once running at full capacity!