Initial results from Mammoth Mini: 27 MB/s of permissionless data throughput

Evan Forbes, John Adler

Oct 22, 2024

The Celestia core developer community recently unveiled a roadmap to massively scale data throughput towards our next major destination: 1 GB blocks.

Today Celestia Labs is unveiling results from the proof-of-concept Mammoth Mini testnet that implements 88 MB blocks with an average of 27 MB/s of permissionless data throughput—a giant leap towards 1 GB blocks and beyond.

Mammoth Mini meets an emerging need for extreme data throughput, whether from order-of-magnitude improvements to sequencer performance, from alt-VM L2s that rival their L1 counterparts, or from developers experimenting with fully onchain worlds and verifiable web apps. In the context of payments, Mammoth Mini’s first iteration is enough for hundreds of thousands of ERC-20 transfers per second (assuming compression), akin to many Visa payment networks running in parallel.

This is a proof-of-concept network designed to showcase state-of-the-art performance improvements to Celestia that core devs will propose for public testnets and then Mainnet in 2025.

The Mammoth Mini Testnet

As of October 2024, Celestia has a maximum throughput of 2 MB blocks every 12 seconds, or 0.167 MB/s. With 1 GB blocks every 12 seconds, data throughput would be ~83 MB/s. Mammoth Mini’s goal was to see how much data throughput could be achieved via initial implementations of the improvements outlined in the community’s protocol roadmap.

The first Mammoth Mini testnet was prototyped over the course of a three-week sprint. The code can be found in this branch: https://github.com/celestiaorg/celestia-core/tree/evan/pipeline-cat. It implements early versions of several key components of the community’s roadmap to 1 GB blocks, including compact blocks, a new high-throughput blob propagation protocol called Vacuum!, and simulations of FBSS and of readily achievable optimizations to Celestia’s state machine.

Combining these improvements, Mammoth Mini’s first iteration achieves 88 MB blocks with 3 s block times and an average data throughput of 27 MB/s, an increase of more than 160x compared to Celestia’s throughput at launch!
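The figures quoted in this section follow from dividing block size by block time. The short Python check below reproduces them; it assumes 1 GB = 1000 MB and treats the 27 MB/s average as a given measurement rather than something the code derives.

```python
def throughput_mb_per_s(block_size_mb: float, block_time_s: float) -> float:
    """Peak data throughput if every block is completely full."""
    return block_size_mb / block_time_s

baseline = throughput_mb_per_s(2, 12)       # 2 MB blocks every 12 s
target = throughput_mb_per_s(1000, 12)      # 1 GB blocks every 12 s (1 GB = 1000 MB assumed)
mammoth_peak = throughput_mb_per_s(88, 3)   # upper bound if every 88 MB block were full
mammoth_avg = 27.0                          # average reported in this post

print(f"baseline:           {baseline:.3f} MB/s")
print(f"1 GB target:        {target:.1f} MB/s")
print(f"Mammoth Mini peak:  {mammoth_peak:.1f} MB/s")
print(f"average / baseline: {mammoth_avg / baseline:.0f}x")
```

The reported average sits slightly below the 88 MB / 3 s upper bound, which is what one would expect when not every block is completely full.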

Compact Blocks

The testnet implements the concept of compact blocks from BIP-152.

In standard block relaying, the entire block, with all the transaction data, is broadcast from one validator to another. This can be quite slow as blocks get larger, which is exactly the case for Celestia with large blocks! However, if most nodes already have the transactions of the block in their mempool, they don't have to download that data again; it's sufficient to download only an identifier for each of those transactions. This is the core intuition behind compact blocks.

Not only does this reduce bandwidth requirements substantially—an important efficiency improvement since Celestia uses a decentralized p2p network—it also allows blocks to be effectively pre-downloaded outside of consensus, making actual block propagation blazingly fast!
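As a rough illustration of the compact-block idea, here is a minimal Python sketch. It is not the Mammoth Mini implementation: the function names are hypothetical, and the identifier is assumed to be a truncated SHA-256 hash for simplicity (BIP-152 itself uses 6-byte SipHash-based short IDs).

```python
import hashlib

def tx_id(tx: bytes) -> bytes:
    # Assumption: an 8-byte truncated SHA-256 stands in for the short ID.
    return hashlib.sha256(tx).digest()[:8]

def make_compact_block(txs: list[bytes]) -> list[bytes]:
    """Proposer side: send only identifiers, not the transaction data."""
    return [tx_id(tx) for tx in txs]

def reconstruct(compact: list[bytes], mempool: dict[bytes, bytes]):
    """Receiver side: fill in transactions already held in the mempool.

    Returns (block, missing), where block has None in place of any
    transaction the receiver does not have, and missing lists their IDs.
    """
    block = [mempool.get(i) for i in compact]
    missing = [i for i, tx in zip(compact, block) if tx is None]
    return block, missing

proposer_txs = [b"blob-a", b"blob-b", b"blob-c"]
compact = make_compact_block(proposer_txs)

# The receiver has already seen two of the three transactions.
receiver_mempool = {tx_id(tx): tx for tx in [b"blob-a", b"blob-c"]}
block, missing = reconstruct(compact, receiver_mempool)
print(f"{len(missing)} of {len(compact)} transactions still need to be fetched")
```

The more of the block a node already holds, the shorter the `missing` list, and the less data has to move during consensus.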

Vacuum!

Compact blocks in a naive design only work well if most nodes have highly synchronized mempools. If not, compact blocks need to fall back to traditional (slow) block propagation during consensus.

Vacuum! solves this by propagating Validator Availability Certificates (VACs), which are signed commitments to the transactions in validators' mempools. This allows validators to coordinate before consensus to ensure their mempools are highly synchronized with the highest-priority transactions.

Moreover, by leveraging knowledge of which validator nodes have which transactions, nodes can download unique data from different peers, dramatically increasing the synchronization rate.

A draft specification for Vacuum! is published here.
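To make the "download unique data from different peers" idea concrete, here is a hypothetical Python sketch. It is not taken from the Vacuum! draft specification: the `holders` map simply stands in for whatever a node has learned about which validators hold which transactions, and the greedy least-loaded assignment is an assumption chosen for brevity.

```python
def assign_downloads(wanted: list[str], holders: dict[str, set[str]]) -> dict[str, list[str]]:
    """Assign each wanted transaction to exactly one peer known to hold it.

    holders maps a peer name to the set of transaction IDs that peer has
    advertised. Requests are spread across peers so that no transaction
    is downloaded twice.
    """
    plan: dict[str, list[str]] = {peer: [] for peer in holders}
    for tx in wanted:
        candidates = [p for p in holders if tx in holders[p]]
        if not candidates:
            continue  # no peer has advertised this transaction yet
        # Pick the least-loaded candidate; break ties by name for determinism.
        peer = min(candidates, key=lambda p: (len(plan[p]), p))
        plan[peer].append(tx)
    return plan

holders = {
    "val-1": {"tx1", "tx2", "tx3"},
    "val-2": {"tx2", "tx3", "tx4"},
}
plan = assign_downloads(["tx1", "tx2", "tx3", "tx4"], holders)
print(plan)
```

In this toy run each peer serves two of the four transactions, so the downloads proceed in parallel without any overlap.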

Simulating FBSS

In the current Celestia protocol, construction of blocks and squares (encoded blocks that can be sampled over) is coupled: each block is also a square. For sampling efficiency it's important that block times be longer and squares larger.

However, there's no reason they need to be coupled! Decoupling block construction from square construction is what the community refers to as Fast Blocks Slow Squares (FBSS), i.e. smaller blocks that can be produced very quickly, without incurring the encoding and sampling overhead of smaller, faster squares. With FBSS, we can expect to see sub-second block times with single-slot finality in the not-too-distant future, with a lower sampling overhead for light nodes.

While not fully implemented in the Mammoth Mini testnet, FBSS is easy to simulate. We simulate FBSS by removing square construction (splitting up transactions and blobs into shares, laying out the shares into a square, erasure coding the rows and columns, then finally computing Merkle trees of the square) from the consensus path and replacing it with the more traditional simple Merkle tree of transactions.
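For readers unfamiliar with the replacement structure, the following Python sketch computes a simple Merkle root over a list of transactions. It follows the RFC 6962 construction (a 0x00 prefix for leaves, a 0x01 prefix for inner nodes, and a split at the largest power of two below the item count); whether the testnet branch uses exactly this variant is an assumption here.

```python
import hashlib

def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(items: list[bytes]) -> bytes:
    """RFC 6962-style Merkle tree hash over a list of byte strings."""
    n = len(items)
    if n == 0:
        return _sha256(b"")
    if n == 1:
        return _sha256(b"\x00" + items[0])  # leaf node
    # Split at the largest power of two strictly less than n.
    k = 1
    while k * 2 < n:
        k *= 2
    left = merkle_root(items[:k])
    right = merkle_root(items[k:])
    return _sha256(b"\x01" + left + right)  # inner node

txs = [b"tx-1", b"tx-2", b"tx-3"]
print(merkle_root(txs).hex())
```

Computing this root only requires hashing each transaction and combining the hashes, with none of the share layout or erasure coding that square construction involves.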

Simulating State Machine Optimizations

To date, the Celestia state machine hasn’t been close to the bottleneck in scaling data throughput, so optimizing it would have provided only minor benefits. As a result, Celestia’s state machine is highly unoptimized, leaving lots of low-hanging fruit, such as effectively executing the same transactions three times per block. The latest version of the Cosmos SDK allows executing them only once.

With the Mammoth Mini testnet, execution of the Celestia state machine actually became a significant portion of the time needed to fully verify a block. Various hacks were applied to simulate the reduction in execution cost that fixing these low-hanging issues would bring, such as removing redundant fee payment calculations and redundant hashing.
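The general idea of avoiding repeated execution can be illustrated with a small, purely hypothetical Python sketch. It does not reflect the Cosmos SDK API or the hacks used on the testnet: `ExecutionCache`, `toy_execute`, and the phase names are all invented for illustration, and the sketch assumes a transaction's result does not change between phases.

```python
import hashlib

class ExecutionCache:
    """Hypothetical cache: execute each transaction once, then reuse the result."""

    def __init__(self, execute):
        self._execute = execute
        self._results: dict[bytes, object] = {}
        self.executions = 0  # how many times the underlying function actually ran

    def run(self, tx: bytes):
        key = hashlib.sha256(tx).digest()
        if key not in self._results:
            self._results[key] = self._execute(tx)
            self.executions += 1
        return self._results[key]

def toy_execute(tx: bytes) -> int:
    # Stand-in for real state-machine execution.
    return len(tx)

cache = ExecutionCache(toy_execute)
txs = [b"pay-for-blob-1", b"pay-for-blob-2"]

# Three passes over the same transactions, as in the "three times per block" case.
for phase in ("phase-1", "phase-2", "phase-3"):
    results = [cache.run(tx) for tx in txs]

print(f"{cache.executions} executions for {3 * len(txs)} requests")
```

Without the cache the toy workload would run six executions; with it, only two. A real state machine would also have to ensure that cached results remain valid against the state they were computed on.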

What's Next?

Many of the improvements (e.g. FBSS) behind Mammoth Mini are already out of the research phase and well into the design phase. Others (such as Vacuum! and a reworked QUIC-based p2p stack) are in the prototype and MVP stage and are undergoing rapid iterative improvements. The core developer community hopes to propose these as improvements to Mainnet Beta in 2025 as they become ready to deploy.

If you're a protocol developer or researcher and want to get involved, please reach out to us on Twitter, post on the Celestia Forum, or submit a PR on GitHub!