
Celestia MVP release: data availability sampling light clients

Ismail Khoffi

Jun 15, 2021

Today we’re thrilled to announce that we’ve shipped our minimum viable product (MVP)—a data availability sampling light client. This marks the first major milestone in our roadmap to build a universal consensus network allowing anyone to deploy their own custom blockchain in minutes.

TL;DR

Data availability sampling light clients play an important role in the security, scalability and interoperability of the Celestia ecosystem.
Celestia light clients enjoy nearly the same security as full nodes and do not rely on an honest consensus majority for state validity.
Our implementation integrates 2D Reed-Solomon erasure coding with Tendermint consensus and components of IPFS to achieve data availability sampling.
Instructions for how to play around with our light client can be found at the end of the post.

What is a data availability sampling light client?

Unlike most other blockchains, Celestia is designed strictly to provide consensus and data availability, not transaction execution. Likewise, Celestia light clients do not verify transactions; they only check that each block has consensus and that the block data is available to the network. This means they do not rely on an honest consensus majority for state validity, a property typically enjoyed only by full nodes.
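To make the division of labor concrete, here is a minimal Python sketch of the acceptance rule described above. It is illustrative only: the function names are hypothetical, signature verification is omitted, and the availability result is assumed to come from a separate sampling step. The "strictly more than two thirds of the voting power" threshold is Tendermint's standard commit rule.

```python
from typing import Dict, Set


def has_consensus(voting_power: Dict[str, int], signers: Set[str]) -> bool:
    """True if validators holding strictly more than 2/3 of the total
    voting power signed the block (signature checks omitted here)."""
    total = sum(voting_power.values())
    signed = sum(power for val, power in voting_power.items() if val in signers)
    # Integer arithmetic avoids floating-point rounding at the 2/3 boundary.
    return 3 * signed > 2 * total


def accept_block(voting_power: Dict[str, int], signers: Set[str],
                 data_available: bool) -> bool:
    """Hypothetical light client rule: no transactions are executed and no
    state is checked; only consensus and data availability matter."""
    return has_consensus(voting_power, signers) and data_available


powers = {"val1": 10, "val2": 10, "val3": 10}
print(accept_block(powers, {"val1", "val2", "val3"}, data_available=True))  # True
print(accept_block(powers, {"val1", "val2"}, data_available=True))          # False: exactly 2/3
```

Nothing in this rule looks at what the transactions do, which is why a dishonest validator majority cannot trick such a client into accepting an invalid state transition; it can at most get a block with consensus, and the availability check covers the rest.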

Because the block data is erasure coded, a small random sample of it is enough for light clients to verify with high probability that the rest of the block has been published. If any full node detects something suspicious, it can notify light clients with a data availability fraud proof.

The idea of increasing light client security dates back to the original Bitcoin whitepaper. In it, Satoshi mentions that light clients could be made more secure if full nodes sent them “alerts” when an invalid block was published. After receiving an alert, light clients would download the full block to verify the inconsistency for themselves.

Despite its early origins, this idea remained largely unexplored until around 2018, when new research proposed both a theoretical solution for maximizing light client security (even in the face of dishonest majorities) and the first practical solution to the data availability problem.

Our data availability sampling light client is the first implementation of the above research baked into the battle-tested Tendermint consensus engine.

But wait, there’s more

Beyond their security advantages, Celestia light clients play a fundamental role in the security and scalability of the network as a whole. Celestia light clients rely on security in numbers: there must be a minimum number of light clients so that their individual samples, taken together, are enough to recover the original block data.

In turn, as the number of light clients increases, the size of each block can also increase without compromising the security or decentralization of the network. Larger blocks mean more data throughput and therefore more scaling.
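A back-of-the-envelope Python sketch can illustrate both points. It rests on assumptions of ours, not of the MVP: each light client independently samples s distinct shares uniformly at random from the 2k × 2k extended square, and the estimate uses the expected number of distinct shares covered rather than a high-probability bound. It also relies on a property of the 2D Reed-Solomon scheme from the 2018 data availability research: an adversary must withhold at least (k+1)² shares to prevent recovery, so a square missing fewer shares than that can still be decoded. The function names are hypothetical.

```python
def expected_coverage(num_shares: int, samples_per_client: int, num_clients: int) -> float:
    """Expected number of distinct shares sampled by at least one client,
    assuming each client draws `samples_per_client` distinct shares
    uniformly at random, independently of the other clients."""
    # A given share is missed by one client with probability 1 - s/N.
    missed_by_all = (1 - samples_per_client / num_shares) ** num_clients
    return num_shares * (1 - missed_by_all)


def min_light_clients(k: int, samples_per_client: int) -> int:
    """Smallest number of clients whose *expected* combined samples leave
    fewer than (k+1)^2 shares of the 2k x 2k square unsampled."""
    num_shares = (2 * k) ** 2
    if not 0 < samples_per_client <= num_shares:
        raise ValueError("samples_per_client must be in 1..(2k)^2")
    needed = num_shares - (k + 1) ** 2 + 1
    clients = 1
    while expected_coverage(num_shares, samples_per_client, clients) < needed:
        clients += 1
    return clients


for k in (16, 32, 64):
    print(k, min_light_clients(k, samples_per_client=15))
```

Under this model the required number of clients grows roughly in proportion to the number of shares, so doubling k calls for roughly four times as many clients. Read the other way around, a larger population of light clients can safely support larger blocks.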

Data availability sampling light clients are a key component for all rollup-based sidechains built on top of Celestia, because rollups rely on data availability for their security. Optimistic rollups require data availability so that fraud can be detected, and zero-knowledge rollups require data availability so that users can know the state of the chain.

Last but not least, light clients are key components for blockchain interoperability standards like IBC. The improved security of Celestia light clients means that chains built on Celestia have much stronger security guarantees for interoperability.

Key technical features

This MVP combines several core components that our engineering team has been working on for the past few months.

We implemented a Namespaced Merkle tree (NMT) library. This is a binary Merkle tree whose leaves are ordered by namespace, which lets any rollup on Celestia download only the data relevant to its chain and ignore the data for other rollups. Each node in the tree is tagged with the minimum and maximum namespace of its children. We replaced Tendermint’s regular Merkle tree with our NMT. You could think of it as a multi-tenant Tendermint where each application only needs to care about its own portion of the tree.
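The tagging rule can be sketched in a few lines of Python. This is a simplified illustration, not the NMT library's actual hashing scheme: the 0x00/0x01 domain-separation prefixes, SHA-256, the tree shape (split at the largest power of two, as in RFC 6962), the 8-byte namespaces in the usage example, and all helper names are assumptions chosen for clarity, and the special handling of parity shares is omitted.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    min_ns: bytes   # smallest namespace in this subtree
    max_ns: bytes   # largest namespace in this subtree
    digest: bytes


def hash_leaf(ns: bytes, data: bytes) -> Node:
    # A leaf covers exactly one namespace, and the namespace is hashed in.
    return Node(ns, ns, hashlib.sha256(b"\x00" + ns + data).digest())


def hash_inner(left: Node, right: Node) -> Node:
    # The parent is tagged with the namespace range spanned by its children,
    # and both children's ranges are committed to in its digest.
    payload = (b"\x01" + left.min_ns + left.max_ns + left.digest
               + right.min_ns + right.max_ns + right.digest)
    return Node(min(left.min_ns, right.min_ns), max(left.max_ns, right.max_ns),
                hashlib.sha256(payload).digest())


def build(nodes: List[Node]) -> Node:
    if len(nodes) == 1:
        return nodes[0]
    split = 1
    while split * 2 < len(nodes):   # largest power of two below len(nodes)
        split *= 2
    return hash_inner(build(nodes[:split]), build(nodes[split:]))


def nmt_root(leaves: List[Tuple[bytes, bytes]]) -> Node:
    """Root of a namespaced Merkle tree over (namespace, data) leaves."""
    if not leaves:
        raise ValueError("need at least one leaf")
    namespaces = [ns for ns, _ in leaves]
    if namespaces != sorted(namespaces):
        raise ValueError("leaves must be ordered by namespace")
    return build([hash_leaf(ns, data) for ns, data in leaves])


def may_contain(node: Node, ns: bytes) -> bool:
    # A rollup can skip any subtree whose range excludes its namespace.
    return node.min_ns <= ns <= node.max_ns


ns_a, ns_b = bytes(7) + b"\x01", bytes(7) + b"\x02"
root = nmt_root([(ns_a, b"tx1"), (ns_a, b"tx2"), (ns_b, b"tx3")])
print(root.min_ns == ns_a, root.max_ns == ns_b)  # True True
```

Because every inner node carries a namespace range, a rollup looking for its own namespace can prune whole subtrees, and since the ranges are hashed into the parents, a block producer cannot misreport them without changing the root.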

We’ve also implemented a special encoding scheme called 2-Dimensional Reed-Solomon Merkle Tree (rsmt2d). We use this scheme to arrange the block data into a k × k square, which is then erasure coded into a larger 2k × 2k square that includes parity data. We then integrated this encoding mechanism with the NMT to compute row and column Merkle roots from the erasure-coded block, which we refer to as the extended block or extended square. We modified the Tendermint block header to commit to these row and column roots.
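The following Python sketch shows the shape of the construction: a k × k square is extended row-wise and then column-wise into a 2k × 2k square, after which every row and column gets a Merkle root. Several simplifications are assumptions of this sketch, not properties of rsmt2d: the Reed-Solomon code is implemented by Lagrange interpolation over the prime field GF(65537) (production Reed-Solomon libraries typically use binary fields), shares are small integers rather than byte strings, and a plain binary Merkle tree stands in for the NMT. The final `data_root`, hashing all row and column roots together, is a hypothetical stand-in for the header commitment.

```python
import hashlib
from typing import List

P = 65537  # prime modulus; an assumption of this sketch, not rsmt2d's field


def rs_extend(values: List[int]) -> List[int]:
    """Treat `values` as evaluations at x = 0..k-1 of the unique polynomial
    of degree < k, and append its evaluations at x = k..2k-1."""
    k = len(values)
    out = list(values)
    for x in range(k, 2 * k):
        acc = 0
        for i, y in enumerate(values):
            num = den = 1
            for j in range(k):
                if j != i:
                    num = num * (x - j) % P
                    den = den * (i - j) % P
            acc = (acc + y * num * pow(den, -1, P)) % P   # Lagrange term
        out.append(acc)
    return out


def extend_square(square: List[List[int]]) -> List[List[int]]:
    k = len(square)
    rows = [rs_extend(row) for row in square]                 # k x 2k
    cols = [rs_extend([rows[r][c] for r in range(k)])          # 2k columns,
            for c in range(2 * k)]                             # each 2k long
    return [[cols[c][r] for c in range(2 * k)] for r in range(2 * k)]


def merkle_root(leaves: List[bytes]) -> bytes:
    if len(leaves) == 1:
        return hashlib.sha256(b"\x00" + leaves[0]).digest()
    split = 1
    while split * 2 < len(leaves):
        split *= 2
    return hashlib.sha256(b"\x01" + merkle_root(leaves[:split])
                          + merkle_root(leaves[split:])).digest()


def encode(value: int) -> bytes:
    return value.to_bytes(3, "big")   # every value < 2**17 fits in 3 bytes


def commit(extended: List[List[int]]):
    row_roots = [merkle_root([encode(v) for v in row]) for row in extended]
    col_roots = [merkle_root([encode(v) for v in col]) for col in zip(*extended)]
    data_root = merkle_root(row_roots + col_roots)   # hypothetical header field
    return row_roots, col_roots, data_root


ext = extend_square([[1, 2], [3, 4]])
print(ext)  # [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8], [7, 8, 9, 10]]
```

Because the extension is linear, extending rows first or columns first yields the same parity quadrant, so every row and every column of the extended square is itself a valid codeword. That is what lets a single row or column root serve as a checkpoint for sampling.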

The goal of these changes is to make it statistically very unlikely that block producers can hide or withhold data without being noticed. Either enough of the block data is available to reconstruct the whole block, or data availability sampling fails with high probability.
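A quick calculation shows why withholding is unlikely to go unnoticed. The sketch below assumes a result from the 2018 research mentioned above: with the 2D Reed-Solomon scheme, a block producer must withhold at least (k+1)² of the (2k)² extended shares to make the block unrecoverable. Treating each sample as an independent uniform draw (sampling with replacement, a simplifying assumption; sampling without replacement only makes detection likelier), a light client that takes s samples misses every withheld share with probability at most (1 − (k+1)²/(2k)²)^s, which is always below (3/4)^s. The function names are hypothetical.

```python
def undetected_probability(k: int, samples: int) -> float:
    """Upper bound on the chance that `samples` uniform random queries all
    return available shares although the block cannot be recovered."""
    total = (2 * k) ** 2
    min_withheld = (k + 1) ** 2   # smallest withholding that blocks recovery
    return (1 - min_withheld / total) ** samples


def samples_needed(k: int, target: float) -> int:
    """Fewest samples that push the bound down to `target` or below."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    samples = 0
    while undetected_probability(k, samples) > target:
        samples += 1
    return samples


for k in (32, 64, 128):
    print(k, samples_needed(k, target=1e-9))
```

The number of samples barely depends on k, which is why a light client's work stays small even as blocks grow.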

To make sampling possible over a peer-to-peer network, we made block producers commit to the data in a way that is easy to sample from. Specifically, we wrote an IPLD plugin and modified IPFS with the goal of creating an optimized network from which light clients can sample data. IPFS, and IPLD in particular, seemed like a natural fit for this because all of the data is content-addressable via (namespaced) Merkle trees.
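Why content addressing suits sampling can be seen in a toy model. In the Python sketch below, a dictionary stands in for the IPFS network; it is not the IPLD plugin's API, and the node layout (a 0x00 prefix for leaves, 0x01 plus two child hashes for inner nodes) is an assumption. Every tree node is stored under the SHA-256 hash of its bytes, so a light client that knows only a row root can walk down to any leaf, checking each fetched node against the hash it requested. The sketch assumes a power-of-two number of leaves.

```python
import hashlib
from typing import Dict, List

Store = Dict[bytes, bytes]


def put(store: Store, block: bytes) -> bytes:
    key = hashlib.sha256(block).digest()   # the content address
    store[key] = block
    return key


def build_dag(store: Store, leaves: List[bytes]) -> bytes:
    """Store leaves and inner nodes under their hashes; return the root key."""
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two number of leaves")
    level = [put(store, b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        level = [put(store, b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def fetch(store: Store, key: bytes) -> bytes:
    block = store.get(key)
    if block is None:
        raise KeyError("block not available")
    if hashlib.sha256(block).digest() != key:
        raise ValueError("content does not match its address")
    return block


def get_leaf(store: Store, root: bytes, index: int, depth: int) -> bytes:
    """Walk from `root` to leaf `index` of a tree with 2**depth leaves."""
    key = root
    for level in reversed(range(depth)):
        node = fetch(store, key)
        if node[:1] != b"\x01":
            raise ValueError("expected an inner node")
        # The index bit for this level selects the left or right child hash.
        key = node[33:65] if (index >> level) & 1 else node[1:33]
    leaf = fetch(store, key)
    if leaf[:1] != b"\x00":
        raise ValueError("expected a leaf")
    return leaf[1:]


store: Store = {}
root = build_dag(store, [b"share0", b"share1", b"share2", b"share3"])
print(get_leaf(store, root, 2, depth=2))  # b'share2'
```

A peer that serves altered bytes is caught immediately because the hash no longer matches the address, and a share that nobody serves shows up as a failed lookup, which is exactly the signal a sampling light client needs.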

We combined all of this into a library that light clients and other node types can use to validate whether a block is available. For an overview of how this compares to the vanilla Tendermint light client without diving into the code, have a look at this ADR.
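At its core, an availability check of this kind reduces to a simple rule, sketched here in Python under our own assumptions (the function names, the plain binary Merkle proofs, checking against row roots only, and the power-of-two row width are illustrative, not the library's API): pick random coordinates in the extended square, request each share together with a Merkle proof against its row root from the header, and declare the block available only if every sample arrives and verifies.

```python
import hashlib
import random
from typing import Callable, List, Optional, Tuple

Proof = List[bytes]   # sibling hashes from the leaf up to the root
Fetch = Callable[[int, int], Optional[Tuple[bytes, Proof]]]


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def inner_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """All levels of a binary Merkle tree (leaf count a power of two)."""
    levels = [[leaf_hash(x) for x in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([inner_hash(prev[i], prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return levels


def prove(levels: List[List[bytes]], index: int) -> Proof:
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])   # sibling at this level
        index >>= 1
    return proof


def verify_proof(root: bytes, leaf: bytes, index: int, proof: Proof) -> bool:
    node = leaf_hash(leaf)
    for sibling in proof:
        node = inner_hash(sibling, node) if index & 1 else inner_hash(node, sibling)
        index >>= 1
    return node == root


def is_available(row_roots: List[bytes], fetch: Fetch,
                 num_samples: int, rng: random.Random) -> bool:
    width = len(row_roots)
    coords = rng.sample([(r, c) for r in range(width) for c in range(width)],
                        num_samples)
    for r, c in coords:
        response = fetch(r, c)   # None models a share that nobody serves
        if response is None:
            return False
        share, proof = response
        if not verify_proof(row_roots[r], share, c, proof):
            return False
    return True


square = [[f"share {r},{c}".encode() for c in range(4)] for r in range(4)]
levels = [build_levels(row) for row in square]
row_roots = [lv[-1][0] for lv in levels]


def honest(r: int, c: int) -> Tuple[bytes, Proof]:
    return square[r][c], prove(levels[r], c)


print(is_available(row_roots, honest, num_samples=8, rng=random.Random(7)))  # True
```

A single missing or forged sample is enough to reject the block, which is what turns the probability bound on undetected withholding into a practical decision rule.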

Taking it for a spin

Now let’s dive into some code. Follow this demo to try out our light client for yourself.

What’s next?

While the MVP is an important milestone, there’s still a lot of work in progress to make content discovery on the IPFS peer-to-peer layer performant in a large network; see this GitHub issue for more details. The demo above is based on directly connected nodes.

Our next immediate milestone is to fully implement other node types; check out this GitHub issue for more details. After that, we will wrap up our next larger milestone: Devnet. Devnet will enable anyone to run a Celestia full node locally and to generate and verify data availability fraud proofs.

If you’re interested in following Celestia’s development or getting involved in our community, please join our Telegram channel, drop by our Discord server, star our GitHub repositories, and follow our Twitter feed.