Celestia: A Scalable General-Purpose Data Availability Layer for Decentralized Apps and Trust-minimized Sidechains

One of the highlights of this year’s Devcon 5 was optimistic rollups, a new scaling solution that uses the Ethereum base chain for data availability and enforcement of a two-way bridge of assets, with execution happening off-chain on layer two. In addition to allowing for increased throughput and decreased costs for general-purpose smart contracts, it is the first way to scale layer one sustainably, by avoiding state bloat.

Fuel Labs recently announced that their implementation of optimistic rollups is nearing completion, pending security audits. Their sidechain will enable an increase in transaction throughput on the order of 50-fold. However, even with this impressive increase, the sidechain is still bottlenecked by the data availability capacity of its base chain, Ethereum. If built on top of a chain specifically engineered from the ground up for data availability, systems like Fuel could reach tens or even hundreds of thousands of transactions per second.

To that end, we’re building the first ever scale-out data availability-focused blockchain: Celestia. At its center is the core mathematical primitive that makes sharding secure: data availability proofs using erasure codes. Using this primitive directly, rather than through sharding, gives the Celestia data availability layer the same block verification scalability as a sharded blockchain.

Our long-term vision is to help build a blockchain ecosystem with modular data availability layers and execution engines that can be integrated together. We believe that is the next generation of scalable blockchain architectures.

Although developers will be able to build raw applications directly on Celestia, in the future one may also be able to, for example, create Fuel sidechains that use Celestia as a data availability layer. Likewise, Cosmos zones and Tendermint chains may use Celestia as a data availability layer, which would let those zones become trust-minimized through fraud proofs. This could give the Cosmos ecosystem a more uniform level of security, with reduced reliance on social governance to deal with bad zones.

In addition to increased transaction throughput, another benefit of this architecture is reduced costs for applications that need a high amount of on-chain data. For example, consider a private voting contract where the public keys of thousands of participants need to be posted on-chain.

Left to right: Ismail Khoffi, John Adler, Mustafa Al-Bassam.

First up to bat on the Celestia team is Mustafa Al-Bassam, who previously co-founded Chainspace, a sharded smart contract platform that was acquired by Facebook. He has written a number of seminal papers whose contributions underpin the security of sharded blockchain systems, notably a formal scheme for fraud proofs and data availability proofs.

Also on the team is John Adler, a layer two scalability researcher at ConsenSys working on Phase 2 of Ethereum 2.0. He created the first specification for an optimistic rollup scheme, drawing inspiration from Mustafa’s earlier works on data availability.

Joining them is Ismail Khoffi, a senior research engineer who has many years of experience ranging from building academic research prototypes to bringing both blockchain and non-blockchain systems to production, including at Tendermint, Google UK, and EPFL.

The Celestia Design

The core idea of Celestia is to decouple transaction execution (and validity) from the consensus layer, so that consensus is only responsible for a) ordering transactions and b) guaranteeing their data availability. This is the bare minimum that the consensus layer of a blockchain needs to do in order to enable useful applications, such as a cryptocurrency. (In the case of proof-of-stake protocols, however, a minimal consensus-critical execution layer is still necessary in order to determine the validator set, although this may also be implemented as an optimistic rollup.)

Overview of Celestia block validity rules.

For example, one can imagine a version of Bitcoin where invalid transactions are allowed to be posted on-chain, but are simply discarded by clients when reading the blockchain to determine its state. In this model, the blockchain is simply used as an ordered messaging protocol, rather than a state machine replication protocol. Designs such as optimistic rollups and zk rollups take this idea further by using fraud or validity proofs to enable clients to ensure the validity of posted transactions or state transitions, without directly executing each transaction themselves.
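To make the "ordered messaging protocol" idea concrete, here is a minimal sketch of a client that replays an ordered log and discards invalid transfers on its own. The Transfer type, the account names, and the balance rule are all hypothetical and chosen only for illustration. They are not Celestia's or Bitcoin's actual transaction format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    # Hypothetical message format, used only for this illustration.
    sender: str
    recipient: str
    amount: int


def compute_balances(ordered_txs, genesis):
    """Replay an ordered log of messages, ignoring any transfer that is
    invalid against the state built so far."""
    balances = dict(genesis)
    for tx in ordered_txs:
        if tx.amount <= 0 or balances.get(tx.sender, 0) < tx.amount:
            # Invalid: the message stays on-chain, but the client skips it.
            continue
        balances[tx.sender] -= tx.amount
        balances[tx.recipient] = balances.get(tx.recipient, 0) + tx.amount
    return balances


# The chain only fixes the order of messages; validity is decided client-side.
log = [
    Transfer("alice", "bob", 5),
    Transfer("bob", "carol", 50),  # invalid: bob holds only 5 at this point
    Transfer("bob", "carol", 3),
]
balances = compute_balances(log, {"alice": 10})
```

Because every client applies the same deterministic rule to the same ordered data, all honest clients arrive at the same state even though the chain itself never checked validity.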

This is similar to reducing consensus to atomic broadcast, which was first shown to be possible in 1996. It is a departure from the state machine replication paradigm for consensus that has been popular in distributed systems research over the past several decades, and which Satoshi Nakamoto also followed in the Bitcoin whitepaper.

In Celestia, a block that has consensus is considered valid only if the data behind that block is available. This is to prevent block producers from releasing block headers without releasing the data behind them, which would prevent clients from reading the transactions necessary to compute the state of their applications.

Celestia reduces the problem of block verification to data availability verification, which we know how to do efficiently with sub-linear cost using data availability proofs. These proofs utilise a primitive called erasure codes, which are used in consumer technologies ranging from DVDs to QR codes to satellite communication.
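As a rough illustration of what an erasure code provides, the sketch below extends k data values to 2k shares so that any k of them are enough to rebuild the original data. It uses Reed-Solomon-style polynomial interpolation over a prime field. The modulus, the function names, and the one-dimensional layout are assumptions made for this example and should not be read as the encoding Celestia actually uses.

```python
P = 2**31 - 1  # prime modulus, an arbitrary choice for this sketch


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial (mod P) passing through points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data):
    """Extend k values to 2k shares; shares 0..k-1 are the data itself."""
    k = len(data)
    points = list(enumerate(data))
    return list(data) + [_interpolate_at(points, x) for x in range(k, 2 * k)]


def recover(known, k):
    """Rebuild the original k values from any k (index, value) pairs."""
    if len(known) < k:
        raise ValueError("need at least k shares to recover the data")
    points = known[:k]
    return [_interpolate_at(points, x) for x in range(k)]


data = [104, 105, 33, 7]
shares = encode(data)
# Half of the shares are lost; an arbitrary k = 4 of them survive.
surviving = [(i, shares[i]) for i in (1, 4, 6, 7)]
recovered = recover(surviving, k=len(data))
```

The property that matters for data availability is the last step: a block producer cannot hide a small part of the data, because any sufficiently large subset of shares reconstructs all of it.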

The proofs require each client to sample a very small number of random chunks from each block in the chain, and only work if there are enough clients in the network such that they are collectively sampling the entire blockchain. This is similar to peer-to-peer file-sharing networks such as BitTorrent, where different peers may have different pieces of a file.
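The reason a very small number of samples is enough can be sketched with a short calculation. Assume that, thanks to the erasure code, a block producer must withhold at least some minimum fraction of the shares to make a block unrecoverable. A client that draws random shares then misses every withheld share with probability at most (1 - fraction) raised to the number of samples. The 0.25 used below is an illustrative assumption, and the function names are hypothetical.

```python
import math


def detection_probability(samples, min_withheld_fraction):
    """Lower bound on the chance that at least one of `samples` uniformly
    random share requests hits a withheld share."""
    return 1 - (1 - min_withheld_fraction) ** samples


def samples_needed(target, min_withheld_fraction):
    """Smallest number of samples whose detection bound reaches `target`."""
    return math.ceil(math.log(1 - target) / math.log(1 - min_withheld_fraction))


# Assumed for illustration: at least 25% of shares must be withheld
# before a block becomes unrecoverable.
n = samples_needed(0.99, 0.25)
p = detection_probability(n, 0.25)
```

Note that the number of samples depends only on the withheld fraction and the desired confidence, not on the size of the block, which is what makes the cost for each client sub-linear.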

This leads to an interesting consequence: the more clients you have in the network, the greater the block size (and thus throughput) you can have securely. Note that unlike existing scale-out designs such as sharding, in Celestia the data throughput of the main chain increases with non-consensus nodes. This is a unique and exciting property, as it means nodes that are not producing blocks can contribute to the throughput and security of the network.
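A back-of-the-envelope estimate shows why the supported block size grows with the number of clients. If each client requests a fixed number of uniformly random shares, the expected fraction of distinct shares covered by the whole network depends on the total number of requests relative to the number of shares in the block. The sketch below is an expected-value estimate under that simplified sampling model, not a security bound, and all names and parameters are hypothetical.

```python
import math


def expected_coverage(total_shares, clients, samples_per_client):
    """Expected fraction of distinct shares requested by at least one client,
    assuming independent uniform sampling."""
    draws = clients * samples_per_client
    return 1 - (1 - 1 / total_shares) ** draws


def clients_for_coverage(total_shares, samples_per_client, coverage):
    """Clients needed so that the expected coverage reaches `coverage`."""
    draws = math.log(1 - coverage) / math.log(1 - 1 / total_shares)
    return math.ceil(draws / samples_per_client)


# Quadrupling the number of shares in a block roughly quadruples the
# number of clients needed for the same expected coverage.
small = clients_for_coverage(4096, samples_per_client=16, coverage=0.75)
large = clients_for_coverage(16384, samples_per_client=16, coverage=0.75)
```

Read in the other direction, the same relationship says that a network with more sampling clients can collectively cover a proportionally larger block with the same work for each client.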

Current layer one scalability designs such as sharding primarily focus on scaling block production, rather than block verification. The former is useless without the latter, and we believe the latter is much more important as incentivised block producers typically have significantly more resources than ordinary nodes that simply want to verify the chain.

More Information

For more reading about Celestia, see the following resources:

We’re incredibly excited about this vision. If you are too, you can keep up-to-date with developments about Celestia by subscribing to the newsletter on our website, joining our Telegram group, or following our Twitter feed.

Thanks to Zaki Manian for feedback on this post.