Clusters: how trusted & trust-minimized bridges shape the multi-chain landscape
We live in a multi-chain world. The idea that all decentralized apps will use the same shared smart contract blockchain is dead in practice. Ethereum is pivoting to a roadmap where apps have their own rollup chains, and multi-chain ecosystems such as Cosmos and Polkadot are rising in popularity, as are alternative layer 1 chains such as Solana and Polygon. Decentralized apps today are scattered across different blockchains.
However, the chains in which apps reside need a way to exchange data with each other (cross-chain interoperability), so that the apps within those chains can read and write each other's state (composability). Collectively, this is generally known as cross-chain communication.
In this post, we lay out some theory and practice of how we at Celestia think about the overall cross-chain communication design space, based on the idea of clusters of chains. In a nutshell, we envision a model of the blockchain ecosystem where chains that share the same cluster can compose with each other in a trust-minimized way (intra-cluster communication), while chains in different clusters compose with each other under less secure, trust-based assumptions (inter-cluster communication).
Cross-chain communication requires security trade-offs
As shown by Zamyatin et al., secure communication across blockchains without a trusted third party or a synchrony assumption is impossible. Secure in this context means atomic. For example, if Alice moves 5 coins from chain A to chain B, then this involves two transactions: (1) reducing Alice's balance by 5 coins on chain A, and (2) increasing it by 5 coins on chain B. For cross-chain communication to be atomic, either both or neither of those transactions must happen. If only one of them happens (e.g. Alice's balance is reduced on chain A, but not increased on chain B), the transfer is not atomic.
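The Alice example can be sketched in a few lines of Python. This is only a toy model: the `Chain` class and the `is_atomic` helper are hypothetical names introduced for illustration, and real chains do not expose balances this way.

```python
from dataclasses import dataclass, field


@dataclass
class Chain:
    """Toy ledger mapping an account name to a balance (illustrative only)."""
    balances: dict = field(default_factory=dict)

    def debit(self, account: str, amount: int) -> None:
        if self.balances.get(account, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[account] -= amount

    def credit(self, account: str, amount: int) -> None:
        self.balances[account] = self.balances.get(account, 0) + amount


def is_atomic(debited: bool, credited: bool) -> bool:
    """A transfer outcome is atomic if both legs happened or neither did."""
    return debited == credited


chain_a = Chain({"alice": 10})
chain_b = Chain({"alice": 0})

# Leg 1: Alice's balance is reduced on chain A.
chain_a.debit("alice", 5)
# If leg 2 never reaches chain B, the outcome is not atomic.
outcome_stuck = is_atomic(debited=True, credited=False)

# Leg 2: once the credit lands on chain B, both legs have happened.
chain_b.credit("alice", 5)
outcome_done = is_atomic(debited=True, credited=True)
```

The gap between the two legs is where the trusted third party or the synchrony assumption comes in: nothing in chain A's own rules can force the credit on chain B to happen.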
This means composability across multiple chains will always have security tradeoffs, compared to simply using a single chain, where the atomicity of cross-contract calls is guaranteed without a trusted third party or synchrony by the block validity rules. This is because users can run fully validating nodes and reject invalid blocks in their local view.
We can consider that (informally) there are two key components to enable atomic cross-chain communication:
- Relay liveness: if a transaction happens that changes the state of chain A in a way that affects the state of chain B, then some transaction needs to be eventually submitted on chain B (whether by the user, a relayer, or some other party) to complete the transaction. For example, if Alice locks some coins on chain A to move them to chain B, there should eventually be a corresponding transaction submitted to increase Alice's balance on chain B.
- State verification: when chain A and chain B take actions based on the states of each other, they need to be sure that the information on the state they have received actually corresponds to the agreed and valid state of the chain according to the chain's transaction validity rules. Note that liveness requires state verification. For a more in-depth look at cross-chain state verification, see Zamyatin et al.
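As a rough sketch of how the two components fit together, the toy model below has chain B credit a user only when (a) someone actually relays the lock event (relay liveness) and (b) the relayed events match a commitment that chain B has already accepted as chain A's valid state (state verification). Everything here is an assumption made for illustration: the `commit` function, the `DestinationChain` class, and the `"account:amount"` event encoding are hypothetical, and a real bridge would verify a Merkle proof against a block header rather than rehash the full event list.

```python
import hashlib


def commit(events: list[str]) -> str:
    """Hypothetical state commitment: a hash over chain A's ordered lock events."""
    return hashlib.sha256("\n".join(events).encode()).hexdigest()


class DestinationChain:
    """Toy model of chain B (illustrative only)."""

    def __init__(self) -> None:
        self.balances: dict[str, int] = {}
        self.accepted: set[str] = set()
        self.processed: set[tuple[str, int]] = set()

    def accept_commitment(self, commitment: str) -> None:
        # How chain B comes to accept a commitment (fraud proof, validity
        # proof, or a committee signature) is exactly the trust assumption.
        self.accepted.add(commitment)

    def relay(self, events: list[str], index: int) -> bool:
        """Credit the event at `index` only if it is part of accepted state."""
        commitment = commit(events)
        if commitment not in self.accepted:
            return False  # state verification failed
        if (commitment, index) in self.processed:
            return False  # already credited, do not mint twice
        account, amount = events[index].split(":")
        self.balances[account] = self.balances.get(account, 0) + int(amount)
        self.processed.add((commitment, index))
        return True


events_on_a = ["alice:5"]  # Alice locked 5 coins on chain A
chain_b = DestinationChain()
chain_b.accept_commitment(commit(events_on_a))

forged = chain_b.relay(["alice:500"], 0)   # not part of accepted state
relayed = chain_b.relay(events_on_a, 0)    # a relayer completes the transfer
replayed = chain_b.relay(events_on_a, 0)   # the same event cannot be reused
```

Until some party calls `relay`, Alice's coins stay locked on chain A with no matching credit on chain B, which is the relay liveness problem in miniature.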
The definition of "trusted third party" is broad. A blockchain itself is a trusted third party; it simply distributes the trust amongst a majority of participants in a consensus protocol (honest majority assumption). A DAO can also be a trusted third party. In the case of a bridge between a standard sidechain (or layer 1 chain) and a parent chain, for example, the consensus of the sidechain could lock or steal funds deposited into the sidechain via the parent chain. This is because, to do state verification, the parent chain does not validate the transactions in the sidechain itself. Instead, it trusts that the operators of the sidechain (i.e. the consensus) only move deposited funds on the parent chain in accordance with the transaction validity rules of the sidechain. Such a bridge is a trusted bridge because it relies on an honest majority assumption to prevent the bridge operators from stealing funds. Examples include the Ethereum-Polygon and Ethereum-Solana bridges.
Rollups, however, do not require honest majority assumptions for state verification in order to guarantee the atomicity of deposits or withdrawals, because the main chain indirectly checks the transaction validity of the rollup using techniques such as ZK proofs or fraud proofs. Withdrawals from rollups via a parent chain do, however, require a trusted third party in the form of a very weak honest minority assumption for liveness: at least one relayer or aggregator must post the rollup blocks on the parent chain. In the case of rollups where any party can be a relayer or aggregator, and thus the users themselves can fulfil this role, this instead requires a synchrony assumption (i.e. there is a synchronous network so that when the user sends a message, it will be received by the network within a certain timeframe).[1]
Clusters
We can therefore categorize cross-chain communications into two main categories:
- Trust-minimized cross-chain communication, which relies on either an honest minority or synchrony assumption for liveness and state verification. Many protocols have a "hybrid" model where by default users rely on an honest minority assumption, but can switch to a synchrony assumption if the honest minority assumption fails.
- Trusted cross-chain communication, which relies on an honest majority assumption for liveness and state verification.
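The two categories can be written down as a small decision rule. This is a sketch under one stated assumption: a bridge is treated as trusted as soon as either component (liveness or state verification) needs an honest majority. The `Assumption` enum and the `classify_bridge` function are hypothetical names introduced for illustration.

```python
from enum import Enum


class Assumption(Enum):
    HONEST_MINORITY = "honest minority"
    SYNCHRONY = "synchrony"
    HONEST_MAJORITY = "honest majority"


def classify_bridge(liveness: Assumption, state_verification: Assumption) -> str:
    """Assumed rule: any honest majority requirement makes the bridge trusted."""
    if Assumption.HONEST_MAJORITY in (liveness, state_verification):
        return "trusted"
    return "trust-minimized"


# A rollup-style bridge: one honest relayer for liveness, proofs for state.
rollup_bridge = classify_bridge(Assumption.HONEST_MINORITY, Assumption.HONEST_MINORITY)

# A committee-signed bridge: an honest majority is needed for both components.
committee_bridge = classify_bridge(Assumption.HONEST_MAJORITY, Assumption.HONEST_MAJORITY)

# Synchrony for liveness, but an honest majority for state verification.
mixed_bridge = classify_bridge(Assumption.SYNCHRONY, Assumption.HONEST_MAJORITY)
```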
It should be noted that bridges such as Nxtp that require a client-side synchrony assumption are considered to require an honest majority assumption for state verification. This is because even if the chains on either side of the bridge are live, clients cannot act upon the state of the chains if the chains do not have data availability. Verifying data availability is a part of state verification.[3]
We can define a cluster as a set of chains that communicate with each other (intra-cluster communication) using trust-minimized cross-chain communication, including trust-minimized state verification such as fraud proofs, validity proofs, or directly validating transactions. A cluster could for example be a set of rollups connected to a parent chain (as is the case with Ethereum rollups), or a standalone layer 1 chain such as Polygon or Solana.
A key property of a cluster is that each chain in the cluster can validate the state machine of each other chain in the cluster. For example, all Ethereum rollups are EVM-compatible, so it is possible to validate the fraud or ZK proofs of rollups within the EVM. However, it is not practically possible to validate the Solana state machine within the EVM, so Solana cannot share a cluster with Ethereum.
Clusters can also communicate with other clusters (inter-cluster communication) using trusted cross-chain communication, with state verification techniques that are not trust-minimized, such as relying on a committee of 2/3 of validators to sign off on blocks. The Ethereum-Polygon bridge is an example of this.
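One way to picture this model is as a graph: chains are nodes, bridges are edges, and clusters are the groups of chains connected by trust-minimized edges alone. The sketch below assumes that trust-minimized connectivity is transitive, which the definition above does not state explicitly. The `find_clusters` function and the rollup names are hypothetical.

```python
from collections import defaultdict


def find_clusters(chains, bridges):
    """Group chains into connected components over trust-minimized bridges only."""
    adjacency = defaultdict(set)
    for a, b, kind in bridges:
        if kind == "trust-minimized":
            adjacency[a].add(b)
            adjacency[b].add(a)

    clusters, seen = [], set()
    for chain in chains:
        if chain in seen:
            continue
        # Depth-first walk over trust-minimized edges starting from this chain.
        component, stack = set(), [chain]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        clusters.append(component)
    return clusters


chains = ["Ethereum", "Rollup A", "Rollup B", "Polygon", "Solana"]
bridges = [
    ("Ethereum", "Rollup A", "trust-minimized"),
    ("Ethereum", "Rollup B", "trust-minimized"),
    ("Ethereum", "Polygon", "trusted"),   # inter-cluster bridge
    ("Ethereum", "Solana", "trusted"),    # inter-cluster bridge
]
clusters = find_clusters(chains, bridges)
```

Here the trusted bridges are ignored when forming clusters, so Polygon and Solana each end up as a cluster of one, while Ethereum and its rollups form a single cluster.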
It should be noted that clusters are sovereign, meaning that a chain in cluster A cannot bring a chain in cluster B inside the circle of cluster A without hardforking cluster A or B. For example, it's not possible to create an Ethereum rollup with a trust-minimized bridge to Polygon without changing Polygon to be implemented as a rollup (i.e. making it fraud or ZK provable) to bring it inside of the Ethereum cluster. (Similarly, one country cannot impose its law on another country without an international agreement, unless it invades that country or there's a revolution.[4])
The trade-off between secure composability and scalability
We established above that communication across multiple chains always requires a security trade-off, compared to communication across smart contracts on a single chain.
However, why make that trade-off at all? Why not simply host all transactions on the same chain, and have easy and secure composability for all? Unfortunately, there are theoretical limits to how scalable a single chain can be, even with today's best known scalability techniques. Expanding to multiple chains is a necessity.
Similarly, there are also limits to how big a single cluster of chains can be, including a set of rollups on a main chain. Even with rollups, it's currently predicted that Ethereum 2.0 will process about 100,000 transactions per second.
Two important core limits that restrict the size of a cluster are:
- The requirement that all chains within a cluster understand the execution environment of each other. For example, if you have a set of EVM-based optimistic rollups that communicate with each other, they need to be able to understand the EVM in order to understand each other's fraud proofs. Likewise, ZK rollups need to understand each other's ZK proving systems. If you wanted to create a rollup using a new execution environment, you would have to create your own cluster or hard fork an existing cluster.
- The data availability capacity of the cluster. In order to maintain trust-minimized state verification between all chains within a cluster, each chain must verify the data availability of the blocks of every other chain within that cluster in a trust-minimized way, either by downloading the data directly or by using techniques such as data availability proofs. Even with theoretically optimal data availability proofs, there is a limit to how big a single cluster can be, due to the resource limits of block producers (i.e. the target resource requirements to run a validator).[2]
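The two limits can be read as an admission check for a chain that wants to join a cluster. The sketch below is purely illustrative: `ChainSpec`, `can_join`, the byte figures, and the idea of a single fixed capacity number are all assumptions, not measurements or parameters of any real network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainSpec:
    name: str
    execution_env: str       # e.g. "EVM"
    block_data_per_sec: int  # bytes/s of block data published (made-up figure)


def can_join(cluster, candidate, understood_envs, da_capacity_per_sec):
    """Return (ok, reason) for a chain that wants to join an existing cluster."""
    # Limit 1: every member must understand the candidate's execution environment.
    if candidate.execution_env not in understood_envs:
        return False, "execution environment not understood by the cluster"
    # Limit 2: the cluster's total block data must fit its data availability capacity.
    used = sum(chain.block_data_per_sec for chain in cluster)
    if used + candidate.block_data_per_sec > da_capacity_per_sec:
        return False, "data availability capacity exceeded"
    return True, "ok"


cluster = [ChainSpec("rollup-1", "EVM", 400), ChainSpec("rollup-2", "EVM", 400)]

fits = can_join(cluster, ChainSpec("rollup-3", "EVM", 100), {"EVM"}, 1000)
too_big = can_join(cluster, ChainSpec("rollup-4", "EVM", 300), {"EVM"}, 1000)
new_env = can_join(cluster, ChainSpec("new-vm-chain", "NewVM", 100), {"EVM"}, 1000)
```

A chain rejected by either check has the two options described above: start its own cluster, or hard fork the existing one so that the limit moves.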
In practice, we can observe that the clustering model is already how the blockchain ecosystem operates in the real world: a set of layer 1 chains and rollups with intra-cluster and inter-cluster bridges between them (see diagram above). However, inter-cluster bridges come with serious security trade-offs: you have to trust that a set of validators won't steal your funds. Therefore, the blockchain community should ensure that intra-cluster scalability (e.g. using rollups) is maximized, so that the limits of each cluster are reached, before spinning up new clusters that rely on less secure inter-cluster communication.
Clusters in Celestia
Celestia provides a pluggable consensus and data availability layer for blockchains, including rollups. It's a blockchain where consensus and execution are decoupled: unlike Ethereum, it doesn't provide an on-chain smart contract environment, only consensus and data availability. The Celestia ecosystem is not a cluster itself, as it doesn't enforce any specific cross-chain communication mechanism between Celestia-based chains, but it provides the core ingredient for building clusters.
As mentioned in the previous section, intra-cluster communication requires trust-minimized state verification, which requires checking the data availability of all chains within a cluster. This is because:
- in the case of an optimistic rollup, clients need to check that the rollup blocks have been published in order to be sure that full nodes have the data to generate state transition fraud proofs;
- in the case of a ZK rollup, clients need to check that the rollup blocks have been published in order to be sure that nodes can know the state of the chain (e.g. account balances); and
- in other cases where chains fully validate the transactions of each other directly, you obviously need to know what the transactions are in order to validate them.
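The three cases share the same shape: data availability is checked first, and only then does the technique-specific check apply. A minimal sketch of that ordering follows. The `accept_block` function and its parameters are hypothetical, and the challenge window for optimistic rollups is an assumption about how fraud proofs are typically used rather than something specified in this post.

```python
def accept_block(technique, data_available, *, window_elapsed=False,
                 fraud_proof_seen=False, validity_proof_ok=False, txs_valid=False):
    """Decide whether a chain should act on another chain's block."""
    # Without data availability, none of the three techniques is trust-minimized.
    if not data_available:
        return False
    if technique == "optimistic":
        # Accept only once the (assumed) challenge window has passed unchallenged.
        return window_elapsed and not fraud_proof_seen
    if technique == "zk":
        return validity_proof_ok
    if technique == "direct":
        return txs_valid
    raise ValueError(f"unknown technique: {technique}")


# A valid ZK proof is not enough if the block data was never published.
zk_without_data = accept_block("zk", data_available=False, validity_proof_ok=True)
zk_with_data = accept_block("zk", data_available=True, validity_proof_ok=True)

# An optimistic rollup block that was successfully challenged is rejected.
challenged = accept_block("optimistic", data_available=True,
                          window_elapsed=True, fraud_proof_seen=True)
unchallenged = accept_block("optimistic", data_available=True, window_elapsed=True)
```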
Celestia therefore provides the core ingredient for building a cluster of chains: a data availability layer. The clusters themselves sit above this layer, on the execution layer (as shown in the diagram above). In order for the cluster to support intra-cluster communications, all chains in the cluster need to check that the blocks of each other were included in the Celestia data availability chain, and thus can do trust-minimized state verification on each other using one of the three techniques above.
To this end, an important project we're working on is optimint, a drop-in replacement for Tendermint that allows developers to build Cosmos-based chains as rollups that can use other chains such as Celestia as a consensus and data availability layer. In the future, our goal is to make it possible for rollup-based Cosmos zones to form a cluster with each other using the Inter-Blockchain Communication (IBC) protocol.
Conclusion
There has recently been a Cambrian explosion of teams working on cross-chain bridges. As the space evolves, we think it's important that the ecosystem arrives at a shared model and language for cross-chain bridges. In this post, we've identified a natural categorization for cross-chain bridges: trust-minimized bridges that form clusters, and trusted bridges.
Thanks to Juri Stricker, Vasiliy Shapovalov, Nick White, Ismail Khoffi, Alexei Zamyatin, DeFi Frog, epolynya, Dankrad Feist, Patrick McCorry, Layne, Arjun Bhuptani and Hasu for comments on this post.
Footnotes
1. It should be noted that the parent chain requires an honest majority assumption for withdrawals from the rollup to be included on the chain. However, in this model we assume that both chains have persistence and liveness (see definitions 1 and 2 in Zamyatin et al.), so that if a transaction is sent to a chain, it will eventually be included in the chain.
2. This is true even if you have data shards, because the overhead costs of downloading an additional block header per shard means that sharding doesn't scale linearly.
3. The honest majority assumption for state verification in atomic swap protocols such as Nxtp doesn't necessarily matter, if the two counterparties are users that validate the chains themselves. However, this does matter in the case of "lock-and-mint" protocols where assets are withdrawn from one chain and deposited into another, where the counterparties are two chains.
4. See this article about DAOs as Internet-native constitutions.