Q2 Engineering Update

It's been a whirlwind these past few months, first and foremost with the successful launch of the Mamaki testnet. The data availability nodes on Mamaki have been working as intended while pending upgrades, such as bad encoding fraud proofs, were being rolled out.

We conducted various other internal tests for projects we’re developing, such as sovereign rollups using Optimint and the Quantum Gravity Bridge.

In this update, we’ll explore some highlights of our engineering progress over the previous couple of months.

Mamaki testnet

With the Mamaki testnet deployment underway since last month, both successes and challenges have presented themselves. Most of the new functionality developers need worked as intended. Developers could successfully connect to the data availability API and submit PayForData messages to the testnet. They could also retrieve data for a given namespace. Together, these capabilities serve as the foundation of a scalable data availability layer for rollups.
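
For a rough feel of that flow, here is a minimal sketch of how a rollup-style client might submit data and read it back. The client interface, method names, and parameters below are hypothetical stand-ins rather than the exact celestia-node API.

```go
package main

import (
	"context"
	"fmt"
)

// DataAvailabilityClient is a hypothetical stand-in for the node's data
// availability API; the real celestia-node interface may differ.
type DataAvailabilityClient interface {
	// SubmitPayForData publishes data under a namespace and pays for its
	// inclusion, returning the height at which it was included.
	SubmitPayForData(ctx context.Context, namespaceID, data []byte, gasLimit uint64) (uint64, error)
	// NamespacedData retrieves all data published under a namespace at a height.
	NamespacedData(ctx context.Context, namespaceID []byte, height uint64) ([][]byte, error)
}

// publishAndReadBack submits a rollup block as a PayForData message and then
// fetches everything stored under the same namespace.
func publishAndReadBack(ctx context.Context, dac DataAvailabilityClient) error {
	nID := []byte{0xde, 0xad, 0xbe, 0xef, 0x01, 0x02, 0x03, 0x04} // example 8-byte namespace
	block := []byte("rollup block bytes")

	height, err := dac.SubmitPayForData(ctx, nID, block, 80000)
	if err != nil {
		return fmt.Errorf("submit PayForData: %w", err)
	}

	blobs, err := dac.NamespacedData(ctx, nID, height)
	if err != nil {
		return fmt.Errorf("retrieve namespaced data: %w", err)
	}
	fmt.Printf("retrieved %d blob(s) at height %d\n", len(blobs), height)
	return nil
}

func main() { _ = publishAndReadBack } // sketch only; a concrete client is needed to run it
```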

The testnet also attracted many community members who spun up their own nodes, quickly filling the 150 active validator slots, not to mention all the light nodes and full nodes also participating in the network. We thank everyone who has been testing the software and providing feedback.

That said, the testnet wasn't without some turbulence. The network experienced instability, primarily caused by the new p2p stack in the latest Tendermint release, which we had upgraded to for the testnet.

Peering was one such problem that amplified the instability. As it turned out, the maximum peer connections value was hardcoded to 1000. Node operators could alter the peer settings in their config files, but their peer sets still exceeded those settings because the hardcoded value took precedence. Essentially, nodes were getting flooded with peers.
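
In simplified form, the failure mode looked like this (a sketch of the behavior as we observed it, not the actual Tendermint code): the operator's configured limit was read but never enforced, so the hardcoded cap governed the peer set.

```go
package main

import "fmt"

// hardcodedMaxConnections mirrors the kind of compile-time constant that ended
// up governing peer counts regardless of operator configuration.
const hardcodedMaxConnections = 1000

// effectiveMaxPeers sketches the buggy behavior: the configured limit is read,
// but the hardcoded default wins, so peer sets balloon toward 1000.
func effectiveMaxPeers(configuredLimit int) int {
	if configuredLimit <= 0 {
		return hardcodedMaxConnections
	}
	// Intended: respect configuredLimit (or the minimum of the two values).
	// Observed behavior on the testnet: the hardcoded value applied anyway.
	return hardcodedMaxConnections
}

func main() {
	fmt.Println(effectiveMaxPeers(50)) // prints 1000, not 50
}
```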

We tried various fixes, including switching to the legacy p2p stack, which unfortunately caused even more issues. On top of that, incompatibilities between the legacy and latest p2p stacks worsened the problem. It is safe to say that our core/app team spent the better part of the last month debugging and resolving testnet issues.

Thankfully, the Tendermint team was able to help with debugging and fixed some of the stability issues. Notably, we weren't the only Cosmos chain to experience p2p issues after upgrading to the latest release.

Moving forward, we plan to use a more stable version of Tendermint that still contains the newly introduced prioritized mempool, whether that requires an upgrade or a downgrade. The prioritized mempool is an important feature as rollups require it to ensure they can successfully submit their blocks to Celestia by paying more gas for priority.
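
For intuition, a prioritized mempool orders pending transactions by the priority they declare (typically derived from the fee), so a rollup that pays more gas gets its transactions reaped into a block sooner. A minimal sketch of that ordering, not Tendermint's actual implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// tx is a simplified pending transaction with a fee-derived priority.
type tx struct {
	sender   string
	priority int64 // e.g. derived from gas price; higher is reaped first
}

// reapOrder sketches how a prioritized mempool decides inclusion order:
// higher-priority transactions are pulled from the pool first.
func reapOrder(pool []tx) []tx {
	sorted := append([]tx(nil), pool...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].priority > sorted[j].priority
	})
	return sorted
}

func main() {
	pool := []tx{
		{sender: "wallet", priority: 1},
		{sender: "rollup-sequencer", priority: 10}, // pays more gas for priority
	}
	for _, t := range reapOrder(pool) {
		fmt.Println(t.sender, t.priority)
	}
}
```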

Quantum Gravity Bridge

During our onsite in Spain three months ago, we got a first test of the Quantum Gravity Bridge running successfully on internal devnets. The bridge acts similarly to a light client relay: the Celestia validator set signs an attestation of data availability, which gets relayed to the bridge contract along with the corresponding data commitments and the state of the validator set.
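
Conceptually, the contract accepts a relayed attestation only if enough of the tracked validator set's voting power signed it. Here is a rough sketch of that quorum check, modeled in Go for illustration with invented field names; the real contract verifies cryptographic signatures rather than a signer set.

```go
package main

import "fmt"

// validator is a member of the Celestia validator set as the bridge tracks it.
type validator struct {
	Addr  string
	Power uint64
}

// attestation is a simplified relayed message: a commitment over a range of
// Celestia block data, signed by members of the validator set.
type attestation struct {
	DataRootCommitment [32]byte
	Signers            map[string]bool // validators that signed this attestation
}

// hasQuorum sketches the contract-side check: the signatures must represent
// more than 2/3 of the tracked validator set's voting power.
func hasQuorum(valset []validator, att attestation) bool {
	var total, signed uint64
	for _, v := range valset {
		total += v.Power
		if att.Signers[v.Addr] {
			signed += v.Power
		}
	}
	return 3*signed > 2*total
}

func main() {
	valset := []validator{{"val1", 50}, {"val2", 30}, {"val3", 20}}
	att := attestation{Signers: map[string]bool{"val1": true, "val2": true}}
	fmt.Println(hasQuorum(valset, att)) // true: 80 of 100 power signed
}
```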

The purpose of the bridge is to provide secure and scalable data availability to a type of rollup hybrid we call a Celestium.

While conducting our integration tests, we discovered that it was too easy for the relay to partially omit the data necessary for an update. Without that data, a Celestium can't know whether the data it publishes to Celestia is available.

To fix this, the bridge is transitioning to a more synchronous design, which allows the contracts to enforce that all the required data is updated in order. These efforts will lead to a more secure bridge for Celestiums, culminating in a deployment on the Mamaki testnet sometime over the next couple of months.
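
One way to picture the ordered design: every update carries a sequential nonce, and the contract rejects anything that doesn't come immediately after the last accepted update, so the relayer can no longer skip or omit required data. A simplified sketch, again modeled in Go with invented names rather than the real contract:

```go
package main

import (
	"errors"
	"fmt"
)

// bridgeState models the minimal on-chain state needed to order updates.
type bridgeState struct {
	lastNonce uint64
}

var errOutOfOrder = errors.New("update nonce must follow the last accepted nonce")

// applyUpdate enforces the synchronous design: update N+1 is only accepted
// once update N has been processed, so no required data can be skipped.
func (s *bridgeState) applyUpdate(nonce uint64) error {
	if nonce != s.lastNonce+1 {
		return errOutOfOrder
	}
	s.lastNonce = nonce
	return nil
}

func main() {
	s := &bridgeState{}
	fmt.Println(s.applyUpdate(1)) // <nil>: accepted
	fmt.Println(s.applyUpdate(3)) // error: nonce 2 was skipped
}
```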

Optimint

Optimint is designed for building sovereign rollups on Celestia, replacing Tendermint for chains built with the Cosmos SDK. Unlike current rollups, a sovereign rollup distributes its fraud or validity proofs through the p2p network for nodes to verify locally.
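
A rough sketch of what that looks like for a node subscribed to a rollup's proof gossip topic; the interfaces here are hypothetical, not Optimint's actual API.

```go
package main

import (
	"context"
	"log"
)

// proof is a fraud or validity proof gossiped over the rollup's p2p network.
type proof struct {
	BlockHeight uint64
	Payload     []byte
}

// proofSubscription is a hypothetical handle to a p2p gossip topic.
type proofSubscription interface {
	Next(ctx context.Context) (proof, error)
}

// verifier checks a received proof against locally available block data.
type verifier interface {
	Verify(p proof) (bool, error)
}

// verifyLoop sketches the sovereign rollup model: nodes receive proofs from
// their peers, verify them locally, and act on the result themselves
// (reject the block for a valid fraud proof, accept it for a validity proof)
// rather than deferring to a settlement contract.
func verifyLoop(ctx context.Context, sub proofSubscription, v verifier) {
	for {
		p, err := sub.Next(ctx)
		if err != nil {
			return // context canceled or subscription closed
		}
		ok, err := v.Verify(p)
		if err != nil {
			log.Printf("could not verify proof for block %d: %v", p.BlockHeight, err)
			continue
		}
		log.Printf("proof for block %d verified locally: %t", p.BlockHeight, ok)
	}
}

func main() { _ = verifyLoop } // sketch only; real gossip wiring is out of scope
```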

We've conducted internal testing of sovereign rollups using Optimint. Through that testing we made changes that improve reliability, such as successful block submission even after a sequencer crash, and improved timeout and error handling. Improving censorship resistance is important because the Optimint sequencer will be centralized initially - efforts are ongoing toward leader election and decentralization of the sequencer.
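
To give a flavor of that reliability work, block submission can be wrapped with bounded retries and per-attempt timeouts so a transient failure or restarted sequencer doesn't drop a block. This is a minimal sketch with hypothetical types, not Optimint's actual submission code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// daSubmitter is a hypothetical interface for posting rollup blocks to the
// data availability layer.
type daSubmitter interface {
	SubmitBlock(ctx context.Context, block []byte) error
}

// submitWithRetry sketches the kind of timeout and error handling involved:
// bounded attempts, a per-attempt timeout, and a simple backoff between retries.
func submitWithRetry(ctx context.Context, da daSubmitter, block []byte, attempts int) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		attemptCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
		lastErr = da.SubmitBlock(attemptCtx, block)
		cancel()
		if lastErr == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Duration(i+1) * time.Second): // linear backoff
		}
	}
	return fmt.Errorf("block submission failed after %d attempts: %w", attempts, lastErr)
}

func main() { _ = submitWithRetry } // sketch only
```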

celestia-node

Since testnet launch, most of the work on the data availability nodes has focused on hardening them through testing and improvements. This starts with local tests of the nodes under varying network topologies. For example, we test a full storage node's ability to reconstruct a block in a topology where the node itself is not connected to enough light nodes to reconstruct the block, but is connected to other full storage nodes that, collectively, have enough light node connections to do so.
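
Put differently, the test checks that reconstruction succeeds as long as the full storage node can gather enough distinct shares through its peers, even if no single peer holds them all. A toy sketch of that success condition follows; the real reconstruction uses 2D Reed-Solomon erasure coding over the extended data square, and which shares are held also matters.

```go
package main

import "fmt"

// share identifies one sample of the extended data square by coordinates.
type share struct{ row, col int }

// canReconstruct sketches the condition the topology test exercises: pool the
// distinct shares held across all reachable peers and check whether the total
// clears the minimum for a k×k original square.
func canReconstruct(peerShares [][]share, k int) bool {
	unique := make(map[share]struct{})
	for _, shares := range peerShares {
		for _, s := range shares {
			unique[s] = struct{}{}
		}
	}
	// With a 2k×2k extended square, k*k distinct shares are necessary
	// (though not always sufficient) to recover the original data.
	return len(unique) >= k*k
}

func main() {
	lightNodes := [][]share{{{0, 0}, {0, 1}}, {{1, 0}}}
	viaOtherFullNodes := [][]share{{{1, 1}}}
	fmt.Println(canReconstruct(append(lightNodes, viaOtherFullNodes...), 2)) // true
}
```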

One of the next steps to increase the efficiency and reliability of block reconstruction is improving the discoverability of full storage nodes in the p2p network. For example, a full storage node will advertise itself to the network so that light nodes can receive the advertisement and connect to it. Increased discoverability will also produce a more favorable network topology, because light nodes will aim to connect to more full storage nodes, improving sampling and reconstruction reliability.
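
In libp2p terms, this typically means full storage nodes advertising under a shared rendezvous namespace while light nodes look that namespace up. The sketch below mirrors the shape of libp2p's Advertise/FindPeers discovery interfaces without pulling in the real dependency; the namespace string and types are assumptions, not the exact celestia-node wiring.

```go
package main

import (
	"context"
	"log"
	"time"
)

// fullNodeNamespace is an assumed rendezvous string; the real value may differ.
const fullNodeNamespace = "full"

// peerInfo stands in for libp2p's peer address info in this sketch.
type peerInfo struct{ ID string }

// advertiser and discoverer mirror the shape of libp2p's discovery interfaces.
type advertiser interface {
	Advertise(ctx context.Context, ns string) (time.Duration, error)
}
type discoverer interface {
	FindPeers(ctx context.Context, ns string) (<-chan peerInfo, error)
}

// advertiseLoop is what a full storage node would run: periodically announce
// itself under the shared namespace so light nodes can find it.
func advertiseLoop(ctx context.Context, a advertiser) {
	for ctx.Err() == nil {
		ttl, err := a.Advertise(ctx, fullNodeNamespace)
		if err != nil {
			ttl = time.Minute // retry after a short delay on failure
		}
		select {
		case <-ctx.Done():
			return
		case <-time.After(ttl):
		}
	}
}

// findFullNodes is what a light node would run: look up the namespace and
// connect to the advertised full storage nodes.
func findFullNodes(ctx context.Context, d discoverer, connect func(peerInfo) error) error {
	peers, err := d.FindPeers(ctx, fullNodeNamespace)
	if err != nil {
		return err
	}
	for p := range peers {
		if err := connect(p); err != nil {
			log.Printf("failed to connect to %s: %v", p.ID, err)
		}
	}
	return nil
}

func main() { _, _ = advertiseLoop, findFullNodes } // sketch only
```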

At the launch of the Mamaki testnet, bad encoding fraud proofs were still pending release. These fraud proofs are necessary to alert light nodes when a block has been incorrectly erasure coded. Since then, the fraud proofs have been fully integrated. The next step is to conduct large-scale, network-level testing.
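
At a high level, a light node verifies such a proof by re-running the erasure coding over the shares included in the proof and comparing the result against the root committed in the block header; a mismatch means the block was badly encoded and should be rejected. The sketch below is illustrative only, with placeholder helpers standing in for the real erasure coding and Merkle tree libraries.

```go
package main

import (
	"bytes"
	"fmt"
)

// badEncodingProof carries the shares of one row (or column) of the extended
// data square plus the root the block header committed to for that axis.
// Field names are illustrative, not the exact celestia-node types.
type badEncodingProof struct {
	Shares        [][]byte // enough shares to re-derive the full row/column
	CommittedRoot []byte   // root from the block's data availability header
}

// reencodeAndRoot stands in for re-running Reed-Solomon encoding over the
// shares and Merkleizing the result; here it is just an XOR placeholder.
func reencodeAndRoot(shares [][]byte) []byte {
	h := byte(0)
	for _, s := range shares {
		for _, b := range s {
			h ^= b // placeholder "root"; not a real hash
		}
	}
	return []byte{h}
}

// verify is the light node's check: the proof is valid (and the block must be
// rejected) when the recomputed root differs from the committed one.
func verify(p badEncodingProof) bool {
	return !bytes.Equal(reencodeAndRoot(p.Shares), p.CommittedRoot)
}

func main() {
	p := badEncodingProof{Shares: [][]byte{{1, 2}, {3}}, CommittedRoot: []byte{9}}
	fmt.Println("proof valid (block should be rejected):", verify(p))
}
```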

One advantage of the testnet and its use by community members has been easier discovery of issues and bugs. Overall, there's been a better feedback loop with users, which gives us a clearer understanding of potential UX improvements. Work also continues on the node RPC endpoints, which serve users and dependent services such as sovereign rollups that use Optimint.

Closing

Ultimately, it's been a great couple of months filled with lots of progress. We have some important additions planned for the testnet - keep your eyes peeled.

We are actively recruiting for positions in our engineering team! View the full list of open positions.