Comparison of Data Availability Solutions

0xemre
17 min read · Oct 28, 2024


Introduction

Data availability on a blockchain means that all participants in the network can easily access and verify block data. The ability for everyone to access and verify block data directly impacts the security, decentralization, and integrity of the blockchain. But why is it important for block data to be verifiable by everyone?

In blockchains, full nodes download all past and future block data, verify its validity, and synchronize with other full nodes to contribute to the network’s integrity and decentralization. If not everyone can easily run a full node and access and verify data, the decentralization of that blockchain would be compromised. This is directly related to the requirements for running a full node. The fewer the requirements for running a full node in a network, the easier it is to participate and verify the availability of data. If running a full node requires high resources, we might end up relying on third parties when we need the data. But how can we trust the accuracy of the data provided? How can we know if the data is correct? These questions underscore the importance of the “don’t trust, verify” approach.

Another important concept is block space. In traditional blockchains, the maximum amount of data that can fit into a block is limited. For example, in Bitcoin, blocks are capped at 1 MB (4 MB for SegWit blocks). This limitation restricts the number of transactions that can fit into a block. When transaction volumes are high, competition for block space increases, and the block space might not meet the demand. As a result, the price for block space rises, leading to higher transaction fees and a poorer user experience. Data availability solutions aim to solve this issue by providing block space at a lower cost.

After Ethereum shifted to a rollup-centric roadmap, rollup solutions began exploring alternative data availability layers. Before Ethereum’s EIP-4844 upgrade, the majority of rollup costs came from posting data as calldata. Because block space on Ethereum is expensive, rollup projects were constantly paying high fees. This situation accelerated the development of alternative data availability layers, such as Celestia, Avail, and EigenDA. Ethereum has addressed this issue by introducing rollup-specific data spaces called blobs.

Data Availability Solutions

Data availability solutions can be categorized into two main types: on-chain and off-chain. In some cases, such as gaming-focused solutions, it might be more advantageous to store data off-chain, while in other situations, keeping data on-chain could be crucial.

Storing data off-chain offers benefits such as reduced costs, better scalability, and faster performance, but it comes with trade-offs in decentralization and security. Off-chain data solutions may introduce risks by relying on centralized entities or third-party systems, which can compromise trust and security.

On the other hand, on-chain solutions ensure higher levels of decentralization, transparency, and security but typically lag in scalability and throughput. Moreover, on-chain solutions tend to be more expensive due to the limited block space and higher demand for on-chain storage.

Projects must carefully assess their requirements and choose the most suitable solution. Balancing between the cost, scalability, security, and decentralization is key to selecting the right data availability strategy for their needs.

Data Availability Committees (DACs)

A Data Availability Committee (DAC) consists of a set of actors whose number varies depending on the project’s approach. Committee members store the data off-chain, in local or cloud-based storage, ensuring that it remains available and accessible at all times. DACs comprise entities with real-world reputations; committee members can therefore be held accountable by the community for their actions.

Since the data is not stored directly on-chain, the costs are much lower compared to on-chain solutions, and DACs offer high scalability and performance. However, due to the reliance on an honest-majority assumption and the absence of staked funds or slashing mechanisms in the case of a data availability attack, their security is lower. They are best suited to applications with lower security requirements, such as gaming, metaverse, NFTs, media storage, and similar use cases. Variants of optimistic and ZK rollups, such as optimium and validium, also use data availability committees when their data is stored off the L1. Currently, there are 15 optimium and 14 validium solutions using DACs in the market.

Data Availability Layers

Ethereum

As previously mentioned, before EIP-4844, the majority of the costs for Ethereum rollups came from posting data as calldata. The limitation of Ethereum’s block space to 30 million gas, which translates to a maximum of around 1.8 MB per block, increased data availability costs and constrained scalability for rollups. On March 13, 2024, Ethereum addressed this issue by introducing the blob-carrying transaction type with EIP-4844, also known as proto-danksharding.

Blobs, which are data fields designed specifically for rollups, indirectly expanded Ethereum blocks. Each blob carries 128 KB of data, and at launch each block could contain up to six blobs (with a target of three), giving blocks roughly 0.75 MB of additional data space. Blobs are carried within the consensus layer instead of the execution layer and are not directly readable by the EVM. This also means that blobs are not stored directly within transactions. Instead, each blob is referenced by a versioned hash of its KZG commitment. This design choice significantly improves gas efficiency by segregating blob data from the main execution layer, avoiding the high gas costs associated with permanent data storage. One of the reasons for using KZG commitments in blobs is to enable Data Availability Sampling (DAS) in the future, particularly with full danksharding.
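
As a concrete illustration of how a blob is referenced on-chain: the versioned hash defined by EIP-4844 is the SHA-256 of the 48-byte KZG commitment with its first byte replaced by a version prefix (0x01). A minimal sketch in Python:

```python
import hashlib

VERSIONED_HASH_VERSION_KZG = b"\x01"

def kzg_to_versioned_hash(commitment: bytes) -> bytes:
    """EIP-4844: a blob is referenced by the SHA-256 of its 48-byte
    KZG commitment, with the first byte replaced by the version 0x01."""
    return VERSIONED_HASH_VERSION_KZG + hashlib.sha256(commitment).digest()[1:]

# A dummy 48-byte commitment for illustration only; a real commitment
# comes from committing to the blob polynomial over BLS12-381.
dummy_commitment = bytes(48)
vh = kzg_to_versioned_hash(dummy_commitment)
assert len(vh) == 32 and vh[0] == 0x01
```

The version prefix lets future upgrades swap the commitment scheme without changing the 32-byte reference format the EVM sees.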

You might think the extra blob data would put strain on nodes’ disk space, but to prevent this, Ethereum prunes blob data periodically (after roughly 2–3 weeks). Storing all historical data is not required to participate in consensus; the purpose of Ethereum’s consensus protocol is not to store all historical data forever. As long as at least one honest actor keeps the full history, the data remains retrievable.

Lifecycle of a Blob

  1. Creation: Blobs are created during the transaction process and attached to blocks.
  2. Storage: Blobs are temporarily stored in beacon chain nodes, ensuring consensus nodes fully download them.
  3. Usage: Layer 2 rollups utilize these blobs to store and retrieve data, enabling efficient off-chain processing.
  4. Pruning: After a certain period, blobs are pruned to maintain network efficiency and reduce long-term storage requirements.
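
The pruning window in step 4 follows from a consensus parameter, MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS (4096 epochs in the Deneb spec), which works out to the roughly 2–3 week retention period:

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096  # Deneb consensus spec

retention_seconds = (MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS
                     * SLOTS_PER_EPOCH * SECONDS_PER_SLOT)
retention_days = retention_seconds / 86_400
print(f"{retention_days:.1f} days")  # ≈ 18.2 days
```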

Impact of Blobs

After the Dencun upgrade on March 13, 2024, the average cost for rollups to post data dropped from the $300–700 range to the $3–7 range, an improvement of up to 100x.

Average gas fees also fell from levels reaching as high as $0.3 to as low as $0.01. While the average decrease was around 30x, some projects saw reductions of up to 100x.

Celestia

Celestia is a modular data availability (DA) layer designed to address the data availability and consensus needs of blockchains.

Celestia is a PoS blockchain powered by the Cosmos SDK. It uses the Tendermint consensus mechanism, familiar from Cosmos. Tendermint relies on the assumption that ⅔ of the validators are honest and provides single-slot finality. Celestia’s block space is expandable, and with single-slot finality, blocks are final as soon as they are produced, with a block time of about 12 seconds. In Celestia, light nodes store data for 30 days by default; after 30 days, the data is pruned. Archive nodes, however, do not prune and continue to store data from genesis onward. Archive nodes have no direct economic incentive to do this; instead, block explorers, indexers, and projects have their own motivations for storing the full history.

Two key features stand out in Celestia’s data availability approach: Data Availability Sampling (DAS) and Namespaced Merkle Trees (NMTs). Data availability sampling is a mechanism that lets light nodes verify data availability without downloading all of a block’s data. To enable DAS, Celestia uses a two-dimensional Reed-Solomon encoding scheme. Each block’s data is split into shares arranged in a k × k matrix, which is extended with parity data into a 2k × 2k matrix by applying Reed-Solomon encoding multiple times. Light nodes randomly sample shares from the 2k × 2k matrix; if they receive valid responses to every sampling query, there is a high probability that the entire block’s data is available. To mitigate incorrectly extended data, Celestia implements fraud proofs, which allow light nodes to reject blocks with invalid extensions by reconstructing the encoding and verifying the discrepancy.
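
The security of this sampling scheme is probabilistic. To make a 2k × 2k erasure-coded block unrecoverable, an attacker must withhold just over a quarter of the extended shares, so each uniform random sample has a meaningful chance of hitting a missing one. A small sketch of the detection probability:

```python
def detection_probability(num_samples: int, withheld_fraction: float) -> float:
    """Chance that at least one of `num_samples` uniform random share
    queries hits a withheld share, i.e. the light node detects the attack."""
    return 1 - (1 - withheld_fraction) ** num_samples

# To make a 2k x 2k extended block unrecoverable, an attacker must
# withhold at least (k+1)^2 of the (2k)^2 shares -- just over 25%.
k = 64
min_withheld = (k + 1) ** 2 / (2 * k) ** 2   # ≈ 0.258
for s in (10, 20, 30):
    print(s, round(detection_probability(s, min_withheld), 4))
```

Even a handful of samples per light node gives high assurance, and the guarantees compound across many independently sampling light nodes.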

NMTs, on the other hand, provide a namespace for the block data of rollups and applications that use Celestia’s data availability service. This ensures that rollups and applications download only the data relevant to them, improving efficiency.

Avail

Avail is a data availability layer built to meet the needs of rollups, trust-minimized applications, and sovereign rollups.

Avail is an NPoS (nominated proof-of-stake) blockchain powered by the Polkadot SDK. It relies on a hybrid consensus model based on Polkadot’s BABE and GRANDPA. BABE produces blocks with probabilistic finality, while GRANDPA is a finality gadget that achieves finality through consecutive rounds of voting by the validator nodes. GRANDPA relies on the assumption that ⅔ of the validators are honest. Avail’s block space is expandable, the average block time is 20 seconds, and the time to finality is also 20 seconds. Currently, light nodes in Avail store all historical data, but a proposal under discussion suggests periodically deleting old data. Even if Avail decides to prune data periodically, rollup projects, data researchers, and other actors have their own motivations to run archive nodes and retain the full history.

Avail uses an application ID (AppID) approach similar to Celestia’s Namespaced Merkle Trees (NMTs). This approach allows applications and rollups to fetch only the data relevant to them, marked with AppIDs, reducing the need for fetching extra data. This enables block sizes to be increased without causing applications and rollups to pull unnecessary data.

Avail uses erasure coding to add redundancy to the data, enhancing its reliability and integrity. Blocks are divided into n original chunks and extended to 2n, allowing for reconstruction from any n out of 2n chunks. Avail’s DA layer applies KZG polynomial commitments to each block, which act as cryptographic proofs of the data’s integrity, ensuring that the stored data is accurate and tamper-proof. Over ⅔ of the validators must be honest to reach consensus, ensuring robust security for the erasure-coded data. Light clients within Avail’s DA ecosystem use Data Availability Sampling (DAS) to verify block data integrity. They check KZG polynomial openings against the commitments in the block header for each sampled cell, allowing them to independently and instantly verify data availability.
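
The “any n out of 2n” property comes from polynomial interpolation: the n original chunks define a degree-(n − 1) polynomial, and any n of its 2n evaluations determine that polynomial uniquely. A toy sketch over a small prime field (real systems like Avail work over pairing-friendly fields with KZG commitments on top):

```python
# Toy Reed-Solomon erasure coding over GF(p): n data chunks are treated
# as evaluations of a degree-(n-1) polynomial, extended to 2n evaluations.
P = 2**31 - 1  # a Mersenne prime, stand-in for a real field modulus

def interpolate_at(points, x, p=P):
    """Lagrange interpolation: evaluate at x the unique polynomial
    passing through `points` = [(x_i, y_i), ...]."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * ((x - xj) % p) % p
                den = den * ((xi - xj) % p) % p
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total

data = [17, 42, 99, 7]                      # n = 4 original chunks
n = len(data)
pts = list(enumerate(data))                 # chunk i = evaluation at x = i
extended = [interpolate_at(pts, x) for x in range(2 * n)]  # 2n chunks

# Drop any n of the 2n chunks (here the first n) and reconstruct:
survivors = list(enumerate(extended))[n:]
recovered = [interpolate_at(survivors, x) for x in range(n)]
assert recovered == data
```

Because the encoding is systematic, the first n extended chunks are the data itself; the remaining n are pure redundancy.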

EigenDA

EigenDA, unlike Celestia and Avail, is not a blockchain. It consists of a set of smart contracts on Ethereum and is secured by the crypto-economic guarantees of EigenLayer’s restaking mechanism. The idea is to create a network of operators by restaking ETH locked on Ethereum, sharing part of Ethereum’s security through inherited crypto-economic guarantees, and offering services for various use cases through the ecosystems built on top. The data availability layer, EigenDA, is one of these services.

EigenDA consists of three main actors: the Operator, the Disperser, and the Retriever. Operators are third parties running the EigenDA node software with delegated stake. The Disperser is a service provided by Eigen Labs, responsible for interfacing between EigenDA clients, operators, and contracts. The Retriever is a service that queries EigenDA operators for blob chunks, verifies their correctness, and reconstructs the original blob for users.

EigenDA clients, such as rollups, send transactions as blobs to the disperser service. The disperser divides the blobs into chunks, applies erasure coding, generates KZG commitments and proofs for each chunk, and distributes the data to the operators. Operators store the data and send signatures back to the disperser to confirm storage. The disperser aggregates the signatures and submits them to the EigenDA manager smart contract on Ethereum, which, with the help of the EigenDA registry contract, verifies the signatures and records the result on-chain. This way, blobs are stored off-chain while their availability is registered on-chain. An EigenDA client can then present the blob ID to its rollup inbox smart contract, which checks via the EigenDA manager contract whether a storage certificate exists; if it does, the blob ID is accepted, thus ensuring data availability.
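
The certificate check at the end of this flow can be sketched as a stake-weighted quorum test. Note that the names, the quorum value, and the interface below are purely illustrative, not the actual EigenDA contract API:

```python
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    stake: int     # delegated (re)staked amount
    signed: bool   # did this operator attest to storing its chunk?

def certificate_valid(operators: list[Operator], quorum: float = 2 / 3) -> bool:
    """A certificate counts as valid only if the operators who signed
    for the blob control at least `quorum` of the total stake."""
    total = sum(op.stake for op in operators)
    signed = sum(op.stake for op in operators if op.signed)
    return signed >= quorum * total

ops = [Operator("a", 100, True), Operator("b", 50, True), Operator("c", 30, False)]
print(certificate_valid(ops))  # 150 of 180 staked has signed -> quorum met
```

The key design point is that validity is weighted by stake, not by head count, which is what ties the availability guarantee to crypto-economic security.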

EigenDA operators act like a data availability committee (DAC) in terms of their function but differ from traditional DACs through broader participation and the implementation of reward and slashing mechanisms. Traditional DACs are small and permissioned; for instance, Arbitrum Nova has six members, and five signatures are enough to confirm data availability. As of October 20, 2024, EigenDA has 221 operators, who are rewarded in ETH on a weekly basis for their services. Since operators risk slashing if they behave maliciously, the system is secured through crypto-economic guarantees.

There is no consensus mechanism in EigenDA, and there are no blocks. Instead, it relies on Ethereum’s limits. For example, transactions are finalized within Ethereum’s finality time, which takes 64–95 slots, roughly 13–19 minutes. Unlike Celestia and Avail, Data Availability Sampling (DAS) is not implemented. Currently, EigenDA supports up to 15 MB of data per second, with the potential to scale up to 1 GB per second in the future.

Near DA

NEAR blockchain provides data availability services for rollups and other modular blockchains.

NEAR DA leverages the consensus mechanism called Nightshade for data availability and utilizes the sharding design implemented at the protocol level. NEAR’s sharding design divides the blockchain into four parallel shards, which operate like mini blockchains. Each shard produces a small portion of a block, known as a chunk. These chunks are combined to form complete blocks. To ensure the accessibility of each data piece, erasure coding is applied to the chunks, which are then split into smaller parts. These smaller parts are distributed to block producers as OnePart messages. When a block producer receives a main chain block, it first checks if it has all the OnePart messages for each chunk in the block. Once all the OnePart messages are received, the block producer fetches the remaining parts from its peers and reconstructs the chunks for which it holds the state. In this way, data availability is ensured.

In NEAR DA, each shard can process 4 MB of data per second, and with 4 shards, NEAR DA can process 16 MB of data per second. Like Celestia, Avail, and Ethereum, NEAR DA does not guarantee that data will be available indefinitely. Instead, it guarantees that data will be retained for at least 3 NEAR epochs, which is approximately 36 hours. In practice, data is retained for 5 NEAR epochs, which is around 2.5 days. Archive nodes, however, can store all historical data, and they may have their own motivations for doing so.
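
Putting NEAR DA’s numbers together (the 12-hour epoch length below is implied by “3 epochs ≈ 36 hours”):

```python
SHARDS = 4
MB_PER_SHARD_PER_SEC = 4
HOURS_PER_NEAR_EPOCH = 12  # implied by "3 epochs ≈ 36 hours"

throughput_mb_per_sec = SHARDS * MB_PER_SHARD_PER_SEC   # total DA throughput
guaranteed_hours = 3 * HOURS_PER_NEAR_EPOCH             # minimum retention
practical_days = 5 * HOURS_PER_NEAR_EPOCH / 24          # retention in practice

print(throughput_mb_per_sec, guaranteed_hours, practical_days)  # 16 36 2.5
```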

The Fisherman’s Problem

Data availability layers are relatively new technologies and may be vulnerable to certain attack vectors. One of the unresolved fundamental issues in this context is the Fisherman problem.

In the Fisherman problem, consider a scenario where an actor, in this case a fisherman, requests data from a node. Upon receiving the data, the actor realizes that some of the data is missing and alerts other nodes in the network. The node storing the data immediately publishes the missing data, and when the other nodes check, they find that the data is complete. This creates a dilemma: Was the node deliberately withholding the data, or was the honest actor raising a false alarm?

The Fisherman problem can also occur in different scenarios:

  • The honest actor receives a reward for detecting missing data, even if the node withholding the data immediately publishes it: This could lead to collaboration between the honest actor and the node to generate false alarms and earn rewards.
  • The honest actor does not receive a reward for detecting blocks with missing data: This creates cost-free denial-of-service (DoS) attacks, as other nodes must download all the data, eliminating the benefits of light clients or sharding.
  • The honest actor incurs a cost or negative reward for detecting blocks with missing data: The honest actor might prefer not to detect blocks with missing data, giving nodes more freedom to withhold data.

Comparison of DA Layers

Consensus Mechanisms

The consensus mechanism is crucial in blockchain technology as it allows all participants to agree on the state of the blockchain and ensures that transactions reach finality. It establishes a protocol for validating and confirming transactions, which prevents double-spending and maintains the integrity of the distributed ledger.

Ethereum’s consensus design consists of a hybrid system made up of LMD GHOST and Casper FFG. LMD GHOST operates as the block production engine in Ethereum and provides probabilistic finality. Casper FFG, on the other hand, is a finality gadget that ensures finality guarantees.

Celestia utilizes Tendermint, a significant component of the Cosmos SDK, for consensus. Tendermint achieves single-slot finality, ensuring that blocks, produced roughly every 12 seconds, are finalized as soon as they are created.

Avail adopts a hybrid approach similar to that of Ethereum, utilizing the BABE and GRANDPA consensus mechanisms from the Polkadot SDK. BABE functions as the block production engine and coordinates with validators to determine new block producers.

EigenDA does not operate with a traditional consensus mechanism. Instead, EigenDA operators behave like a Data Availability Committee (DAC) and sign certificates confirming the availability of data. However, end users and applications have no way to independently verify this availability themselves.

Near DA employs the Nightshade consensus mechanism, which aims for blocks to be finalized within 1–2 seconds after they are produced.

Decentralization

Ethereum is considered the gold standard in the industry for decentralization. Its consensus design allows for a high number of validators, enabling it to support over 1 million validators. This makes it the most decentralized among the existing data availability layers.

Celestia’s consensus mechanism permits around 100–200 validators. Currently, there are 100 validators in Celestia, and this limited maximum validator count presents certain constraints regarding decentralization.

Avail’s consensus mechanism can accommodate approximately 1,000 validators. As of now, Avail has 128 validators, which, while not exceptionally high, is potentially better than Celestia in terms of decentralization.

EigenDA does not have a consensus mechanism. Its operator count stands at around 220, and this relatively high count is beneficial, as it increases the crypto-economic security of the system.

In Near DA, there are currently 233 validators. Although the exact maximum number of validators allowed by the consensus is not definitively known, it is estimated to be around 1,000. Given its potential and current validator count, Near is in a favorable position.

Proof Type

Proofs are an effective way to verify the availability of data. In existing systems, both validity proofs and fraud proofs are actively utilized.

Ethereum, Avail, and EigenDA utilize a type of validity proof known as KZG commitments, which were detailed in the previous sections. KZG commitments offer high efficiency and speed when working with large datasets: the data is represented as a polynomial, and the commitment can be opened and verified at arbitrary evaluation points.

Celestia is the only data availability layer among these that relies on fraud proofs. Its approach is optimistic, assuming that data is present and correctly encoded unless proven otherwise.

Near DA adopts a sharding-based scaling approach and currently does not have an active proof type in its design. However, it has been indicated that KZG commitments will be implemented in the future.

Data Availability Sampling

Data availability sampling (DAS) allows light clients to verify the availability of data without needing to download the entire dataset. This means that light nodes can check the availability of all data by downloading only a small portion. This approach provides data availability guarantees without increasing node requirements, allowing data verification even on mobile devices or through a web browser.

In Celestia and Avail, data availability sampling (DAS) is included by default. In both systems, light nodes can perform DAS, which allows end users to independently verify data availability without downloading the entire block.

On the other hand, Ethereum, EigenDA, and Near DA do not currently have DAS implemented. However, Ethereum plans to activate DAS with full danksharding in the future. Similarly, there have been discussions suggesting that EigenDA might support DAS in the future as well.

Block Time & Block Space & Time to Finality

Ethereum blocks are produced every 12 seconds, and due to its monolithic design, block space is limited. Although block space can be expanded through social consensus, this is not preferred, as it would impose additional load on the network. Blocks are finalized after approximately 64 to 95 slots, which translates to roughly 13 to 19 minutes.

In contrast, Celestia produces blocks every 12 seconds and achieves finality as they are created due to its consensus design. Its modular architecture allows for expandable block space, with current blocks providing around 2 MB of data space. It is believed that block space could potentially reach up to 100 MB per block.

Avail’s blocks are produced and finalized in 20 seconds. Similar to Celestia, its modular design allows for the expansion of blocks. Currently, the blocks provide 2 MB of data space, with the capacity potentially increasing to 128 MB per block.

EigenDA does not have traditional blocks. It provides a data throughput of 15 MB per second, with plans to increase this to up to 1 GB per second. Finality for EigenDA is subject to Ethereum’s finality conditions, taking roughly 13 to 19 minutes.

Near DA produces blocks in about 1.1 seconds and finalizes them within 1 to 3 seconds of production. Each shard processes 4 MB of data per second; with 4 shards, total throughput is 16 MB per second, and it is assumed that the shard-design parameters can be adjusted to change this figure. Due to its monolithic design, block space cannot be expanded easily, although expansion could still be considered depending on the implications of such changes.

Conclusion

In conclusion, the analysis of various blockchain platforms highlights significant differences in their consensus mechanisms, data availability solutions, and overall architecture. Ethereum stands as a benchmark in decentralization, enabling a large number of validators while implementing a hybrid consensus model that balances performance and security. Celestia’s innovative modular design and use of Tendermint’s consensus allows for efficient block production and finality, making it a promising option for scalable applications.

Avail’s approach mirrors Ethereum’s but offers unique advantages with its data availability mechanisms and a slightly larger validator capacity, contributing to improved decentralization. On the other hand, EigenDA’s lack of a conventional consensus mechanism raises interesting questions about the future of data availability and the trade-offs of relying on operators for data certification.

Near DA showcases impressive performance with rapid block production and finality, though its monolithic design presents challenges for scalability. The inclusion of Data Availability Sampling in Celestia and Avail marks a crucial advancement in blockchain usability, allowing lightweight nodes to participate in network verification without the overhead of full data downloads.

Overall, each platform presents distinct advantages and challenges that cater to different use cases within the blockchain ecosystem. As these technologies evolve, ongoing research and development will be vital in addressing the limitations and enhancing the capabilities of blockchain networks. Future implementations, particularly regarding DAS in Ethereum and EigenDA, will be instrumental in shaping the landscape of blockchain data availability and consensus mechanisms.
