Foresight Ventures: Blockchain and DApp Storage

May 10, 2023

Author: Maggie@Foresight Ventures

Key Insights

To achieve full decentralization in Web3 applications, we need technological advancements in four areas: data availability (blockchain scalability), decentralized file systems, decentralized databases, and decentralized computing.
Data retrieval speed, incentive model & tokenomics, and the guarantee algorithm for data availability are key factors that determine whether a file/database storage protocol will be widely used or not.
The main focus for improving decentralized file systems and database protocols will be on reducing retrieval times.
The data availability layer is a promising and important method for scaling blockchains. Celestia’s technology still needs market validation, and ETH and Celestia may converge technically in the future.

App Architectures of Web2 and Web3 Applications

Compared to Web2 applications that often consist of a frontend, backend, and data layer with a database and file system, Web3 DApps can be simpler as they only need a frontend and a smart contract that serves as both the backend and database.

However, because these DApps lack a file system, their frontend pages, pictures, and other files are still hosted on centralized servers. To achieve full decentralization, developers are now using decentralized file systems to store the files a DApp requires, including frontend pages, NFT metadata, and images.

To improve structured data storage and backend computing capabilities, we utilize data availability technology to scale blockchain. Additionally, two types of products have emerged: decentralized databases and decentralized computing.

By utilizing blockchain, developers can store financial data and other critical information related to DApps. On the other hand, decentralized databases can be utilized for storing structured data such as NFT metadata, DAO voting data, DEX order books, social data, and so on. Additionally, decentralized computing can help in scaling the backend.

Overall, to build fully decentralized, flexible, and rich Web3 DApps, four types of products and technological advancements are necessary.

Decentralized file system: Store DApp frontend web pages, NFT pictures, videos, and other DApp files.
Decentralized database: Store structured data like NFT metadata, DAO votes, and DEX order book.
Data Availability: Scale blockchain and store financial and important data for DApps.
Decentralized Computing Tools: Scale the backend of DApps.

1. Decentralized File System

Decentralized file storage serves as a substitute for centralized storage, facilitating the realization of serverless DApps. DApp demand for decentralized file systems is growing, and these systems will be a vital component of the Web3 technology stack.

Compared to centralized storage, the main advantages of decentralized storage are the removal of trusted third parties, increased redundancy, the elimination of single points of failure, and lower costs.

According to Messari’s statistics, the market cap of the top 4 decentralized file storage protocols was nearly $1.6 billion, down 83% from $9.4 billion. Total storage capacity exceeded 17 million terabytes (TB), up 2% YoY, and used storage reached 532,500 TB, up 1280% YoY.

Let’s take a look at the current situation of several popular decentralized storage projects. Storing data using all these decentralized storage protocols is significantly cheaper compared to AWS. While AWS charges around $23/TB/month, these decentralized storage protocols range from $0.0002 to $20/TB/month.
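
To make the price gap concrete, the arithmetic can be sketched as below. This is only an illustration using the per-TB prices quoted above; the function names and the 100 TB dataset are hypothetical, and the calculation ignores retrieval and bandwidth fees, which differ by protocol.

```python
# Prices quoted in the text, in USD per TB per month.
AWS_PRICE = 23.0
DECENTRALIZED_LOW = 0.0002
DECENTRALIZED_HIGH = 20.0


def monthly_cost(terabytes: float, price_per_tb: float) -> float:
    """Return the monthly storage bill in USD for a flat per-TB price."""
    return terabytes * price_per_tb


def savings_range(terabytes: float) -> tuple[float, float]:
    """Return (smallest, largest) monthly saving versus the AWS price."""
    aws = monthly_cost(terabytes, AWS_PRICE)
    return (
        aws - monthly_cost(terabytes, DECENTRALIZED_HIGH),
        aws - monthly_cost(terabytes, DECENTRALIZED_LOW),
    )


if __name__ == "__main__":
    low, high = savings_range(100)  # hypothetical 100 TB dataset
    print(f"AWS: ${monthly_cost(100, AWS_PRICE):,.2f}/month")
    print(f"Saving: ${low:,.2f} to ${high:,.2f}/month")
```

Storage price alone does not determine total cost: as discussed below, several protocols charge separately for retrieval, or do not price it at all.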

IPFS: IPFS is currently the most widely used protocol for storing images and metadata for NFTs. It’s great for storing frequently accessed or “hot” data. However, IPFS doesn’t have any built-in ways to incentivize storage, prove data is stored correctly, or establish agreement among participants as blockchains do. This means there’s a risk of losing data if it’s only stored on IPFS. For instance, Infura’s IPFS service deletes data that hasn’t been accessed in six months. So if you want to keep your data available for a long time, it’s best to run your own IPFS node.
Filecoin: Filecoin provides low storage costs and is mainly used for storing “cold” data, such as archival data. Because Filecoin doesn’t have a built-in charging mechanism for data retrieval, some miners accept low-quality data to earn rewards while refusing to serve retrievals. The Filecoin community is actively addressing this issue and implementing measures to improve the overall quality of stored data.
Arweave: Arweave’s idea of permanent storage is attractive for storing DApp data. The ecosystem is developing well: there are decentralized database systems that use Arweave to store database files, as well as second-layer scalability solutions based on Arweave. However, Arweave’s pricing does not account for bandwidth, so some nodes provide only storage services, not retrieval.
Swarm: Bandwidth fees are charged for both storage and retrieval in Swarm. The system is highly decentralized and has high bandwidth requirements for nodes.
StorJ: Unlike the other protocols, StorJ is only partially decentralized, and it has good retrieval speed. It has proven effective for sharing large video files.
Sia: Skynet Labs closed due to a lack of new funding, which led to a decline in Sia’s usage.

We primarily evaluate the usability of a decentralized file storage protocol based on three factors:

Data retrieval speed. It is crucially important, because it determines the efficiency of a storage system in responding to requests from DApps, and it directly affects the user experience of DApps. Factors that may affect the speed of data retrieval include: whether there is a fee for data queries, the degree of decentralization of nodes, node quality, data forwarding logic, and facilities such as CDNs for accelerated queries.
Incentive model and tokenomics. Incentive models and token economics impact the participation of storage nodes, influencing their behavior. Currently, the mainstream pricing model consists of storage fees plus bandwidth fees, meaning that users need to pay a storage fee when storing data and a bandwidth fee when accessing it. If data queries are free, nodes often lack the motivation to provide them. Moreover, incentive models and token economics impact the earnings of miners, which can affect the number of nodes and the storage capacity of the services.
Data availability guarantee algorithm. It is an algorithm used in decentralized networks to ensure the continuous availability of data and proper service provision by nodes. Currently, the most widely used method is Proof of Random Access.
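
The general idea behind random-access style proofs can be sketched as a challenge-response over a Merkle tree: the verifier keeps only the root hash, picks a random chunk index, and the storage node must return that chunk together with its Merkle path. The sketch below is a simplified illustration under that assumption, not the actual algorithm of any protocol named above; all function names are hypothetical.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    """SHA-256 digest used for both leaves and inner nodes."""
    return hashlib.sha256(data).digest()


def build_tree(chunks: list[bytes]) -> list[list[bytes]]:
    """Return all levels of a Merkle tree, leaves first, root last."""
    level = [h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # pad odd-sized levels by repeating the last hash
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(chunks: list[bytes], levels: list[list[bytes]], index: int):
    """Storage node: return the challenged chunk and its Merkle path."""
    path = []
    i = index
    for level in levels[:-1]:
        path.append(level[i ^ 1])  # sibling hash at this level
        i //= 2
    return chunks[index], path


def verify(root: bytes, index: int, chunk: bytes, path: list[bytes]) -> bool:
    """Verifier: recompute the root from the chunk and path."""
    node = h(chunk)
    i = index
    for sibling in path:
        node = h(node + sibling) if i % 2 == 0 else h(sibling + node)
        i //= 2
    return node == root


if __name__ == "__main__":
    chunks = [f"chunk-{n}".encode() for n in range(5)]
    levels = build_tree(chunks)
    root = levels[-1][0]                        # the only value the verifier keeps
    challenge = secrets.randbelow(len(chunks))  # unpredictable chunk index
    chunk, path = prove(chunks, levels, challenge)
    print("proof accepted:", verify(root, challenge, chunk, path))
```

Because the index is unpredictable, a node that has discarded part of the data fails some fraction of challenges, which is what lets a network tie rewards to continued storage.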

Overall, we believe that,

The products and services that leverage decentralized storage protocols are still in their early stages.
The main focus for improving storage protocols will be on reducing retrieval times.
Data retrieval speed, incentive model & tokenomics, and the guarantee algorithm for data availability are key factors that determine whether a protocol will be widely used or not.

2. Decentralized Database

Databases are widely used in applications, so decentralized databases are a crucial technology for achieving full decentralization in DApps.

Decentralized databases can replace centralized databases to store structured hot data that DApps require, such as NFT metadata, DAO voting, DEX order books, social media data, etc.

There are many decentralized database projects, especially in the past two years where several innovative projects have emerged.

Ceramic: Ceramic is a project started in 2019. Data is stored and managed in units called streams, and formatted event logs are appended to streams. Each log is packaged as a file and uploaded to IPFS. Ceramic provides GraphQL API queries. Like IPFS, it has no incentive model, and it supports creating, reading, and updating data (CRU).
OrbitDB: OrbitDB predates Ceramic and also uses the IPFS file system for file storage. It supports the storage of both NoSQL databases and files.
Tableland: The project started in 2022 and is currently in the public testing phase, with the production version planned for release in 2023. Storing data requires smart contracts, which define SQL statements and set usage permissions. Reads are performed off-chain and do not require payment. Currently, the contracts have been deployed on ETH and on L2s such as OP.
Polybase: The project is now live on the test network. It is a NoSQL database that supports CRUD operations, with each operation incurring fees. Additionally, Polybase offers support for various file systems to store database files, including local disk, IPFS, Filecoin, Polystore, and even AWS S3. Polybase also utilizes payment channels for data query payments, reducing the frequency of on-chain transactions and avoiding query delays caused by payments.
Web3Q: Also known as EthStorage. The project started in 2022, and the testnet is live. It proposed a new URL scheme, the web3:// access protocol, for accessing data.
Kwil: Kwil is a SQL database system based on Arweave, using smart contracts for payment.
KYVE: KYVE is a database system based on Arweave.

From a technical perspective:

Both SQL and NoSQL can be used as databases. The data structure of SQL requires high consistency and offers stronger support for join queries, making it more mature and efficient. The KV format of NoSQL is better suited to Ethereum’s design pattern: it supports rich data types and is flexible and easily scalable.
In terms of functionality, the best option is to support full CRUD, but supporting update and delete (UD) adds complexity to the system. If the system uses local storage, historical value queries may not be supported. If it uses IPFS or Arweave as the file system, the database needs to be append-only; otherwise, there will be multiple versions of the same data, doubling storage costs.
When choosing an underlying file system, there are two options: 1) Store database files in decentralized file systems such as IPFS and Arweave; 2) Store them locally on nodes or in the S3 cloud. If a decentralized database project requires customized retrieval logic or optimization, using local storage or S3 is a more flexible approach.
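
The append-only constraint mentioned above can be illustrated with a small event log: updates and deletes are recorded as new events instead of rewriting stored data, and the current (or any historical) view is rebuilt by replaying the log. This is a minimal sketch with hypothetical class and method names; it does not reflect the data model of any project listed above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class Event:
    op: str           # "put" or "delete"
    key: str
    value: Any = None


@dataclass
class AppendOnlyStore:
    # In a real system each event (or batch of events) would be written
    # to an immutable file system; here the log is just an in-memory list.
    log: list[Event] = field(default_factory=list)

    def put(self, key: str, value: Any) -> None:
        """Create or update: both are recorded as a new event."""
        self.log.append(Event("put", key, value))

    def delete(self, key: str) -> None:
        """Delete is also an event; earlier events are never rewritten."""
        self.log.append(Event("delete", key))

    def state(self, upto: Optional[int] = None) -> dict[str, Any]:
        """Replay the first `upto` events (all by default) into a key-value view."""
        view: dict[str, Any] = {}
        for event in self.log[:upto]:
            if event.op == "put":
                view[event.key] = event.value
            else:
                view.pop(event.key, None)
        return view


if __name__ == "__main__":
    store = AppendOnlyStore()
    store.put("proposal-1", "yes")
    store.put("proposal-1", "no")   # update
    store.delete("proposal-1")      # delete
    print(store.state(upto=1), store.state(upto=2), store.state())
```

The trade-off is that reads require replaying (or caching) the log, which is one reason retrieval speed is a central concern for these systems.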

Overall, we believe that,

The field of decentralized databases is highly worth paying attention to, with an urgent demand, while a widely accepted and used product has not emerged yet.
The maturity of decentralized databases is lower than that of decentralized file storage systems. Decentralized database technology is built on decentralized file systems, and many projects started in 2022.
The main focus for improving storage protocols will be on reducing retrieval times.
Data retrieval speed, incentive model & tokenomics, and the guarantee algorithm for data availability are key factors that determine whether a protocol will be widely used or not.

3. Data Availability

The concept of data availability is distinguished from decentralized file systems and databases, as elucidated on the websites of Ethereum and Celestia.

Ethereum: Data availability is the guarantee that the block proposer published all transaction data for a block and that the transaction data is available to other network participants.
Celestia: Data availability is concerned with whether the data published in the latest block is available.

Decentralized file systems and databases, by contrast, mainly ensure that the data stored by users is available, but do not specifically address transaction data.

Currently, there are several data availability projects including:

Ethereum. ETH serves as the DA (data availability) layer for Layer 2 Rollup.
Celestia. Celestia is a specifically designed DA layer that only handles data availability and does not execute transactions. It sparked a trend of modular blockchains in 2022.
EigenDA and other DA products. These ensure data availability through committees.

Ethereum

ETH Layer 2 creates and submits batches of transactions to the Ethereum network, and stores the data in an Ethereum smart contract on Layer 1. This ensures the guaranteed availability of L2 transaction data through the ETH network.

Although rollups can extend the throughput of ETH through off-chain computation, their capacity is limited by the L1 ETH blockchain data throughput. Therefore, Ethereum needs to increase its data storage and processing capabilities.

To scale up Ethereum’s DA capacity, Danksharding has been included in ETH’s roadmap and is considered one of the most important and urgent updates currently.

Danksharding is a sharding design in which data availability is delegated to each shard: each validator only needs to run a full node for its own shard while following the other shards as a light client.

Proto-danksharding (EIP-4844) is a preliminary implementation of Danksharding and is expected to be implemented in the second half of 2023. It introduces data blobs that are stored off-chain and attached to ETH via transactions, as well as a precompile for validating blobs. Each blob is approximately 125 kB in size, whereas a block is only about 90 kB. The current design allows at most eight blobs per block, adding roughly 1 MB of storage. In Proto-danksharding, the data is not yet sharded, and validators still need to download all blob data and verify its availability directly. After EIP-4844 is implemented, blobs can store 10 times more data than calldata for the same gas consumption. Rollup data can then be stored in blobs, reducing transaction fees by an order of magnitude. Once Danksharding is fully implemented, costs will fall even further.
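
The capacity figures above follow from simple arithmetic, sketched here. The constants are the approximate values quoted in this article at the time of writing, not authoritative protocol parameters, and the function names are hypothetical.

```python
BLOB_SIZE_KB = 125        # approximate blob size quoted above
MAX_BLOBS_PER_BLOCK = 8   # per-block blob limit quoted above
BLOCK_SIZE_KB = 90        # approximate block size quoted above


def extra_blob_capacity_kb(blobs: int = MAX_BLOBS_PER_BLOCK) -> int:
    """Additional data carried per block when `blobs` blobs are attached."""
    return blobs * BLOB_SIZE_KB


def capacity_multiple() -> float:
    """How many times larger the full blob space is than the block itself."""
    return extra_blob_capacity_kb() / BLOCK_SIZE_KB


if __name__ == "__main__":
    print(f"{extra_blob_capacity_kb()} kB of blob data per block")
    print(f"about {capacity_multiple():.1f}x the size of a block")
```

With these numbers, eight blobs add 1,000 kB (about 1 MB) per block, roughly eleven times the size of the block itself.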

In summary, Danksharding can improve Ethereum’s data storage capacity, reduce the cost of ETH used as DA, and become a more powerful DA layer.

Celestia

Celestia is a minimal blockchain that only orders and publishes transactions and does not execute them. By decoupling the consensus and application execution layers, Celestia modularizes the blockchain technology stack and unlocks new possibilities for decentralized application builders.

In one configuration, Celestia is responsible for the DA layer, while ETH handles consensus and settlement, and the application chain is responsible for execution.
In another configuration, Celestia is responsible for both the DA layer and the consensus layer, while settlement and execution are handled by the application chain. Alternatively, settlement can use Cevmos, with execution still being the responsibility of the application chain.

Celestia integrates a 2-dimensional Reed-Solomon encoding scheme and has designed a random sampling scheme to verify the availability of data and recover it, similar to the validation method used by ETH.
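
The value of random sampling can be shown with a short probability calculation. The sketch below assumes, as is commonly stated for 2-dimensional Reed-Solomon schemes, that a block producer must withhold roughly 25% of the extended data to make a block unrecoverable, and that each sample is drawn uniformly and independently. Both the 25% figure and the function names are assumptions made for illustration, not parameters taken from this article.

```python
import math


def undetected_probability(samples: int, withheld_fraction: float = 0.25) -> float:
    """Chance that every one of `samples` random samples lands on available data."""
    return (1.0 - withheld_fraction) ** samples


def samples_needed(confidence: float, withheld_fraction: float = 0.25) -> int:
    """Smallest number of samples whose detection probability reaches `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - withheld_fraction))


if __name__ == "__main__":
    for target in (0.99, 0.999999):
        n = samples_needed(target)
        print(f"{target:.6f} confidence needs {n} samples "
              f"(miss probability {undetected_probability(n):.2e})")
```

Under these assumptions the miss probability shrinks exponentially with the number of samples, which is why a light node can gain high confidence in availability without downloading the whole block.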

Celestia also differs significantly from ETH.

Celestia focuses on the DA layer and consensus layer, while ETH also serves as a settlement layer for rollups.
Celestia does not have a Turing complete smart contract virtual machine, therefore, it does not support smart contracts.
Celestia’s sovereign rollup can fork into multiple chains, while ETH’s Rollup cannot.
Because Celestia doesn’t have smart contracts, bridges with sovereign rollups would mainly facilitate the movement of the DA layer token.

The ecosystem of Celestia is growing fast.

Off-chain DA

Off-chain DA mainly includes:

Data Availability Committees (DACs) are trusted parties that provide, or attest to, data availability. DACs are also used by some validiums.
Proof-of-stake Data Availability Committees are considerably more secure than regular DACs because they directly incentivize honest behavior. Here, anyone can become a validator and store data off-chain. However, they must provide a “bond”, which is deposited in a smart contract.

An overview of data availability products:

ETH: ETH currently serves as the data availability layer for L2 optimistic rollups and zk rollups. The adoption of EIP-4844 (Proto-Danksharding) will provide additional benefits to L2s. Although the storage capacity of ETH may not be as large as Celestia’s, it will become comparable once Danksharding is fully implemented.
Celestia: Celestia is designed to function as a consensus and data availability layer. The Celestia testnet went online in June 2022, and its innovative modular design has made it increasingly popular since then. Celestia needs to establish its own ecosystem and competes with Ethereum. Many projects are already building on Celestia.
Avail: Avail was originally launched by Polygon in June of 2022. However, following the departure of its founder from Polygon, Avail has become an independent modular blockchain project and a testnet has been released. Avail is a standalone consensus and DA layer like Celestia. The Avail mainnet was planned to be bridged to Polygon and use MATIC as the base currency. Compared to Celestia tokens, MATIC is a more mature token.
EigenDA: EigenDA is an Ethereum-based DA layer that incentivizes validators to maintain the network through ETH re-staking, eliminating the need for a startup burden like that required by Celestia.
Other off-chain DA: Validium uses off-chain storage for data availability, Ethereum for consensus and settlement, and Validium rollup for execution. Validium may be phased out as Celestia and Danksharding gain widespread adoption.

In conclusion, we think,

A data availability layer is a promising and important approach to scaling blockchains.
The current DA products have their own advantages, and they all deserve continuous attention.
Celestia’s technology still needs to be verified by the market, and ETH and Celestia may also converge technically in the future.

4. Decentralized Computation

Although we have observed a few decentralized computing projects, we believe that the development of decentralized computing is still in its nascent stages. One of the major challenges faced in this area is verifying the accuracy of computations.

Further Explanation

Full decentralization is not always necessary. Currently, there are three main types of DApp architectures available. Centralized services can be beneficial in situations that require high performance and involve arbitrarily complex computations.

The difference between the consensus layer and the settlement layer is often misunderstood. To clarify, I will walk through the four functions of a blockchain (consensus, data availability, execution, and settlement) using Ethereum’s ZK Rollup as an example.

After transactions occur on Layer 2, they are submitted to the Sequencer who batches and rolls them up before submitting them to the smart contract on the ETH blockchain. As the rollup is added to the ETH chain, consensus on the order of transactions is confirmed and ETH becomes the consensus layer of the Rollup. As Layer 2 transactions are stored on the ETH blockchain, ETH also serves as the DA (Data Availability) layer for Layer 2.
Layer 2 nodes perform transaction execution, alter the global state of Layer 2, and generate zero-knowledge proofs. Layer 2 serves as the execution layer.
Layer 2 submits the ZKP to ETH, where the ETH contract verifies its validity. Once the proof is accepted, the new state of Layer 2 is confirmed. ETH serves as the settlement layer for Layer 2 zk rollup.
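
The division of roles described above can be sketched as a toy model. Everything here is hypothetical and heavily simplified: the `Layer1` class stands in for the rollup contract on ETH, and the "proof" is only a hash binding used as a placeholder, since a real ZK rollup verifies a zero-knowledge proof of correct execution.

```python
import hashlib
import json
from dataclasses import dataclass, field


def digest(obj) -> str:
    """Deterministic SHA-256 hash of a JSON-serializable object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


@dataclass
class Layer1:
    """Hypothetical stand-in for the rollup contract on ETH."""
    state_root: str                               # settlement: last accepted L2 state
    batches: list = field(default_factory=list)   # consensus + DA: ordered, published data

    def submit_batch(self, batch: list) -> int:
        """Sequencer posts a batch; its position fixes the transaction order."""
        self.batches.append(batch)
        return len(self.batches) - 1

    def settle(self, batch_index: int, new_root: str, proof: str) -> bool:
        """Accept a new L2 state root if the (placeholder) proof checks out."""
        expected = digest([self.state_root, self.batches[batch_index], new_root])
        if proof != expected:
            return False
        self.state_root = new_root
        return True


def execute(state: dict, batch: list) -> dict:
    """Execution layer: L2 nodes apply the transfers to L2 balances."""
    new_state = dict(state)
    for tx in batch:
        new_state[tx["from"]] = new_state.get(tx["from"], 0) - tx["amount"]
        new_state[tx["to"]] = new_state.get(tx["to"], 0) + tx["amount"]
    return new_state


if __name__ == "__main__":
    genesis = {"alice": 10, "bob": 0}
    l1 = Layer1(state_root=digest(genesis))
    batch = [{"from": "alice", "to": "bob", "amount": 3}]
    index = l1.submit_batch(batch)                         # consensus and DA on L1
    new_state = execute(genesis, batch)                    # execution on L2
    new_root = digest(new_state)
    proof = digest([l1.state_root, batch, new_root])       # placeholder for a ZKP
    print("settled:", l1.settle(index, new_root, proof))   # settlement on L1
```

The point of the sketch is the separation of concerns: ordering and publishing data, executing transactions, and accepting the resulting state are three distinct steps that can be assigned to different layers.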

There are other types of data-related projects, such as:

Projects that focus on indexing on-chain data, such as The Graph and Space and Time, or indexing IPFS data, such as Filecoin Indexer.
CDN networks, including Livepeer, Meson Network, Media.network, and others.
Storage node reputation markets like Filgram, Filrep, and Cidgravity, with UI/UX examples such as Web3.storage and NFT.storage.

References

Original Report: https://img.foresightnews.pro/file/Blockchain_Dapp_Storage.pdf

Data Availability: https://ethereum.org/en/developers/docs/data-availability/

Danksharding: https://ethereum.org/en/roadmap/danksharding/

EIP-4844: https://eips.ethereum.org/EIPS/eip-4844

Celestia: https://celestia.org/

Scroll: https://scroll.io/blog/architecture

Alchemy: https://www.alchemy.com/best/decentralized-computing-tools

Avail: https://www.availproject.org/

EigenLayer: https://www.eigenlayer.xyz/

Messari report: https://messari.io/report/the-essential-guide-to-decentralized-storage-networks?utm_source=twitter_messaricrypto&utm_medium=organic_social&utm_campaign=guide_to_decentralized_storage

Filecoin: https://filecoin.io/

Arweave: http://arweave.org/

StorJ: https://www.storj.io/

Sia: https://sia.tech/

Ceramic: https://ceramic.network/

OrbitDB: https://github.com/orbitdb

Tableland: https://tableland.xyz/

Polybase: https://polybase.xyz/

Web3Q: https://www.web3q.io/

Kwil: https://www.kwil.com/

KYVE: https://www.kyve.network/

Space and Time: https://www.spaceandtime.io/

About Foresight Ventures

Foresight Ventures is dedicated to backing the disruptive innovation of blockchain for the next few decades. We manage multiple funds: a VC fund, an actively-managed secondary fund, a multi-strategy FOF, and a private market secondary fund, with AUM exceeding $400 million. Foresight Ventures adheres to the belief of “Unique, Independent, Aggressive, Long-Term mindset” and provides extensive support for portfolio companies within a growing ecosystem. Our team is composed of veterans from top financial and technology companies like Sequoia Capital, CICC, Google, Bitmain, and many others.

Website: https://www.foresightventures.com/

Twitter: https://twitter.com/ForesightVen

Medium: https://foresightventures.medium.com

Substack: https://foresightventures.substack.com

Discord: https://discord.com/invite/maEG3hRdE3

Linktree: https://linktr.ee/foresightventures

Disclaimer: All articles by Foresight Ventures are not intended to be investment advice. Individuals should assess their own risk tolerance and make investment decisions prudently.