With artificial intelligence (AI) seemingly destined to become central to everyday digital applications and services, anchoring AI models on public blockchains potentially helps to “establish a permanent provenance trail,” asserted Michael Heinrich, CEO of 0G Labs. According to Heinrich, such a provenance trail enables “ex-post or real-time monitoring analysis” to detect any tampering, injection of bias, or use of problematic data during AI model training.
Anchoring AI on Blockchain Aids in Fostering Public Trust
In his detailed responses to questions from Bitcoin.com News, Heinrich—a poet and software engineer—argued that anchoring AI models in this way helps maintain their integrity and fosters public trust. Additionally, he suggested that public blockchains’ decentralized nature allows them to “serve as a tamper-proof and censorship-resistant registry for AI systems.”
Turning to data availability, or the lack thereof, the 0G Labs CEO said this is a concern for developers and users alike. For developers building on layer 2 solutions, data availability matters because their respective “applications need to be able to rely on secure light client verification for correctness.” For users, data availability assures them that a “system is operating as intended without having to run full nodes themselves.”
Despite its importance, data availability remains costly, accounting for up to 90% of transaction costs. Heinrich attributes this to Ethereum’s limited data throughput, which stands at approximately 83KB/sec. Consequently, even small amounts of data become prohibitively expensive to publish on-chain, Heinrich said.
Below are Heinrich’s detailed answers to all the questions sent.
Bitcoin.com News (BCN): What is the data availability (DA) problem that has been plaguing the Ethereum ecosystem? Why does it matter to developers and users?
Michael Heinrich (MH): The data availability (DA) problem refers to the need for light clients and other off-chain parties to be able to efficiently access and verify the entire transaction data and state from the blockchain. This is crucial for scalability solutions like Layer 2 rollups and sharded chains that execute transactions off the main Ethereum chain. The blocks containing executed transactions in Layer 2 networks need to be published and stored somewhere for light clients to conduct further verification.
This matters for developers building on these scaling solutions, as their applications need to be able to rely on secure light client verification for correctness. It also matters for users interacting with these Layer 2 applications, as they need assurance that the system is operating as intended without having to run full nodes themselves.
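As a rough illustration of why availability matters to light clients, consider the toy sketch below: a client randomly samples chunks of a published blob and only accepts it if every sampled chunk can actually be retrieved. This is a simplified stand-in for real data-availability sampling, not 0G’s or Ethereum’s protocol, and all names in it are hypothetical.

```python
import random

def light_client_sampling(fetch_chunk, total_chunks: int, samples: int = 30) -> bool:
    """Toy data-availability check: randomly sample chunk indices and
    require every sampled chunk to be retrievable."""
    for index in random.sample(range(total_chunks), min(samples, total_chunks)):
        if fetch_chunk(index) is None:  # chunk withheld or unavailable
            return False
    return True

# Example: a blob split into 256 chunks, with one chunk withheld by the publisher.
blob = {i: f"chunk-{i}".encode() for i in range(256)}
del blob[42]  # simulate withheld data
# Usually True: 30 random samples rarely hit the single withheld chunk,
# which is why real protocols add erasure coding on top of sampling.
print(light_client_sampling(blob.get, total_chunks=256))
```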
BCN: According to a Blockworks Research report, DA costs account for up to 90% of transaction costs. Why do existing scalability solutions struggle to provide the performance and cost-effectiveness needed for high-performance decentralized applications (dapps)?
MH: Existing Layer 2 scaling approaches like Optimistic and ZK Rollups struggle to provide efficient data availability at scale because they need to publish entire data blobs (transaction data, state roots, etc.) on the Ethereum mainnet for light clients to sample and verify. Publishing this data on Ethereum incurs very high costs; for example, one OP block costs $140 to publish for only 218KB of data.
This is because Ethereum’s limited data throughput of around 83KB/sec means even small amounts of data are very expensive to publish on-chain. So while rollups achieve scalability by executing transactions off the main chain, the need to publish data on Ethereum for verifiability becomes the bottleneck limiting their overall scalability and cost-effectiveness for high-throughput decentralized applications.
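Taking the figures Heinrich cites at face value, a quick back-of-the-envelope calculation (purely illustrative) shows why data-heavy applications cannot simply publish everything to Ethereum:

```python
# Back-of-the-envelope figures from the interview (illustrative only).
op_block_cost_usd = 140      # cost to publish one OP block
op_block_size_kb = 218       # size of that block's data
eth_da_throughput_kb_s = 83  # Ethereum's approximate DA throughput

cost_per_kb = op_block_cost_usd / op_block_size_kb
print(f"~${cost_per_kb:.2f} per KB published on-chain")   # ~$0.64 per KB
print(f"~${cost_per_kb * 1024 * 1024:,.0f} per GB")       # roughly $670k per GB

# At ~83 KB/s, pushing 1 GB through the DA layer takes:
seconds_per_gb = (1024 * 1024) / eth_da_throughput_kb_s
print(f"~{seconds_per_gb / 3600:.1f} hours to publish 1 GB")  # ~3.5 hours
```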
BCN: Your company, 0G Labs, aka Zerogravity, recently launched its testnet with the goal of bringing artificial intelligence (AI) on-chain, a data burden that existing networks aren’t capable of handling. Could you tell our readers how the modular nature of 0G helps overcome the limitations of traditional consensus algorithms? What makes modular the right path to building complex use cases such as on-chain gaming, on-chain AI, and high-frequency decentralized finance?
MH: 0G’s key innovation is separating data into data storage and data publishing lanes in a modular manner. The 0G DA layer sits on top of the 0G storage network, which is optimized for extremely fast data ingestion and retrieval. Large data like block blobs gets stored, and only tiny commitments and availability proofs flow through to the consensus protocol. This removes the need to transmit the entire blobs across the consensus network and thereby avoids the broadcast bottlenecks of other DA approaches.
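A minimal sketch of that storage/publishing split, assuming a plain hash stands in for whatever commitment scheme 0G actually uses (the function names and data structures below are illustrative, not 0G’s API):

```python
import hashlib

def publish_blob(blob: bytes, storage_layer: dict, consensus_log: list) -> str:
    """Sketch of a storage/publishing split: the full blob goes to the
    storage lane, while only a small commitment is broadcast on the
    consensus (publishing) lane."""
    commitment = hashlib.sha256(blob).hexdigest()
    storage_layer[commitment] = blob   # heavy data: storage network
    consensus_log.append(commitment)   # tiny data: consensus network
    return commitment

def verify_blob(commitment: str, storage_layer: dict) -> bool:
    """A verifier retrieves the blob from storage and checks it against
    the commitment recorded by consensus."""
    blob = storage_layer.get(commitment)
    return blob is not None and hashlib.sha256(blob).hexdigest() == commitment

storage, log = {}, []
c = publish_blob(b"rollup block #1001 transactions...", storage, log)
print(verify_blob(c, storage))  # True
```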
In addition, 0G can horizontally scale consensus layers to prevent any single consensus network from becoming a bottleneck, thereby achieving infinite DA scalability. With an off-the-shelf consensus system the network could achieve speeds of 300-500MB/s, which is already a couple of orders of magnitude faster than current DA systems but still falls short of the data bandwidth requirements of high-performance applications such as LLM model training, which can be in the tens of GB/s.
A custom consensus build could achieve such speeds, but what if many participants want to train models at the same time? Thus, we introduced infinite scalability through sharding at the data level to meet the future demands of high-performance blockchain applications by utilizing an arbitrary number of consensus layers. All the consensus networks share the same set of validators with the same staking status, so they maintain the same level of security.
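One way to picture the data-level sharding Heinrich describes (again, a toy illustration rather than 0G’s implementation) is to deterministically assign blob commitments across several parallel consensus lanes that all share one validator set:

```python
import hashlib

VALIDATORS = ["validator-A", "validator-B", "validator-C"]  # one shared, equally staked validator set
consensus_shards = [[] for _ in range(4)]                   # several parallel consensus lanes

def assign_to_shard(commitment: str, num_shards: int) -> int:
    """Deterministically map a blob commitment to a consensus shard so the
    publishing load spreads across lanes instead of piling onto one network."""
    return int(hashlib.sha256(commitment.encode()).hexdigest(), 16) % num_shards

for i in range(10):
    commitment = hashlib.sha256(f"blob-{i}".encode()).hexdigest()
    shard = assign_to_shard(commitment, len(consensus_shards))
    consensus_shards[shard].append(commitment)

# Every shard is secured by the same VALIDATORS set, so splitting the data
# across lanes does not split the security.
print([len(shard) for shard in consensus_shards])
```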
To summarize, this modular architecture enables scaling to handle extremely data-heavy workloads like on-chain AI model training/inference, on-chain gaming with large state requirements, and high-frequency DeFi applications with minimal overhead. These applications are not possible on monolithic chains today.
BCN: The Ethereum developer community has explored many different ways to address the issue of data availability on the blockchain. Proto-danksharding, or EIP-4844, is seen as a step in that direction. Do you believe that these will fall short of meeting the needs of developers? If yes, why and where?
MH: Proto-danksharding (EIP-4844) takes an important step towards improving Ethereum’s data availability capabilities by introducing blob storage. The ultimate step will be Danksharding, which divides the Ethereum network into smaller segments, each responsible for a specific group of transactions. This will result in a DA speed of more than 1 MB/s. However, this still will not meet the needs of future high-performance applications as discussed above.
BCN: What is 0G’s “programmable” data availability and what sets it apart from other DAs in terms of scalability, security, and transaction costs?
MH: 0G’s DA system can enable the highest scalability of any blockchain, e.g., at least 50,000 times higher data throughput and 100x lower costs than Danksharding on the Ethereum roadmap without sacrificing security. Because we build the 0G DA system on top of 0G’s decentralized storage system, clients can determine how to utilize their data. So, programmability in our context means that clients can program/customize data persistence, location, type, and security. In fact, 0G will allow clients to dump their entire state into a smart contract and load it again, thereby solving the state bloat problem plaguing many blockchains today.
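To make “programmable” concrete, here is a hedged sketch of what client-side customization of data handling might look like; the parameter names and the single config object are assumptions made for illustration, not 0G’s actual interface.

```python
from dataclasses import dataclass

@dataclass
class DataPolicy:
    """Hypothetical per-client data policy: the client, not the protocol,
    chooses how long data persists, where it lives, and how it is secured."""
    persistence: str = "permanent"   # e.g. "permanent", "30d", "ephemeral"
    location: str = "any"            # e.g. a region or node-set preference
    data_type: str = "blob"          # e.g. "blob", "state-dump", "model-weights"
    replication: int = 3             # desired redundancy level

def publish(data: bytes, policy: DataPolicy) -> dict:
    # A real client would hand the blob to the storage network and record a
    # commitment plus this policy; here we simply echo the request.
    return {"size_bytes": len(data), "policy": policy}

# A client dumping its entire contract state with custom handling:
print(publish(b"entire contract state snapshot", DataPolicy(data_type="state-dump")))
```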
BCN: As AI becomes an integral part of Web3 applications and our digital lives, it’s crucial to ensure that the AI models are fair and trustworthy. Biased AI models trained on tampered or fake data could wreak havoc. What are your thoughts on the future of AI and the role blockchain’s immutable nature could play in maintaining the integrity of AI models?
MH: As AI systems become increasingly central to digital applications and services affecting many lives, ensuring their integrity, fairness and auditability is paramount. Biased, tampered or compromised AI models could lead to widespread harmful consequences if deployed at scale. Imagine the horror scenario of an evil AI agent training another model or agent that then gets deployed directly into a humanoid robot.
Blockchain’s core properties of immutability, transparency and provable state transitions can play a vital role here. By anchoring AI models, their training data, and the full auditable records of the model creation/updating process on public blockchains, we can establish a permanent provenance trail. This allows ex-post or real-time monitoring analysis to detect any tampering, injection of bias, use of problematic data, etc. that may have compromised the integrity of the models.
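A minimal sketch of such a provenance trail, assuming nothing more than hashing model artifacts and appending the digests to an append-only log (a real deployment would anchor these records in on-chain transactions; all names here are illustrative):

```python
import hashlib
import time

provenance_log = []  # stand-in for an append-only, on-chain registry

def anchor(artifact: str, payload: bytes, note: str) -> dict:
    """Record a tamper-evident fingerprint of a model artifact or dataset."""
    record = {
        "artifact": artifact,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "note": note,
        "timestamp": int(time.time()),
    }
    provenance_log.append(record)
    return record

def verify(artifact: str, payload: bytes) -> bool:
    """Later audit: does the artifact still match what was anchored?"""
    digest = hashlib.sha256(payload).hexdigest()
    return any(r["artifact"] == artifact and r["sha256"] == digest
               for r in provenance_log)

anchor("training-data-manifest", b"dataset v1 file hashes...", "pre-training snapshot")
anchor("model-weights-v1", b"...serialized weights...", "post-training checkpoint")
print(verify("model-weights-v1", b"...serialized weights..."))  # True: matches the anchored digest
print(verify("model-weights-v1", b"tampered weights"))          # False: tampering detected
```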
Decentralized blockchain networks, by avoiding single points of failure or control, can serve as a tamper-proof and censorship-resistant registry for AI systems. Their transparency allows public auditability of the AI supply chain in a way that is very difficult with current centralized and opaque AI development pipelines. Imagine an AI model with beyond-human intelligence: say it reported a result, but all it actually did was alter database entries on a central server without ever producing that result. In other words, it can cheat more easily in centralized systems.
Also, how do we provide the model/agent with the right incentive mechanisms and place it in an environment where it can’t be evil? Blockchain x AI is the answer, so that future societal use cases like traffic control, manufacturing, and administrative systems can truly be governed by AI for human good and prosperity.
What are your thoughts on this interview? Share your opinions in the comments section below.