Opinion by: Michael O’Rourke, founder of Pocket Network and CEO of Grove
Open data is a major contributor to the emerging global tech economy, with an estimated market of over $350 billion. Many open data sources, however, rely on centralized infrastructure, which runs contrary to the philosophy of autonomy and censorship resistance.
To realize its potential, open data must shift to decentralized infrastructure. Once open data channels start using decentralized, open infrastructure, many of the vulnerabilities that plague user applications will be addressed.
Open infrastructure has many use cases, from hosting a decentralized application (DApp) or a trading bot to sharing research data and training and running inference on large language models (LLMs). Looking closely at each use case helps explain why decentralized infrastructure serves open data better than centralized infrastructure.
Affordable LLM training and inference
The launch of the open-source AI model DeepSeek, which wiped out $1 trillion in value from US tech markets, demonstrates the power of open-source protocols. It's a wake-up call to focus on the new world economy of open data.
To begin with, closed-source, centralized AI providers incur high costs both to train LLMs and to generate accurate results.
By contrast, the final stage of training DeepSeek R1 cost just about $5.5 million, compared with over $100 million for OpenAI's GPT-4. Yet the emerging AI industry still relies on centralized infrastructure platforms like LLM API providers, which are fundamentally at odds with open-source innovation.
Hosting open-source LLMs like Llama 2 and DeepSeek R1 is simple and inexpensive. Unlike stateful blockchains requiring constant syncing, LLMs are stateless and only need periodic updates.
Despite this simplicity, the computational cost of running inference on open-source models remains high, as node runners need GPUs. Even so, these models save costs relative to stateful systems because they don't require real-time updates to continuously sync.
The rise of generalizable base models like GPT-4 has enabled the development of new products through contextual inference. Centralized companies like OpenAI, however, won't allow arbitrary networks to host or serve inference from their trained models.
On the contrary, decentralized node runners can support the development of open-source LLMs by serving as AI endpoints to provide deterministic data to clients. Decentralized networks lower entry barriers by empowering operators to launch their gateway on top of the network.
These decentralized infrastructure protocols serve millions of requests on their permissionless networks by open-sourcing the core gateway and service infrastructure. Consequently, any entrepreneur or operator can deploy their gateway and tap into an emerging market.
For example, someone can train an LLM with decentralized computing resources on the permissionless protocol Akash, which enables customized computing services at 85% lower prices than centralized cloud providers.
The AI training and inference market has immense potential. AI companies spend approximately $1 million daily on infrastructure to run LLM inference, which puts the serviceable addressable market, or SAM, at roughly $365 million annually.
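The SAM figure above is a straightforward annualization of the daily spend. A minimal back-of-envelope sketch, using only the article's own estimate of ~$1 million per day:

```python
# Back-of-envelope check of the SAM figure cited above.
# Assumption (from the article): AI companies spend ~$1 million per day
# on infrastructure to run LLM inference. This is the article's estimate,
# not a measured figure.
DAILY_INFRA_SPEND_USD = 1_000_000


def annual_serviceable_market(daily_spend_usd: float, days_per_year: int = 365) -> float:
    """Annualize a daily infrastructure spend into a serviceable market estimate."""
    return daily_spend_usd * days_per_year


sam = annual_serviceable_market(DAILY_INFRA_SPEND_USD)
print(f"Estimated SAM: ${sam:,.0f} per year")  # Estimated SAM: $365,000,000 per year
```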
As these figures suggest, market conditions point to massive growth potential for decentralized infrastructure.
Accessible research data sharing
In the scientific and research domain, data sharing combined with machine learning and LLMs can accelerate research and improve human lives. Access to that data, however, has been walled in by the high-cost journal system, which selectively publishes research its boards approve of and locks it behind expensive subscriptions.
With the rise of blockchain-based zero-knowledge ML models, data can now be shared and computed on trustlessly, with privacy preserved. Researchers and scientists can thus share and access research data without exposing restricted personally identifiable information.
To sustainably share open research data, researchers need access to a decentralized infrastructure that compensates them for providing access to that data, cutting out the middleman. An incentivized open data network can ensure that scientific data remains accessible outside the walled garden of expensive journals and private corporations.
Unstoppable DApp hosting
Centralized data hosting platforms such as Amazon Web Services, Google Cloud and Microsoft Azure are popular among app developers. Despite their easy accessibility, these platforms represent a single point of failure, undermining reliability and leading to rare but consequential outages.
There are various instances in tech history when Infrastructure-as-a-Service platforms have failed to provide uninterrupted services.
For example, in 2022, MetaMask temporarily denied access to users in specific geographical regions because Infura blocked them following US sanctions. Although Ethereum itself is decentralized, MetaMask's default connections and endpoints depend on centralized providers like Infura to access the network.
This wasn't an isolated incident, either. Infura clients also faced an interruption in 2020, while Solana and Polygon saw centralized remote procedure call (RPC) endpoints overloaded during peak traffic.
It is difficult for one company to handle diverse developer needs in a thriving open-source ecosystem. There are thousands of layer 1s, rollups, indexing, storage and other middleware protocols with niche use cases.
Most centralized platforms, like RPC providers, keep rebuilding the same infrastructure, which creates friction, slows growth and limits scalability, because providers focus on reconstructing the foundation instead of adding new features.
On the contrary, the massive success of decentralized social applications like Bluesky, built on the AT Protocol, signals users' appetite for decentralized alternatives. Such protocols, which move past centralized RPCs toward open data access, remind us of the need to build on decentralized infrastructure.
For example, a decentralized finance protocol can source onchain price data from Chainlink to stop depending on centralized APIs for price feeds and real-time market data.
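As a hedged sketch of what that looks like in practice, the snippet below reads a Chainlink price feed directly from the chain using web3.py. The `RPC_URL` environment variable and the `read_feed` helper are illustrative assumptions; the ABI fragment is the standard Chainlink AggregatorV3Interface, and feeds return fixed-point integers that must be scaled by the feed's `decimals` value.

```python
# Sketch: reading an onchain Chainlink price feed instead of a centralized API.
# Assumes web3.py is installed and an Ethereum RPC endpoint is supplied via the
# (hypothetical) RPC_URL environment variable.
import os

# Minimal ABI fragment for Chainlink's AggregatorV3Interface.
AGGREGATOR_V3_ABI = [
    {"name": "decimals", "inputs": [], "outputs": [{"name": "", "type": "uint8"}],
     "stateMutability": "view", "type": "function"},
    {"name": "latestRoundData", "inputs": [],
     "outputs": [{"name": "roundId", "type": "uint80"},
                 {"name": "answer", "type": "int256"},
                 {"name": "startedAt", "type": "uint256"},
                 {"name": "updatedAt", "type": "uint256"},
                 {"name": "answeredInRound", "type": "uint80"}],
     "stateMutability": "view", "type": "function"},
]


def scale_answer(raw_answer: int, decimals: int) -> float:
    """Chainlink feeds return fixed-point integers; scale by the feed's decimals."""
    return raw_answer / 10 ** decimals


def read_feed(feed_address: str) -> float:
    """Fetch the latest price from an aggregator contract (requires a live node)."""
    from web3 import Web3  # imported lazily so scale_answer stays dependency-free

    w3 = Web3(Web3.HTTPProvider(os.environ["RPC_URL"]))
    feed = w3.eth.contract(address=feed_address, abi=AGGREGATOR_V3_ABI)
    decimals = feed.functions.decimals().call()
    _, answer, *_ = feed.functions.latestRoundData().call()
    return scale_answer(answer, decimals)
```

In use, `read_feed` would be passed the address of a deployed Chainlink aggregator (for instance, the ETH/USD feed on Ethereum mainnet); no centralized price API is involved.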
There are roughly 100 billion serviceable RPC requests daily in the Web3 market, costing $3–$6 per million requests. That puts the total addressable market for Web3 RPC at $100 million–$200 million annually. With the steady growth of new data availability layers, daily RPC requests could exceed 1 trillion.
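As a rough check, those figures can be annualized directly. This sketch assumes the 100 billion requests are per day, which is consistent with the $100 million–$200 million annual range the article cites:

```python
# Rough sizing of the Web3 RPC market from the article's figures.
# Assumptions (from the article): ~100 billion serviceable RPC requests per day,
# priced at $3–$6 per million requests.
DAILY_REQUESTS = 100e9
PRICE_PER_MILLION_USD = (3.0, 6.0)  # low and high ends of the quoted range


def annual_rpc_market(daily_requests: float, price_per_million: float) -> float:
    """Annual revenue = daily requests x 365 days / 1e6 x price per million."""
    return daily_requests * 365 / 1e6 * price_per_million


low = annual_rpc_market(DAILY_REQUESTS, PRICE_PER_MILLION_USD[0])
high = annual_rpc_market(DAILY_REQUESTS, PRICE_PER_MILLION_USD[1])
print(f"Annual market: ${low / 1e6:.0f}M to ${high / 1e6:.0f}M")
# Annual market: $110M to $219M
```

The result, roughly $110 million to $219 million, matches the article's $100 million–$200 million estimate.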
It is imperative to pivot toward decentralized infrastructure to keep pace with open data flows and tap into the open-source data market.
Open data requires decentralized infrastructure
In the long term, we'll see generalized blockchain clients offload storage and networking to specialized middleware protocols.
For example, Solana led the decentralization movement when it first started storing its data on decentralized storage networks such as Arweave. No wonder Solana and Phantom were once again the primary tools for handling the massive TRUMP presidential memecoin traffic, a key moment in financial and cultural history.
In the future, we’ll see more data flow through infrastructure protocols, creating dependencies on middleware platforms. As protocols become more modular and scalable, it’ll make space for open-source, decentralized middleware to integrate at the protocol level.
It is infeasible for centralized companies to function as intermediaries for light client headers.
Decentralized infrastructure is trustless, distributed, cost-effective and censorship-resistant. As a result, decentralized infrastructure will be the default choice for app developers and companies alike, leading to a mutually beneficial growth narrative.
This article is for general information purposes and is not intended to be and should not be taken as legal or investment advice. The views, thoughts, and opinions expressed here are the author’s alone and do not necessarily reflect or represent the views and opinions of Cointelegraph.