How the Decentralization State of Crypto Opens Up a Huge Data Pool & Why This Would Change the User-Data Relationship? | by Siaw Hui | Sep, 2022

Data by the users for the users

We are all too familiar with the eerie feeling of seeing recommendations in an app right after a topic came up in conversation (as if our smartphone were eavesdropping on us 24/7). Having an algorithm flood us with content related to whatever we last viewed or watched isn’t great either.

Web2 tech companies hoard our data for free & use it back on us, at the expense of our attention span, through a bombardment of ads & recommendations.

It’s all big data at play.

In the current digital world, the abundance of data makes it the new gold.

The irony is that the users who produce the data are the ones being sold to, using data collected from them free of charge: ads, content & whatever else the current world can make out of this gold mine.

The familiar frustration of “skip ads” & the dopamine hits, day in and day out: that’s the game current social media plays with its users.

For Web2 companies, what a time to live in: sitting on top of such a huge gold mine, able to hoard data from users & sell products/services back to them.

However, as grand as the idea of big data changing the world is, in reality the data we see today remains gated behind each company that helps to generate it.

Companies are bound to act in the best interest of shareholders’ profitability & at the same time bear the risk of breaching anti-competition law, so any form of partnership or collaboration with competitors is not in their best interest. As a result, each of them protects its data as a form of competitive advantage.

So, how can Web2 “big data” change things when accessibility & coverage remain missing from all the data pools we see today?

The advancement of technology, adoption of smartphones & addiction to social media have together created an influx of digital data, especially on individuals’ behaviors, that was not possible a few decades ago.

The thought that big tech like Google perhaps knows us better than we know ourselves sounds scary & interesting, but it is only possible because many of its products are widely used in their respective domains.

However, as much as Google is able to track certain data, it does not have visibility into the bigger picture, which would include transaction-based data like e-commerce transactions or banking data.

So, this creates data clusters that are neither accessible nor have the coverage to show the bigger picture of how each user, segment, industry or landscape is performing.

But what if there is data out there that provides not just coverage & free accessibility, but is also real-time, which makes it even more valuable?

Crypto data that is transparent, accurate & real-time is only possible with blockchain, the underlying digital ledger technology that allows crypto projects to be decentralized, or at least to make progress towards decentralization.

Any application that runs on Ethereum (the dominant smart-contract blockchain in the crypto space) will have its data captured on the Ethereum blockchain.

This is what is referred to as on-chain data.

On-chain data opens up a whole new avenue for the data world, given that it’s a free “gold” readily available for us to mine.

This data has to be extracted and processed into easily readable formats before it can be analyzed. Ethereum ETL is an open-source project that allows users to convert blockchain data into convenient formats such as CSV.
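As a rough illustration of what working with such an export might look like, the sketch below parses a small CSV of transactions and converts each value from wei to ether. The column names (hash, from_address, to_address, value) are assumed to match a typical transaction export; they, the made-up sample rows and the helper name load_transfers are illustrative assumptions, not the tool's guaranteed output.

```python
import csv
import io
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18  # 1 ether = 10^18 wei

# Hypothetical sample rows; hashes and addresses are made up and shortened.
SAMPLE_CSV = """hash,from_address,to_address,value
0xaaa,0xalice,0xbob,1500000000000000000
0xbbb,0xbob,0xcarol,250000000000000000
0xccc,0xalice,0xcarol,0
"""


def load_transfers(csv_text):
    """Parse transaction rows and convert the wei value to ether."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "hash": row["hash"],
            "from": row["from_address"],
            "to": row["to_address"],
            # Decimal avoids float rounding on 18-decimal amounts.
            "ether": Decimal(row["value"]) / WEI_PER_ETHER,
        })
    return rows


transfers = load_transfers(SAMPLE_CSV)
for t in transfers:
    print(f'{t["from"]} -> {t["to"]}: {t["ether"]} ETH')
```

In practice the CSV would come from a file on disk rather than an inline string; the parsing step stays the same.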

Transactions that happen on the blockchain are captured on-chain, in a record that is irreversible & transparent. In short, on-chain data provides data that was previously locked inside individual banks, institutions & companies, such as:

  • Transactional data between sellers & buyers
  • Amount of assets (bitcoin, ether, etc.) a user holds in his/her wallet
  • Amount of assets that a crypto project holds

Given that the crypto space has been booming, on-chain data that shows the bigger picture of how the landscape is performing helps users decide where they want to stash their assets & which projects are performing well.

Below are some examples of data for each sector within the crypto space.


Decentralized Finance (DeFi for short) has been gaining traction in the past 2 years, with 2020–2021 dubbed DeFi Summer. The total value of assets locked in DeFi as of 20th Aug 2022 amounted to $60.58 billion.

Various data on DeFi from Token Terminal & The Block


From 2021 to early 2022, the NFT (short for non-fungible token) market garnered huge interest, adoption & development, thanks to collections like Bored Ape Yacht Club, CryptoPunks & Azuki.

The boom in the NFT market has contributed to a massive influx of data on NFT transactions & NFT projects (e.g. floor prices, transactions between users, the volume of a particular NFT project). The next phase of NFTs will lean heavily on their utility.
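To make two of the metrics mentioned above concrete, here is a minimal sketch that derives a floor price (taken here to mean the lowest current listing price in a collection) and a sales volume (the sum of sale prices) from raw records. The collection names and prices are invented for illustration and are not real market figures.

```python
from collections import defaultdict

# Hypothetical records: (collection, price in ETH). Not real market data.
listings = [
    ("CollectionA", 72.5),
    ("CollectionA", 70.0),
    ("CollectionB", 65.0),
    ("CollectionB", 68.25),
]
sales = [
    ("CollectionA", 71.0),
    ("CollectionA", 74.0),
    ("CollectionB", 66.0),
]


def floor_prices(listings):
    """Lowest listed price per collection."""
    floors = {}
    for collection, price in listings:
        floors[collection] = min(price, floors.get(collection, price))
    return floors


def sales_volume(sales):
    """Total value of completed sales per collection."""
    totals = defaultdict(float)
    for collection, price in sales:
        totals[collection] += price
    return dict(totals)


floors = floor_prices(listings)
volume = sales_volume(sales)
print(floors)
print(volume)
```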

Various data on NFT from Nansen, Dune Analytics, Dappradar


Play-to-earn or GameFi (short for Game + Finance) changes the way games work: in-game components can be converted to crypto tokens, or winning games can earn crypto tokens, which can be sold for fiat.

In short, a crypto game allows players to withdraw value out of its gaming ecosystem, which most Web2 games do not allow.

GameFi saw a leap in adoption & interest through Axie Infinity, which acted as an income replacement for some players during Covid-19. However, the tokenomics of most GameFi projects are not sustainable, & it will be interesting to see how the GameFi industry innovates in the coming years.

GameFi data from Footprint Analytics


A DAO, short for Decentralized Autonomous Organization, has become a means to manage community, decisions, voting & treasury funds in the most transparent way, as the funds & decisions are all recorded on the blockchain.

This is exactly how we define bottom-up decision making: community members create proposals within a DAO and the community votes on them, instead of a core team within the DAO deciding everything.
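The proposal-and-vote flow described above can be sketched in a few lines. This is a simplified model under stated assumptions: votes are weighted by token holdings and a proposal passes when turnout meets a quorum and "for" outweighs "against". Real DAOs differ in their rules, and the class, method names and numbers below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    title: str
    # voter address -> (choice, voting weight); hypothetical structure
    votes: dict = field(default_factory=dict)

    def cast(self, voter, choice, weight):
        """Record a vote; voting again replaces the earlier vote."""
        if choice not in ("for", "against"):
            raise ValueError("choice must be 'for' or 'against'")
        self.votes[voter] = (choice, weight)

    def tally(self):
        """Sum the voting weight behind each choice."""
        totals = {"for": 0, "against": 0}
        for choice, weight in self.votes.values():
            totals[choice] += weight
        return totals

    def passed(self, quorum):
        """Pass only if turnout reaches quorum and 'for' has the majority."""
        totals = self.tally()
        turnout = totals["for"] + totals["against"]
        return turnout >= quorum and totals["for"] > totals["against"]


proposal = Proposal("Fund community grants")
proposal.cast("0xalice", "for", 400)
proposal.cast("0xbob", "against", 250)
proposal.cast("0xcarol", "for", 100)
print(proposal.tally(), proposal.passed(quorum=500))
```

Because every vote would be recorded on-chain, anyone could rerun a tally like this and check the outcome for themselves.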

DAO data acquired from DeepDao & Openorgs

Users’ Transactions & Assets Data

The largest chunk of data within the crypto space would be the users’ transactional data, which feeds into all the data sets mentioned above.

By looking into each user’s wallet, we are able to identify the assets they have accumulated & their activities. By understanding these two key components across a large number of users, especially those holding large amounts of assets, we are able to gauge market interest.
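As a hedged sketch of this idea, the snippet below rebuilds wallet balances from a list of token transfers and then flags the largest holders. The transfer records, the "MINT" marker for newly issued tokens and the threshold are all made up for illustration; real analytics platforms work from full on-chain history.

```python
from collections import defaultdict

# Hypothetical token transfers: (sender, receiver, amount).
# "MINT" is an illustrative marker for newly issued tokens.
token_transfers = [
    ("MINT", "0xalice", 1000),
    ("MINT", "0xbob", 200),
    ("0xalice", "0xcarol", 300),
    ("0xbob", "0xcarol", 50),
]


def balances_from_transfers(transfers):
    """Net balance per wallet: tokens received minus tokens sent."""
    balances = defaultdict(int)
    for sender, receiver, amount in transfers:
        if sender != "MINT":
            balances[sender] -= amount
        balances[receiver] += amount
    return dict(balances)


def large_holders(balances, threshold):
    """Wallets holding at least `threshold`, largest first."""
    holders = [addr for addr, bal in balances.items() if bal >= threshold]
    return sorted(holders, key=lambda addr: balances[addr], reverse=True)


balances = balances_from_transfers(token_transfers)
whales = large_holders(balances, threshold=300)
print(balances)
print(whales)
```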

Users’ Wallet Address Details obtained from

While all this data proves that big data with transparency, accuracy & coverage is possible, on-chain data coverage alone is still not sufficient. Including other data, for example weather data, financial data & economic data, alongside the on-chain data would help complete the bigger picture and create a holistic view.

The most notable project that helps to link these data sources together would be Chainlink, a decentralized oracle network that brings off-chain data onto the blockchain.

Chainlink as the bridge between off-chain & on-chain data to provide a holistic view

As great as on-chain data has proved to be, it is a double-edged sword. Issues pertaining to privacy & security will be among the key barriers that prevent individuals from adopting crypto.

A world that opens up to transparency will inevitably push the button on privacy matters. We are all used to having a centralized institution govern our data, on the presumption that it is safely guarded.

In reality, putting all our trust in a centralized institution to govern this data or these assets exposes us to a huge risk.

The ever-changing & evolving landscape of crypto is both exciting & challenging. The fast pace of adoption & innovation keeps pushing forward potential solutions to the problems that have mushroomed from that same innovation.

We have seen that on-chain data reveals pretty much all the sensitive data, like your wallet information, without revealing much about who the user is (unless you have an Ethereum Name Service name tagged to the wallet). Of course, the privacy issues surrounding this may make many people (especially non-crypto individuals) uncomfortable, knowing that everyone can freely peek into their wallet, which may deter them from adopting crypto.

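CoinJoin is a technique that provides anonymity & protects the privacy of Bitcoin users by combining payments from several users into a single transaction (a process often called coin mixing), which makes it harder to tell which input paid which output. Wallets that support it include Wasabi Wallet & Samourai Wallet, with coordination typically routed over the Tor network.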
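The core idea of coin mixing can be shown with a toy model. The sketch below is not how any particular wallet implements CoinJoin: it ignores fees, signatures and the coordination protocol, and every name and amount in it is hypothetical. It only shows why equal-valued, shuffled outputs are hard to link back to specific inputs.

```python
import random


def build_coinjoin(participants, denomination, rng):
    """Toy CoinJoin: one equal-valued output per participant.

    participants: list of (input_amount, fresh_output_address, change_address).
    Returns (mixed_outputs, change_outputs) as lists of (address, amount).
    Fees are ignored in this simplified model.
    """
    mixed, change = [], []
    for amount, out_addr, change_addr in participants:
        if amount < denomination:
            raise ValueError("input is smaller than the mixing denomination")
        mixed.append((out_addr, denomination))
        if amount > denomination:
            # Change has a distinctive amount, so it stays linkable to its input.
            change.append((change_addr, amount - denomination))
    rng.shuffle(mixed)  # output order must not mirror input order
    return mixed, change


# Hypothetical participants; amounts in satoshis.
participants = [
    (1_200_000, "fresh_addr_1", "change_addr_1"),
    (1_000_000, "fresh_addr_2", "change_addr_2"),
    (1_750_000, "fresh_addr_3", "change_addr_3"),
]
mixed, change = build_coinjoin(participants, 1_000_000, random.Random(7))
print(mixed)
print(change)
```

An outside observer sees three identical outputs of the same amount, so the amounts alone no longer reveal which participant owns which one.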

Other wallets like Sparrow Wallet & Mercury Wallet leverage different technologies to push for stronger privacy & security for Bitcoin wallets.

Stealth addresses for NFTs on the Ethereum blockchain, proposed by Vitalik Buterin, a co-founder of Ethereum, are a push for more anonymity in the space.

Implementing anonymity to protect users while still upholding the transparency that befits crypto culture is no easy feat, and it remains to be seen whether the right balance can be achieved.
