‘You’ have always been the currency of the internet, but ‘how’ may change
From ads to data, how LLMs will change how businesses look for you online
We’ve all had that uncanny feeling that our phones are listening to us. You buy one vacuum cleaner and you continue to see ads for vacuum cleaners for weeks. You simply think ‘hey, I could use a new watch’ and suddenly your Instagram feed is full of modern timepieces.
Advertising has been a critical monetization strategy virtually since the internet’s inception. Documentaries such as Netflix’s The Social Dilemma dramatize online advertising as a corporate stalking enterprise. While these visualizations of villainous detectives scouring the internet for you are exaggerated, the meta-commentary is mostly accurate: advertisers are paying and competing to reach you, meaning your eyeballs, your attention, and ultimately your buying power.
As a result, the most valuable data has been your demographic information, your engagement, and your purchase history, all of which help predict whether you could be an advertiser’s next customer. Businesses pay display surfaces (like Google Search, YouTube, and many other websites) to reach you wherever you are on the web (and yes, sometimes to follow you around a bit in retargeting campaigns).
However, this may evolve with the proliferation of generative AI.
In its recent S-1, Reddit disclosed that over 10% of its revenue comes from selling its human-curated content to other companies, who in turn use it for purposes such as LLM training data. These models need human-provided data for the same reason you look at reviews instead of the product-marketing-speak on most businesses’ websites: the human perspective. This perspective is so important to folks searching the internet that Google even launched a new feature aptly called ‘Perspectives’, so the searcher can filter out all the prettified, SEO-optimized results and find what people are actually saying. Tomasz Tunguz, a venture capitalist at Theory, predicts that this trend will continue, and that revenue streams from user data, like Reddit’s, may reduce the need for ads on creator websites.
Additionally, with the proliferation of AI-generated content, much of the content on the internet may not even be created by humans. An article in Scientific American anticipates a problem with this: training new models on AI-generated data will “poison” them by introducing errors in a loop of cursed recursion. Even data provided by supposed humans on crowdsourcing platforms, like Amazon’s Mechanical Turk (“mTurk”), has recently been found to carry ChatGPT’s touch, since the tool is highly accurate and much faster for an individual microtask. As a result, original sources of human data may dwindle across the broader internet, posing a problem for the LLM trainers that require them.
While advertising revenue has traditionally fueled digital platforms, the rapid growth of AI may be setting us up for a paradigm shift. In this era of AI-generated content, authenticity is paramount, and genuine human data is poised to become even more valuable. Those detectives are still searching for you. They may just have found new places to look.
Do you think revenue streams from human user data will replace ads as the currency of the internet? If you are an AI researcher or developer, where do you get your trusted data? As a consumer, how do you feel about your organic content being sold as a monetization strategy instead of ads?