It should be clear that I’m an AI optimist. I think AI can make life for humanity much, much better than it’s ever been. I used to be a naive optimist, thinking that AI would almost automatically be good, because it would lack the biases and limitations we evolved with. I’m now a much more cautious AI proponent, but I still think we should build it.
That’s why it’s so sad for me to see the backlash that’s currently happening around the world against AI. People are so negative about the current crop of LLMs and image generators that many would like to stop the whole endeavor. This is understandable in some ways, but it would cause us to miss out on all the potential benefits. Sadder still, this backlash could have been avoided if the corporations building frontier AI models had done things differently. That includes OpenAI, Microsoft, Google, and even Anthropic.
Each thing the general public is quite reasonably upset about corresponds to a greedy, short-sighted decision the AI companies made:
- Consent: AI companies scraped the web (or used existing scrapes like Common Crawl), hoovering up gargantuan amounts of writing, images, and code created by countless human creators. The companies then used this data to train their models without ever asking the creators whether they were okay with it.
- Attribution: The way AI models are trained, content is absorbed into the model’s weights without any record of where it came from or who created it. This makes it impossible to determine whose work contributed to a given output, unless of course the prompt specifically said something like “Write in the style of <insert name of living writer trying to make a career>…” or “Create an image in the style of <insert name of living artist trying to make a career>…”.
- Compensation: No AI company has so far made an attempt or even a proposal to offer any form of compensation to human creators. While compensating individual contributors for each image or text generation might be impractical on technical grounds, the companies could at least have created a pool of money to be split in proportion to how much each creator’s work was represented in the training data (a toy sketch of such a split appears after this list). Not only did they fail to do this, they hide the contents of their training data, making it hard for people to even know whether they’ve been ripped off.
- Direction: Many people are understandably upset that AI is now writing poems and painting pictures while humans are still roofing, cleaning toilets, and manning fast-food drive-thrus. This makes it seem like the robots are having all the fun and we’re their slaves. The companies released chatbots and image generators because that’s what they had available that might be profitable. There’s also a lot of churn and confusion about whether AI is here to augment us or replace us. If AI is here to automate labor and replace us, then it’s a capitalist tool (until such time as we have buy-in from everyone and UBI to smooth the transition). If it’s here to work for us and augment us so our minds and lives can soar like never before, then it’s our friend. AI companies could have focused on broadly beneficial applications; instead, their products augment a few elites at the expense of everyone else. For the companies, it’s all about gaining a first-mover advantage in order to lock in functional monopolies via network effects.
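To make the compensation idea above concrete, here’s a minimal sketch of how a fixed pool might be split pro rata. Everything here is a hypothetical illustration: the function, the pool size, and the per-creator counts are my own assumptions, and a real scheme would also have to handle deduplication, disputed ownership, and millions of claimants.

```python
# Toy sketch of a pro-rata compensation pool (all numbers hypothetical).

def split_pool(pool_dollars: float, contributions: dict[str, int]) -> dict[str, float]:
    """Split a fixed pool among creators in proportion to how much of
    their work (here measured in tokens) appears in the training data."""
    total = sum(contributions.values())
    if total == 0:
        return {creator: 0.0 for creator in contributions}
    return {creator: pool_dollars * count / total
            for creator, count in contributions.items()}

# Example: a $10M pool split across three made-up creators.
payouts = split_pool(10_000_000, {"alice": 600_000, "bob": 300_000, "carol": 100_000})
# -> {'alice': 6000000.0, 'bob': 3000000.0, 'carol': 1000000.0}
```

The hard part, of course, isn’t the arithmetic; it’s measuring how much each creator’s work is actually “represented” in a model, which is exactly why the dataset transparency discussed below matters.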
How can we get out of this mess and turn AI development in a direction that all of humanity will support and embrace? I personally don’t see how it could happen unless it begins with a mea culpa from the big players like OpenAI, Microsoft, and Google. If they don’t admit to the missteps above, nobody will believe anything they say about our positive future with AI. Perhaps some companies might become willing to do this when they realize that pissing off all of the artists in the world wasn’t a smart move.
But apologizing and promising to do better obviously isn’t enough. For-profit companies have a dismal track record of over-promising and under-delivering. Here’s what I’d love to see the companies do to back up their words:
- Dataset transparency: The companies must publish detailed descriptions of the data they used to train today’s leading models. This would open them up to lawsuits, so I wouldn’t blame them for doing it later, after some of the other measures below have let them rebuild trust, or after they’ve retired the models that were trained on scraped public data.
- Opt-in only consent: We need to throw away today’s approach of scraping public web data without consent. Companies need to start over, building datasets intentionally from content owners who’ve agreed to donate or sell their work (see the sketch after this list). This not only solves the consent problem; it’s also likely to produce much smaller, higher-quality datasets that could radically reduce the price and environmental impact of training a frontier model while improving its output quality.
- Synthetic data: We now have AIs capable enough to create realistic data that can be used to train the next generation of models. Done with care, this can yield higher-quality data with less legal and moral baggage. However, the consent problem simply propagates to the new models if the data-generating models were themselves trained on non-consensual data.
- Positive direction: Companies need to step up their PR game and ship products that show their words aren’t just empty platitudes. We hear promises of educational and medical AIs that will radically advance the standard of living for every human on earth. Can we please accelerate those projects ahead of digital girlfriends (we’re not ready), spam and deepfake generation, and floods of crappy marketing content? This would require the AI-as-a-service providers to exercise much more control over what kinds of apps can be built on top of their models.
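As a rough illustration of what opt-in consent could look like at the data-pipeline level, here’s a sketch where every record carries explicit consent and provenance. The field names and license labels are my own assumptions rather than any company’s actual schema; the point is simply that consent and attribution become checkable properties of each record instead of afterthoughts.

```python
# Sketch of an opt-in, provenance-tracked training corpus.
# Field names and license labels are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ContributedWork:
    creator_id: str   # who made it, so attribution stays possible
    source_url: str   # where it came from
    license: str      # the terms the creator actually agreed to
    opted_in: bool    # explicit consent, not inferred from robots.txt
    text: str         # the content itself

def build_training_set(works: list[ContributedWork]) -> list[ContributedWork]:
    """Keep only works whose creators explicitly opted in under a
    license that permits model training."""
    allowed_licenses = {"training-permitted", "public-domain"}
    return [w for w in works
            if w.opted_in and w.license in allowed_licenses]
```

A pipeline like this would also make the compensation pool sketched earlier easier to fund fairly, since per-creator contribution counts fall out of the same records.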
The last point about controlling what apps are built with AI would directly and significantly cut into the profits of the AI giants. So, this is where the rubber meets the road. If companies aren’t willing to do this, then we know we can’t trust them to be on our side and develop AI that truly benefits all of humanity. If OpenAI, Microsoft, and Google won’t do it, then I hope an upstart will take the high road. We can’t stop AI from happening, and we shouldn’t want to. But we need to be on the lookout for organizations that are doing it right and throw our support behind them when they emerge. This is how we can transform the AI backlash into AI guidance that helps steer the ship where we all want to go.