AI Is An Orange Juicer

An analogy of which I am inordinately fond is that of orange juice: a delicious beverage created by taking a whole thing, an orange (which offers you a balance of sweet enjoyment and fibrous goodness), and separating the two, leaving you with an insulin spike in a glass. You take something whose balancing elements make it complete (and prevent its consumption to excess), and you discard them in favour of extracting only the bit that's both the core attraction and the corrupting factor.

This is what I found myself thinking of when I saw a video, shared in a Slack I'm a member of, of a guy who calls himself an "AI SEO Expert" (now there's a sadder short story than the one Hemingway wrote). He's automated his content production: he asks Perplexity to summarise what's new in AI over the last few days, has ChatGPT rewrite the summary, then posts the result to LinkedIn et al.
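
For what it's worth, the whole pipeline is maybe thirty lines of code, which is its own kind of bleak. Here's a minimal sketch of what it might look like, assuming OpenAI-compatible chat endpoints for both services (Perplexity's API is one; the model names and prompts are my guesses, not his) and leaving the actual LinkedIn posting as a stub:

```python
import os

import requests

# A hypothetical sketch of the pipeline, not the guy's actual code.
# Perplexity's API is OpenAI-compatible; model names and prompts here
# are illustrative.

def chat(base_url: str, api_key: str, model: str, prompt: str) -> str:
    """Send a one-message chat completion and return the reply text."""
    resp = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Step 1: ask Perplexity to summarise recent AI news.
digest = chat(
    "https://api.perplexity.ai",
    os.environ["PERPLEXITY_API_KEY"],
    "sonar",  # illustrative model name
    "Summarise the most notable AI news of the last few days.",
)

# Step 2: have ChatGPT rewrite the digest as a post.
post = chat(
    "https://api.openai.com/v1",
    os.environ["OPENAI_API_KEY"],
    "gpt-4o",  # illustrative model name
    f"Rewrite this as a LinkedIn post:\n\n{digest}",
)

# Step 3: publish. LinkedIn's posting API needs an OAuth dance,
# so just print the result here.
print(post)
```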

As usual, if we leave aside any concerns about the ethics of using the AI models themselves, this feels bleak as anything. It's the kind of thing people would've made jokes about a few years ago, when LLMs were first blowing up. He says it saves him hours a week, but surely understanding what's going on in AI is... useful to him? Outsource the initial research bit, maybe, but isn't that kind of reading valuable for keeping up with the field you're allegedly an expert in? And don't you want to maintain some quality control over what you post?

That he's admitting to it also feels very strange, but the audience concern he anticipates is not "why am I following you if all your stuff is AI-generated?"; it's "but doesn't Google downrank AI-generated content?" And here we're reminded: he's not just an AI guy, but an AI SEO guy. SEO itself is the understandable but fairly unfortunate process of deforming the web to better conform to the tools people use to access it. This is the next step down from that: SEO at least tended to be bottlenecked somewhat by requiring people to generate the stuff.

But because the corpora used to train the models need huge, huge amounts of data, one of the things you find is that they're heavily trained on text that's already been search-engine optimised!

The resulting corpus contains less about how humans see the world than about how search engines see the world. It is a dataset powerfully shaped by commercial logics.

I know this is basically the standard complaint/jeremiad people have been making about this stuff for ages, but it's still somewhat astonishing that someone touting himself as the vanguard of it all is so willing to say "yes, I am not even slightly involved in the creation of the stuff that has my name on it; no, I am not ashamed".