It’s not about identifying AI or even spam, but about extracting useful information. Are the claims made in a source backed by other sources? Do they violate information from trusted sources? That’s all stuff that an AI can reason about and then discard the source as junk or condense it down to the useful information in it.
Basically you completely skip browsing the Web yourself and just use the AI to find you what you want. Think of it like some IMDB or Wikipedia, but covering everything and written and curated by AI. When the AI doesn’t already know some fact, it goes crawling the Web and finding it out for you, expanding its knowledge base in the process.
Or see the ship computer from StarTrek, you don’t see the people there browsing the Web, you see them getting data in exactly the format they need and they can reformat and filter it as needed.
At the moment there are still some technical hurdles, the AI systems we have are all still a little to stupid for this. But that seems to be the direction we are heading, things like summarizer bots already do a pretty good job and ChatGPT is reasonably good at answering basic questions and reformatting it the way you need it. Only a matter of time until it gets good enough that you couldn’t do a better job yourself.
You’re looking at it in a flawed manner. AI has already been making up sources and names to state things as facts. If there’s a hundred websites for claiming the earth is flat and you ask an ai if the earth is flat, it may tell you it is flat and source those websites. It’s already been happening. Then imagine more opinionated things than hard observable scientific facts. Imagine a government using AI to shape opinion and claim there was no form of insurrection on Jan 6th. Thousands of websites and comments could quickly be fabricated to confirm that it was all made up. Burying the truth into obscurity.
You have plenty of literature that can act as ground truth. This is not a terribly hard problem to solve, it just requires actually focusing on it. Which so far simply hasn’t been done. ChatGPT is just the first “look, this can generate text”. It was never meant to do anything useful by itself or stick to the truth. That all still has to be developed. ChatGPT simply demonstrates that LLM can process natural language really well. It’s the first step in this, not the last.