“Is this AI-written?” is a difficult, maybe impossible, question. “Did you write this?” is not. Running the language model over a text and recording its “surprise per token” (the negative log-probability it assigns to each token) for every released GPT x.y variant is something they can definitely do.
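Here is a minimal sketch of the “surprise per token” idea, assuming a toy add-one-smoothed unigram model stands in for a real language model. A real GPT model would condition each token's probability on the tokens before it; the function names here are illustrative only. Lower average surprisal means the text is more “expected” by that model, which hints at, but does not prove, that the model produced it.

```python
import math
from collections import Counter

def train_unigram(corpus_tokens):
    """Toy stand-in for a language model (assumption): add-one-smoothed unigram
    probabilities. Ignores context, unlike a real GPT model."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def prob(token):
        return (counts[token] + 1) / (total + vocab_size)

    return prob

def surprisal_per_token(prob, tokens):
    """Surprisal in bits for each token: -log2 p(token) under the model."""
    return [-math.log2(prob(t)) for t in tokens]

def mean_surprisal(prob, tokens):
    """Average surprisal over the text; lower means 'less surprising' to the model."""
    s = surprisal_per_token(prob, tokens)
    return sum(s) / len(s)

model = train_unigram("the cat sat on the mat".split())
familiar = mean_surprisal(model, "the cat sat".split())
unfamiliar = mean_surprisal(model, "quantum flux capacitor".split())
print(f"familiar: {familiar:.2f} bits/token, unfamiliar: {unfamiliar:.2f} bits/token")
```

Text that resembles what the model has seen scores lower than text it has never seen. Comparing these scores across several model variants is the kind of check the comment describes.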
The issue is that AI detection and AI training are very similar tasks. Anything that can reliably detect an AI-written article can also be used as a training signal to teach the model to evade it, so the detector quickly becomes obsolete.
Meanwhile, a lot of people naturally write in a way that “looks” AI-generated. This leads to the FAR more serious problem of false positives. Missing an AI-written paper at school or university isn’t a big deal. A false positive, however, could ruin a young person’s life. It’s the same trade-off the justice system faces.