It’s not a 20% failure rate when the chatbot routes calls to a human agent whenever it’s more than x% unsure about what to say.
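A minimal sketch of the routing rule described above, assuming the bot exposes some confidence score in [0, 1] for each reply. `BotReply`, `route`, and `max_uncertainty` are hypothetical names. The default of 0.2 is an arbitrary stand-in for the unspecified "x%", not a recommended value.

```python
from dataclasses import dataclass


@dataclass
class BotReply:
    text: str
    confidence: float  # assumed score in [0, 1]; how it is produced is model-specific


def route(reply: BotReply, max_uncertainty: float = 0.2) -> str:
    """Hand the call to a human when the bot is more than max_uncertainty unsure."""
    uncertainty = 1.0 - reply.confidence
    return "human" if uncertainty > max_uncertainty else "bot"


if __name__ == "__main__":
    replies = [
        BotReply("Your order shipped yesterday.", 0.95),
        BotReply("I think your refund is approved?", 0.40),
    ]
    for r in replies:
        print(route(r), "->", r.text)
```

Under this rule, escalated calls count as handoffs, not failures. The open question is whether the score is trustworthy enough to set the threshold on.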
AI solutions still handle the 80% of “bottom of the barrel” menial tasks perfectly well.
It won’t know that it doesn’t know. In the current state of AI, it doesn’t seem to have much sense of what is right and wrong, or any way to validate that, even when you tell it that it is wrong. Maybe there are systems that can, but I am not aware of them.
In their current state, AI chatbots assign a “confidence level” to every piece of output. That signals perfectly well when and where they should look for more information… but humans have been pushing them to “output something, anything” instead of excusing themselves for not knowing something or running additional processes to look for the missing information.
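For a transformer LLM, the usual raw material for such a score is the next-token distribution. At each step the model outputs logits over its vocabulary, and a softmax turns them into probabilities. The sketch below combines the probabilities of the generated tokens into one sequence-level number (their geometric mean, one common heuristic among several). The logits are made up, and the 0.7 cutoff for answering versus abstaining is an arbitrary illustration. This is not a claim about how any particular chatbot product computes or uses confidence.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_confidence(step_logits, chosen_ids):
    """Geometric-mean probability of the tokens actually generated.

    step_logits: one list of logits per generated token (hypothetical values).
    chosen_ids: index of the token chosen at each step.
    """
    log_probs = []
    for logits, idx in zip(step_logits, chosen_ids):
        probs = softmax(logits)
        log_probs.append(math.log(probs[idx]))
    return math.exp(sum(log_probs) / len(log_probs))


if __name__ == "__main__":
    # Made-up logits over a toy 4-token vocabulary for a 3-token answer.
    step_logits = [
        [4.0, 1.0, 0.5, 0.2],  # model strongly prefers token 0
        [2.0, 1.9, 0.1, 0.0],  # near tie between tokens 0 and 1
        [3.0, 0.0, 0.0, 0.0],
    ]
    chosen = [0, 0, 0]
    score = sequence_confidence(step_logits, chosen)
    print(f"sequence confidence: {score:.3f}")
    print("answer" if score >= 0.7 else "say 'I don't know' or search")
```

A low score only says the model was torn between continuations. It does not by itself say whether the chosen continuation is true.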
As of this year, Copilot has been running web searches to make up for missing information, and Gemini both runs web searches and iteratively self-checks its own answer to refine it (see “drafts”). It also seems like Gemini might be learning from humanity’s reactions to its wrong answers.
From my understanding, AI is essentially a statistical method, so naturally it will use a confidence level. It’s hard for me to take the leap of faith that the confidence level will correlate with accuracy. It seems to me it would depend more on its data set. If its data contains a commonly held belief that is incorrect, would it not have a high confidence level in an answer containing that incorrect info? If we use a highly authoritative data set, it will be very limited, and we’d be back to something closer to a keyword system than an LLM. I am sure that with time we’ll reach more of a middle ground where accuracy is better, but what will that be? 5%? 3%? 10%?
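Whether confidence tracks accuracy is usually checked empirically, as calibration. Group answers by stated confidence and compare each group's average confidence with how often it was actually right. Below is a minimal sketch of expected calibration error (ECE) with equal-width bins. The function name, the bin count, and the evaluation data are all illustrative assumptions. A model that confidently repeats a widespread misconception shows up as a high-confidence bin with low accuracy, which raises the ECE.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average |confidence - accuracy| across equal-width confidence bins."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map conf in [0, 1] to a bin index; conf == 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Made-up evaluation results: stated confidence and whether each answer was right.
    # The two confident misses stand in for a widespread misconception in the data.
    confidences = [0.95, 0.92, 0.97, 0.90, 0.60, 0.55]
    correct = [True, True, False, False, True, False]
    print(f"ECE: {expected_calibration_error(confidences, correct):.3f}")
```

An ECE near 0 means stated confidence matches observed accuracy on that evaluation set. It says nothing about errors the evaluation set does not contain.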
I’ll freely admit I am not an expert in this at all.
I thought confidence levels were for image recognition? How do confidence levels work for transformer LLMs?