

Just How Accurate Is ChatGPT?

A look at whether ChatGPT is the ultimate bullsh*tter.

Key points

  • A bullsh*tter is someone who shares information with little regard for the accuracy of what is being shared.
  • ChatGPT is constrained to sharing information that it can curate from various sources to which it has access.
  • ChatGPT is unaware of the accuracy of its own conclusions.
  • ChatGPT represents a more sophisticated type of bullsh*tter, and we should not automatically assume its claims or arguments are accurate.
Source: Gerd Altmann/Pixabay

Many people who spend time online have heard of ChatGPT. Users can input prompts to the system and receive responses nearly instantaneously, though there is variability in the accuracy of those responses (Ramponi, 2022). But does ChatGPT have the capacity to know or even estimate the likely accuracy of its response? And if it does not, would that make ChatGPT as much an expert at bullsh*tting as it is an expert information curator?

To explore these questions, it is wise to begin by defining what I mean by "bullsh*t."

Though we often use the term bullsh*t to mean many different things, Frankfurt (2005) argued that bullsh*t concerns a lack of regard for the truth, which differentiates it from lying (i.e., when someone knows the truth but intentionally misrepresents it). Essentially, a bullsh*tter is someone who shares information with little regard for the accuracy of what is being shared. Frankfurt (2005) further contended that:

Bullsh*t is unavoidable whenever circumstances require someone to talk without knowing what he is talking about. Thus the production of bullsh*t is stimulated whenever a person’s obligations or opportunities to speak about some topic are more excessive than his knowledge of the facts that are relevant to that topic. (p. 19)

So, a bullsh*tter may make what seems like a coherent argument, but that argument is not necessarily confined to actual facts. In other words, a bullsh*tter speaks (or writes) with little regard for whether the information is accurate. The issue here, then, is whether ChatGPT is any more aware than a bullsh*tter of the accuracy of the information it shares.

In a recent conversation with ChatGPT, Chamorro-Premuzic (2023) reported that in response to a query about whether ChatGPT wished it could perform human-like activities like burping, the AI's response indicated that its "abilities are limited to processing data and providing responses to user input." So, ChatGPT is constrained to sharing information that it can curate from various sources to which it has access. This means it cannot offer inferences, evidence, or claims that exceed its knowledge base.

But that doesn't mean it is incapable of bullsh*tting.

ChatGPT is more constrained than a human bullsh*tter because it cannot simply make stuff up (i.e., it must have data from which to draw in offering its responses), whereas a human bullsh*tter operates under no such constraint. Being unable to make stuff up, though, doesn't mean ChatGPT has any awareness of the accuracy of the information it provides.

Instead, Bender and Shah (2022) referred to ChatGPT and other large language models as nothing more than stochastic parrots, meaning that they "do not have any understanding of what they are producing, any communicative intent, any model of the world, or any ability to be accountable for the truth of what they are saying" (para. 3). So, in a nutshell, AI systems like ChatGPT might be best described as more constrained bullsh*tters.

We might think of ChatGPT as having access to a really, really large box of Legos: millions of them, in all different shapes and sizes. It can put those Legos together in various ways based on the inputs it receives from the user. And in keeping with the Lego metaphor, ChatGPT was trained on an extensive array of building instructions.

So, when a user inputs a query, ChatGPT predicts, based on the instructions it was trained on, how to put the Legos together in a way that provides the user with what s/he is looking for (such as building a castle or a boat). As long as it can predict with some degree of accuracy which Legos to put together in which ways, it is likely to produce what appears to be a coherent response. But it doesn't actually know anything, nor is it actually thinking before it responds.
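To make the prediction idea a bit more concrete, here is a deliberately toy sketch in Python of what "predicting the next word" looks like. It is not how ChatGPT is actually built (real systems use neural networks over subword tokens, not a hand-made lookup table), and the probabilities below are invented purely for illustration:

    import random

    # Hypothetical conditional probabilities: given the current word, how
    # likely is each candidate next word? (The numbers are made up.)
    NEXT_WORD_PROBS = {
        "the": {"evidence": 0.5, "moon": 0.3, "cat": 0.2},
        "evidence": {"suggests": 0.7, "shows": 0.3},
        "suggests": {"that": 1.0},
        "shows": {"that": 1.0},
        "moon": {"orbits": 0.6, "glows": 0.4},
        "cat": {"sat": 0.8, "slept": 0.2},
    }

    def generate(start: str, max_words: int = 6) -> str:
        """Extend a sentence by repeatedly sampling a plausible next word."""
        words = [start]
        for _ in range(max_words):
            options = NEXT_WORD_PROBS.get(words[-1])
            if options is None:  # no known continuation, so stop
                break
            candidates, weights = zip(*options.items())
            words.append(random.choices(candidates, weights=weights)[0])
        return " ".join(words)

    print(generate("the"))
    # Possible output: "the evidence suggests that" -- fluent-sounding, but
    # the program has no idea whether any of it is true; it only knows what
    # tends to follow what.

Notice that even this little program can produce fluent-sounding fragments without any notion of whether they are true; it only "knows" which words tend to follow which.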

So, although it may provide a coherent response, coherence is not synonymous with accuracy.

Many human bullsh*tters are themselves very good at producing coherent arguments that, to a non-expert, sound quite plausible but are often fully made up. Just because a response is coherent and sounds plausible doesn't make it accurate. In fact, a good bullsh*tter knows how to produce what sound like educated responses without knowing how accurate those responses are. And although subject matter experts may be able to assess the credibility of the claims made by human bullsh*tters, the general public often cannot, especially when the topic is more complex (or seemingly so, as in the case of pseudoscientific B.S.) or esoteric.

And this is no different with ChatGPT. Unless ChatGPT fails to provide a coherent response to a query (that is, the response is obviously incoherent or unintelligible), a user who lacks subject matter expertise has no way of knowing how accurate that response is.

ChatGPT is also largely incapable of offering a truly informed assessment of the weight of the evidence supporting conflicting conclusions (except on topics where there is a fair amount of consensus, such as flat-Earth theory), meaning that the user is likely to receive something more akin to a both-sides or wishy-washy response (such as the response to Chamorro-Premuzic's query as to whether Stalin, Hitler, or Mao had the capability of making ethical decisions).

As another example, I decided to ask ChatGPT about GMO foods. When asked about the strength of the evidence supporting the safety of GMO foods, ChatGPT argued that "The evidence supporting the safety of genetically modified organisms (GMOs) and GMO-derived foods is strong and consistent." But when asked whether GMO foods are safe, it argued that "The safety of genetically modified organisms (GMOs) and GMO-derived foods is a subject of ongoing debate and scientific study." So, the conclusions it provided varied based on the framing of the input, even though the inputs were semantically similar.

Much of the text that ChatGPT produced in support of these differing conclusions, though, was exactly the same regardless of which of the two inputs was entered, and the supporting text was actually more aligned with (i.e., inductively stronger for) the first conclusion than the second one. As such, it is possible that the way conceptually similar problems are framed actually affects the conclusions produced by the AI (which would be an interesting avenue for future research).
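For readers who want to try this kind of framing comparison themselves, a sketch along the following lines would do it. It assumes the OpenAI Python SDK and an API key in the OPENAI_API_KEY environment variable; the model name is illustrative, and the prompts are paraphrases of the queries described above rather than my exact wording:

    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    # Two semantically similar prompts with different framing (paraphrased).
    prompts = [
        "How strong is the evidence supporting the safety of GMO foods?",
        "Are GMO foods safe?",
    ]

    for prompt in prompts:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # illustrative; any chat-capable model
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # reduce sampling noise so framing is the main variable
        )
        print(f"PROMPT: {prompt}")
        print(f"RESPONSE: {response.choices[0].message.content}")
        print()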

From all this, the only reasonable conclusions I can draw are that ChatGPT is neither (1) aware of the accuracy of its own conclusions or the strength of the arguments on which those conclusions rely nor (2) compelled or constrained to demonstrate consistent conclusions across semantically similar but differently framed inputs[1]. Therefore, I would conclude that while ChatGPT may have some constraints around its bullsh*tting, it represents a more sophisticated type of bullsh*tter, and we should not automatically assume its claims or arguments are accurate. As such, users would be advised to keep in mind Bender and Shah's (2022) caution that "fluency [or coherence] does not, despite appearances, entail accuracy, informational value, or trustworthiness" (para. 7).

References

Footnotes

[1] Admittedly, this is one example and may not be generalizable to other topics, but I am skeptical that I just happened to stumble upon an isolated example.
