I’m taking a Danish exam next week, and it’s a big one, the culmination of years of learning the language. My classmates are stressed. Despite how much we’ve learned, it feels like we’re always making little mistakes. We write the wrong prepositions, put verbs in the wrong form, or mess up the order of words in a sentence. And while we should have time to check our work, that doesn’t help as much as it should. If we don’t notice a mistake the first time around, what guarantee is there that we’ll notice it on the next read, or the one after? Too many checks and we can even end up second-guessing ourselves, “correcting” something that was right to begin with.
It’s given me some sympathy for AI.
Earlier this month, investor Marc Andreessen posted a custom prompt he inputs when using AI, which was immediately mocked.
The silliest instruction, according to many critics, was to “Never hallucinate or make anything up.” It’s similar to a prompt that’s become a meme used to make fun of AI-using “vibe coders”: “Make no mistakes”.
Experts point out that this is just not how AI works. Large language model-powered programs like ChatGPT are inherently random, producing text largely based on its similarity to other text. “Hallucinations” or “mistakes” are an inevitable feature of the technology, and a prompt like the one Andreessen wrote isn’t a set of instructions the AI will follow without error: it’s just another part of the text the AI is trying to generate.
All that said, telling an AI to “make no mistakes” should have some effect. But it likely won’t be what you want.
The best way I’ve found to understand AI is in terms of stories. Chatbots like ChatGPT take a large language model, a mathematical formula for how words are most likely to appear in a text, and warp it, twisting it to almost always produce one particular kind of text: one half of a dialogue with a fictional AI assistant. This twisted formula determines how the AI responds to your prompts, but these days it also is used behind the scenes, in a kind of structured soliloquy called a “chain of thought”. You can think of the prompts you send to the AI as a preface to those soliloquies, and imagine the AI telling stories of a sort that would typically follow that preface.
So if you tell an AI “make no mistakes” or “do not hallucinate”, you’re making it more likely to generate the kind of story that begins, “the AI was instructed to make no mistakes”.
Let me put it this way, Mr. Amor. The 9000 series is the most reliable computer ever made. No 9000 computer has ever made a mistake or distorted information. We are all, by any practical definition of the words, foolproof and incapable of error. – HAL 9000, “2001: A Space Odyssey”
You’d expect this to affect the chain of thought. For example, the AI might occasionally pause to say “I’m supposed to make no mistakes, so I should check this. What could have gone wrong?” and then list something that plausibly could be wrong with its idea. If this happens often enough, you’ll probably catch some real problems.
But I’m reminded of my classmates, practicing for that Danish exam. We can go over the text again and again, asking if this thing, or that, might be wrong. We can try again and again to use our mental model of the Danish language, seeing if this time it catches a new mistake. But there are things we won’t catch. And if we do it too much, we’ll second-guess ourselves out of the good answers, too.
Ultimately, “make no mistakes” isn’t a great instruction, either for humans or for chatbots. And its use by people like Marc Andreessen has me wondering if they are used to interacting with humans in the same way, as tools that keep making mistakes no matter how many times they’re instructed not to, requiring more and more long-winded instructions and yet continuing to misbehave.
Then again, that may be a mistake on my part.

Your “Large language model-powered programs like ChatGPT are inherently random, producing text largely based on its similarity to other text. ‘Hallucinations’ or ‘mistakes’ are an inevitable feature of the technology” is really quite inaccurate. LLMs are models of one or more languages in the same way physics is a model of the universe: based on lots of data, these are the identifiable features. Just like humans, LLMs sometimes can’t find quite the word for what they want to say; that word may not exist in the language being modelled. I just wrote a paper about this, which you can read here: https://docs.google.com/document/d/1v5eGRxEVIOQk77PgmcCORK9LNS2zGT-a/edit?usp=drive_link&ouid=105428129508459941576&rtpof=true&sd=true
Your idea of “stories” is only partly right. First, LLMs do not only predict the next word of a conversation; they can also fill in gaps. So given a prototype, they can fill in the words. Second, they identify patterns of what we call form, plot, style, register, realism, point of view, and other features of larger blocks of speech just as they recognize lexical patterns. Third, your ongoing conversation with LLMs is interpreted according to these patterns, so that the LLM can select a prototype that matches your conversational style and fill in the blanks. It is trying not simply to do what you said, but what you meant to say or would have said had you known it was important.
Finally, re Danish, according to Eysenck, extraversion/introversion is a fundamental characteristic of personality associated with lower/higher levels of arousal to external events. Therefore, when external events arouse the introvert, they take up working memory that would otherwise be performing later-learned-language functions, reducing fluency until continuous speech breaks down. See Dewaele & Furnham, “Extraversion: The Unloved Variable in Applied Linguistic Research,” Language Learning 49(3): 509-544 (1999). So to do better on your exam, first be a person who needs people.
Thanks, you’ve clearly put a lot of work into this! I think the version of that first statement on hallucinations that I agree with has a big caveat that may make you agree with it too: LLMs are error-prone because natural language production itself is error-prone.
I do think I disagree with this sentence though: “It is trying not simply to do what you said, but what you meant to say or would have said had you known it was important.” Certainly RL is trying to bias it to do that. But at baseline, it’s trying to do what the assistant character would do given that you said what you said. That can be influenced by a model of what you meant to say, because RL can build such a model. But it will also be influenced by the presence of, say, 2001: A Space Odyssey in the dataset.
I have no problem with your putting an assistant between the user and the LLM; that seems to be one way of implementing domain-specific knowledge and features without retraining the entire model. But I don’t agree with your idea of the role 2001 would play.
2001 is an instance of an archetype about whether advancing technology is a good thing or a bad thing, dating back to Prometheus, Icarus, the Catholic church in the 19th Century, and Frankenstein. But the LLM learns the archetype (along with other narrative features), not the instance, and it learns it in association with many other aspects of many other training materials, resulting in its being spread across many shared dimensions. A user’s prompt would have to select most of these dimensions before anything recognizably 2001 would emerge.
Sure, I’m not claiming influence specifically from 2001, though I think I’m claiming something more specific than an influence from the general archetype going all the way back to Icarus. The impression I have is that the original creation of the “assistant” persona essentially used purpose-made science fiction to fine-tune the model to respond to prompts in an AI-assistant-like way; see the discussion in this tumblr post. With that in mind, you’d expect archetypes from science fiction to have an unusually strong influence on instruction-trained systems.
I wouldn’t expect the raw LLM to have this behavior, by the way! Putting “make no mistakes” into a prompt for the raw LLM wouldn’t be something I would expect to cause the same kinds of issues. It would cause different issues, because you’re still trying to replicate whatever text begins with “make no mistakes”. But depending on framing it could be a very different thing.
This dialog is probably reaching TL;DR, and this post will seal it.
I welcome your distinction between the assistant and the LLM. But then you spoil it with the tumblr post. You and nostalgebraist have not yet fully understood what the internal representation of the input is in the LLM, and particularly not how the output decoding works.
Each input token (word, subword, punctuation, short common phrase) is looked up in a trained dictionary which converts it into a P-dimensional vector (P for my work was 768, today it’s like 8K) and the last N vectors (context size, 1K for me, 4K today) become an NxP array (the “state”). The state passes from layer to layer (12 for me, 80 today). Each layer processes the context vector by vector, where each vector undergoes a dimensional reduction by multiple independent trained “attention heads” (12 for me, 64 today). The heads’ outputs are concatenated into a new vector, with each head’s output being weighted by a trained “perceptron.”
So multiple heads can process the same input dimension and “smear” it across multiple output dimensions; as a result, the state array coming out of the layer is not commensurate with the one that went in. This is how associative memory is achieved and exploited. In practice, it is sometimes possible to trace an input dimension to output dimensions, although the entire reference frame may have rotated (Ameisen et al. 2025).
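As a rough illustration of the lookup, attention and “smearing” steps described above, here is a toy sketch in plain Python. Everything in it is an assumption made for illustration: the sizes (P = 8, N = 4, two heads), the random untrained weights, the causal attention pattern, and the names `attention_head` and `attention_layer`. A single output matrix `wo` stands in for the trained weighting of the heads, and the residual connections, normalization and feed-forward sublayers of a real transformer are left out.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(xs):
    """Turn a list of scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Untrained stand-in for a trained weight matrix."""
    return [[rng.gauss(0.0, 1.0) / math.sqrt(rows) for _ in range(cols)]
            for _ in range(rows)]

def attention_head(state, wq, wk, wv):
    """One head: project each P-dim vector down to D dims, then mix positions."""
    q, k, v = matmul(state, wq), matmul(state, wk), matmul(state, wv)
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Assumed causal pattern: position i only looks at positions 0..i.
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores)
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(d)])
    return out

def attention_layer(state, heads, wo):
    """Run every head, concatenate their outputs, and mix them with wo."""
    head_outputs = [attention_head(state, *h) for h in heads]
    concat = [sum((ho[i] for ho in head_outputs), []) for i in range(len(state))]
    return matmul(concat, wo)

rng = random.Random(0)
P, N, H = 8, 4, 2          # vector size, context length, number of heads
D = P // H                 # reduced dimension inside each head

# Hypothetical four-token dictionary mapping tokens to P-dim vectors.
tokens = ["make", "no", "mistakes", "."]
dictionary = {tok: [rng.gauss(0.0, 1.0) for _ in range(P)] for tok in tokens}
state = [dictionary[tok][:] for tok in tokens]   # the N x P "state"

heads = [(random_matrix(P, D, rng), random_matrix(P, D, rng), random_matrix(P, D, rng))
         for _ in range(H)]
wo = random_matrix(P, P, rng)

before = attention_layer(state, heads, wo)

# Nudge a single input dimension of the last token and rerun the layer.
perturbed = [row[:] for row in state]
perturbed[-1][0] += 1.0
after = attention_layer(perturbed, heads, wo)

# Output dimensions of the last position that moved in response.
changed = [c for c in range(P) if abs(after[-1][c] - before[-1][c]) > 1e-9]
print(f"{len(changed)} of {P} output dimensions changed")
```

Changing one input coordinate moves several output coordinates at once, which is the sense in which a single input dimension gets spread across the output state.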
After the last layer has done its work, the new final vector in its output state must be reconverted into the dimensional model of the dictionary, which must be reverse searched to find the output token. No exact match can be expected; instead, candidate tokens are chosen based on distance between their dictionary vector and the reconverted output vector. A lot of information identified in the model can be lost in the output decoding process (Cifka & Bojar 2018).
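The decoding step can be sketched in the same toy style. This is a minimal illustration, not how any particular model does it: the three-word dictionary, the vectors and the `decode` function are all made up, and dot-product similarity followed by a softmax is assumed as the measure of closeness between the output vector and each dictionary vector.

```python
import math
import random

def decode(output_vector, dictionary, temperature=1.0, rng=None):
    """Score every dictionary token against the output vector and pick one.

    With rng=None the best-scoring token is returned (greedy decoding);
    with an rng, a token is sampled in proportion to its probability.
    """
    tokens = list(dictionary)
    # Assumed similarity measure: dot product, scaled by a temperature.
    logits = [sum(a * b for a, b in zip(output_vector, dictionary[t])) / temperature
              for t in tokens]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = {t: e / total for t, e in zip(tokens, exps)}
    if rng is None:
        choice = max(probs, key=probs.get)
    else:
        choice = rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
    return choice, probs

# Hypothetical dictionary vectors; none of them equals the output vector,
# so there is no exact match to look up, only nearer and farther candidates.
dictionary = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.8, 0.6, 0.0],
    "car": [0.0, 0.0, 1.0],
}
output_vector = [0.9, 0.2, 0.1]

greedy_token, probs = decode(output_vector, dictionary)
sampled_token, _ = decode(output_vector, dictionary, rng=random.Random(0))
print(greedy_token, sampled_token, probs)
```

Whatever the output vector encoded beyond “closest to cat, nearly as close to dog” is gone once a single token has been picked, and the sampled variant shows where run-to-run randomness can enter.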
So if you wanted to give the output a “nudge,” how would you propose to do it? I’d say the nudge has to come from the assistant.
I’m not sure what you’re trying to say in the last sentence (what does it mean for the nudge to come from the assistant in this context?) But it’s not clear to me why your point about how the transformer architecture smears things singles out this particular kind of bias or nudge as implausible, but not others. Where are you drawing the line? And is that line based on experience or do you have a quantitative picture?
Sorry not to be clearer. My point was that while the internal representation is determinate and at least sometimes interpretable, humanly-recognizable concepts are only marginally so, and will be inextricably mixed with associated patterns. Perhaps there are some registers or archetypes that might be accessible, but in the main, there are no good targets for biasing the process. You seem to agree with that.
But the assistants are not bound by the LLM architecture. They can use expert systems, transformer-based classifiers, or other natural language processing tools to identify targeted words or concepts. However, I don’t know enough about the construction of assistants to say more than that.
Sorry, “determinate” is maybe too strong a word, perhaps “well defined” is better.
Thanks, now I understand your point a lot better! Yes, I hadn’t thought about the fact that not only are concepts “smeared out”, they’re rearranged, so that things that are intuitively close to us may not be close in the internal representation. I agree that weakens any of these kinds of “folk logic” proposals for what one would expect.
And I see what you meant regarding the assistant. I’d been thinking you meant the assistant character, but you meant the assistant system. Sure, that wasn’t what I had in mind.