Grounded Simulation: faithful, not just fluent
LLM “synthetic users” produce fluent interviews that teams find easy to distrust. Grounded Simulation is a first-principles architecture for keeping them faithful and auditable.
The fluency trap
Ask a large language model to role-play a user and it will happily talk for hours. The transcripts read well. That is exactly the problem. Fluent does not mean faithful, and a synthetic interview that sounds right is more dangerous than one that obviously fails, because someone will ship a decision on the back of it.
Three failure modes show up again and again. The simulation is self-fulfilling: prime the model with a hypothesis and it obligingly confirms it. It is one-dimensional: a flattened caricature of a person rather than a contradictory, situated one. And it is undefendable: when a stakeholder asks “how do you know?”, there is no chain of evidence to point at. So teams either distrust the output or, worse, trust it anyway.
The thesis: ground the simulation
Grounded Simulation is the architecture I have been building and writing up to address this. The premise is simple to state and hard to do well: a synthetic study should be grounded in behavioral science and cognitive models, not in the vibes of a clever prompt. The model is one component inside a method that constrains it, never the method by itself.
The goal isn’t a model that sounds like a user. It’s a study you can audit in thirty minutes.
Four mechanisms that keep it honest
In practice (and this is the architecture behind the research platform I work on at Articos), faithfulness comes from structure, not from a better persona prompt:
- Hypothesis-blind persona generation. Personas are generated without exposure to the question under test, which removes the most common path to a self-fulfilling answer.
- Protocols grounded in the literature. Study design draws on cognitive models and a corpus of peer-reviewed work rather than ad-hoc instructions, so the simulated behaviour has a reason to resemble the real thing.
- Evidence-chained themes. Every theme links back to the quotes, questions, and hypotheses that produced it, including how many times it was refuted, not just supported.
- Confidence, scored not asserted. Findings carry a confidence signal so a reader can weight them, and an audit layer makes the whole derivation inspectable.
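The list above describes the mechanisms rather than specifying them, so what follows is only a minimal Python sketch under stated assumptions. It is not the Articos implementation. It illustrates two of the ideas: a persona builder that never receives the hypothesis (blind by construction), and a theme record that chains each quote to its question and hypothesis and counts refutations alongside support. The names `build_persona_prompt`, `Evidence`, `Theme`, and the Laplace-smoothed confidence rule are all hypothetical stand-ins. The essay does not say how confidence is actually scored.

```python
from dataclasses import dataclass, field


def build_persona_prompt(segment: dict[str, str]) -> str:
    """Hypothetical persona builder, hypothesis-blind by construction.

    It accepts only segment attributes. The hypothesis under test is never a
    parameter, so it has no path into the persona prompt.
    """
    attributes = "\n".join(f"{key}: {value}" for key, value in sorted(segment.items()))
    return "Describe a realistic, situated person with these attributes:\n" + attributes


@dataclass(frozen=True)
class Evidence:
    quote: str          # verbatim quote from a simulated participant
    question_id: str    # protocol question that elicited the quote
    hypothesis_id: str  # hypothesis the quote bears on
    supports: bool      # True = supports the hypothesis, False = refutes it


@dataclass
class Theme:
    label: str
    evidence: list[Evidence] = field(default_factory=list)

    @property
    def supported(self) -> int:
        return sum(1 for e in self.evidence if e.supports)

    @property
    def refuted(self) -> int:
        return sum(1 for e in self.evidence if not e.supports)

    def confidence(self) -> float:
        # Hypothetical stand-in scoring rule: Laplace-smoothed share of
        # supporting evidence. With no evidence at all it returns 0.5.
        return (self.supported + 1) / (len(self.evidence) + 2)

    def audit_trail(self) -> list[str]:
        # One line per piece of evidence: "+" supports, "-" refutes.
        return [
            f"{'+' if e.supports else '-'} [{e.hypothesis_id}/{e.question_id}] {e.quote}"
            for e in self.evidence
        ]


if __name__ == "__main__":
    # Illustrative, made-up inputs; not data from any real study.
    prompt = build_persona_prompt({"role": "freelance designer", "tools": "Figma, Notion"})
    theme = Theme(
        "Onboarding friction",
        [
            Evidence("I gave up at the pricing page.", "Q3", "H1", True),
            Evidence("Setup took two minutes, honestly.", "Q4", "H1", False),
            Evidence("I needed a colleague to explain the dashboard.", "Q5", "H1", True),
        ],
    )
    print(theme.label, theme.supported, theme.refuted, round(theme.confidence(), 2))
    for line in theme.audit_trail():
        print(line)
```

With two supporting quotes and one refuting quote, the sketch scores the theme at 0.6 (three out of five after smoothing) and prints a trail that a reviewer can read line by line. The smoothing is a deliberate choice for the sketch: a theme backed by a single quote scores about 0.67 rather than a certain 1.0.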
Why auditability is the point
Most debates about synthetic users argue about accuracy in the abstract. I think that is the wrong frame. The useful question is whether a team can defend a decision made with the research: to a skeptical PM, to a leadership review, to themselves six months later. Auditability is what makes that possible. It is also what separates a research instrument from a confident-sounding chatbot.
This connects directly to how I think the rest of AI product work should go: the system produces a strong, inspectable starting point, and a human stays in the loop to judge it. Magical, but accountable.
Status & reading
The formal write-up, Grounded Simulation: A First-Principles Architecture for LLM-Based Synthetic UX Research (SSRN, abstract ID 6503241), is in the process of being published. I’ll link the live paper here once it is indexed; until then, treat this essay as the plain-language version of the argument.