What Happens When You Show AI 100 Brand Directions
Quick answer: When you run a Constellations perception test through AI models and ask them to complete it the same way a human panel would, something specific happens. A year ago, AI couldn’t decide at all: every result came back as an even split, a complete wash. Today the models give semi-intelligent answers, and on simple, predictable prompts they do okay. But on the nuanced, abstract prompts, the kind branding actually depends on, they’re wildly off from the human data. The dangerous part isn’t that AI gets it wrong. It’s that AI now produces a compelling, convincing rationale for an answer that doesn’t match what real people see.
AI has seen your brand direction.
And 999,000 others.
We showed AI 100 brand directions. Most looked the same to it. This article breaks down what the machine noticed—and what your audience likely does too.
You can run the same kind of test on your own work, before it looks like the other 999,000.
We gave the test to the machines
The test has been run through AI at least half a dozen times. Each run asked the model to complete the same number of responses the human panel produced, anywhere from 20 to 600 depending on the test. The setup was simple: same prompt, same images, same task. The only variable was whether a person or a model was doing the responding.
The point wasn’t to embarrass the AI. It was to find out whether a machine could reproduce what a human audience does when it reacts to a concept. If it could, perception testing would just be a thing you could simulate. If it couldn’t, that tells you something important about what the test is actually capturing.
“Random dots” was last year’s story
The original soundbite was that AI produced random dots. We asked a human audience and four different AI models what “American” looks like. The human audience had a clear collective preference, with statistically significant groupings. The AI models showed no clustering and no patterns. A year ago that soundbite was literally true. The models couldn’t make up their minds. Everything came back evenly split, which on a perception map looks like a complete wash, with no signal anywhere.
That’s not where things stand now, and it’s worth being precise about it, because the easy version of this story is already out of date.
AI can cluster now. It has gotten meaningfully better at taking the test over the past year. It now produces semi-intelligent answers instead of an even smear. On tests you’d regard as easier to forecast, the ones with a fairly predictable right answer, it does okay. Not fantastic, but okay.
The trouble shows up exactly where branding lives: the nuanced, abstract prompts. The kind of question a real brand exercise asks. On those, the model is wildly off from the human data. And it’s off while sounding completely sure of itself.
The compelling-rationale problem
Here’s the part that should make any creative director uneasy.
When AI is wildly off, it’s wildly off with a very compelling rationale. It will explain its decision, describe the pattern it claims to see, and lay out reasoning that would be genuinely convincing, if you didn’t have the human data sitting next to it for comparison.
A year ago, a model that couldn’t decide was at least honestly useless. You looked at the even split and knew you had nothing. Today’s models are riskier, not safer, because they hand you a confident answer with a story attached. Take that answer at face value, skip the human panel, and you’ve got a synthetic audience telling you a beautifully argued thing that real people don’t actually feel.
Then the campaign underperforms, and you’re left scratching your head, guessing why it didn’t land, holding a rationale that sounded airtight. The risk was never that AI couldn’t generate an answer. The risk is that it generates a persuasive one that’s disconnected from the people you’re trying to reach.
There are full comparison reports behind this: human data run against Claude, ChatGPT, and Gemini on the same tests. The pattern holds across them. The machines can produce a confident reading. They can’t reliably produce the human one.
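One way to put a number on "wildly off" is to compare the human panel's tally with a model's tally for the same prompt. The sketch below uses Jensen-Shannon divergence, which is an assumption on our part: the article does not say how the comparison reports score the gap. The function names and the tallies are hypothetical and invented purely for illustration.

```python
import math


def to_distribution(counts):
    """Turn a tally like {"A": 60, "B": 25} into proportions that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("tally must contain at least one response")
    return {option: n / total for option, n in counts.items()}


def js_divergence(counts_a, counts_b):
    """Jensen-Shannon divergence in bits between two response tallies.

    0.0 means the two audiences answered in identical proportions;
    1.0 means they never chose the same option.
    """
    p = to_distribution(counts_a)
    q = to_distribution(counts_b)
    options = set(p) | set(q)
    # Midpoint distribution between the two audiences.
    m = {o: 0.5 * (p.get(o, 0.0) + q.get(o, 0.0)) for o in options}

    def kl_to_midpoint(x):
        # Kullback-Leibler divergence from x to the midpoint m, skipping
        # options x never chose (they contribute nothing).
        return sum(
            x[o] * math.log2(x[o] / m[o]) for o in x if x[o] > 0
        )

    return 0.5 * kl_to_midpoint(p) + 0.5 * kl_to_midpoint(q)


# Hypothetical tallies for one prompt with four answer options.
human_panel = {"A": 60, "B": 25, "C": 10, "D": 5}
model_close = {"A": 55, "B": 30, "C": 10, "D": 5}   # roughly tracks the panel
model_off = {"A": 5, "B": 10, "C": 25, "D": 60}     # confident, but inverted

print("close model:", js_divergence(human_panel, model_close))
print("off model:  ", js_divergence(human_panel, model_off))
```

A score like this only says how far a model's answers sit from the panel's. It says nothing about how persuasive the model's rationale sounds, which is why the human data has to be the reference.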
Why this matters to a creative director
If you take only one thing from this, take this: AI being able to cluster doesn’t make it safe to substitute for your audience. It makes it more dangerous, because the failure mode got harder to spot.
A creative director’s job is to make work that moves a specific group of people. The only way to know whether it does is to ask those people. A synthetic audience can now imitate the format of that answer convincingly enough to fool you. That’s not a reason to use it. It’s a reason to be more careful, and to keep a human panel as the thing you check the machine against, not the other way around.
A note on information theory, since it explains the whole thing
Claude Shannon’s work is where this gets interesting, and it’s worth being careful with, because it gets misrepresented constantly.
Shannon’s basic idea is that communication is a balance between redundancy and entropy, or in plainer terms, the expected and the unknowable. Neither one is good or bad. They’re both necessary. Redundancy is the familiar part that gives you context to understand something. Entropy is the unexpected part that carries new information. Pure redundancy is the same thing over and over, and it gets rejected; it doesn’t even qualify as communication. Pure information, with no redundancy at all, is unintelligible noise. Communication is the balance of the two.
Apply that to a visual and you get a useful, concrete idea: the entropy of how an audience responds. If everyone reads an image roughly the same way, the response is low-entropy and semantically narrow, and the image communicates decisively. If responses are all over the place, the response is high-entropy and the image is genuinely ambiguous. “This visual is unclear” stops being a hunch and becomes something you can put a number on. You could even show a client that their hero image carries more perceptual ambiguity than a competitor’s, meaning the competitor is communicating more decisively. That’s the kind of number that lands.
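As a rough illustration of putting a number on it, Shannon entropy over a set of responses is H = sum of p × log2(1/p), where p is the share of the audience that picked each option. The sketch below is a minimal version of that calculation, not a description of how Constellations computes its scores. The function names and the response data are hypothetical, and it assumes each response is a single pick from a fixed set of options.

```python
import math
from collections import Counter


def response_entropy(responses):
    """Shannon entropy, in bits, of a list of categorical responses."""
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    total = len(responses)
    # H = sum over options of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def normalized_entropy(responses, num_options):
    """Entropy scaled by its maximum, log2(num_options).

    0.0 means everyone gave the same answer; 1.0 means a perfectly even
    split across all options. Assumes num_options is the number of
    choices the audience was offered.
    """
    if num_options < 2:
        raise ValueError("need at least two options to compare")
    return response_entropy(responses) / math.log2(num_options)


# Hypothetical responses from 20 people choosing among four readings.
decisive_image = ["A"] * 17 + ["B"] + ["C"] + ["D"]        # most agree
ambiguous_image = ["A"] * 5 + ["B"] * 5 + ["C"] * 5 + ["D"] * 5  # even split

print("decisive: ", normalized_entropy(decisive_image, 4))
print("ambiguous:", normalized_entropy(ambiguous_image, 4))
```

The even split scores a normalized 1.0, the "complete wash" case, while the image most people read the same way scores well below that. Normalizing by the number of options is a design choice that lets tests with different numbers of options be compared on the same 0 to 1 scale.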
Here’s the part the gurus and motivational speakers always skip, and it was a genuine pet peeve of Shannon’s: his theory deliberately had nothing to do with meaning. He said outright that the meaning of a message was irrelevant to the engineering problem he was solving. He cared whether a transmission was received, not what it meant.
Which is exactly why this is interesting rather than a misappropriation. Constellations measures meaning to an audience, the thing Shannon set aside. He wasn’t wrong to set it aside; that wasn’t his problem to solve. But the tools he built for measuring the spread of a signal turn out to give you a real, calibrated way to measure how clearly a piece of creative communicates. Shannon would probably have found that worth a look. He just would have insisted you not pretend he’d already done it.
Frequently asked questions
Can AI take a perception test now? Yes. A year ago AI couldn’t decide and produced an even split, a complete wash. Today it clusters and gives semi-intelligent answers. On simple prompts it does okay. On the abstract, nuanced prompts that branding depends on, it’s wildly off from human data.
If AI can cluster, why not use it instead of human panels? Because the failure mode got harder to catch. AI now produces a confident, compelling rationale for an answer that doesn’t match what real people feel. Take it at face value and you risk a campaign that underperforms for reasons your synthetic audience never warned you about.
What does this prove about the test itself? That it captures something specific to human perception, the actual response of real people to a concept, that a model can imitate in form but not reliably reproduce in substance.
How does Shannon’s information theory connect to this? Shannon gives you a way to quantify how spread out an audience’s responses are, which translates to how clearly a visual communicates. His theory deliberately ignored meaning, which is exactly the gap a perception test fills, by measuring what an image means to an audience rather than just whether a signal got through.


