A friend asked for help turning a vague corp-dev idea into something concrete. Two hours later we had sixteen structurally coherent prototypes — and a growing suspicion that most of them only looked right at first glance.

Sixteen interfaces. Two hours. One uncomfortable question.

This is not a product review. It is an investigation into what happens when you hand a vague, real-world product idea to an AI design tool — and then ask a trained designer to tell you which of the results are actually good designs, and which ones are just wearing the right aesthetic clothes.

Chapter 01
The spark

It started, as many things do, with a late-night message. A friend who works in corporate development pinged me with an idea: a tool to replace the patchwork of Excel sheets, email threads, and Bloomberg tabs that make up his daily M&A workflow. He wanted something that could hold a strategic thesis, track acquisition targets underneath it, and surface signals — news, earnings, analyst reports — without requiring him to go find them.

Friend · Today, 2:13 PM

For a given company, you (manually or automatically) set its targets based on a target industry, then your daily output is a) a dashboard of news and financial data (if available) for your coverage portfolio, b) an email/text/slack notification for especially big news OR for your sweetheart must-have targets, c) big formal update on earnings releases because they have a lot of data and move the markets significantly, and d) [hard] some kind of aggregated synthetic index of whether prices (ev/ebitda) are going up or down.

What the tool must produce — daily

Portfolio Dashboard

News and financial data for your coverage universe. Every target, every morning, in one view.

Smart Alerts

Email, text, or Slack when especially big news breaks — or when a sweetheart must-have target moves.

Earnings Briefings

Formal, data-dense update on earnings releases. They move markets; they deserve their own format.

Pricing Index (hard)

A synthetic, aggregated signal of whether EV/EBITDA multiples across your coverage are trending up or down.
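The pricing index is the one item the friend flagged as hard, and nothing in his message says how it should be computed. As a purely illustrative sketch, assuming one snapshot of enterprise value and EBITDA per covered target: take the median EV/EBITDA across the coverage universe for each snapshot, skip targets with non-positive EBITDA (where the multiple is meaningless), and rebase each median to the first snapshot. The snapshot format and all function names are our own assumptions, not a specified method.

```python
from statistics import median

# Hypothetical snapshot format: {ticker: (enterprise_value, ebitda)} for one date.
Snapshot = dict[str, tuple[float, float]]


def coverage_multiple(snapshot: Snapshot) -> float:
    """Median EV/EBITDA across covered targets with positive EBITDA (an assumption)."""
    multiples = [ev / ebitda for ev, ebitda in snapshot.values() if ebitda > 0]
    if not multiples:
        raise ValueError("no target with positive EBITDA in this snapshot")
    return median(multiples)


def pricing_index(snapshots: list[Snapshot], base: float = 100.0) -> list[float]:
    """Rebase each snapshot's median multiple so the first snapshot equals `base`."""
    if not snapshots:
        raise ValueError("need at least one snapshot")
    medians = [coverage_multiple(s) for s in snapshots]
    return [base * m / medians[0] for m in medians]


def trend(index: list[float]) -> str:
    """'up', 'down', or 'flat', comparing the latest reading with the previous one."""
    if len(index) < 2 or index[-1] == index[-2]:
        return "flat"
    return "up" if index[-1] > index[-2] else "down"
```

A median rather than a mean keeps one outlier valuation from swinging the whole signal. That choice, like rebasing to 100, is an assumption to debate, not a recommendation.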

Vague, but concrete enough to start. Two hours later we had sixteen visually coherent prototypes — legible enough to hold a real design conversation. Which is when things got interesting, because the conversation revealed how many of them were wearing the right clothes without understanding the job.

Co-written with Claudiu Hutanu, a product designer who critiques how AI-driven design tools can be used effectively to produce quality output.

Chapter 02
Why speed changes everything

If you’ve ever hired a designer and waited three weeks for a first-look deck, this process breaks the equation. Not because it replaces the designer — it doesn’t — but because it collapses the time between “I have a vague idea” and “I can have a real conversation about it” to a single afternoon.

That compression is the actual story here. Sixteen directions in two hours sounds like a productivity gain. It is also sixteen opportunities to mistake visual plausibility for operational validity — before anyone has asked the hard questions. Speed produces more decisions, faster. It doesn’t make the decisions better.

We rejected three directions before the designer even saw them. They looked fine; the problem was that their interaction models duplicated other directions and their visual registers were internally contradictory. That judgment took a human. The brief was excellent. The output still needed someone to read it critically.

2 hours · total time from idea to 16 working prototypes

16 · distinct design directions generated

9 · rounds of /grill-me to close the brief

4 · aesthetic registers across all directions

The most useful thing we found wasn’t what the AI did well. It was learning to see the gap between visual credibility and operational validity — and understanding how easily one masquerades as the other.

Chapter 03
How we ran it

The process had five distinct steps, and the sequence mattered. You can’t skip to generation: the quality of your output is entirely determined by the quality of your brief. Here’s exactly how those two hours unfolded.

Three sentences to get started

My friend wrote three sentences describing the primary screen. Vague, but enough to trigger the next step.

~5 min

/grill-me — the non-negotiable step

A Claude Code skill that interviews you one question at a time until every ambiguity is closed. Nine rounds in, we had a named user, a primary task, real data constraints, and explicit “must not” rules. Most of the quality in the final output traces back here.

~40 min

Build the reference library

We defined four aesthetic registers — Institutional, Modern SaaS, Hybrid, and Canvas-specific — and assigned references within each. Every direction would declare one register and at most two references. This is what keeps sixteen outputs from collapsing into one.

~15 min

Generate 16 directions in one pass

A single prompt, the full brief, 16 self-contained HTML prototypes. Quality visibly fades after direction 10 — the model hedges, blends registers, avoids commitment. That’s expected. Directions 1–10 show you what’s possible. Directions 11–16 show you where the model runs out of distinct positions.

~50 min

Designer critique

Claudiu walked every direction and scored each on four axes: visual credibility, workflow realism, information density, and actual usability. This is where the faking became visible.

~30 min

The prompt

My friend said he wanted to make something. Three sentences describing the primary screen. Vague, but enough to start a design sprint.

I asked for three sentences that let a stranger picture the screen. Good enough to grill.

`/grill-me` — the key step

/grill-me is a Claude Code skill that interviews you one question at a time until every open branch is closed. Nine rounds in, the spec was a structured document: audience, tasks, three mandatory screens, data model, what the design must not do.

The skill does not write your spec. It refuses to let you start the next step until the open questions are closed. Most of the quality achieved in the final deliverables traces back here.

Exhibit: The full design brief from /grill-me

The friend’s brief above was a paragraph. The document below is what came out of nine rounds of /grill-me: the structured prompt that drove the design-generation steps. Read it if you want to inspect the detail; skip it if you don’t.

Design brief: M&A Intelligence Tool for Corporate Development

Produce exactly 16 differentiated design directions for an internal tool used by in-house Corporate Development (M&A) teams at major companies. Single-pass response. Quality may fade across later directions; accepted.

For each direction, deliver:

A design.md (≤400 words) following the format conventions in VoltAgent/awesome-design-md — use it as methodology only, not as aesthetic moodboard. Must cover: register, canvas metaphor, aesthetic refs and what’s borrowed, color tokens, type scale, component inventory.
A self-contained HTML artifact rendering 3 screens stacked vertically with anchor links. Real fonts, real colors, real-looking fake data.

Plus one top-level index artifact: an HTML page listing all directions in a comparison table with columns for thesis line, register (i/ii/iii), canvas metaphor, primary aesthetic reference, link to each direction’s HTML artifact.

Product spec

This is a tool for VPs and Heads of Corporate Development at $10B+ companies.

Core job:

Determine where the company should grow over the next 3–5 years.
Build strategic theses.
Map acquisition targets to those theses.

Strategy is the spine. Monitoring (news, earnings, financials) is the daily output of having a strategy, not the spine itself. This tool should not feel like Bloomberg/CIQ/AlphaSense “monitoring-first.” It should feel like strategy-first with monitoring overlaid. The moat here is the strategy lens: capabilities → adjacencies → theses → targets, with monitoring tagged into context.

Data model (directed graph)

Capabilities (upstream) — the company’s existing strengths, technologies, platforms.
Adjacencies — markets the company could grow into.
Theses — strategic bets (e.g. “electrify commercial fleets,” “vertically integrate batteries”). Each thesis carries: rationale (which capabilities justify it), right-to-play assessment, and a build/buy verdict (organic via incubation, organic via capex, or inorganic via M&A).
Targets — companies tracked under each thesis (relevant only when verdict = inorganic).

Edges are typed (justifies, serves, blocks). One capability can support multiple adjacencies. One target can serve multiple theses. Theses are created top-down by the user (no emergent/AI-suggested theses in v1).

Monitoring layers onto this graph: news, earnings, analyst research, banker decks, manual intel tag into the relevant nodes.

Primary user + daily ritual

Persona: VP / Head of Corp Dev. 40s. 20 minutes/day in the tool. Reports to the CFO. The screen is visible to the CEO walking past — must read as serious work.
Daily ritual: open app → instantly see what moved overnight on the strategy map. Activity is rendered into the canvas — nodes with new news glow, theses with reporting targets get badges, capabilities with new tech announcements shift color.

Secondary persona

Corp Dev analyst uses chronological side feed for long sessions.
Every direction must include a side rail/feed (position and style can vary).

Mandatory variation axes

At least 3 in Register (i) — institutional
At least 3 in Register (ii) — modern SaaS
At least 6 in Register (iii) — hybrid
No two directions may share a canvas metaphor
No direction blends more than two aesthetic references
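The data model in the brief is a typed directed graph. One way it could be represented in code, as a minimal sketch: the class and method names below are hypothetical, and we assume edges point from the supporting node to the supported one (a target "serves" a thesis). The sketch encodes two rules the brief states: one target can serve several theses, and targets matter only under a thesis whose verdict is inorganic.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Kind(Enum):
    CAPABILITY = "capability"
    ADJACENCY = "adjacency"
    THESIS = "thesis"
    TARGET = "target"


class EdgeType(Enum):
    JUSTIFIES = "justifies"
    SERVES = "serves"
    BLOCKS = "blocks"


class Verdict(Enum):
    ORGANIC_INCUBATION = "organic via incubation"
    ORGANIC_CAPEX = "organic via capex"
    INORGANIC = "inorganic via M&A"


@dataclass
class Node:
    id: str
    kind: Kind
    verdict: Optional[Verdict] = None  # only set on thesis nodes


@dataclass
class Graph:
    nodes: dict[str, Node] = field(default_factory=dict)
    # Each edge is (source id, edge type, destination id).
    edges: list[tuple[str, EdgeType, str]] = field(default_factory=list)

    def add(self, node: Node) -> None:
        self.nodes[node.id] = node

    def link(self, src: str, kind: EdgeType, dst: str) -> None:
        # Both endpoints must already exist; edges are typed and directed.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown endpoint in {src} -> {dst}")
        self.edges.append((src, kind, dst))

    def theses_served_by(self, target_id: str) -> list[str]:
        """All theses a target serves; a target may serve several."""
        return [dst for src, kind, dst in self.edges
                if src == target_id and kind is EdgeType.SERVES
                and self.nodes[dst].kind is Kind.THESIS]

    def tracked_targets(self, thesis_id: str) -> list[str]:
        """Targets under a thesis; empty unless its verdict is inorganic."""
        if self.nodes[thesis_id].verdict is not Verdict.INORGANIC:
            return []
        return [src for src, kind, dst in self.edges
                if dst == thesis_id and kind is EdgeType.SERVES
                and self.nodes[src].kind is Kind.TARGET]
```

For example, a battery-maker target that serves both "electrify commercial fleets" (inorganic) and "vertically integrate batteries" (organic via capex) appears among the theses it serves for both, but is tracked only under the first.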

The reference library

The reference library is not a moodboard. It is a constraint document. Each aesthetic anchor defines two things: what you are borrowing (structural vocabulary, type scale, color logic) and — critically — what you are not borrowing. “Inspired by Bloomberg” without that second clause produces directions that inherit Bloomberg’s visual weight without the operational depth that earns it.

To push for distinct designs, we prompted Claude Design to work within four registers, one per aesthetic territory:

Institutional / data-dense – Bloomberg-style finance-terminal density
Modern SaaS (Linear, Notion, Pitch, Height) – keyboard-first restraint
Hybrid (Stripe Dashboard, Vercel, Modern Treasury, Causal) – rigorous data with type discipline
Canvas-specific (Foundry, Kumu, Obsidian graph, tldraw) – for the interaction model

Register	Description	References
Institutional	Finance-terminal density. Serious, dense, monospace-driven.	Bloomberg Terminal, Palantir Foundry, Koyfin, Capital IQ
Modern SaaS	Keyboard-first restraint. Clean type, command-palette culture.	Linear, Notion, Pitch, Height
Hybrid	Institutional data with SaaS type discipline.	Stripe Dashboard, Vercel, Modern Treasury, Causal
Canvas	Interaction model orthogonal to register — applies across all three.	Palantir Object Explorer, Kumu.io, Obsidian graph, tldraw

The design references

Every direction declares one register, picks at most two references, and spells out in plain terms what it takes from each reference. That is what keeps sixteen directions from collapsing into one.
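The one-register, two-reference rule and the brief's mandatory variation axes are mechanical enough to check before anyone opens a prototype. A minimal sketch, assuming each direction is reduced to a register, a canvas metaphor, and its references; Direction, MINIMUMS, and check_axes are hypothetical names of ours, with the minimums taken from the brief.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Direction:
    name: str
    register: str  # "institutional", "modern_saas", or "hybrid"
    metaphor: str
    references: tuple[str, ...]


# Minimum number of directions per register, as stated in the brief.
MINIMUMS = {"institutional": 3, "modern_saas": 3, "hybrid": 6}


def check_axes(directions: list[Direction]) -> list[str]:
    """Return human-readable violations; an empty list means the set passes."""
    problems = []
    registers = Counter(d.register for d in directions)
    for register, minimum in MINIMUMS.items():
        if registers[register] < minimum:
            problems.append(f"{register}: {registers[register]} < {minimum}")
    # No two directions may share a canvas metaphor.
    for metaphor, count in Counter(d.metaphor for d in directions).items():
        if count > 1:
            problems.append(f"metaphor '{metaphor}' used {count} times")
    # No direction may blend more than two aesthetic references.
    for d in directions:
        if len(d.references) > 2:
            problems.append(f"{d.name} blends {len(d.references)} references")
    return problems
```

Applied to the sixteen rows of the index table, a check like this would pass: three institutional, three modern SaaS, ten hybrid, sixteen distinct metaphors, two references each. It could not have caught what sank the three rejected directions (duplicated interaction models, contradictory registers), which is why the human pass still matters.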

One additional input: awesome-design.md, used as methodology rather than moodboard. If your company or brand already has a style guide or brand reference, use that instead. We had neither, so we drew inspiration from awesome-design.md, which greatly sped up our design process.

For reference, DESIGN.md is a new concept introduced by Google Stitch: a plain-text design system document that AI agents read to generate consistent UI. Awesome-design.md is a public repository housing a curated collection of design system documents extracted from widely used public websites.

We also established a shared fictional company — Acme Mobility Corp, a $40B industrial conglomerate — and held it constant across all sixteen directions. If each direction gets different content, you can’t tell whether one prototype works because the design is better or because the story is easier. Holding the company constant isolates design as the variable.

Claudiu — designer’s note on the constraint set

30-second read: the register system is the right instinct, but it stops short. What’s missing is an hour of manual user-journey work upfront, plus two harder asks of the AI: rank every element by priority, and list the established UX patterns before picking a layout. The shape of the role is shifting — the designer becomes a creative director with an AI team underneath.

One register, two refs max — keep it. Commitment is what stops AI output from feeling anonymous. Averaging across everything is the failure mode.
Spend an hour mapping the journey by hand. Who shows up, what they’re doing, what breaks the happy path, what the screen looks like on a quiet day versus a loud one. Without this map, every downstream design decision is unanchored.
Force a priority ranking per screen. Ask the AI to declare primary action, secondary action, reference-only. Not what’s on the screen — what the eye should land on first, second, never. Without the ranking, the model defaults to equal weight, which is hierarchy refused.
Ask for the pattern menu before the layout. Make the AI list the established UX patterns for this class of problem — monitoring with alerting, decision support under pressure, exception-based workflows — with tradeoffs named. Pick from the menu. That turns the AI from a layout generator into a research summarizer, which is the role it’s actually good at.
The role shift is the real point. I see myself here as a creative director with an AI team underneath. The team generates as many solutions as the brief can absorb. My job is the brief, the critique, and the pick — judgment about which direction actually solves the problem. The AI scales the options. The designer still owns the decision.
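Claudiu's priority-ranking ask can be made checkable as well. A sketch, assuming each element on a screen is tagged with one of the three ranks he names (primary, secondary, reference-only); Rank and hierarchy_problems are hypothetical, and requiring exactly one primary element is our reading of "primary action", not his rule. The check flags the failure he describes: a screen where everything carries equal weight.

```python
from enum import Enum


class Rank(Enum):
    PRIMARY = 1
    SECONDARY = 2
    REFERENCE = 3  # reference-only: present, but never where the eye lands first


def hierarchy_problems(screen: dict[str, Rank]) -> list[str]:
    """Flag screens that refuse hierarchy: no single primary, or one rank for all."""
    problems = []
    primaries = [name for name, rank in screen.items() if rank is Rank.PRIMARY]
    if len(primaries) != 1:
        problems.append(f"expected exactly one primary element, found {len(primaries)}")
    if len(screen) > 1 and len(set(screen.values())) == 1:
        problems.append("every element has the same rank")
    return problems
```

A screen spec that passes this check still needs a designer to judge whether the right element is primary; the check only catches the refusal to choose.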

One caveat with our approach, which we were aware of and acknowledged up front, is that quality decays toward the back. Designs twelve through sixteen are noticeably weaker. That is fine. The point of generating sixteen distinct designs is to see the shape of the space, not to ship the median.

Figure: One brief, sixteen lenses. An abstract 4×4 grid of sixteen miniature dashboard sketches, each in a different layout style.

Chapter 04
The result — 16 directions

Start with the index, then move through each direction in turn. The layouts were designed for a full-screen viewport.

Claudiu — first look at the gallery

30-second read: walking the gallery as a creative director reviewing a team’s output, what’s clear is that each direction picks a recognizable design pattern and renders it competently. The judgment work — does this pattern actually fit the problem — is where most of them fall apart. Ledger and Atlas guide the eye the worst. Funnel reads cleanest at first glance. None of them is shippable. AI surfaces patterns; it doesn’t nail UX. And at this scale, doing it thoughtfully is cheaper than doing it sixteen times over.

The role this puts me in is creative director, not designer. Sixteen directions on the table, generated by a team I didn’t have to staff. My job is to sort them by pattern, judge the fit, and pick what’s worth iterating. The generation is cheap. The judgment is the whole value.
Ledger and Atlas have the worst information architecture. Ledger is a dense table where every cell carries equal weight — nothing tells the eye where to land or what the next action is. Atlas dresses a geographic map in editorial typography, which looks credible from across the room but actively misleads navigation: physical location encodes nothing about strategic priority. Both fail the same test — the eye doesn’t know where to go.
Funnel is the one that reads at a glance. Left-to-right stage progression is a pattern people already know. Cards are discrete. The columns name themselves. You understand the screen in two seconds, which is the bar for a tool used under time pressure. It’s the only direction where first-glance comprehension is doing the work the design promises.
None of these is ready for build. Even Funnel is a pattern, not a product. The pattern works; the UX details — what happens on hover, how a card moves stages, what state the screen takes on a quiet day versus a loud one — aren’t resolved in any direction. Shipping any of these as-is would mean shipping the surface of a design without the spine.
At this scale, thoughtful beats brute force, and it’s the more sustainable bet. Generating sixteen directions is fast, but every direction carries a real cost: compute to produce it and, far more expensively, human attention to review it. Half the gallery is the model hedging (mixing registers, softening commitments), and reviewing those weak directions burns the scarcest resource in the room: the reviewer’s judgment. A tighter brief that produces six defensible directions costs less to generate and far less to critique. For explorations this size, working deliberately isn’t just better design practice; it’s the lower-footprint, more sustainable way to work.
This is the honest limit of AI-driven design today. The model is fluent in the pattern library — kanban, sankey, force-directed graph, 2×2, radial — and renders each one competently. What it doesn’t do is pressure-test whether a pattern actually carries the user through the task. Pattern fluency is not UX. That’s still the human job, and the gallery makes the gap visible.

What broke immediately, across the set

Fake information density

Dense layouts that look data-rich but contain the same three data points arranged differently.

Shallow workflow models

Every direction displays data. None model what the analyst actually does with it, or when, or why.

Identical interaction patterns

Visual registers varied widely. Interaction models did not — click, view, scroll across all sixteen.

Semantic-free graph edges

Graph metaphors drawn correctly, but edges encode nothing. Proximity means nothing. Weight means nothing.

Simulated intelligence

Conviction scores and EV/EBITDA indices looked computed. They were hand-placed. The methodology doesn’t exist.

Quality decay after direction 10

The back half hedges. References blur. The model ran out of distinct positions before the brief ran out of slots.

#	Direction	Register	Canvas Metaphor	Aesthetic Anchor
01	Ledger	Institutional	Structured ledger / table-as-canvas	Capital IQ + Bloomberg
02	Atlas	Hybrid	Geographic/spatial map of theses	Stripe Dashboard + Koyfin
03	Bridge	Hybrid	Sankey: capability → thesis → target	Modern Treasury + Vercel
04	Constellation	Institutional	Force-directed graph / node-link	Palantir Foundry + Koyfin
05	Conviction	Modern SaaS	Kanban of theses by stage	Linear + Pitch
06	Heatfield	Institutional	Hex/grid adjacency map	Bloomberg Terminal + Palantir
07	Roadmap	Hybrid	Timeline: theses vs. market trends	Causal + Stripe Dashboard
08	Funnel	Hybrid	Pipeline funnel: Explore → Closed	Vercel Dashboard + Causal
09	Briefing	Modern SaaS	Document hierarchy with graph view	Notion + Height
10	Signal	Hybrid	Radial starburst: capability-rooted	Modern Treasury + Linear
11	Council	Modern SaaS	Swimlanes by strategic horizon	Height + Pitch
12	Quilt	Hybrid	Tiled/mosaic canvas overview	Obsidian graph + Whimsical
13	Treasury	Hybrid	EV/EBITDA index + financial view	Koyfin + Modern Treasury
14	Compass	Hybrid	Directional target prioritization	Causal + Kumu.io
15	Arena	Hybrid	Competitive landscape matrix	tldraw + Excalidraw
16	Loom	Hybrid	Woven/thread narrative canvas	Whimsical + Pitch

Score summary: all 16 directions across four evaluation axes. Bands: Strong (7–10), Moderate (5–6), Weak (1–4).

01 · Institutional

Ledger

Table-as-canvas · Capital IQ discipline

✓ Worked

Restrained typography held throughout
Numbers had clear visual priority
Density felt earned, not forced

✗ Failed

No interaction model — clicking a row leads nowhere
Displays data but never directs attention
Every metric treated as equally important

Visual credibility 8/10 · Workflow realism 4/10 · Info density 9/10 · Actual usability 5/10

The only direction where restraint was a deliberate design choice rather than a default.

02 · Hybrid

Atlas

Geographic map · Stripe + Koyfin

✓ Worked

Spatial orientation was immediately readable
Color-coded regions surfaced status at a glance

✗ Failed

M&A strategy doesn’t map to geography — metaphor is decorative
Physical location encodes nothing about strategic fit

Visual credibility 7/10 · Workflow realism 3/10 · Info density 6/10 · Actual usability 4/10

Geography is decorative here — the metaphor adds visual distinctiveness without adding analytical value.

03 · Hybrid

Bridge

Sankey flow · Modern Treasury

✓ Worked

Sankey capability→target is conceptually correct
Flow structure matches the underlying data model

✗ Failed

Static — no interaction on edges or nodes
Flow proportions were invented, not derived

Visual credibility 8/10 · Workflow realism 5/10 · Info density 7/10 · Actual usability 5/10

Conceptually the most correct flow model; the failure is that it never made the flow interactive.

04 · Institutional

Constellation

Force-directed graph · Palantir

✓ Worked

Node-link structure correctly expresses the graph model
Dark institutional palette held the register

✗ Failed

Node proximity encodes nothing meaningful
Edge weight is decorative — what does thickness mean?

Visual credibility 9/10 · Workflow realism 3/10 · Info density 7/10 · Actual usability 3/10

Most visually authoritative. Lowest operational value. The gap between the two is the whole lesson.

05 · Modern SaaS

Conviction

Kanban · Keyboard-first, presentation-grade type

✓ Worked

Stage-based kanban correctly models thesis progression
Card anatomy was consistent across columns

✗ Failed

All thesis cards carry equal visual weight — no urgency signal
Restrained aesthetic undercuts financial seriousness

Visual credibility 8/10 · Workflow realism 6/10 · Info density 5/10 · Actual usability 6/10

Works as a lightweight tracking view; fails as a strategic decision-making surface.

06 · Institutional

Heatfield

Hex grid · Terminal density

✓ Worked

Grid density communicates coverage breadth
Color-field signals status at a glance

✗ Failed

Hexagons are arbitrary — adjacency encodes nothing strategic
Most visually convincing with the weakest operational logic

Visual credibility 9/10 · Workflow realism 2/10 · Info density 8/10 · Actual usability 3/10

The strongest argument for why visual credibility and workflow realism are different scores.

07 · Hybrid

Roadmap

Timeline · Causal + Stripe

✓ Worked

Correctly surfaces the temporal dimension others ignored
Now/Next/Later labels added real structural clarity

✗ Failed

Assumes theses have fixed timelines — they don’t
Gantt implies execution scheduling, not strategic monitoring

Visual credibility 7/10 · Workflow realism 5/10 · Info density 6/10 · Actual usability 5/10

Gets the temporal dimension right; gets the M&A iteration loop entirely wrong.

08 · Hybrid

Funnel

Pipeline stages · Minimal chrome

✓ Worked

Reads cleanest of all sixteen at first glance
Cards are discrete; columns name themselves

✗ Failed

Borrows from sales CRM logic, not corp dev reality
M&A is iterative — the funnel implies one-way flow

Visual credibility 7/10 · Workflow realism 4/10 · Info density 6/10 · Actual usability 5/10

Reads cleanest at first glance — but it is a pattern, not a product.

09 · Modern SaaS

Briefing

Doc hierarchy · Notion + Height

✓ Worked

Document metaphor suits analysts who write and annotate
Sidebar navigation was genuinely navigable

✗ Failed

Turns active strategy into passive reading
Document frame is excellent for storage, weak for decisions

Visual credibility 7/10 · Workflow realism 6/10 · Info density 5/10 · Actual usability 6/10

Excellent for analysts who write and file; weak for analysts who decide and act.

10 · Hybrid

Signal

Radial starburst · Modern Treasury + Linear

✓ Worked

Radial layout made capability→thesis visible instantly
Important number was isolated — genuinely hard to do

✗ Failed

Doesn’t scale beyond 6–8 theses before becoming unreadable
Radial metaphor reads as decorative after first impression

Visual credibility 8/10 · Workflow realism 5/10 · Info density 7/10 · Actual usability 6/10

The conviction index framing is right. The radial metaphor earns its place here.

11 · Modern SaaS

Council

Swimlanes · Height + Pitch

✓ Worked

Swimlanes by horizon correctly surfaces temporal grouping
Multi-view affordance suited the data model

✗ Failed

Swimlanes too wide — whitespace undermined seriousness
Restrained palette felt underpowered for financial decisions

Visual credibility 7/10 · Workflow realism 6/10 · Info density 4/10 · Actual usability 6/10

Technically sound. Aesthetically understated to the point of feeling unserious.

12 · Hybrid

Quilt

Mosaic canvas · Obsidian + Whimsical

✓ Worked

Tiled layout communicated coverage breadth clearly

✗ Failed

Weakest metaphor — tiles encode no relationships
A grid with aesthetic variation dressed up as a canvas
The model should have stopped here

Visual credibility 6/10 · Workflow realism 2/10 · Info density 5/10 · Actual usability 3/10

The weakest direction. Mosaic adjacency encodes nothing; this is decoration with a concept label.

13 · Hybrid

Treasury

EV/EBITDA index · Koyfin + Modern Treasury

✓ Worked

Most operationally grounded of all sixteen
EV/EBITDA index framing was the right question to ask

✗ Failed

Index methodology invented — conviction score is undefined
Tables imply precision they haven’t earned

Visual credibility 9/10 · Workflow realism 6/10 · Info density 9/10 · Actual usability 6/10

The most operationally grounded direction in the set. Also the most data-honest.

14 · Hybrid

Compass

2×2 matrix · Causal + Kumu.io

✓ Worked

2×2 correctly surfaced the conviction × urgency trade-off
Quadrant metaphor was immediately readable by any executive

✗ Failed

Quadrant position was manually placed — no formula drives it
Matrix avoids the hard problem: how do you score conviction?

Visual credibility 7/10 · Workflow realism 5/10 · Info density 5/10 · Actual usability 5/10

The 2×2 is the correct executive framing — if the axes are real. They’re not, yet.

15 · Hybrid

Arena

Competitive landscape · tldraw + Excalidraw

✓ Worked

Competitive map metaphor is familiar to any exec audience

✗ Failed

Deliberate informality undercuts the institutional register needed
Canvas placement is editorial opinion, not data
Signs of quality decay — commitment fading

Visual credibility 6/10 · Workflow realism 4/10 · Info density 5/10 · Actual usability 4/10

The competitive landscape framing is the right question; the execution hedges too many references.

16 · Hybrid

Loom

Thread/narrative canvas · Whimsical + Pitch

✓ Worked

Thread narrative is a genuinely novel frame for M&A strategy

✗ Failed

Most abstract metaphor in the set — farthest from workflow reality
Aesthetic is mixed — registers bleeding into each other
Evidence of model running out of committed positions

Visual credibility 5/10 · Workflow realism 3/10 · Info density 4/10 · Actual usability 3/10

The most conceptually original. Also the least resolved. The metaphor didn’t survive the data.

Chapter 05
What it nailed

✓ Genuinely worked

Held its declared register in every direction
Typographic hierarchy more consistent than expected
Did not invent charts for unavailable data
Constraint quality drove output quality, directly

✗ Quietly faked

Interaction model — screens display, don’t direct
Graph hierarchy without meaningful edge semantics
Density mistaken for sophistication
Back-half directions hedge rather than commit

Constraint fidelity was genuinely impressive. Every direction held its declared register for all three screens. Ledger stayed institutional without hedging toward SaaS at screen two. Atlas held its geographic metaphor without collapsing into a table when the data got complex. Junior designers drift on long briefs. Claude did not.

Typographic hierarchy was more consistent than expected. Maintaining clear visual priority (most important number first, supporting data second, metadata last) is something human designers often get wrong under time pressure. Claude also handled the data model honestly: it did not invent charts for data the spec said was unavailable. That restraint is harder than it sounds, and several directions showed genuine discipline about it.

The strongest single moment across all sixteen: Treasury (13) framed conviction as an EV/EBITDA-relative index rather than a raw score. That’s the correct mental model for the domain, and the model arrived there without being told.

The relationship between constraint quality and output quality is direct and non-negotiable. Every weakness in the output traced back to an ambiguity we hadn’t closed. The directions that held up were the ones where the brief had already answered the hard questions.

Chapter 06
What it faked

The interaction model. Every direction is a dashboard that receives attention. None of them direct attention. A real design for this use case would be opinionated about what the analyst should look at first on an earnings day versus a quiet Tuesday. On a day when three coverage targets report earnings, what changes? Nothing visible in any of the sixteen directions. The screens look different but behave identically: display everything, let the user decide. That is not a design decision — it is the absence of one, rendered in sixteen aesthetic registers.

The model also faked hierarchy inside the canvas metaphors. Constellation uses a node graph, but the nodes do not encode meaningful relationships — they encode the data schema. What does proximity mean here? What does edge weight mean? The model drew the graph without answering those questions.

Density mistaken for sophistication. Walking into sixteen directions, your eye immediately sorts by density — the terminal-register directions pull attention because they fill the frame. That is a trap. Heatfield (06) was the most visually arresting direction in the set and scored a 2/10 on workflow realism. It looked exactly like something that would appear in a Palantir product demo. It had no meaningful operational logic behind it. The directions that held up on second look were the ones with the most restraint. Funnel (08) is the one that reads cleanest at first glance — left-to-right stage progression is a pattern people already know, and the columns name themselves.
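The gap between looking right and working right can be read straight off the scorecards. A short sketch that recomputes it from the scores reported for each direction (visual credibility minus workflow realism, both out of 10) and applies the legend's bands (Strong 7–10, Moderate 5–6, Weak 1–4); SCORES, band, and plausibility_gaps are our own helper names.

```python
# (visual credibility, workflow realism) per direction, copied from the scorecards.
SCORES = {
    "Ledger": (8, 4), "Atlas": (7, 3), "Bridge": (8, 5), "Constellation": (9, 3),
    "Conviction": (8, 6), "Heatfield": (9, 2), "Roadmap": (7, 5), "Funnel": (7, 4),
    "Briefing": (7, 6), "Signal": (8, 5), "Council": (7, 6), "Quilt": (6, 2),
    "Treasury": (9, 6), "Compass": (7, 5), "Arena": (6, 4), "Loom": (5, 3),
}


def band(score: int) -> str:
    """Legend bands: Strong 7-10, Moderate 5-6, Weak 1-4."""
    if score >= 7:
        return "Strong"
    return "Moderate" if score >= 5 else "Weak"


def plausibility_gaps(scores: dict[str, tuple[int, int]]) -> list[tuple[str, int]]:
    """Directions sorted by how far visual credibility outruns workflow realism."""
    gaps = [(name, looks - works) for name, (looks, works) in scores.items()]
    # Python's sort is stable, so ties keep the gallery order.
    return sorted(gaps, key=lambda item: item[1], reverse=True)


for name, gap in plausibility_gaps(SCORES)[:3]:
    looks, works = SCORES[name]
    print(f"{name}: looks {band(looks)} ({looks}), works {band(works)} ({works}), gap {gap}")
```

Heatfield tops the list with a gap of 7 and Constellation follows at 6; both look Strong and work Weak. At the other end, Briefing and Council show a gap of only 1, consistent with the observation that the restrained directions held up better on second look.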

The weaker directions in the back half are not weaker because AI ran out of ideas. They are weaker because the constraint set got looser — the model hedging, mixing registers, trying to please two aesthetic masters at once. A designer would have stopped at ten.

Deep Dive
Why AI generates believable interfaces

The more useful question isn’t “what did it get wrong?” It’s: why does it look right? Understanding the mechanics behind visual plausibility separates a designer who can use these tools critically from one who gets fooled by them.

Pattern synthesis over semantic understanding

AI design tools don’t understand your product. They’ve learned, at statistical scale, which visual patterns occur together in credible interfaces — dense tables with monospace type, radial graphs with dark palettes, kanban boards with restrained SaaS aesthetics. The model synthesizes these patterns fluently. What it doesn’t do is model the operational logic underneath them. The result is interfaces that carry the aesthetic grammar of real products without the semantic content.

Learned visual priors do the heavy lifting

The reason a Bloomberg-register direction looks credible is that Bloomberg Terminal is a real, authoritative product. The visual vocabulary — function-code density, monospace everywhere, restrained color — carries authority by association. The AI has learned this association and applies it faithfully. A direction that adopts Bloomberg’s visual grammar inherits Bloomberg’s credibility signal, regardless of whether the underlying data model warrants it. This is design-token mimicry: syntactically correct, semantically empty.

Interaction hallucination

Every direction renders a screen. None of them render a workflow. When you look at Constellation — the force-directed graph — the nodes and edges look like they belong in Palantir Foundry. A senior analyst could sit in front of it and feel the familiarity. But ask what happens when you click a node. Ask what the edge thickness means. Ask what changes on an earnings day versus a quiet Tuesday. The model has no answer because it never modeled those questions. The interface is a rendering of interface aesthetics, not a solution to a problem.

The plausibility trap

The dangerous output is not the obviously bad direction — that gets rejected immediately. It’s the one that’s visually convincing but operationally shallow. That one passes the first filter, enters the shortlist, and gets built. This is the failure mode that matters.

Chapter 07
The pitfalls, stated plainly

Density is not design. Most of the sixteen directions arrange the same components differently. The model is strong at varying surface treatment — color, type, density, rhythm — and weak at varying interaction model, information architecture, or what happens after a click. The test: if you cannot describe what is different about two directions in one sentence — not how they look, but what decision they represent for the user — they are not actually different directions. By that test, at least four of the sixteen are duplicates in different visual registers.
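The one-sentence test stays a human judgment, but the bookkeeping around it is easy to make explicit. A minimal sketch, assuming a reviewer has written one `decision` sentence per direction: group directions by that sentence, and any group with more than one member is one direction rendered in several visual registers. The `Direction` fields are hypothetical, and the near-exact text matching is deliberately naive.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Direction:
    # Hypothetical annotation written by a reviewer, not by the model.
    name: str
    register: str   # how it looks, e.g. "terminal" or "restrained SaaS"
    decision: str   # one sentence: what this direction decides for the user


def duplicate_groups(directions: list[Direction]) -> list[list[str]]:
    """Return groups of direction names that represent the same user decision."""
    by_decision: dict[str, list[str]] = defaultdict(list)
    for d in directions:
        # Normalise case and whitespace so trivial rewording does not split a group.
        key = " ".join(d.decision.lower().split())
        by_decision[key].append(d.name)
    # The register is ignored on purpose: different looks, same decision = duplicate.
    return [names for names in by_decision.values() if len(names) > 1]
```

The `register` field is carried along but never used as a key, which is the point of the test: how a direction looks does not make it a different direction.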

The model decides what is important if you do not. A canvas with sixteen widgets is a list, not a design. Half of design is deciding what to leave out. “Simple” is not a constraint. “The analyst’s primary task is X; everything else is secondary and should not appear on the first screen” is a constraint.

You need a designer to read the output. Not to polish — to read. Someone who can say “this is faking hierarchy” or “the interaction model is missing” or “this graph has no semantic edges.” Claudiu caught three things in Constellation that we had entirely missed: the node proximity encoded nothing, the edge weight was decorative, and the implied interaction (click a node, see what?) was never specified. Without that critique, Constellation would have made our shortlist. It looked exactly right. It was operationally empty.

Accept quality decay after direction 10. The back half of any large-batch generation will hedge. The model has fewer distinct positions available and starts mixing registers to fill the brief. This is not a failure of the tool — it is a property of the problem. The correct response is to use the first eight to ten directions to identify the strongest thesis, then generate variations within that thesis rather than continuing to explore the full space.
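The explore-then-narrow rule can be stated as a tiny decision function. This is a sketch under stated assumptions: `scores` holds a designer's workflow-realism score out of 10 for each direction already critiqued (the scale behind Heatfield's 2/10), and the default cap of 10 mirrors the advice above. `next_step` and its parameters are hypothetical names, not part of any tool.

```python
from typing import Optional


def next_step(scores: dict[str, int], explore_cap: int = 10) -> tuple[str, Optional[str]]:
    """Return ("explore", None) or ("vary", thesis).

    `scores` maps each critiqued direction to a 0-10 workflow-realism score.
    """
    if len(scores) < explore_cap:
        # Still under the cap: new directions can still reveal a new thesis.
        return ("explore", None)
    # At the cap, stop widening the space and generate variations of the
    # strongest thesis. On a tie, max() keeps the first direction inserted.
    best = max(scores, key=scores.get)
    return ("vary", best)
```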

Chapter 08
How to run this yourself

Step	What to do	Why it matters	Time
01	Write three sentences that let a stranger picture the primary screen.	Not what the product does — what the user sees and does in one good session.	5 min
02	Run `/grill-me` until every branch is closed.	You need a named user, a primary task, real data, and at least two explicit “must not” constraints. If you don’t have those four things, the model will invent them.	30–60 min
03	Build your reference library before generating.	Pick a register in plain English first. Then pick two concrete references maximum and state what you’re borrowing and what you’re not. Twenty minutes of setup improves every direction.	20 min
04	Set a hard number and accept decay.	Generating eight directions finds the shape. Sixteen shows the edges. The back half is evidence, not failure.	1 hr
05	Get a designer for the critique.	There is no substitute for someone who has shipped real products and knows the difference between hierarchy that serves a task and hierarchy that encodes a schema.	30 min
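Step 02's exit condition can be written down as a pre-flight check. The sketch below is hypothetical: `/grill-me` does not produce a `Brief` object, and the field names are assumptions. It only makes the four required pieces (named user, primary task, real data, at least two "must not" constraints) explicit, so a missing one surfaces before the generation hour starts instead of being invented by the model.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    # Hypothetical structure for a design brief; field names are assumptions.
    user: str = ""
    primary_task: str = ""
    sample_data: list[str] = field(default_factory=list)  # real rows, not placeholders
    must_not: list[str] = field(default_factory=list)


def missing_pieces(brief: Brief) -> list[str]:
    """Return what is still open; an empty list means the brief is ready to generate."""
    gaps = []
    if not brief.user.strip():
        gaps.append("named user")
    if not brief.primary_task.strip():
        gaps.append("primary task")
    if not brief.sample_data:
        gaps.append("real data")
    # Blank entries do not count as constraints.
    if len([c for c in brief.must_not if c.strip()]) < 2:
        gaps.append("at least two 'must not' constraints")
    return gaps
```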

The AI accelerates ideation. The judgment still lives with the human — and the dangerous part isn’t that AI generates bad design. It’s that it generates believable design faster than most teams can critically evaluate it. Sixteen directions in two hours sounds like a productivity gain. It is also sixteen opportunities to mistake visual plausibility for operational validity before anyone has asked the hard questions.

AI Driven Design (and Pitfalls) with Claude Design.
