By Dan Baciu & Claudiu Hutanu · April 1, 2026 · 28 min read

AI Driven Design (and Pitfalls) with Claude Design

A friend asked for help turning a vague corp-dev idea into something concrete. Two hours later we had sixteen structurally coherent prototypes — and a growing suspicion that most of them only looked right at first glance.

Sixteen interfaces. Two hours. One uncomfortable question.

This is not a product review. It is an investigation into what happens when you hand a vague, real-world product idea to an AI design tool — and then ask a trained designer to tell you which of the results are actually good designs, and which ones are just wearing the right aesthetic clothes.

Chapter 01
The spark

It started, as many things do, with a late-night message. A friend who works in corporate development pinged me with an idea: a tool to replace the patchwork of Excel sheets, email threads, and Bloomberg tabs that make up his daily M&A workflow. He wanted something that could hold a strategic thesis, track acquisition targets underneath it, and surface signals — news, earnings, analyst reports — without requiring him to go find them.

Friend, today at 2:13 PM:

“For a given company, you (manually or automatically) set its targets based on a target industry, then your daily output is a) a dashboard of news and financial data (if available) for your coverage portfolio, b) an email/text/slack notification for especially big news OR for your sweetheart must-have targets, c) big formal update on earnings releases because they have a lot of data and move the markets significantly, and d) [hard] some kind of aggregated synthetic index of whether prices (ev/ebitda) are going up or down.”

What the tool must produce, daily:

a) Portfolio Dashboard. News and financial data for your coverage universe. Every target, every morning, in one view.
b) Smart Alerts. Email, text, or Slack when especially big news breaks, or when a sweetheart must-have target moves.
c) Earnings Briefings. Formal, data-dense update on earnings releases. They move markets; they deserve their own format.
d) Pricing Index (hard). A synthetic, aggregated signal of whether EV/EBITDA multiples across your coverage are trending up or down.
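
Item (d) is the one the brief flags as hard, and the message never defines it. As a purely illustrative sketch, one simple reading is the percent change in the median EV/EBITDA multiple across the coverage list between two snapshots. The function name, the sample multiples, and the choice of the median are all assumptions, not part of the original brief.

```python
from statistics import median


def pricing_index_change(previous: dict[str, float], current: dict[str, float]) -> float:
    """Percent change in the median EV/EBITDA multiple across targets in both snapshots."""
    common = sorted(set(previous) & set(current))
    if not common:
        raise ValueError("no overlapping targets between snapshots")
    prev_median = median(previous[name] for name in common)
    curr_median = median(current[name] for name in common)
    return (curr_median - prev_median) / prev_median * 100.0


# Hypothetical multiples, for illustration only.
last_quarter = {"Target A": 10.0, "Target B": 12.0, "Target C": 8.0}
this_quarter = {"Target A": 11.0, "Target B": 12.5, "Target C": 9.0}

change = pricing_index_change(last_quarter, this_quarter)
direction = "up" if change > 0 else "down" if change < 0 else "flat"
print(f"Median EV/EBITDA moved {direction} by {abs(change):.1f}%")
```

A median is used here only because it is less sensitive to one outlier target than a mean; a real index would need a defensible methodology, which is exactly what the later critique finds missing.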

Vague, but concrete enough to start. Two hours later we had sixteen visually coherent prototypes — legible enough to hold a real design conversation. Which is when things got interesting, because the conversation revealed how many of them were wearing the right clothes without understanding the job.

Co-written with Claudiu Hutanu, a product designer who critiques how to use AI-driven design tools effectively to generate quality outputs.

Chapter 02
Why speed changes everything

If you’ve ever hired a designer and waited three weeks for a first-look deck, this process breaks the equation. Not because it replaces the designer — it doesn’t — but because it collapses the time between “I have a vague idea” and “I can have a real conversation about it” to a single afternoon.

That compression is the actual story here. Sixteen directions in two hours sounds like a productivity gain. It is also sixteen opportunities to mistake visual plausibility for operational validity — before anyone has asked the hard questions. Speed produces more decisions, faster. It doesn’t make the decisions better.

We rejected three directions before the designer even saw them — not for visual reasons, they looked fine — but because the interaction model was identical to another direction and the visual register was internally contradictory. That judgment took a human. The brief was excellent. The output still needed someone to read it critically.

  • 2h: total time from idea to 16 working prototypes
  • 16: distinct design directions generated
  • 9: rounds of /grill-me to close the brief
  • 3: aesthetic registers across all directions

The most useful thing we found wasn’t what the AI did well. It was learning to see the gap between visual credibility and operational validity — and understanding how easily one masquerades as the other.

Chapter 03
How we ran it

The process had three distinct phases, and the sequence mattered. You can’t skip to generation — the quality of your output is entirely determined by the quality of your brief. Here’s exactly how those two hours unfolded.

01 · Three sentences to get started (~5 min)
My friend wrote three sentences describing the primary screen. Vague, but enough to trigger the next step.

02 · /grill-me, the non-negotiable step (~40 min)
A Claude Code skill that interviews you one question at a time until every ambiguity is closed. Nine rounds in, we had a named user, a primary task, real data constraints, and explicit “must not” rules. Most of the quality in the final output traces back here.

03 · Build the reference library (~15 min)
We defined three aesthetic registers (Institutional, Modern SaaS, and Hybrid), plus a Canvas-specific set of references for the interaction model, and assigned references within each. Every direction would declare one register and at most two references. This is what keeps sixteen outputs from collapsing into one.

04 · Generate 16 directions in one pass (~50 min)
A single prompt, the full brief, 16 self-contained HTML prototypes. Quality visibly fades after direction 10: the model hedges, blends registers, avoids commitment. That’s expected. Directions 1–10 show you what’s possible. Directions 11–16 show you where the model runs out of distinct positions.

05 · Designer critique (~30 min)
Claudiu walked every direction and scored each on four axes: visual credibility, workflow realism, information density, and actual usability. This is where the faking became visible.

The prompt

My friend said he wanted to make something, so I asked for three sentences that would let a stranger picture the primary screen. What came back was vague, but good enough to grill and to start a design sprint.

/grill-me — the key step

/grill-me is a Claude Code skill that interviews you one question at a time until every open branch is closed. Nine rounds in, the spec was a structured document: audience, tasks, three mandatory screens, data model, what the design must not do.

The skill does not write your spec. It refuses to let you start the next step until the open questions are closed. Most of the quality achieved in the final deliverables traces back here.

The reference library

The reference library is not a moodboard. It is a constraint document. Each aesthetic anchor defines two things: what you are borrowing (structural vocabulary, type scale, color logic) and — critically — what you are not borrowing. “Inspired by Bloomberg” without that second clause produces directions that inherit Bloomberg’s visual weight without the operational depth that earns it.

To keep the designs distinct, we prompted Claude Design to work within three aesthetic registers, plus a canvas-specific set of references for the interaction model:

  • Institutional / data-dense – Bloomberg-style finance-terminal density
  • Modern SaaS (Linear, Notion, Pitch, Height) – keyboard-first restraint
  • Hybrid (Stripe Dashboard, Vercel, Modern Treasury, Causal) – rigorous data with type discipline
  • Canvas-specific (Foundry, Kumu, Obsidian graph, tldraw) – for the interaction model

| Register | Description | References |
| --- | --- | --- |
| Institutional | Finance-terminal density. Serious, dense, monospace-driven. | Bloomberg Terminal, Palantir Foundry, Koyfin, Capital IQ |
| Modern SaaS | Keyboard-first restraint. Clean type, command-palette culture. | Linear, Notion, Pitch, Height |
| Hybrid | Institutional data with SaaS type discipline. | Stripe Dashboard, Vercel, Modern Treasury, Causal |
| Canvas | Interaction model orthogonal to register; applies across all three. | Palantir Object Explorer, Kumu.io, Obsidian graph, tldraw |

The design references

Every direction declares one register, picks at most two references, and spells out in plain terms what it takes from each reference. That is what keeps sixteen directions from collapsing into one.
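
The declaration rule above can be written down as a small check. This is a minimal sketch, not the format we actually used: the class names, the `problems` method, and the borrowing notes in the examples are hypothetical, and it treats Canvas as an interaction model rather than a register.

```python
from dataclasses import dataclass, field

# The three registers used in the direction index; Canvas is treated as an
# interaction model rather than a register (an assumption based on the table).
REGISTERS = {"Institutional", "Modern SaaS", "Hybrid"}


@dataclass
class Reference:
    name: str
    borrowing: str      # what the direction takes from this reference
    not_borrowing: str  # what it deliberately leaves behind


@dataclass
class Direction:
    name: str
    register: str
    references: list[Reference] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return every declaration rule that this direction breaks."""
        issues = []
        if self.register not in REGISTERS:
            issues.append(f"unknown register: {self.register}")
        if len(self.references) > 2:
            issues.append("more than two references")
        for ref in self.references:
            if not ref.borrowing.strip() or not ref.not_borrowing.strip():
                issues.append(f"{ref.name}: say what is and is not borrowed")
        return issues


# Hypothetical declarations; the borrowing notes are invented for illustration.
ledger = Direction("Ledger", "Institutional", [
    Reference("Capital IQ", "table discipline", "legacy chrome"),
    Reference("Bloomberg Terminal", "numeric density", "function-code navigation"),
])
vague = Direction("Untitled", "Bloomberg-ish", [Reference("Bloomberg Terminal", "the look", "")])

print(ledger.problems())
print(vague.problems())
```

The second declaration fails twice: its register is not one of the declared three, and it never says what it is not borrowing, which is the clause that stops a direction from inheriting visual weight without the depth behind it.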

One additional input: awesome-design.md, used as methodology, not moodboard. If your company or brand already has a style guide or brand reference, use that. We had neither, so we defaulted to awesome-design.md for inspiration, which greatly sped up our design process.

For reference, DESIGN.md is a new concept introduced by Google Stitch: a plain-text design system document that AI agents read to generate consistent UI. Awesome-design.md is a public repository that houses a curated collection of design system documents extracted from well-known, widely used public websites.

We also established a shared fictional company — Acme Mobility Corp, a $40B industrial conglomerate — and held it constant across all sixteen directions. If each direction gets different content, you can’t tell whether one prototype works because the design is better or because the story is easier. Holding the company constant isolates design as the variable.

One caveat, which we acknowledged up front: quality decays toward the back. Designs twelve through sixteen are noticeably weaker. That is fine. The point of generating sixteen distinct designs is to see the shape of the space, not to ship the median.

[Figure: abstract 4x4 grid of sixteen miniature dashboard sketches, each in a different layout style. Caption: One brief, sixteen lenses.]

Chapter 04
The result — 16 directions

Start with the index, then move through each direction in turn.

What broke immediately, across the set

Fake information density

Dense layouts that look data-rich but contain the same three data points arranged differently.

Shallow workflow models

Every direction displays data. None model what the analyst actually does with it, or when, or why.

Identical interaction patterns

Visual registers varied widely. Interaction models did not — click, view, scroll across all sixteen.

Semantic-free graph edges

Graph metaphors drawn correctly, but edges encode nothing. Proximity means nothing. Weight means nothing.

Simulated intelligence

Conviction scores and EV/EBITDA indices looked computed. They were hand-placed. The methodology doesn’t exist.

Quality decay after direction 10

The back half hedges. References blur. The model ran out of distinct positions before the brief ran out of slots.

| # | Direction | Register | Canvas Metaphor | Aesthetic Anchor |
| --- | --- | --- | --- | --- |
| 01 | Ledger | Institutional | Structured ledger / table-as-canvas | Capital IQ + Bloomberg |
| 02 | Atlas | Hybrid | Geographic/spatial map of theses | Stripe Dashboard + Koyfin |
| 03 | Bridge | Hybrid | Sankey: capability → thesis → target | Modern Treasury + Vercel |
| 04 | Constellation | Institutional | Force-directed graph / node-link | Palantir Foundry + Koyfin |
| 05 | Conviction | Modern SaaS | Kanban of theses by stage | Linear + Pitch |
| 06 | Heatfield | Institutional | Hex/grid adjacency map | Bloomberg Terminal + Palantir |
| 07 | Roadmap | Hybrid | Timeline: theses vs. market trends | Causal + Stripe Dashboard |
| 08 | Funnel | Hybrid | Pipeline funnel: Explore → Closed | Vercel Dashboard + Causal |
| 09 | Briefing | Modern SaaS | Document hierarchy with graph view | Notion + Height |
| 10 | Signal | Hybrid | Radial starburst: capability-rooted | Modern Treasury + Linear |
| 11 | Council | Modern SaaS | Swimlanes by strategic horizon | Height + Pitch |
| 12 | Quilt | Hybrid | Tiled/mosaic canvas overview | Obsidian graph + Whimsical |
| 13 | Treasury | Hybrid | EV/EBITDA index + financial view | Koyfin + Modern Treasury |
| 14 | Compass | Hybrid | Directional target prioritization | Causal + Kumu.io |
| 15 | Arena | Hybrid | Competitive landscape matrix | tldraw + Excalidraw |
| 16 | Loom | Hybrid | Woven/thread narrative canvas | Whimsical + Pitch |

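Score summary: all 16 directions across 4 evaluation axes

| Direction | Visual credibility | Workflow realism | Info density | Usability |
| --- | --- | --- | --- | --- |
| 01 Ledger | 8 | 4 | 9 | 5 |
| 02 Atlas | 7 | 3 | 6 | 4 |
| 03 Bridge | 8 | 5 | 7 | 5 |
| 04 Constellation | 9 | 3 | 7 | 3 |
| 05 Conviction | 8 | 6 | 5 | 6 |
| 06 Heatfield | 9 | 2 | 8 | 3 |
| 07 Roadmap | 7 | 5 | 6 | 5 |
| 08 Funnel | 7 | 4 | 6 | 5 |
| 09 Briefing | 7 | 6 | 5 | 6 |
| 10 Signal | 8 | 5 | 7 | 6 |
| 11 Council | 7 | 6 | 4 | 6 |
| 12 Quilt | 6 | 2 | 5 | 3 |
| 13 Treasury | 9 | 6 | 9 | 6 |
| 14 Compass | 7 | 5 | 5 | 5 |
| 15 Arena | 6 | 4 | 5 | 4 |
| 16 Loom | 5 | 3 | 4 | 3 |

Legend: Strong (7–10) · Moderate (5–6) · Weak (1–4)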
01 · Ledger (Institutional)
Table-as-canvas · Capital IQ discipline
✓ Worked
  • Restrained typography held throughout
  • Numbers had clear visual priority
  • Density felt earned, not forced
✗ Failed
  • No interaction model: clicking a row leads nowhere
  • Displays data but never directs attention
  • Every metric treated as equally important
Scores: visual credibility 8/10 · workflow realism 4/10 · info density 9/10 · actual usability 5/10

The only direction where restraint was a deliberate design choice rather than a default.

02 · Atlas (Hybrid)
Geographic map · Stripe + Koyfin
✓ Worked
  • Spatial orientation was immediately readable
  • Color-coded regions surfaced status at a glance
✗ Failed
  • M&A strategy doesn’t map to geography; the metaphor is decorative
  • Physical location encodes nothing about strategic fit
Scores: visual credibility 7/10 · workflow realism 3/10 · info density 6/10 · actual usability 4/10

Geography is decorative here — the metaphor adds visual distinctiveness without adding analytical value.

03 · Bridge (Hybrid)
Sankey flow · Modern Treasury
✓ Worked
  • Sankey capability→target is conceptually correct
  • Flow structure matches the underlying data model
✗ Failed
  • Static: no interaction on edges or nodes
  • Flow proportions were invented, not derived
Scores: visual credibility 8/10 · workflow realism 5/10 · info density 7/10 · actual usability 5/10

Conceptually the most correct flow model; the failure is that it never made the flow interactive.

04 · Constellation (Institutional)
Force-directed graph · Palantir
✓ Worked
  • Node-link structure correctly expresses the graph model
  • Dark institutional palette held the register
✗ Failed
  • Node proximity encodes nothing meaningful
  • Edge weight is decorative: what does thickness mean?
Scores: visual credibility 9/10 · workflow realism 3/10 · info density 7/10 · actual usability 3/10

Most visually authoritative. Lowest operational value. The gap between the two is the whole lesson.

05 · Conviction (Modern SaaS)
Kanban · Keyboard-first, presentation-grade type
✓ Worked
  • Stage-based kanban correctly models thesis progression
  • Card anatomy was consistent across columns
✗ Failed
  • All thesis cards carry equal visual weight, with no urgency signal
  • Restrained aesthetic undercuts financial seriousness
Scores: visual credibility 8/10 · workflow realism 6/10 · info density 5/10 · actual usability 6/10

Works as a lightweight tracking view; fails as a strategic decision-making surface.

06 · Heatfield (Institutional)
Hex grid · Terminal density
✓ Worked
  • Grid density communicates coverage breadth
  • Color-field signals status at a glance
✗ Failed
  • Hexagons are arbitrary; adjacency encodes nothing strategic
  • Most visually convincing with the weakest operational logic
Scores: visual credibility 9/10 · workflow realism 2/10 · info density 8/10 · actual usability 3/10

The strongest argument for why visual credibility and workflow realism are different scores.

07 · Roadmap (Hybrid)
Timeline · Causal + Stripe
✓ Worked
  • Correctly surfaces the temporal dimension others ignored
  • Now/Next/Later labels added real structural clarity
✗ Failed
  • Assumes theses have fixed timelines; they don’t
  • Gantt implies execution scheduling, not strategic monitoring
Scores: visual credibility 7/10 · workflow realism 5/10 · info density 6/10 · actual usability 5/10

Gets the temporal dimension right; gets the M&A iteration loop entirely wrong.

08 · Funnel (Hybrid)
Pipeline stages · Minimal chrome
✓ Worked
  • Reads cleanest of all sixteen at first glance
  • Cards are discrete; columns name themselves
✗ Failed
  • Borrows from sales CRM logic, not corp dev reality
  • M&A is iterative; the funnel implies one-way flow
Scores: visual credibility 7/10 · workflow realism 4/10 · info density 6/10 · actual usability 5/10

Reads cleanest at first glance — but it is a pattern, not a product.

09 · Briefing (Modern SaaS)
Doc hierarchy · Notion + Height
✓ Worked
  • Document metaphor suits analysts who write and annotate
  • Sidebar navigation was genuinely navigable
✗ Failed
  • Turns active strategy into passive reading
  • Document frame is excellent for storage, weak for decisions
Scores: visual credibility 7/10 · workflow realism 6/10 · info density 5/10 · actual usability 6/10

Excellent for analysts who write and file; weak for analysts who decide and act.

10 · Signal (Hybrid)
Radial starburst · Modern Treasury + Linear
✓ Worked
  • Radial layout made capability→thesis visible instantly
  • Important number was isolated, which is genuinely hard to do
✗ Failed
  • Doesn’t scale beyond 6–8 theses before becoming unreadable
  • Radial metaphor reads as decorative after first impression
Scores: visual credibility 8/10 · workflow realism 5/10 · info density 7/10 · actual usability 6/10

The conviction index framing is right. The radial metaphor earns its place here.

11 · Council (Modern SaaS)
Swimlanes · Height + Pitch
✓ Worked
  • Swimlanes by horizon correctly surfaces temporal grouping
  • Multi-view affordance suited the data model
✗ Failed
  • Swimlanes too wide; whitespace undermined seriousness
  • Restrained palette felt underpowered for financial decisions
Scores: visual credibility 7/10 · workflow realism 6/10 · info density 4/10 · actual usability 6/10

Technically sound. Aesthetically understated to the point of feeling unserious.

12 · Quilt (Hybrid)
Mosaic canvas · Obsidian + Whimsical
✓ Worked
  • Tiled layout communicated coverage breadth clearly
✗ Failed
  • Weakest metaphor: tiles encode no relationships
  • A grid with aesthetic variation dressed up as a canvas
  • The model should have stopped here
Scores: visual credibility 6/10 · workflow realism 2/10 · info density 5/10 · actual usability 3/10

The weakest direction. Mosaic adjacency encodes nothing; this is decoration with a concept label.

13 · Treasury (Hybrid)
EV/EBITDA index · Koyfin + Modern Treasury
✓ Worked
  • Most operationally grounded of all sixteen
  • EV/EBITDA index framing was the right question to ask
✗ Failed
  • Index methodology invented; conviction score is undefined
  • Tables imply precision they haven’t earned
Scores: visual credibility 9/10 · workflow realism 6/10 · info density 9/10 · actual usability 6/10

The most operationally grounded direction in the set. Also the most data-honest.

14 · Compass (Hybrid)
2×2 matrix · Causal + Kumu.io
✓ Worked
  • 2×2 correctly surfaced the conviction × urgency trade-off
  • Quadrant metaphor was immediately readable by any executive
✗ Failed
  • Quadrant position was manually placed; no formula drives it
  • Matrix avoids the hard problem: how do you score conviction?
Scores: visual credibility 7/10 · workflow realism 5/10 · info density 5/10 · actual usability 5/10

The 2×2 is the correct executive framing — if the axes are real. They’re not, yet.

15 · Arena (Hybrid)
Competitive landscape · tldraw + Excalidraw
✓ Worked
  • Competitive map metaphor is familiar to any exec audience
✗ Failed
  • Deliberate informality undercuts the institutional register needed
  • Canvas placement is editorial opinion, not data
  • Signs of quality decay: commitment fading
Scores: visual credibility 6/10 · workflow realism 4/10 · info density 5/10 · actual usability 4/10

The competitive landscape framing is the right question; the execution hedges too many references.

16 · Loom (Hybrid)
Thread/narrative canvas · Whimsical + Pitch
✓ Worked
  • Thread narrative is a genuinely novel frame for M&A strategy
✗ Failed
  • Most abstract metaphor in the set, farthest from workflow reality
  • Aesthetic is mixed: registers bleeding into each other
  • Evidence of model running out of committed positions
Scores: visual credibility 5/10 · workflow realism 3/10 · info density 4/10 · actual usability 3/10

The most conceptually original. Also the least resolved. The metaphor didn’t survive the data.


Chapter 05
What it nailed

✓ Genuinely worked

  • Held its declared register in every direction
  • Typographic hierarchy more consistent than expected
  • Did not invent charts for unavailable data
  • Constraint quality drove output quality, directly

✗ Quietly faked

  • Interaction model — screens display, don’t direct
  • Graph hierarchy without meaningful edge semantics
  • Density mistaken for sophistication
  • Back-half directions hedge rather than commit

Constraint fidelity was genuinely impressive. Every direction held its declared register for all three screens. Ledger stayed institutional without hedging toward SaaS at screen two. Atlas held its geographic metaphor without collapsing into a table when the data got complex. Junior designers drift on long briefs. Claude did not.

Typographic hierarchy was more consistent than expected. Maintaining clear visual priority — most important number first, supporting data second, metadata last — is something human designers get wrong under time pressure. It also handled the data model honestly: it did not invent charts for data the spec said was unavailable. That restraint is harder than it sounds, and several directions showed genuine discipline about it.

The strongest single moment across all sixteen: Treasury (13) framed conviction as an EV/EBITDA-relative index rather than a raw score. That’s the correct mental model for the domain, and the model arrived there without being told.

The relationship between constraint quality and output quality is direct and non-negotiable. Every weakness in the output traced back to an ambiguity we hadn’t closed. The directions that held up were the ones where the brief had already answered the hard questions.

Chapter 06
What it faked

The interaction model. Every direction is a dashboard that receives attention. None of them direct attention. A real design for this use case would be opinionated about what the analyst should look at first on an earnings day versus a quiet Tuesday. On a day when three coverage targets report earnings, what changes? Nothing visible in any of the sixteen directions. The screens look different but behave identically: display everything, let the user decide. That is not a design decision — it is the absence of one, rendered in sixteen aesthetic registers.

The model also faked hierarchy inside the canvas metaphors. Constellation uses a node graph, but the nodes do not encode meaningful relationships — they encode the data schema. What does proximity mean here? What does edge weight mean? The model drew the graph without answering those questions.

Density mistaken for sophistication. Walking into sixteen directions, your eye immediately sorts by density — the terminal-register directions pull attention because they fill the frame. That is a trap. Heatfield (06) was the most visually arresting direction in the set and scored a 2/10 on workflow realism. It looked exactly like something that would appear in a Palantir product demo. It had no meaningful operational logic behind it. The directions that held up on second look were the ones with the most restraint. Funnel (08) is the one that reads cleanest at first glance — left-to-right stage progression is a pattern people already know, and the columns name themselves.
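
One way to make that trap visible is to rank directions by how far visual credibility outruns workflow realism. The sketch below uses the two scores from the summary table; the `credibility_gap` helper and the idea of a single "gap" number are illustrative assumptions, not a metric we used during the critique.

```python
# (visual credibility, workflow realism) per direction, copied from the score summary.
SCORES = {
    "01 Ledger": (8, 4), "02 Atlas": (7, 3), "03 Bridge": (8, 5),
    "04 Constellation": (9, 3), "05 Conviction": (8, 6), "06 Heatfield": (9, 2),
    "07 Roadmap": (7, 5), "08 Funnel": (7, 4), "09 Briefing": (7, 6),
    "10 Signal": (8, 5), "11 Council": (7, 6), "12 Quilt": (6, 2),
    "13 Treasury": (9, 6), "14 Compass": (7, 5), "15 Arena": (6, 4),
    "16 Loom": (5, 3),
}


def credibility_gap(scores: dict[str, tuple[int, int]]) -> list[tuple[str, int]]:
    """Rank directions by visual credibility minus workflow realism, largest first."""
    gaps = [(name, visual - workflow) for name, (visual, workflow) in scores.items()]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


gaps = credibility_gap(SCORES)
for name, gap in gaps[:2]:
    print(f"{name}: looks {gap} points better than it works")
```

Heatfield and Constellation come out on top, which matches the critique: the two directions that looked most authoritative are also the ones where appearance and operational logic are furthest apart.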

The weaker directions in the back half are not weaker because AI ran out of ideas. They are weaker because the constraint set got looser — the model hedging, mixing registers, trying to please two aesthetic masters at once. A designer would have stopped at ten.


Deep Dive
Why AI generates believable interfaces

The more useful question isn’t “what did it get wrong?” It’s: why does it look right? Understanding the mechanics behind visual plausibility separates a designer who can use these tools critically from one who gets fooled by them.

Pattern synthesis over semantic understanding

AI design tools don’t understand your product. They’ve learned, at statistical scale, which visual patterns occur together in credible interfaces — dense tables with monospace type, radial graphs with dark palettes, kanban boards with restrained SaaS aesthetics. The model synthesizes these patterns fluently. What it doesn’t do is model the operational logic underneath them. The result is interfaces that carry the aesthetic grammar of real products without the semantic content.

Learned visual priors do the heavy lifting

The reason a Bloomberg-register direction looks credible is that Bloomberg Terminal is a real, authoritative product. The visual vocabulary — function-code density, monospace everywhere, restrained color — carries authority by association. The AI has learned this association and applies it faithfully. A direction that adopts Bloomberg’s visual grammar inherits Bloomberg’s credibility signal, regardless of whether the underlying data model warrants it. This is design-token mimicry: syntactically correct, semantically empty.

Interaction hallucination

Every direction renders a screen. None of them render a workflow. When you look at Constellation — the force-directed graph — the nodes and edges look like they belong in Palantir Foundry. A senior analyst could sit in front of it and feel the familiarity. But ask what happens when you click a node. Ask what the edge thickness means. Ask what changes on an earnings day versus a quiet Tuesday. The model has no answer because it never modeled those questions. The interface is a rendering of interface aesthetics, not a solution to a problem.

The plausibility trap

The dangerous output is not the obviously bad direction — that gets rejected immediately. It’s the one that’s visually convincing but operationally shallow. That one passes the first filter, enters the shortlist, and gets built. This is the failure mode that matters.


Chapter 07
The pitfalls, stated plainly

Density is not design. Most of the sixteen directions arrange the same components differently. The model is strong at varying surface treatment — color, type, density, rhythm — and weak at varying interaction model, information architecture, or what happens after a click. The test: if you cannot describe what is different about two directions in one sentence — not how they look, but what decision they represent for the user — they are not actually different directions. By that test, at least four of the sixteen are duplicates in different visual registers.

The model decides what is important if you do not. A canvas with sixteen widgets is a list, not a design. Half of design is deciding what to leave out. “Simple” is not a constraint. “The analyst’s primary task is X; everything else is secondary and should not appear on the first screen” is a constraint.

You need a designer to read the output. Not to polish — to read. Someone who can say “this is faking hierarchy” or “the interaction model is missing” or “this graph has no semantic edges.” Claudiu caught three things in Constellation that we had entirely missed: the node proximity encoded nothing, the edge weight was decorative, and the implied interaction (click a node, see what?) was never specified. Without that critique, Constellation would have made our shortlist. It looked exactly right. It was operationally empty.

Accept quality decay after direction 10. The back half of any large-batch generation will hedge. The model has fewer distinct positions available and starts mixing registers to fill the brief. This is not a failure of the tool — it is a property of the problem. The correct response is to use the first eight to ten directions to identify the strongest thesis, then generate variations within that thesis rather than continuing to explore the full space.
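
The decay is visible in the scores themselves. As a rough check, the sketch below compares the average total score (out of 40) for directions 01–10 against directions 11–16, using the numbers from the score summary. Summing the four axes into one total is a simplifying assumption; the critique scored them separately.

```python
from statistics import mean

# Scores in index order (01 to 16), copied from the score summary:
# (visual credibility, workflow realism, info density, usability)
SCORES = [
    (8, 4, 9, 5), (7, 3, 6, 4), (8, 5, 7, 5), (9, 3, 7, 3),
    (8, 6, 5, 6), (9, 2, 8, 3), (7, 5, 6, 5), (7, 4, 6, 5),
    (7, 6, 5, 6), (8, 5, 7, 6), (7, 6, 4, 6), (6, 2, 5, 3),
    (9, 6, 9, 6), (7, 5, 5, 5), (6, 4, 5, 4), (5, 3, 4, 3),
]


def mean_total(rows: list[tuple[int, int, int, int]]) -> float:
    """Average of the summed four-axis score across the given directions."""
    return mean(sum(row) for row in rows)


front = mean_total(SCORES[:10])  # directions 01-10
back = mean_total(SCORES[10:])   # directions 11-16
print(f"Directions 01-10 average {front:.1f}/40; directions 11-16 average {back:.1f}/40")
```

The back half averages lower, but the decay is a trend rather than a rule: Treasury (13) has the highest total in the whole set, so a late direction is not automatically a weak one.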

Chapter 08
How to run this yourself

| Step | What to do | Why it matters | Time |
| --- | --- | --- | --- |
| 01 | Write three sentences that let a stranger picture the primary screen. | Not what the product does, but what the user sees and does in one good session. | 5 min |
| 02 | Run /grill-me until every branch is closed. | You need a named user, a primary task, real data, and at least two explicit “must not” constraints. If you don’t have those four things, the model will invent them. | 30–60 min |
| 03 | Build your reference library before generating. | Pick a register in plain English first. Then pick two concrete references maximum and state what you’re borrowing and what you’re not. Twenty minutes of setup improves every direction. | 20 min |
| 04 | Set a hard number and accept decay. | Generating eight directions finds the shape. Sixteen shows the edges. The back half is evidence, not failure. | 1 hr |
| 05 | Get a designer for the critique. | There is no substitute for someone who has shipped real products and knows the difference between hierarchy that serves a task and hierarchy that encodes a schema. | 30 min |
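
The four things step 02 asks for can be checked mechanically before any generation starts. This is a minimal sketch of such a gate, not how /grill-me works internally; the `Brief` fields, the `open_questions` helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    named_user: str = ""
    primary_task: str = ""
    real_data: list[str] = field(default_factory=list)  # data sources that actually exist
    must_not: list[str] = field(default_factory=list)   # explicit "must not" constraints


def open_questions(brief: Brief) -> list[str]:
    """List what is still missing before the brief is ready for generation."""
    missing = []
    if not brief.named_user.strip():
        missing.append("named user")
    if not brief.primary_task.strip():
        missing.append("primary task")
    if not brief.real_data:
        missing.append("real data")
    if len(brief.must_not) < 2:
        missing.append('at least two "must not" constraints')
    return missing


# Hypothetical brief, for illustration only.
draft = Brief(
    named_user="Corp-dev analyst",
    primary_task="Review overnight signals for coverage targets",
    must_not=["invent charts for unavailable data"],
)
print(open_questions(draft))
```

Anything the check returns is something the model would otherwise invent on your behalf, so an empty list is the signal to move on to step 03.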

The AI accelerates ideation. The judgment still lives with the human — and the dangerous part isn’t that AI generates bad design. It’s that it generates believable design faster than most teams can critically evaluate it. Sixteen directions in two hours sounds like a productivity gain. It is also sixteen opportunities to mistake visual plausibility for operational validity before anyone has asked the hard questions.
