Codewords by Agemo · Case study

8 bets,
drawn 26 times.

A GenAI startup that lets people build software by describing what they want, in plain words. Codewords had no precedent — and no shared mental model with users. We worked with the team to design a product that could prove or disprove eight beliefs about how users would behave, in the wild. Every screen we shipped traces back to one of those bets — each one drawn, redrawn, and pressure-tested across 26 iterations.

Agemo × Joyus
Discipline: Research, Strategy, Interface
Year: 2024

The product was new. The behavior wasn't.

Codewords is not a new behavior — it's a new way of completing one. People already Google for code, ask ChatGPT for snippets, hire contractors for one-off scripts. The question wasn't "will people want this?" It was "how do we get them to switch?"

who Codewords picked first

Twelve user types, on a single proficiency axis.

The team mapped users along a spectrum — from people who'd never write a line of code (Tool Consumers) to people who'd build pipelines of pipelines (Tool Chainers). Twelve user types, sorted into four tiers by proficiency.

Generalists: PMs & designers · Founder / Designer / PM · Mgmt consultant
Code-aware: Indie hacker / microsaas · Prototyper · Expert / tech consultant
Engineers: Front-end engineer · Back-end engineer · Data scientist
Pipeline builders: Niche tooling builder · Automator · Deployer

H4 · Completeness over Simplicity. Codewords picked the right-hand end of the axis: design for engineers and pipeline builders, even at the cost of casual users.

where the synthesis came from

The interviews said the same thing, in different words.

Eleven user-research conversations. The same theme, repeated by everyone from a founding engineer at Swiftkey to a Series A fintech PM: the product was powerful but didn't tell you what it could do, what to try, or why to trust it.

"If you weren't here to walk me through it, I would have dropped off already."
Amanda Pun — Staff PM, Series A fintech

"I'm taking it at face value so my expectations are super high. You need to make sure you don't disappoint me if you don't set expectations from the get go."
Jon Reynolds — co-founder, Swiftkey

"What I find a bit problematic is that it's very open-ended. It obviously cannot do everything. But I'm not sure what I can try, what I cannot try. If I try something and I'm unlucky then I'll give up."
Pier — Software engineer, Google

"The prompt helper is genius."
Marie Brayer — Semi-technical VC

"I did not modify the new prompt because I was scared to break anything."
Lucas — Technical founder

"The main goal is to get things done — we'll worry about the quality later."
Keshvi R. — Head of Product, Balderton

"I didn't know what to do tbh, so I was just checking out the changelogs."
Adnann — Designer

"For the first time today, I did use CW and it saved me 30 minutes in my day."
Anonymous user
our synthesis

"Most of the feedback is early in the user journey. The lack of onboarding, guidance, information and positioning across the overall product is a huge handicap damaging UX. The recurring themes revolve around usability ('what can I do?') and guidance ('what should I do?'). The tool seems powerful but it is lacking the integration into the user's workflow."

eight bets

Each belief, designed to be falsifiable.

The synthesis named four gaps: onboarding, guidance, positioning, and integration. The eight hypotheses are the bets the team made about how to fill them. Each one is written as a falsifiable claim with a measurable test, so the team could tell when they were wrong.

hypothesis 2, made tangible

The adoption journey, anchored to verbatim user voice.

The team's diagnosis of the existing product: users went straight from "Learn capabilities" to "Create" — skipping the trust-building middle. The redesign needed to give users somewhere to be while they figured out whether to invest. Three states. Each one with the actual quotes the team had captured from interviews.

1. Curiosity

Low intent — "What does this do?"

How do I ...

Can I use a tool to let me ...

Can I find something similar to help me with the problem I have now

2. Aspirational

Hope for a solution — "When and how can this do something for me?"

Can I try this for something simple ...

Can I trust this to be consistent for ...

It's cool this can do ...

Could this integrate or replace my existing ...

3. Problem-solving

High intent — "What of my problems can be consistently solved?"

How can this fit into my workflow for ...

How can I modernize this existing workflow with ...

How can I make this function into something I can integrate in ...

"Today journey largely skips over middle, goes from 'learn capabilities' to 'create.' Trust built with user needed this journey to happen." — annotation directly on the journey board.

naming + positioning

The tagline went through 50+ permutations before it stopped moving.

Late in the process, the team built a bracket-grammar template: a way to vary the verb, the object, and the punchline independently. The lines below come from the actual generator in "Thinking fifth," where each refresh surfaced a real candidate the team considered.

Build a function in minutes without a team of engineers.

The new way to code.

↑ candidate 1
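
As a rough illustration of how a bracket-grammar template like this can work, the Python sketch below fills three slots independently and lists every combination. The slot values are lifted from candidate 1 above and from the final pick, but the way they are split into verb, object, and punchline, the `[slot]` syntax, and the names `SLOTS`, `TEMPLATE`, and `expand` are assumptions made for the example, not the team's actual generator.

```python
from itertools import product

# Slot values lifted from the two taglines quoted in this case study.
# Splitting them into these three slots is an assumption.
SLOTS = {
    "verb": ["Build", "Write"],
    "object": ["a function", "code"],
    "punchline": ["in minutes without a team of engineers", "with words"],
}

# Hypothetical bracket syntax: each [name] is replaced by one slot value.
TEMPLATE = "[verb] [object] [punchline]."


def expand(template, slots):
    """Return every line the template yields, varying each slot independently."""
    names = list(slots)
    lines = []
    for combo in product(*(slots[name] for name in names)):
        line = template
        for name, value in zip(names, combo):
            line = line.replace(f"[{name}]", value)
        lines.append(line)
    return lines


candidates = expand(TEMPLATE, SLOTS)
for line in candidates:
    print(line)
```

With two options per slot the template yields eight lines, including both the early candidate and the final pick. Adding a ninth option to any one slot grows the pool without touching the other two, which is presumably how a tagline can reach 50+ permutations quickly.
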
final pick
"Write code with words."
the build flow, mapped

Every fail-state, planned before it became a bug.

Drawn before any production code shipped. Each green node is a success path; each orange node is a fail-state that needed graceful handling. Laying the whole flow out on one board is how the team checks "did we miss anything?" in a single glance.

Flow diagram nodes: Codewords user · Pick template / Blank · Write prompt · Add integrations · Check & enhance · Function builds · Not possible · UI populated · End — try again · Run + feedback
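
One way to make the "did we miss anything?" check mechanical is to write the flow down as a graph and test it. The Python sketch below uses the node names from the diagram, but the arrows between them and the choice of which nodes count as fail-states are assumptions, since neither is recoverable from the text. `unreachable` and `dead_end_fail_states` are hypothetical helper names.

```python
from collections import deque

START = "Codewords user"

# Node names come from the flow diagram. The edges are assumed for
# illustration; the real arrows are not recoverable from the text.
FLOW = {
    "Codewords user": ["Pick template / Blank"],
    "Pick template / Blank": ["Write prompt"],
    "Write prompt": ["Add integrations"],
    "Add integrations": ["Check & enhance"],
    "Check & enhance": ["Function builds", "Not possible"],
    "Function builds": ["UI populated"],
    "UI populated": ["Run + feedback"],
    "Not possible": ["End — try again"],
    "End — try again": ["Write prompt"],
    "Run + feedback": [],
}

# Assumed fail-states (the orange nodes in the original diagram).
FAIL_STATES = {"Not possible", "End — try again"}


def unreachable(flow, start):
    """Return nodes that no path from the start node ever reaches."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in flow[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return set(flow) - seen


def dead_end_fail_states(flow, fail_states):
    """Return fail-states with no way out, i.e. no graceful handling."""
    return {node for node in fail_states if not flow[node]}


print(unreachable(FLOW, START))
print(dead_end_fail_states(FLOW, FAIL_STATES))
```

Both checks return an empty set for the assumed flow. Removing the exit from a fail-state makes the second check report it, which is the kind of gap the board was meant to surface before it became a bug.
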
what shipped

The Architect screen — iteration 26, where every interaction traces back to a hypothesis.

Across 26 design rounds the team kept revising this screen until each of the eight hypotheses had landed somewhere on it. This is what shipped. Four of the hypotheses are marked on the screen: H5, H6, H4, and H8.

26 of 26 · the version that shipped
Codewords Architect interface
H5 · Defer to the human: The AI proposes a spec; the user accepts, edits, or asks for more. Confirmation is required at every step, never assumed.
H6 · Draft state matters: The spec lives in the editor as a draft you can iterate on, not a frozen brief. Refinement is the primary action, not an exception.
H4 · Completeness over Simplicity: Detailed parameter visibility, architect chat, and the spec link are all available at once. Density is intentional: the audience is power users.
H8 · Users want it done: The Generate action lives where it would be expected: output is one click away even when the spec is still rough.
what landed

From a Google search to a conversation.

Three signature interactions — each one is an answer to a specific user complaint from the interviews above.

A conversational Architect

Replaces "blank page paralysis" with a chat. Answers Lucas's "I was scared to break anything" by making proposed changes explicit and reversible.

Spec as a first-class draft

The function spec is editable, not just visible. Implements H6 (Draft state matters) and answers Pier's "I'm not sure what I can try."

Explore + Remix as way-stations

Builds the trust-building middle the journey was missing. Answers Adnann's "I didn't know what to do, so I was just checking out the changelogs."

what mattered

What we learned, that you can take into any product.

  1. Write down what you believe before you draw — and write down how you'd know you were wrong. Hypothesis docs aren't research busy-work; they're the only thing that prevents post-hoc rationalization.
  2. For new paradigms, the interface IS the onboarding. Every ChatGPT comparison, every tooltip miss, every "I didn't know what to do" comes back to this.
  3. Density vs. simplicity is a positioning choice, not a usability one. If your audience is power users, designing for casual ones loses both.
  4. Drafts are the work, not the prelude. Build editable state into your primary surface, not into a separate "edit" mode.
  5. When users have no model, give them somewhere to be while they build one. Explore and Remix beat empty home pages.

your move

Building something new? Be our friends.

hello@joyus.studio · we read everything