Open‑ended prompts produce debt. Locked prompts produce verifiable output. Notes from a year of building real software with Claude Code, with the prompt patterns that worked and the ones that didn't.
I am not an engineer by training. I am a healthcare professional, epidemiologist, and MBA student. Over the last year I have shipped a React Native mobile product, a locally‑run video production pipeline, and a couple of internal data tools. All of it was built with Claude Code as the primary implementer and me as the architect and reviewer.
Along the way I learned something that contradicts most AI‑coding advice: the goal is not to write better prompts. The goal is to write better specifications.
A prompt is a request. "Build me an authentication flow." "Add a login screen." "Make this faster."
A specification is a contract. Inputs, outputs, constraints, edge cases, success criteria, explicit non‑goals. When the spec is good, the prompt writes itself: "Implement the attached spec."
Most AI‑coding failures come from the first. An open‑ended prompt gives the model a huge search space and no way to know when it is done. It will do something, because models always do something, but that something may or may not be what you actually need. Worse, the output looks plausible enough that you might not notice the mismatch until two weeks later, when an assumption it baked in turns out to be wrong.
Rule: Any prompt longer than two sentences should be a spec, not a prompt.
Here is a template I use for any non‑trivial feature. It is short. That is the point.
```
# Feature spec: <name>

## Purpose
What problem this solves, for whom. One sentence.

## Inputs
What data, files, config, user actions this consumes.

## Outputs
What it produces. Format. Where it goes.

## Constraints
- Must use X
- Must not break Y
- Performance budget: Z

## Edge cases
- What happens when input is empty
- What happens when external dependency fails
- What happens on cancel/retry

## Non-goals
Explicit list of things this does NOT do.

## Verification
How I (human) will check this works. Specific test cases.
```
You can write that in 10 minutes. The feature takes an hour. Without the spec, the feature takes four hours because half of it is wrong on the first pass, wrong in different ways on the second pass, and only converges when you finally articulate what you actually wanted.
Claude Code can run agentic sessions: long‑running, multi‑step, you‑come‑back‑to‑a‑finished‑result. These sessions are the most powerful and the most dangerous mode. Powerful because they can do real work unattended. Dangerous because if the decisions are not locked in the prompt, the agent will improvise, and improvised decisions compound across an 8‑hour session into something that takes longer to untangle than to rebuild.
My rule: every decision that could be open gets closed before the session starts. File paths. Data formats. Library choices. Error‑handling strategy. Stop conditions. Explicit non‑goals.
For the video production pipeline I built, the spec handed to the agent pinned down details as small as the output filename format: `scene-<NN>-<slug>.mp4`, with an explicit instruction not to rename. Every one of those is a decision that, if left open, the agent would make differently on a different day. Closing them means the output on Monday matches the output on Friday. Reproducibility is a side effect of locked decisions.
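Locked conventions are also mechanically checkable. Here is a minimal sketch of the kind of guard that enforces the `scene-<NN>-<slug>.mp4` convention above; the script, the two-digit assumption for `<NN>`, and the slug pattern are illustrative, not quoted from the original spec:

```python
import re
import sys
from pathlib import Path

# Locked convention from the spec: scene-<NN>-<slug>.mp4
# Assumes <NN> is two digits and <slug> is lowercase words joined by hyphens.
SCENE_PATTERN = re.compile(r"^scene-\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.mp4$")

def naming_violations(output_dir: str) -> list[str]:
    """Return every .mp4 filename that breaks the locked naming convention."""
    return [
        p.name
        for p in Path(output_dir).glob("*.mp4")
        if not SCENE_PATTERN.match(p.name)
    ]

if __name__ == "__main__":
    bad = naming_violations(sys.argv[1] if len(sys.argv) > 1 else ".")
    if bad:
        print("Naming violations:", *bad, sep="\n  ")
        sys.exit(1)  # fail loudly now, not three steps downstream
```

A check like this, run between steps, turns "do not rename" from a hope into a stop condition.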
The other rule I learned the hard way: do not chain steps inside one agentic session. Do one step, verify it works on the actual device or environment, then start the next step with the result as locked context.
The temptation to ask the agent "now do step 2 as well" is enormous. Do not give in. Every step that runs without human verification compounds error risk. A one‑step session that ends in a correct state is worth ten multi‑step sessions that end in "almost correct, but actually there is a subtle issue in step 4 that propagated through step 5, 6, 7."
Agentic throughput is a trap. Reliability matters more than speed. A correct step is worth ten almost‑correct ones.
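The arithmetic behind that rule is worth making explicit. A back-of-the-envelope sketch, assuming (purely for illustration) that each unverified step has a 90% chance of being correct:

```python
# Unverified steps multiply: P(all n steps correct) = p ** n.
p_step = 0.9  # illustrative assumption, not a measured number

for n in (1, 3, 5, 7):
    print(f"{n} chained steps: {p_step ** n:.0%} chance the result is correct")

# 1 chained steps: 90%
# 3 chained steps: 73%
# 5 chained steps: 59%
# 7 chained steps: 48%
```

By the seventh unverified step you are below a coin flip, which is exactly the "subtle issue in step 4 that propagated" failure mode.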
For a single feature, one chat works. For a project that spans weeks, one chat always ends in context collapse. The model starts forgetting the spec, conflating earlier decisions, or helpfully re‑litigating things you locked last Tuesday.
What works is parallel chats by phase: one for research, one for architecture, one for implementation in Claude Code, one for review.
Artifacts flow between them. Research output becomes input to architecture. Architecture output becomes input to Claude Code. Code output becomes input to review. Each chat stays scoped and short, which keeps its judgment intact.
The biggest architectural trap is the urge to add complexity that sounds good on paper. A third data source. A new caching layer. A more sophisticated model for a recommendation. All of these feel like upgrades.
The rule: do not add complexity until the instrumentation on the simpler version tells you that you need it.
On one of my apps, a third data source was proposed early. I deferred it. The reasoning was simple: until I have telemetry showing that the two‑source baseline is producing wrong answers, adding a third source is a guess. I instrumented the two‑source version, ran it for weeks, and found the actual failure modes were elsewhere (a normalization bug, not missing data). The third source would have been new code and new maintenance debt for a problem I did not have.
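For concreteness, "instrumented" here can be as simple as logging every disagreement between the two baseline sources. A minimal sketch; the function name, log fields, and tolerance are all hypothetical:

```python
import json
import logging

logger = logging.getLogger("recommendations")

def log_source_disagreement(query: str, value_a: float, value_b: float,
                            tolerance: float = 0.05) -> None:
    """Log every case where the two baseline sources give different answers.

    Weeks of these logs show whether a third source would actually help,
    or whether the failures live elsewhere (here: a normalization bug).
    """
    if abs(value_a - value_b) > tolerance:
        logger.warning(json.dumps({
            "event": "source_disagreement",
            "query": query,
            "source_a": value_a,
            "source_b": value_b,
        }))
```

The telemetry either justifies the third source with data or, as happened here, points at the real bug instead.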
This is why AI tooling is so dangerous when you use it casually. The cost of generating new complexity is low. That does not mean the complexity is free. It just means you spent the low cost at generation time and will spend the high cost forever at maintenance time.
I work in health. Every claim that affects user‑facing behavior has to be grounded in a primary source. When I build features that nudge behavior (eating, medication, activity), I do not let the model paraphrase from a blog post. The reference material comes from the primary sources themselves: the guidelines, not summaries of the guidelines.
The AI helps me synthesize, not interpret. It pulls the relevant sections from long documents. It cross‑references between guidelines. It flags where two guidelines disagree. The clinical judgment about which guideline applies to which patient in which context stays mine.
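As an illustration of "synthesize, not interpret," the cross‑referencing step can be a single constrained call. A sketch using the Anthropic Python SDK; the model name is a placeholder and the prompt wording is illustrative, not a quoted production prompt:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def flag_disagreements(excerpt_a: str, excerpt_b: str) -> str:
    """Ask the model to compare two guideline excerpts without interpreting them."""
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder; substitute the model you run
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "Compare these two guideline excerpts. Quote verbatim where "
                "they agree, flag every point where they disagree, and make "
                "no recommendation of your own.\n\n"
                f"Excerpt A:\n{excerpt_a}\n\nExcerpt B:\n{excerpt_b}"
            ),
        }],
    )
    return response.content[0].text
```

The constraint lives in the prompt: extraction and comparison only, judgment excluded.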
A short list of the patterns that changed my output the most:

- A spec, not a prompt, for anything non‑trivial.
- Every open decision closed before an agentic session starts.
- One step per session, verified on the actual device before the next.
- Parallel chats by phase, with artifacts flowing between them.
- No new complexity until instrumentation on the simpler version demands it.
- Primary sources for any claim that affects user‑facing behavior.
None of this is clever. All of it is boring discipline. That is exactly what makes it work.
The most surprising thing about building with AI this past year is how little the "AI part" matters once the workflow is right. Claude Code is a tool. Like any tool, it rewards preparation and punishes improvisation. The people who are getting real work out of it are not writing magic prompts. They are writing clear specs, closing decisions early, and verifying every step.
Spec first. Locked decisions. Instrumented before expanded. Verified before advanced.
Shipping with Claude Code? I'd like to hear what you're building.
Get in touch