Building Dev Tooling Through Conversations with Claude

Established tech companies often have teams whose job, at least in part, is to write and maintain tooling for internal developers.

When done well, developer tooling provides a streamlined way of doing common things and helps steer people towards standard ways of using the platform. It also makes ad-hoc testing a lot easier, which can help surface bugs or highlight awkward APIs.

I missed having a nice developer experience, but it’s hard to prioritise tooling at a startup like Quidkey. I figured that with agentic coding tools (specifically Claude Code) we might be able to replicate at least some of that experience without having to dedicate a big chunk of time. So I decided to write a unified CLI for interacting with Quidkey’s platform - creatively called qk (pronunciation TBD - maybe “cookie”?).

We now use this internally for managing per-env config, testing APIs, debugging transactions, managing users and roles, and a bunch of other useful things.

# Creating a user on my local env
qk user create --env local --email [email protected] --role global:admin-read-only

# Fetch a timeline of a transaction in prod (I have changed the UUID just in case :P)
qk transaction get --env prd 25e0f9e6-03d0-4121-8529-64255a6fa87d --with-timeline

# Add a config value to the console service in our test env 
qk config add console --env tst --key POSTHOG_PROJECT_API_KEY

# Manually try and progress a transaction in dev to awaiting payout via the API
qk api --env dev put /transactions/7843408b-990b-49c6-9c2f-ce7a113c0669/status '{ "status": "converted" }'

In this post, I’m hoping to show through examples how I’ve been using Claude to develop qk, and in particular how some of the fundamental design and architecture choices were driven by fairly small-seeming feature requests and a few follow-up questions.

To help with this, I (well actually Claude Code) dug into my history to unearth some of the prompts I used. I’ve included a bunch of these verbatim as examples throughout in the hope they give a flavour for how qk has been developed.

If you’re wondering about why I talk to Claude Code in a weird/quirky/deeply irritating way, there isn’t really any reason other than habit - I started doing it because it amused me and it’s kind of stuck. I seem to get good enough results, so I’m not changing now :P

As a first example, the default colour scheme clashed horribly with my zsh theme - a dreadful embarrassment I had to fix before writing a blog post:

❯ Hm - do we have themes for the `qk` colours?

  I'd love to have it match my terminal theme, catppuccin mocha.

and a minute later, there it was:

[Figure: `qk service info` with the catppuccin mocha theme applied, complete with failed deploy to add a bit of colour.]

Beautiful! I even deployed a broken build to dev several hours ago in anticipation of testing my new theme. How’s that for planning! ;)

The Goal

I want qk to become the default tool engineers reach for when they need to do something on our platform. It should therefore be:

Discoverable: qk should tell you what it can do by providing decent online help and - probably the thing I’ve found the most useful - good autocompletion. If I type qk <tab> or qk config add --service <tab> and it tells me what’s possible, I don’t need to interrupt my flow to go and look things up.
Ergonomic: qk should be nice for developers to use - it should take care of background faff like authentication; have sensible defaults where appropriate; provide good error messages and help with debugging when things go wrong. It should present information so it’s easy to read and interact with, using clickable links for modern terminals, and colour and typography where appropriate.
Extensible by anyone on the team: It should be easy for people to add their own commands, and provide a framework that gives features like authentication, autocompletion and help for “free”. It should also be clear how commands are organised so people know where to put things.
Usable by tooling: qk should also be nice for other tools to use, so should support machine-readable JSON output and an MCP, so scripts and Claude can use it too.

There are a few concerns I have, particularly as we grow the team.

Hiding the platform behind a facade can lead to developers losing some intuition about how things really work (i.e. mechanical sympathy). I don’t think this is a big risk at our current size (we are all the platform team right now), but it’s something to consider as we grow.
There’s also a risk that qk accrues a pile of seldom-used or half-baked features.
There’s a danger that the higher-level convenience commands become a de-facto substitute API. This could mean we don’t use our API as a customer would, and end up papering over awkward and poorly designed APIs rather than fixing them properly. (On the other hand, having the CLI as an internal-facing public API client might help prompt us to fix things!)

Every time we add a new command, we really should think “why do I need this?” and consider what it looks like under the hood.

I’ve so far found that I tend to only add things when something has genuinely frustrated me multiple times, which serves as a bit of a natural gate on feature accrual. It might be interesting to add some lightweight telemetry - both to find out whether qk really is being used, and as a way of finding any parts that are broken or that nobody ever uses.

I also hope we can use some of the qk features as inspiration for what higher-level concepts might be useful on SDKs we provide to customers.

The Initial Technology Choices

At Quidkey we mostly use TypeScript, and for consistency and familiarity, I wanted qk to be written in TypeScript as well. I don’t have a strong background in TypeScript, and even less in writing CLIs, so I didn’t have much idea where to start.

In the past I’ve often found that a good way of making a decision is to copy people who are smarter than me. It turns out this aligns quite well with Claude, which excels in taking existing patterns and adapting them to new situations.

I knew that Claude Code itself was written in TypeScript. Unfortunately the prompts have been lost to time, but the core technology choices for qk basically came from me asking it to investigate the tech stack for Claude Code and just use that.

As such, we ended up using Ink (React for CLIs) together with Pastel, an Ink-based CLI framework that does file-system command routing, parses arguments, and hands them to React components for display (it uses Commander.js and Zod under the hood). Ink UI provides a bunch of useful components - TextInput, ConfirmInput, Spinner etc.

A couple more examples of copying existing tools:

Shell autocomplete - this is super important for providing a good developer experience, and it’s quite fiddly to get right. I asked Claude to work out how gh does it and we ended up with something pretty similar ($(qk completion zsh) to install into the shell backed by a hidden qk __complete command that takes a partial command line and returns completions).
MCP support - again copied from gt (the Graphite CLI), and started with one prompt:
```
❯ 👋 I'd like to expose `qk` as an MCP — similar to how graphite does it with its `gt` command. Can you help with this please?
```
qk mcp starts a simple stdio MCP server, which can be used with Claude Code via claude mcp add --scope user --transport stdio qk -- qk mcp. This fell out almost for free from the fact that we support JSON output for all commands.

It’s important to be careful here: we want to run the MCP with absolutely no prod access. We enforce this via tightly scoped credentials, and for belts and braces qk itself refuses to interact with prod when running as an MCP.
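Here's a minimal sketch of what that second, in-process guard could look like. The `Env` values mirror the `--env` flags used in the examples earlier (`local`, `dev`, `tst`, `prd`); `RunMode`, `ProdAccessError` and `assertEnvAllowed` are hypothetical names for illustration rather than qk's actual code.

```typescript
// Hypothetical sketch: refuse prod when running as an MCP server.
// Env values mirror the --env flags shown earlier; all other names are assumed.
type Env = "local" | "dev" | "tst" | "prd";
type RunMode = "cli" | "mcp";

class ProdAccessError extends Error {
  constructor(env: Env) {
    super(`Refusing to touch "${env}" while running as an MCP server`);
    this.name = "ProdAccessError";
  }
}

// Intended to be called before any command talks to the platform.
function assertEnvAllowed(env: Env, mode: RunMode): void {
  if (mode === "mcp" && env === "prd") {
    throw new ProdAccessError(env);
  }
}
```

The point of a guard like this is that it sits in one place, in front of every command, so a new command can't forget it. It's a backstop only: the scoped credentials remain the real control.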

[Figure: Claude Code driving `qk` through its MCP interface to create a payment.]
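Returning to shell completion for a moment: the core of a hidden `__complete`-style command is a function that walks the command tree along the words typed so far and offers the children that match the final, partial word. The sketch below assumes a simple nested-object command tree. The tree contents and the `complete` function are illustrative, not qk's real command set or implementation, and completion of flags and flag values is left out.

```typescript
// Hypothetical command tree: keys are subcommand names.
interface CommandNode {
  [name: string]: CommandNode;
}

const tree: CommandNode = {
  config: { add: {} },
  transaction: { get: {} },
  user: { create: {} },
};

// `words` is the command line after the program name, split on whitespace.
// The last element is the (possibly empty) word currently being completed.
function complete(root: CommandNode, words: string[]): string[] {
  const partial = words[words.length - 1] ?? "";
  let node = root;
  for (const word of words.slice(0, -1)) {
    const next = node[word];
    if (next === undefined) return []; // unknown command: nothing to offer
    node = next;
  }
  return Object.keys(node)
    .filter((name) => name.startsWith(partial))
    .sort();
}
```

With this tree, `complete(tree, ["con"])` returns `["config"]` and `complete(tree, ["config", ""])` returns `["add"]`. The shell-side script only has to pass the current words through and print the results.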

Using Claude

When developing qk I tend to build things incrementally, sometimes giving Claude my thoughts on future direction to help drive the approach. I explicitly ask about consolidation and re-use where I think it matters. This often leads to fairly small refactors:

❯ 👋 Morning! I've noticed for flags that are accepted by many (or all!) commands like `--env` and `--ref` - the argument is repeated verbatim many times.

  Could we (and should we) factor this out?

❯ what do you think about lifting `--json` so it's an immediate argument to `qk` rather than an argument to each command?

but it’s also driven much of the underlying architecture. One foundational example of this is when I added --json output support. What started as a fairly simple task led, through a series of prompts, to a fundamental restructuring.

❯ 👋 Hello! We've been working on `qk` - a CLI tool intended to make it a good experience for our developers to do things on our platform.

  I'd also like to make it a good experience for *tooling* to do things on our platform - and as such, it'd be great if we could introduce a JSON output format that tools will be able to easily read and understand.

  It's likely this will form the basis for an MCP; and also allow `qk` to be used in scripts etc.

  Can you take a look and think about how this could be achieved? Feel free to explore other tools that have this functionality as inspiration.

Claude looked at how gh, kubectl, docker and Graphite handled JSON output, and came back with what seemed like a sensible plan. I wanted to make sure it was well-structured, so I fed back:

❯ So it feels almost like we want a very strong separation between the *qk cli API* and the *qk ink front-end* - with the qk cli API effectively being a *library* that could in principle have multiple different display front-ends (e.g. web interface, our lovely inkjs experience, a simple json wrapper that just serialises responses, or MCP-compatible output)

  Does this make sense?

Up to this point, all the functionality of qk was coupled heavily to the React components. This prompt produced a plan for refactoring into three layers:

src/commands/<name>.tsx   # output:   React/Ink presentation; --json fast path
src/api/<domain>/...      # commands: high-level, sometimes multi-step
src/lib/<infra>/...       # plumbing: HTTP client, keychain, gcloud, op

The API layer is a light wrapper over our shared API types (generated with OpenAPI TS) that automatically handles authentication. It exposes the API directly, and can be used with qk api <method> <url> <body>.
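As a rough illustration of "a light wrapper that automatically handles authentication", the sketch below wraps an injected fetch-like function so that every request carries a bearer token. The `Transport` type, `createApiClient`, the `getToken` callback and the base URL are all assumptions made for the example. The real layer is built on generated OpenAPI types, which are omitted here.

```typescript
// Minimal fetch-like signature so the sketch doesn't depend on a runtime.
interface HttpRequest {
  method: string;
  url: string;
  headers: Record<string, string>;
  body?: string;
}
interface HttpResponse {
  status: number;
  body: string;
}
type Transport = (req: HttpRequest) => Promise<HttpResponse>;

// Hypothetical client factory: callers never handle tokens themselves.
function createApiClient(
  baseUrl: string,
  getToken: () => Promise<string>,
  transport: Transport,
) {
  return async function request(
    method: string,
    path: string,
    body?: unknown,
  ): Promise<unknown> {
    const token = await getToken();
    const res = await transport({
      method: method.toUpperCase(),
      url: baseUrl + path,
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      // Only attach a body when the caller supplied one.
      ...(body === undefined ? {} : { body: JSON.stringify(body) }),
    });
    if (res.status >= 400) {
      throw new Error(`${method.toUpperCase()} ${path} failed with ${res.status}`);
    }
    return res.body === "" ? null : JSON.parse(res.body);
  };
}
```

Something of this shape is what lets a generic `qk api <method> <url> <body>` command exist: it just forwards its arguments to the client and never needs to know where the token came from.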

The command layer provides an ergonomic interface for common (often multi-step) operations. It uses the API layer as a library. qk payment create, for example, needs to:

Look up a merchant by name, email, or UUID (fuzzy match, then confirm with the user)
Map arguments and default values to payment creation endpoint request fields
Call the payment creation endpoint
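The three steps above could be sketched roughly as follows. Everything here is illustrative: the `Merchant` and `PaymentRequest` shapes, the default currency, the substring matching (standing in for whatever fuzzy match qk really uses) and the injected `listMerchants`, `confirm` and `createPayment` callbacks are assumptions, not Quidkey's actual API.

```typescript
interface Merchant { id: string; name: string; email: string }
interface PaymentArgs { merchant: string; amount: number; currency?: string }
interface PaymentRequest { merchantId: string; amount: number; currency: string }

// Step 1: an exact ID match wins; otherwise fall back to a
// case-insensitive substring match on name or email.
function findMerchants(merchants: Merchant[], query: string): Merchant[] {
  const q = query.toLowerCase();
  const exact = merchants.filter((m) => m.id.toLowerCase() === q);
  if (exact.length > 0) return exact;
  return merchants.filter(
    (m) => m.name.toLowerCase().includes(q) || m.email.toLowerCase().includes(q),
  );
}

// Step 2: map CLI arguments and defaults onto the request body.
// The "GBP" default is an assumption for the example.
function toPaymentRequest(merchant: Merchant, args: PaymentArgs): PaymentRequest {
  return {
    merchantId: merchant.id,
    amount: args.amount,
    currency: args.currency ?? "GBP",
  };
}

// Steps 1-3 combined. Dependencies are injected so the same function can sit
// behind an interactive UI or a non-interactive JSON mode.
async function createPaymentCommand(
  args: PaymentArgs,
  deps: {
    listMerchants: () => Promise<Merchant[]>;
    confirm: (candidate: Merchant) => Promise<boolean>;
    createPayment: (req: PaymentRequest) => Promise<{ id: string }>;
  },
): Promise<{ id: string }> {
  const candidates = findMerchants(await deps.listMerchants(), args.merchant);
  for (const candidate of candidates) {
    if (await deps.confirm(candidate)) {
      return deps.createPayment(toPaymentRequest(candidate, args)); // Step 3
    }
  }
  throw new Error(`No confirmed merchant matching "${args.merchant}"`);
}
```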

At the presentation layer, every command is a React/Ink component. This adds all the pretty output bells and whistles - colour, formatting, spinners, interactive prompts etc. When --json is set, we render a tiny <JsonOutput> component instead of the normal UI. It accepts a Promise (typically the command-layer function), awaits it, prints the result as JSON to stdout, and exits. The command-layer logic is the same in either mode; only the output differs.
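Stripped of React, the behaviour described for `<JsonOutput>` amounts to "await the promise, serialise the result, report success or failure". The sketch below shows that as a plain function. `renderJson`, the injected `write` callback and the `{ error }` envelope are assumptions for illustration. The real component would also exit the process, whereas this version returns an exit code so it's easier to test.

```typescript
// Hypothetical non-React equivalent of the --json fast path.
async function renderJson<T>(
  work: Promise<T>,
  write: (line: string) => void,
): Promise<number> {
  try {
    const result = await work;
    // `?? null` keeps the output valid JSON when the command returns nothing.
    write(JSON.stringify(result ?? null, null, 2));
    return 0;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    write(JSON.stringify({ error: message }, null, 2));
    return 1;
  }
}
```

Because the function only ever sees a promise, it has no idea which command it is serving, which is what keeps the command-layer logic identical in both modes.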

[Figure: `qk payment create` with the default interactive Ink UI.]

[Figure: `qk payment create --json`, the same command with JSON output.]

This split has paid off a few times since. For example, recently we replaced a chunk of hardcoded service metadata with dynamically-loaded catalog-info.yml files. The change was confined to src/lib/, with nothing in the rendering or commands layer needing to be updated.

Another example is how authentication evolved. It started super simple - just piggy-backing on the GCP and GitHub CLIs to fetch info about service deployment. It was frustrating having to re-authenticate with gcloud manually:

❯ 👋 Hello! When I run `qk service info core` I often have to re-authenticate with `gcloud auth login --update-adc`

  I wonder if we could automatically re-authenticate the user when necessary? Can you think about this please?

Claude presented a plan, and I fed back some additional context:

❯ It should be reusable. The intent is that `qk` will evolve into a one-stop-shop for engineers to do useful platform-related things - these will often involve GCP. For some further context, I also intend to use google identity tokens to obtain access tokens for other services - for example, Quidkey admin APIs. Would this change your approach?

This did change the approach: from a one-off auth shim to something that could easily be reused by any command. Later, I wanted to add context switching:

❯ 👋 Hello! When obtaining or refreshing a token from the API, we can specify an optional *context* - which is a `partner:<id>` or `merchant:<id>` that effectively becomes the "default" scope for operations.

  The context can be requested on token creation or refresh - "changing context" involves obtaining a new access token - through the refresh endpoint if possible; otherwise by obtaining a completely fresh token set.

  You can see how this works in `Quidkey-core` - can you add support for *switching context* to `qk` please?

This led to qk api context set / qk api context show, with context preserved across token refreshes. With an additional prompt:

❯ nice! One thing I'd like to avoid is having to re-fetch the token every time.

  We can assume we're running on OS X - is there a way of storing the access token securely?

we ended up with secure token storage in Keychain, which avoids having to mint a fresh access token for every invocation.
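The caching logic this implies is roughly: look up a stored token for the requested context, reuse it if it hasn't expired, and otherwise obtain and store a new one. In the sketch below, the `SecretStore` interface stands in for Keychain (the in-memory implementation exists only for the example). `mintToken`, the key format and the 30-second expiry margin are all assumptions. It also simplifies "refresh if possible, otherwise fetch a fresh token set" into a single `mintToken` callback.

```typescript
interface StoredToken {
  accessToken: string;
  expiresAt: number; // epoch milliseconds
}

// Stand-in for a secure store such as Keychain.
interface SecretStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

class MemoryStore implements SecretStore {
  private items = new Map<string, string>();
  async get(key: string): Promise<string | undefined> {
    return this.items.get(key);
  }
  async set(key: string, value: string): Promise<void> {
    this.items.set(key, value);
  }
}

// Treat tokens as expired slightly early to avoid racing the server.
const EXPIRY_MARGIN_MS = 30_000;

async function getAccessToken(
  store: SecretStore,
  context: string | undefined, // e.g. "merchant:<id>" or "partner:<id>"
  mintToken: (context: string | undefined) => Promise<StoredToken>,
  now: () => number = Date.now,
): Promise<string> {
  // One cache entry per context, so switching context never reuses
  // a token scoped to something else.
  const key = `qk:token:${context ?? "default"}`;
  const cached = await store.get(key);
  if (cached !== undefined) {
    const token = JSON.parse(cached) as StoredToken;
    if (token.expiresAt - EXPIRY_MARGIN_MS > now()) return token.accessToken;
  }
  const fresh = await mintToken(context);
  await store.set(key, JSON.stringify(fresh));
  return fresh.accessToken;
}
```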

Closing Thoughts

Developing qk is the most hands-off I’ve been with a project. I haven’t written any of the code myself; my input has primarily been requirements and broad architectural steers.

I’ve found that architectural decisions and refactoring often emerge naturally out of the conversations around small features, and that a little thoughtful prodding of Claude’s first draft plan can lead to it taking a much better approach.

So far, qk has saved me (and reportedly my colleagues) many hours of poking at web interfaces, looking up IDs and hand-crafting API requests, and it hasn’t cost much of my time. Using Claude Code, I can add features, fix annoyances and generally improve things in a brief conversation, often in the background without seriously interrupting whatever else I’m working on.

Would we have created it without LLMs? I think we probably should have, but experience at previous early-stage startups tells me that instead we’d have ended up with a mish-mash of ad-hoc scripts with no consistency in how they’re discovered, shared and used.