A developer posted on Hacker News recently about fine-tuning Qwen 3 at 0.6B parameters to categorize support questions. Tiny model. Specific job. The results beat what they were getting from much larger general-purpose models, faster and cheaper.

This keeps happening. And it keeps surprising people.

The instinct, especially for non-technical founders, is to pick the most famous AI. GPT, Claude, Gemini. Big name, big model, must be better. For some tasks, sure. For a lot of others, you're paying premium prices to get worse results than something purpose-built for the actual job.

The Swiss army knife problem

A general LLM is a Swiss army knife. It can do almost anything passably. But if you're actually carving wood, you want a chisel. If you're cutting bread, you want a bread knife.

The strategy mistake I see all the time: people pick AI tools based on what sounds impressive, then wonder why the output is mediocre. They've bought a Swiss army knife and they're trying to cut down a tree with it.

For a categorization task, a 600M-parameter model fine-tuned on your data will often beat a 70B general model. For code generation specific to one framework, a smaller model trained on that framework's actual patterns will usually smoke the generic one.
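
The honest way to check a claim like this for your own tickets is to score both models on the same labeled sample. Here's a minimal sketch, assuming you have hand-labeled examples to test against. The two classifier functions are hypothetical keyword-matching stand-ins, not real models, and the four tickets are made up; swap in calls to whatever you're actually comparing.

```python
from typing import Callable, Sequence

Example = tuple[str, str]  # (ticket text, correct category)


def accuracy(classify: Callable[[str], str], examples: Sequence[Example]) -> float:
    """Share of examples where the predicted category matches the label."""
    if not examples:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for text, label in examples if classify(text) == label)
    return correct / len(examples)


# Hypothetical stand-ins: replace these with calls to the real models.
def general_model(text: str) -> str:
    return "billing" if "invoice" in text.lower() else "other"


def tuned_model(text: str) -> str:
    lowered = text.lower()
    if "invoice" in lowered or "refund" in lowered:
        return "billing"
    if "password" in lowered:
        return "account"
    return "other"


# Tiny made-up sample; a real one should be much larger.
labeled = [
    ("Where is my invoice?", "billing"),
    ("I need a refund", "billing"),
    ("Forgot my password", "account"),
    ("Do you ship to Canada?", "other"),
]

for name, model in (("general", general_model), ("tuned", tuned_model)):
    print(f"{name}: {accuracy(model, labeled):.0%}")
```

The toy classifiers are rigged to make the point, so the printed scores mean nothing by themselves. What matters is the shape of the test: same examples, same labels, one number per model.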

How to actually pick

Three questions, in order:

1. How narrow is the job? The narrower the task, the more specialized your tool should be. "Write me anything" needs general. "Categorize incoming support tickets" needs specialized.

2. Who owns the training data? If a tool was built by people who actually understand the domain and trained it on real data from that domain, that shows up in the output. Generic chatbots wrapped in a domain UI usually feel generic.

3. What happens at scale? Cheap and right on the first try beats expensive and almost-right. Run the numbers on a thousand requests, not one.
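
To make the third question concrete, here's a rough back-of-the-envelope sketch. Every number in it is an assumption made up for illustration (the per-request prices, the accuracy rates, and the cost of a person fixing a wrong answer), so plug in your own.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    cost_per_request: float  # hypothetical price in dollars
    accuracy: float          # hypothetical share of requests answered correctly (0 to 1)


def total_cost(option: Option, requests: int, fix_cost: float) -> float:
    """Model spend plus the cost of cleaning up wrong answers."""
    model_spend = option.cost_per_request * requests
    wrong = requests * (1.0 - option.accuracy)
    return model_spend + wrong * fix_cost


# Illustrative numbers only, not measurements.
big_general = Option("big general model", cost_per_request=0.01, accuracy=0.90)
small_tuned = Option("small fine-tuned model", cost_per_request=0.001, accuracy=0.95)

REQUESTS = 1000
FIX_COST = 0.50  # hypothetical cost of a person correcting one wrong answer

for option in (big_general, small_tuned):
    print(f"{option.name}: ${total_cost(option, REQUESTS, FIX_COST):.2f}")
```

With these made-up inputs the cleanup cost is far larger than the model bill for both options, which is why "almost right" gets expensive fast once you multiply it by a thousand.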

Why this matters for app building

This is why we built DontCode the way we did. The AI inside it is fine-tuned specifically for building apps. Not a wrapper around someone else's chatbot.

Ask it to add KakaoPay to your checkout and it doesn't guess. Describe a form and where it should live and it wires it into the submission store the way the platform expects. Say "I want a daily digest email at 8am for active users" and it knows about cron jobs, email sending, the auth analytics tables, and the actual recipe for that pattern.

A general LLM doesn't know any of this. It writes the same generic code it writes anywhere, and you spend the next two hours debugging why it doesn't work.

The takeaway

Bigger is not always better. Famous is not always better. The right tool for the job is better. When you're picking AI to run a real part of your business, pick something built for that job, not the model with the loudest marketing.

Want to see what specialized AI feels like? Try DontCode and build something this weekend.