Engineering · May 22, 2026 · 6 min read

Why most startups over-engineer too early

It's counterintuitive coming from an engineering agency, but here it is: the most expensive technical mistake we see startups make isn't building too little. It's building too much, too soon. They adopt tools designed for problems they don't have yet, burn months on infrastructure they won't need for years, and end up with a system that's harder to change than the one they were trying to avoid.

Kubernetes at 50 users

We see this constantly. A founding team spends their first three weeks setting up a Kubernetes cluster - writing Helm charts, configuring ingress controllers, setting up horizontal pod autoscaling, debugging networking policies, and wrestling with persistent volume claims. Three weeks before they have a single paying customer.

Kubernetes is extraordinary technology. It solves real problems: automated container orchestration, service discovery, rolling deployments, self-healing workloads, and resource management across a fleet of machines. These are genuine needs when you're running 20 services across multiple availability zones serving millions of requests. They are not genuine needs when your entire application is one API server and a database serving 50 users.

What to do instead: a single VPS. Seriously. A $20/month virtual private server running your application behind Caddy or Nginx, with a managed Postgres database from your cloud provider. Deploy with rsync and systemctl restart. Yes, it's unsophisticated. It also takes 20 minutes to set up instead of 3 weeks, costs $20/month instead of $200/month, and handles far more traffic than you think - a well-written API on a modern VPS can serve 5,000+ concurrent connections without breaking a sweat.
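The deploy loop above really is two commands. As a minimal sketch - the host, paths, and systemd unit name here are placeholders, not a prescription:

```python
# Bare-bones deploy: rsync the build to the server, then restart the
# systemd unit. Host, paths, and unit name are illustrative placeholders.
import subprocess

def deploy_commands(host: str, src: str, dest: str, unit: str) -> list[list[str]]:
    """Return the two commands a single-VPS deploy needs."""
    return [
        ["rsync", "-az", "--delete", src, f"{host}:{dest}"],
        ["ssh", host, "sudo", "systemctl", "restart", unit],
    ]

def deploy(host: str, src: str, dest: str, unit: str, dry_run: bool = False) -> None:
    for cmd in deploy_commands(host, src, dest, unit):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failure
```

In practice this lives in a ten-line shell script or a Makefile target; the point is that the entire pipeline fits on one screen.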

When Kubernetes actually makes sense: when you have multiple services that need independent scaling, when you're deploying to multiple regions, when your team has grown enough that you need standardized deployment patterns across many projects, or when your traffic patterns are spiky enough that autoscaling saves meaningful money. For most startups, that's Series B territory - not pre-launch.

Microservices with 2 engineers

The microservices pitch is compelling: small, independent services that can be developed, deployed, and scaled independently. Netflix does it. Amazon does it. It must be the right architecture.

What gets left out of the pitch is the cost. Microservices require service discovery, inter-service communication (and decisions about whether that's synchronous HTTP, asynchronous messaging, or gRPC), distributed tracing, independent CI/CD pipelines for each service, API versioning between services, network policies, retry logic with exponential backoff, circuit breakers, and a strategy for distributed transactions. Each of these is a nontrivial problem. Together they represent months of infrastructure work that has nothing to do with your product.
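To make the cost concrete, here is just one item from that list - retry with exponential backoff - sketched in isolation. Multiply this by every item above, for every pair of services that talk to each other:

```python
import random
import time

def retry_with_backoff(call, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Retry a flaky remote call, doubling the wait after each failure,
    with jitter so many callers don't retry in lockstep."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))  # jittered backoff
```

And this is the easy one: it needs no shared infrastructure. Distributed tracing and circuit breakers do.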

Netflix has thousands of engineers and a dedicated platform team that builds and maintains the tooling that makes microservices viable. You have two engineers and a deadline.

What to do instead: a monolith with clean internal boundaries. One deployable unit, but with clear module separation. Your user authentication logic lives in auth/, your payment processing in payments/, your notification system in notifications/ - each with defined interfaces and no direct database access across boundaries. This gives you the organizational benefits of microservices (clear ownership, separated concerns, testable interfaces) without the operational overhead. And when you eventually need to extract a service - because the notification system genuinely needs independent scaling - the clean boundary makes that extraction a straightforward refactor instead of open-heart surgery.
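What "defined interfaces and no direct database access across boundaries" looks like in code, sketched in Python - the module names follow the text, the interfaces themselves are illustrative:

```python
# One deployable unit, but each module exposes a narrow interface and owns
# its own tables. Other modules depend on the interface, never the internals.
from typing import Protocol

class Notifier(Protocol):
    """The only way other modules may reach the notification system."""
    def send(self, user_id: int, message: str) -> None: ...

class EmailNotifier:
    """Lives in notifications/; owns its own tables and delivery config."""
    def __init__(self) -> None:
        self.sent: list[tuple[int, str]] = []
    def send(self, user_id: int, message: str) -> None:
        self.sent.append((user_id, message))  # stand-in for real delivery

class PaymentService:
    """Lives in payments/; talks to notifications only through Notifier."""
    def __init__(self, notifier: Notifier) -> None:
        self.notifier = notifier
    def charge(self, user_id: int, cents: int) -> None:
        # ... charge the card, record the payment in payments/ tables ...
        self.notifier.send(user_id, f"Charged ${cents / 100:.2f}")
```

Because PaymentService only knows the Notifier interface, swapping EmailNotifier for an HTTP client pointed at an extracted notifications service later changes one constructor argument, not the payment code.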

When microservices actually make sense: when your monolith's deploy cycle is bottlenecking multiple teams, when specific components have fundamentally different scaling requirements (your real-time chat server needs 100x the instances of your billing service), or when you need polyglot persistence - different services genuinely need different database technologies. These are real problems. They're just not problems you have with 2 engineers and 200 users.

GraphQL federation for one client app

GraphQL is a powerful query language that lets clients request exactly the data they need in a single request. Federation extends this by composing multiple GraphQL services into a unified schema, allowing different teams to own different parts of the graph independently.

The problem: you have one React app, one mobile app, and one backend. There's no graph to federate. What you have is a simple client-server relationship where the client needs data and the server provides it. GraphQL adds a query parser, a schema definition layer, a resolver architecture, a type system that needs to be kept in sync with your database models, N+1 query problems that require DataLoader patterns to solve, and a caching story that's significantly more complex than HTTP caching because every query is a POST request with a unique body.

What to do instead: REST with thoughtful endpoint design. A well-designed REST API with consistent naming, proper HTTP status codes, pagination, and maybe JSON:API or a similar spec for standardized response envelopes gives you everything you need. Your endpoints map cleanly to HTTP caching. Your API is debuggable with curl. Your documentation is a list of endpoints, not a schema definition language. If you find yourself building endpoints that return too much data because different clients need different fields, that's a real problem - but the solution might be sparse fieldsets (?fields=id,name,email) rather than a query language.

When GraphQL actually makes sense: when you have multiple client applications with genuinely different data requirements (a mobile app that needs minimal payloads, a web dashboard that needs rich nested data, a partner API that needs a subset), when your data graph is complex enough that clients regularly need to traverse relationships in ways your REST endpoints don't anticipate, or when multiple backend teams own different data domains and need a unified API layer. If you're a single team with a single client, GraphQL is overhead.

Event sourcing for a CRUD app

Event sourcing is an architectural pattern where you store every state change as an immutable event rather than overwriting the current state. Instead of a users table with the current state of each user, you store a sequence of events: UserCreated, EmailChanged, PasswordReset, SubscriptionUpgraded. The current state is derived by replaying the events. This gives you a complete audit trail, the ability to reconstruct the system at any point in time, and the ability to derive new read models from historical events.
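The "derive state by replaying events" idea can be shown in a few lines - event names follow the text, fields are illustrative:

```python
# Current state is a fold over the event stream: start empty, apply each
# event in order. Real systems add snapshots, versioning, and projections.
def replay(events: list[tuple[str, dict]]) -> dict:
    """Derive a user's current state from its full event history."""
    state: dict = {}
    for kind, data in events:
        if kind == "UserCreated":
            state = {"email": data["email"], "tier": "free"}
        elif kind == "EmailChanged":
            state["email"] = data["email"]
        elif kind == "SubscriptionUpgraded":
            state["tier"] = data["tier"]
    return state
```

The toy version is elegant. The operational costs show up in everything the toy version omits.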

It also gives you eventual consistency (your read models lag behind writes), projection management (building and rebuilding read models from event streams), event versioning (what happens when the schema of an event changes), event store scaling, and a mental model that most developers find genuinely difficult to reason about. Debugging "why does this user have the wrong subscription tier" changes from "look at the row in the database" to "replay 47 events and figure out where the projection diverged."

What to do instead: a relational database with an audit log. Postgres gives you ACID transactions, a mature query language, JSON columns for flexible data, and excellent tooling. Add a separate audit table that records who changed what and when - a simple trigger or application-level middleware that logs every mutation. You get 90% of the traceability benefit of event sourcing with 10% of the complexity. Your system is consistent. Your data is queryable with SQL. Your developers can debug issues by looking at a table instead of replaying event streams.

When event sourcing actually makes sense: when your domain genuinely requires temporal queries ("what was the portfolio value at market close on March 15?"), when regulatory compliance demands a tamper-proof audit trail (financial services, healthcare), when you have multiple read models that need to be derived from the same stream of facts, or when your domain events are themselves the product (event streaming platforms, activity feeds). A SaaS app that manages user subscriptions and sends invoices is not this domain.

The pattern behind the pattern

Every example above shares the same root cause: optimizing for problems that don't exist yet. Not problems that might exist someday, but problems that the team has never actually experienced. They read about Kubernetes in a blog post, not in a post-mortem. They adopted microservices because of a conference talk, not because their monolith's deploy cycle was blocking four teams.

There's a term for this: resume-driven development. Choosing technologies because they look good on a resume rather than because they solve a problem the project actually has. It's not always conscious. Engineers are drawn to interesting problems, and Kubernetes is more interesting than rsync. Event sourcing is more intellectually stimulating than a Postgres table with an audit log. But "interesting" and "appropriate" are different things.

The skill isn't knowing how to use Kubernetes. Any engineer can learn that in a week. The skill is looking at a system with 50 users and saying "you need a VPS and a database, and you need to spend the three weeks you would have spent on Kubernetes building features instead." The skill is knowing that the simple solution isn't the lazy solution - it's the correct solution at this scale, and the complex solution will still be available when scale demands it.

This is what separates engineering from development. Development is the ability to build things. Engineering is the ability to make good decisions about what to build. The best engineers we've worked with share one trait: they reach for the simplest tool that solves the problem, and they have a clear, specific answer for when the complex tool becomes necessary.

If your answer to "why Kubernetes?" is "because we might need to scale someday," you're over-engineering. If your answer is "because our deploy pipeline serves 14 services across 3 regions and we need automated rollbacks with canary analysis," you're engineering. The difference is specificity. Real problems have specific symptoms. Imaginary problems have vague futures.

Kaev builds systems matched to their actual requirements - no more, no less. If you're not sure whether your architecture is right-sized for your stage, we're happy to take a look.
