On LLM-powered coding
Each new iteration of LLMs gets its coding capabilities ‘upgraded’, ‘enhanced’ and so on, accompanied by impressive results on some software-engineering benchmark. Every time I see this I get excited and curious, but it always ends in disappointment for me. I find it a bit difficult to describe why exactly, but I’ll try to get some of my thoughts written up here.
I’ve had some great successes with LLMs in my work. They are great at finding bugs when I hand over some of my code and logs, and at the very least they are fast. They are also very strong at generating configuration files based on examples. This is where I find that Claude Code is quite strong.
However, when I need more work done, I feel like I’m doing solo development with an annoying neighbour. This Claude guy keeps talking about stuff that is not important. Just like the other day when I was developing a gRPC service which uses long-lived connections for streaming data between clients that are connected to it. It worked great when I directly connected to the port, but as soon as I hooked it up to my local development setup it failed.
The local development setup routes traffic through Traefik and Docker networks as much as possible. This gives the local environment many of the features that I use in the Kubernetes environments. That is, stuff like: hostname resolution for hooking up related services (dnsmasq), https (with mkcert for a self-signed certificate), authentication (with Keycloak and OAuth2 Proxy), and more. I use a custom tool that orchestrates the different dependencies per project, so it is admittedly a bit difficult to give the LLM the full context of the development environment. But I think that’s the case in any environment, not just mine.
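As a rough illustration of such a setup (not the actual configuration; the service name and hostname below are made-up placeholders), routing a service through Traefik in a local Docker environment typically boils down to a few labels in `docker-compose.yaml`:

```yaml
# Hypothetical docker-compose.yaml fragment. Service name, router name and
# hostname are placeholders, not the setup described in this post.
services:
  my-service:
    build: .
    labels:
      - "traefik.enable=true"
      # dnsmasq-style local hostname, resolved to the Traefik container:
      - "traefik.http.routers.my-service.rule=Host(`my-service.dev.localhost`)"
      # TLS terminated by Traefik, e.g. with a mkcert-generated certificate:
      - "traefik.http.routers.my-service.tls=true"
```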
Given that the complexity only went from *client using gRPC to connect to server* to basically *client using HTTP2+gRPC to connect to server*, I figured it would be quite easy to solve. My initial prompts gave Claude the idea that I should test the connection using `grpcurl`. I followed its instructions, but that didn’t work. So I fired up Claude Code and let it fiddle around a bit.
Claude tried to use `grpcurl`, but couldn’t even get the arguments of the command right. Later I discovered that the suggested tool could not handle the HTTP2 layer in the connection, and that the gRPC server needs to enable reflection in order to be used with clients like this. Claude also added some middlewares to the `docker-compose.yaml` file to try to solve the issue with the client. The tools it suggested would only work when connecting directly to the service, without Traefik in between, which was exactly what I was trying to prevent, because that would not resemble a real-world implementation. I gave up on Claude here and went back to searching online and reading the documentation.
During my investigation I found that I had to configure the Traefik load balancer to use the `h2c` scheme and add some code to the client (basically `WithTransportCredentials` combined with a server name). After that I could use my test client to connect to the gRPC server through Traefik without issue. However, there was a new issue: after exactly 1 minute the connection between the client and the server would be dropped, which did not happen without Traefik. I figured this would be a good one for Claude.
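For reference, the Traefik side of that fix is small: telling the load balancer to forward cleartext HTTP2 (`h2c`) to the backend after terminating TLS itself. A sketch with made-up router/service names:

```yaml
# Hypothetical fragment; router, service and hostname are placeholders.
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.grpc.rule=Host(`grpc.dev.localhost`)"
  # Traefik terminates TLS for the client...
  - "traefik.http.routers.grpc.tls=true"
  # ...and speaks plaintext HTTP2 (h2c) to the gRPC container:
  - "traefik.http.services.grpc.loadbalancer.server.scheme=h2c"
```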
I asked Claude about the issue, and again it spat out some middleware for Traefik. This had no effect, and Claude once again suggested connecting directly to the gRPC server without Traefik. I prompted it some more, explaining that I must have Traefik there (because that simulates the NGINX Ingress Controller that I use on Kubernetes). That made Claude generate Golang code for the server and client. The generated code was neat and showed how to configure the client/server to handle connection age and timeouts. However, the code was clearly not going to solve the root cause of the issue, since the drops only affected connections going through Traefik. So, I went back to doing it myself.
I discovered that the Traefik entrypoints can be configured with different timeouts, not unlike the timeouts in the code Claude generated, but in this case for Traefik itself. I configured the timeouts to `0s` and that solved my issue. Now I have a functional local setup for the service, but I can’t say how much value Claude added here.
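A sketch of what that looks like in Traefik’s static configuration, assuming the timeouts in question are the entrypoint’s responding timeouts (the entrypoint name is a placeholder; `0s` disables the respective timeout):

```yaml
# Hypothetical static configuration fragment; "websecure" is a placeholder
# entrypoint name.
entryPoints:
  websecure:
    address: ":443"
    transport:
      respondingTimeouts:
        readTimeout: 0s
        writeTimeout: 0s
        idleTimeout: 0s
```

Disabling the timeouts makes sense for long-lived streaming connections, where a read timeout tuned for ordinary request/response traffic would otherwise cut the stream.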
I often seem to end up in the same place, where my main takeaway for using LLMs is: sometimes they can be great, and sometimes they are just a nuisance. But I can’t seem to predict which way it will go until I realize that the LLM has become an obstacle instead of a tool for my use case.
I keep returning to the thought that, given how I suspect LLMs are trained, they are unable to generate code for problems that are closer to real life. I reckon that tutorials, blog posts, quickstart guides, and other types of documentation and forum posts make up the bulk of the training data, and those mostly lack the more architectural components. So maybe: real-life implementations require much more context than LLMs can handle?