Kubernetes: A first pass

Prologue

What follows is really more of a journal entry than a blog post. It is the start of a collection of my thoughts on using Kubernetes (K8s). I'm planting this stake in the ground so I can see how my thoughts change over time.

Background on my world with regard to K8s

K8s has been the talk of the town in the physics/software community I've been working in. Many large institutions are switching to it to manage as many of their applications as they can, CERN being the largest and best known of them. So, I've been looking forward to my chance to use it.

At RadiaSoft we deploy everything inside of Docker containers. But, we've chosen not to use K8s to manage those containers. In as few words as possible, our reasoning is: K8s doesn't provide abstractions at a high enough level to actually solve our problem. We'd have to write a piece of software on top of it that speaks a language above it for our apps. And we'd still need a configuration management tool like Ansible to actually configure our metal. So, we'd add a dependency (K8s) and not really be any closer to having our problems solved. Instead we wrote our own tool, RSConf, which speaks at a level of abstraction that matches our domain. It is akin to Ansible - we use it to configure our machines. It sets up Docker and systemd (among many other things). Then those two (along with our custom job supervisor) do mostly what K8s would do for us.

Inside of Sirepo, our web app for running physics simulations, is where the job supervisor lives. The job supervisor manages our "agents", which are where the actual simulation/computation occurs. In production we use agents that run in their own Docker containers. For development (and quick prod setups) we run local agents, which are just plain unix processes. In prod we also run "sbatch agents": unix processes running on login nodes at supercomputers. Those agents then manage job submissions through the SLURM workload manager. So, we run our "top-level" services using Docker/systemd, and the job supervisor has flexibility about the types of agents it creates and does the management of them (e.g., restarting them if they die).
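To make that concrete, here is a rough sketch in Python of how an agent abstraction like that can look. The class names and commands are mine for illustration, not Sirepo's actual code, and it assumes the docker and sbatch binaries are available on the host:

```python
import abc
import subprocess


class Agent(abc.ABC):
    """One way to model the supervisor's choice of agent type."""

    @abc.abstractmethod
    def start(self, cmd):
        """Launch a simulation given as an argv-style command."""


class LocalAgent(Agent):
    """Development: run the computation as a plain unix process."""

    def start(self, cmd):
        return subprocess.Popen(cmd)


class DockerAgent(Agent):
    """Production: run the computation in its own container."""

    def __init__(self, image):
        self.image = image

    def start(self, cmd):
        return subprocess.Popen(["docker", "run", "--rm", self.image, *cmd])


class SbatchAgent(Agent):
    """Supercomputer login node: hand the job to SLURM."""

    def start(self, cmd):
        return subprocess.Popen(["sbatch", "--wrap", " ".join(cmd)])
```

A supervisor written against Agent is free to restart agents that die, or to grow a new agent type - a K8s pod, say - without the rest of the app caring.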

So, that's where I stood going into using K8s. We have an app and it could be deployed in K8s instead of with systemd. And maybe parts of the job supervisor could even be replaced with K8s. Or at the very least, it could "speak K8s" so agents would be started as K8s pods. I was curious how everything worked and excited to get the chance to see.

My first real use of K8s

We've been working with a national lab that is making the move to K8s. They've set up a cluster and have a couple of demo applications deployed. We're working for them on a contract to build a better dashboard for visualizing simulation results. Eventually we will run a "digital twin" of their system: a simulated system that runs alongside the real-life system and can be used by operators to help optimize the functioning of the real one. So, I have been tasked with deploying Sirepo inside their K8s cluster.

Sirepo consists of multiple front-end servers, one job supervisor, and n agents that run the simulations for users. The database is split between SQLite and Postgres. A few services, but nothing that crazy.

Time to get coding...

The make make problem

The first thing I noticed about K8s is a common problem. I know it as the "make make problem", a phrase I learned from Rob. Tools like Autoconf and CMake exemplify the make make problem. They are abstractions on top of an underlying abstraction (Makefiles) used to build a piece of software. When Makefiles (and deployment targets) become too numerous/large/cumbersome, people reach for tools like CMake, which create a higher-level abstraction on top of the Makefiles. This recursion of making programs that make other programs to eventually get the actual program working is in some ways what software is all about: abstraction. But, in other ways it is a sign that the low-level thing (e.g., Makefiles) doesn't offer enough abstraction to get the job done, so something else needs to be put on top to make it useful.

To me, K8s looks like a bunch of Makefiles. I'm being glib when I say that. K8s is operating at quite a high level of abstraction. But, the individual YAML files are the Makefiles. For Sirepo this resulted in: a namespace, two config maps, two secrets, a persistent volume, a persistent volume claim, one storage class, two deployments, two services, and one ingress. As I wrote out each file and started to see how everything in K8s tied together, I could already feel myself wanting some tooling that generated the YAML files. Things like having variables (e.g., port numbers) shared between the files would be nice. Also, that describes only our alpha system. A copy of each one of those files (with minor differences) would be needed for beta and prod. I needed something to generate all of this. A "make make" is born.
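As a minimal sketch of that make maker - assuming PyYAML, and with placeholder names and image URLs of my own - the shared variables live in Python and the YAML becomes mere output:

```python
import yaml  # PyYAML

# Shared variables that plain YAML would make us copy by hand.
APP = "sirepo"
PORT = 8000
ENVIRONMENTS = ["alpha", "beta", "prod"]


def deployment(env):
    labels = {"app": APP, "env": env}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{APP}-{env}", "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": APP,
                        "image": f"example.invalid/{APP}:{env}",  # placeholder
                        "ports": [{"containerPort": PORT}],
                    }],
                },
            },
        },
    }


def service(env):
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{APP}-{env}"},
        "spec": {
            "selector": {"app": APP, "env": env},
            # The port number is written once, above, and flows everywhere.
            "ports": [{"port": 80, "targetPort": PORT}],
        },
    }


# One file per environment, each containing both documents.
for env in ENVIRONMENTS:
    with open(f"{APP}-{env}.yml", "w") as f:
        yaml.safe_dump_all([deployment(env), service(env)], f)
```

Change PORT once and every deployment and service, in every environment, picks it up on the next run.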

In itself, make make isn't a terrible problem. It is common and can be solved with programming. It happens whenever a generic tool like K8s is used. The creators can't possibly cover all use cases, so they give you a hammer and nails with some instructions and tell you to build the house. The problem with K8s is that they don't give you a hammer and nails. They give you an entire construction company, but you're only allowed to communicate with the workers via smoke signals.

Aside: DevOps tooling

There seems to be an illusion around a lot of DevOps tooling that it will be useful without much work. I don't have specific citations I can point to; this is more a feeling I get reading articles and talking to peers. For example, no one would install Python and Flask and expect to have a working web app. Many hours of work will have to be put in using those tools to build an app. But, there is some idea that one can install a tool like Ansible, write up a couple of YAML files, and voila - all infrastructure is now managed and perfectly happy. The tools themselves feed this illusion. The fact that so many of them have a YAML interface adds to the idea that you won't have to "program" them.

YAML is not a programming language

Back to my point about smoke signals above: YAML is smoke signals. It is a severely lacking way to communicate anything of even remote complexity. I don't have beef with YAML itself - JSON, HCL, TOML, it doesn't matter to me. All of them suffer from the same problem: they are configuration languages, not programming languages. The fact is that setting up a K8s cluster and deploying an application involves some configuration, but it also involves some programming. I think we are kidding ourselves, and by extension making our lives harder, by pretending that it is only a configuration problem. All of the programming details can't be hidden by the tooling makers. A toolmaker must have a great deal of confidence that their tool solves every last problem if they are only going to give you YAML as the language to interface with the tool.

In "Code Complete" Steve McConell advises one to program into your langauge not in your langague. By that I think he means to create the missing pieces of the programming langague that you need to bring the abstraction level up to solve your problem. The simplest example I remember from the book is to create an assert function if your language doesn't have one and you need it.

For K8s I want (at the very least) a "make maker" so I can program into K8s and avoid a lot of repetition. But, YAML is all I have to build this with. With YAML I am forever bound to the abstraction level offered by the tool. In the case of K8s that means I am doomed to problems like copying port numbers around, because YAML (and K8s underneath it) doesn't really support variables. To take a real example: our client is securing their system by whitelisting IPs. They are just copying the list of IPs between each ingress resource they deploy. There is nothing on top of all the ingress resources to manage the list. It goes without saying that this is quite fragile. If K8s had a programming interface instead of a configuration interface, this problem would be easily resolved.
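Here is a sketch of what that fix can look like once a program is in the loop. It assumes the cluster uses the ingress-nginx controller, whose whitelist-source-range annotation takes a comma-separated list of CIDRs; the CIDRs and names below are made up:

```python
# Defined exactly once, used by every ingress the program emits.
ALLOWED_CIDRS = ["192.0.2.0/24", "198.51.100.7/32"]


def ingress(name, host, service, port):
    """Build an Ingress manifest with the shared IP allowlist baked in."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "annotations": {
                # ingress-nginx reads this annotation to restrict source IPs.
                "nginx.ingress.kubernetes.io/whitelist-source-range":
                    ",".join(ALLOWED_CIDRS),
            },
        },
        "spec": {
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service,
                                            "port": {"number": port}}},
                }]},
            }],
        },
    }
```

Updating the allowlist becomes a one-line change instead of an audit of every ingress resource in the cluster.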

YAML has constructs like anchors and aliases that act like variables, but I strongly believe using them is a misfeature and a code smell. Once you need variables, then you need conditionals. And then loops. And then you just want a full programming language, not a configuration language.

I don't think I'm the only one with this problem. Tools like cdk8s solve this exact problem: they offer a programming interface instead of a configuration interface. There are also tools like Helm and Kustomize that solve it in other ways, but they are still hamstrung by a YAML interface. But, when I need to use cdk8s to start creating the abstractions I need to actually make K8s useful, I'm meeting my brethren in the land of JavaScript fatigue. I need tools for my tools, and that 2:30 feeling is setting in. K8s had better be offering me something really powerful to warrant this Russian-doll stacking of tools. For Sirepo I don't think it offers enough to warrant using it.
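For a taste, here is roughly what the Python flavor of cdk8s looks like. This assumes the cdk8s and constructs packages plus a k8s module generated by running cdk8s import; the service details are my own placeholders:

```python
from constructs import Construct
from cdk8s import App, Chart

from imports import k8s  # generated by `cdk8s import`


class SirepoChart(Chart):
    def __init__(self, scope: Construct, id: str):
        super().__init__(scope, id)

        label = {"app": "sirepo"}
        # A real chart would add deployments, ingresses, etc. here,
        # sharing `label` and port numbers as ordinary Python variables.
        k8s.KubeService(
            self, "service",
            spec=k8s.ServiceSpec(
                selector=label,
                ports=[k8s.ServicePort(
                    port=80,
                    target_port=k8s.IntOrString.from_number(8000))]))


app = App()
SirepoChart(app, "sirepo")
app.synth()  # writes plain YAML for kubectl to apply
```

The output is still YAML, but at least a program is holding the pen.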

Conclusion

The make make / YAML problem doesn't make K8s bad per se. But, it does mean that once you install K8s and write your first few YAML files, the story has just begun. You will probably need (or at least I would certainly want) another layer of abstraction on top of it. A layer with programmatic control is key. That way I could program "into" K8s to build abstractions for my domain. At that point I think it would be good to pump the brakes and see if you need K8s at all. Does it really solve your problems? Could you get by with less? At 2am when the pager goes off, the fewer tools I have between me and my code, the better. But, maybe you do need it. Google certainly did; that's why they wrote it. But, few people are working on Google-scale problems.

Epilogue

It is always easier to tear down than to create, and I've just taken my swings at K8s. But I went from never using K8s to having our app running in about a day. Any tool that can do that is getting some of the abstractions right. Kubectl is cohesive and well designed. The introspection one has into the environment is really nice. And there is a lot of polish on K8s - polish that is nearly impossible to achieve outside of a large tool worked on by many people. There is a huge library of informative talks, blog posts, tutorials, and docs. All of those resources make it easy to move quickly and get help when you're stuck.

In an effort to do some of our own building, we're considering adding support for Kubernetes to RSConf. That way, with just the right bits of YAML and Python, we can program into Kubernetes.