What I’ve Learned About Kubernetes


So, I recently finished a course about Kubernetes (and Docker and Google Cloud): Scalable Microservices with Kubernetes. Unfortunately, it’s only been a couple of days since then, and I’m already not exactly sure about what I learned.

To start with, here are some basics:

  1. Docker manages containers
  2. Kubernetes manages (groups of) containers, such as Docker containers, and is declarative (more about that later)
  3. Google Cloud has a similar set of services to AWS, but of course with different names

Lastly, I’m more a fan of the command-line than of GUIs so there will be more of a focus on commands.

Docker

Docker manages containers. I originally thought that a container was simply a virtual instance of a computer (in other words, a virtual machine or VM). However, that’s incorrect with regard to how a container is actually used.

If you use containers as if they were VMs, then that makes it hard to elegantly and efficiently build systems with or from containers. A VM has all of the services and resources that a normal computer has: does a container have all of those resources? A VM is “the world an application lives in”. Is a container a “full world”?

There were 2 things that I read that changed my view about containers:

  1. A (well-known) post on Hacker News about how Docker contains an OS
  2. A paper by Burns and Oppenheimer (Google) about “Design patterns for container-based distributed systems”

I would recommend reading both, but for those of you who just want the raw info, here it is:

  1. Containers do not need to contain a full operating system (there goes my VM analogy!)
  2. Development with containers has reached the point that containers are being used as units — like lego-blocks — in design patterns

So, what are containers?

 

Well, sure, what does Wikipedia say? Wiki says that containers are, in essence, “an operating system feature in which the kernel allows the existence of multiple isolated user-space instances”. (If you have a background in operating systems or Linux, then reading about cgroups is also interesting.)

Of course, it’s my blog post, so I’ll write about it if I want to. What would I say that containers are? Hmm…

Containers are containers. Hahahaha.. No, really. Containers are a place for (a piece of) software. And, to paraphrase Larry Wall (of Perl fame), that piece of software in your container is probably communicating with something else, so containers always have network interfaces or access to a shared/external filesystem.

In short, containers are powerful because you can use a container to both isolate (or “make modular”) a piece of software and run a piece of software. In other words, this is the next level of modularization (and thus reuse) and execution. Other similar mechanisms to a container in the past are:

  • compilers (which created a binary that could be run)
  • Java (isolation from the operating system)
  • Service Oriented Architecture (isolation/modularization per computer, group of services)
  • Microservices (isolation/modularization per function/domain)

Inherent in the modularization + execution model is increased scalability.

One of the key takeaways here is that you should not be putting multiple applications/pieces of software in 1 container. In fact, it seems like this might be an anti-pattern and “tight-coupling” in essence.

Actual Docker use

In terms of actual facts, commands and code that I learned:

  • Docker images are described in Dockerfiles.
    • Actually, a Dockerfile basically just contains
      1. the name of the base image you’re building on (FROM),
      2. the commands to set it up (RUN, ENV, EXPOSE, etc.) and
      3. the command to run the actual “contained” application (CMD).
  • docker run runs the container, docker ps shows the running containers, docker stop stops the container, and docker rm removes the container from the system.
  • There are a couple of sites (registries such as Docker Hub) that host Docker images: each “repository” holds 1 image, which may exist in different versions (tags) under the same name.

Here’s an example Dockerfile:

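FROM java:8
# Copy the source file into the image so that javac can find it
WORKDIR /app
COPY MyJavaClass.java .
RUN javac MyJavaClass.java
CMD ["java", "MyJavaClass"]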

Kubernetes

Kubernetes makes me a little bit sad. That’s how awesome it is. What especially makes me sad about Kubernetes is that it’s declarative.

If you look back at the big picture of what we as coders (sorry, “software engineers”) have been working on for the last 30 years, we’ve been 1. building systems for other people and 2. making it easier for us to do (1) faster.

So (2), “making it easier for ourselves”, is great for other people and it’s great for building stuff quickly. However, in my opinion, most of the really hard (and interesting) problems are part of (2). And Kubernetes solves one more problem that future generations will never really have to solve again, which is a little sad (but a lot awesome).

Kubernetes solves scaling with containers [2].

Before we go further, if you’ve never looked at D3.js, then go do that first. You’ve already read this far and deserve a break. No, really, go away. Come back later if you’re still interested. Yes, I mean it. BYE!

[2] I’m officially calling this the first occurrence of Rietveld’s theory: the fewer words it takes to explain something in non-technical language, the more complex and impressive that thing is.

Declarative programming

You’re back! Where was I? Uhm.. Kubernetes! and D3.js! Wait, What?!?

So, if you’re used to most (imperative) programming languages, whether it’s Python, Bash scripts or Java, then you’re used to spelling out everything:

int s = 0;
for (int i = 0; i < 10; ++i) {
  s += i;
}

Declarative thinking means that you don’t say what you want to do, you just say what you want to do! (Hahahaha.. ). Okay, what I actually mean is that you specify the requirements of the task as opposed to describing the steps of the task. It’s a paradigm shift.

D3.js was the first time I explicitly ran into this way of coding: you specify what you want from D3 instead of how D3 should do it. Then I also learned that SQL was declarative as well. Oh.. Duh.
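To make the contrast a bit more concrete, here is a minimal sketch of the summing loop from above next to a more declarative version of the same thing. The class and method names (DeclarativeSum, imperativeSum, declarativeSum) are just ones I made up for illustration, and Java streams are only "declarative-ish", but the idea carries over: the second method says what it wants (the sum of 0 up to 10) and leaves the looping to the library.

```java
import java.util.stream.IntStream;

// Hypothetical example class; the names are made up for illustration.
class DeclarativeSum {
    // Imperative: spell out every step of the summation.
    static int imperativeSum() {
        int s = 0;
        for (int i = 0; i < 10; ++i) {
            s += i;
        }
        return s;
    }

    // Declarative-ish: state what we want (the sum of 0..9)
    // and let the stream library decide how to iterate.
    static int declarativeSum() {
        return IntStream.range(0, 10).sum();
    }

    public static void main(String[] args) {
        System.out.println(imperativeSum());  // prints 45
        System.out.println(declarativeSum()); // prints 45
    }
}
```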

Back to Kubernetes: you tell Kubernetes what you want, not how K8s should do it. (Yeah, people use K8s as an abbreviation for Kubernetes.) For example, you tell K8s that it should create a load balancer in front of 3 (identical) instances of a specific container. You put that in a Kubernetes config file (in YAML format), and it does that.

What’s impressive to me is the amount of network “magic” that Kubernetes is doing under the hood. The problems Kubernetes solves are both hard and relatively new, which says something about how much research Google has been doing in the last 2 decades.

Pods and Services…

Kubernetes pods are groupings of containers. In Burns’s and Oppenheimer’s paper (about container design patterns), they write that multiple-container patterns that take place within one “node” are equivalent to a Kubernetes pod. An example would be the Sidecar pattern. The Sidecar pattern is 1 pod containing 2 containers: 1 container with a web server and 1 container that streams the web server’s logs to somewhere else.

Kubernetes services are basically a way to communicate between groups of pods. (Sort of like how an ESB makes sure that different components/webservices can easily communicate with each other.)

Maybe a quick example of a service configuration file will help:

kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  selector:
    myLabel: MyKey
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9376

Oops, I forgot to mention that K8s also has this concept of labels, which are key-value pairs you can attach to pods (among other things). So the my-service service will route communication (messages, packets, etc.) to all of the pods that have the myLabel label with a value of MyKey. Of course, there are other types of services (and thus other ways to define them), but you can go read up on that yourself!
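If the selector idea is still a bit fuzzy, here is a rough sketch of the matching rule in Java. To be clear, this is not how Kubernetes implements it: the Pod class, the pod names and the "tier" label are all hypothetical and only there to illustrate the rule, which (as far as I understand it) is that a pod is selected when it carries every key-value pair in the selector.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model of label selection; NOT Kubernetes' actual implementation.
class LabelSelectorSketch {
    // Hypothetical, stripped-down pod: just a name and its labels.
    static final class Pod {
        final String name;
        final Map<String, String> labels;

        Pod(String name, Map<String, String> labels) {
            this.name = name;
            this.labels = labels;
        }
    }

    // A pod matches when it carries every key-value pair of the selector.
    static boolean matches(Map<String, String> selector, Pod pod) {
        return selector.entrySet().stream()
                .allMatch(e -> e.getValue().equals(pod.labels.get(e.getKey())));
    }

    // Returns the names of all pods that the selector picks out.
    static List<String> select(Map<String, String> selector, List<Pod> pods) {
        return pods.stream()
                .filter(pod -> matches(selector, pod))
                .map(pod -> pod.name)
                .collect(Collectors.toList());
    }

    // Made-up pods, selected with the selector from the service config above.
    static List<String> demo() {
        List<Pod> pods = List.of(
                new Pod("pod-a", Map.of("myLabel", "MyKey")),
                new Pod("pod-b", Map.of("myLabel", "OtherKey")),
                new Pod("pod-c", Map.of("myLabel", "MyKey", "tier", "web")));
        return select(Map.of("myLabel", "MyKey"), pods);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [pod-a, pod-c]
    }
}
```

Note that pod-c is still selected even though it has an extra label: the selector only cares about the labels it mentions.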

…and Deployments

Lastly, Kubernetes also has the concept of deployments. Remember that I said that Kubernetes was declarative? A deployment is a little bit like a firewall rule. Deployments are mostly used to manage the clustering/load-balancing of pods.

The documentation states: “You describe a desired state in a Deployment object, and the Deployment controller changes the actual state to the desired state at a controlled rate”. It also helpfully lists some use cases for when you would use a deployment.

The reason that I compared a K8s deployment to a firewall rule is that the deployment is not only applied once: Kubernetes also remembers that you’ve specified it. So if you then start to do things with Pods or Services that don’t match up to what you specified in your Deployment, you’ll run into problems.

One last tip: Kubernetes calls clusters of pods “replica sets”: you have multiple “replicas” or (identical) copies of a pod in a cluster, which makes up a.. replica set. You can thus describe what type of replica set you want in your K8s deployment.
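The “desired state versus actual state” idea is easier to see in a toy example. The sketch below is my own simplification and not the real Deployment controller: the class and method names are made up, and “at a controlled rate” is reduced to “one pod per step”. The point is only that you declare a number of replicas and a loop keeps nudging reality towards it.

```java
// Toy model of a reconcile loop; NOT the real Deployment controller.
class ReconcileSketch {
    // One reconcile step: move the actual replica count one pod
    // closer to the desired count (a stand-in for "a controlled rate").
    static int reconcileStep(int desired, int actual) {
        if (actual < desired) {
            return actual + 1; // start one more pod
        }
        if (actual > desired) {
            return actual - 1; // stop one pod
        }
        return actual;         // already in the desired state
    }

    // Keep reconciling until actual == desired; returns the number of steps.
    static int stepsToConverge(int desired, int actual) {
        int steps = 0;
        while (actual != desired) {
            actual = reconcileStep(desired, actual);
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(stepsToConverge(3, 0)); // prints 3
        System.out.println(stepsToConverge(3, 5)); // prints 2
    }
}
```

This is also why manually deleting a pod that belongs to a deployment doesn’t “stick”: in this toy model, the actual count drops below the desired count and the next step simply starts a new one.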

Actual Kubernetes Use

So, how would you actually use all of this?

  • kubectl is our main command.
  • kubectl apply -f my-deployment.yaml applies a deployment configuration (described in the my-deployment.yaml file)
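  • If you have a docker image and just want to run it via K8s, then you can do kubectl run my-k8s-depl-name --image=myDockerImageNameAndVersion (K8s resource names must be lowercase). In older kubectl versions (before 1.18), this command also specifies a deployment; in newer versions it only creates a single pod, and kubectl create deployment is the command that creates a deployment.
    • kubectl run takes all sorts of other options; for example, in those older versions you can use --replicas=<num-replicas> to specify how many replicas you want in your deployment.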
  • If you want more granular control over your K8s resources, you can also use kubectl create to create “resources” (pods, services, etc.).
    • You would usually do something like this: kubectl create -f my-service-config.yaml.
    • Here’s a slightly more complex tutorial that covers that.
  • Of course, it’s also helpful to be able to get information about the actual status of your K8s clusters. You can use the following commands to do this:
    • You can use kubectl get to get specific information. For example, kubectl get pods or kubectl get services to list all pods or services.
    • For more in-depth information, use the kubectl describe command. For example, kubectl describe pods/nginx.
  • Lastly, there is of course kubectl delete (which deletes stuff..).

There are a bunch more commands you can read about here.

Lastly, there’s a funny but very interesting talk by Kelsey Hightower here that uses Tetris to explain Kubernetes.

The Google Cloud Platform

To start with, I haven’t actually worked with any cloud providers yet, so this was my first hands-on experience with one.

It seems to me like most cloud providers are now grouping their services into 4 types:

  1. “Functions” or “Lambdas” which can be thought of as pieces of code.
    • AWS uses “Lambda” to describe their service.
    • Google Cloud calls theirs “Functions”.
  2. “Platforms” which, for us Java developers, mean “application containers”, roughly. For example, a Tomcat or Jetty instance.
  3. “Containers as a Service”, which is a cloud service to manage Docker and Kubernetes resources.
    • AWS has ECS (Elastic Container Service) and EKS (Elastic Kubernetes Service).
    • The Google Cloud service is called the “Kubernetes Engine”.
  4. And, of course, “Infrastructure as a Service” which is simply Virtual Machines in the cloud.
    • AWS has EC2 (Elastic Compute Cloud).
    • The Google Cloud service is called the “Compute Engine”.

Using Google Cloud

Right.

While there are a bunch of different menus available in the Google Cloud console, one of the primary ways to interact with your Google Cloud resources is via the “Google Cloud Shell”. In short, this is an in-browser shell that seems to run on a (virtual) Linux machine.

In the Google Cloud Shell, you have your own home directory as well as a fairly standard Linux path. Of course, there are some other commands available, such as the gcloud, kubectl and docker commands. To tell the truth, it all seems to work magically. By that, what I really mean is that it’s not clear to me why those commands “just work” in the Google Cloud Shell, but they do.

There were only really 4 (or 5) commands I used with Google Cloud in the shell:

  • gcloud compute instances create <instance-name> [OPTIONS] creates a Google Compute Engine instance, which is a VM.
    • Among other things, you can specify what type of OS you want via the command’s options.
  • gcloud compute ssh <instance-name> allows you to ssh to the Compute Engine instance.
  • You need gcloud compute zones list to list the available zones and gcloud config set compute/zone <zone> to set the default (geographical) zone that your instances are created in.
  • Lastly, for Kubernetes, I used gcloud container clusters create <kubernetes-cluster-name> to create a Google Kubernetes Engine instance (a.k.a. a Kubernetes cluster on the Google Cloud).

 

And folks, that’s all I wrote!
