Notes for: TGIK 007-009: Building a Controller

This goes through the 3 part series of building a Controller for k8s.

A controller is a service that watches Kubernetes resources and does stuff based on that. It can also create objects based on events. Basically a service that talks with k8s and does stuff.

Getting the sample controller running

Start a k8s cluster first. Fork the repo and clone it into your laptop. Rename the instances of jbeda with your danielfrg. My fork is danielfrg/tgik-controller.

Other things I had to do to get it running:

  • The project was using godep and had to move it to using dep
  • Import the gcp auth package since I was using a GCP cluster

Then I was able to build it and run it:

export KUBECONFIG=~/.kube/config go run *.go

Note that my fork includes the PodInformer so it will actually do what happens in the video. origin/master won't do that since that coded changed during the 3 episodes. Now you can create a pod and see the logs on the Controller:

kubectl run kuard --generator=run-pod/v1 --image=gcr.io/kuar-demo/kuard-amd64:1 kubeclt delete pod kuard 2018/07/16 22:49:02 onAdd: default/kuard 2018/07/16 22:49:02 onUpdate: default/kuard 2018/07/16 22:49:02 onUpdate: default/kuard 2018/07/16 22:49:03 onUpdate: default/kuard 2018/07/16 22:50:54 onUpdate: default/kuard 2018/07/16 22:50:54 onUpdate: default/kuard 2018/07/16 22:50:54 onUpdate: default/kuard 2018/07/16 22:50:59 onUpdate: default/kuard 2018/07/16 22:50:59 onDelete: default/kuard

Going through the code

The main library that the code uses is the k8s client-go. The Go lib is the most updated one and its more than just a REST client. It has nice utilities to connect based on KUBECONFIG files or internal secrets for service accounts. Provides some type safety for dealing with the k8s types, no raw JSON. One of the big things that it has is the ability to do a watch on objects (Pods or others), it keeps a cache on the objects that it has seen from the server (reduces load on the server). There is utilities to write controllers in a responsible way, like sending queries to the cluster to a queue that can do smart stuff like collapsing items that changed multiple times.

The Informer is the object that keeps the cache of k8s objects and informs when objects change. The second parameter is a period for the Informer to go and fetch everything that is watching, common times is 24h but really depends. All the k8s controller are Level set, so if they fail they will do a resync (check the world again) of they objects they are watching because something might have changed when they were down. It doesn't have to be that way if the controller is based on events.

On the actual controller we have objGetter (goes directly to the cluster) and objListener (reads the shared cache).

Ideas for controllers:

  1. (Picked this one) In a shared dev cluster, take the secrets from one namespace and duplicated them in the other namespaces
  2. Automatically create services for each deployment based on an annotation. Probably not a great idea for production but maybe for dev

If we only handle the Events (onAdd, onUpdate, onDelete) if the controller is down for a minute and something changes in the source then we will lose that notification and not propagate the change. We will fix that later.

Episode 007 ends with a simple code to read the events and some logic to parse the different types of Secrets from a source Namespace and needing to write the code to actually copy them. Episode 008 does continues from there.

SharedInformers are "shared" because the k8s Controller Manager there is a whole bunch of Controller that had its own informers, big clusters this took a lot of memory and the SharedInformes allows those controllers to share the cache.

This episode starts by changing some code given that there can be a bunch of edge cases for the synchronization controller we are writing. So Joe changes the idea and goes from a Event based more type controller to writing code that starts from scratch every time by getting source Secrets and the list of namespaces and makes sure it copies them correctly. Then on the controller on any even just run the sync code.

This if the doSync  and SyncNamespace functions in the final code. It relatively simple to write this, see here. One things to note is to take advantage of the great go client lib that has a bunch of nice utilities like scheme.Scheme.DeepCopy to copy objects.

The UUID is unique in space and time. Even delete an object and create it again it will come up with a new UUID, its the canonical way to identify an object.

Episode 009 focuses on finishing the Controller. It starts by extending the Controller to not be based only on Secrets but also Namespaces, so if a new namespace is added we can add the secrets to it. This just calls the sync function.

Tip: Be sure that the Controller waits for the caches to sync (cache.WaitForCacheSync(...), we don't want to run anything until we know the status of the cluster.

Then Joe adds a queue based on workqueue.NewNamedRateLimitingQueue so it doesn't just hit the server all the time for each request. So every time an action happens an item gets added to the queue.

The Run function now has a defr func() (so that it gets executed at the end of the Run function regardless of how that ends) that shutdown the queue and waits for all workers to be done.

Now we need a worker to read this queue and actually do something. We do this using a gorutine and just going go func() { ... }. In the worker we just call a processNextWorkItem() that we write.

In this processNextWorkItem  func we need to be sure to mark the key as processed using defer c.queue.Done(key)

Code for episode 009 is this section.

Links