CS207 Systems Development for Computational Science

Lecture 25: Kubernetes

Thursday, December 3rd 2019

Harvard University
Fall 2019
Instructor: David Sondak
Guest Lecturer: Pavlos Protopapas


Recap I: The Goal

  • Remember that we want to find effective ways to deploy our apps.
  • This proves to be much more difficult than we might initially imagine.
  • We might develop our app on specific hardware and a specific OS.
  • When we release our app, there may be many problems for our clients:
    • Different operating systems
    • Different package dependencies
    • Even strange behavior on different types of hardware

Recap II: Virtual Environments

  • We first tried to solve this with virtual environments:
    • virtualenv, conda, etc.
  • This was a good start:
    • For example, it removed some of the complexities associated with package dependencies (a minimal example follows this list).
  • However, it didn't solve some of the bigger, deeper problems:
    • Operating system and hardware questions.
    • Doesn't provide true emulation of the correct environment.
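
As a concrete (hypothetical) example of pinning package dependencies, a conda environment file might look like the following; the name and versions are illustrative assumptions, not from the lecture:

    # environment.yml --- hypothetical pinned environment (illustrative only)
    name: cs207-app
    channels:
      - defaults
    dependencies:
      - python=3.7
      - numpy=1.17
      - flask=1.1

Recreating this with conda env create -f environment.yml gives everyone the same packages, but it still says nothing about the operating system or hardware underneath.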

Recap III: Virtual Machines

  • We learned that virtual machines were an excellent option.
  • A virtual machine is a full machine running on virtualized hardware.
  • It contains its own operating system.
  • It provides true isolation, which is important for security.
  • Many other benefits, including huge cost savings:
    • No need to have many separate servers.
  • Unfortunately, VMs can be heavy and slow.
    • May be overkill for many applications.

Recap IV: Containers

  • Next, we explored containers (mainly Docker).
  • We learned that containers are like VMs, but not as isolated.
  • They provide a fairly isolated environment (OS, filesystem, etc.), but they're not a full machine.
  • In this way, containers are lightweight but still provide a mechanism for apps to be run across platforms.
  • With this in mind, one can envision a "container swarm"
    • Containers for all the apps!
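
As a rough sketch of "containers for all the apps", a hypothetical Docker Compose file (service names and images are illustrative assumptions) could run an app and its database as separate containers:

    # docker-compose.yml --- hypothetical two-container setup (illustrative only)
    version: "3"
    services:
      web:
        image: myregistry/myapp:1.0   # assumed application image
        ports:
          - "8080:8080"
      db:
        image: postgres:11            # off-the-shelf database container

Starting and stopping a couple of containers like this by hand is easy; doing it for hundreds of containers across many machines is not.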

How can we manage such a thing?!

Enter Container Management

  • So now everyone is developing their apps and bundling them in containers.
  • For best resource management and portability, applications should be run in a container.
  • These apps are available in a production environment (e.g., a cloud provider such as AWS, Google Cloud, or Microsoft Azure).
  • People are using these containers (launching them, closing them, upgrading them).
  • Sometimes containers go down due to an overload of requests or processing.

Managing these containers manually is difficult in a large-scale production environment.

Enter Kubernetes (AKA K8s)

  • K8s does all this management for you!
  • K8s is an open-source platform for container management developed by Google.
  • It allows users to define rules for how container management should occur, and then it handles the rest!
  • Check out kubernetes.io for more info and in-depth tutorials.

Kubernetes can do many useful things!

  • Service discovery
    • Gives users/clients a single entry point to your application through a single URL.
    • Provides encapsulation for the more sensitive parts of your application, like databases and passwords.
  • Load balancing
    • We do not want one container to be overloaded with too many requests.
    • Increase the number of containers based on utilization.
    • More efficient use of hardware because multiple containers are run on the same machine, which reduces costs.
    • Distribute network traffic across many containers for a stable production runtime.
  • Self-healing
    • Restart failed containers.
    • Kill unresponsive containers.
    • Upgrade containers to a newer image.
  • Much, much more. See kubernetes.io for a more detailed discussion of the benefits of Kubernetes. (A minimal sketch of the first three features follows.)
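
A minimal sketch of how these features are requested in practice, assuming a hypothetical image myregistry/myapp:1.0 (all names, labels, and ports are illustrative): the Deployment asks for three replicas, which Kubernetes load balances and restarts as needed, and the Service gives clients one stable entry point.

    # myapp.yaml --- hypothetical Deployment plus Service (illustrative only)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
    spec:
      replicas: 3                       # keep 3 copies running (self-healing, scaling)
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
          - name: myapp
            image: myregistry/myapp:1.0
            ports:
            - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: myapp
    spec:
      selector:
        app: myapp                      # traffic is spread across matching containers
      ports:
      - port: 80
        targetPort: 8080

Handing this file to the cluster with kubectl apply -f myapp.yaml is all the "management" a user has to do; Kubernetes keeps the running state in line with it.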

The very basics of Kubernetes: The cluster

  • K8s works on a cluster of machines/nodes. This could be VMs on your local machine or a group of machines through a cloud provider.
  • The cluster includes one master node and at least one worker node.
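
For example, one way to get such a cluster on your local machine is the kind tool, which runs each node as a container; a minimal sketch of its configuration, assuming kind is installed (the exact apiVersion depends on the kind version):

    # kind-cluster.yaml --- hypothetical local cluster: one master, two workers
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
    - role: control-plane   # the master node
    - role: worker
    - role: worker

Creating it with kind create cluster --config kind-cluster.yaml gives a small playground; on a cloud provider, the nodes would instead be VMs or physical machines.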

The very basics of Kubernetes: The master node

  • Main task: manage the worker node(s) and the containers needed to run an application.
  • The master node consists of:
    • An API server.
    • A scheduler to assign a worker node for a new application (see the scheduling sketch after this list).
    • Controller manager:
      • Keeps track of worker nodes
      • Handles node failures
      • Replicates containers if needed
      • Provides endpoints to access the application from the outside world
    • Cloud controller:
      • Communicates with the cloud provider regarding resources such as nodes and IP addresses.
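
As a small illustration of the scheduler's job (referenced above), a pod description may constrain which worker nodes are acceptable; the disktype: ssd node label and image below are illustrative assumptions:

    # pod-with-constraint.yaml --- hypothetical scheduling constraint (illustrative only)
    apiVersion: v1
    kind: Pod
    metadata:
      name: myapp
    spec:
      nodeSelector:
        disktype: ssd          # scheduler only considers worker nodes carrying this label
      containers:
      - name: myapp
        image: myregistry/myapp:1.0

Without such constraints, the scheduler simply picks any worker node with enough free resources.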

The very basics of Kubernetes: The worker node(s)

  • A worker node consists of:
    • The container runtime (Docker).
      • Package your app into a Docker container and push to a registry.
      • K8s pulls the specified image and deploys it on a worker node (see the pod sketch after this list).
    • kubelet --- Talks to the API server and manages containers on its node.
    • kube-proxy --- Load-balances network traffic between application components and the outside world.
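
To make the push/pull step concrete, here is a hypothetical pod spec (registry, image name, and tag are illustrative assumptions); the kubelet asks the container runtime to pull this image and run it:

    # pod.yaml --- hypothetical pod pointing at an image in a registry
    apiVersion: v1
    kind: Pod
    metadata:
      name: myapp
    spec:
      containers:
      - name: myapp
        image: myregistry.example.com/cs207/myapp:1.0   # pushed to the registry beforehand
        imagePullPolicy: IfNotPresent                    # pull only if not already cached on the node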

A Kubernetes Overview: Running an Application in Kubernetes

  • Post a description of your app to the Kubernetes API server.
    • The description includes:
      • Repository for Docker images
      • Relationship between containers
      • Which ones need co-location (i.e., on the same machine)
      • How many replicas are needed for each container
  • Internal or external network services are also described.
    • A lookup service is provided and a given service is exposed at a particular IP address.
    • kube-proxy makes sure connections to the service are load balanced.
  • The master makes sure that the deployed state of the application matches this description (a sketch of such a description follows).
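
A minimal sketch of such a description, with hypothetical names throughout: putting two containers in one pod forces co-location on the same machine (replicas and Services would be layered on top, as in the earlier sketch):

    # colocated-pod.yaml --- hypothetical pod with two co-located containers
    apiVersion: v1
    kind: Pod
    metadata:
      name: myapp-with-sidecar
    spec:
      containers:
      - name: myapp
        image: myregistry/myapp:1.0       # main application container
      - name: log-agent
        image: myregistry/log-agent:1.0   # always scheduled onto the same machine as myapp

Once this is posted (e.g., with kubectl apply), the master continuously reconciles what is actually running against what was described.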

Exercise