Manual deployment of JupyterHub on Kubernetes for a single machine

Mai Ngoc Kien
6 min read · Apr 12, 2020

In this post, I will install JupyterHub on Kubernetes manually, step by step. This gives me a chance to understand some of the mechanisms behind the scenes of a well-known application.

1. Technical overview

1.1. JupyterHub

JupyterHub brings the power of notebooks to groups of users. It is a set of processes that together provide a single user Jupyter Notebook server for each person in a group. There are three major subsystems in JupyterHub:

  • Hub (Python/Tornado): manages user accounts, authentication, and coordinates Single User Notebook Servers using a Spawner.
  • Proxy: the public-facing part of JupyterHub that uses a dynamic proxy to route HTTP requests to the Hub and Single-User Notebook Servers. configurable-http-proxy (built on node-http-proxy) is the default proxy.
  • Single-User Notebook Server (Python/Tornado): a dedicated, single-user, Jupyter Notebook server is started for each user on the system when the user logs in. The object that starts the single-user notebook servers is called a Spawner.
Technical overview. Picture from https://jupyterhub.readthedocs.io/en/stable/reference/technical-overview.html

Users access JupyterHub through a web browser, by going to the IP address or the domain name of the server.

The basic principles of operation are:

  • The Hub spawns the proxy (in the default JupyterHub configuration).
  • The proxy forwards all requests to the Hub by default.
  • The Hub handles login, and spawns single-user notebook servers on demand.
  • The Hub configures the proxy to forward url prefixes to single-user notebook servers.
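The routing behaviour described above can be pictured with a toy model. This is only a sketch of the idea (longest matching URL prefix wins), not the actual code of configurable-http-proxy; the class name, the addresses and the user name "alice" are made up for illustration.

```python
from typing import Dict


class RoutingTable:
    """Toy model of the proxy's routing table (hypothetical, for illustration)."""

    def __init__(self, hub_target: str) -> None:
        # By default, every request is forwarded to the Hub.
        self.routes: Dict[str, str] = {"/": hub_target}

    def add_route(self, prefix: str, target: str) -> None:
        # The Hub would register a prefix like this after spawning a server.
        self.routes[prefix] = target

    def resolve(self, path: str) -> str:
        # Pick the longest registered prefix that matches the request path.
        matches = [prefix for prefix in self.routes if path.startswith(prefix)]
        return self.routes[max(matches, key=len)]


# Assumed addresses, chosen only for the example.
table = RoutingTable("http://hub:8081")
table.add_route("/user/alice/", "http://10.0.0.5:8888")

print(table.resolve("/hub/login"))        # handled by the Hub
print(table.resolve("/user/alice/tree"))  # forwarded to alice's notebook server
```

A request for a user whose server has not been spawned yet still falls through to the Hub, which is what lets the Hub handle login and spawning.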

1.2. Kubernetes

Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications.

Containers are a technology for packaging the (compiled) code for an application along with the dependencies it needs at run time. They are similar to VMs, but they have relaxed isolation properties so that applications can share the Operating System (OS). Therefore, containers are considered lightweight. Similar to a VM, a container has its own filesystem, CPU, memory, process space, and more. As they are decoupled from the underlying infrastructure, they are portable across clouds and OS distributions.

A container image is a ready-to-run software package, containing everything needed to run an application: the code and any runtime it requires, application and system libraries, and default values for any essential settings.

kubectl is a command line tool for controlling Kubernetes clusters.

Minikube is a tool that makes it easy to run Kubernetes locally. Minikube runs a single-node Kubernetes cluster inside a Virtual Machine (VM) on a computer.

1.3. Docker

Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Container images become containers at runtime; in the case of Docker, images become containers when they run on Docker Engine.

2. Setting up a Kubernetes cluster

In this post, I will set up a single-node Kubernetes cluster using Minikube. I need to install kubectl and minikube, then start the cluster.

I follow the instructions in the Kubernetes documentation to install kubectl and minikube. I also install VirtualBox to run minikube in. I start the cluster with the following command:

$ minikube start --driver=virtualbox --memory 8192 --cpus 4

Set the --memory and --cpus flags according to the resources of your machine; here the cluster gets 8192 MB of RAM and 4 CPUs.

3. Preparing Docker images

Three Docker images are needed: one for the hub, one for the proxy and one for the single-user notebook server.

3.1. JupyterHub

I prepare a Dockerfile, build the image and push it to my Docker Hub account. The code is available here.

$ docker build . -t {dockerhub-username}/{image-name}:{image-tag}
$ docker push {dockerhub-username}/{image-name}:{image-tag}

One thing to note is jupyterhub_config.py, which contains the configuration for JupyterHub. I will point out some settings.

import os

# Spawner class used to start the single-user notebook servers
c.JupyterHub.spawner_class = 'kubespawner.KubeSpawner'
# Connect to a proxy running in a different pod instead of starting one
c.ConfigurableHTTPProxy.api_url = 'http://{}:{}'.format(os.environ['PROXY_API_SERVICE_HOST'], int(os.environ['PROXY_API_SERVICE_PORT']))
c.ConfigurableHTTPProxy.should_start = False
# IP address and port used to access JupyterHub from the browser
c.JupyterHub.ip = os.environ['PROXY_PUBLIC_SERVICE_HOST']
c.JupyterHub.port = int(os.environ['PROXY_PUBLIC_SERVICE_PORT'])
# The hub should listen on all interfaces, so the proxy can access it
c.JupyterHub.hub_ip = '0.0.0.0'
# Address that spawned containers use to reach the API of the hub
c.JupyterHub.hub_connect_ip = os.environ['HUB_SERVICE_HOST']
c.JupyterHub.hub_connect_port = int(os.environ['HUB_SERVICE_PORT'])
# Authentication, using the dummy authenticator for testing only
c.JupyterHub.authenticator_class = 'jupyterhub.auth.DummyAuthenticator'
c.DummyAuthenticator.password = "some_password"

Where do these environment variables come from? Kubernetes sets them up automatically in every pod that is started after the corresponding Service exists. For example, later on I will create a Service named proxy-api, and Kubernetes will provide environment variables named PROXY_API_SERVICE_HOST and PROXY_API_SERVICE_PORT.
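To make the naming rule concrete, here is a small sketch of how a Service name maps to the two variable names used in the config: the name is upper-cased and dashes become underscores. The helper function is my own illustration, not part of Kubernetes or JupyterHub.

```python
from typing import Tuple


def service_env_names(service_name: str) -> Tuple[str, str]:
    """Return the (host, port) variable names Kubernetes derives from a Service name."""
    # Upper-case the Service name and replace dashes with underscores.
    prefix = service_name.upper().replace("-", "_")
    return f"{prefix}_SERVICE_HOST", f"{prefix}_SERVICE_PORT"


print(service_env_names("proxy-api"))
print(service_env_names("hub"))
```

This is why the config reads PROXY_API_SERVICE_HOST for the proxy-api Service and HUB_SERVICE_HOST for the hub Service.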

3.2. Configurable-http-proxy

This Docker image is available here. I need to remember the image name and tag.

jupyterhub/configurable-http-proxy:latest

3.3. Single-user Jupyter notebook

You can find a notebook image here and put its name and tag into jupyterhub_config.py for KubeSpawner. For example:

c.KubeSpawner.image = "jupyter/pyspark-notebook:latest"

4. Deploy on Kubernetes

At this point, I deploy everything manually with the kubectl command line. I prepare YAML files (available here) that contain the configuration for the deployments, services, etc. Later, I will try to understand Helm to automate the deployment.

  • Services

I need to create services for the proxy and the hub. This allows the hub and the proxy to reach each other through stable addresses. As I mentioned earlier, Kubernetes provides environment variables with the host and port of each service. To create the services, run the commands:

$ kubectl create -f proxy/service.yaml
$ kubectl create -f jupyter-hub/service.yaml

  • RBAC

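As I understand it, I need to create RBAC (role-based access control) resources on Kubernetes so that the account the hub runs under has permission to read and modify pods and related resources. KubeSpawner needs this to create the single-user notebook pods.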

$ kubectl create -f jupyter-hub/rbac.yaml

  • Deployment

I should define a security token shared by the hub and the proxy. I set the environment variable CONFIGPROXY_AUTH_TOKEN (in the deployment files of both) to a random hex string representing 32 bytes. Generate the token with the following command and copy the output:

$ openssl rand -hex 32
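If openssl is not at hand, an equivalent token can be generated with the Python standard library. This is just an alternative sketch; the function name is my own.

```python
import secrets


def generate_proxy_token(num_bytes: int = 32) -> str:
    """Return a random hex string representing num_bytes random bytes."""
    # token_hex uses a cryptographically secure source of randomness.
    return secrets.token_hex(num_bytes)


token = generate_proxy_token()
print(token)  # 64 hex characters, different on every run
```

The same value must be used for CONFIGPROXY_AUTH_TOKEN in both deployment files, otherwise the hub cannot authenticate against the proxy API.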

Now I create the deployments, which in turn create the pods.

$ kubectl create -f proxy/deployment.yaml
$ kubectl create -f jupyter-hub/deployment.yaml

The source code is available on Github: https://github.com/KienMN/JupyterHub-on-Kubernetes

5. Deployment confirmation

If there are no errors, list the services in the cluster (for example with kubectl get services) and check the result.

Available services in the Kubernetes cluster. The JupyterHub web UI is now on port 32502

In the web browser, go to IP-address-of-the-cluster:32502 (in my case, this is the address of the minikube cluster, shown by the command $ minikube ip). Log in with any username and the password “some_password”, as defined in the authentication section of jupyterhub_config.py.
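The same check can be scripted. The sketch below assumes the hub is reachable through the node port and queries the root of the JupyterHub REST API (/hub/api), which reports the hub version; the function names and the IP address in the usage note are placeholders of mine.

```python
import json
import urllib.request


def hub_api_url(cluster_ip: str, node_port: int) -> str:
    """Build the URL of the JupyterHub REST API root behind the proxy."""
    return f"http://{cluster_ip}:{node_port}/hub/api"


def hub_version(cluster_ip: str, node_port: int, timeout: float = 5.0) -> str:
    """Ask the running hub for its version; raises if the hub is unreachable."""
    url = hub_api_url(cluster_ip, node_port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["version"]
```

For example, with the address printed by minikube ip (here a placeholder), hub_version("192.168.99.100", 32502) should return a version string once the hub and proxy pods are running.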

Web UI for Jupyter Hub

Now there are 3 pods in the cluster: one for the hub, one for the proxy and one for the single-user notebook server.

Pods in the cluster. The single-user notebook pod (jupyter-jovyan) is deployed only after a user logs in and spawns a notebook server.
