Mounting Kubernetes service account secrets for a single-user Jupyter notebook pod
I could not access the Kubernetes API from my Jupyter notebook pod, and it took me some time to figure out why.
In my post about manually deploying JupyterHub on Kubernetes (https://medium.com/@kienmn97/manually-deploy-jupyterhub-on-kubernetes-for-a-single-machine-dbcd9c9e50a4), everything seemed to work, at least for the minimal setup. However, problems appeared when I tried more sophisticated functions. Hence, I decided to work through those problems and post updates about them gradually.
Of course, “zero to jupyterhub k8s” provides a very nice Helm chart with fully supported configuration for deploying JupyterHub on Kubernetes, so that we can easily “install and play”. I will try to follow their setup to fix the problems and see what happens behind the scenes.
As in my previous post, my cluster has 3 pods: one for the hub, one for the proxy and one for the single-user notebook server.
I was trying to connect to the Kubernetes API server (“apiserver” for short) in order to create a Spark context (you do not need to care about Spark for this post) with a few lines of code in a Jupyter notebook running in the single-user notebook pod.
import pyspark

conf = pyspark.SparkConf()
# Try to connect to K8S API server
conf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443")
# Set some config
conf.set("spark.kubernetes.container.image", "gcr.io/spark-operator/spark-py:v2.4.5")
...
# Create a Spark Session
spark = pyspark.sql.SparkSession.builder.config(conf=conf).getOrCreate()
I encountered these errors.
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
...
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
...
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
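After searching around, I found out that I need to authenticate to the apiserver (at the kubernetes.default.svc DNS name) from a pod (in my case, from the Jupyter notebook server pod). The PKIX error itself means that the JVM does not trust the certificate presented by the apiserver, so the client needs the cluster CA certificate as well as a token. More information is here: https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#accessing-the-api-from-a-pod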
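To make the idea of authenticating from a pod more concrete, here is a minimal sketch using only the Python standard library. It assumes the service account files are mounted at the default path and that the apiserver is reachable at https://kubernetes.default.svc. The function names build_request and call_api are hypothetical helpers for illustration, not part of any Kubernetes client.

```python
import ssl
import urllib.request
from pathlib import Path

# Default mount point for service account credentials inside a pod.
SA_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")
# In-cluster DNS name of the apiserver.
API_SERVER = "https://kubernetes.default.svc"


def build_request(sa_dir=SA_DIR, path="/api"):
    """Build a request that carries the mounted token as a bearer token."""
    token = (Path(sa_dir) / "token").read_text().strip()
    request = urllib.request.Request(API_SERVER + path)
    request.add_header("Authorization", f"Bearer {token}")
    return request


def call_api(sa_dir=SA_DIR, path="/api"):
    """Send the request, verifying the apiserver against the cluster CA."""
    ca_file = str(Path(sa_dir) / "ca.crt")
    context = ssl.create_default_context(cafile=ca_file)
    request = build_request(sa_dir, path)
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode()
```

Calling call_api() only works inside a pod that has the service account mounted. If the token or ca.crt file is missing, it fails with FileNotFoundError, which is the Python counterpart of the Java error shown further down.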
Following the instructions for authenticating to the apiserver with Spark, I needed to add a few more lines to my code.
conf.set("spark.kubernetes.authenticate.caCertFile", "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
conf.set("spark.kubernetes.authenticate.oauthTokenFile", "/var/run/secrets/kubernetes.io/serviceaccount/token")
# "spark" is a service account to deploy Spark on the cluster
conf.set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
Then a new error occurred.
Caused by: java.io.FileNotFoundException: /var/run/secrets/kubernetes.io/serviceaccount/token (No such file or directory)
The required files did not exist in the pod. I looked around my cluster and realized that every pod had a service account mounted, except the Jupyter notebook pod itself.
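A quick way to confirm this from inside a notebook is to check for the files directly. The following is a small sketch, assuming the default mount path and the three files Kubernetes normally places there (ca.crt, token and namespace). The helper name missing_service_account_files is made up for this example.

```python
from pathlib import Path

# Default mount point for service account credentials inside a pod.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"
# Files that a mounted service account normally provides.
EXPECTED_FILES = ("ca.crt", "token", "namespace")


def missing_service_account_files(sa_dir=SA_DIR):
    """Return the expected credential files that are absent from sa_dir."""
    base = Path(sa_dir)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_service_account_files()
    if missing:
        print("Missing service account files:", ", ".join(missing))
    else:
        print("Service account is mounted.")
```

An empty list means the service account is mounted. If all three names come back, the pod was most likely created without a service account mount.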
It turned out that I had missed an option in jupyterhub_config.py (the file that stores the JupyterHub configuration): the one that tells KubeSpawner which service account to use when it spawns the single-user notebook pod, so that the service account secret files get mounted from Kubernetes. The option is c.KubeSpawner.service_account, a Unicode trait that defaults to None.
Now, set this option to the name of a service account that already exists in the cluster, rebuild the JupyterHub Docker image with the modified config file, and restart the pod so that it runs the new image.
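As a sketch, the relevant fragment of jupyterhub_config.py could look like the following. The service account name "spark" is only an assumption borrowed from the Spark settings earlier in this post; use whichever service account actually exists in the namespace where your notebook pods are spawned.

```python
# jupyterhub_config.py (fragment)
# `c` is the config object that JupyterHub provides when it loads this file.
c = get_config()  # noqa: F821

# Assumption: "spark" is an existing service account in the namespace where
# the notebook pods are spawned. Replace it with your own account name.
c.KubeSpawner.service_account = "spark"
```

Notebook pods that were spawned before the change keep their old pod spec, so they probably need to be stopped and started again before the files appear under /var/run/secrets/kubernetes.io/serviceaccount.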
Reference
- KubeSpawner documentation: https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html
- Jupyterhub config file from zero-to-jupyterhub-k8s repository by Jupyter: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/files/hub/jupyterhub_config.py
- Accessing K8S cluster: https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/
- Running Spark with Jupyter Notebook & HDFS on Kubernetes: https://kublr.com/blog/running-spark-with-jupyter-notebook-hdfs-on-kubernetes/