Install PrimeHub Community on MicroK8S Single Node (Ubuntu)
This document guides you through installing MicroK8s on a single node and PrimeHub Community with an easy script.
Provision a cluster
MicroK8s supports multiple platforms; we demonstrate the installation with the following spec:
- Ubuntu 18.04 LTS
- Kubernetes version 1.19
- IP address: EXTERNAL-IP
- Networking: allow port 80 for HTTP
Requirements
Git
Please follow the OS-specific instructions to install the git command.
cURL
cURL is a command-line tool that lets us make HTTP requests from the shell. To install cURL, please follow the OS-specific method. For example:
Ubuntu
sudo apt update
sudo apt install curl
RHEL/CentOS
yum install curl
Clone PrimeHub Repository
git clone https://github.com/InfuseAI/primehub.git
Install PrimeHub required binaries
./primehub/install/primehub-install required-bin
This will install the required commands into ~/bin. You should append ~/bin to your PATH variable, or use the following commands to append it and reload .bashrc:
echo "export PATH=$HOME/bin:$PATH" >> ~/.bashrc
source ~/.bashrc
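To confirm the PATH change took effect, a quick check like the following can help (a minimal sketch; it only inspects the PATH variable):

```shell
# Check whether ~/bin is on PATH after reloading .bashrc.
case ":$PATH:" in
  *":$HOME/bin:"*) echo "~/bin is on PATH" ;;
  *)               echo "~/bin is NOT on PATH - re-check your .bashrc" >&2 ;;
esac
```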
Install MicroK8s Single Node
We provide an install script that makes it much easier to create a single-node MicroK8s Kubernetes cluster.
Run the create singlenode command:
./primehub/install/primehub-install create singlenode --k8s-version 1.19
After the first execution, you will see the following message because the script adds the current user to the microk8s group, which requires a re-login:
[Require Action] Please relogin this session and run create singlenode again
After relogin, run the same command again to finish the single-node provision:
./primehub/install/primehub-install create singlenode --k8s-version 1.19
During the installation you might run into trouble or need to modify the default settings; if so, please check the Troubleshooting section.
Quick Verification
Access nginx-ingress with your EXTERNAL-IP:
curl http://${EXTERNAL-IP}
The output will be 404 because no Ingress resources are defined yet:
default backend - 404
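If you prefer to check just the status code, curl can print it directly. EXTERNAL_IP below is a placeholder for your node's public IP; the `|| true` keeps the script going if the endpoint is unreachable:

```shell
# Print only the HTTP status code returned by nginx-ingress.
# Before any Ingress resources exist, the expected answer is 404.
EXTERNAL_IP=${EXTERNAL_IP:-127.0.0.1}   # placeholder: set this to your node's public IP
status=$(curl -s -o /dev/null -w '%{http_code}' "http://${EXTERNAL_IP}" || true)
echo "nginx-ingress answered with HTTP ${status}"
```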
Configurations
Configure GPU (optional)
Download and install the Nvidia GPU drivers from the official website.
Enable GPU feature
Please be aware that on MicroK8s v1.21 you need to modify the default_runtime_name manually and must NOT enable the GPU feature with microk8s enable gpu.
If MicroK8s is v1.20 or earlier (<= 1.20), enable the GPU feature with the default addon:
microk8s.enable gpu
If MicroK8s is v1.21, please follow these steps:
Modify the file /var/snap/microk8s/current/args/containerd-template.toml and manually change default_runtime_name to nvidia-container-runtime:
# default_runtime_name is the default runtime name to use.
- default_runtime_name = "${RUNTIME}"
+ default_runtime_name = "nvidia-container-runtime"
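If you prefer scripting this edit, the substitution can be sketched with sed. The block below works on a local stand-in file (a single line, not the full template) so you can verify the change first; copy the result back to /var/snap/microk8s/current/args/ with sudo once you are satisfied:

```shell
# Create a stand-in for the relevant line of the real containerd-template.toml.
printf '%s\n' '  default_runtime_name = "${RUNTIME}"' > containerd-template.toml

# Replace the value of default_runtime_name in place.
sed -i 's/^\( *default_runtime_name = \).*/\1"nvidia-container-runtime"/' containerd-template.toml

cat containerd-template.toml
```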
Restart MicroK8s:
microk8s stop
microk8s start
Install the Nvidia Device Plugin with helm:
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install -n kube-system nvidia-device-plugin nvdp/nvidia-device-plugin
Verify GPU
kubectl describe node | grep 'nvidia.com/gpu'
Verify within MicroK8s cluster
deviceplugin_pod=$(kubectl -n kube-system get pod | grep nvidia-device-plugin | awk '{print $1}')
kubectl -n kube-system exec -t ${deviceplugin_pod} -- nvidia-smi
Ref: https://github.com/NVIDIA/gpu-operator/issues/163#issuecomment-794445253
Configure snap (optional)
snap updates packages automatically. If you plan to use it in a production environment, you could:
- Make the update process run in a specific time window, or delay it until a given date
- Disable it by setting an unreachable proxy
Please see the manual from snapcraft.
Using Self-hosted DNS (Optional)
If your domain name is not hosted by a public DNS server, use a self-hosted DNS server instead.
Validate the domain name for PrimeHub with the following regular expression (you can test it on regexr.com):
# The domain name must match
[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*
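A small helper can run this validation from the shell. is_valid_domain is a hypothetical name, not part of PrimeHub:

```shell
# Return 0 when the given name matches the DNS-1123 subdomain pattern.
is_valid_domain() {
  echo "$1" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}

is_valid_domain "hub.example.com" && echo "hub.example.com is valid"
is_valid_domain "Foo_Bar" || echo "Foo_Bar is invalid"
```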
Configure K8S CoreDNS
kubectl edit cm -n kube-system coredns
Please modify the following line and fill in your own DNS server:
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
Corefile: |
.:53 {
errors
health
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
forward . <Fill your own DNS server>
cache 30
loop
reload
loadbalance
}
...
After changing the coredns ConfigMap, use the following command to restart coredns and apply the new configuration:
kubectl rollout -n kube-system restart deploy coredns
Reference
https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/
Install PrimeHub
Prepare two terminals: one to execute the primehub install script, the other to monitor the install progress by watching the pod status.
Terminal one
Install with primehub-install create primehub and specify the version, e.g. v3.6.2. Please check the latest stable version.
Check available stable versions
./primehub/install/primehub-install
Install the latest stable version by default
./primehub/install/primehub-install create primehub --primehub-ce
Or install a specific version, such as v3.7.0, as below:
./primehub/install/primehub-install create primehub --primehub-version <version> --primehub-ce
Enter the PRIMEHUB_DOMAIN, KC_PASSWORD, and PH_PASSWORD at the command prompt.
The install script starts with a preflight check, config initialization, and so on.
[Preflight Check]
[Preflight Check] Pass
[Verify] Mininal k8s resources
...
[Install] PrimeHub
[Check] primehub.yaml
[Generate] primehub.yaml
[Install] PrimeHub
...
[Progress] wait for bootstrap job ready
...
Terminal two
Open another terminal to run the command to watch the progress.
watch 'kubectl -n hub get pods'
Or, once primehub-bootstrap is running, check the progress of bootstrapping:
kubectl logs -n hub $(kubectl get pod -n hub | grep primehub-bootstrap | cut -d' ' -f1) -f
Eventually most pods show Running STATUS, except the primehub-bootstrap-xxx pod, which shows Completed STATUS; the READY indicator should be N/N.
Example watch console for the completed installation:
NAME READY STATUS RESTARTS AGE
hub-758bd48876-wwwww 1/1 Running 0 17m
keycloak-0 1/1 Running 0 17m
keycloak-postgres-0 1/1 Running 0 17m
metacontroller-0 1/1 Running 0 17m
primehub-admission-xxxxxxxxxx-yyyyy 1/1 Running 0 17m
primehub-bootstrap-xxxxx 0/1 Completed 0 17m
primehub-console-xxxxxxxxxx-yyyyy 1/1 Running 0 17m
primehub-controller-xxxxxxxxxx-yyyyy 2/2 Running 0 17m
primehub-graphql-xxxxxxxxx-yyyyy 1/1 Running 0 17m
primehub-metacontroller-webhook-xxxxxxxxxx-yyyyy 1/1 Running 0 17m
primehub-watcher-xxxxxxxxxx-yyyyy 1/1 Running 0 17m
proxy-6bdd94cc-yyyyy 1/1 Running 0 17m
Then go back to Terminal one and wait until you see these messages:
[Completed] Install PrimeHub
PrimeHub: http://`$PRIMEHUB_DOMAIN` ( phadmin / `$PH_PASSWORD` )
Id Server: http://`$PRIMEHUB_DOMAIN`/auth/admin/ ( keycloak / `$KC_PASSWORD` )
[Completed]
Enable PrimeHub Store
After a fresh installation, you need to enable the PrimeHub Store.
Set the flag by editing the env file:
~/primehub/install/primehub-install env edit
Add the PRIMEHUB_FEATURE_STORE flag to the last line of .env:
PRIMEHUB_FEATURE_STORE=true
Update the configuration with the primehub-install command:
~/primehub/install/primehub-install upgrade primehub
New to PrimeHub
Initially, PrimeHub has a built-in user phadmin, a built-in group phusers, and several instance types/images which are set Global. phadmin can launch a notebook quickly by using these resources.
Now that PrimeHub CE is ready, see Launch Notebook to launch your very first Jupyter notebook on PrimeHub. Also see the User Guide for fundamental knowledge of PrimeHub.
Troubleshooting
Generate a log file for diagnosis.
./primehub/install/primehub-install diagnose
You may run into trouble during the installation. We list some common issues below; hopefully you will find a resolution here.
Using valid hostname and domain
Validate the hostname of the node with the following regular expression (you can test it on regexr.com):
[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*
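The check can be applied to the node's hostname directly, with a suggested safe replacement; the tr mapping below (underscores to hyphens, uppercase to lowercase) is just one possible fix:

```shell
# Validate the current hostname against the DNS-1123 rule and,
# if it fails, suggest a sanitized alternative.
name=$(hostname)
pattern='^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
if echo "$name" | grep -Eq "$pattern"; then
  echo "hostname '$name' is valid"
else
  echo "hostname '$name' is invalid; consider: $(echo "$name" | tr 'A-Z_' 'a-z-')"
fi
```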
Symptoms
When the hostname is invalid, the installation might stall at the microk8s status phase because the cluster is not running:
ubuntu@foo_bar:~$ ./primehub-install create singlenode foo-bar:5000 --k8s-version 1.19
[Search] Folder primehub-v2.6.2
[Not Found] Folder primehub-v2.6.2
[Search] tarball primehub-v2.6.2.tar.gz
[Not Found] tarball primehub-v2.6.2.tar.gz
[Search] primehub helm chart with version: v2.6.2
[Not Found] primehub v2.6.2 in infuseai helm chart
[Skip] Don't need PrimeHub release package when target != primehub
[check] microk8s status
Resolution
You would get microk8s is not running from the status result:
ubuntu@foo_bar:~$ microk8s.status
microk8s is not running. Use microk8s.inspect for a deeper inspection.
If you run the inspect command, it shows everything running:
ubuntu@foo_bar:~$ microk8s.inspect
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-flanneld is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver is running
Service snap.microk8s.daemon-apiserver-kicker is running
Service snap.microk8s.daemon-proxy is running
Service snap.microk8s.daemon-kubelet is running
Service snap.microk8s.daemon-scheduler is running
Service snap.microk8s.daemon-controller-manager is running
Service snap.microk8s.daemon-etcd is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Building the report tarball
Report tarball is at /var/snap/microk8s/1489/inspection-report-20200713_093426.tar.gz
You could find the root cause in the kubelet's inspection-report logs:
ubuntu@foo_bar:~/inspection-report/snap.microk8s.daemon-kubelet$ cat systemctl.log
● snap.microk8s.daemon-kubelet.service - Service for snap application microk8s.daemon-kubelet
Loaded: loaded (/etc/systemd/system/snap.microk8s.daemon-kubelet.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2020-07-13 09:34:10 UTC; 14s ago
Main PID: 10180 (kubelet)
Tasks: 12 (limit: 2329)
CGroup: /system.slice/snap.microk8s.daemon-kubelet.service
└─10180 /snap/microk8s/1489/kubelet --kubeconfig=/var/snap/microk8s/1489/credentials/kubelet.config --cert-dir=/var/snap/microk8s/1489/certs --client-ca-file=/var/snap/microk8s/1489/certs/ca.crt --anonymous-auth=false --network-plugin=cni --root-dir=/var/snap/microk8s/common/var/lib/kubelet --fail-swap-on=false --cni-conf-dir=/var/snap/microk8s/1489/args/cni-network/ --cni-bin-dir=/snap/microk8s/1489/opt/cni/bin/ --feature-gates=DevicePlugins=true --eviction-hard=memory.available<100Mi,nodefs.available<1Gi,imagefs.available<1Gi --container-runtime=remote --container-runtime-endpoint=/var/snap/microk8s/common/run/containerd.sock --node-labels=microk8s.io/cluster=true
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.139863 10180 kubelet.go:2263] node "foo_bar" not found
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.240093 10180 kubelet.go:2263] node "foo_bar" not found
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.340308 10180 kubelet.go:2263] node "foo_bar" not found
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.440529 10180 kubelet.go:2263] node "foo_bar" not found
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.540736 10180 kubelet.go:2263] node "foo_bar" not found
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.640915 10180 kubelet.go:2263] node "foo_bar" not found
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: I0713 09:34:24.713484 10180 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: I0713 09:34:24.714871 10180 kubelet_node_status.go:70] Attempting to register node foo_bar
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.721575 10180 kubelet_node_status.go:92] Unable to register node "foo_bar" with API server: Node "foo_bar" is invalid: metadata.name: Invalid value: "foo_bar": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.741101 10180 kubelet.go:2263] node "foo_bar" not found
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.721575 10180 kubelet_node_status.go:92] Unable to register node "foo_bar" with API server: Node "foo_bar" is invalid: metadata.name: Invalid value: "foo_bar": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is 'a-z0-9?(.a-z0-9?)*')
Please fix the hostname or domain name and reinstall MicroK8s. You can destroy the existing installation with this command:
$ ./primehub-install destroy singlenode
Ensure the CNI IP range does not overlap with your egress network
MicroK8s uses iptables as the kube-proxy implementation; overlapping IP ranges might cause unexpected networking behavior. For example, a client in a pod might not be able to reach an external endpoint in the same IP range (blocked by iptables).
ubuntu@foo-bar:~$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc fq_codel state UP group default qlen 1000
link/ether 06:62:0d:d6:82:d4 brd ff:ff:ff:ff:ff:ff
inet 172.31.39.226/20 brd 172.31.47.255 scope global dynamic eth0
valid_lft 2651sec preferred_lft 2651sec
inet6 fe80::462:dff:fed6:82d4/64 scope link
valid_lft forever preferred_lft forever
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UNKNOWN group default
link/ether ba:0f:7c:6d:8d:56 brd ff:ff:ff:ff:ff:ff
inet 10.1.46.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::b80f:7cff:fe6d:8d56/64 scope link
valid_lft forever preferred_lft forever
4: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UP group default qlen 1000
link/ether ee:98:e2:60:4e:18 brd ff:ff:ff:ff:ff:ff
inet 10.1.46.1/24 scope global cni0
valid_lft forever preferred_lft forever
inet6 fe80::ec98:e2ff:fe60:4e18/64 scope link
valid_lft forever preferred_lft forever
MicroK8s uses flannel as the default CNI; it creates two interfaces, flannel.1 and cni0. In the example, the eth0 IP range (172.31.0.0) does not overlap with the CNI's IP range (10.1.0.0).
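To check for an overlap without eyeballing the addresses, a small shell sketch can compare two CIDR blocks numerically. ip2int and cidr_overlap are hypothetical helper names, and the check assumes each network address is aligned to its prefix:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip2int() {
  oldIFS=$IFS; IFS=.; set -- $1; IFS=$oldIFS
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit 0) when the two CIDR ranges overlap,
# e.g. cidr_overlap 10.1.0.0/16 172.31.0.0/20
cidr_overlap() {
  start1=$(ip2int "${1%/*}"); end1=$(( start1 + (1 << (32 - ${1#*/})) - 1 ))
  start2=$(ip2int "${2%/*}"); end2=$(( start2 + (1 << (32 - ${2#*/})) - 1 ))
  [ "$start1" -le "$end2" ] && [ "$start2" -le "$end1" ]
}

if cidr_overlap "10.1.0.0/16" "172.31.0.0/20"; then
  echo "ranges overlap - change the flannel network"
else
  echo "ranges do not overlap"
fi
```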
Resolution
If the IP ranges overlap, fix it by updating the CNI's IP range configuration in /var/snap/microk8s/current/args/flannel-network-mgr-config:
{"Network": "10.1.0.0/16", "Backend": {"Type": "vxlan"}}
For example, we might change it to 10.3.0.0/16 and restart MicroK8s:
{"Network": "10.3.0.0/16", "Backend": {"Type": "vxlan"}}
We have to delete cni0 so that it is re-created with the new configuration (flannel.1 is updated automatically):
sudo ip link delete cni0
You should then find all pods in the new IP range:
ubuntu@foo-bar:~$ k get pod -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default foobar-6bfcbb6974-c2gt7 1/1 Running 1 29m 10.3.49.116 foo-bar <none> <none>
ingress-nginx nginx-ingress-controller-676d5ccd4c-gmm5f 1/1 Running 3 32m 172.31.39.226 foo-bar <none> <none>
ingress-nginx nginx-ingress-default-backend-5b967cf596-crhdp 1/1 Running 2 32m 10.3.49.117 foo-bar <none> <none>
kube-system coredns-9b8997588-wm2n7 1/1 Running 4 34m 10.3.49.113 foo-bar <none> <none>
kube-system hostpath-provisioner-7b9cb5cdb4-d9hlm 1/1 Running 3 34m 10.3.49.115 foo-bar <none> <none>
kube-system tiller-deploy-969865475-h6gtb 1/1 Running 2 32m 10.3.49.114 foo-bar <none> <none>
metacontroller metacontroller-0 1/1 Running 2 31m 10.3.49.112 foo-bar <none> <none>
DNS configuration might be reset to the default values when enabling/disabling microk8s addons
If you update the coredns ConfigMap, please keep a backup so you can restore it after enabling or disabling any microk8s addon.
Symptom
Applications are unable to resolve some domain names that are registered in your internal DNS.
ubuntu@foo-bar:~$ kubectl -n kube-system get cm coredns -o yaml
Here is the default configuration; we can customize CoreDNS by editing it:
apiVersion: v1
data:
Corefile: |
.:53 {
errors
health
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
forward . 8.8.8.8 8.8.4.4
cache 30
loop
reload
loadbalance
}
kind: ConfigMap
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","data":{"Corefile":".:53 {\n errors\n health\n ready\n kubernetes cluster.local in-addr.arpa ip6.arpa {\n pods insecure\n fallthrough in-addr.arpa ip6.arpa\n }\n prometheus :9153\n forward . 8.8.8.8 8.8.4.4\n cache 30\n loop\n reload\n loadbalance\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"EnsureExists","k8s-app":"kube-dns"},"name":"coredns","namespace":"kube-system"}}
creationTimestamp: "2020-07-13T09:44:53Z"
labels:
addonmanager.kubernetes.io/mode: EnsureExists
k8s-app: kube-dns
name: coredns
namespace: kube-system
resourceVersion: "7079"
selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
uid: 7b0c948b-083a-49a0-a18e-49b173d78c5a
Try editing forward like this:
forward . 8.8.8.8
Then enable an addon, for example:
microk8s.enable istio
microk8s.enable gpu
The coredns ConfigMap will be reset to the default values.
Resolution
There is no good way to prevent this. Please remember to back up your settings and re-apply them after enabling or disabling any addon.
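One pragmatic workaround is to snapshot the ConfigMap before touching addons and re-apply it afterwards. A hedged sketch, assuming a live cluster; the backup filename is arbitrary:

```shell
# Back up the coredns ConfigMap before enabling/disabling an addon.
kubectl -n kube-system get cm coredns -o yaml > coredns-backup.yaml

# ... enable or disable addons here, e.g. microk8s.enable gpu ...

# Restore the customized settings and restart coredns to pick them up.
kubectl apply -f coredns-backup.yaml
kubectl rollout -n kube-system restart deploy coredns
```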