Install PrimeHub Community on MicroK8S Single Node (Ubuntu)
This document guides you through installing MicroK8s on a single node and PrimeHub Community with an easy script.
Provision a cluster
MicroK8s supports multiple platforms; we demonstrate the installation with the following spec:
- Ubuntu 18.04 LTS
- Kubernetes version 1.19
- IP address: EXTERNAL-IP
- Networking: allow port 80 for HTTP
Requirements
Git
Please follow the OS-specific instructions to install the git command.
cURL
cURL is a command-line tool that lets us make HTTP requests from the shell. To install cURL, please follow the OS-specific method. For example:
Ubuntu
sudo apt update
sudo apt install curl
RHEL/CentOS
yum install curl
Clone PrimeHub Repository
git clone https://github.com/InfuseAI/primehub.git
Install PrimeHub required binaries
./primehub/install/primehub-install required-bin
This will install the required commands into ~/bin. You should append ~/bin to your PATH variable, or use the following commands to append it and reload .bashrc:
echo "export PATH=$HOME/bin:$PATH" >> ~/.bashrc
source ~/.bashrc
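To confirm the PATH change took effect, a quick check like the following can help (a minimal sketch; it only inspects the PATH variable):

```shell
# Check whether ~/bin is on PATH after reloading .bashrc.
case ":$PATH:" in
  *":$HOME/bin:"*) echo "~/bin is on PATH" ;;
  *)               echo "~/bin is NOT on PATH - re-check your .bashrc" >&2 ;;
esac
```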
Install MicroK8s Single Node
We provide an install script that makes it much easier to create a single-node MicroK8s Kubernetes cluster.
Run the create singlenode command:
./primehub/install/primehub-install create singlenode --k8s-version 1.19
After the first execution, you will see the following message because the script adds the current user to the microk8s group, which requires a re-login:
[Require Action] Please relogin this session and run create singlenode again
After relogin, run the same command again to finish the single-node provision:
./primehub/install/primehub-install create singlenode --k8s-version 1.19
During the installation you might run into trouble or need to modify the default settings; if so, please check the Troubleshooting section.
Quick Verification
Access nginx-ingress with your EXTERNAL-IP:
curl http://${EXTERNAL-IP}
The output will be 404 because no Ingress resources are defined yet:
default backend - 404
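If you prefer to check just the status code, curl can print it directly. EXTERNAL_IP below is a placeholder for your node's public IP; the `|| true` keeps the script going if the endpoint is unreachable:

```shell
# Print only the HTTP status code returned by nginx-ingress.
# Before any Ingress resources exist, the expected answer is 404.
EXTERNAL_IP=${EXTERNAL_IP:-127.0.0.1}   # placeholder: set this to your node's public IP
status=$(curl -s -o /dev/null -w '%{http_code}' "http://${EXTERNAL_IP}" || true)
echo "nginx-ingress answered with HTTP ${status}"
```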
Configurations
Configure GPU (optional)
Download and install the Nvidia GPU drivers from the official website.
Enable GPU feature
Please be aware that on MicroK8s v1.21 you need to modify the default_runtime_name manually and must NOT enable the GPU feature with microk8s enable gpu.
If MicroK8s is v1.20 or earlier (<= 1.20), enable the GPU feature with the default addon:
microk8s.enable gpu
If MicroK8s is v1.21, please follow these steps:
Modify the file /var/snap/microk8s/current/args/containerd-template.toml and manually change default_runtime_name to nvidia-container-runtime:
# default_runtime_name is the default runtime name to use.
- default_runtime_name = "${RUNTIME}"
+ default_runtime_name = "nvidia-container-runtime"
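If you prefer scripting this edit, the substitution can be sketched with sed. The block below works on a local stand-in file (a single line, not the full template) so you can verify the change first; copy the result back to /var/snap/microk8s/current/args/ with sudo once you are satisfied:

```shell
# Create a stand-in for the relevant line of the real containerd-template.toml.
printf '%s\n' '  default_runtime_name = "${RUNTIME}"' > containerd-template.toml

# Replace the value of default_runtime_name in place.
sed -i 's/^\( *default_runtime_name = \).*/\1"nvidia-container-runtime"/' containerd-template.toml

cat containerd-template.toml
```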
Restart MicroK8s:
microk8s stop
microk8s start
Install the Nvidia Device Plugin with helm:
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm install -n kube-system nvidia-device-plugin nvdp/nvidia-device-plugin
Verify GPU
kubectl describe node | grep 'nvidia.com/gpu'
Verify within MicroK8s cluster
deviceplugin_pod=$(kubectl -n kube-system get pod | grep nvidia-device-plugin | awk '{print $1}')
kubectl -n kube-system exec -t ${deviceplugin_pod} -- nvidia-smi
Ref: https://github.com/NVIDIA/gpu-operator/issues/163#issuecomment-794445253
Configure snap (optional)
snap updates packages automatically. If you plan to use it in a production environment, you could:
- Make the update process run in a specific time window, or delay it until a given date
- Disable it by setting an unreachable proxy
Please see the manual from snapcraft.
Using Self-hosted DNS (Optional)
If your domain name is not hosted by a public DNS server, use a self-hosted DNS server instead.
Validate the domain name for PrimeHub with the following regular expression (you can test it on regexr.com):
# The domain name must match
[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*
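A small helper can run this validation from the shell. is_valid_domain is a hypothetical name, not part of PrimeHub:

```shell
# Return 0 when the given name matches the DNS-1123 subdomain pattern.
is_valid_domain() {
  echo "$1" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}

is_valid_domain "hub.example.com" && echo "hub.example.com is valid"
is_valid_domain "Foo_Bar" || echo "Foo_Bar is invalid"
```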
Configure K8S CoreDNS
kubectl edit cm -n kube-system coredns
Please modify the following line and fill in your own DNS server:
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
Corefile: |
.:53 {
errors
health
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
forward . <Fill your own DNS server>
cache 30
loop
reload
loadbalance
}
...
After changing the coredns ConfigMap, use the following command to restart coredns and apply the new configuration:
kubectl rollout -n kube-system restart deploy coredns
Reference
https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/
Install PrimeHub
Prepare two terminals: one to execute the primehub install script, the other to monitor the install progress by watching the pod status.
Terminal one
Install with primehub-install create primehub and specify the version, e.g. v3.6.2. Please check the latest stable version.
Check available stable versions
./primehub/install/primehub-install
Install the latest stable version by default
./primehub/install/primehub-install create primehub --primehub-ce
Or install a specific version, such as v3.7.0, as below:
./primehub/install/primehub-install create primehub --primehub-version <version> --primehub-ce
Enter the PRIMEHUB_DOMAIN, KC_PASSWORD, and PH_PASSWORD at the command prompt.
The install script starts with a preflight check, config initialization, and so on.
[Preflight Check]
[Preflight Check] Pass
[Verify] Mininal k8s resources
...
[Install] PrimeHub
[Check] primehub.yaml
[Generate] primehub.yaml
[Install] PrimeHub
...
[Progress] wait for bootstrap job ready
...
Terminal two
Open another terminal to run the command to watch the progress.
watch 'kubectl -n hub get pods'
Or, once primehub-bootstrap is running, check the progress of bootstrapping:
kubectl logs -n hub $(kubectl get pod -n hub | grep primehub-bootstrap | cut -d' ' -f1) -f
Eventually most pods show Running STATUS, except the primehub-bootstrap-xxx pod, which shows Completed STATUS; the READY indicator should be N/N.
Example watch console for the completed installation:
NAME READY STATUS RESTARTS AGE
hub-758bd48876-wwwww 1/1 Running 0 17m
keycloak-0 1/1 Running 0 17m
keycloak-postgres-0 1/1 Running 0 17m
metacontroller-0 1/1 Running 0 17m
primehub-admission-xxxxxxxxxx-yyyyy 1/1 Running 0 17m
primehub-bootstrap-xxxxx 0/1 Completed 0 17m
primehub-console-xxxxxxxxxx-yyyyy 1/1 Running 0 17m
primehub-controller-xxxxxxxxxx-yyyyy 2/2 Running 0 17m
primehub-graphql-xxxxxxxxx-yyyyy 1/1 Running 0 17m
primehub-metacontroller-webhook-xxxxxxxxxx-yyyyy 1/1 Running 0 17m
primehub-watcher-xxxxxxxxxx-yyyyy 1/1 Running 0 17m
proxy-6bdd94cc-yyyyy 1/1 Running 0 17m
Then go back to Terminal one and wait until you see these messages:
[Completed] Install PrimeHub
PrimeHub: http://`$PRIMEHUB_DOMAIN` ( phadmin / `$PH_PASSWORD` )
Id Server: http://`$PRIMEHUB_DOMAIN`/auth/admin/ ( keycloak / `$KC_PASSWORD` )
[Completed]
Enable PrimeHub Store
After a fresh installation, you need to enable the PrimeHub Store.
Set the flag by editing the env file:
~/primehub/install/primehub-install env edit
Add the PRIMEHUB_FEATURE_STORE flag to the last line of .env:
PRIMEHUB_FEATURE_STORE=true
Update the configuration with the primehub-install command:
~/primehub/install/primehub-install upgrade primehub
New to PrimeHub
Initially, PrimeHub has a built-in user phadmin, a built-in group phusers, and several instance types/images which are set Global. phadmin can launch a notebook quickly by using these resources.
Now that PrimeHub CE is ready, see Launch Notebook to launch your very first Jupyter notebook on PrimeHub. Also see the User Guide for fundamental knowledge of PrimeHub.
Troubleshooting
Generate a log file for diagnosis.
./primehub/install/primehub-install diagnose
You may run into trouble during the installation. We list some common issues below; hopefully you will find a resolution here.
Using valid hostname and domain
Validate the hostname of the node with the following regular expression (you can test it on regexr.com):
[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*
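The check can be applied to the node's hostname directly, with a suggested safe replacement; the tr mapping below (underscores to hyphens, uppercase to lowercase) is just one possible fix:

```shell
# Validate the current hostname against the DNS-1123 rule and,
# if it fails, suggest a sanitized alternative.
name=$(hostname)
pattern='^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
if echo "$name" | grep -Eq "$pattern"; then
  echo "hostname '$name' is valid"
else
  echo "hostname '$name' is invalid; consider: $(echo "$name" | tr 'A-Z_' 'a-z-')"
fi
```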
Symptoms
When the hostname is invalid, the installation might stall at the microk8s status phase because the cluster is not running:
ubuntu@foo_bar:~$ ./primehub-install create singlenode foo-bar:5000 --k8s-version 1.19
[Search] Folder primehub-v2.6.2
[Not Found] Folder primehub-v2.6.2
[Search] tarball primehub-v2.6.2.tar.gz
[Not Found] tarball primehub-v2.6.2.tar.gz
[Search] primehub helm chart with version: v2.6.2
[Not Found] primehub v2.6.2 in infuseai helm chart
[Skip] Don't need PrimeHub release package when target != primehub
[check] microk8s status
Resolution
You would get microk8s is not running from the status result:
ubuntu@foo_bar:~$ microk8s.status
microk8s is not running. Use microk8s.inspect for a deeper inspection.
If you run the inspect command, it shows everything running:
ubuntu@foo_bar:~$ microk8s.inspect
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-flanneld is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver is running
Service snap.microk8s.daemon-apiserver-kicker is running
Service snap.microk8s.daemon-proxy is running
Service snap.microk8s.daemon-kubelet is running
Service snap.microk8s.daemon-scheduler is running
Service snap.microk8s.daemon-controller-manager is running
Service snap.microk8s.daemon-etcd is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Building the report tarball
Report tarball is at /var/snap/microk8s/1489/inspection-report-20200713_093426.tar.gz
You could find the root cause in the kubelet's inspection-report logs:
ubuntu@foo_bar:~/inspection-report/snap.microk8s.daemon-kubelet$ cat systemctl.log
● snap.microk8s.daemon-kubelet.service - Service for snap application microk8s.daemon-kubelet
Loaded: loaded (/etc/systemd/system/snap.microk8s.daemon-kubelet.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2020-07-13 09:34:10 UTC; 14s ago
Main PID: 10180 (kubelet)
Tasks: 12 (limit: 2329)
CGroup: /system.slice/snap.microk8s.daemon-kubelet.service
└─10180 /snap/microk8s/1489/kubelet --kubeconfig=/var/snap/microk8s/1489/credentials/kubelet.config --cert-dir=/var/snap/microk8s/1489/certs --client-ca-file=/var/snap/microk8s/1489/certs/ca.crt --anonymous-auth=false --network-plugin=cni --root-dir=/var/snap/microk8s/common/var/lib/kubelet --fail-swap-on=false --cni-conf-dir=/var/snap/microk8s/1489/args/cni-network/ --cni-bin-dir=/snap/microk8s/1489/opt/cni/bin/ --feature-gates=DevicePlugins=true --eviction-hard=memory.available<100Mi,nodefs.available<1Gi,imagefs.available<1Gi --container-runtime=remote --container-runtime-endpoint=/var/snap/microk8s/common/run/containerd.sock --node-labels=microk8s.io/cluster=true
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.139863 10180 kubelet.go:2263] node "foo_bar" not found
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.240093 10180 kubelet.go:2263] node "foo_bar" not found
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.340308 10180 kubelet.go:2263] node "foo_bar" not found
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.440529 10180 kubelet.go:2263] node "foo_bar" not found
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.540736 10180 kubelet.go:2263] node "foo_bar" not found
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.640915 10180 kubelet.go:2263] node "foo_bar" not found
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: I0713 09:34:24.713484 10180 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: I0713 09:34:24.714871 10180 kubelet_node_status.go:70] Attempting to register node foo_bar
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.721575 10180 kubelet_node_status.go:92] Unable to register node "foo_bar" with API server: Node "foo_bar" is invalid: metadata.name: Invalid value: "foo_bar": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.741101 10180 kubelet.go:2263] node "foo_bar" not found
Jul 13 09:34:24 foo_bar microk8s.daemon-kubelet[10180]: E0713 09:34:24.721575 10180 kubelet_node_status.go:92] Unable to register node "foo_bar" with API server: Node "foo_bar" is invalid: metadata.name: Invalid value: "foo_bar": a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is 'a-z0-9?(.a-z0-9?)*')
Please fix the hostname or domain name and reinstall MicroK8s. You can destroy the existing installation with this command:
$ ./primehub-install destroy singlenode
Ensure the CNI IP range does not overlap with your egress network
MicroK8s uses iptables as the kube-proxy implementation; overlapping IP ranges might cause unexpected networking behavior. For example, a client in a pod might not be able to reach an external endpoint in the same IP range (blocked by iptables).
ubuntu@foo-bar:~$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc fq_codel state UP group default qlen 1000
link/ether 06:62:0d:d6:82:d4 brd ff:ff:ff:ff:ff:ff
inet 172.31.39.226/20 brd 172.31.47.255 scope global dynamic eth0
valid_lft 2651sec preferred_lft 2651sec
inet6 fe80::462:dff:fed6:82d4/64 scope link
valid_lft forever preferred_lft forever
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UNKNOWN group default
link/ether ba:0f:7c:6d:8d:56 brd ff:ff:ff:ff:ff:ff
inet 10.1.46.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::b80f:7cff:fe6d:8d56/64 scope link
valid_lft forever preferred_lft forever
4: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UP group default qlen 1000
link/ether ee:98:e2:60:4e:18 brd ff:ff:ff:ff:ff:ff
inet 10.1.46.1/24 scope global cni0
valid_lft forever preferred_lft forever
inet6 fe80::ec98:e2ff:fe60:4e18/64 scope link
valid_lft forever preferred_lft forever
MicroK8s uses flannel as the default CNI; it creates two interfaces, flannel.1 and cni0. In the example, the eth0 IP range (172.31.0.0) does not overlap with the CNI's IP range (10.1.0.0).
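To check for an overlap without eyeballing the addresses, a small shell sketch can compare two CIDR blocks numerically. ip2int and cidr_overlap are hypothetical helper names, and the check assumes each network address is aligned to its prefix:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip2int() {
  oldIFS=$IFS; IFS=.; set -- $1; IFS=$oldIFS
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit 0) when the two CIDR ranges overlap,
# e.g. cidr_overlap 10.1.0.0/16 172.31.0.0/20
cidr_overlap() {
  start1=$(ip2int "${1%/*}"); end1=$(( start1 + (1 << (32 - ${1#*/})) - 1 ))
  start2=$(ip2int "${2%/*}"); end2=$(( start2 + (1 << (32 - ${2#*/})) - 1 ))
  [ "$start1" -le "$end2" ] && [ "$start2" -le "$end1" ]
}

if cidr_overlap "10.1.0.0/16" "172.31.0.0/20"; then
  echo "ranges overlap - change the flannel network"
else
  echo "ranges do not overlap"
fi
```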
Resolution
If the IP ranges overlap, fix it by updating the CNI's IP range configuration in /var/snap/microk8s/current/args/flannel-network-mgr-config:
{"Network": "10.1.0.0/16", "Backend": {"Type": "vxlan"}}
For example, we might change it to 10.3.0.0/16 and restart MicroK8s:
{"Network": "10.3.0.0/16", "Backend": {"Type": "vxlan"}}
We have to delete cni0 so that it is re-created with the new configuration (flannel.1 is updated automatically):
sudo ip link delete cni0
You should then find all pods in the new IP range:
ubuntu@foo-bar:~$ k get pod -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default foobar-6bfcbb6974-c2gt7 1/1 Running 1 29m 10.3.49.116 foo-bar <none> <none>
ingress-nginx nginx-ingress-controller-676d5ccd4c-gmm5f 1/1 Running 3 32m 172.31.39.226 foo-bar <none> <none>
ingress-nginx nginx-ingress-default-backend-5b967cf596-crhdp 1/1 Running 2 32m 10.3.49.117 foo-bar <none> <none>
kube-system coredns-9b8997588-wm2n7 1/1 Running 4 34m 10.3.49.113 foo-bar <none> <none>
kube-system hostpath-provisioner-7b9cb5cdb4-d9hlm 1/1 Running 3 34m 10.3.49.115 foo-bar <none> <none>
kube-system tiller-deploy-969865475-h6gtb 1/1 Running 2 32m 10.3.49.114 foo-bar <none> <none>
metacontroller metacontroller-0 1/1 Running 2 31m 10.3.49.112 foo-bar <none> <none>
DNS configuration might be reset to the default values when enabling/disabling microk8s addons
If you update the coredns ConfigMap, please keep a backup so you can restore it after enabling or disabling any microk8s addon.
Symptom
Applications are unable to resolve some domain names that are registered in your internal DNS.
ubuntu@foo-bar:~$ kubectl -n kube-system get cm coredns -o yaml
Here is the default configuration; we can customize CoreDNS by editing it:
apiVersion: v1
data:
Corefile: |
.:53 {
errors
health
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
forward . 8.8.8.8 8.8.4.4
cache 30
loop
reload
loadbalance
}
kind: ConfigMap
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","data":{"Corefile":".:53 {\n errors\n health\n ready\n kubernetes cluster.local in-addr.arpa ip6.arpa {\n pods insecure\n fallthrough in-addr.arpa ip6.arpa\n }\n prometheus :9153\n forward . 8.8.8.8 8.8.4.4\n cache 30\n loop\n reload\n loadbalance\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"EnsureExists","k8s-app":"kube-dns"},"name":"coredns","namespace":"kube-system"}}
creationTimestamp: "2020-07-13T09:44:53Z"
labels:
addonmanager.kubernetes.io/mode: EnsureExists
k8s-app: kube-dns
name: coredns
namespace: kube-system
resourceVersion: "7079"
selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
uid: 7b0c948b-083a-49a0-a18e-49b173d78c5a
Try editing forward like this:
forward . 8.8.8.8
Then enable an addon, for example:
microk8s.enable istio
microk8s.enable gpu
The coredns ConfigMap will be reset to the default values.
Resolution
There is no good way to prevent this. Please remember to back up your settings and re-apply them after enabling or disabling any addon.
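One pragmatic workaround is to snapshot the ConfigMap before touching addons and re-apply it afterwards. A hedged sketch, assuming a live cluster; the backup filename is arbitrary:

```shell
# Back up the coredns ConfigMap before enabling/disabling an addon.
kubectl -n kube-system get cm coredns -o yaml > coredns-backup.yaml

# ... enable or disable addons here, e.g. microk8s.enable gpu ...

# Restore the customized settings and restart coredns to pick them up.
kubectl apply -f coredns-backup.yaml
kubectl rollout -n kube-system restart deploy coredns
```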