CRDs · PrimeHub

In PrimeHub data model, it mentions when a/an instance type, image, and dataset is created via Admin UI, under the hood, there are a CRD object created in Kubernetes and a Realm Role created in Keycloak. This document describes what CRDs are created for and the context of them.

CRD, CustomResourceDefinition, PrimeHub uses the custom resource mechanism to manage structured data (custom objects) stored in Kubernetes. There are three of them, Instance Type, Image and Dataset.

For more detail of CRD, please refer to Extend the Kubernetes API with CustomResourceDefinitions.

Instance Type

Basic

An Instance Type object contains these following settings/configurations.

You can use the following commands to view the stored data.

# List instance type objects.
kubectl -n hub get instancetype

# View the stored data of an instance type object.
kubectl -n hub get instancetype/object_name -o yaml --export

The structured data (including Node Selector and Tolerations if any) of an instance type object displays in yaml format as below.

apiVersion: primehub.io/v1alpha1
kind: InstanceType
metadata:
  generation: 1
  name: p100
  selfLink: /apis/primehub.io/v1alpha1/namespaces/hub/instancetypes/p100
spec:
  description: p100
  displayName: p100
  limits.cpu: 4
  limits.memory: 26G
  limits.nvidia.com/gpu: 1
  nodeSelector:
    memory: low
  requests.cpu: 4
  requests.memory: 26G
  tolerations:
  - effect: NoSchedule
    key: dummy
    operator: Equal
    value: cpu1core

Spec:

displayName Display Name
description Description
limits.cpu CPU Limit
limits.memory Memory Limit
limits.nvidia.com/gpu GPU Limit
requests.cpu CPU Request
requests.memory Memory Request

Toleration

When a node is marked with a Taint, it cannot accept any pods which don't tolerate the taints. Toleration are applied to pods so that pods are allowed to schedule onto nodes with matching taints. Please refers to Taints and Toleration for more detail.

On Admin UI, toleration which specifies a tolerable taint(key-value pair) with an effect to take.

We add toleration to tolerate a specific taint with an effect to take, the data is stored as below.

tolerations:
  - effect: NoSchedule
    key: dummy
    operator: Equal
    value: cpu1core
    - ...

Toleration settings:

effect: NoSchedule, NoExecute, PreferNoSchedule and None.
key: The key of a taint.
operator: tolerations[].operator, Exists and Equal.
value: The value of a taint is required when Operator is Equal.

NodeSelector

Pods can be constrained to only be able to/prefer to run on particular nodes that are labeled matching key-value pairs. We add a nodeSelector with memory/low via Admin UI, the data is stored as below.

nodeSelector:
    memory: low
        cpucore: one

Node Selector settings:

key: The key of a label.
value: The value of a label.

Image

Basic

An Image object contains these following settings/configurations.

You can use the following commands to view the stored data.

# List image objects.
kubectl -n hub get image

# View the stored data of an image object.
kubectl -n hub get image/object_name -o yaml --export

The structured data (including Pull Secret if any) of an image object displays in yaml format as below.

apiVersion: primehub.io/v1alpha1
kind: Image
metadata:
  generation: 1
  name: name-of-image
  selfLink: /apis/primehub.io/v1alpha1/namespaces/hub/images/name-of-image
spec:
  description: ""
  displayName: name-of-image
  pullSecret: pull-secret-xxx
  url: registry.gitlab.com/infuseai/docker-stacks/scipy-notebook:073d6073

Spec:

displayName: The display name of a image on UI.
description: Description.
url: The registry url where an image is located.
pullSecret: The name of a Secret, this is a Secret we add via Admin UI. If required, the secret is used to pull the image.

Dataset

Basic

A Dataset object contains these following setting/configuration.

You can use the following commands to view the stored data.

# List image objects.
kubectl -n hub get dataset

# View the stored data of an image object.
kubectl -n hub get dataset/object_name -o yaml --export

The structured data (including launchGroupOnly if true) of a type pv dataset object displays in yaml format as below.

apiVersion: primehub.io/v1alpha1
kind: Dataset
metadata:
  annotations:
        dataset.primehub.io/homeSymlink: "false"
    dataset.primehub.io/launchGroupOnly: "false"
    dataset.primehub.io/mountRoot: /datasets
  generation: 1
  name: data-rw-test
  selfLink: /apis/primehub.io/v1alpha1/namespaces/hub/datasets/data-rw-test
spec:
  description: data-rw-test
  displayName: data-rw-test
  type: pv
  url: ""
  variables: {}
  volumeName: data-rw-test

Currently, there are types pv, git, nfs, hostPath and env of datasets. All of types has following data fields in common, also, each type has its own data fields. In following sections, they are described respectively.

Annotations:

dataset.primehub.io/mountRoot: A path of mount root.
dataset.primehub.io/launchGroupOnly: It can only be selected in a launch Group if true.
dataset.primehub.io/homeSymlink(hidden from UI): A flag of making a symlink in home directory of users if true.

Spec:

displayName: The display name on UI.
description: The description.
type: pv, git, nfs, hostPath and env.

Type pv

PV, persistent volume, dataset has a data field,volumeName. The container mount point of the dataset is varied with the combination of volumeName, and both annotations of mountRoot and homeSymlink. PV is auto provisioning by default. There is an option so that the administrators can set the underlying settings manually.

Pv with auto provisioning

annotations:
  dataset.primehub.io/mountRoot: /datasets

spec:
  volumeName: test

Container Mount Point /datasets/test

Pv with manual provisioning

annotations:
  dataset.primehub.io/mountRoot: /datasets

spec:
  volumeName: test
  pv:
    provisioning: manual

Container Mount Point /datasets/test

Pv with homeSymlink

annotations:
  dataset.primehub.io/mountRoot: /datasets
  dataset.primehub.io/homeSymlink: "true"

spec:
  volumeName: test

Container Mount Point /datasets/test

Symlinks ln -s /dataset/test ~/test

Pv with mountRoot

annotations:
  dataset.primehub.io/mountRoot: /foo/bar

spec:
  volumeName: test

Container Mount Point /foo/bar/test

Pv with mountRoot and homeSymlink specified

annotations:
  dataset.primehub.io/mountRoot: /foo/bar
  dataset.primehub.io/homeSymlink: "true"

spec:
  volumeName: test

Container Mount Point /foo/bar/test

Symlinks ln -s /foo/bar/test ~/test

Type git

Git dataset has a data field, Url, which points to a git repo and a data field, Secret, which is added via Admin UI if a secret is required to pull the dataset from repo.

The container mount point of the dataset is varied with the combination of both annotations of mountRoot and homeSymlink.

annotations:
  dataset.primehub.io/primehub-gitsync: "true"
  dataset.primehub.io/gitSyncHostRoot: /home/dataset
  dataset.primehub.io/gitSyncRoot: /gitsync

spec:
  type: git
  url: repo: xxx/myrepo

Annotations:

dataset.primehub.io/primehub-gitsync: true by default.
dataset.primehub.io/gitSyncHostRoot: (Hidden from UI) The host path to put the gitsync result. /home/dataset by default.
dataset.primehub.io/gitSyncRoot: (Hidden from UI) The path to mount the gitsync dataset. /gitsync by default.

Spec:

url: The url of a repo.

Gitsync dataset with secret

spec:
    gitsync:
      secret:  image-pull
    type:      git
    url:       repo: xxx/myrepo

Spec:

gitsync.secret: A secret is used for pulling a dataset from repo.

Container Mount Point /gitsync/myrepo.

Symlinks ln -s /gitsync/myrepo/myrepo /dataset/myrepo.

Gitsync dataset with homeSymlink

Annotations:  dataset.primehub.io/homeSymlink: true

Spec:
    Type:      git
    URL:       repo: xxx/myrepo

Container Mount Point /gitsync/myrepo.

Symlinks

ln -s /gitsync/myrepo/myrepo /dataset/myrepo.
ln -s /dataset/myrepo ~/myrepo.

Gitsync dataset with mountRoot

annotations:  
    dataset.primehub.io/mountRoot: /foo/bar

spec:
    type:      git
    url:       repo: xxx/myrepo

Container Mount Point /gitsync/myrepo.

Symlinks ln -s /gitsync/myrepo/myrepo /foo/bar/myrepo.

Gitsync dataset with homeSymlink and mountRoot

annotations: 
    dataset.primehub.io/homeSymlink: true
    dataset.primehub.io/mountRoot: /foo/bar

spec:
    type:      git
    url:       repo: xxx/myrepo

Container Mount Point /gitsync/myrepo.

Symlinks

ln -s /gitsync/myrepo/myrepo /foo/bar/myrepo.
ln -s /foo/bar/myrepo ~/myrepo.

Type nfs

Nfs dataset has additional data fields server and path which set the nfs ip/domain and the nfs path. The mount point logic is the same as pv dataset.

Nfs dataset example

annotations:
  dataset.primehub.io/mountRoot: /datasets
  dataset.primehub.io/homeSymlink: "true"

spec:
  volumeName: test
  nfs:
    server: 192.168.0.10
    path: /

Type hostPath

HostPath dataset has an additional data field path which set the path in host. The mount point logic is the same as pv dataset.

HostPath dataset example

annotations:
  dataset.primehub.io/mountRoot: /datasets
  dataset.primehub.io/homeSymlink: "true"

spec:
  volumeName: test
  hostPath:
    path: /tmp

Basic env

Taking environment variables as datasets.

spec:
  variables:
    MYSQLDB: sql_mine

Spec:

variables: variables in key/value pair.