CRDs
In PrimeHub data model, it mentions when a/an instance type, image, and dataset is created via Admin UI, under the hood, there are a CRD object created in Kubernetes and a Realm Role created in Keycloak. This document describes what CRDs are created for and the context of them.
CRD, CustomResourceDefinition, PrimeHub uses the custom resource mechanism to manage structured data (custom objects) stored in Kubernetes. There are three of them, Instance Type, Image and Dataset.
For more detail of CRD, please refer to Extend the Kubernetes API with CustomResourceDefinitions.
Instance Type
Basic
An Instance Type object contains these following settings/configurations.
You can use the following commands to view the stored data.
# List instance type objects.
kubectl -n hub get instancetype
# View the stored data of an instance type object.
kubectl -n hub get instancetype/object_name -o yaml --export
The structured data (including Node Selector and Tolerations if any) of an instance type object displays in yaml format as below.
apiVersion: primehub.io/v1alpha1
kind: InstanceType
metadata:
  generation: 1
  name: p100
  selfLink: /apis/primehub.io/v1alpha1/namespaces/hub/instancetypes/p100
spec:
  description: p100
  displayName: p100
  limits.cpu: 4
  limits.memory: 26G
  limits.nvidia.com/gpu: 1
  nodeSelector:
    memory: low
  requests.cpu: 4
  requests.memory: 26G
  tolerations:
  - effect: NoSchedule
    key: dummy
    operator: Equal
    value: cpu1core
Spec:
- displayNameDisplay Name
- descriptionDescription
- limits.cpuCPU Limit
- limits.memoryMemory Limit
- limits.nvidia.com/gpuGPU Limt
- requests.cpuCPU Request
- requests.memoryMemory Request
Toleration
When a node is marked with a Taint, it cannot accept any pods which don't tolerate the taints. Tolerations are applied to pods so that pods are allowed to schedule onto nodes with matching taints. Please refers to Taints and Tolerations for more detail.
On Admin UI, toleration which specifies a tolerable taint(key-value pair) with an effect to take.
We add toleration to tolerate a specific taint with an effect to take, the data is stored as below.
tolerations:
  - effect: NoSchedule
    key: dummy
    operator: Equal
    value: cpu1core
    - ...
Tolerations settings:
- effect:- NoSchedule,- NoExecute,- PreferNoScheduleand- None.
- key: The key of a taint.
- operator:- tolerations[].operator,- Existsand- Equal.
- value: The value of a taint is required when- Operatoris- Equal.
NodeSelector
Pods can be constrained to only be able to/prefer to run on particular nodes that are labeled matching key-value pairs. We add a nodeSelector with memory/low via Admin UI, the data is stored as below.
nodeSelector:
    memory: low
        cpucore: one
Node Selector settings:
- key: The key of a label.
- value: The value of a label.
Image
Basic
An Image object contains these following settings/configurations.
You can use the following commands to view the stored data.
# List image objects.
kubectl -n hub get image
# View the stored data of an image object.
kubectl -n hub get image/object_name -o yaml --export
The structured data (including Pull Secret if any) of an image object displays in yaml format as below.
apiVersion: primehub.io/v1alpha1
kind: Image
metadata:
  generation: 1
  name: name-of-image
  selfLink: /apis/primehub.io/v1alpha1/namespaces/hub/images/name-of-image
spec:
  description: ""
  displayName: name-of-image
  pullSecret: pull-secret-xxx
  url: registry.gitlab.com/infuseai/docker-stacks/scipy-notebook:073d6073
Spec:
- displayName: The display name of a image on UI.
- description: Description.
- url: The registry url where an image is located.
- pullSecret: The name of a- Secret, this is a Secret we add via Admin UI. If required, the secret is used to pull the image.
Dataset
Basic
A Dataset object contains these following setting/configuration.
You can use the following commands to view the stored data.
# List image objects.
kubectl -n hub get dataset
# View the stored data of an image object.
kubectl -n hub get dataset/object_name -o yaml --export
The structured data (including launchGroupOnly if true) of a type pv dataset object displays in yaml format as below.
apiVersion: primehub.io/v1alpha1
kind: Dataset
metadata:
  annotations:
        dataset.primehub.io/homeSymlink: "false"
    dataset.primehub.io/launchGroupOnly: "false"
    dataset.primehub.io/mountRoot: /datasets
  generation: 1
  name: data-rw-test
  selfLink: /apis/primehub.io/v1alpha1/namespaces/hub/datasets/data-rw-test
spec:
  description: data-rw-test
  displayName: data-rw-test
  type: pv
  url: ""
  variables: {}
  volumeName: data-rw-test
Currently, there are types pv, git and env of datasets. All of types has following data fields in common, also, each type has its own data fields.  In following sections, they are described respectively.
Annotations:
- dataset.primehub.io/mountRoot: A path of mount root.
- dataset.primehub.io/launchGroupOnly: It can only be selected in a launch Group if- true.
- dataset.primehub.io/homeSymlink(hidden from UI): A flag of making a symlink in home directory of users if- true.
Spec:
- displayName: The display name on UI.
- description: The description.
- type:- pv,- gitand- env.
Type pv
PV, persistent volume, dataset has a data field,volumeName. The container mount point of the dataset is varied with the combination of volumeName, and both annotations of mountRoot and homeSymlink.
Pv without hostpath
annotations:
    dataset.primehub.io/mountRoot: /datasets
spec:
        volumeName: test
Container Mount Point /datasets/test
Pv with hostpath
annotations:
    dataset.primehub.io/mountRoot: /datasets
spec:
        volumeName: hostpath:/home/datasets/test
Container Mount Point /datasets/test
Pv with homeSymlink
annotations:
    dataset.primehub.io/mountRoot: /datasets
    dataset.primehub.io/homeSymlink: "true"
spec:
        volumeName: hostpath:/home/datasets/test
Container Mount Point /datasets/test
Symlinks ln -s /dataset/test ~/test
Pv with mountRoot
annotations:
    dataset.primehub.io/mountRoot: /foo/bar
spec:
        volumeName: hostpath:/home/datasets/test
Container Mount Point /foo/bar/test
Pv with mountRoot and homeSymlink specified
annotations:
    dataset.primehub.io/mountRoot: /foo/bar
    dataset.primehub.io/homeSymlink: "true"
spec:
         volumeName: hostpath:/home/datasets/test
Container Mount Point /foo/bar/test
Symlinks ln -s /foo/bar/test ~/test
Type git
Git dataset has a data field, Url, which points to a git repo and a data field, Secret, which is added via Admin UI if a secret is required to pull the dataset from repo.
The container mount point of the dataset is varied with the combination of both annotations of mountRoot and homeSymlink.
annotations:
  dataset.primehub.io/primehub-gitsync: "true"
  dataset.primehub.io/gitSyncHostRoot: /home/dataset
  dataset.primehub.io/gitSyncRoot: /gitsync
spec:
  type: git
  url: repo: xxx/myrepo
Annotations:
- dataset.primehub.io/primehub-gitsync:- trueby default.
- dataset.primehub.io/gitSyncHostRoot: (Hidden from UI) The host path to put the gitsync result.- /home/datasetby default.
- dataset.primehub.io/gitSyncRoot: (Hidden from UI) The path to mount the gitsync dataset.- /gitsyncby default.
Spec:
- url: The url of a repo.
Gitsync dataset with secret
spec:
    gitsync:
      secret:  image-pull
    type:      git
    url:       repo: xxx/myrepo
Spec:
- gitsync.secret: A secret is used for pulling a dataset from repo.
Container Mount Point /gitsync/myrepo.
Symlinks ln -s /gitsync/myrepo/myrepo /dataset/myrepo.
Gitsync dataset with homeSymlink
Annotations:  dataset.primehub.io/homeSymlink: true
Spec:
    Type:      git
    URL:       repo: xxx/myrepo
Container Mount Point /gitsync/myrepo.
Symlinks
- ln -s /gitsync/myrepo/myrepo /dataset/myrepo.
- ln -s /dataset/myrepo ~/myrepo.
Gitsync dataset with mountRoot
annotations:  
    dataset.primehub.io/mountRoot: /foo/bar
spec:
    type:      git
    url:       repo: xxx/myrepo
Container Mount Point /gitsync/myrepo.
Symlinks ln -s /gitsync/myrepo/myrepo /foo/bar/myrepo.
Gitsync dataset with homeSymlink and mountRoot
annotations: 
    dataset.primehub.io/homeSymlink: true
    dataset.primehub.io/mountRoot: /foo/bar
spec:
    type:      git
    url:       repo: xxx/myrepo
Container Mount Point /gitsync/myrepo.
Symlinks
- ln -s /gitsync/myrepo/myrepo /foo/bar/myrepo.
- ln -s /foo/bar/myrepo ~/myrepo.
Basic env
Taking environment variables as datasets.
spec:
  variables:
    MYSQLDB: sql_mine
Spec:
- variables: variables in- key/valuepair.
