When spawning a user's notebook pod, the spawner mounts several types of persistent volumes:
- User volume
- Group volume (Project volume)
- Datasets volume
The user volume stores a user's own data. When a user spawns a Jupyter server for the first time, the user's PVC is created.
The capacity is determined in the following order:
- Use the capacity specified for the user.
- If this setting is not defined, use the maximum userVolumeCapacity of all the groups the user belongs to.
- If none of the groups sets userVolumeCapacity, use the system's default.
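The fallback order above can be sketched as a small function. This is only an illustration, not the spawner's actual code: the function name, the parameter names, and the assumption that all capacities are plain integers in the same unit are hypothetical.

```python
from typing import Iterable, Optional


def resolve_user_volume_capacity(
    user_capacity: Optional[int],
    group_capacities: Iterable[Optional[int]],
    system_default: int,
) -> int:
    """Pick the user volume capacity using the documented fallback order.

    All arguments are hypothetical stand-ins; capacities are assumed to be
    integers expressed in the same unit.
    """
    # 1. A capacity specified for the user always wins.
    if user_capacity is not None:
        return user_capacity

    # 2. Otherwise take the largest userVolumeCapacity among the user's groups,
    #    ignoring groups that do not set one.
    defined = [c for c in group_capacities if c is not None]
    if defined:
        return max(defined)

    # 3. Otherwise fall back to the system-level value.
    return system_default
```

For example, a user with no capacity of their own who belongs to groups with capacities 10 and 30 would get 30 under this sketch.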
The storage class is defined in the helm value.
Group volume (Project volume)
A group volume (project volume) stores the data shared by a group. To enable the group volume, the administrator should enable Shared Volume in the group settings. The volume is created the first time the group volume is mounted, if it does not already exist.
The storage class is defined in the helm value:

    primehub:
      sharedVolumeStorageClass: "<rwx-storage-class>"
In most cases, however, we use an NFS PVC provisioned by primehub-groupvolume-nfs. The configuration would look like this:

    primehub:
      sharedVolumeStorageClass: ""
    groupvolume:
      storageClass: standard

In this configuration:
- the group volume is manually provisioned
- the primehub-groupvolume-nfs controller creates an NFS server backed by an RWO PVC
- the RWO PVC uses the "standard" storage class.
If you didn't specify primehub-group-sc's value ("standard" here) in yaml, it will be set by
Currently, the following types of dataset are mounted as persistent volumes:
- pv (hostpath)
If the dataset is of type pv, the dataset is backed by a PVC. The differences compared to a group volume are:
- A dataset can be connected to multiple groups, while a group volume belongs to only one group.
- The connection from a dataset to a group can be read-only or read-write.
- A dataset can enable an upload server.
Under the hood, the storage class is defined in the helm value:

    groupvolume:
      storageClass: standard

This creates an NFS server backed by an RWO PVC, similar to the group volume.
If you didn't specify a value ("standard" here) in yaml, it will be set by
metadata.name in the CRD and the spec.volumeName in the CRD.
A special hack for the pv type is to set the volumeName with the prefix hostpath:. In this case, the dataset is not backed by a PVC; instead, the volume is a hostpath volume.
For a hostpath dataset, the host path is the part of the volumeName after the hostpath: prefix.
In the future, hostpath will not be editable in the admin dashboard.
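As an illustration of the hostpath: prefix convention, the following sketch splits a volumeName into a volume kind and a target. The function name and the "pvc"/"hostpath" labels are hypothetical, and treating every non-prefixed volumeName as a PVC name is an assumption made for the example, not a statement about the actual spawner code.

```python
from typing import Tuple

HOSTPATH_PREFIX = "hostpath:"


def parse_volume_name(volume_name: str) -> Tuple[str, str]:
    """Classify a pv dataset's volumeName (hypothetical helper).

    Returns ("hostpath", <path>) when the name carries the hostpath: prefix,
    otherwise ("pvc", <name>), assuming the name refers to a PVC-backed volume.
    """
    if volume_name.startswith(HOSTPATH_PREFIX):
        # Everything after the prefix is taken as the path on the host.
        return ("hostpath", volume_name[len(HOSTPATH_PREFIX):])
    return ("pvc", volume_name)
```

With this sketch, a volumeName such as hostpath:/data/images (a made-up value) would be treated as a hostpath volume at /data/images, while a plain name would be treated as PVC-backed.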
If the dataset is of type git, the dataset is backed by a hostpath and the data is periodically pulled from a repository. Under the hood, we use a gitsync daemonset to sync the data to the hostpath.
Datasets support the following settings:
- The mount path of the pv dataset.
- Whether a symbolic link to the mountPath is required.
- The host path to put the gitsync result.
- The mount path of the git dataset.
- Mount only when the launching group connects this dataset.
- Only works in pv type dataset. Create an upload server.
- Only works when an upload server is created. Set HTTP basic auth based on a secret.
Symbolic links in the home folder
The home folder is mounted from the user volume. To make the project volume and datasets easy to locate, we create symbolic links in the home folder.
- Group Volumes:
- Dataset Volumes:
The symbolic links to a dataset can be removed by setting the annotation
Launch group only
By default, if a user has permission to access a group volume or a dataset, the spawner mounts it no matter which group the user selects.
However, if Launch Group Only is set to no for a group volume or a dataset, the volume is mounted only when the user selects a group that is connected to it.
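The mount decision described in this section can be sketched as follows. Everything here is hypothetical: the function and parameter names are invented for illustration, restrict_to_launch_group stands for the Launch Group Only restriction being in effect for the volume, and "has permission" is assumed to mean that the user belongs to at least one group connected to the volume.

```python
from typing import Set


def should_mount(
    connected_groups: Set[str],
    user_groups: Set[str],
    launch_group: str,
    restrict_to_launch_group: bool,
) -> bool:
    """Decide whether a group volume or dataset is mounted (illustrative only)."""
    # Assumption: access is granted through membership in a connected group.
    if not (connected_groups & user_groups):
        return False

    # With the restriction in effect, only the selected (launch) group counts.
    if restrict_to_launch_group:
        return launch_group in connected_groups

    # Default behavior: mount regardless of which group the user selected.
    return True
```

Under these assumptions, a dataset connected only to group-a is mounted for a member of group-a who launches with group-b unless the restriction is in effect.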