Dataset Upload
Provides an upload server for uploading data to a PV-type dataset.
Configuration
Prerequisite
Requires PRIMEHUB_FEATURE_USER_PORTAL to be true and PRIMEHUB_DOMAIN to be set.
Settings
Add these variables to the .env file:
| Name | Value |
|---|---|
| PRIMEHUB_FEATURE_DATASET_UPLOAD | true |
Install
make release-install-primehub
Migration
Set the PRIMEHUB_STORAGE_CLASS environment variable to the correct storage class.
Troubleshooting
- Check Primehub Console Container's Environment Variables
The environment variables should be added automatically.
PRIMEHUB_FEATURE_DATASET_UPLOAD will be added to graphql and ui containers when PRIMEHUB_FEATURE_DATASET_UPLOAD is true in your cluster's .env file.
CMS_APP_PREFIX will be added to graphql container.
PRIMEHUB_GROUP_SC will be added to the graphql container. Its value is taken from the following setting:
groupvolume:
  storageClass: {value}
If no value is specified in the YAML, it falls back to the PRIMEHUB_STORAGE_CLASS environment variable.
- Check Issuer
If you are using the letsencrypt-prod-dns issuer, the dataset upload ingress annotations should contain:
certmanager.k8s.io/acme-challenge-type: "dns01"
certmanager.k8s.io/acme-dns01-provider: "clouddns"
certmanager.k8s.io/cluster-issuer: "letsencrypt-prod-dns"
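When checking the PRIMEHUB_GROUP_SC value described in the environment variable check above, the fallback can be pictured roughly as follows. This is only a sketch of the rule as stated in this document, not the chart's actual template logic; the function name `resolve_group_storage_class` and the `settings` dictionary shape are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_group_storage_class(
    settings: Mapping[str, object],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return groupvolume.storageClass if set, else PRIMEHUB_STORAGE_CLASS.

    `settings` is a hypothetical dict form of the YAML settings.
    """
    env = os.environ if env is None else env
    groupvolume = settings.get("groupvolume")
    value = groupvolume.get("storageClass") if isinstance(groupvolume, dict) else None
    if value:
        # An explicit value in the YAML wins.
        return str(value)
    # Otherwise fall back to the environment variable (may be unset).
    return env.get("PRIMEHUB_STORAGE_CLASS")
```

If the value seen in the graphql container differs from what this rule predicts, the YAML setting and the .env file are the first two places to compare.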
Design
We use the tus protocol for resumable file uploads. The backend is tusd and the frontend package is uppy. To let users view and edit uploaded files, the deployment also runs a Flask server, which lists files with Flask-AutoIndex. The dataset upload deployment therefore contains two containers, and both mount the PV dataset.
Metacontroller is used to automatically create desired resources based on our settings.
Application code is under modules/primehub-dataset-upload. K8s and metacontroller related code is under modules/charts/primehub.
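For readers unfamiliar with tus, the sketch below shows the request headers a client sends in tus 1.0.0: one set to create an upload and one to send (or resume) data from a given offset. It illustrates the protocol only and is not code from modules/primehub-dataset-upload; the helper names `creation_headers` and `patch_headers` are hypothetical.

```python
import base64
from typing import Dict

TUS_VERSION = "1.0.0"


def creation_headers(filename: str, size: int) -> Dict[str, str]:
    """Headers for the POST request that creates a new upload."""
    # Upload-Metadata carries "key base64(value)" pairs.
    encoded = base64.b64encode(filename.encode("utf-8")).decode("ascii")
    return {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Length": str(size),
        "Upload-Metadata": f"filename {encoded}",
    }


def patch_headers(offset: int) -> Dict[str, str]:
    """Headers for a PATCH request that sends bytes starting at `offset`."""
    return {
        "Tus-Resumable": TUS_VERSION,
        "Upload-Offset": str(offset),
        "Content-Type": "application/offset+octet-stream",
    }
```

To resume, a client first asks the server for the current Upload-Offset with a HEAD request and then continues with a PATCH from that offset.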
Start/Stop Dataset Upload Server
When a dataset has the annotation dataset.primehub.io/uploadServer: "true", a dataset upload server is started for it.
Otherwise, the server is stopped.
Currently, the dataset upload URL is https://<primehub domain>/admin/dataset/<namespace>/<dataset name>/browse/.
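The rule and URL pattern above can be summarised in a few lines. This is a sketch for illustration, assuming the annotation value is compared as the literal string "true"; the real controller logic lives in the metacontroller code and may differ. The function names are hypothetical.

```python
from typing import Mapping

UPLOAD_ANNOTATION = "dataset.primehub.io/uploadServer"


def upload_server_enabled(annotations: Mapping[str, str]) -> bool:
    """True when the dataset asks for an upload server (assumed string match)."""
    return annotations.get(UPLOAD_ANNOTATION) == "true"


def browse_url(domain: str, namespace: str, dataset: str) -> str:
    """Build the browse URL from the pattern given in this document."""
    return f"https://{domain}/admin/dataset/{namespace}/{dataset}/browse/"
```

For example, `browse_url("example.com", "hub", "mydata")` yields a URL of the documented form for a dataset named mydata in the hub namespace.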
Enable HTTP Auth for the Dataset Upload Server
First, create a secret from a file generated by htpasswd. For example:
htpasswd -c auth <name>
kubectl -n hub create secret generic dataset-upload-<name> --from-file=auth
Then add the annotation dataset.primehub.io/uploadServerAuthSecretName: dataset-upload-<name> to the dataset to enable HTTP auth.
The username is <name>.
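To make clear what the kubectl command above produces, the sketch below builds the equivalent Secret as a plain dictionary: a generic (Opaque) secret whose auth key holds the base64-encoded htpasswd file. The function name `auth_secret_manifest` is hypothetical, and the sketch does not talk to a cluster.

```python
import base64
from typing import Dict


def auth_secret_manifest(
    name: str, htpasswd_content: bytes, namespace: str = "hub"
) -> Dict[str, object]:
    """Return a Secret manifest equivalent to `--from-file=auth`."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": f"dataset-upload-{name}", "namespace": namespace},
        # The data key is the file name ("auth"); values are base64-encoded.
        "data": {"auth": base64.b64encode(htpasswd_content).decode("ascii")},
    }
```

The secret name in metadata.name is the value to put in the dataset.primehub.io/uploadServerAuthSecretName annotation.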
Current Post-Finish Hook in Tus Server (Tusd)
- Create the target directory if needed
- Rename each .bin file to its real file name
- Remove the .info file generated by tusd
- If the uploaded file is a zip file, unzip it
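The four steps above can be sketched as follows. This is not the hook shipped in modules/primehub-dataset-upload; it assumes tusd stores each upload as <id>.bin plus a JSON <id>.info file with the original name under MetaData.filename, and the function name `post_finish` and its parameters are hypothetical. The sketch keeps only the base name of the file and leaves the zip archive in place after extracting it.

```python
import json
import shutil
import zipfile
from pathlib import Path


def post_finish(upload_dir: Path, upload_id: str, dataset_dir: Path) -> Path:
    """Move a finished upload into the dataset directory and clean up."""
    bin_path = upload_dir / f"{upload_id}.bin"
    info_path = upload_dir / f"{upload_id}.info"

    # Assumption: .info is JSON and MetaData.filename holds the original name.
    info = json.loads(info_path.read_text())
    raw_name = info.get("MetaData", {}).get("filename") or upload_id
    filename = Path(raw_name).name  # drop any directory components

    # 1. Create the target directory if needed.
    dataset_dir.mkdir(parents=True, exist_ok=True)

    # 2. Rename the .bin file to its real file name.
    target = dataset_dir / filename
    shutil.move(str(bin_path), str(target))

    # 3. Remove the .info file generated by tusd.
    info_path.unlink()

    # 4. If the upload is a zip file, unzip it next to the archive.
    if zipfile.is_zipfile(target):
        with zipfile.ZipFile(target) as archive:
            archive.extractall(dataset_dir)

    return target
```

Checking the file content with `zipfile.is_zipfile` rather than the extension is a choice made for this sketch; the real hook may decide differently.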
Other Notes
- The CLI's resume capability currently only handles network failures. It does not handle the case where a user cancels an upload job, and the web UI and CLI cannot resume each other's uploads. (https://github.com/tus/tus-js-client/issues/62)
- Mechanism to clean up temporary state files.
CLI
- Download from https://github.com/avvertix/tus-client-cli/releases/tag/v0.3.0
- ./tus-client-macos upload <filepath> https://<primehub domain>/admin/dataset/<name>/upload/files/