PrimeHub Apps
PrimeHub Apps lets third-party applications be integrated into the PrimeHub platform.
Features
- Shared domain: An installed application is served under a sub-path of PrimeHub's domain, so it needs no additional domain.
- Authorization: Access to an application can be restricted to group members, to logged-in PrimeHub users, or left open to the public.
- Data Persistence: Applications can persist data in the group volume and access data in other persistent storage, such as datasets and PHFS.
- Resource constraint: The group's CPU, memory, and GPU quotas are enforced on applications.
Concepts
Application
We introduce a new concept in PrimeHub: the Application. An application is an instance of an integrated third-party application (e.g. MLflow). It is installed as a group resource, and a group can install multiple instances of the same kind of application.
Application Template
An application template describes how an application is installed. It contains the following information:
- podTemplate: used to create the deployment of the application
- Service Ports: used to create the service
- HTTP Port: the HTTP port, if the application has a web interface
- defaultEnvs: the default environment variables offered when creating the application. When an application is created, the values are set as environment variables of the target application. Each entry has:
  - ENV Name
  - Description
  - Default value
  - Optional
Create an Application
Users can create an application from an application template.
- Select an application template
- Fill in the default envs provided by the template
- Select the instance type
- Choose the scope (Public / PrimeHub users only / Group members only). The scope only affects the web interface of the application.
- Create
Preset Environment Variables
The preset environment variables can be referenced in the value field of an environment variable. The preset variables are:
- PRIMEHUB_APP_ID: The PhApplication k8s resource name, <app-id>
- PRIMEHUB_APP_ROOT: The root of persistent storage for the application: <group-volume>/phapplications/<app-id> if a group volume is available, /phapplications/<app-id> otherwise
- PRIMEHUB_APP_BASE_URL: The URL prefix for the application, /console/apps/<app-id>
- PRIMEHUB_URL: The external URL of PrimeHub
- PRIMEHUB_GROUP: The group name
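To make the substitution concrete, here is a minimal Python sketch of how a value such as $(PRIMEHUB_APP_ROOT)/mlruns resolves. Kubernetes expands $(VAR) references in container env values and args itself, so this is only an illustration, not PrimeHub code. The helper names app_root and expand are hypothetical, the group volume path is a caller-supplied placeholder, and the $$(VAR) escape form is not handled.

```python
import re

# Matches Kubernetes-style references such as $(PRIMEHUB_APP_ROOT).
_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def app_root(app_id, group_volume=None):
    """Return PRIMEHUB_APP_ROOT for an app.

    group_volume is the mount path of the group volume, or None when the
    group has no volume (hypothetical parameter, not a PrimeHub API).
    """
    prefix = group_volume.rstrip("/") if group_volume else ""
    return f"{prefix}/phapplications/{app_id}"


def expand(value, env):
    """Replace each $(NAME) with env[NAME]; unknown references stay unchanged."""
    return _REF.sub(lambda m: env.get(m.group(1), m.group(0)), value)


preset = {
    "PRIMEHUB_APP_ID": "mlflow-xyzab",
    "PRIMEHUB_APP_ROOT": app_root("mlflow-xyzab"),
    "PRIMEHUB_APP_BASE_URL": "/console/apps/mlflow-xyzab",
}
artifact_root = expand("$(PRIMEHUB_APP_ROOT)/mlruns", preset)
```

With no group volume, artifact_root becomes /phapplications/mlflow-xyzab/mlruns.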
Connect to Application
There are two ways to connect to an application:
- Connect to a web interface under a sub-path of PrimeHub: Most applications are web-based. Users open https://<primehub>/console/apps/<app-id> in the browser.
- Connect to a TCP endpoint by host name and port: Some applications provide a non-HTTP service. It is reachable at the service endpoint <my-app-svc>:<my-app-port>. The endpoint is only accessible from inside the PrimeHub cluster (for example from notebooks and jobs).
Application Management
- Users can start/stop an application
- Users can get the basic information of the application
- Users can get the application log
- Users can delete an application
Implementation
Principles
- Introduce a new CRD, PhApplication, that represents the app to install. A deployment and a service are derived from it.
- We use PhAppTemplate to create a PhApplication. However, a PhApplication can work standalone without a PhAppTemplate. The controller of PhApplication should not know about PhAppTemplate.
- GraphQL uses PhAppTemplate to create a PhApplication. The template and the template data are stored in the PhApplication's annotations for later updates.
CRDs
PhApplication
apiVersion: primehub.io/v1alpha1
kind: PhApplication
metadata:
  name: mlflow-xyzab
  namespace: hub
  annotations:
    phapplication.primehub.io/template: '<json of app template>'
    phapplication.primehub.io/template-data: '<json of template data>'
spec:
  stop: false
  displayName: "My MLflow"
  groupName: "phusers"
  instanceType: ""
  scope: "group"
  podTemplate:
    spec:
      containers:
      - name: mlflow
        image: larribas/mlflow:1.9.1
        command:
        - mlflow
        args:
        - server
        - --host
        - 0.0.0.0
        - --default-artifact-root
        - $(DEFAULT_ARTIFACT_ROOT)
        - --backend-store-uri
        - $(BACKEND_STORE_URI)
        env:
        - name: FOO
          value: bar
        - name: BACKEND_STORE_URI
          value: "sqlite:///$(PRIMEHUB_APP_ROOT)/mlflow.db"
        - name: DEFAULT_ARTIFACT_ROOT
          value: "$(PRIMEHUB_APP_ROOT)/mlruns"
        ports:
        - containerPort: 5000
          name: http
          protocol: TCP
  svcTemplate:
    spec:
      ports:
      - name: http
        port: 5000
        protocol: TCP
        targetPort: 5000
  httpPort: 5000
status:
  phase: "Ready"
  message: "Error message"
  serviceName: "app-mlflow-xyzab"
- annotations:
  - phapplication.primehub.io/template: the template content used to create this PhApplication
  - phapplication.primehub.io/template-data: the template data used to create this PhApplication. It is the POST data of the GraphQL PhApplication create call.
- spec:
  - podTemplate: the template of the pods
  - svcTemplate: the template of the service
  - scope: "group", "primehub", or "public"
  - httpPort: the backend service port that the proxy should forward to
- status:
  - phase: "Starting", "Ready", "Updating", "Stopping", "Stopped", or "Error"
  - message: human-readable message
  - serviceName: name of the service (used for the GraphQL serviceName field)
PhAppTemplate
apiVersion: primehub.io/v1alpha1
kind: PhAppTemplate
metadata:
  name: mlflow
  namespace: hub
spec:
  name: MLFlow
  description:
  version:
  docLink:
  icon: <url to icon or data-uri>
  defaultEnvs:
  - name: BACKEND_STORE_URI
    description: ""
    defaultValue: "sqlite:///$(PRIMEHUB_APP_ROOT)/mlflow.db"
    optional: false
  - name: DEFAULT_ARTIFACT_ROOT
    description: ""
    defaultValue: "$(PRIMEHUB_APP_ROOT)/mlruns"
    optional: false
  template:
    # The template of phApplication
    spec:
      podTemplate:
        spec:
          containers:
          - name: mlflow
            image: larribas/mlflow:1.9.1
            command:
            - mlflow
            args:
            - server
            - --host
            - 0.0.0.0
            - --default-artifact-root
            - $(DEFAULT_ARTIFACT_ROOT)
            - --backend-store-uri
            - $(BACKEND_STORE_URI)
            env:
            - name: FOO
              value: bar
            ports:
            - containerPort: 5000
              name: http
              protocol: TCP
      svcTemplate:
        spec:
          ports:
          - name: http
            port: 5000
            protocol: TCP
            targetPort: 5000
      httpPort: 5000
- spec:
  - version: the version string of the template
  - description: free-form description of this application
  - docLink: the documentation URL
  - defaultEnvs: used to create the additional envs
    - name: the name of the environment variable
    - description: description of the variable
    - defaultValue: the default value of the variable
    - optional: whether the environment variable is optional
  - template: the content of the PhApplication. See PhApplication.
NOTE:
- We copy the content of the PhAppTemplate into the PhApplication instead of referencing it by name because we want to decouple the created app from the template.
- phapplication.primehub.io/template is for GraphQL's use. We keep the template so it can be used later when updating the app.
Control Plane
GraphQL creates the PhApplication resource from a PhAppTemplate.
The controller of PhApplication reconciles the PhApplication. The hierarchy is:
PhApplication
├── Deployment
├── Service
└── NetworkPolicy
Create
- Console gets the template list from GraphQL
- Console selects one template and lists its defaultEnvs as variables in the UI
- Console calls GraphQL to create the PhApplication. GraphQL then:
  - Gets the PhAppTemplate content
  - Appends the env variables to the end of the container's env
  - Sets the scope
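The create steps can be sketched in Python as a pure function over manifests held as dicts. This is an illustration under stated assumptions, not the GraphQL server's code: the function name create_phapplication, the choice of the first container as the target of the appended variables, and the shape of the template-data annotation are all assumptions.

```python
import copy
import json

VALID_SCOPES = {"group", "primehub", "public"}


def create_phapplication(template, app_id, group, display_name,
                         instance_type, scope, env):
    """Build a PhApplication manifest (dict) from a PhAppTemplate manifest."""
    if scope not in VALID_SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")

    # Copy the template content so the created app is decoupled from it.
    spec = copy.deepcopy(template["spec"]["template"]["spec"])

    # Assumption: user-supplied variables are appended to the first container.
    container = spec["podTemplate"]["spec"]["containers"][0]
    container.setdefault("env", []).extend(
        {"name": name, "value": value} for name, value in env.items()
    )

    spec.update(stop=False, displayName=display_name, groupName=group,
                instanceType=instance_type, scope=scope)

    # Assumption: the real shape of the template data is not specified here.
    template_data = {"displayName": display_name, "groupName": group,
                     "instanceType": instance_type, "scope": scope, "env": env}

    return {
        "apiVersion": "primehub.io/v1alpha1",
        "kind": "PhApplication",
        "metadata": {
            "name": app_id,
            "namespace": template["metadata"]["namespace"],
            "annotations": {
                "phapplication.primehub.io/template": json.dumps(template),
                "phapplication.primehub.io/template-data": json.dumps(template_data),
            },
        },
        "spec": spec,
    }
```

Because the template is deep-copied, later changes to the PhAppTemplate do not affect applications that were already created from it.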
Update
- Console gets the PhApplication from GraphQL.
  - GraphQL returns the PhApplication user data from phapplication.primehub.io/template-data and the PhApplication default envs from phapplication.primehub.io/template
- Console can reset the variables
- Console can add/remove/update the env vars
- Console calls update on GraphQL
  - GraphQL gets the current PhApplication and modifies the container env, scope, and instance type.
  - GraphQL cannot change the appId or the template
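The update flow can be sketched the same way. This Python sketch assumes manifests are plain dicts and that the env list is rebuilt from the template stored in the annotation plus the user's new variables; the function name update_phapplication is hypothetical, and rewriting the template-data annotation is left out because its shape is not specified here. The function offers no way to change the app id or the template, which mirrors the last rule above.

```python
import copy
import json

TEMPLATE_KEY = "phapplication.primehub.io/template"


def update_phapplication(app, env=None, scope=None, instance_type=None):
    """Return an updated copy of a PhApplication manifest (dict).

    Only the container env, the scope, and the instance type can change.
    """
    updated = copy.deepcopy(app)
    spec = updated["spec"]

    if env is not None:
        # The stored template tells us which env entries came from the
        # template itself, so the user's entries can be replaced cleanly.
        template = json.loads(app["metadata"]["annotations"][TEMPLATE_KEY])
        tmpl_spec = template["spec"]["template"]["spec"]
        base = tmpl_spec["podTemplate"]["spec"]["containers"][0].get("env", [])
        container = spec["podTemplate"]["spec"]["containers"][0]
        container["env"] = copy.deepcopy(base) + [
            {"name": name, "value": value} for name, value in env.items()
        ]

    if scope is not None:
        spec["scope"] = scope
    if instance_type is not None:
        spec["instanceType"] = instance_type
    return updated
```

Keeping the template in the annotation is what makes this possible without looking up the PhAppTemplate again.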
Controller
Deployment
- name: app-<app-id>
- Use spec.podTemplate.spec
- Volumes
  - Add group, dataset volumes
  - Add an empty dir if no group volume is available
- Init Container
  - Run as root: mkdir -p $(PRIMEHUB_APP_ROOT)
- Container
  - Keep only the first container
  - Set resources from instanceType
  - Prepend (not append) the envs required by PrimeHub: PRIMEHUB_APP_ID, PRIMEHUB_APP_ROOT, and PRIMEHUB_APP_BASE_URL
  - Mount group, dataset volumes
  - Mount an empty dir (/phapplications/<app-id>) if no group volume is available
- The created pod should have the labels app=primehub-app, primehub.io/phapplication=<appid>, and primehub.io/group=<escaped group>
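A rough Python sketch of how the Deployment could be derived from a PhApplication, with manifests as dicts. It covers only the container, env, and label rules; volumes and the init container are omitted for brevity. The function name build_deployment, the parameters escaped_group, app_root, and resources, the selector, and scaling to zero replicas when spec.stop is set are assumptions, not confirmed controller behaviour. Prepending matters because Kubernetes only resolves a $(VAR) reference against variables defined earlier in the same env list.

```python
import copy


def build_deployment(app, escaped_group, app_root, resources):
    """Derive a Deployment manifest (dict) from a PhApplication manifest."""
    app_id = app["metadata"]["name"]
    labels = {
        "app": "primehub-app",
        "primehub.io/phapplication": app_id,
        "primehub.io/group": escaped_group,
    }

    pod_spec = copy.deepcopy(app["spec"]["podTemplate"]["spec"])
    container = pod_spec["containers"][0]  # keep only the first container

    required = [
        {"name": "PRIMEHUB_APP_ID", "value": app_id},
        {"name": "PRIMEHUB_APP_ROOT", "value": app_root},
        {"name": "PRIMEHUB_APP_BASE_URL", "value": f"/console/apps/{app_id}"},
    ]
    # Prepend so later entries such as "$(PRIMEHUB_APP_ROOT)/mlruns" resolve.
    container["env"] = required + container.get("env", [])
    container["resources"] = resources  # taken from the instance type
    pod_spec["containers"] = [container]

    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"app-{app_id}",
                     "namespace": app["metadata"]["namespace"]},
        "spec": {
            # Assumption: a stopped app is scaled down to zero replicas.
            "replicas": 0 if app["spec"].get("stop") else 1,
            "selector": {"matchLabels": {"primehub.io/phapplication": app_id}},
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }
```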
Service
- name: app-<app-id>
- Use spec.svcTemplate.spec
NetworkPolicy
- Allows ingress traffic from
  - pods with the label primehub.io/group=<escape-group>
  - primehub-console (for proxy)
status.Phase
The phase of PhApplication
- Starting: The app is starting; no pod is ready and the service is not yet available. deployment.status.readyReplicas==0
- Ready: The app is ready to use. deployment.status.readyReplicas==1 and deployment.status.replicas==1
- Updating: The app is updating; the old pod is still ready to use while the new version of the app is starting. deployment.status.readyReplicas==1 and deployment.status.replicas>1
- Stopping: The app is stopping; the pod is terminating but its resources have not been freed. spec.stop=true and deployment.status.replicas>0
- Stopped: The app is stopped; the pod is deleted and no resources are used. spec.stop=true and deployment.status.replicas==0
- Starting and Updating are determined from the deployment status.
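The phase rules can be written as one small function. This is a Python sketch of the conditions listed above, not the controller's code. The function name compute_phase is hypothetical, and since this document does not say when "Error" is reported, using it as the fallback for any state outside the listed rules is an assumption.

```python
def compute_phase(stop, replicas, ready_replicas):
    """Map spec.stop and the Deployment status counters to a phase name."""
    if stop:
        # A stopped app is either still terminating or fully gone.
        return "Stopped" if replicas == 0 else "Stopping"
    if ready_replicas == 0:
        return "Starting"
    if ready_replicas == 1 and replicas == 1:
        return "Ready"
    if ready_replicas == 1 and replicas > 1:
        return "Updating"
    # Assumption: anything outside the documented rules is reported as Error.
    return "Error"
```

Checking spec.stop first keeps a stopping app from being reported as Starting while its last pod terminates.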
Data Plane
Http Proxy
The app path is under https://<primehub>/console/apps/<app-id>. We validate whether the user can access the app through a server-side session. The first solution we came up with was to validate the traffic by access token. The flow is as follows:
- Get the app information for the <app-id> in the path
- If the app has the scope group only, check whether the owner of the access token has permission for the group
- If yes, accept the request and proxy it to the upstream service
Performance issue:
- The access token expires after about 5 minutes, which is too short to cache.
- To refresh the access token, we need to request a new one from the Keycloak token endpoint.
- If the refresh token has expired, we need to go through the OIDC process again.
- The cache miss rate would be high because the access token keeps changing.
The solution is to implement a "session" concept:
- For a new connection to the app, the request is authorized by its access token. If the traffic is accepted, a session is created.
- When the session is created, the console sets a cookie with the key phapplication-session-id under the path /console/apps/<app>, expiring in 30 minutes, and maintains the session cache on the server side.
- If the request contains the session cookie and it is found in the session cache, the request is forwarded to the backend and the session's expiration is extended to 30 minutes.
- If the session id is not found on the server, the request is authorized as in step 1.
Performance improvement:
- Because the session is easily extended, there are far fewer requests to the Keycloak token endpoint.
- The cache miss rate is low because a session id only expires after 30 minutes without use.
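The session scheme amounts to a server-side cache with sliding expiration. The Python sketch below illustrates the idea; it is not the console's implementation. The class name SessionCache, the authorize callback (standing in for the access-token check against the app's scope), and the injectable clock are assumptions made so the example is self-contained.

```python
import secrets
import time

SESSION_TTL = 30 * 60  # seconds; sessions expire after 30 minutes without use


class SessionCache:
    def __init__(self, authorize, ttl=SESSION_TTL, clock=time.monotonic):
        self._authorize = authorize  # callable(access_token, app_id) -> bool
        self._ttl = ttl
        self._clock = clock
        self._sessions = {}  # session_id -> (app_id, expires_at)

    def check(self, app_id, session_id, access_token):
        """Return (allowed, session_id to store in the cookie)."""
        now = self._clock()
        entry = self._sessions.get(session_id)
        if entry and entry[0] == app_id and entry[1] > now:
            # Known, unexpired session: extend it and skip token validation.
            self._sessions[session_id] = (app_id, now + self._ttl)
            return True, session_id

        # Unknown or expired session: fall back to access-token authorization.
        self._sessions.pop(session_id, None)
        if not self._authorize(access_token, app_id):
            return False, None
        new_id = secrets.token_urlsafe(32)
        self._sessions[new_id] = (app_id, now + self._ttl)
        return True, new_id
```

Only the first request, or a request after 30 idle minutes, reaches the authorize callback; every other request is answered from the cache.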
Log Traffic
Log API
- Add a new endpoint /api/logs/pods/<pod> (a generic log endpoint authorized by group label)
- GraphQL gets the pod, looks for the primehub.io/group label, and unescapes the group name. If the label is not found, the request is rejected.
- Check that the user of the token is a member of the app's group
- Get the pod log from the k8s API
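The authorization part of the log endpoint can be sketched as a single check. This Python sketch is an illustration only: the function name authorize_pod_log is hypothetical, and because the group-name escaping rule is not described in this document, the unescape step is passed in as a parameter and defaults to the identity function.

```python
GROUP_LABEL = "primehub.io/group"


def authorize_pod_log(pod_labels, user_groups, unescape=lambda name: name):
    """Return True if the token's user may read the log of this pod.

    pod_labels  -- the labels of the pod fetched from the k8s API
    user_groups -- names of the groups the token's user belongs to
    unescape    -- turns the escaped label value back into a group name
                   (assumption: the real escaping rule is not shown here)
    """
    escaped = pod_labels.get(GROUP_LABEL)
    if escaped is None:
        # Pods without a group label are never served by this endpoint.
        return False
    return unescape(escaped) in user_groups
```

Only when the check passes would the endpoint go on to fetch the pod log from the k8s API.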
Console & GraphQL
Get the pod list from the GraphQL API (reference PhDeployment):
phApplication(...) {
  pods {
    name         # app-mlflow-xyzab-xxxx-yyyyy
    logEndpoint  # https://hub.a.demo.primehub.io/api/logs/pods/app-mlflow-xyzab-xxxx-yyyyy
  }
}
GraphQL gets the pods by the label primehub.io/phapplication=<appid>.