Allow users to deploy a model as a service.
- Multiple ML framework support: Supports TensorFlow 1, TensorFlow 2, Keras, PyTorch, XGBoost, MXNet, scikit-learn, and LightGBM
- Multiple language support: Supports Python, Java, R, NodeJS, and Go (see the Seldon wrapper document)
- Horizontal scale-out: The deployed model service can be scaled to multiple replicas, so load balancing and fault tolerance are achieved easily.
- Deployment history: Tracks the deployment history.
- Resource constraints: Resource usage for model deployments is constrained by the group resource quota.
- Ingress: Creates an ingress resource to route external requests to the internal model service.
- Endpoint type: Supports public or private endpoints. For private endpoints, multiple users (clients) are supported.
Deploy a model
- (Admin) Enable model deployment for a group
- Create a deployment. Select the instance type, model image, and the number of replicas.
- Wait until the deployment is ready
- Use `curl` to test against the model deployment endpoint, as sketched below
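A minimal sketch, assuming the illustrative endpoint from the `PhDeployment` example later in this document and a simple ndarray payload:

```bash
# Send a test prediction request (endpoint URL and payload are illustrative).
curl -X POST https://primehub.local/deployment/user-defined-postfix/predict \
  -H "Content-Type: application/json" \
  -d '{"data": {"ndarray": [["test message"]]}}'
```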
Package a model
- Train a model with an ML framework and select the best model for deployment
- Wrap the model file and build the image (see the sketch after this list)
- Test the packaged image locally
- Push the image to a Docker registry
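As a sketch of the language-wrapper approach, a Seldon Python wrapper is a plain class that loads the model and exposes a `predict` method; the class name, model file, and image names below are hypothetical:

```python
# Model.py -- minimal Seldon Python language-wrapper sketch.
# The class name must match MODEL_NAME in the s2i environment file.
import joblib

class SpamClassifier:
    def __init__(self):
        # Load the trained model artifact once at container startup.
        self.model = joblib.load("model.joblib")

    def predict(self, X, features_names=None):
        # Called for each prediction request; X arrives as a tensor/ndarray.
        return self.model.predict(X)
```

The image can then be built with Seldon's s2i builder image, e.g. `s2i build . seldonio/seldon-core-s2i-python3:<version> <registry>/spam-classifier:<tag>`, tested locally with `docker run`, and pushed with `docker push`.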
Update a deployment
- Select a deployment
- Click the Update button
- Change the image and deploy
Please add this variable to the configuration to enable the model deployment.
Seldon is a community model deployment solution. We selected Seldon because it provides a common way to package models from different frameworks and programming languages into a Docker image.
Seldon also provides an operator, under the Seldon Core project, to manage the `SeldonDeployment` resource and reconcile it into the underlying deployments and services. However, for simplicity, we decided not to use the `SeldonDeployment` resource. Instead, we define `PhDeployment` and a controller that generates the underlying Deployment and Service directly.
A custom resource, `PhDeployment`, is defined for PrimeHub model deployments. It is very similar to the Kubernetes native Deployment: the controller spawns a Deployment according to the `PhDeployment`'s spec. The difference is that the spec contains PrimeHub-specific concepts, such as User, Group, and InstanceType.
Here is an example of `PhDeployment`:

```yaml
apiVersion: primehub.io/v1alpha1
kind: PhDeployment
metadata:
  name: spam-classifier-abcxy
  namespace: hub
spec:
  displayName: "spam classifier"
  userId: "4d203a08-896a-4aa8-86e2-882f4d4aadec"
  userName: "phadmin"
  groupId: "ca6b032e-b8be-44d2-9646-092622d6ba15"
  groupName: "phusers"
  stop: false
  description: |
    This is my first deployment.
    This is my first deployment.
    This is my first deployment.
  endpoint:
    accessType: private
    clients:
    - name: foo
      token: $apr1$sZ55Hcwn$NSHL3Y.HiBBTLQCIIEbUm.
    - name: bar
      token: $apr1$bJ3Ar/uT$BM2iFc6RObu7ZdYffToIQ.
  predictors:
  - name: predictor1
    replicas: 2
    modelImage: sandhya1594/spam-classifier:18.104.22.168
    instanceType: cpu-only
    metadata:
      LEARNING_RATE: "0.02"
      MINI_BATCH: "20"
      ACCURACY: "0.98"
status:
  phase: deploying
  message: "Deploying"
  replicas: 2
  availableReplicas: 2
  endpoint: https://primehub.local/deployment/user-defined-postfix/predict
  history:
  - time: 2020-03-23T02:03:15Z
    spec: <PhDeploymentSpec>
  - time: 2020-03-22T23:45:23Z
    spec: <PhDeploymentSpec>
```
The `PhDeployment` resource has the following children:
- Ingress: The ingress resource to route traffic to the given model deployment
- Service: The service resource of a given deployment
- Deployment: The deployment of the user's image.
- Secret: The secret for HTTP basic authentication of a deployment with the private endpoint access type
Each deployment requires a model image, which is responsible for translating REST requests into internal model prediction calls.
In the Seldon documentation, there are two ways to prepare the model image:
Pre-Packaged Inference Servers
- MLflow Server
- SKLearn Server
- Tensorflow Serving
- XGBoost Server
Language Wrappers
- Python Language Wrapper (Production)
- Java Language Wrapper (Incubating)
- R Language Wrapper (ALPHA)
- NodeJS Language Wrapper (ALPHA)
- Go Language Wrapper (ALPHA)
Currently, PrimeHub model deployment ONLY supports the language-wrapper solution. In the future, we may provide a guideline for writing a Dockerfile that packs the model file into a pre-packaged server image.
For a PrimeHub model deployment, the endpoint prefix would be `/deployment/<user-defined-postfix>` (cf. the `status.endpoint` in the example above).
The input and output of the prediction endpoint are a tensor or an ndarray.
You can also send unstructured data (e.g., an image file); please find more examples in our model deployment examples.
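As a sketch, an image file can be sent base64-encoded in the Seldon protocol's `binData` field (the endpoint and file name are illustrative):

```bash
# Send an image as base64 in the Seldon "binData" field.
# base64 -w0 is the GNU coreutils flag for single-line output.
curl -X POST https://primehub.local/deployment/user-defined-postfix/predict \
  -H "Content-Type: application/json" \
  -d "{\"binData\": \"$(base64 -w0 image.jpg)\"}"
```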
We provide two endpoint types: public and private. If a deployment is set to public, anyone who can connect to the domain (URL) has the privilege to use the model through the API endpoint.
If a deployment is set to private, the user must provide the correct HTTP basic authentication information when sending a request to the API endpoint. Otherwise, the deployment returns a 401 Unauthorized error. The HTTP basic authentication user name and password (token) can be configured in the UI. We also support configuring multiple user name/password pairs.
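A sketch of calling a private endpoint, reusing the illustrative client name `foo` and endpoint from the example above:

```bash
# Pass the user name and password (token) via HTTP basic authentication;
# without -u the deployment returns 401 Unauthorized.
curl -u foo:<password> -X POST https://primehub.local/deployment/user-defined-postfix/predict \
  -H "Content-Type: application/json" \
  -d '{"data": {"ndarray": [["test message"]]}}'
```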
The underlying technical process when setting up HTTP basic authentication in the UI is:
- The data scientist adds a user client in the UI
- GraphQL generates a random password (token) in md5 (`$apr1$`) format; it is shown only once in the UI (see the `htpasswd` sketch below)
- GraphQL also updates the endpoint information in the `phdeployment` CRD spec
- The primehub-controller updates the HTTP basic authentication configurations
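For reference, the `$apr1$` tokens in the example spec use Apache's MD5 scheme; an equivalent entry can be produced manually with `htpasswd` (from apache2-utils):

```bash
# Generate an apr1 (MD5) basic-auth entry for user "foo".
# Output looks like: foo:$apr1$sZ55Hcwn$NSHL3Y.HiBBTLQCIIEbUm.
htpasswd -nbm foo mysecretpassword
```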
There are 5 phases in the `PhDeployment` `status.phase`:
- Deploying: The model is deploying. When a deployment is created, updated, or started, it enters this phase immediately.
- Deployed: The model is deployed successfully. All replicas are in the available state.
- Stopping: The deployment is stopping. When a deployment is stopped, it enters this phase immediately.
- Stopped: The deployment is stopped successfully.
- Failed: The model deployment failed.
There are several reasons for the Failed phase. They include:
- Group or instance type not found
- Image invalid or cannot be pulled successfully
- Group resource not enough
- Cluster resource not enough
A model deployment consumes only group quota.
The pod of the model deployment has the label `primehub.io/group=escape(<group>)`. PrimeHub's validating webhook rejects the pod creation if the new pod would exceed the group resources.
Once the resources are exceeded, the deployment changes to the Failed phase with the "group resource not enough" error message.
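As a sketch, the pods counted against a group can be listed by this label (the escaped group value below is illustrative):

```bash
# List all pods that belong to the "phusers" group across namespaces.
kubectl get pods --all-namespaces -l "primehub.io/group=phusers"
```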
The pod of the model deployment carries deployment-identifying labels. The GraphQL server lists the pods by these labels and shows the log of the model container.
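A sketch of the equivalent manual lookup, with hypothetical label and container names:

```bash
# List the pods of one deployment, then tail the model container's log.
kubectl -n hub get pods -l "primehub.io/phdeployment=spam-classifier-abcxy"
kubectl -n hub logs -l "primehub.io/phdeployment=spam-classifier-abcxy" -c model
```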
Whenever the `.spec` changes, a new record is appended under `.status.history`. Each record contains `time` for the update time and `spec` for a snapshot of the newly updated spec. The history array only keeps the latest 32 records.
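For instance, the recorded update times can be inspected with `kubectl`, using the resource names from the example above:

```bash
# Print the update time of every history record.
kubectl -n hub get phdeployment spam-classifier-abcxy \
  -o jsonpath='{.status.history[*].time}'
```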
We use the Seldon engine to export Prometheus metrics. Under the hood, it accepts the prediction request and forwards it to the user-wrapped model container, while keeping track of the count and latency of each request. The metric details are described in the Seldon metrics documentation.
Seldon has a project named Seldon Analytics, which installs Prometheus and Grafana. However, our preferred Prometheus/Grafana installation is prometheus-operator. To adapt the metrics to prometheus-operator, we implement our own PodMonitor and Grafana dashboard to visualize the collected metrics.
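A minimal sketch of such a PodMonitor; the pod label and metrics port name are assumptions for illustration:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: phdeployment-metrics
  namespace: hub
spec:
  selector:
    matchLabels:
      app: primehub-deployment    # hypothetical label on the model pods
  podMetricsEndpoints:
  - port: metrics                 # hypothetical named port exposing Seldon engine metrics
```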