Model Deployment · PrimeHub

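With the Model Deployment feature, users can create, delete, update, and deploy Deployments. Once the feature is enabled in a Group's settings, every member of that Group can use it. A deployed model service must fit within the Group's resource quota; otherwise the deployment does not succeed. Administrators can monitor the status and resource usage of deployed services through Grafana, and users can review the deployment history.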

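Enabling the Feature

First, enable the Model Deployment feature for the target Group.

If the currently selected group does not have Model Deployment enabled, the following message is shown: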

Feature not available - Model Deployment is not enabled for this group. Please contact your administrator to enable it.

Ask your administrator to enable Model Deployment for this group, or switch to another group that has the feature enabled.

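Main Page

This page lists all deployments that have been created.

Each status is shown in its own color.

Search by name: enter a keyword to search for deployments.
+ Create Deployment: click this button to open the deployment creation page.
Refresh: click this button to refresh the deployment statuses.
Deployed By Me: check this option to list only the deployments you deployed.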

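Deployments

Each deployment shows the summary below; click a deployment to view its details.

Info	Description
Title	Deployment name
Endpoint	URL of the deployed service
Last Updated	Time of the last update, by `user`
Status	Deployed, Failed, Deploying, or Stopped

Manage: click to open the deployment detail page.
Start/Stop: start or stop the deployment service.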

Create

Confirm that the currently selected group is the one you intend to deploy to; use the Group: dropdown to switch groups.

Deployment Details

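Deployment Name: the name of the deployment.
Deployment ID: a unique ID, either generated by the system or entered by the user.
Model Image:
- An image that can be deployed directly (Tutorial: deploying a model image packaged with a Language Wrapper).
- Or a pre-packaged model server image combined with a model file (Tutorial: deploying a model with a Pre-packaged Server; Tutorial: deploying a model with a Pre-packaged Server (PHFS)).

  Choose one of the suggested pre-packaged model server images; click the link to see the related tutorial.
Model URI: the path to the model file; see the supported URIs.
Image Pull Secret: if required, the pull secret used to pull the Model Image.
Descriptions: a description entered by the user.

A model can be deployed in two ways: deploy an image that already contains the model file (packaged by the user beforehand), or specify a model file with Model URI together with a pre-packaged model server image, which the system then deploys as a service automatically. Fill in the Model Image and Model URI that match the chosen method.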
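The difference between the two methods comes down to whether a Model URI accompanies the Model Image. The sketch below illustrates that rule only; `DeploymentSpec` and `deployment_mode` are hypothetical names made up for this example and are not part of any PrimeHub API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeploymentSpec:
    """Hypothetical holder for the form fields; not a PrimeHub API."""
    name: str
    model_image: str
    model_uri: Optional[str] = None  # only set for a pre-packaged server


def deployment_mode(spec: DeploymentSpec) -> str:
    """Return which of the two deployment methods the fields describe."""
    if not spec.model_image:
        raise ValueError("Model Image is required for both methods")
    if spec.model_uri:
        # Pre-packaged model server image plus a separate model file.
        return "pre-packaged-server"
    # The image already contains the model file.
    return "packaged-image"
```

For instance, a spec with only `model_image` set is classified as `"packaged-image"`, while adding a `model_uri` switches it to `"pre-packaged-server"`.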

Environment Variables

Multiple environment variables can be added.

Name: the variable name.
Value: the variable value.

Metadata

Multiple additional key/value pairs can be added.

Name: the key.
Value: the value.

Resources

InstanceTypes: the instance type that defines the requested resource allocation.
Replicas: the number of replicas to run.

Endpoint

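Private Access: a toggle that sets the endpoint to public or private access. When private access is enabled, the deployment detail page shows a Clients tab where access tokens can be generated.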

Deploy

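Update Message: a note the user attaches to each update.

Click the Deploy button to start the deployment.

A dialog pops up when the deployment starts; click it to open the deployment detail page.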

Deployment Detail

Information

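Field	Description
Status	Deployment status
Message	Related message
Endpoint	URL of the deployed service
Creation Time	Time of creation
Last Updated	Time of the last update
Model Image	The specified Model Image
Model URI	The specified model file path
Image Pull Secret	Secret used to pull the image
Description	Description entered by the user
Instance Type	Resource allocation requested for the deployment
Replicas	Number of replicas
Access Type	Public or Private
Run an Example	Replace `${YOUR_DATA}` in the `curl` example with real data to verify the deployed service; the example differs depending on whether access is `Private` or `Public`

Metadata table
Environment Variables table: click the eye icon to reveal a variable's value.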
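To fill in `${YOUR_DATA}`, the request body has to be serialized as JSON. The following is a minimal sketch, assuming the service accepts the tensor-style payload used in the curl example in the Clients section; `build_tensor_payload` is a hypothetical helper name, and the payload your own model expects may differ.

```python
import json
from typing import List, Sequence


def build_tensor_payload(names: List[str], rows: Sequence[Sequence[float]]) -> str:
    """Serialize rows of features into a tensor-style JSON request body."""
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows must be non-empty and rectangular")
    payload = {
        "data": {
            "names": names,
            "tensor": {
                # shape is [number of rows, number of columns]
                "shape": [len(rows), len(rows[0])],
                # values are the rows flattened in row-major order
                "values": [value for row in rows for value in row],
            },
        }
    }
    # Compact separators keep the body short enough to paste into a shell.
    return json.dumps(payload, separators=(",", ":"))
```

Calling `build_tensor_payload(["a", "b"], [[0, 0], [1, 1]])` yields the same body as the curl example, which can then be pasted in place of `${YOUR_DATA}`.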

Logs

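Replicas: select the replica whose logs you want to view.

The Logs tab shows the logs of the current deployment.

Timestamps are in Coordinated Universal Time (UTC).

Click Scroll to Bottom to jump to the end of the logs.

By default only the latest 2000 lines are shown; click Download to download the complete log file.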

History

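The History tab shows the records of past deployments of this Deployment.

Click the View link to see the details of each deployment record.

Info	Description
User	User who triggered the deployment
Deployment Stopped	true or false
Model Image	URL of the model image used
Model URI	The specified model file path
Replicas	Number of replicas
Group	Group under which the deployment was triggered
Instance Type	Instance type used
Timestamp	Time of the last update
Description	Description entered by the user
Access Type	Public or Private
Clients	Clients allowed to access the endpoint when Access Type is Private

Metadata table
Environment Variables table: click the eye icon to reveal a variable's value.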

Clients

This tab is shown only when Private Access is enabled.

Enter a Client Name and click Add client to generate a Client Token for that client.

The token is required to access a private endpoint; pass it to curl with the option -u <client-name>:<client-token>.

curl -X POST \
    -u <client-name>:<client-token> \
    -d '{"data":{"names":["a","b"],"tensor":{"shape":[2,2],"values":[0,0,1,1]}}}' \
    -H "Content-Type: application/json" \
    https://<primehub>/deployment/<model>/api/<version>/predictions

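The Client Token is displayed only once, briefly, right after it is generated, so save it immediately; if it is lost, delete the client and generate a new one.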
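The curl call above can also be sketched in Python. curl's -u option sends HTTP Basic authentication, so the example below builds the equivalent `Authorization` header with the standard library. `build_prediction_request` is a hypothetical helper, and the URL, client name, and token in the usage note are placeholders rather than real values.

```python
import base64
import urllib.request


def build_prediction_request(
    url: str, client_name: str, client_token: str, body: str
) -> urllib.request.Request:
    """Build a POST request that authenticates like `curl -u name:token`."""
    credentials = f"{client_name}:{client_token}".encode("utf-8")
    auth = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Basic {auth}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A request built with, say, `build_prediction_request("https://primehub.example.com/deployment/my-model/api/v1.0/predictions", "my-client", "my-token", body)` can then be sent with `urllib.request.urlopen(request)`; the host and path here are made up and must be replaced with the Endpoint shown on the deployment detail page.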

Tips

To find out which client (Client Name) sent a request during prediction, read the X-Forwarded-User request header inside the predict function, as follows:

from flask import request as req

...
req.headers.get('X-Forwarded-User') # you can get the client name from the header
...
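The header lookup can be isolated in a small helper so it is easy to test without a running server. This is only a sketch: `client_name_from_headers` is a hypothetical name, and it assumes the header is simply absent when no client name was forwarded. It accepts any mapping-like object with a `get` method, which includes Flask's `request.headers`.

```python
from typing import Mapping, Optional


def client_name_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the client name from X-Forwarded-User, or None if it is missing."""
    return headers.get("X-Forwarded-User")
```

Inside a predict function this would be called as `client_name_from_headers(req.headers)`, for example to log which client issued each prediction.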

Update

On the Deployment page, click Update to edit the deployment and redeploy it.

Group, Deployment name, and Deployment ID cannot be changed; all other fields can be updated.

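Delete

Click a Deployment card and, on the deployment detail page, click the Delete button at the top right, then enter the name of the deployment to confirm.

Stop

Click a Deployment card and, on the deployment detail page, click the Stop button at the top right to stop the service.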

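Monitoring

PrimeHub provides a Grafana dashboard based on Seldon Core Analytics; the monitoring target can be selected by deployment, model, and model version.

First, open Grafana from the User Portal.
Select the PrimeHub / Model Deployments dashboard, which lists all deployed models that are currently serving.
Select the deployment to monitor to see how the model is performing.

Default metrics: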

QPS (Queries Per Second)
Success rate
4xx errors, if any
5xx errors, if any
Predict QPS
Reward

The reward is interpreted as the proportion of successes in the batch of data samples. Thus this implementation inherently assumes binary rewards for each sample in the batch. The helper function n_success_failures calculates the number of successes and failures given the batch of data samples and the reward. -Reference.
Latency
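The Reward description above treats the reward as the fraction of successful samples in a batch. The sketch below shows that arithmetic only; it is not Seldon's actual code, and the signature (a batch size instead of the batch of samples) and the rounding choice are assumptions made for this example.

```python
from typing import Tuple


def n_success_failures(batch_size: int, reward: float) -> Tuple[int, int]:
    """Split a batch into (successes, failures) given a reward in [0, 1]."""
    if batch_size < 0 or not 0.0 <= reward <= 1.0:
        raise ValueError("batch_size must be >= 0 and reward within [0, 1]")
    # The reward is the proportion of successes, so scale it by the batch size.
    n_success = round(reward * batch_size)
    return n_success, batch_size - n_success
```

With a batch of 4 samples and a reward of 0.5, this gives 2 successes and 2 failures.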

This dashboard is based on Seldon Core Analytics; for more advanced details, see the document and code.

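License Warning

When the number of deployed models exceeds the licensed number of model deployments plus 10%, a warning message is displayed and Create Deployment is disabled.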

Please contact your system administrator for assistance to upgrade your license to run more models.
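The threshold can be written out as a small check. This sketch assumes that "plus 10%" means 10% of the licensed count, so the limit is 1.1 times the licensed number; `exceeds_license` is a hypothetical name, and PrimeHub's own check may round differently.

```python
def exceeds_license(used: int, licensed: int) -> bool:
    """Return True when used > licensed + 10% of licensed."""
    # used > licensed * 1.1, rewritten with integers to avoid float rounding.
    return used * 10 > licensed * 11
```

Under this reading, a license for 10 model deployments tolerates 11 running deployments, and the warning appears at 12.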

For the current license information, see PrimeHub License.