Cluster Quickstart¶
In this tutorial you will learn how to Train, Package, and Deploy a model from scratch on Odahu. Once deployed, the model serves RESTful requests and makes a prediction when provided user input.
Odahu’s API server performs Train, Package, and Deploy operations for you, using its REST API.
Prerequisites¶
- Odahu cluster
- MLFlow and REST API Packager (installed by default)
- Odahu-flow CLI or Plugin for JupyterLab (installation instructions: CLI, Plugin)
- JWT token from API (instructions)
- Google Cloud Storage bucket on Google Cloud Platform
- GitHub repository and an ssh key to connect to it
Tutorial¶
In this tutorial, you will learn how to:
- Create an MLFlow project
- Setup Connections
- Train a model
- Package the model
- Deploy the packaged model
- Use the deployed model
This tutorial uses a dataset to predict the quality of the wine based on quantitative features like the wine’s fixed acidity, pH, residual sugar, and so on.
Code for the tutorial is available on GitHub.
Create MLFlow project¶
- Before: Odahu cluster that meets prerequisites
- After: Model code that predicts wine quality
Create a new project folder:
$ mkdir wine && cd wine
Create a training script:
$ touch train.py
Paste code into the file:
import os
import warnings
import sys
import argparse

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn


def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2


if __name__ == "__main__":
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    parser = argparse.ArgumentParser()
    parser.add_argument('--alpha')
    parser.add_argument('--l1-ratio')
    args = parser.parse_args()

    # Read the wine-quality csv file (make sure you're running this from the root of MLflow!)
    wine_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "wine-quality.csv")
    data = pd.read_csv(wine_path)

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    alpha = float(args.alpha)
    l1_ratio = float(args.l1_ratio)

    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)
        mlflow.set_tag("test", '13')

        mlflow.sklearn.log_model(lr, "model")

        # Persist samples (input and output)
        train_x.head().to_pickle('head_input.pkl')
        mlflow.log_artifact('head_input.pkl', 'model')
        train_y.head().to_pickle('head_output.pkl')
        mlflow.log_artifact('head_output.pkl', 'model')
In this file, we:
- Start an MLflow run with mlflow.start_run()
- Train an ElasticNet model with lr.fit(...)
- Log metrics, parameters, and tags with mlflow.log_metric, mlflow.log_param, and mlflow.set_tag
- Save the model under the name model with mlflow.sklearn.log_model (the model is serialized and sent to the MLflow engine)
- Save input and output samples with mlflow.log_artifact (to persist information about input and output column names)
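Before wiring the script into Odahu, you can sanity-check the eval_metrics helper in isolation. A minimal example with toy numbers (not the wine data), assuming numpy and scikit-learn are installed as pinned in conda.yaml:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def eval_metrics(actual, pred):
    # Same helper as in train.py
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2

actual = np.array([3, 5, 7])  # toy "quality" labels
pred = np.array([3, 5, 8])    # one prediction off by 1
rmse, mae, r2 = eval_metrics(actual, pred)
print(rmse, mae, r2)  # rmse = sqrt(1/3), mae = 1/3, r2 = 0.875
```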
Create an MLproject file:
$ touch MLproject
Paste code into the file:
name: wine-quality-example
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      alpha: float
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py --alpha {alpha} --l1-ratio {l1_ratio}"
Note
Read more about MLproject structure on the official MLFlow docs.
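When MLflow runs the main entry point, it substitutes the parameter values into the command template. The substitution can be illustrated with plain Python string formatting (a simplification: the real mechanism also validates parameter types and applies defaults such as l1_ratio: 0.1):

```python
# How MLproject parameters end up on the train.py command line (illustration only).
command = "python train.py --alpha {alpha} --l1-ratio {l1_ratio}"

rendered = command.format(alpha=0.5, l1_ratio=0.1)
print(rendered)  # python train.py --alpha 0.5 --l1-ratio 0.1
```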
Create a conda environment file:
$ touch conda.yaml
Paste code to the created file:
name: example
channels:
  - defaults
dependencies:
  - python=3.6
  - numpy=1.14.3
  - pandas=0.22.0
  - scikit-learn=0.19.1
  - pip:
    - mlflow==1.0.0
Note
All Python packages used in the training script must be listed in the conda.yaml file.
Read more about conda environment on the official conda docs.
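If you are unsure which versions to pin, you can print the versions installed in the environment you tested with (assuming numpy, pandas, and scikit-learn are importable) and copy them into conda.yaml:

```python
# Print installed versions so conda.yaml can pin the environment you tested with.
import numpy
import pandas
import sklearn

for name, module in [("numpy", numpy),
                     ("pandas", pandas),
                     ("scikit-learn", sklearn)]:
    print(f"{name}={module.__version__}")
```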
Make directory “data” and download the wine data set:
$ mkdir ./data
$ wget https://raw.githubusercontent.com/odahu/odahu-examples/develop/mlflow/sklearn/wine/data/wine-quality.csv -O ./data/wine-quality.csv
After this step the project folder should look like this:
.
├── MLproject
├── conda.yaml
├── data
│ └── wine-quality.csv
└── train.py
Setup connections¶
- Before: Odahu cluster that meets prerequisites
- After: Odahu cluster with Connections
Odahu Platform uses the concept of Connections to manage authorizations to external services and data.
This tutorial requires three Connections:
- A GitHub repository, where the code is located
- A Google Cloud Storage folder, where input data is located (wine-quality.csv)
- A Docker registry, where the trained and packaged model will be stored for later use
You can find more detailed documentation about a connection configuration here.
Create a Connection to GitHub repository¶
Because the odahu-examples repository already contains the required code, we will use it in this tutorial. Feel free to create and use a new repository instead if you prefer.
Odahu is REST-powered, so in this tutorial we encode the REST payloads as YAML files. Create a directory where the payload files will be staged:
$ mkdir ./odahu-flow
Create payload:
$ touch ./odahu-flow/vcs_connection.odahu.yaml
Paste code into the created file:
kind: Connection
id: odahu-flow-tutorial
spec:
  type: git
  uri: git@github.com:odahu/odahu-examples.git
  reference: origin/master
  keySecret: <paste your base64-encoded GitHub ssh key here>
  description: Git repository with odahu-flow-examples
  webUILink: https://github.com/odahu/odahu-examples
Note
Read more about GitHub ssh keys
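The keySecret field expects the private key base64-encoded as a single line. A sketch using Python's standard library (you can equally use the base64 command-line tool); the key path below is an example, not a requirement:

```python
# Base64-encode an ssh private key for the keySecret field.
# '~/.ssh/odahu_tutorial' is an example path -- use the deploy key you created
# for your repository.
import base64
from pathlib import Path

def encode_key(path: str) -> str:
    """Read a key file and return its single-line base64 encoding."""
    raw = Path(path).expanduser().read_bytes()
    return base64.b64encode(raw).decode("ascii")

# Demonstration with an in-memory value instead of a real key file:
encoded = base64.b64encode(b"-----BEGIN OPENSSH PRIVATE KEY-----\n...").decode("ascii")
print(encoded)
# encode_key("~/.ssh/odahu_tutorial") would produce the same kind of string
```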
Create a Connection using the Odahu-flow CLI:
$ odahuflowctl conn create -f ./odahu-flow/vcs_connection.odahu.yaml
Or create a Connection using Plugin for JupyterLab:
- Open JupyterLab (available at <your.cluster.base.address>/jupyterhub)
- Navigate to 'File Browser' (folder icon)
- Select the file ./odahu-flow/vcs_connection.odahu.yaml and press the submit button in the context menu
Create Connection to wine-quality.csv object storage¶
Create payload:
$ touch ./odahu-flow/wine_connection.odahu.yaml
Paste this code into the file:
kind: Connection
id: wine-tutorial
spec:
  type: gcs
  uri: gs://<paste your bucket address here>/data-tutorial/wine-quality.csv
  region: <paste region here>
  keySecret: <paste base64-encoded key secret here> # should be enclosed in single quotes
  description: Wine dataset
Create a connection using the Odahu-flow CLI or Plugin for JupyterLab, as in the previous example.
If wine-quality.csv is not in the GCS bucket yet, use this command:
$ gsutil cp ./data/wine-quality.csv gs://<bucket-name>/data-tutorial/
Create a Connection to a docker registry¶
Create payload:
$ touch ./odahu-flow/docker_connection.odahu.yaml
Paste this code into the file:
kind: Connection # type of payload
id: docker-tutorial
spec:
  type: docker
  uri: <paste the URI of your registry here> # uri to docker image registry
  username: <paste your username here>
  password: <paste your base64-encoded password here>
  description: Docker registry for model packaging
Create the connection using Odahu-flow CLI or Plugin for JupyterLab, as in the previous example.
Check that all Connections were created successfully. The list of Connections should contain:
- id: docker-tutorial
  description: Docker repository for model packaging
  type: docker
- id: odahu-flow-tutorial
  description: Git repository with odahu-flow-tutorial
  type: git
- id: models-output
  description: Storage for trained artifacts
  type: gcs
- id: wine-tutorial
  description: Wine dataset
  type: gcs
Congrats! You are now ready to train the model.
Train the model¶
- Before: Project code, hosted on GitHub
- After: Trained GPPI model (a Trained Model Binary)
Create payload:
$ touch ./odahu-flow/training.odahu.yaml
Paste code into the file:
kind: ModelTraining
id: wine-tutorial
spec:
  model:
    name: wine
    version: 1.0
  toolchain: mlflow # MLFlow training toolchain integration
  entrypoint: main
  workDir: mlflow/sklearn/wine # MLproject location (in GitHub)
  data:
    # Where to save a local copy of wine-quality.csv from the wine-tutorial GCP connection
    - connection: wine-tutorial
      localPath: mlflow/sklearn/wine/wine-quality.csv
  hyperParameters:
    alpha: "1.0"
  resources:
    limits:
      cpu: 4
      memory: 4Gi
    requests:
      cpu: 2
      memory: 2Gi
  algorithmSource:
    vcs:
      connection: odahu-flow-tutorial
In this file, we:
- toolchain: set the Odahu toolchain's name to mlflow
- entrypoint: reference the main method in entry_points (defined in the MLproject file)
- workDir: point to the MLFlow project directory (the directory that contains the MLproject file)
- data: define the input data
  - connection: the id of the Connection from wine_connection.odahu.yaml (created in the previous step)
  - localPath: the path, relative to the Git repository root, where the data file is placed in the training (Docker) container
- hyperParameters: input hyperparameters, defined in the MLproject file and passed to the main method
- algorithmSource: define the training source code
  - vcs: used when the source code is located in a repository (use objectStorage instead when it is in object storage; do not use both)
  - connection: the id of the Connection from vcs_connection.odahu.yaml (created in the previous step)
Train using Odahu-flow CLI:
$ odahuflowctl training create -f ./odahu-flow/training.odahu.yaml
Check Train logs:
$ odahuflowctl training logs --id wine-tutorial
The Train process will finish after some time.
To check the status run:
$ odahuflowctl training get --id wine-tutorial
When the Train process finishes, the command will output YAML that includes:
- state: succeeded
- artifactName (the filename of the Trained Model Binary)
Or Train using the Plugin for JupyterLab:
- Open JupyterLab
- Open the cloned repo, and then the folder with the project
- Select the file ./odahu-flow/training.odahu.yaml and press the submit button in the context menu
You can see model logs using the Odahu cloud mode tab (cloud icon) on the left side of JupyterLab:
- Open the Odahu cloud mode tab
- Look for the TRAINING section
- Click the row with ID=wine
- Press the LOGS button to connect to the Train logs
After some time, the Train process will finish. Train status is updated in the status column of the TRAINING section in the Odahu cloud mode tab. If the model training finishes successfully, you will see status=succeeded.
Then open the Train again by clicking the appropriate row. Look at the Results section. You should see:
- artifactName (the filename of the Trained Model Binary)
artifactName is the filename of the trained model, which is in the GPPI format.
We can download it from the storage defined in the models-output Connection. (This Connection is created during Odahu Platform installation, so we did not need to create it as part of this tutorial.)
Package the model¶
- Before: The trained model in GPPI Trained Model Binary
- After: Docker image for the packaged model, including a model REST API
Create payload:
$ touch ./odahu-flow/packaging.odahu.yaml
Paste code into the file:
kind: ModelPackaging
id: wine-tutorial
spec:
  artifactName: "<fill-in>" # Use the artifact name from the Train step
  targets:
    - connectionName: docker-tutorial # Docker registry where the output image will be stored
      name: docker-push
  integrationName: docker-rest # REST API Packager
In this file, we:
- artifactName: set to the artifact name from the Train step
- connectionName: set to the Docker registry where the output will be staged
- name: specify the docker-push target
- integrationName: the id of the REST API Packager
Create a Package using Odahu-flow CLI:
$ odahuflowctl packaging create -f ./odahu-flow/packaging.odahu.yaml
Check the Package logs:
$ odahuflowctl packaging logs --id wine-tutorial
After some time, the Package process will finish.
To check the status, run:
$ odahuflowctl packaging get --id wine-tutorial
You will see YAML with the updated Package resource. Look at the status section. It contains:
- image (the name of the Docker image in the registry, containing the trained model served via REST)
Or run Package using the Plugin for JupyterLab:
- Open JupyterLab
- Open the repository that has the source code, and navigate to the folder with the MLproject file
- Select the file ./odahu-flow/packaging.odahu.yaml and press the submit button in the context menu
To view Package logs, use the Odahu cloud mode side tab in JupyterLab:
- Open the Odahu cloud mode tab
- Look for the PACKAGING section
- Click the row with ID=wine
- Click the LOGS button to view the Packaging logs
After some time, the Package process will finish. The Package status is updated in the status column of the PACKAGING section in the Odahu cloud mode tab. You should see status=succeeded.
Then open the PACKAGING entry again by clicking the appropriate row. Look at the Results section. You should see:
- image (the name of the Docker image in the registry, containing the trained model served via REST)
Deploy the model¶
- Before: Model is packaged as an image in the Docker registry
- After: Model is served via REST API from the Odahu cluster
Create payload:
$ touch ./odahu-flow/deployment.odahu.yaml
Paste code into the file:
kind: ModelDeployment
id: wine-tutorial
spec:
  image: "<fill-in>"
  predictor: odahu-ml-server
  minReplicas: 1
  imagePullConnectionID: docker-tutorial
In this file, we:
- image: set to the image that was created in the Package step
- predictor: indicates which Inference Server is used in the image (see the Predictors documentation for more)
- imagePullConnectionID: the id of the Connection used to access the container registry where the image lives
Create a Deploy using the Odahu-flow CLI:
$ odahuflowctl deployment create -f ./odahu-flow/deployment.odahu.yaml
After some time, the Deploy process will finish.
To check its status, run:
$ odahuflowctl deployment get --id wine-tutorial
Or create a Deploy using the Plugin for JupyterLab:
- Open JupyterLab
- Open the cloned repo, and then the folder with the MLproject file
- Select the file ./odahu-flow/deployment.odahu.yaml and press the submit button in the context menu
You can see Deploy logs using the Odahu cloud mode side tab in JupyterLab:
- Open the Odahu cloud mode tab
- Look for the DEPLOYMENT section
- Click the row with ID=wine
After some time, the Deploy process will finish. The Deploy status is updated in the status column of the DEPLOYMENT section in the Odahu cloud mode tab. You should see status=Ready.
Use the deployed model¶
- Step input data: the deployed model
After the model is deployed, you can check its API in Swagger:
Open <your-odahu-platform-host>/service-catalog/swagger/index.html and look at the endpoints:
- GET /model/wine-tutorial/api/model/info – the OpenAPI model specification
- POST /model/wine-tutorial/api/model/invoke – the endpoint for making predictions
You can also make predictions using the Odahu-flow CLI.
Create a payload file:
$ touch ./odahu-flow/r.json
Add a payload for /model/wine-tutorial/api/model/invoke according to the OpenAPI schema. In this payload we provide values for the model's input variables:
{
  "columns": [
    "fixed acidity",
    "volatile acidity",
    "citric acid",
    "residual sugar",
    "chlorides",
    "free sulfur dioxide",
    "total sulfur dioxide",
    "density",
    "pH",
    "sulphates",
    "alcohol"
  ],
  "data": [
    [
      7,
      0.27,
      0.36,
      20.7,
      0.045,
      45,
      170,
      1.001,
      3,
      0.45,
      8.8
    ]
  ]
}
Invoke the model to make a prediction:
$ odahuflowctl model invoke --mr wine-tutorial --json-file ./odahu-flow/r.json
{"prediction": [6.0], "columns": ["quality"]}
Congrats! You have completed the tutorial.