Cluster Quickstart

In this tutorial you will learn how to Train, Package, and Deploy a model from scratch on Odahu. Once deployed, the model serves RESTful requests and makes predictions from user input.

Odahu’s API server performs Train, Package, and Deploy operations for you, using its REST API.

Prerequisites

To complete this tutorial you will need:

  • A running Odahu cluster
  • The Odahu-flow CLI (odahuflowctl) or the Plugin for JupyterLab
  • A GitHub account with an SSH key
  • A Google Cloud Storage bucket
  • A Docker registry

Tutorial

In this tutorial, you will learn how to:

  1. Create an MLFlow project
  2. Set up Connections
  3. Train a model
  4. Package the model
  5. Deploy the packaged model
  6. Use the deployed model

This tutorial uses a dataset to predict the quality of wine based on quantitative features like fixed acidity, pH, residual sugar, and so on.

Code for the tutorial is available on GitHub.

Create MLFlow project

Before: Odahu cluster that meets prerequisites
After: Model code that predicts wine quality

Create a new project folder:

$ mkdir wine && cd wine

Create a training script:

$ touch train.py

Paste code into the file:

train.py
import os
import warnings
import sys
import argparse

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn

def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2

if __name__ == "__main__":
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    parser = argparse.ArgumentParser()
    parser.add_argument('--alpha')
    parser.add_argument('--l1-ratio')
    args = parser.parse_args()

    # Read the wine-quality CSV file (expected to sit next to this script)
    wine_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "wine-quality.csv")
    data = pd.read_csv(wine_path)

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    alpha = float(args.alpha)
    l1_ratio = float(args.l1_ratio)

    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)

        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)
        mlflow.set_tag("test", '13')

        mlflow.sklearn.log_model(lr, "model")

        # Persist samples (input and output)
        train_x.head().to_pickle('head_input.pkl')
        mlflow.log_artifact('head_input.pkl', 'model')
        train_y.head().to_pickle('head_output.pkl')
        mlflow.log_artifact('head_output.pkl', 'model')

In this file, we:

  • Start an MLflow run context with mlflow.start_run()
  • Train the ElasticNet model with lr.fit(...)
  • Log metrics, parameters, and tags with mlflow.log_metric, mlflow.log_param, and mlflow.set_tag
  • Save the model under the name model with mlflow.sklearn.log_model (the model is serialized and sent to the MLflow engine)
  • Save input and output samples with mlflow.log_artifact (to persist information about the input and output column names)

Create an MLproject file:

$ touch MLproject

Paste code into the file:

MLproject
name: wine-quality-example
conda_env: conda.yaml
entry_points:
    main:
        parameters:
            alpha: float
            l1_ratio: {type: float, default: 0.1}
        command: "python train.py --alpha {alpha} --l1-ratio {l1_ratio}"

Note

Read more about MLproject structure on the official MLFlow docs.

Create a conda environment file:

$ touch conda.yaml

Paste code into the file:

conda.yaml
name: example
channels:
  - defaults
dependencies:
  - python=3.6
  - numpy=1.14.3
  - pandas=0.22.0
  - scikit-learn=0.19.1
  - pip:
    - mlflow==1.0.0

Note

All Python packages used in the training script must be listed in the conda.yaml file.

Read more about conda environments in the official conda docs.
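
If you plan to run train.py locally, you can create this environment with conda (a sketch, assuming a local conda installation):

$ conda env create -f conda.yaml
$ conda activate example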

Create a data directory and download the wine dataset:

$ mkdir ./data
$ wget https://raw.githubusercontent.com/odahu/odahu-examples/develop/mlflow/sklearn/wine/data/wine-quality.csv -O ./data/wine-quality.csv
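
You can peek at the first rows to confirm the download and see the column names:

$ head -n 2 ./data/wine-quality.csv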

After this step the project folder should look like this:

.
├── MLproject
├── conda.yaml
├── data
│   └── wine-quality.csv
└── train.py
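
Optionally, you can sanity-check the project locally with the MLflow CLI before training on the cluster. Note that train.py looks for wine-quality.csv next to itself, so copy the file into the project root first (a local-run sketch, assuming mlflow is installed on your machine):

$ cp ./data/wine-quality.csv .
$ mlflow run . -P alpha=0.5 -P l1_ratio=0.1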

Set up Connections

Before: Odahu cluster that meets prerequisites
After: Odahu cluster with Connections

Odahu Platform uses the concept of Connections to manage authorizations to external services and data.

This tutorial requires three Connections:

  • A GitHub repository, where the code is located
  • A Google Cloud Storage folder, where input data is located (wine-quality.csv)
  • A Docker registry, where the trained and packaged model will be stored for later use

You can find more detailed documentation about a connection configuration here.

Create a Connection to GitHub repository

Because the odahu-examples repository already contains the required code, we will use it directly. Feel free to create and use your own repository instead.

Odahu is REST-powered, so in this tutorial we encode the REST payloads as YAML files. Create a directory where the payload files will be staged:

$ mkdir ./odahu-flow

Create payload:

$ touch ./odahu-flow/vcs_connection.odahu.yaml

Paste code into the created file:

vcs_connection.odahu.yaml
kind: Connection
id: odahu-flow-tutorial
spec:
  type: git
  uri: git@github.com:odahu/odahu-examples.git
  reference: origin/master
  keySecret: <paste your base64-encoded GitHub SSH private key here>
  description: Git repository with odahu-flow-examples
  webUILink: https://github.com/odahu/odahu-examples

Note

Read more about GitHub SSH keys in the official GitHub docs.
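
keySecret expects the private key base64-encoded into a single line. One way to produce it (a sketch, assuming GNU coreutils and a key at the default path; the same approach works for the GCS key secret in the next step):

$ base64 -w0 ~/.ssh/id_rsa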

Create a Connection using the Odahu-flow CLI:

$ odahuflowctl conn create -f ./odahu-flow/vcs_connection.odahu.yaml

Or create a Connection using Plugin for JupyterLab:

  1. Open JupyterLab (available at <your.cluster.base.address>/jupyterhub)
  2. Navigate to ‘File Browser’ (folder icon)
  3. Select the file ./odahu-flow/vcs_connection.odahu.yaml and press the submit button in the context menu

Create a Connection to the wine-quality.csv object storage

Create payload:

$ touch ./odahu-flow/wine_connection.odahu.yaml

Paste this code into the file:

wine_connection.odahu.yaml
kind: Connection
id: wine-tutorial
spec:
  type: gcs
  uri: gs://<paste your bucket address here>/data-tutorial/wine-quality.csv
  region: <paste region here>
  keySecret: <paste base64-encoded key secret here>  # should be enclosed in single quotes
  description: Wine dataset

Create a connection using the Odahu-flow CLI or Plugin for JupyterLab, as in the previous example.

If wine-quality.csv is not in the GCS bucket yet, use this command:

$ gsutil cp ./data/wine-quality.csv gs://<bucket-name>/data-tutorial/

Create a Connection to a docker registry

Create payload:

$ touch ./odahu-flow/docker_connection.odahu.yaml

Paste this code into the file:

docker_connection.odahu.yaml
kind: Connection  # type of payload
id: docker-tutorial
spec:
  type: docker
  uri: <paste the URI of your registry here>  # URI of the Docker image registry
  username: <paste your username here>
  password: <paste your base64-encoded password here>
  description: Docker registry for model packaging
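
A base64-encoded password can be produced like this (echo -n avoids encoding a trailing newline):

$ echo -n '<your-registry-password>' | base64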

Create the connection using Odahu-flow CLI or Plugin for JupyterLab, as in the previous example.

Check that all Connections were created successfully:

$ odahuflowctl conn get

- id: docker-tutorial
    description: Docker registry for model packaging
    type: docker
- id: odahu-flow-tutorial
    description: Git repository with odahu-flow-examples
    type: git
- id: models-output
    description: Storage for trained artifacts
    type: gcs
- id: wine-tutorial
    description: Wine dataset
    type: gcs

Congrats! You are now ready to train the model.

Train the model

Before: Project code, hosted on GitHub
After: Trained GPPI model (a Trained Model Binary)

Create payload:

$ touch ./odahu-flow/training.odahu.yaml

Paste code into the file:

./odahu-flow/training.odahu.yaml
kind: ModelTraining
id: wine-tutorial
spec:
  model:
    name: wine
    version: 1.0
  toolchain: mlflow  # MLFlow training toolchain integration
  entrypoint: main
  workDir: mlflow/sklearn/wine  # MLproject location (in GitHub)
  data:
    - connection: wine-tutorial
      # Where to save a local copy of wine-quality.csv from wine-tutorial GCP connection
      localPath: mlflow/sklearn/wine/wine-quality.csv
  hyperParameters:
    alpha: "1.0"
  resources:
    limits:
       cpu: 4
       memory: 4Gi
    requests:
       cpu: 2
       memory: 2Gi
  algorithmSource:
    vcs:
      connection: odahu-flow-tutorial

In this file, we:

  • toolchain: sets the Odahu toolchain to mlflow
  • entrypoint: references the main entry point defined in the MLproject file
  • workDir: points to the MLflow project directory (the directory that contains the MLproject file)
  • data: a section defining the input data
  • data / connection: the id of the Connection from wine_connection.odahu.yaml (created in the previous step)
  • data / localPath: the path, relative to the Git repository root, where the data file is placed inside the training (Docker) container
  • hyperParameters: input hyperparameters, defined in the MLproject file and passed to the main entry point
  • algorithmSource: a section defining the training source code
  • algorithmSource / vcs: use vcs if the source code is in a repository, or objectStorage if it is in object storage; do not use both
  • algorithmSource / vcs / connection: the id of the Connection from vcs_connection.odahu.yaml (created in the previous step)

Train using Odahu-flow CLI:

$ odahuflowctl training create -f ./odahu-flow/training.odahu.yaml

Check Train logs:

$ odahuflowctl training logs --id wine-tutorial

The Train process will finish after some time.

To check the status run:

$ odahuflowctl training get --id wine-tutorial

When the Train process finishes, the command outputs YAML for the updated Train resource, including the status and the artifactName of the Trained Model Binary.
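
To extract just the artifact name from that output, a simple filter works (a sketch, assuming the status YAML contains an artifactName: <value> line):

$ odahuflowctl training get --id wine-tutorial | grep artifactName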

Or Train using the Plugin for JupyterLab:

  1. Open JupyterLab
  2. Open the cloned repository, then the folder with the project
  3. Select the file ./odahu-flow/training.odahu.yaml and press the submit button in the context menu

You can view Train logs using the Odahu cloud mode tab (cloud icon) on the left side of JupyterLab:

  1. Open the Odahu cloud mode tab
  2. Look for the TRAINING section
  3. Click the row with ID=wine-tutorial
  4. Click the LOGS button to connect to the Train logs

After some time, the Train process will finish. The Train status is updated in the status column of the TRAINING section in the Odahu cloud mode tab. If the model training finishes successfully, you will see status=succeeded.

Then open the Train again by clicking its row and look at the Results section. You should see the artifactName.

artifactName is the filename of the trained model. This model is in the GPPI format. We can download it from the storage defined in the models-output Connection. (This Connection is created during Odahu Platform installation, so we were not required to create it as part of this tutorial.)

Package the model

Before: The trained model as a GPPI Trained Model Binary
After: Docker image for the packaged model, including a model REST API

Create payload:

$ touch ./odahu-flow/packaging.odahu.yaml

Paste code into the file:

./odahu-flow/packaging.odahu.yaml
kind: ModelPackaging
id: wine-tutorial
spec:
  artifactName: "<fill-in>"  # Use artifact name from Train step
  targets:
    - connectionName: docker-tutorial  # Docker registry where the output image will be stored
      name: docker-push
  integrationName: docker-rest  # REST API Packager

In this file, we:

  • artifactName: the artifact name from the Train step (a shell one-liner for filling this in follows this list)
  • targets / connectionName: the Docker registry where the output image will be stored
  • targets / name: the docker-push target, which pushes the built image to that registry
  • integrationName: the id of the REST API Packager (docker-rest)
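
As noted in the artifactName bullet above, a shell one-liner can drop the artifact name from the Train step into this payload (a sketch; it assumes the training status YAML contains a single artifactName: <value> line):

$ ARTIFACT_NAME=$(odahuflowctl training get --id wine-tutorial | awk '/artifactName/ {print $2}')
$ sed -i "s|<fill-in>|$ARTIFACT_NAME|" ./odahu-flow/packaging.odahu.yaml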

Create a Package using Odahu-flow CLI:

$ odahuflowctl packaging create -f ./odahu-flow/packaging.odahu.yaml

Check the Package logs:

$ odahuflowctl packaging logs --id wine-tutorial

After some time, the Package process will finish.

To check the status, run:

$ odahuflowctl packaging get --id wine-tutorial

You will see YAML for the updated Package resource. Look at the status section. You should see:

  • image: the name of the Docker image in the registry that serves the trained model via REST

Or run the Package using the Plugin for JupyterLab:

  1. Open JupyterLab
  2. Open the repository with the source code and navigate to the folder with the MLproject file
  3. Select the file ./odahu-flow/packaging.odahu.yaml and press the submit button in the context menu

To view Package logs, use the Odahu cloud mode side tab in JupyterLab:

  1. Open the Odahu cloud mode tab
  2. Look for the PACKAGING section
  3. Click the row with ID=wine-tutorial
  4. Click the LOGS button and view the Package logs

After some time, the Package process will finish. The Package status is updated in the status column of the PACKAGING section in the Odahu cloud mode tab. You should see status=succeeded.

Then open the PACKAGING entry again by clicking its row. Look at the Results section. You should see:

  • image: the name of the Docker image in the registry with the trained model served as a REST service

Deploy the model

Before: Model is packaged as an image in the Docker registry
After: Model is served via REST API from the Odahu cluster

Create payload:

$ touch ./odahu-flow/deployment.odahu.yaml

Paste code into the file:

./odahu-flow/deployment.odahu.yaml
kind: ModelDeployment
id: wine-tutorial
spec:
  image: "<fill-in>"
  predictor: odahu-ml-server
  minReplicas: 1
  imagePullConnectionID: docker-tutorial

In this file, we:

  • image: the image created in the Package step (a shell one-liner for filling this in follows this list)
  • predictor: indicates which Inference Server is used in the image; see the Predictors documentation for more
  • imagePullConnectionID: the Connection used to pull the image from the container registry
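
As noted in the image bullet above, the image name can be filled in the same way as the artifact name was for packaging (a sketch; it assumes the packaging status YAML contains a single image: <value> line):

$ IMAGE=$(odahuflowctl packaging get --id wine-tutorial | awk '/image:/ {print $2}')
$ sed -i "s|<fill-in>|$IMAGE|" ./odahu-flow/deployment.odahu.yaml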

Create a Deploy using the Odahu-flow CLI:

$ odahuflowctl deployment create -f ./odahu-flow/deployment.odahu.yaml

After some time, the Deploy process will finish.

To check its status, run:

$ odahuflowctl deployment get --id wine-tutorial

Or create a Deploy using the Plugin for JupyterLab:

  1. Open JupyterLab
  2. Open the cloned repository, then the folder with the MLproject file
  3. Select the file ./odahu-flow/deployment.odahu.yaml and press the submit button in the context menu

You can view Deploy logs using the Odahu cloud mode side tab in JupyterLab:

  1. Open the Odahu cloud mode tab
  2. Look for the DEPLOYMENT section
  3. Click the row with ID=wine-tutorial

After some time, the Deploy process will finish. The Deploy status is updated in the status column of the DEPLOYMENT section in the Odahu cloud mode tab. You should see status=Ready.

Use the deployed model

Step input data: The deployed model

After the model is deployed, you can check its API in Swagger:

Open <your-odahu-platform-host>/service-catalog/swagger/index.html and look at the endpoints:

  1. GET /model/wine-tutorial/api/model/info – the OpenAPI model specification
  2. POST /model/wine-tutorial/api/model/invoke – the endpoint for making predictions

You can also make predictions using the Odahu-flow CLI.

Create a payload file:

$ touch ./odahu-flow/r.json

Add a payload for /model/wine-tutorial/api/model/invoke that matches the OpenAPI schema. In this payload we provide values for the model's input variables:

./odahu-flow/r.json
{
  "columns": [
    "fixed acidity",
    "volatile acidity",
    "citric acid",
    "residual sugar",
    "chlorides",
    "free sulfur dioxide",
    "total sulfur dioxide",
    "density",
    "pH",
    "sulphates",
    "alcohol"
  ],
  "data": [
    [
      7,
      0.27,
      0.36,
      20.7,
      0.045,
      45,
      170,
      1.001,
      3,
      0.45,
      8.8
    ]
  ]
}

Invoke the model to make a prediction:

$ odahuflowctl model invoke --mr wine-tutorial --json-file ./odahu-flow/r.json
{"prediction": [6.0], "columns": ["quality"]}

Congrats! You have completed the tutorial.