Connections

Odahu needs to know how to connect to a bucket, a Git repository, and so on. This kind of information is handled by the Connection API.

General connection structure

All types of connections have the same general structure, but different connection types require different sets of fields. You can find examples of specific connection types in the Connection types section. All fields are described below:

Connection API
kind: Connection
# Unique value among all connections
# Id must:
#  * contain at most 63 characters
#  * contain only lowercase alphanumeric characters or ‘-’
#  * start with an alphanumeric character
#  * end with an alphanumeric character
id: "id-12345"
spec:
    # Optional description of a connection
    description: "Some description"
    # Optional link to a web resource, for example a git repo or a GCP bucket
    webUILink: https://test.org/123
    # URI. It is a required value.
    uri: s3://some-bucket/path/file
    # Type of a connection. Available values: s3, gcs, azureblob, git, docker, ecr.
    type: s3
    # Username
    username: admin
    # Password, must be base64-encoded (before encoding: admin)
    password: YWRtaW4=
    # Service account role
    role: some-role
    # AWS region or GCP project
    region: some region
    # VCS reference
    reference: develop
    # Key ID, must be base64-encoded (before encoding: 1234567890)
    keyID: MTIzNDU2Nzg5MA==
    # SSH or service account secret, must be base64-encoded
    keySecret: b2RhaHUK
    # SSH public key, must be base64-encoded
    publicKey: b2RhaHUK
    # Defines if connection is vital. Vital connections cannot be deleted
    vital: false
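
The ID rules listed in the comments above can be checked before a connection is submitted. The following is a minimal sketch in Python, not part of Odahu-flow itself; the function name is_valid_connection_id is hypothetical, and it assumes that "alphanumeric" means lowercase letters and digits, as the second rule suggests.

```python
import re

# One lowercase alphanumeric character, optionally followed by up to 61
# lowercase alphanumerics or '-' and a final lowercase alphanumeric.
# Total length is therefore between 1 and 63 characters.
_ID_PATTERN = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_connection_id(conn_id: str) -> bool:
    """Return True if conn_id satisfies the ID rules listed above."""
    return _ID_PATTERN.fullmatch(conn_id) is not None


print(is_valid_connection_id("id-12345"))       # True
print(is_valid_connection_id("Training-Data"))  # False: uppercase letters
print(is_valid_connection_id("training-"))      # False: ends with '-'
```

The server remains the authority on what it accepts; a local check like this only catches obvious mistakes early.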

Connection management

Connections can be managed in the following ways.

Swagger UI

Swagger UI is available at the http://api-service/swagger/index.html URL.

Odahu-flow CLI

Odahuflowctl supports the Connection API. You must be logged in to access the API.

  • Getting all connections in json format:
odahuflowctl conn get --format json
  • Getting the reference of the connection:
odahuflowctl conn get --id odahu-flow-examples -o 'jsonpath=[*].spec.reference'
  • Creating a connection from the conn.yaml file:
odahuflowctl conn create -f conn.yaml
  • All connection commands and documentation:
odahuflowctl conn --help

JupyterLab

Odahu-flow provides the JupyterLab extension for interacting with Connection API.

Connection types

For now, Odahu-flow supports the following connection types:

S3

An S3 connection allows interaction with the S3 API. This type of connection is used to store:

  • trained model artifacts.
  • input data for ML models.

Note

You can use any S3-compatible API, for example MinIO or Ceph.

Before usage, make sure that:

Note

At the moment, Odahu-flow only supports authorization through an IAM user. We will support AWS service roles and authorization using temporary credentials in the near future.

The following fields of the Connection API are required:

  • spec.type - It must be equal to s3.
  • spec.keyID - base64-encoded access key ID (for example, AKIAIOSFODNN7EXAMPLE).
  • spec.keySecret - base64-encoded secret access key (for example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY).
  • spec.uri - S3 compatible URI, for example s3://<bucket-name>/dir1/dir2/
  • spec.region - AWS Region, where a bucket was created.
Example of an S3 connection:
kind: Connection
id: "training-data"
spec:
    type: s3
    uri: s3://raw-data/model/input
    # keyID before base64-encoding: AKIAIOSFODNN7EXAMPLE
    keyID: "QUtJQUlPU0ZPRE5ON0VYQU1QTEU="
    # keySecret before base64 encoding: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    keySecret: "d0phbHJYVXRuRkVNSS9LN01ERU5HL2JQeFJmaUNZRVhBTVBMRUtFWQ=="
    description: "Training data for a model"
    region: eu-central-1
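
Because spec.keyID and spec.keySecret must be base64-encoded, it can help to generate them rather than encode them by hand. The sketch below is illustrative only: the helpers b64 and s3_connection are hypothetical names, and the resulting dict simply mirrors the fields of the example above.

```python
import base64


def b64(value: str) -> str:
    """Base64-encode a UTF-8 string and return the result as ASCII text."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def s3_connection(conn_id, uri, region, access_key_id, secret_access_key):
    """Build an S3 connection manifest as a plain dict (hypothetical helper)."""
    return {
        "kind": "Connection",
        "id": conn_id,
        "spec": {
            "type": "s3",
            "uri": uri,
            "region": region,
            # Both credentials are stored base64-encoded, as required above.
            "keyID": b64(access_key_id),
            "keySecret": b64(secret_access_key),
        },
    }


manifest = s3_connection(
    "training-data",
    "s3://raw-data/model/input",
    "eu-central-1",
    "AKIAIOSFODNN7EXAMPLE",
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
)
print(manifest["spec"]["keyID"])  # QUtJQUlPU0ZPRE5ON0VYQU1QTEU=
```

The printed keyID matches the value used in the example connection above.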

Google Cloud Storage

Google Cloud Storage allows storing and accessing data on Google Cloud Platform infrastructure. This type of connection is used to store:

  • trained model artifacts.
  • input data for ML models.

Before usage, make sure that:

Note

Workload Identity is the recommended way to access Google Cloud services from within GKE due to its improved security properties and manageability. We will support the Workload Identity in the near future.

The following fields of the Connection API are required:

  • spec.type - It must be equal to gcs.
  • spec.keySecret - base64-encoded service account key in json format.
  • spec.uri - GCS compatible URI, for example gcs://<bucket-name>/dir1/dir2/
  • spec.region - GCP Region, where a bucket was created.
Example of a GCS connection:
kind: Connection
id: "training-data"
spec:
    type: gcs
    uri: gcs://raw-data/model/input
    keySecret: ewogICAgInR5cGUiOiAic2VydmljZV9hY2NvdW50IiwKICAgICJwcm9qZWN0X2lkIjogInByb2plY3RfaWQiLAogICAgInByaXZhdGVfa2V5X2lkIjogInByaXZhdGVfa2V5X2lkIiwKICAgICJwcml2YXRlX2tleSI6ICItLS0tLUJFR0lOIFBSSVZBVEUgS0VZLS0tLS1cbnByaXZhdGVfa2V5XG4tLS0tLUVORCBQUklWQVRFIEtFWS0tLS0tXG4iLAogICAgImNsaWVudF9lbWFpbCI6ICJ0ZXN0QHByb2plY3RfaWQuaWFtLmdzZXJ2aWNlYWNjb3VudC5jb20iLAogICAgImNsaWVudF9pZCI6ICIxMjM0NTU2NzgiLAogICAgImF1dGhfdXJpIjogImh0dHBzOi8vYWNjb3VudHMuZ29vZ2xlLmNvbS9vL29hdXRoMi9hdXRoIiwKICAgICJ0b2tlbl91cmkiOiAiaHR0cHM6Ly9vYXV0aDIuZ29vZ2xlYXBpcy5jb20vdG9rZW4iLAogICAgImF1dGhfcHJvdmlkZXJfeDUwOV9jZXJ0X3VybCI6ICJodHRwczovL3d3dy5nb29nbGVhcGlzLmNvbS9vYXV0aDIvdjEvY2VydHMiLAogICAgImNsaWVudF94NTA5X2NlcnRfdXJsIjogImh0dHBzOi8vd3d3Lmdvb2dsZWFwaXMuY29tL3JvYm90L3YxL21ldGFkYXRhL3g1MDkvdGVzdEBwcm9qZWN0X2lkLmlhbS5nc2VydmljZWFjY291bnQuY29tIgp9
    description: "Training data for a model"
    region: us-central2
The original service account JSON key used in the example above, before base64-encoding:
{
    "type": "service_account",
    "project_id": "project_id",
    "private_key_id": "private_key_id",
    "private_key": "-----BEGIN PRIVATE KEY-----\nprivate_key\n-----END PRIVATE KEY-----\n",
    "client_email": "test@project_id.iam.gserviceaccount.com",
    "client_id": "123455678",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/test@project_id.iam.gserviceaccount.com"
}
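
A common mistake is to paste a key that is not valid JSON, or not base64-encoded at all. The following sketch shows one way to encode a service account key and to sanity-check an encoded value before using it as spec.keySecret. Both function names are hypothetical, and the check on the "type" field is an assumption based on the example key above.

```python
import base64
import json


def encode_service_account_key(key: dict) -> str:
    """Serialize a service account key to JSON and base64-encode it."""
    raw = json.dumps(key, indent=4).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_service_account_key(key_secret: str) -> dict:
    """Decode a keySecret value; raise ValueError if it is not usable."""
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(key_secret, validate=True)
    key = json.loads(raw)
    if not isinstance(key, dict) or key.get("type") != "service_account":
        raise ValueError("keySecret does not contain a service account key")
    return key


# Placeholder key, shortened from the example above.
example_key = {"type": "service_account", "project_id": "project_id"}
secret = encode_service_account_key(example_key)
print(decode_service_account_key(secret)["project_id"])  # project_id
```

Invalid base64 and invalid JSON both surface as ValueError, so a caller needs only one except clause.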

Azure Blob storage

Odahu-flow uses Azure Blob storage to store:

  • trained model artifacts.
  • input data for ML models.

Before usage, make sure that:

The following fields of the Connection API are required:

  • spec.type - It must be equal to azureblob.
  • spec.keySecret - Odahu-flow uses the shared access signatures to authorize in Azure. The key has the following format: “<primary_blob_endpoint>/<sas_token>” and must be base64-encoded.
  • spec.uri - Azure storage compatible URI, for example <bucket-name>/dir1/dir2/
Example of an Azure Blob storage connection:
kind: Connection
id: "training-data"
spec:
    type: azureblob
    uri: raw-data/model/input
    # keySecret before base64-encoding: https://myaccount.blob.core.windows.net/?restype=service&comp=properties&sv=2019-02-02&ss=bf&srt=s&st=2019-08-01T22%3A18%3A26Z&se=2019-08-10T02%3A23%3A26Z&sr=b&sp=rw&sip=168.1.5.60-168.1.5.70&spr=https&sig=F%6GRVAZ5Cdj2Pw4tgU7IlSTkWgn7bUkkAg8P6HESXwmf%4B
    keySecret: aHR0cHM6Ly9teWFjY291bnQuYmxvYi5jb3JlLndpbmRvd3MubmV0Lz9yZXN0eXBlPXNlcnZpY2UmY29tcD1wcm9wZXJ0aWVzJnN2PTIwMTktMDItMDImc3M9YmYmc3J0PXMmc3Q9MjAxOS0wOC0wMVQyMiUzQTE4JTNBMjZaJnNlPTIwMTktMDgtMTBUMDIlM0EyMyUzQTI2WiZzcj1iJnNwPXJ3JnNpcD0xNjguMS41LjYwLTE2OC4xLjUuNzAmc3ByPWh0dHBzJnNpZz1GJTZHUlZBWjVDZGoyUHc0dGdVN0lsU1RrV2duN2JVa2tBZzhQNkhFU1h3bWYlNEI=
    description: "Training data for a model"
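
The keySecret format "<primary_blob_endpoint>/<sas_token>" can be assembled programmatically. The sketch below is an illustration under stated assumptions: the function name azure_key_secret is hypothetical, and it assumes the SAS token should start with "?" after the slash, as in the example above.

```python
import base64


def azure_key_secret(primary_blob_endpoint: str, sas_token: str) -> str:
    """Join endpoint and SAS token as "<endpoint>/<token>" and base64-encode."""
    # Avoid a double slash when the endpoint already ends with "/".
    endpoint = primary_blob_endpoint.rstrip("/")
    # Assumption: the token carries a leading "?", as in the example above.
    token = sas_token if sas_token.startswith("?") else "?" + sas_token
    joined = f"{endpoint}/{token}"
    return base64.b64encode(joined.encode("utf-8")).decode("ascii")


secret = azure_key_secret(
    "https://myaccount.blob.core.windows.net/",
    "sv=2019-02-02&sp=rw",  # shortened placeholder token, not a real signature
)
print(base64.b64decode(secret).decode("utf-8"))
# https://myaccount.blob.core.windows.net/?sv=2019-02-02&sp=rw
```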

GIT

Odahu-flow uses the GIT connection type to download ML source code from a Git repository.

The following fields of the Connection API are required:

  • spec.type - It must be equal to git.
  • spec.keySecret - base64-encoded SSH private key.
  • spec.uri - GIT SSH URL, for example git@github.com:odahu/odahu-examples.git

spec.reference must be provided either in a connection OR in a model training object (General training structure).

Example of a command to encode an SSH key:

cat ~/.ssh/id_rsa | base64 -w0

Note

Odahu-flow only supports authorization through SSH.

Warning

We recommend using read-only deploy keys: GitHub docs or GitLab docs.

Example of GIT Connection:
kind: Connection
id: "ml-repository"
spec:
    type: git
    uri: git@github.com:odahu/odahu-examples.git
    keySecret: ClNVUEVSIFNFQ1JFVAoK
    reference: master
    description: "Git repository with the Odahu-Flow examples"
    webUILink: https://github.com/odahu/odahu-examples
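
Since only SSH authorization is supported, an HTTPS clone URL in spec.uri is an easy mistake to make. The following sketch checks the URL form before a connection is created. The function name is_ssh_git_url is hypothetical; it accepts the scp-like form used in the example above and, as an assumption, also the ssh:// form (whether Odahu-flow accepts the latter is not stated here).

```python
import re

# scp-like syntax: user@host:path, where the path does not start with "//".
_SCP_LIKE = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+:(?!//)\S+")


def is_ssh_git_url(uri: str) -> bool:
    """Return True for scp-like or ssh:// Git URLs, False otherwise."""
    if uri.startswith("ssh://"):
        # Require something after the scheme.
        return len(uri) > len("ssh://")
    return _SCP_LIKE.fullmatch(uri) is not None


print(is_ssh_git_url("git@github.com:odahu/odahu-examples.git"))      # True
print(is_ssh_git_url("https://github.com/odahu/odahu-examples.git"))  # False
```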

Docker

This type of connection is used for pushing the Docker images produced by the Odahu packager to a Docker registry and pulling them from it. We have been testing the following Docker registries:

Warning

Every Docker registry has its own authorization specifics, but you must be able to authorize with a username and password. Read the documentation.

Before usage, make sure that:

  • You have a username and password.

The following fields of the Connection API are required:

  • spec.type - It must be equal to docker.
  • spec.username - docker registry username.
  • spec.password - base64-encoded docker registry password.
  • spec.uri - docker registry host.

Warning

The connection URI must not contain a URI scheme.
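
To illustrate the warning above, the sketch below detects and removes a leading scheme such as https:// from a registry address. The function names are hypothetical, and the pattern follows the generic scheme syntax from RFC 3986 followed by "://".

```python
import re

# A URI scheme (letter, then letters/digits/"+"/"."/"-") followed by "://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def has_uri_scheme(uri: str) -> bool:
    """Return True if uri starts with a scheme such as "https://"."""
    return _SCHEME.match(uri) is not None


def strip_uri_scheme(uri: str) -> str:
    """Return uri without its leading scheme, if it has one."""
    return _SCHEME.sub("", uri, count=1)


print(strip_uri_scheme("https://gcr.io/project/odahuflow"))  # gcr.io/project/odahuflow
print(has_uri_scheme("docker.io/odahu/"))                    # False
```

A host with a port, for example localhost:5000/images, is not treated as having a scheme because it lacks the "://" separator.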

Example of a GCR Docker connection:
kind: Connection
id: "docker-registry"
spec:
    type: docker
    uri: gcr.io/project/odahuflow
    username: "_json_key"
    password: ewogICAgInR5cGUiOiAic2VydmljZV9hY2NvdW50IiwKICAgICJwcm9qZWN0X2lkIjogInByb2plY3RfaWQiLAogICAgInByaXZhdGVfa2V5X2lkIjogInByaXZhdGVfa2V5X2lkIiwKICAgICJwcml2YXRlX2tleSI6ICItLS0tLUJFR0lOIFBSSVZBVEUgS0VZLS0tLS1cbnByaXZhdGVfa2V5XG4tLS0tLUVORCBQUklWQVRFIEtFWS0tLS0tXG4iLAogICAgImNsaWVudF9lbWFpbCI6ICJ0ZXN0QHByb2plY3RfaWQuaWFtLmdzZXJ2aWNlYWNjb3VudC5jb20iLAogICAgImNsaWVudF9pZCI6ICIxMjM0NTU2NzgiLAogICAgImF1dGhfdXJpIjogImh0dHBzOi8vYWNjb3VudHMuZ29vZ2xlLmNvbS9vL29hdXRoMi9hdXRoIiwKICAgICJ0b2tlbl91cmkiOiAiaHR0cHM6Ly9vYXV0aDIuZ29vZ2xlYXBpcy5jb20vdG9rZW4iLAogICAgImF1dGhfcHJvdmlkZXJfeDUwOV9jZXJ0X3VybCI6ICJodHRwczovL3d3dy5nb29nbGVhcGlzLmNvbS9vYXV0aDIvdjEvY2VydHMiLAogICAgImNsaWVudF94NTA5X2NlcnRfdXJsIjogImh0dHBzOi8vd3d3Lmdvb2dsZWFwaXMuY29tL3JvYm90L3YxL21ldGFkYXRhL3g1MDkvdGVzdEBwcm9qZWN0X2lkLmlhbS5nc2VydmljZWFjY291bnQuY29tIgp9
The original service account JSON key used in the example above, before base64-encoding:
{
    "type": "service_account",
    "project_id": "project_id",
    "private_key_id": "private_key_id",
    "private_key": "-----BEGIN PRIVATE KEY-----\nprivate_key\n-----END PRIVATE KEY-----\n",
    "client_email": "test@project_id.iam.gserviceaccount.com",
    "client_id": "123455678",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/test@project_id.iam.gserviceaccount.com"
}
Example of a Docker Hub connection:
kind: Connection
id: "docker-registry"
spec:
    type: docker
    uri: docker.io/odahu/
    username: username
    # password before encoding: mypassword
    password: bXlwYXNzd29yZA==

Amazon Elastic Container Registry

Amazon Elastic Container Registry is a managed AWS Docker registry. This type of connection is used for pulling and pushing the Docker images produced by the Odahu packager.

Note

The Amazon Docker registry does not support long-lived credentials and requires a repository to be created explicitly for every image. These are the reasons why we created a dedicated connection type for ECR.

Before usage, make sure that:

The following fields of the Connection API are required:

  • spec.type - It must be equal to ecr.
  • spec.keyID - base64-encoded access key ID (for example, AKIAIOSFODNN7EXAMPLE).
  • spec.keySecret - base64-encoded secret access key (for example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY).
  • spec.uri - The URL must have the following format: <aws_account_id>.dkr.ecr.<region>.amazonaws.com/some-prefix.
  • spec.region - AWS Region, where a docker registry was created.
Example of an ECR connection:
kind: Connection
id: "docker-registry"
spec:
    type: ecr
    uri: 5555555555.dkr.ecr.eu-central-1.amazonaws.com/odahuflow
    # keyID before base64-encoding: "AKIAIOSFODNN7EXAMPLE"
    keyID: QUtJQUlPU0ZPRE5ON0VYQU1QTEU=
    # keySecret before base64-encoding: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    keySecret: d0phbHJYVXRuRkVNSS9LN01ERU5HL2JQeFJmaUNZRVhBTVBMRUtFWQ==
    description: "Packager registry"
    region: eu-central-1
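
The ECR URI embeds the region, so it is worth checking that it agrees with spec.region. The sketch below parses the URI format described above. The function names parse_ecr_uri and region_matches are hypothetical, and the pattern accepts any number of digits for the account ID so that the example value above still parses.

```python
import re

# <aws_account_id>.dkr.ecr.<region>.amazonaws.com[/<prefix>]
_ECR_URI = re.compile(
    r"(?P<account>\d+)\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com"
    r"(?:/(?P<prefix>.+))?"
)


def parse_ecr_uri(uri: str) -> dict:
    """Split an ECR URI into account, region and prefix (prefix may be None)."""
    match = _ECR_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not an ECR URI: {uri!r}")
    return match.groupdict()


def region_matches(uri: str, region: str) -> bool:
    """Return True if the region inside the URI equals spec.region."""
    return parse_ecr_uri(uri)["region"] == region


uri = "5555555555.dkr.ecr.eu-central-1.amazonaws.com/odahuflow"
print(parse_ecr_uri(uri)["prefix"])         # odahuflow
print(region_matches(uri, "eu-central-1"))  # True
```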