Changelog

Odahu 1.6.0, 3 September 2021

Features:

  • Core:
    • MLflow artifact storage now works correctly with Google Cloud and Amazon cloud storage.

Bug Fixes:

  • Core:
    • Model feedback & Triton model logs are now stored with model name/version (#607,).

Odahu 1.5.0, 1 August 2021

Features:

  • Core:
  • Python SDK:
    • Add clients to work with User and Feedback entities (#295).

Updates:

  • Core:
    • The model-name/model-version headers are now set at the service mesh level (#496). This relaxes the requirements for inference servers. Previously, every inference server (a model is typically packed into one at the Packaging stage) had to include these headers in its response for the feedback loop to work properly. That rule prevented the use of third-party inference servers such as NVIDIA Triton, whose response headers we cannot control.
    • Removed deprecated fields updateAt/createdAt from core API entities (#394).
    • Moved to the recommended, higher-level way of using Knative, which under the hood handles a large part of ModelDeployment functionality (#347).
  • CLI:
    • The JWT parameter of the model info and model invoke commands is renamed to token (#577).
    • Usage descriptions updated (#577).
    • Auth tokens are now refreshed automatically (#509).
  • Airflow plugin:

    • Airflow plugin operators now expect a service account’s client_secret in the password field of the Airflow Connection. Previously, they expected client_secret in the extra field (#29).

      `Breaking change!`: You must recreate all Airflow connections to the ODAHU server, moving client_secret from the extra field into the password field.

      For security reasons, remember to remove client_secret from the extra field.
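Airflow can also define connections from environment variables named AIRFLOW_CONN_<CONN_ID> that hold a connection URI, where the password component of the URI populates the Connection's password field. As a hedged sketch of the new layout, the helper below builds such a URI with client_secret in the password slot and nothing in the extra field. The helper name, the default conn_type of "http", and mapping the service account's client ID to the login field are all assumptions for illustration, not taken from the ODAHU plugin documentation.

```python
from urllib.parse import quote


def odahu_conn_uri(host: str, client_id: str, client_secret: str,
                   conn_type: str = "http") -> str:
    """Build an Airflow connection URI with client_secret in the password slot.

    Hypothetical helper: the conn_type default and putting client_id in the
    login field are assumptions for illustration, not from the ODAHU docs.
    """
    # Percent-encode credentials so characters like '@' or ':' do not break the URI.
    login = quote(client_id, safe="")
    password = quote(client_secret, safe="")
    # No query string: client_secret must no longer be stored in the extra field.
    return f"{conn_type}://{login}:{password}@{host}"


if __name__ == "__main__":
    print(odahu_conn_uri("odahu.example.com", "my-service-account", "s3cr3t"))
```

The result could then be exported under a connection ID of your choice (for example, a hypothetical AIRFLOW_CONN_ODAHU) before Airflow starts. Percent-encoding matters because a secret containing @ or : would otherwise be misparsed as part of the host or port.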

Bug Fixes:

  • Core:
    • Fix & add missing updatedAT/createdAT (#583, #600, #601, #602).
    • Training result doesn’t contain the commit ID when using object storage as the algorithm source (#584).
    • RunID is now present for model training with mlflow toolchain (#581).
    • InferenceJob objects can now be deleted correctly (#555).
    • Deployment roleName changes now apply correctly (#533).
    • The X-REQUEST-ID header is now handled correctly at the service mesh layer to support third-party inference servers (#525).
    • Fix packaging deletion via bulk delete command (#416).

Odahu 1.4.0, 27 February 2021

Features:

  • Core:
    • Triton Packaging Integration (Nvidia Triton Packager) added as a part of Triton pipeline (#437).
    • Local training & packaging are now covered by tests (#157).
    • MLflow toolchain with custom format for model training artifact (#31).
  • UI:
    • New Play tab on Deployment page provides a way to get deployed model metadata and make inference requests from the UI (#61).
    • New Logs tab on Deployment page provides a way to browse logs of deployed model (#45).
    • Users can now create packagings and deployments based on finished trainings and packagings (#38).

Updates:

  • Core:
    • Service catalog is rewritten (#457).
    • Performance of deployed ML models optimized (#357).
    • OpenPolicyAgent-based RBAC for deployed models is implemented (#238).
  • CLI:
    • Added the --disable-target option to the odahuflowctl local pack run command. It allows you to disable targets that would otherwise be passed to the packager process. The option can be repeated, for example: odahuflowctl local pack run ... --disable-target=docker-pull --disable-target=docker-push.
    • Options --disable-package-targets/--no-disable-package-targets for odahuflowctl local pack run command are deprecated.
    • The default behavior of odahuflowctl local pack run, which implicitly disables all targets, is deprecated.

Bug Fixes:

  • Core:
    • Knative doesn’t create multiple releases anymore when using multiple node pools (#434).
    • The lowest allowed values for liveness & readiness probes are now 0 instead of 1 (#442).
    • Correct error code now returned on failed deployment validation (#441).
    • An empty uri param is no longer validated for the ecr connection type (#440).
    • Return correct error when the uri param is missing for the git connection type (#436).
    • Return correct error when user has insufficient privileges (#444).
    • The default branch is now used for a VCS connection if the user does not provide one (#148).
  • UI:
    • Auto-generated predictor value doesn’t show warning on deploy creation (#80).
    • Default deploy liveness & readiness delays are unified with server values (#74).
    • Deployment doesn’t raise error when valid predictor value passed (#46).
    • Sorting for some columns fixed (#48).
    • Secrets are now masked on review stage of connection creation (#42).
    • The interface now works as expected with long fields on the edit connection page (#65).

Odahu 1.3.0, 7 October 2020

Features:

  • Core:
    • Persistence Agent added to synchronize k8s CRDs into the main storage (#268).
    • All secrets passed to the ODAHU API must now be base64-encoded. Decrypted secrets retrieved from the ODAHU API via /connection/:id/decrypted are now base64-encoded as well (#181, #308).
    • Positive and negative (for 404 & 409 status codes) API tests via odahuflow SDK added (#247).
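Since secrets now travel base64-encoded in both directions, a client has to encode a secret before sending it to the ODAHU API and decode values returned by /connection/:id/decrypted. The following minimal sketch uses only the Python standard library; the helper names are hypothetical (not part of the ODAHU SDK), and it assumes secrets are UTF-8 text.

```python
import base64


def encode_secret(plain: str) -> str:
    """Base64-encode a UTF-8 secret before sending it to the ODAHU API.

    Hypothetical helper for illustration; not part of the ODAHU SDK.
    """
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_secret(encoded: str) -> str:
    """Decode a base64 secret, e.g. one returned by /connection/:id/decrypted.

    Hypothetical helper; assumes the original secret was UTF-8 text.
    validate=True rejects input containing non-base64 characters.
    """
    return base64.b64decode(encoded, validate=True).decode("utf-8")


if __name__ == "__main__":
    token = encode_secret("my-password")
    print(token)
    print(decode_secret(token))
```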

Updates:

  • Core:
    • Robot tests will now output pods state after each API call to simplify debugging.

Bug Fixes:

  • Core:
    • Refactoring: some abstractions & components were renamed and moved to separate packages to facilitate future development.
    • For connection create/update operations, the ODAHU API now masks secrets in the response body.
    • Rclone output will not reveal secrets on unit test setup stage anymore.
    • Output-dir option path is now absolute (#208).
    • Respect artifactNameTemplate for local training result directory name (#193).
    • Allow passing an Azure Blob URI without a scheme on connection creation (#345).
    • Validate model deployment ID to ensure it starts with an alphabetic character (#294).
  • UI:
    • State of resources now updates correctly after changing in UI (#11).
    • Users can no longer submit a training when the resource request exceeds the limit (#355).
    • Mask secrets on the review page during the connection creation process (#42).
    • UI now responds correctly in case of concurrent deletion of entities (#44).
    • Additional validation added to prevent creation of resources with unsupported names (#342, #34).
    • Sorting added for training & packaging views (#13, #48).
    • The reference field is now optional for VCS connections (#50).
    • Git connection hint fixed (#7).
  • CLI:
    • Configuration secrets are now masked in config output (#307).
    • Local model output path will now display correctly (#371).
    • Local training output will now print only local training results (#370).
    • Help message fixed for odahuflowctl gppi command (#375).
  • SDK:
    • All API connection errors now should be correctly handled and retried.
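The model deployment ID rule from #294 can be illustrated with a small check. This is a hedged sketch, not ODAHU's validator: it enforces only the leading-character rule stated above and treats ASCII letters as alphabetic. The real validation may apply further constraints (allowed characters, length) that this changelog does not list, and the helper name is hypothetical.

```python
import string

# Assumption: "alphabetic" means an ASCII letter; ODAHU's actual rule may differ.
ASCII_LETTERS = set(string.ascii_letters)


def starts_with_letter(deployment_id: str) -> bool:
    """Return True if the ID is non-empty and its first character is an ASCII letter.

    Hypothetical helper illustrating the rule from #294; it does not check any
    other constraints the ODAHU API may impose.
    """
    return bool(deployment_id) and deployment_id[0] in ASCII_LETTERS


if __name__ == "__main__":
    for candidate in ["wine-model", "1-wine-model", "", "-model"]:
        print(repr(candidate), starts_with_letter(candidate))
```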

Odahu 1.2.0, 21 August 2020

Features:

  • Core:
    • PostgreSQL became the main database backend as part of increasing project maturity (#175). You can find additional documentation in the instructions.
  • ODAHU CLI:
    • Added the --ignore-if-exist option for entity creation (#199).
    • Descriptions updated for commands & options (#160, #197, #209).
  • ODAHU UI:
    • ODAHU UI is now open-source software, available on GitHub under the Apache License, Version 2.0. ODAHU UI is a web interface for ODAHU based on React and TypeScript. It provides an overview and controls for ODAHU workflows, log browsing, and entity management.

Updates:

  • Knative updated to version 0.15.0. That makes it possible to deploy model services to different node pools (#123).
  • Go dependencies were globally updated to migrate from GOPATH to Go modules (#32).

Bug Fixes:

  • Core:
    • Training now fails if a wrong data path or a non-existent storage bucket name is provided (#229).
    • Training log streaming is now working on log view when using native log viewer (#234).
    • ODAHU pods are now redeployed during a Helm chart upgrade (#111).
    • ODAHU docker connection can now be created with a blank username & password to install from a public Docker repository (#184).
  • ODAHU CLI:
    • Return training artifacts list sorted by name (#165).
    • Don’t output logs for bulk command (#200).
    • Fix local pack cleanup-containers command (#204).
    • Return correct message if entity not found (#210).
    • Return correct message if no options provided (#211).
  • ODAHU UI:
    • Fix description of replicas of Model Deployment.
    • Trim spaces for input values.
    • Fix incorrect selection of VCS connection.
    • Close ‘ODAHU components’ menu after opening link in it.

Odahu 1.1.0, 16 March 2020

New Features:

  • Jupyterhub:
    Added JupyterHub support to our deployment scripts. JupyterHub allows spawning multiple instances of the JupyterLab server. By default, we provide the prebuilt ODAHU JupyterLab plugin in the following Docker images: base-notebook, datascience-notebook, and tensorflow-notebook. To build a custom image, you can use our Docker image template or follow the instructions.
  • GPU:
    Added the ability to deploy a model training on GPU nodes. You can find an example of training here. This is one of the official MLFlow examples that classifies flower species from photos.
  • Security:
    We integrated our WEB API services with Open Policy Agent, which allows flexible management of ODAHU RBAC. Using Istio, we forbid unauthorized access to our services. You can find the ODAHU security documentation here.
  • Vault:
    ODAHU-Flow has the Connection API that allows managing credentials for Git repositories, cloud storage, Docker registries, and so on. The default backend for the Connection API is Kubernetes. We integrated Vault as a storage backend for the Connection API so that you can manage your credentials securely.
  • Helm 3:
    We migrated our Helm charts to Helm 3. The main goals were to simplify deployment to OpenShift and to get rid of Tiller.
  • ODAHU UI:
    ODAHU UI provides a user interface for the ODAHU components in a browser. It allows you to manage and view ODAHU Connections, Trainings, Deployments, and so on.
  • Local training and packaging:
    You can train and package an ML model with the odahuflowctl utility using the same ODAHU manifests that you use for cluster training and packaging. The whole process is described here.
  • Cache for training and packaging:
    ODAHU Flow downloads your dependencies on every model training and packaging launch. To avoid this, you can provide a prebuilt Docker image with dependencies. Read more about model training and packaging.
  • Training and packaging performance improvements:
    We fixed multiple performance issues to speed up the training and packaging processes. For our model examples, the duration of training and packaging was reduced by 30%.
  • Documentation improvement:
    We put significant work into improving the documentation. For example, the following new sections were added: Security, Installation, Training, Packager, and Model Deployment.
  • Odahu-infra:
    We created the new odahu-infra Git repository, where we placed the following infra custom helm charts: Fluentd, Knative, monitoring, Open Policy Agent, Tekton.
  • Preemptible nodes:
    Preemptible nodes are priced lower than standard virtual machines of the same type, but they provide no availability guarantees. We added new deployment options to allow training and packaging pods to be deployed on preemptible nodes.
  • Third-parties updates:
    • Istio
    • Grafana
    • Prometheus
    • MLFlow
    • Terraform
    • Buildah
    • Kubernetes

Misc/Internal

  • Google Cloud Registry:
    We experienced multiple problems while using Nexus as our main dev Docker registry, so we migrated to Google Cloud Registry. This migration also brings additional advantages, such as in-depth vulnerability scanning.
  • Terragrunt:
    We switched to using Terragrunt for our deployment scripts. This reduces the complexity of our Terraform modules and deployment scripts.