Deploying an End-to-End Machine Learning Solution on Kubeflow Pipelines - Kubeflow for Poets
A Kubeflow pipeline component is an implementation of a pipeline task. A component is a step in the workflow. Each task takes one or more artifacts as input and may produce one or more artifacts as output.
Each component usually includes two parts:
- Client code: The code that talks to endpoints to submit jobs. For example, code to connect with the Google Cloud Machine Learning Engine.
- Runtime code: The code that does the actual job and usually runs in the cluster. For example, the code that prepares the model for training on Cloud MLE.
A component consists of an interface (inputs/outputs), the implementation (a Docker container image and command-line arguments) and metadata (name, description).
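To make this concrete, here is a minimal sketch of what a component could look like when expressed with the Kubeflow Pipelines DSL as a ContainerOp. The image path, arguments and output file are hypothetical placeholders, not the exact code used in this pipeline.
# sketch of a pipeline component as a ContainerOp (placeholders, not the actual project code)
import kfp.dsl as dsl

def dataflow_transform_op(raw_data_path):
    # meant to be called from within a @dsl.pipeline function
    return dsl.ContainerOp(
        # metadata: the name of the step
        name='dataflow-transform',
        # implementation: the Docker container image and its command-line arguments
        image='gcr.io/<project-id>/dataflow-transform:latest',
        arguments=['--input', raw_data_path],
        # interface: the container writes the path of its output artifact to this file
        file_outputs={'transformed': '/output.txt'}
    )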
Overview of a simple end-to-end solution pipeline
In this simple example, we will implement a deep neural network regressor to predict the closing price of the Bitcoin cryptocurrency. The machine learning code itself is pretty basic, as it is not the focus of this article. The goal here is to orchestrate a machine learning engineering solution using a microservices architecture on Kubernetes with Kubeflow Pipelines.
The pipeline consists of the following components (a sketch of how they are wired together follows the list):
- Move raw data hosted on GitHub to a storage bucket.
- Transform the dataset using Google Dataflow.
- Carry out hyper-parameter tuning on Cloud Machine Learning Engine.
- Train the model with the optimized hyper-parameters.
- Deploy the model for serving on Cloud MLE.
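As a rough sketch, and assuming each component image already exists in the container registry, the five steps could be wired together with the DSL as shown below. The image names, arguments and output handling are illustrative placeholders, not the actual crypto_pipeline.py.
# illustrative wiring of the five steps (image names and arguments are placeholders)
import kfp.dsl as dsl

def component_op(name, image, arguments):
    # for simplicity, every component here exposes a single output file
    return dsl.ContainerOp(name=name, image=image, arguments=arguments,
                           file_outputs={'output': '/output.txt'})

@dsl.pipeline(name='Crypto pipeline',
              description='Train and deploy a Bitcoin closing-price model.')
def crypto_pipeline():
    raw_data_url = 'https://github.com/<user>/<repo>/raw/master/data.csv'  # placeholder
    move_raw = component_op('move-raw-data', 'gcr.io/<project-id>/move-raw-data:latest',
                            ['--source', raw_data_url])
    transform = component_op('dataflow-transform', 'gcr.io/<project-id>/dataflow-transform:latest',
                             ['--input', move_raw.output])
    hp_tune = component_op('hypertune', 'gcr.io/<project-id>/hypertune:latest',
                           ['--data', transform.output])
    train = component_op('train', 'gcr.io/<project-id>/train:latest',
                         ['--data', transform.output, '--hypertune-results', hp_tune.output])
    deploy = component_op('deploy', 'gcr.io/<project-id>/deploy:latest',
                          ['--model-dir', train.output])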
Create a container image for each component
First, we’ll package the client and runtime code into a Docker image. This image also contains the secure service account key to authenticate against GCP. For example, the component to transform the dataset using Dataflow has the following files built into its image:
DataflowTransform
|__ Dockerfile : Dockerfile to build the Docker image.
|__ build.sh : Script to initiate the container build and upload to Google Container Registry.
|__ dataflow_transform.py : Code to run the Beam pipeline on Cloud Dataflow.
|__ service_account.json : Secure key to authenticate the container on GCP.
|__ local_test.sh : Script to run the pipeline component image locally.
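To give a feel for the runtime code, a stripped-down dataflow_transform.py could hand an Apache Beam pipeline to the DataflowRunner roughly as below. The project ID, bucket paths and the trivial transform are placeholder assumptions; the real preprocessing logic in the repository is more involved.
# minimal sketch of submitting a Beam pipeline to Cloud Dataflow (project and paths are placeholders)
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner='DataflowRunner',
    project='<project-id>',
    region='us-central1',
    temp_location='gs://<bucket>/tmp'
)

with beam.Pipeline(options=options) as p:
    (p
     | 'ReadRawData' >> beam.io.ReadFromText('gs://<bucket>/raw/crypto.csv')
     | 'TransformRows' >> beam.Map(lambda row: row.strip())  # placeholder transform
     | 'WriteTransformed' >> beam.io.WriteToText('gs://<bucket>/transformed/part'))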
Build containers before uploading to Kubeflow Pipelines
Before uploading the pipeline to Kubeflow Pipelines, be sure to build the component containers so that the latest version of the code is packaged and uploaded as images to the container registry. The code provides a handy bash script to build all containers.
Compile the Pipeline using the Kubeflow Pipelines DSL language
The pipeline code specifies how the components interact with one another: each component produces an output that serves as an input to the next component in the pipeline. The pipeline is written in Python using the Kubeflow Pipelines DSL and compiled with the Kubeflow Pipelines SDK (which provides the dsl-compile compiler) into a package that can be uploaded to Kubeflow Pipelines.
Ensure the Kubeflow Pipelines SDK is installed on the local machine by running:
# install kubeflow pipeline sdk
pip install https://storage.googleapis.com/ml-pipeline/release/0.1.12/kfp.tar.gz --upgrade
# verify the install
which dsl-compile
Compile the pipeline by running:
# compile the pipeline
python3 [path/to/python/file.py] [path/to/output/tar.gz]
For the sample code, we used:
python3 crypto_pipeline.py crypto_pipeline.tar.gz
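Invoking the pipeline file directly like this works when the script compiles itself, typically in its __main__ block. A sketch of what that tail end could look like, assuming the output package path is read from the command line and crypto_pipeline is the @dsl.pipeline-decorated function:
# illustrative __main__ block of crypto_pipeline.py (the argument handling is an assumption)
import sys
import kfp.compiler as compiler

if __name__ == '__main__':
    # sys.argv[1] is the package to write, e.g. crypto_pipeline.tar.gz
    compiler.Compiler().compile(crypto_pipeline, sys.argv[1])
Depending on the SDK version, the dsl-compile tool installed above can be used instead, e.g. dsl-compile --py crypto_pipeline.py --output crypto_pipeline.tar.gz.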
Upload and execute the Pipeline on Kubeflow Pipelines
1. Upload the pipeline to Kubeflow Pipelines.
2. Click on the pipeline to see the static graph of the flow.
3. Create an Experiment and a Run to execute the pipeline.
4. Review the completed pipeline run, the completed Dataflow pipeline, and the deployed model on Cloud MLE.
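The steps above are carried out from the Kubeflow Pipelines dashboard. For completeness, the SDK also ships a client that can do the same programmatically; the sketch below assumes a reachable dashboard host and uses placeholder names.
# optional: upload and run the compiled pipeline package programmatically (host and names are placeholders)
import kfp

client = kfp.Client(host='<kubeflow-pipelines-host>')
experiment = client.create_experiment(name='crypto-experiment')
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name='crypto-run',
    pipeline_package_path='crypto_pipeline.tar.gz'
)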
Delete Resources
Delete Kubeflow
# navigate to kubeflow app
cd ${KFAPP}
# run script to delete the deployment
${KUBEFLOW_SRC}/scripts/kfctl.sh delete all
Delete the Kubernetes Cluster
# delete the kubernetes cluster
gcloud container clusters delete ekaba-gke-cluster