
‘Cortex’: An open source platform for deploying machine learning models as production web services



With its ability to address different machine learning workflows, Cortex grants you full control over model management operations. It also serves as an alternative to serving models with SageMaker, and as a model deployment platform built on top of AWS services like Elastic Kubernetes Service (EKS), Lambda, or Fargate.







BentoML simplifies the process of building machine learning services. It offers a standard, Python-based architecture for deploying and maintaining production-grade APIs. This architecture allows users to easily package models trained with any ML framework for online and offline serving.
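As a rough sketch of what this looks like with the BentoML 1.x API, assuming a scikit-learn model was previously saved to the local BentoML store under the name iris_clf, a service file might be:

```python
# service.py -- minimal BentoML 1.x service sketch.
# Assumes the model was saved earlier with:
#   bentoml.sklearn.save_model("iris_clf", trained_model)
import bentoml
from bentoml.io import NumpyNdarray

# Wrap the stored model in a runner so BentoML can scale inference workers.
iris_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

svc = bentoml.Service("iris_classifier", runners=[iris_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_array):
    # Delegate prediction to the runner; BentoML handles batching and transport.
    return iris_runner.predict.run(input_array)
```

Running bentoml serve service.py:svc then exposes the model as an HTTP API, and the same service definition can be reused for offline batch scoring.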


KFServing provides a Kubernetes Custom Resource Definition (CRD) for serving machine learning models from arbitrary frameworks. It aims to solve production model-serving use cases by providing performant, high-abstraction interfaces for common ML frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX.


As enterprises grow their investments in data platforms, they increasingly want to go beyond using data for internal analytics and start integrating predictions from machine learning (ML) models to create a competitive advantage for their products and services. For example, financial institutions deploy ML models to detect fraudulent transactions in real time, and retailers use ML models to personalize product recommendations for each customer.


These mission-critical applications require an MLOps platform that can scale to process millions of predictions per second at low latency and with high availability while providing visibility into how models are performing in production. This becomes even more of a challenge with compute-intensive deep learning models that power natural language processing and computer vision applications.


To accelerate model serving and MLOps on Databricks, we are excited to announce that Cortex Labs, a Bay Area-based MLOps startup, has joined Databricks. Cortex Labs is the maker of Cortex, a popular open-source platform for deploying, managing, and scaling ML models in production. Cortex Labs was backed by leading infrastructure software investors Pitango Venture Capital, Engineering Capital, Uncorrelated Ventures, at.inc/, and Abstraction Capital, as well as angels Jeremy Schneider and Lior Gavish.


Cortex provides machine learning platform technologies, modeling expertise, and education to teams at Twitter. Its purpose is to improve Twitter by enabling advanced and ethical AI. With first-hand experience running machine learning models in production, Cortex seeks to streamline difficult ML processes, freeing engineers to focus on modeling, experimentation, and user experience.


Airflow, like most Python-based open-source systems, uses the StatsClient from statsd. Twitter uses StatsReceiver, part of our open-source Finagle stack in github/twitter/util. However, the two models differ: in util/stats, each metric is registered, which places it on a list for future collection; in statsd, metrics are simply emitted and the back end does the right thing. To solve this problem we built a bridge that recognizes new metrics and registers them as they appear. It is API-compatible with StatsClient, which allows us to inject an instance of our bridge object into Airflow as it starts up. With metrics now collected by our visualization system, we are able to provide a templated dashboard that simplifies the creation of monitoring dashboards for our self-service clients.
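The bridge itself is internal, but the idea can be sketched in a few lines of Python; MetricsRegistry below is a hypothetical stand-in for the util/stats StatsReceiver, and only a subset of the StatsClient API is shown:

```python
# Hedged sketch of a statsd-to-registry bridge. `MetricsRegistry` is a
# hypothetical stand-in for Finagle's util/stats StatsReceiver.
class MetricsRegistry:
    """Registry model: a metric must be registered before it is collected."""
    def __init__(self):
        self._counters, self._gauges = {}, {}

    def counter(self, name):
        # Registers the counter on first use, returns it afterwards.
        return self._counters.setdefault(name, {"value": 0})

    def gauge(self, name):
        return self._gauges.setdefault(name, {"value": 0})


class BridgeStatsClient:
    """API-compatible subset of statsd.StatsClient that registers lazily."""
    def __init__(self, registry):
        self._registry = registry

    def incr(self, stat, count=1, rate=1):
        # First use registers the metric; later calls simply update it.
        self._registry.counter(stat)["value"] += count

    def gauge(self, stat, value, rate=1, delta=False):
        g = self._registry.gauge(stat)
        g["value"] = g["value"] + value if delta else value

    def timing(self, stat, delta, rate=1):
        # For brevity this sketch records timings as a plain gauge.
        self._registry.gauge(stat + ".timing")["value"] = delta
```

Because the bridge mirrors StatsClient's method signatures, an instance can be swapped in wherever Airflow expects its statsd client.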


According to the founders of Cortex Labs, the idea was to build a uniform API for quickly deploying machine learning models to the cloud. To do so, they took open-source tools like TensorFlow, Docker, and Kubernetes, and combined them with AWS services like CloudWatch, EKS (Elastic Kubernetes Service), and S3 (Simple Storage Service) to arrive at a single API for deploying any machine learning model.


InterpretML is an open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof. With this package, you can train interpretable glassbox models and explain blackbox systems. InterpretML helps you understand your model's global behavior as well as the reasons behind individual predictions.
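A minimal sketch of that workflow, using the real interpret package on a toy scikit-learn dataset:

```python
# Train an interpretable glassbox model and inspect it with InterpretML.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Glassbox model: interpretable by construction.
ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

show(ebm.explain_global())                        # global model behavior
show(ebm.explain_local(X_test[:5], y_test[:5]))   # individual predictions
```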


MLRun is an end-to-end open-source MLOps orchestration framework for managing and automating your entire analytics and machine learning lifecycle, from data ingestion through model development to full pipeline deployment. MLRun eases the development of machine learning pipelines at scale and helps ML teams build a robust process for moving from the research phase to fully operational production deployments.


Seldon handles scaling to thousands of production machine learning models and provides advanced machine learning capabilities out of the box including Advanced Metrics, Request Logging, Explainers, Outlier Detectors, A/B Tests, Canaries and more.
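With Seldon Core's Python language wrapper, for instance, exposing a model comes down to a class with a predict method; a sketch, where the file/class name Model and the model path are our assumptions:

```python
# Model.py -- minimal Seldon Core Python-wrapper sketch. The wrapper
# looks for a class named after the file that exposes predict().
import joblib

class Model:
    def __init__(self):
        # Load the trained model once at container startup (path assumed).
        self._model = joblib.load("model.joblib")

    def predict(self, X, features_names=None):
        # Seldon passes the request payload as an array-like X.
        return self._model.predict_proba(X)

# Served inside the container with something like:
#   seldon-core-microservice Model --service-type MODEL
```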


The Iguazio Data Science Platform accelerates and scales development, deployment and management of your AI applications with MLOps and end-to-end automation of machine learning pipelines. The platform includes an online and offline feature store, fully integrated with automated model monitoring and drift detection, model serving and dynamic scaling capabilities, all packaged in an open and managed platform.


PrimeHub is an open-source, pluggable MLOps platform built on top of Kubernetes for teams of data scientists and administrators. PrimeHub equips enterprises with consistent yet flexible tools to develop, train, and deploy ML models at scale. By improving the iterative process of data science, data teams can collaborate closely and innovate fast.


TensorFlow Serving is an easy-to-deploy, flexible, and high-performance serving system for machine learning models, built for production environments. It allows easy deployment of algorithms and experiments while letting developers keep the same server architecture and APIs. TensorFlow Serving integrates seamlessly with TensorFlow models and can also be easily extended to other models and data.
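Once a SavedModel is running under TensorFlow Serving (for example via the official Docker image), clients call its REST API; a sketch, assuming a model served under the name my_model on the default REST port 8501:

```python
# Query TensorFlow Serving's REST predict endpoint (model name assumed).
import requests

payload = {"instances": [[1.0, 2.0, 5.0]]}  # shape must match the model
resp = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    json=payload,
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["predictions"])
```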


NVIDIA Triton Inference Server simplifies the deployment of AI models at scale in production. The open-source serving software allows the deployment of trained AI models from any framework, such as TensorFlow, NVIDIA TensorRT, PyTorch, or ONNX, from local storage or a cloud platform. It supports HTTP/REST and gRPC protocols, allowing remote clients to request inference for any model managed by the server.
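Remote clients typically use the tritonclient library; a sketch, where the model name resnet50 and the tensor names input__0/output__0 are assumptions that depend on the model configuration:

```python
# Minimal Triton HTTP client sketch (model and tensor names assumed).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Describe the input tensor and attach the request data.
inp = httpclient.InferInput("input__0", [1, 3, 224, 224], "FP32")
inp.set_data_from_numpy(np.zeros((1, 3, 224, 224), dtype=np.float32))

result = client.infer(model_name="resnet50", inputs=[inp])
print(result.as_numpy("output__0").shape)
```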


Multi Model Server is an open-source tool for serving deep learning and neural network models for inference, exported from MXNet or ONNX. The easy-to-use, flexible tool uses REST-based APIs to handle prediction requests. Multi Model Server requires Java 8 or a later version to serve HTTP requests.


As the adoption of machine learning and artificial intelligence continues to spread across a wide range of software products and services, so do the best practices and tools that facilitate the testing, deployment, management, and monitoring of ML models.


This article discusses an end-to-end MLOps architecture along with open source tools that can assist in accelerating each stage of your machine learning solution. By utilizing a platform-agnostic approach to discuss MLOps architecture, this article can serve as a guide for picking open source tools that can be employed in building an end-to-end MLOps solution.


The MLOps methodology allows machine learning engineers and data scientists to focus on the core development of the model rather than spending time on tasks such as preprocessing data, configuring environments, or monitoring models. This can be done with the help of commercial or open source MLOps tools. The choice of tool is influenced in large part by the available resources (mostly financial) and the stability of the tool.


Kubeflow is a very popular open source tool for creating ML pipelines (i.e., workflows for building, training, and deploying ML models). This open source toolkit facilitates the scaling of ML models because it runs on Kubernetes, which handles all the container orchestration and management, allowing data scientists to focus on creating their machine learning workflows. As illustrated in the architecture above, the ML workflow involves defining steps for data processing and manipulation, model training, and validation.
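As an illustration, a two-step workflow in the Kubeflow Pipelines v1 SDK can be sketched as follows; the step bodies are placeholders:

```python
# Minimal Kubeflow Pipelines (KFP v1 SDK) sketch with placeholder steps.
import kfp
from kfp import dsl
from kfp.components import create_component_from_func

def preprocess_data() -> str:
    # Placeholder: real code would clean data and return an artifact path.
    return "/tmp/clean.csv"

def train_model(data_path: str) -> str:
    # Placeholder: real code would train a model and return its path.
    return "/tmp/model.bin"

preprocess_op = create_component_from_func(preprocess_data)
train_op = create_component_from_func(train_model)

@dsl.pipeline(name="demo-pipeline", description="Preprocess, then train.")
def demo_pipeline():
    data = preprocess_op()
    train_op(data.output)  # KFP wires the dependency and schedules the pods

# Compile to a workflow spec that can be uploaded to a Kubeflow cluster.
kfp.compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")
```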


At this point, the model should be stored in the production environment of a model registry (the MLflow server) and ready to be deployed and served. It quickly becomes apparent that the work is not done once the machine learning model reaches this stage: important questions about how the model is shipped, how accessible it is to end users, how to optimize system metrics like latency and uptime, and how the model scales should all be considered. Teams can try building their own model deployment platform on AWS services like EKS or Lambda to solve these problems, but they can save a great deal of time and effort with Cortex.
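For context, pulling the production-stage model out of the MLflow registry is nearly a one-liner; a sketch, where the registered name my_model and the input schema are assumptions:

```python
# Load the current Production version of a registered MLflow model.
import pandas as pd
import mlflow.pyfunc

# Stage-based URI resolves to the latest version in the Production stage.
model = mlflow.pyfunc.load_model("models:/my_model/Production")

batch = pd.DataFrame({"feature_a": [0.1], "feature_b": [4.2]})  # assumed schema
print(model.predict(batch))
```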


Cortex is an open source tool that lets you deploy all types of models, allowing your APIs to automatically scale to handle production workloads. It supports multiple frameworks, including TensorFlow, PyTorch, and scikit-learn. Cortex can run inference on both CPU and GPU infrastructure, and it can update APIs after deployment without any downtime.
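In the open-source 0.x releases of Cortex, a realtime API was defined by a small YAML config plus a Python predictor class; a sketch of the Python side, with the model loading details assumed:

```python
# predictor.py -- Cortex 0.x-style Python predictor sketch. Cortex
# instantiates the class once per replica and calls predict() per request.
import pickle

class PythonPredictor:
    def __init__(self, config):
        # config comes from the predictor config block in cortex.yaml;
        # the model_path key is an assumption for this sketch.
        with open(config["model_path"], "rb") as f:
            self.model = pickle.load(f)

    def predict(self, payload):
        # payload is the parsed JSON body of the HTTP request.
        return self.model.predict([payload["features"]]).tolist()
```

A matching cortex.yaml entry names the API and points at this predictor; cortex deploy ships it, while autoscaling and rolling updates are handled by the platform.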


"@context": " ", "@type": "FAQPage", "mainEntity": [ "@type": "Question", "name": "What is an MLOps platform?", "acceptedAnswer": "@type": "Answer", "text": "Many tools have been developed to smoothen the deployment process of a machine learning model. These are called the MLOps tools and are often referred to as MLOps platforms." , "@type": "Question", "name": "What is MLOps Azure?", "acceptedAnswer": "@type": "Answer", "text": "Microsoft provides cloud services and resources through its product Microsoft Azure. One can use Microsoft Azure and MLOps tools to deploy a machine learning project pipeline successfully." ]


Customers on AWS deploy trained machine learning (ML) and deep learning (DL) models in production using Amazon SageMaker, as well as other services such as AWS Lambda, AWS Fargate, AWS Elastic Beanstalk, and Amazon Elastic Compute Cloud (Amazon EC2), to name a few. Amazon SageMaker provides SDKs and a console-based workflow for deploying trained models, and includes features for creating or updating endpoints, auto scaling, model tagging, model monitoring, and creating production variants and multi-model endpoints.
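For comparison with Cortex's workflow, deploying a trained model artifact to a real-time SageMaker endpoint with the Python SDK looks roughly like this; the S3 path, IAM role, and entry-point script are placeholders:

```python
# SageMaker Python SDK sketch: deploy a trained scikit-learn artifact.
from sagemaker.sklearn import SKLearnModel

role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder role ARN

model = SKLearnModel(
    model_data="s3://my-bucket/model.tar.gz",  # placeholder artifact path
    role=role,
    entry_point="inference.py",                # placeholder inference script
    framework_version="1.0-1",
)

# Creates an HTTPS endpoint backed by the requested instance fleet.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
print(predictor.endpoint_name)
```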


