MLOps: Exclusive Insights Into the Field of Machine Learning Operations

Machine Learning has been around for decades. However, the term is often used interchangeably with Artificial Intelligence, Deep Learning, and many others. Machine learning is increasingly becoming part of technologies from all walks of life, and software development is no exception. We therefore deem it necessary to explain Machine Learning (ML) and its related terms for startup founders. In particular, this blog addresses how Machine Learning Operations (MLOps) has become an integral part of the Software Development Lifecycle (SDLC) for many businesses.

Introduction to MLOps and Why It Matters

Machine Learning is a set of algorithms that learn from data, unlike explicitly programmed algorithms. It is used in many fields such as spam detection, product recommendations, object recognition, fraud detection, forecast modeling, etc.

For machine learning systems, the real challenge is not building a machine learning model; it is building an integrated machine learning system that operates continuously and seamlessly in production, with quick solutions to any issues that arise. Practicing machine learning operations means striving to automate and monitor every part of the pipeline: data collection, data cleaning, feature engineering, model development, evaluation, deployment, and infrastructure management.

Among the numerous benefits of MLOps, the first is increased development velocity. It might take time for a company to set up MLOps processes at first, but once they are in place they reduce development time for data scientists, which in turn supports the growth of the company.

Another benefit is data and model lineage and tracking. The credibility of a prediction depends on the data source, the cleaning transformations applied, the type of model, and the metrics used to evaluate that model. Hence, in MLOps, every part of the pipeline is tracked and versioned, which allows for model audits in the future.
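To make lineage tracking concrete, here is a minimal sketch in Python. It assumes a lineage record only needs a fingerprint of the raw data, the list of transformations, and the evaluation metrics; the function names and the record layout are hypothetical and not taken from any particular MLOps tool.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Return a short, stable SHA-256 fingerprint of raw bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]


def lineage_record(data: bytes, transformations: list, metrics: dict) -> dict:
    """Bundle what an auditor would need later (hypothetical record layout)."""
    record = {
        "data_fingerprint": fingerprint(data),
        "transformations": transformations,
        "metrics": metrics,
    }
    # Hash the record itself: any change to the data, the steps, or the
    # metrics produces a different version id.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["version_id"] = fingerprint(canonical)
    return record
```

Storing one such record next to every trained model is one simple way to answer, months later, which data and which steps produced a given prediction.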

Another MLOps benefit is reproducibility. The emphasis on version control, tracking, and modularization of all components enables developers to re-run the pipeline and produce the same results, models, and data. Upgrading a stack is also easier thanks to modularization and containerization of the code. These steps are considered best practices in MLOps and lead to fewer issues in production as well.
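One small ingredient of reproducibility is removing hidden randomness. The sketch below assumes the pipeline splits data into training and test sets; the function name and default values are illustrative only. Fixing the seed means a re-run yields exactly the same split.

```python
import random


def split_dataset(rows, test_fraction=0.2, seed=42):
    """Shuffle with a fixed seed so every re-run produces the same split."""
    shuffled = list(rows)
    # A dedicated Random instance avoids touching global random state.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]
```

Recording the seed alongside the code version and the data version is what lets a later run reproduce the same model.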

What Are the Different Components of a Machine Learning System?

Machine learning has two main parts, namely data and algorithms. In machine learning systems, these two parts are further divided into six sub-parts, the first of which is data collection and analysis. Because data is so important in ML systems, it is collected from multiple sources. The collected data may be structured or unstructured, so it needs to be analyzed. The analysis addresses the following questions: where the data comes from, what its range and scale of values are, and what its quality is.

The second part is feature engineering. Once the data is collected, it needs more work before it is fed into an ML model. This work varies with the requirements. For example, a spam classification model would derive features from the subject line or the email body text. Similarly, a stock market value prediction model would require features such as historic prices of the stock, market indexes, market volatility, or political stability.
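As an illustration of the spam example, the sketch below turns a subject line and body text into a few numeric features. The word list and the choice of features are assumptions made for this example; a real spam model would use a much richer feature set.

```python
import re

# Hypothetical word list, chosen only for illustration.
SUSPICIOUS_WORDS = {"free", "winner", "urgent"}


def extract_spam_features(subject: str, body: str) -> dict:
    """Derive simple numeric features from an email's subject and body."""
    words = re.findall(r"[a-z']+", f"{subject} {body}".lower())
    letters = [ch for ch in subject if ch.isalpha()]
    upper_ratio = sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0
    return {
        "subject_length": len(subject),
        "subject_upper_ratio": upper_ratio,
        "suspicious_word_count": sum(w in SUSPICIOUS_WORDS for w in words),
        "link_count": len(re.findall(r"https?://", body)),
    }
```

The resulting dictionary is the kind of structured input that can be passed on to the model development stage.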

The third part is model development. After feature engineering, the data with its engineered features is fed into the ML model. With the evolution of ML technology, model development has become the easiest part of the pipeline. Thanks to extensive and mature libraries, a model often requires only a few lines of code and can still deliver state-of-the-art ML performance.

The fourth part is model evaluation and validation. Once the model is built, its quality and performance for the business use-case are assessed, which provides direction on how the model should be optimized.
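A minimal sketch of such an assessment for a binary classifier is shown below. It assumes labels are encoded as 0 and 1, and the thresholds are hypothetical stand-ins for whatever bar the business use-case sets.

```python
def precision_recall(y_true: list, y_pred: list) -> tuple:
    """Compute precision and recall for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def meets_business_bar(y_true, y_pred, min_precision=0.9, min_recall=0.8) -> bool:
    """Check the model against thresholds (the values here are hypothetical)."""
    precision, recall = precision_recall(y_true, y_pred)
    return precision >= min_precision and recall >= min_recall
```

Which metric matters most depends on the use-case: a spam filter may favor precision so that legitimate mail is not lost, while fraud detection may favor recall.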

The fifth part is model deployment. At this stage the model is used for live predictions, and a pipeline is built around it that continuously deploys the model and serves prediction requests.

The last part is monitoring, which is a vital part of ML systems. ML models are monitored to ensure that the required performance is maintained in production and that there is little to no deviation from the performance measured during offline model development.

MLOps Vs DevOps: Are They Really That Different?

DevOps normally focuses on application development, whereas MLOps combines DevOps with machine learning. The functional features of DevOps, such as CI/CD, dependable releases, load testing, and monitoring, are combined with machine learning components to form MLOps.

The differences between them can be summarized as follows:

The first difference is team structure. An MLOps team is structured differently from a DevOps team in that data scientists are part of the mix, and they might not have software engineering knowledge.

The second difference is experimental development. Traditional software development is roughly linear, whereas an ML pipeline is iterative and rather circular.

The third difference is the need for additional testing. On top of unit and integration testing, MLOps entails data testing and model testing.
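The sketch below shows what a data test and a model test might look like in their simplest form. The schema, the column names, and the tolerance are assumptions for illustration, not a standard.

```python
# Hypothetical schema for incoming rows: column name -> expected Python type.
EXPECTED_SCHEMA = {"age": int, "income": float}


def validate_rows(rows, schema=EXPECTED_SCHEMA) -> list:
    """Data test: report rows with missing columns or wrongly typed values."""
    errors = []
    for i, row in enumerate(rows):
        missing = set(schema) - set(row)
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        for col, typ in schema.items():
            if not isinstance(row[col], typ):
                errors.append(f"row {i}: {col} should be {typ.__name__}")
    return errors


def model_passes(accuracy: float, baseline: float, tolerance: float = 0.01) -> bool:
    """Model test: a candidate must not fall noticeably below the baseline."""
    return accuracy >= baseline - tolerance
```

Checks like these can run in the same CI job as the unit tests, so a bad data batch or a regressed model fails the build just as a code bug would.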

The fourth difference is in the deployment process. Apart from fast deployment, the goal is to be able to re-run the deployment based on the signals that appear during production.

The fifth difference is the monitoring of additional metrics. The traditional DevOps metrics for health, traffic, and memory are also used in MLOps, but MLOps additionally requires metrics such as prediction quality and model performance.

Do You Need MLOps for Your Startup or Enterprise Business?

It depends on the business and its resources. If the data does not change frequently and the model needs an update only once or twice a year, manual processes might suffice. For businesses where the data is updated a few times a month, such as insurance risk and disaster risk use-cases, a certain degree of machine learning automation will be beneficial.

There are also businesses where the data changes very frequently, in which case full MLOps is the way forward. For example, spam detection requires the latest spam data, and models need to be re-trained in a full feedback-loop pipeline. These factors, among many others, determine your need for MLOps.

Should You Automate the Entire MLOps Pipeline or Parts of It?

MLOps is all about monitoring and automating the pipeline, and it should be applied as required. Rather than automating the entire pipeline, automate the parts that run most frequently. Businesses that deploy once a year would not benefit from automating the entire process; however, if you deploy fairly regularly, then MLOps should be practiced.

ML Pipeline Vs CI/CD Pipeline: What Are the Differences and Similarities?

Continuous Integration (CI) and Continuous Deployment (CD) pipelines are largely the same in MLOps, with a few additional components. CI/CD is no longer only about testing and validating code and components but also about testing and validating data schemas and models. What is deployed is no longer a single software package or service but an entire system, a machine learning training pipeline, that automatically deploys another service when triggered. This is unlike traditional continuous deployment, which is well-defined and linear. Continuous training is a new component that the MLOps pipeline introduces. Another distinguishing factor is continuous model monitoring in production.
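The idea of a training pipeline that validates data and models before deploying a service can be sketched as a function with two gates. Everything here is hypothetical: the step functions are passed in as plain callables so the sketch stays independent of any specific orchestration tool.

```python
def run_training_pipeline(rows, validate_data, train, evaluate, deploy, min_score: float) -> str:
    """Run one continuous-training cycle with a data gate and a model gate.

    validate_data, train, evaluate, and deploy are caller-supplied callables.
    """
    # Gate 1: refuse to train on data that fails validation.
    if not validate_data(rows):
        return "stopped: data validation failed"
    model = train(rows)
    # Gate 2: refuse to deploy a model that scores below the threshold.
    score = evaluate(model)
    if score < min_score:
        return "stopped: model below threshold"
    deploy(model)
    return "deployed"
```

A scheduler or a production signal, such as a drop in prediction quality, would call this function; that trigger is what turns a one-off training script into continuous training.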

What Should We Understand From Model or Concept Drift?

Assessing the complete performance of a machine learning model in an online system is a challenge, so the model is first trained and evaluated offline on data for a specific use-case. Over time, however, the model's performance in production deviates from its offline performance, implying that the model has “gone stale” and needs to be re-trained on fresh data. This phenomenon is referred to as concept drift, and it is the reason ML systems require continuous training.
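One simple way to notice this deviation is to compare the accuracy measured offline with a rolling accuracy measured in production. The sketch below assumes true labels eventually become available for production predictions; the class name, window size, and allowed drop are all hypothetical choices.

```python
from collections import deque


class DriftMonitor:
    """Flag a model for retraining when online accuracy falls below offline accuracy."""

    def __init__(self, offline_accuracy: float, window: int = 100, max_drop: float = 0.05):
        self.offline_accuracy = offline_accuracy
        self.max_drop = max_drop
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, actual) -> None:
        """Store whether a production prediction turned out to be correct."""
        self.outcomes.append(prediction == actual)

    def online_accuracy(self):
        """Return rolling accuracy, or None if nothing has been recorded yet."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        accuracy = self.online_accuracy()
        return accuracy is not None and (self.offline_accuracy - accuracy) > self.max_drop
```

When labels arrive late or never, teams often monitor the input data distribution instead; that variant is not shown here.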

How to Monitor the MLOps Deployment?

In MLOps deployment, the aforementioned traditional metrics such as health and memory are still monitored, together with time variation and latency (i.e. the time it takes the model to make a prediction). Throughput is another important metric, as it measures how many examples the model can predict in a given amount of time. Yet another important metric is data schema skew, because changes in the data need to be monitored closely in ML.
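The following sketch shows how latency, throughput, and schema skew could be measured in the simplest case. It assumes `predict` is any callable that takes a batch and returns predictions, and that a schema can be represented as a set of column names; both are simplifications for illustration.

```python
import time


def measure_latency_and_throughput(predict, batch) -> dict:
    """Time one prediction call and derive examples processed per second."""
    start = time.perf_counter()
    predictions = predict(batch)
    elapsed = time.perf_counter() - start
    throughput = len(batch) / elapsed if elapsed > 0 else float("inf")
    return {
        "latency_seconds": elapsed,
        "throughput_per_second": throughput,
        "predictions": predictions,
    }


def schema_skew(training_columns: set, serving_columns: set) -> dict:
    """Compare the columns seen in training with those arriving in production."""
    return {
        "missing_in_serving": sorted(training_columns - serving_columns),
        "unexpected_in_serving": sorted(serving_columns - training_columns),
    }
```

In practice these numbers would be exported to whatever monitoring system already tracks health and memory, so that ML-specific and traditional metrics sit on the same dashboard.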

As a Startup Founder, Can I Hire DevOps Engineers to Work as MLOps Engineers?

It is commonly believed in the technology community that either a machine learning engineer or a DevOps engineer can work as an MLOps engineer. A DevOps engineer could learn machine learning systems to become an MLOps engineer, and vice versa. However, without the thorough understanding of machine learning that a data scientist has, a DevOps engineer alone might miss important factors; similarly, without software engineering knowledge, a data scientist might miss important deployment details. As a result, a new role is emerging, the machine learning software engineer, with the skills to develop basic machine learning models as well as to test and monitor them in deployment.

How To Determine if an ML Model Is Ready to Be Released?

As mentioned above, an ML model is trained and then evaluated offline on a given data set. If the model performs well offline, it is deployed to production. This is a very basic version of the release process. A more robust approach is to perform A/B testing on the model once it is deployed in production, determine its performance, and update the model accordingly. The model is then evaluated continuously to determine its performance and readiness, as opposed to the one-off offline check.
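As a rough illustration of the A/B testing step, the sketch below compares the success rate of the current model (A) with that of a candidate model (B) using a two-proportion z-test. What counts as a "success" and the one-sided 5% threshold are assumptions for this example; a real experiment would also need a pre-planned sample size.

```python
import math


def two_proportion_z(success_a: int, total_a: int, success_b: int, total_b: int) -> float:
    """Return the z statistic for the difference in success rates (B minus A)."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_b - p_a) / standard_error if standard_error else 0.0


def challenger_wins(success_a, total_a, success_b, total_b, z_threshold=1.645) -> bool:
    """True when B beats A at roughly the one-sided 5% significance level."""
    return two_proportion_z(success_a, total_a, success_b, total_b) > z_threshold
```

With 800 successes out of 1000 for A and 850 out of 1000 for B, the test favors B; with 80 versus 81 out of 100, the difference is too small to justify a switch.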

How Do Tech Giants Use MLOps?

For instance, Google has an open-source policy and makes most of its internal libraries and frameworks available. There are different levels of tooling that one can use, depending on the required sophistication. Another big part of Google culture is AutoML, which performs the activities typical of a machine learning team. Data is fed into AutoML, which automatically trains models on it, offering a shortcut that accelerates typical machine learning projects and their tasks.

TensorFlow Data Validation is also used by Google to monitor production data quality and to compute basic statistics. Interactive charts allow for visual inspection and comparison to monitor any concept drift. On top sits Cloud Composer, which orchestrates all the aforementioned functions, from data exploration and collection to model serving and monitoring. In a basic setup, Cloud Composer acts as the orchestrator. For more advanced processes, Kubeflow is available, which is a Kubernetes-native machine learning toolkit. It can build and deploy portable and scalable end-to-end machine learning workflows based on containers. Another advanced tool is TFX, recently open-sourced by Google, which is a configuration framework that provides components to define, launch, and monitor TensorFlow-based models and to perform model training, prediction, and serving.

Step-by-Step Guide for Companies to Adopt MLOps

Many Cloud Service Providers (CSPs) offer abstractions for MLOps. As mentioned earlier, Google Cloud Services offer Google ML Kit. Similarly, Azure offers the Azure Machine Learning service, and Amazon offers SageMaker. These abstractions allow data scientists to focus on business logic rather than production-related problems.

To Watch the Complete Episode, Please Click on This YouTube Link:

https://www.youtube.com/watch?v=d19JfKF5Y38&t=274s
