For AI to Succeed, MLOps Needs a Bridge to DevOps

Once an ML model is trained and ready, we should be able to work with it as we do with any other software module because it is just code and data.
Jul 29th, 2022 10:00am by Luis Ceze
Feature image via Pixabay.

Luis Ceze
Luis is co-founder and CEO at OctoML, a University of Washington spin-out that aims to make efficient and secure machine learning accessible to everyone and every hardware target. Luis is an award-winning professor of computer science at the University of Washington, where he joined the faculty in 2007. His research focuses on the intersection of computer architecture, programming languages, molecular biology and machine learning. At UW, he co-directs the Molecular Information Systems Lab, which is pioneering the technology to store data on synthetic DNA and exploring novel uses of DNA nanotechnology. He also co-directs the SAMPL Lab, which focuses on hardware/software co-optimization for machine learning and where the TVM deep-learning compilation and optimization stack started. He is a recipient of an NSF CAREER Award, a Sloan Research Fellowship, a Microsoft Research Faculty Fellowship, the IEEE TCCA Young Computer Architect Award and the UIUC Distinguished Alumni Award. He is a member of the DARPA ISAT and MEC study groups.

AI has been heralded as the new “brains” for software applications, a role long held by databases. Unfortunately, AI is not so easy for application developers and operations teams to adopt and absorb. In practice, incorporating the machine-learning models that power AI into productivity-focused applications to make them smarter is overly difficult and complex. Moreover, ML models depend on specific combinations of hardware and software infrastructure. Without the right infrastructure, the models either cannot perform well enough to be viable or, in some cases, become prohibitively costly.

Today there is no efficient bridge between the creation of ML models and the process of getting them into production. To illustrate: The average time to production for ML models is 12 weeks. What’s worse, nearly half of those models are shelved for performance or cost reasons, which makes AI less transformational than many hoped.

If AI is to be the “brains” of applications, then a world where ML models are heavily specialized, requiring unique and customized workflows and tools, is problematic. The fact is that once an ML model is trained and ready, we should be able to work with it as we do with any other software module, because it is just code and data. The inclusion of an ML model should not mean that DevOps as we know it goes out the window.

The deployment requirements of ML are often tough to accommodate because the ML software stack is fully dependent on the hardware where it runs. ML developers should be able to build models without worrying about the hardware backend. It’s ironic that to implement an advanced technology like artificial intelligence, ML models must be hand-tuned to meet performance SLAs for the application.

Today the process is so challenging that even skilled data scientists and AI practitioners get it wrong; more often than not, each model ends up in its own unique pipeline. With few exceptions, these pipelines are custom-assembled and fragile. A change to the deployment hardware, the environment, the training framework, a software library or the integration stack can necessitate thorough debugging or even a complete rebuild.

The handoff from data scientists to app developers and ops teams is characterized by trial and error. This is a drag on AI application development. To make it less of an obstacle course, the machine-learning side needs to realign and mesh with DevOps workflows and best practices.

DevOps Came First. It’s MLOps That Has to Fit into the DevOps World

In the early days of enterprise AI, MLOps originated as a term for a set of best practices to design, build, deploy and maintain machine-learning models in production. As the practice has evolved, however, its scope has expanded to cover the whole of ML lifecycle management. Depending on who you ask (and what they are selling), it encompasses everything from model generation, orchestration and deployment to health, diagnostics, governance and business metrics.

Speaking as a machine-learning engineer, I see model creation as a distinct discipline with its own processes and toolsets. When model creation and model deployment are forced together into one mega-process, however, it limits flexibility and choice in a way that creates obstacles. By aspiring to address each step from model creation through deployment, MLOps is asking too much. It creates a parallel development process that requires special resources and expertise that are in short supply today. On the flip side, there are mature processes and abundant talent in the established discipline of DevOps.

It’s probably impractical to demand that software developers learn entirely new approaches to work with ML models. They are too busy and too expensive, and frankly they should focus on their core competency. The fact is that ML models are elements in smart applications, and familiar DevOps practices and tools actually work well on them. What we need are ways to fit ML models into the software world. Without that, the success rates for ML-based applications will likely remain unacceptably low.

Machine Learning Models Are Still Fundamentally Software

To look at what hinders the deployment of ML models, we can start with the typical dependencies between ML training frameworks, model types, low-level libraries and compilers, and the chosen hardware. Most users cannot fix dependency issues on their own; they need tools that abstract the complexity, bypass the dependencies and deliver models as production-ready software functions.

The typical ML deployment workflow is specialized and manual. That is a huge pothole in AI’s path to changing the world. Deployment procedures need to be made accessible and workable for application developers, DevOps engineers and IT operations teams. They need to be able to work with models in the same ways they approach the rest of their application stack, using their own DevOps workflows and tools.

On both the ML and the DevOps side, a bridge is needed. Platforms are emerging now that can convert ML models into high-performing, reliable and portable functions that work across different hardware.

The Future

Application developers and DevOps teams should be an integral part of ML deployment without becoming experts in machine learning. These practitioners need an easier way to work with models, just as they do with software. AI needs to become more accessible, so low-code/no-code solutions may play a major role in the years ahead, abstracting away the complexities of AI/ML. Once the gulf between model creation and deployment into smart applications is bridged, AI’s contribution to business and other fields will start to reach its potential.
