The Future of Machine Learning

Introduction

We live in an unbelievably rich ecosystem for building modern web applications. Countless tools exist for delivering applications to production, monitoring performance, and deploying in real time. These tools are so indispensable that modern web application development would be almost impossible without them.

By contrast, modern machine learning does not yet have a comparable ecosystem. There are several reasons for this: standardized practices have yet to be established, development tools are constantly evolving, and modern deep learning has existed for only a minuscule amount of time in the grand scheme of things.

Also, since most AI projects are not single-person jobs, machine learning code, training, testing, and deployment have to fit into the CI/CD pipelines that the rest of the software uses. This lets data scientists adopt the same practices that IT engineers have been using for decades.

Hence, one of the key problems that most new ML operations workflows try to solve concerns ML tooling and pipeline production. Even though container orchestration tools (e.g. Kubernetes, Docker Swarm) are integral to modern machine learning, they form only a small part of a true continuous integration and continuous deployment (CI/CD) pipeline for deep learning. Moreover, traditional CI/CD and a comparable system for ML/AI have different constraints and parameters, which further widens the divide.

This gives rise to the concept of MLOps, which follows a pattern similar to DevOps. In other words, MLOps is a framework for collaboration between data scientists and the operations or production team. It is designed to eliminate waste, reduce errors, automate extensively, and produce richer, more consistent insights with machine learning. It is also a practice that drives seamless integration between the development cycle and the overall operations process. It can likewise change how an organization handles big data. Just as DevOps shortens production life cycles by creating better products with each iteration, MLOps delivers insights you can trust and put into play more rapidly.

Conventional CI/CD Workflow

Continuous Integration/Continuous Deployment (CI/CD) describes a set of best practices for application development pipelines. They are largely implemented by DevOps teams so that software developers can introduce updates to production applications quickly and reliably. Some of the core benefits of a CI/CD pipeline include:

  • Reliability
  • Reusability
  • Speed
  • Safety
  • Version Control

Figure: Basic workflow of a CI/CD pipeline.

A quick search for CI/CD tools for traditional web applications returns a number of tools that you have likely used or heard of (e.g. Jenkins, CircleCI). At their core, these tools attempt to standardize an established workflow:

Build ➡ Test ➡ Deploy.
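The Build, Test, Deploy sequence above can be sketched in a few lines of Python. This is only an illustration of the ordering and the stop-on-failure behaviour, not how Jenkins or CircleCI are implemented; `run_pipeline`, `do_build`, `do_test` and `do_deploy` are hypothetical names introduced for this sketch.

```python
from typing import Callable, List, Tuple

# A stage is a name plus an action that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first one that fails.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, action in stages:
        if not action():
            break  # a failed stage blocks everything after it
        completed.append(name)
    return completed


# Hypothetical stage actions; real ones would compile, test and ship code.
def do_build() -> bool:
    return True


def do_test() -> bool:
    return 1 + 1 == 2  # stand-in for a real test suite


def do_deploy() -> bool:
    return True


stages = [("build", do_build), ("test", do_test), ("deploy", do_deploy)]
completed_stages = run_pipeline(stages)
```

The key design point is that a failing stage prevents later stages from running, so untested code never reaches the deploy step.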

CI and CD are frequently lumped together, but in reality they describe two independent (yet related) ideas, each significant in its own right.

Continuous Integration (CI): mainly concerned with testing code as it is pushed, i.e., it ensures that new application features are automatically tested using unit tests.
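As a rough illustration of the kind of check a CI system runs on every push, the sketch below tests a small feature-scaling function with Python's standard `unittest` module. The `normalize` function and the `run_ci_checks` helper are hypothetical examples, not part of any particular CI tool.

```python
import unittest


def normalize(values):
    """Scale a list of numbers to the range [0, 1] (hypothetical feature step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class NormalizeTest(unittest.TestCase):
    def test_range(self):
        self.assertEqual(normalize([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(normalize([3, 3]), [0.0, 0.0])


def run_ci_checks() -> bool:
    """Run the test case and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

A CI server would typically run such a suite automatically and block the merge when `run_ci_checks()` returns `False`.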

Continuous Deployment (CD): covers the actual release/delivery of the tested code. For example, a CD system could define how various feature/release branches are deployed, or even how new features are selectively rolled out to new users.
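One common way to roll a feature out selectively is a percentage-based rollout, where each user is hashed into a stable bucket. The sketch below assumes this approach; the article does not prescribe one, and `in_rollout` is a hypothetical helper name.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Return True if this user falls inside the rollout percentage.

    The same (feature, user) pair always maps to the same bucket, so a
    user does not flip in and out of the feature between requests.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return bucket < percent


# Usage: expose a hypothetical "new-model" feature to about 20% of users.
enabled_users = [u for u in (f"user-{i}" for i in range(1000))
                 if in_rollout(u, "new-model", 20)]
```

Because the bucket is fixed per user, raising `percent` only ever adds users to the rollout; nobody who already has the feature loses it.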

CI/CD for Machine Learning

CI/CD (Continuous Integration/Continuous Deployment) has long been a productive practice for most software applications. The same is possible for machine learning applications, through automated, seamless training and continuous deployment of ML models. Using CI/CD for machine learning applications creates a genuine end-to-end pipeline that keeps the feedback loop closed by maintaining high-performing ML models. It can likewise connect science and engineering tasks, reducing friction from data, to modelling, to production and back again.

Figure: A sample MLOps pipeline.

There are multiple tools available (e.g. Kubeflow, neptune.ai) that can be used to create reproducible workflows. These workflows automate the steps needed to build an ML pipeline, which delivers consistency, saves iteration time, and helps with debugging and compliance requirements.
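One ingredient of a reproducible workflow is being able to tell whether two runs used the same configuration and data. The sketch below is a generic illustration using only the Python standard library; it is not the API of Kubeflow or neptune.ai, and `run_fingerprint` is a hypothetical helper.

```python
import hashlib
import json


def run_fingerprint(config: dict, data_rows: list) -> str:
    """Return a deterministic hash of a run's configuration and input data.

    Keys are sorted before hashing, so two dicts with the same content
    produce the same fingerprint regardless of insertion order.
    """
    payload = json.dumps({"config": config, "data": data_rows}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Usage with made-up hyperparameters and a tiny made-up dataset.
rows = [[0.1, 1], [0.4, 0], [0.9, 1]]
first = run_fingerprint({"lr": 0.01, "epochs": 5}, rows)
second = run_fingerprint({"epochs": 5, "lr": 0.01}, rows)
```

Storing the fingerprint alongside each trained model makes it possible to check later, for debugging or compliance, exactly which inputs produced it.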

Why do we need MLOps?

The significance of MLOps follows from the fact that, ML being an evolving discipline, most ML models are experimental and prone to failure. An ML model that works today may behave erratically tomorrow. MLOps supports the work of the people who build ML models and of those who deploy and manage the infrastructure. From deploying multiple model pipelines to scaling ML applications to monitoring ML health, MLOps is the answer for safe ML operations.

Not only does MLOps make collaboration and integration easier, it also lets your data scientists take on more projects, tackle more problems, and do what they excel at: developing more models. With MLOps, retraining, testing, and deployment are automated. The nightly build no longer just compiles your new code; it also takes all of the new data you accumulated during the day and retrains your models. That is work that data scientists used to perform manually.
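A nightly retraining job of the kind described above might look roughly like the following sketch. It assumes a deliberately trivial "model" (the mean of the targets) and a promotion rule where the retrained candidate replaces the current model only if it scores better on held-out data; `train`, `mean_abs_error` and `nightly_job` are hypothetical names, and the promotion rule is an assumption rather than something the article specifies.

```python
from statistics import mean


def train(rows):
    """Fit a trivial 'model': predict the mean of the observed targets."""
    return mean(y for _, y in rows)


def mean_abs_error(model, rows):
    """Average absolute gap between the constant prediction and the targets."""
    return mean(abs(y - model) for _, y in rows)


def nightly_job(current_model, history, new_rows, holdout):
    """Retrain on history plus today's data; promote only if it improves.

    Returns (model_to_serve, promoted_flag).
    """
    candidate = train(history + new_rows)
    if mean_abs_error(candidate, holdout) < mean_abs_error(current_model, holdout):
        return candidate, True
    return current_model, False


# Usage with made-up (feature, target) pairs.
history = [(0, 1.0), (1, 1.0)]
current = train(history)
todays_rows = [(2, 3.0), (3, 3.0)]
holdout = [(4, 3.0), (5, 3.0)]
served_model, promoted = nightly_job(current, history, todays_rows, holdout)
```

Gating promotion on a held-out score keeps the automation from silently replacing a good model with a worse one.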

Figure: A complete CI/CD pipeline.

Conclusion

When the properties of the data change, retraining a machine learning model is important to overcome the resulting discrepancies. To eventually achieve CI/CD in machine learning, you can start by automating some parts of your current process. Ultimately, CI/CD and associated practices contribute to better development. A team that commits to a reproducible pipelining tool can build faster, deploy faster, and grow faster as a team.
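A first automation step could be a simple check that flags when incoming data has shifted enough to justify retraining. The sketch below compares the mean of recent values against a reference sample; the threshold of 3 standard errors and the function name `needs_retraining` are assumptions for illustration, and real drift detection usually looks at more than the mean.

```python
from statistics import mean, stdev


def needs_retraining(reference, recent, threshold=3.0):
    """Flag drift when the recent mean is far from the reference mean.

    Distance is measured in standard errors of the recent sample mean,
    using the spread of the reference data.
    """
    ref_mean = mean(reference)
    ref_std = stdev(reference)
    if ref_std == 0:
        # No spread in the reference: any change in the mean counts as drift.
        return mean(recent) != ref_mean
    standard_error = ref_std / len(recent) ** 0.5
    z = abs(mean(recent) - ref_mean) / standard_error
    return z > threshold


# Usage with made-up feature values.
reference = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
stable_batch = [10, 9, 11, 10]
shifted_batch = [15, 16, 14, 15]
stable_flag = needs_retraining(reference, stable_batch)
shifted_flag = needs_retraining(reference, shifted_batch)
```

A check like this can run as one stage of the pipeline and trigger the retraining job only when it is actually needed.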

Present-day ML is a rapidly advancing field that draws a wide range of experts, from conventional software engineers to mathematicians and business leaders. This makes it an exciting field to venture into and a future-proof career option.

Translated from: https://medium.com/data-science-community-srm/the-future-of-machine-learning-mlops-b6116b84eb07
