Lessons learned from almost failing to deploy a simple machine learning model in the cloud

In this article, we present the main mistakes we made during what was supposed to be a simple project: deploying an age-guessing neural network model in the cloud.

Thankfully, we were able to release a good enough version of our project which can be accessed here. Its source code can be found in the dedicated GitLab link.

The project context, objectives and approach

As a cybersecurity consultant and a junior software engineer working for large corporations, we have always been interested in the cloud and in how this technology can be leveraged to develop simple AI applications.

With Covid-19 and lockdowns, we started having many ideas that would combine the skills we learned during our short careers. However, we decided to start with a very simple (and unoriginal) idea before jumping into a more complex project: deploying a simple, production-ready web application that would guess people’s age using a trained neural network.

We focused on the creation process and on learning the “best way” of doing things. We defined the following constraints:

  • Follow “state of the art” practices to train our model and deploy our product;
  • Have a running cost of 0€ per month;
  • Have a product which could scale “infinitely” thanks to serverless services (in this context, we knew that the 0€/month running costs would only be possible with little to no traffic);
  • Build a user-friendly “final product” allowing people to have fun with it; and
  • Enjoy the building process.

Based on our project context and objectives, our approach was to follow the steps below:

  1. Obtain a large dataset of images with their associated ages
  2. Select a state-of-the-art Convolutional Neural Network (CNN) to train (on the cloud) a functional age predictor
  3. Migrate this trained Neural Network model and serve it for free in AWS SageMaker
  4. Create an Angular front-end application and serve it via an S3 bucket
  5. Create an API with API Gateway and use Lambda functions to upload the users’ pictures to SageMaker and retrieve a result
  6. Voilà! A fast and scalable age recognition service with a 0€ running cost

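Step 5 can be pictured with a small Lambda handler. This is only a sketch under assumptions that are not part of the original project: the endpoint name `age-predictor`, the `application/x-image` content type and the `{"age": ...}` response shape are all hypothetical. The only AWS call used is `invoke_endpoint` from the boto3 `sagemaker-runtime` client.

```python
import base64
import json
import os

# Hypothetical endpoint name; the article does not give one.
ENDPOINT_NAME = os.environ.get("AGE_ENDPOINT_NAME", "age-predictor")


def make_handler(runtime_client, endpoint_name=ENDPOINT_NAME):
    """Build a Lambda handler that forwards an uploaded image to SageMaker."""

    def handler(event, context):
        # API Gateway proxy integrations deliver binary bodies base64-encoded.
        body = event.get("body") or ""
        if event.get("isBase64Encoded"):
            image_bytes = base64.b64decode(body)
        else:
            image_bytes = body.encode()

        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/x-image",  # assumed content type
            Body=image_bytes,
        )
        # The response body is a stream; its JSON shape is an assumption.
        prediction = json.loads(response["Body"].read())
        return {"statusCode": 200, "body": json.dumps({"age": prediction["age"]})}

    return handler


def lambda_handler(event, context):
    # boto3 is imported lazily so the sketch can be exercised without AWS.
    import boto3

    return make_handler(boto3.client("sagemaker-runtime"))(event, context)
```

Passing the client in through `make_handler` keeps the handler testable locally with a stub, before any AWS resource exists.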

This project looks quite easy to build and deploy, doesn’t it?

The four major mistakes

Looking back at this approach, we underestimated the difficulty hidden behind a few key words, and we failed to ask ourselves some relevant questions, such as:

  • Step 1, “Obtain”: What is the quality of the dataset? Is the data evenly distributed and well balanced? Has the dataset been tested by other users?
  • Step 2, “Select”: Is it that simple to select a Neural Network model? How do we train it? How do we know that we are on the right path?
  • Step 3, “Migrate”: How easy is it to port a model to SageMaker? Are training scripts similar to the ones used locally? Is it free to host a trained model?

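The Step 1 question about balance can be checked in a few lines before any training starts. The sketch below is an illustration, not the project's code: the helper names and the sample ages are made up, and the 10-year bin width is an arbitrary choice.

```python
from collections import Counter


def age_distribution(ages, bin_width=10):
    """Count samples per age bin, e.g. 0-9, 10-19, ... for bin_width=10."""
    counts = Counter((age // bin_width) * bin_width for age in ages)
    return dict(sorted(counts.items()))


def imbalance_ratio(distribution):
    """Ratio between the largest and the smallest non-empty bin."""
    return max(distribution.values()) / min(distribution.values())


# Made-up labels for illustration only, not taken from any real dataset.
sample_ages = [23, 25, 27, 31, 34, 35, 38, 41, 62, 8]
dist = age_distribution(sample_ages)
print(dist)
print(f"imbalance ratio: {imbalance_ratio(dist):.1f}")
```

A high ratio hints that some age ranges are barely represented, which is worth knowing before paying for GPU time.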

When thinking about how to write this article, we realized that most of the mistakes we made could be grouped into four major ones. If you are new to AI and/or the cloud, we hope that our very short experience in the field can help you avoid making the same mistakes.

I. Guiding your tech choices based on buzzwords

Over the past couple of years, we have been hearing a lot about the wonderful world of fully managed PaaS (Platform as a Service) and SaaS (Software as a Service) products. For our project, we wanted to use a service that would allow us to deploy our trained model without needing to worry about infrastructure or costs.

Thanks to a quick Google search, we understood that the most fashionable and cheapest (via free tier advantages) service at the moment was AWS SageMaker. Our decision to pick this service was based on the very first sentence of the introduction on the product’s page: “Amazon SageMaker is a fully managed service that provides every developer (…) with the ability to build, train, and deploy machine learning (ML) models quickly”.

[Image: SageMaker main page]

As mentioned previously, our goal was to “quickly” develop a TensorFlow Python project and, once it was functional, migrate it to SageMaker in order to train our model. We believed that only minor refactoring would be required to migrate our source code to SageMaker.

However, we realized after developing our training scripts that the migration was a non-trivial process which would require major refactoring of our code. In fact, we figured out that our preprocessing would simply not work following SageMaker’s training workflow. In order to train our model, we would have to completely rework our training scripts.

Our main (and obvious) mistake was jumping right into code development before validating our understanding and assumptions about how SageMaker works. On top of that, we did not take SageMaker’s pricing model into consideration. Although we knew that SageMaker was free for the first two months (which was supposed to be enough to train our model), we discovered that SageMaker simply relies on EC2 instances. We understood that a Jupyter notebook would have to be constantly running in order to be used in conjunction with API Gateway. This would break two of our main requirements:

  • Having no running costs; and
  • Being serverless and scalable.

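The clash with the 0€ constraint is simple arithmetic: anything billed per hour and kept running all month has a fixed cost, traffic or not. The sketch below uses a hypothetical hourly rate chosen only for illustration; it is not an actual AWS price.

```python
HOURS_PER_MONTH = 24 * 30  # rough month length


def monthly_cost(hourly_rate_eur, hours_running=HOURS_PER_MONTH, instances=1):
    """Cost of keeping `instances` machines up for `hours_running` hours."""
    return hourly_rate_eur * hours_running * instances


# Hypothetical rate, chosen only to illustrate the arithmetic.
ASSUMED_HOURLY_RATE_EUR = 0.05

always_on = monthly_cost(ASSUMED_HOURLY_RATE_EUR)
print(f"always-on instance: {always_on:.2f} EUR/month")
```

Even a small hourly rate adds up over roughly 720 hours, whereas a pay-per-request service costs nothing when nobody calls it.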

Very quickly, we realized that this could have been avoided by simply reading SageMaker’s documentation and pricing model. “Duh!” you might be thinking. However, we were blindly convinced that such a popular service would provide us with everything we needed to succeed.

TL;DR: If you do not know the tech, do not assume that it will meet your way of working based on its popularity. In our context, we should have simply adapted our training scripts to SageMaker requirements rather than hoping that the tool could import any training scripts.

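To give an idea of what “adapting the training scripts” can look like, here is a minimal sketch of an entry point written in the style of SageMaker's script mode, where hyperparameters arrive as command-line arguments and locations come from environment variables such as `SM_CHANNEL_TRAINING` and `SM_MODEL_DIR`. The argument names, the local fallback paths and the empty `train` body are illustrative assumptions, not the project's actual script.

```python
import argparse
import os


def parse_args(argv=None):
    """Read hyperparameters and I/O locations the way a managed job passes them."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line arguments (names are illustrative).
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=32)
    # Data and model locations come from environment variables set by the
    # training container; the local fallbacks are placeholders for dry runs.
    parser.add_argument("--train-dir", default=os.environ.get("SM_CHANNEL_TRAINING", "./data"))
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "./model"))
    # Ignore any extra arguments the platform may append.
    args, _unknown = parser.parse_known_args(argv)
    return args


def train(args):
    # Placeholder for the real preprocessing and model.fit(...) calls.
    print(f"training for {args.epochs} epochs on data in {args.train_dir}")
    print(f"model artifacts would be written to {args.model_dir}")


if __name__ == "__main__":
    train(parse_args())
```

Because every path and hyperparameter has a local default, the same script can be dry-run on a laptop before it is handed to a managed training job.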

II. Reinventing the wheel

This second major mistake is linked to the first one as it is in fact its cause. In this section we will talk about the classic mistake of trying to build things that have already been built by experts in the past. During our project, we realized that we spent too much time trying to re-engineer a well-known and efficient solution: Neural Network Models that provide age recognition.

[Image by Alan O’Rourke. Source: Flickr]

The open source community is awesome, and most of the time, it is very likely that people will have already built awesome tools that do what you’re looking for.

Even if, like us, you feel very excited about the project and want to build everything yourself, it might be a bad idea to blindly start coding. As you probably know, it is highly recommended to first do some research on what has already been done on the topic you want to tackle.

This might sound obvious at first, but you will understand the importance of these words when you experience the consequences of not following them. In our case, we started working on our neural network immediately, since we believed we could train and deploy such a model without much effort.

In fact, even though we were not convinced by our training scripts or the quality of the data, we tried to give it a go on AWS via GPU-optimized instances. The result? After spending 10€ and making countless tweaks, our best result was a model with 20% accuracy on the training dataset. Although we learned a lot about the process, this turned out to be a disaster.

You might be thinking: “If you don’t want to learn how to train a model, why don’t you directly use a public API such as AWS Rekognition or Google Cloud Vision?”. The answer to that question is fairly simple: The main objective of our project was to learn how to deploy an AI model to the cloud and make it scalable and production-ready.

After failing to build the local project (the causes of which are explained in the next section), we faced the truth and started looking for an alternative. We did some research and found what we were looking for in DEX: Deep EXpectation of apparent age from a single image.

Rasmus Rothe, Radu Timofte and Luc Van Gool published a dataset with more than 500k images with age and gender labels, as well as their trained models. The cherry on the cake was that the paper was the winner of the NVIDIA apparent age competition in 2015. We went from trying to build everything on our own to having one of the best ready-to-use age recognition models.
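
As the name “Deep EXpectation” suggests, the idea behind DEX is to treat age estimation as a classification over age classes and then take the expected value of the softmax output as the final age. The sketch below only illustrates that last step in plain Python, assuming 101 classes for ages 0 to 100 and using made-up scores; it does not load or reproduce the published model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_age(logits):
    """Expected value of the age, where index i of `logits` stands for age i."""
    probs = softmax(logits)
    return sum(age * p for age, p in enumerate(probs))


# Toy scores over 101 classes (ages 0..100) with two equal peaks at 30 and 40.
toy_logits = [0.0] * 101
toy_logits[30] = 8.0
toy_logits[40] = 8.0
print(f"estimated age: {expected_age(toy_logits):.1f}")
```

Taking the expectation instead of the single most likely class lets the estimate fall between classes, here somewhere between the two peaks.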
