openai-gpt
Disclaimer: My opinions are informed by my experience maintaining Cortex, an open source platform for machine learning engineering.
If you frequent any part of the tech internet, you’ve come across GPT-3, OpenAI’s new state-of-the-art language model. While hype cycles forming around new technology aren’t new (GPT-3’s predecessor, GPT-2, generated quite a few headlines as well), GPT-3 is in a league of its own.
Looking at Hacker News for the last couple months, there have been dozens of hugely popular posts, all about GPT-3:
If you’re on Twitter, you’ve no doubt seen projects built on GPT-3 going viral, like this Apple engineer who used GPT-3 to write Javascript using a specific 3D rendering library:
And of course, there have been plenty of “Is this the beginning of SkyNet?” articles written:
The excitement over GPT-3 is just a piece of a bigger trend. Every month, we see more and more new products launch, all built on machine learning.
To understand why this is happening, and what the trend’s broader implications are, GPT-3 serves as a useful case study.
What’s so special about GPT-3?
The obvious take here is that GPT-3 is simply more powerful than any other language model, and that the increase in production machine learning lately can be chalked up to similar improvements across the field.
Undoubtedly, yes. This is a factor. But, and this is crucial, GPT-3 isn’t so popular just because it’s powerful. GPT-3 is ubiquitous because it is usable.
By “usable,” I mean that anyone can build with it, and it’s easy. For context, after the full GPT-2 was released, most of the popular projects built on it were built by machine learning specialists, and required substantial effort:
Comparatively, it has only been a couple of months since GPT-3's announcement, and we’re already seeing dozens of viral projects built on it, often of the “I got bored and built this in an afternoon” variety:
Anyone with some basic engineering chops can now build an application leveraging state of the art machine learning, and this increase in the usability of models—not just their raw power—is an industry-wide phenomenon.
Why it’s suddenly so easy to build with machine learning
One of the biggest blockers to using machine learning in production has been infrastructure. We’ve had models capable of doing incredible things for a long time, but actually building with them has remained a major challenge.
For example, consider GPT-2. How would you build a GPT-2 application?
Intuitively, the model is more or less an input-output machine, and the most logical thing to do would be to treat it as some sort of microservice: a predict() function your application could call. Pass in some text and receive GPT-2-generated text in return, just like any other API.
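To make the pattern concrete, here is a minimal sketch of such a microservice using only Python’s standard library. The predict() stub stands in for a real GPT-2 inference call (which would load the model through an ML framework and need the hardware discussed below), and the endpoint and JSON shape are illustrative assumptions, not GPT-2’s actual API:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(prompt: str) -> str:
    """Stand-in for the real model call. In a production service this
    would run GPT-2 inference; here it just echoes the prompt so the
    service pattern itself is runnable."""
    return prompt + " [generated text would follow here]"

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body, e.g. {"prompt": "Once upon a time"}
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        payload = json.dumps({"text": predict(body["prompt"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

# To serve predictions on http://localhost:8000:
# HTTPServer(("localhost", 8000), PredictHandler).serve_forever()
```

The interface is trivial; as the next section shows, everything hard about this service lives in predict() and the machine it runs on.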
This is the main way of deploying GPT-2 (what is known as realtime inference), and it comes with some serious challenges:
GPT-2 is massive. The fully trained model is roughly 6 GB. Hosting a GPT-2 microservice requires a lot of disk space.
GPT-2 is compute-hungry. Without at least one GPU, you will not be able to generate predictions with anywhere near acceptable latency.
GPT-2 is expensive. Given the above, you need to deploy GPT-2 to a cluster provisioned with large GPU instances—very expensive at scale.
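"Very expensive at scale" is easy to see with back-of-envelope arithmetic. The instance price and per-GPU throughput below are assumptions chosen for illustration (roughly the ballpark of a single-GPU cloud instance), not measured figures:

```python
import math

# Hypothetical numbers, for illustration only.
gpu_instance_hourly_usd = 0.90    # assumed price of one GPU instance
hours_per_month = 24 * 30
requests_per_second_per_gpu = 2   # assumed GPT-2 throughput per GPU

def monthly_cost(peak_requests_per_second: float) -> float:
    """Cost of enough always-on GPU instances to cover peak load."""
    instances = math.ceil(peak_requests_per_second / requests_per_second_per_gpu)
    return instances * gpu_instance_hourly_usd * hours_per_month

# A modest 10 req/s at peak already needs 5 GPU instances running
# around the clock: 5 * 0.90 * 720 = 3240.0 USD per month.
print(monthly_cost(10))
```

Even with generous assumptions, the bill scales linearly with traffic, which is why hosting your own large model is out of reach for many teams.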
And this is just for the vanilla, pretrained GPT-2 model. If you want to fine-tune GPT-2 for other tasks, that presents its own technical challenges.
This is why machine learning has been so unusable. Using it in production required you not only to be versed in machine learning, but also DevOps and backend development. This describes very few people.
Over the last several years, this has changed. There has been an emphasis in the community on improving infrastructure, and as a result, it’s gotten much easier to actually use models. Now, you can take a new model, write your API, and hit deploy, with no DevOps needed.
GPT-3 is an extreme example of this trend. The model, which is almost certainly too large for most teams to host, was actually released as an API.
While this move rankled many, it had a secondary effect. All of a sudden, using the most powerful language model in the world was easier than sending a text message with Twilio or setting up payments with Stripe.
In other words, you could call GPT-3 the most complex language model in history, but you could also call it just another API.
The number of people who can query an API, as it turns out, is orders of magnitude higher than the number of people that can deploy GPT-2 to production, hence the huge number of GPT-3 projects.
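Compare what that looks like from the developer’s side. Instead of the GPU cluster above, a GPT-3 request is a single HTTPS call. The endpoint path and parameter names below follow the general shape of the GPT-3 API at launch, but treat them as illustrative assumptions and check the provider’s current documentation before use:

```python
import json

def build_completion_request(api_key: str, prompt: str, max_tokens: int = 64):
    """Assemble the pieces of an HTTP request to a hosted completion API.
    Endpoint and field names are illustrative, not authoritative."""
    url = "https://api.openai.com/v1/engines/davinci/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens})
    return url, headers, body

# Sending it is one urllib call -- no GPUs, clusters, or model weights
# anywhere on your side:
# req = urllib.request.Request(url, data=body.encode(), headers=headers)
# text = json.load(urllib.request.urlopen(req))["choices"][0]["text"]
```

That is the whole integration, which is exactly why the barrier to entry dropped from "ML plus DevOps plus backend" to "can make an API call."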
Machine learning engineering is mainstream now
GPT-3’s hype train is a convergence of factors. The model does have unprecedented accuracy, but it is also incredibly usable, and it was released at a time when machine learning engineering had matured as an ecosystem and discipline.
For context, machine learning engineering is a field focused on building applications out of models. “How can I train a model to most accurately generate text?” is an ML research question. “How can I use GPT-2 to write folk music?” is a machine learning engineering question.
Because the machine learning engineering community is growing rapidly, companies are releasing new models the way they release web frameworks, hoping to attract engineers to build with them. Usability, therefore, has to be a consideration: they want to release not just the most powerful model, but the most used one.
Obviously, the proliferation of machine learning has many implications, but for engineers, there are two big conclusions to draw from this GPT-3 situation:
- It is easier than ever for you to actually build with machine learning.
- It is unlikely that in the near future you will be working on a piece of software that doesn’t incorporate machine learning in some way.
Machine learning is becoming a standard part of the software stack, and that trend is only accelerating. If you’re not already, it’s time to get familiar with production machine learning.
Source: https://towardsdatascience.com/why-are-you-seeing-gpt-3-everywhere-f156a71b77b0