
Study Notes on Generative AI with Large Language Models — 1.3 Introduction to LLMs and the Generative AI Project Lifecycle: Generative AI & LLMs; LLM Use Cases and Tasks


Generative AI & LLMs

Okay, let's get started. In this lesson, we're going to set the scene. We'll talk about large language models, their use cases, how the models work, prompt engineering, how to make creative text outputs, and outline a project lifecycle for generative AI projects. Given your interest in this course, it's probably safe to say that you've had a chance to try out a generative AI tool, or would like to. Whether it be a chatbot, generating images from text, or using a plugin to help you develop code, what you see in these tools is a machine that is capable of creating content that mimics or approximates human ability. Generative AI is a subset of traditional machine learning, and the machine learning models that underpin generative AI have learned these abilities by finding statistical patterns in massive datasets of content that was originally generated by humans. Large language models have been trained on trillions of words over many weeks and months, using large amounts of compute power. These foundation models, as we call them, with billions of parameters, exhibit emergent properties beyond language alone, and researchers are unlocking their ability to break down complex tasks, reason, and problem solve.

Here are a collection of foundation models, sometimes called base models, and their relative size in terms of their parameters. You'll cover these parameters in a little more detail later on, but for now, think of them as the model's memory. And the more parameters a model has, the more memory, and as it turns out, the more sophisticated the tasks it can perform. Throughout this course, we'll represent LLMs with these purple circles, and in the labs, you'll make use of a specific open source model, flan-T5, to carry out language tasks. By either using these models as they are or by applying fine tuning techniques to adapt them to your specific use case, you can rapidly build customized solutions without the need to train a new model from scratch.

Now, while generative AI models are being created for multiple modalities, including images, video, audio, and speech, in this course you'll focus on large language models and their uses in natural language generation. You will see how they are built and trained, how you can interact with them via text known as prompts, how to fine-tune models for your use case and data, and how you can deploy them with applications to solve your business and social tasks. The way you interact with language models is quite different from other machine learning and programming paradigms. In those cases, you write computer code with formalized syntax to interact with libraries and APIs. In contrast, large language models are able to take natural language or human-written instructions and perform tasks much as a human would. The text that you pass to an LLM is known as a prompt. The space or memory that is available to the prompt is called the context window, and this is typically large enough for a few thousand words, but differs from model to model. In this example, you ask the model to determine where Ganymede is located in the solar system. The prompt is passed to the model, the model then predicts the next words, and because your prompt contained a question, this model generates an answer. The output of the model is called a completion, and the act of using the model to generate text is known as inference. The completion consists of the text from the original prompt, followed by the generated text. You can see that this model did a good job of answering your question. It correctly identifies that Ganymede is a moon of Jupiter and generates a reasonable answer stating that the moon is located within Jupiter's orbit. You'll see lots of examples of prompts and completions in this style throughout the course.
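The prompt → next-word prediction → completion flow described above can be sketched in miniature. This is a toy illustration only: the "model" here is a hypothetical hand-built next-word lookup table, not a real LLM, and the context-window size is an invented figure — it only shows the mechanics of fitting a prompt into the context window and appending generated text to form a completion.

```python
# Toy sketch of LLM inference: a prompt goes in, the "model" repeatedly
# predicts the next word, and the completion is the prompt plus the
# generated text. NEXT_WORD is a hypothetical hand-built lookup table
# standing in for a real model's next-word distribution.
NEXT_WORD = {
    "where": "is", "is": "ganymede", "ganymede": "located",
    "located": "in", "in": "jupiter's", "jupiter's": "orbit",
}

CONTEXT_WINDOW = 512  # max words the model can attend to (invented; varies by model)

def generate(prompt: str, max_new_words: int = 10) -> str:
    # Fit the prompt into the context window (keep only the most recent words).
    words = prompt.lower().split()[-CONTEXT_WINDOW:]
    for _ in range(max_new_words):
        nxt = NEXT_WORD.get(words[-1])
        if nxt is None:      # no known continuation: stop generating
            break
        words.append(nxt)
    # Completion = original prompt text followed by the generated text.
    return " ".join(words)

print(generate("Where is Ganymede"))
```

A real LLM replaces the lookup table with a learned probability distribution over its whole vocabulary, conditioned on everything in the context window, but the inference loop — predict the next token, append it, repeat — has this same shape.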


LLM use cases and tasks

You could be forgiven for thinking that LLMs and generative AI are focused on chat tasks. After all, chatbots are highly visible and getting a lot of attention.

Next-word prediction is the base concept behind a number of different capabilities, starting with a basic chatbot. However, you can use this conceptually simple technique for a variety of other tasks within text generation. For example, you can ask a model to write an essay based on a prompt, or to summarize conversations, where you provide the dialogue as part of your prompt and the model uses this data along with its understanding of natural language to generate a summary. You can use models for a variety of translation tasks, from traditional translation between two different languages, such as French and German, or English and Spanish, to translating natural language into machine code. For example, you could ask a model to write some Python code that will return the mean of every column in a DataFrame, and the model will generate code that you can pass to an interpreter. You can use LLMs to carry out smaller, focused tasks like information retrieval. In this example, you ask the model to identify all of the people and places mentioned in a news article. This is known as named entity recognition, a word-level classification task. The understanding of knowledge encoded in the model's parameters allows it to correctly carry out this task and return the requested information to you. Finally, an area of active development is augmenting LLMs by connecting them to external data sources or using them to invoke external APIs. You can use this ability to provide the model with information it doesn't know from its pre-training and to enable your model to power interactions with the real world. You'll learn much more about how to do this in week 3 of the course.
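To make the code-generation example concrete, here is the kind of Python a model might produce for the prompt "write some Python code that will return the mean of every column in a DataFrame". The DataFrame contents below are invented for illustration; the pandas `DataFrame.mean` call is standard library usage.

```python
# Example of code an LLM might generate for the prompt
# "return the mean of every column in a DataFrame".
# The sample data is invented for this illustration.
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

def column_means(frame: pd.DataFrame) -> pd.Series:
    # numeric_only=True skips any non-numeric columns the frame may contain
    return frame.mean(numeric_only=True)

print(column_means(df))  # a -> 2.0, b -> 5.0
```

Generated code like this can then be passed to an interpreter and run directly, which is what makes natural-language-to-code such a practical use case.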

Developers have discovered that as the scale of foundation models grows from hundreds of millions of parameters to billions, even hundreds of billions, the subjective understanding of language that a model possesses also increases. This language understanding stored within the parameters of the model is what processes, reasons, and ultimately solves the tasks you give it. But it's also true that smaller models can be fine-tuned to perform well on specific, focused tasks. You'll learn more about how to do this in week 2 of the course. The rapid increase in capability that LLMs have exhibited in the past few years is largely due to the architecture that powers them. Let's move on to the next video to take a closer look.

