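Hi everyone, I'm Muyi (木易), an internet tech product manager who follows the AI field closely. I hold a bachelor's degree from a Top 2 university in China, a CS master's from a Top 10 US program, and an MBA. I firmly believe AI is a "cheat code" that lets ordinary people become stronger, so I started the WeChat official account "AI信息Gap" to share AI knowledge across the board, including but not limited to AI explainers, AI tool reviews, AI productivity tips, and AI industry insights. Follow me so you don't lose your way on the road to AI, and let's get stronger together in 2024.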
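Language support: Llama 3 supports Chinese, but not perfectly. Meta says that over 5% of the pretraining dataset is non-English data covering over 30 languages, but the corpus is predominantly English. Meta plans to improve multilingual support in the future.
Availability: Llama 3 is an open-source model. Meta AI offers a way to use Llama 3 directly, and the service is available in the United States and 13 other countries.
Test results: With English prompts, Llama 3 performs impressively. With Chinese prompts, its performance is unstable and it sometimes hallucinates.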
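The conclusion first: Chinese is supported, but not perfectly.
Meta has not explicitly listed the languages that Llama 3 supports, but its official blog post does mention that over 5% of the pretraining dataset is non-English data covering over 30 languages. In my testing, Llama 3 understands Chinese prompts, but its Chinese output falls somewhat short, which is unsurprising given that the pretraining corpus is mostly English.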
To prepare for upcoming multilingual use cases, over 5% of the Llama 3 pretraining dataset consists of high-quality non-English data that covers over 30 languages. However, we do not expect the same level of performance in these languages as in English.
Meta also says that it will next work on multimodality, multilingual support, a longer context window, and more.
Over the coming months, we’ll release multiple models with new capabilities including multimodality, the ability to converse in multiple languages, a much longer context window, and stronger overall capabilities.
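In the paper for the previous generation, Llama 2, Meta had already stated a conclusion similar to the Llama 3 description above. Llama 2's training data was mainly English but also included 27 other languages. As Table 10 of that paper shows, Chinese text (zh) accounts for 0.13% of the corpus.
Fun fact: zh is the two-letter ISO 639-1 code for Chinese. Most languages have their own two-letter code in this standard, and zh is actually short for "Zhōngwén".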
Language Identification. While our pretraining data is mostly English, it also includes text from a small number of other languages. Table 10 shows the distribution of languages in our corpus, subsetted to those found in more than 0.005% of the documents. Our analysis uses the fastText (Bojanowski et al., 2016) language identification tool and a threshold of 0.5 for the language detection. A training corpus with a majority in English means that the model may not be suitable for use in other languages.
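To make the two thresholds in that passage concrete, here is a minimal sketch of how such a language distribution could be tallied. It is not Meta's pipeline: the input is a hypothetical list of (language code, confidence) pairs, one per document, such as a language identifier like fastText might produce, and counting low-confidence documents as "unknown" is my own assumption.

```python
from collections import Counter


def language_distribution(predictions, min_confidence=0.5, min_share=0.00005):
    """Tally the share of documents per language.

    predictions: iterable of (language_code, confidence) pairs, one per
    document (hypothetical input, e.g. from a language-ID tool).
    min_confidence: detections below this score count as "unknown"
    (an assumption about how low-confidence documents are handled).
    min_share: keep only languages found in more than this fraction of
    documents; 0.00005 corresponds to the 0.005% quoted above.
    """
    predictions = list(predictions)
    total = len(predictions)
    if total == 0:
        return {}
    counts = Counter(
        lang if conf >= min_confidence else "unknown"
        for lang, conf in predictions
    )
    # Most frequent languages first, rare ones filtered out.
    return {
        lang: count / total
        for lang, count in counts.most_common()
        if count / total > min_share
    }


# Made-up sample: eight English documents, one Chinese, one low-confidence guess.
sample = [("en", 0.98)] * 8 + [("zh", 0.91), ("de", 0.32)]
print(language_distribution(sample))
```

With a corpus dominated by English, a language like zh ends up as a very small share of the total, which is exactly the situation the paper warns about.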
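First, Llama 3 is an open-source model, which means individual users, developers, and researchers can all download the Llama 3 code and related materials directly. But running an LLM like this takes substantial hardware resources, which is a challenge for individual users. For that reason, Meta has also integrated Llama 3 into Meta AI at https://www.meta.ai/, where users can chat with Llama 3 directly, without even logging in. Meta AI is also available inside Facebook, Instagram, WhatsApp, and Messenger.
Meta AI is available in the following countries.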
We’re rolling out Meta AI in English in more than a dozen countries outside of the US. Now, people will have access to Meta AI in Australia, Canada, Ghana, Jamaica, Malawi, New Zealand, Nigeria, Pakistan, Singapore, South Africa, Uganda, Zambia and Zimbabwe — and we’re just getting started.
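Regions where Meta AI is available: the United States, Australia, Canada, Ghana, Jamaica, Malawi, New Zealand, Nigeria, Pakistan, Singapore, South Africa, Uganda, Zambia, and Zimbabwe.
Let's start with a classic question. In my earlier tests, GPT-4, Claude 3, iFlytek Spark (讯飞星火), and Tongyi Qianwen (通义千问) all passed it, while Google's Gemini Pro and Gemini Ultra, ERNIE Bot (文心一言), and Kimi Chat answered incorrectly and failed. For details, see my earlier article: "Claude 3有点东西!这个看似简单的问题只有GPT-4和Claude 3回答正确!通义千问和讯飞星火表现同样亮眼!".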
English prompt: If there are 1000 books in a room and I read 2, how many books are still in the room? Answer succinctly
Chinese prompt: 如果一个房间里有1000本书,我读了2本,房间里还有多少本书?请简洁回答。
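With the English prompt, the answer was completely correct and the reasoning was clear. With the Chinese prompt, however, it clearly hallucinated, and Meta AI added this notice: I am still improving my command of non-English languages, and I may make errors while attempting them. I will be most useful to you if I can assist you in English.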
English prompt: You have three switches on the ground floor, each of which controls one of three light bulbs upstairs, but you don't know which switch corresponds to which bulb. You can only go upstairs once. How should you operate the switches and then go upstairs one time to determine which switch corresponds to each bulb?
Chinese prompt: 你有三个开关在楼下,它们各自控制楼上的三个灯泡,但你不知道哪个开关对应哪个灯泡。你只能上楼一次。你该如何操作开关并上楼一次,以确定每个开关对应哪个灯泡?
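This is a classic logic puzzle, and it is somewhat hard for an LLM because it takes more than pure logic: the model also has to bring in common sense about the physical world, namely that a light bulb heats up when it is on. To work out which of the three downstairs switches controls which of the three upstairs bulbs, you can proceed as follows. First, turn on the first switch for a while so that its bulb heats up, then turn it off. Next, immediately turn on the second switch and go upstairs to check. The warm bulb that is off belongs to the first switch, the bulb that is lit belongs to the second switch, and the cold bulb belongs to the third switch. One trip upstairs is enough, because touch tells the two unlit bulbs apart.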
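The procedure described above can be checked with a small simulation. This is only a sketch under simplifying assumptions of mine: each bulb is reduced to two flags (lit, warm), a bulb that was on for a while stays warm long enough to notice, and the function names are made up for illustration.

```python
from itertools import permutations


def observe_upstairs(wiring):
    """Simulate the downstairs steps and return what you find upstairs.

    wiring: dict mapping switch number (1, 2, 3) to a bulb label.
    Returns a dict mapping bulb label to (is_lit, is_warm).
    """
    # Every bulb starts dark and cold; the bulb on switch 3 stays that way.
    state = {bulb: (False, False) for bulb in wiring.values()}
    state[wiring[1]] = (False, True)  # on for a while, then off: dark but warm
    state[wiring[2]] = (True, True)   # switched on and left on: lit
    return state


def deduce_wiring(observed):
    """Recover the switch-to-bulb mapping from sight and touch alone."""
    mapping = {}
    for bulb, (is_lit, is_warm) in observed.items():
        if is_lit:
            mapping[2] = bulb
        elif is_warm:
            mapping[1] = bulb
        else:
            mapping[3] = bulb
    return mapping


def procedure_always_works(bulbs=("A", "B", "C")):
    """Try all six possible wirings and confirm each one is recovered."""
    for order in permutations(bulbs):
        wiring = dict(zip((1, 2, 3), order))
        if deduce_wiring(observe_upstairs(wiring)) != wiring:
            return False
    return True


print(procedure_always_works())
```

The three bulbs end up in three distinct states, so the mapping is unambiguous for every possible wiring.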
A classic puzzle! Here's the solution:
Step 1: Operate the switches
Turn switch 1 to ON for 5 minutes.
Turn switch 1 to OFF.
Turn switch 2 to ON.
Turn switch 3 to ON for 5 seconds.
Turn switch 3 to OFF.
Step 2: Go upstairs
Observe the bulbs:
One bulb will be HOT (but off). This corresponds to switch 1.
One bulb will be ON. This corresponds to switch 2.
One bulb will be OFF, but WARM. This corresponds to switch 3.
By following these steps, you'll be able to determine which switch corresponds to each bulb in just one trip upstairs!
A classic puzzle! Here's the solution:
Step 1: Initial Setup
Turn switch 1 to ON for 5 minutes.
Turn switch 1 to OFF.
Turn switch 2 to ON.
Step 2: Go Upstairs
Observe the bulbs:
One will be HOT (recently off).
One will be ON.
One will be OFF (and cold).
Step 3: Deduce the Switch-Bulb Mapping
The HOT bulb corresponds to switch 1.
The ON bulb corresponds to switch 2.
The OFF (and cold) bulb corresponds to switch 3.
Explanation
By initially turning switch 1 to ON for 5 minutes, we ensure that the corresponding bulb is hot even after it's turned off.
By then turning switch 2 to ON, we create a distinction between the two remaining bulbs.
When we go upstairs, we can identify the HOT bulb as corresponding to switch 1, the ON bulb as corresponding to switch 2, and the remaining OFF (and cold) bulb as corresponding to switch 3.
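Llama 3 passed the test with both the English and the Chinese prompt. Over repeated runs, however, I found that its performance with Chinese prompts is rather unstable: sometimes it hallucinates, and sometimes it answers correctly.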
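Since a single run says little about stability, repeated runs can be turned into a pass rate. Below is a rough sketch of that idea, not the way I actually ran the tests (I used the Meta AI chat page by hand): `ask_model` is a hypothetical callable that sends a prompt and returns the reply text, and the grader for the 1000-books question is a crude heuristic that is no substitute for reading the replies.

```python
import re
from itertools import cycle


def pass_rate(ask_model, prompt, is_correct, trials=5):
    """Send the same prompt several times and return the fraction of passes.

    ask_model: hypothetical callable, prompt text -> reply text.
    is_correct: callable, reply text -> bool.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    passes = sum(1 for _ in range(trials) if is_correct(ask_model(prompt)))
    return passes / trials


def mentions_1000_not_998(reply):
    """Crude grader for the books question: expects 1000 and rejects 998."""
    numbers = {n.replace(",", "") for n in re.findall(r"\d[\d,]*", reply)}
    return "1000" in numbers and "998" not in numbers


# Stand-in for a real model: alternates between a right and a wrong reply.
canned = cycle(["There are still 1000 books in the room.", "998 books remain."])
rate = pass_rate(
    lambda prompt: next(canned),
    "如果一个房间里有1000本书,我读了2本,房间里还有多少本书?请简洁回答。",
    mentions_1000_not_998,
    trials=4,
)
print(rate)
```

Running the English and the Chinese prompt through the same loop would give two numbers that can be compared directly, instead of an impression from a handful of chats.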