In this article, we'll look at how to use Anthropic's multi-modal LLMs for image understanding and reasoning. Anthropic recently released its latest multi-modal models, Claude 3 Opus and Claude 3 Sonnet. We'll show how to run image reasoning with these models and provide some related code examples.
Before we begin, we need to install a few required Python libraries:
!pip install llama-index-multi-modal-llms-anthropic
!pip install llama-index-vector-stores-qdrant
!pip install matplotlib
First, we'll show how to use Anthropic's API to understand images stored in a local directory.

import os

from PIL import Image
import matplotlib.pyplot as plt

from llama_index.core import SimpleDirectoryReader
from llama_index.multi_modal_llms.anthropic import AnthropicMultiModal

# Set the API key
os.environ["ANTHROPIC_API_KEY"] = ""  # Put your ANTHROPIC API key here

# Open and display the local image
img = Image.open("../data/images/prometheus_paper_card.png")
plt.imshow(img)

# Load the image data
image_documents = SimpleDirectoryReader(
    input_files=["../data/images/prometheus_paper_card.png"]
).load_data()

# Initialize the Anthropic multi-modal class
anthropic_mm_llm = AnthropicMultiModal(max_tokens=300)

# Reason over the image
response = anthropic_mm_llm.complete(
    prompt="Describe the images as an alternative text",
    image_documents=image_documents,
)

print(response)
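Besides one-shot completion, the multi-modal LLM interface in LlamaIndex also exposes a streaming variant. The following is a minimal sketch, under the assumption that stream_complete mirrors complete()'s signature and yields incremental response chunks:

# Sketch (assumption): stream the description incrementally instead of
# waiting for the full completion. Each yielded chunk is assumed to
# carry the newly generated text in its `delta` attribute.
stream = anthropic_mm_llm.stream_complete(
    prompt="Describe the images as an alternative text",
    image_documents=image_documents,
)

for chunk in stream:
    print(chunk.delta, end="", flush=True)

Streaming is useful when descriptions are long and you want to show output to the user as it is generated.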
Next, we'll show how to use the AnthropicMultiModal class to load and reason over an image from a URL.

from io import BytesIO

import requests
from PIL import Image
import matplotlib.pyplot as plt

from llama_index.core.multi_modal_llms.generic_utils import load_image_urls

image_urls = [
    "https://venturebeat.com/wp-content/uploads/2024/03/Screenshot-2024-03-04-at-12.49.41%E2%80%AFAM.png",
    # Add your own URLs here
]

# Fetch and display the first image
img_response = requests.get(image_urls[0])
img = Image.open(BytesIO(img_response.content))
plt.imshow(img)

# Wrap the URLs as image documents
image_url_documents = load_image_urls(image_urls)

# Reason over the remote image
response = anthropic_mm_llm.complete(
    prompt="Describe the images as an alternative text",
    image_documents=image_url_documents,
)

print(response)
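The examples so far rely on the class defaults. Since the article covers both Claude 3 Opus and Claude 3 Sonnet, here is a short sketch of selecting a model explicitly via the model argument; the model ID strings below follow Anthropic's naming at the time of the Claude 3 launch and may need updating against the current docs:

# Sketch: choose the model explicitly instead of relying on the default.
# The ID strings are assumptions based on Anthropic's published naming;
# verify them against the current Anthropic documentation.
opus_llm = AnthropicMultiModal(
    model="claude-3-opus-20240229",
    max_tokens=300,
)
sonnet_llm = AnthropicMultiModal(
    model="claude-3-sonnet-20240229",
    max_tokens=300,
)

# Sonnet trades some quality for lower cost and latency
response = sonnet_llm.complete(
    prompt="Describe the images as an alternative text",
    image_documents=image_url_documents,
)
print(response)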
We can also use a multi-modal Pydantic program to generate structured output from an image.

from typing import List

from PIL import Image
import matplotlib.pyplot as plt
from pydantic import BaseModel

from llama_index.core import SimpleDirectoryReader
from llama_index.core.program import MultiModalLLMCompletionProgram
from llama_index.multi_modal_llms.anthropic import AnthropicMultiModal


# Schema for a single traded ticker
class TickerInfo(BaseModel):
    direction: str
    ticker: str
    company: str
    shares_traded: int
    percent_of_total_etf: float


# Schema for the full answer: one fund and its tickers
class TickerList(BaseModel):
    fund: str
    tickers: List[TickerInfo]


# Load and display the sample image
image_documents = SimpleDirectoryReader(
    input_files=["../data/images/ark_email_sample.PNG"]
).load_data()

img = Image.open("../data/images/ark_email_sample.PNG")
plt.imshow(img)

prompt_template_str = """\
Can you get the stock information in the image \
and return the answer? Pick just one fund.

Make sure the answer is a JSON format corresponding to a Pydantic schema.
The Pydantic schema is given below.
"""

anthropic_mm_llm = AnthropicMultiModal(max_tokens=300)

# Build a Pydantic program that parses the model output into TickerList
llm_program = MultiModalLLMCompletionProgram.from_defaults(
    output_cls=TickerList,
    image_documents=image_documents,
    prompt_template_str=prompt_template_str,
    multi_modal_llm=anthropic_mm_llm,
    verbose=True,
)

response = llm_program()
print(str(response))
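The qdrant vector store we installed at the start has not been used yet. As a pointer toward retrieval over a folder of images, here is a minimal sketch of a multi-modal Qdrant index, assuming the MultiModalVectorStoreIndex API in llama-index-core and that the default text/image embedding backends (e.g. OpenAI embeddings plus the CLIP integration) are installed and configured:

import qdrant_client

from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Local on-disk Qdrant instance
client = qdrant_client.QdrantClient(path="qdrant_mm_db")

# Separate collections for text and image embeddings
text_store = QdrantVectorStore(client=client, collection_name="text_collection")
image_store = QdrantVectorStore(client=client, collection_name="image_collection")
storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)

# Index everything in the images directory
documents = SimpleDirectoryReader("../data/images/").load_data()
index = MultiModalVectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

# Answer questions over the retrieved images with the Claude 3 model
query_engine = index.as_query_engine(llm=anthropic_mm_llm)
print(query_engine.query("Describe the Prometheus paper card."))

This is only a sketch of the indexing path; directory names and embedding defaults are assumptions, not part of the examples above.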
If you found this article helpful, please give it a like and follow my blog. Thanks!