Tools Series: Introduction to PandasAI - Quick Start

PandasAI

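PandasAI is a library that makes data analysis conversational and fun. It combines pandas dataframes with state-of-the-art LLMs so that users can analyze data in a conversational way.

Much like pandas does with its "10 minutes to pandas" guide (https://pandas.pydata.org/docs/user_guide/10min.html), we want to offer the simplest possible way to learn PandasAI.

Let's get started!

Setup

To get started, we need to install the latest version of PandasAI.

# Install the pandasai library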
!pip install pandasai
Collecting pandasai
  Downloading pandasai-1.2.7-py3-none-any.whl (73 kB)
Collecting astor<0.9.0,>=0.8.1 (from pandasai)
  Downloading astor-0.8.1-py2.py3-none-any.whl (27 kB)
Requirement already satisfied: duckdb<0.9.0,>=0.8.1 in /usr/local/lib/python3.10/dist-packages (from pandasai) (0.8.1)
Collecting ipython<9.0.0,>=8.13.1 (from pandasai)
  Downloading ipython-8.15.0-py3-none-any.whl (806 kB)
Requirement already satisfied: matplotlib<4.0.0,>=3.7.1 in /usr/local/lib/python3.10/dist-packages (from pandasai) (3.7.1)
Collecting openai<0.28.0,>=0.27.5 (from pandasai)
  Downloading openai-0.27.10-py3-none-any.whl (76 kB)
Requirement already satisfied: pandas==1.5.3 in /usr/local/lib/python3.10/dist-packages (from pandasai) (1.5.3)
Requirement already satisfied: pydantic<2,>=1 in /usr/local/lib/python3.10/dist-packages (from pandasai) (1.10.12)
Collecting python-dotenv<2.0.0,>=1.0.0 (from pandasai)
  Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Requirement already satisfied: scipy<2.0.0,>=1.9.0 in /usr/local/lib/python3.10/dist-packages (from pandasai) (1.11.2)
Collecting sqlalchemy<2.0.0,>=1.4.49 (from pandasai)
  Downloading SQLAlchemy-1.4.49-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)
Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas==1.5.3->pandasai) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas==1.5.3->pandasai) (2023.3.post1)
Requirement already satisfied: numpy>=1.21.0 in /usr/local/lib/python3.10/dist-packages (from pandas==1.5.3->pandasai) (1.23.5)
Requirement already satisfied: backcall in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai) (0.2.0)
Requirement already satisfied: decorator in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai) (4.4.2)
Collecting jedi>=0.16 (from ipython<9.0.0,>=8.13.1->pandasai)
  Downloading jedi-0.19.0-py2.py3-none-any.whl (1.6 MB)
Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai) (0.1.6)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai) (0.7.5)
Requirement already satisfied: prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30 in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai) (3.0.39)
Requirement already satisfied: pygments>=2.4.0 in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai) (2.16.1)
Collecting stack-data (from ipython<9.0.0,>=8.13.1->pandasai)
  Downloading stack_data-0.6.2-py3-none-any.whl (24 kB)
Requirement already satisfied: traitlets>=5 in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai) (5.7.1)
Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai) (1.1.3)
Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai) (4.8.0)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai) (1.1.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai) (4.42.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai) (1.4.5)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai) (23.1)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai) (3.1.1)
Requirement already satisfied: requests>=2.20 in /usr/local/lib/python3.10/dist-packages (from openai<0.28.0,>=0.27.5->pandasai) (2.31.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from openai<0.28.0,>=0.27.5->pandasai) (4.66.1)
Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from openai<0.28.0,>=0.27.5->pandasai) (3.8.5)
Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<2,>=1->pandasai) (4.5.0)
Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from sqlalchemy<2.0.0,>=1.4.49->pandasai) (2.0.2)
Requirement already satisfied: parso<0.9.0,>=0.8.3 in /usr/local/lib/python3.10/dist-packages (from jedi>=0.16->ipython<9.0.0,>=8.13.1->pandasai) (0.8.3)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.10/dist-packages (from pexpect>4.3->ipython<9.0.0,>=8.13.1->pandasai) (0.7.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.10/dist-packages (from prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30->ipython<9.0.0,>=8.13.1->pandasai) (0.2.6)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas==1.5.3->pandasai) (1.16.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai<0.28.0,>=0.27.5->pandasai) (3.2.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai<0.28.0,>=0.27.5->pandasai) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai<0.28.0,>=0.27.5->pandasai) (2.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai<0.28.0,>=0.27.5->pandasai) (2023.7.22)
Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai<0.28.0,>=0.27.5->pandasai) (23.1.0)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai<0.28.0,>=0.27.5->pandasai) (6.0.4)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai<0.28.0,>=0.27.5->pandasai) (4.0.3)
Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai<0.28.0,>=0.27.5->pandasai) (1.9.2)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai<0.28.0,>=0.27.5->pandasai) (1.4.0)
Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai<0.28.0,>=0.27.5->pandasai) (1.3.1)
Collecting executing>=1.2.0 (from stack-data->ipython<9.0.0,>=8.13.1->pandasai)
  Downloading executing-1.2.0-py2.py3-none-any.whl (24 kB)
Collecting asttokens>=2.1.0 (from stack-data->ipython<9.0.0,>=8.13.1->pandasai)
  Downloading asttokens-2.4.0-py2.py3-none-any.whl (27 kB)
Collecting pure-eval (from stack-data->ipython<9.0.0,>=8.13.1->pandasai)
  Downloading pure_eval-0.2.2-py3-none-any.whl (11 kB)
Installing collected packages: pure-eval, executing, sqlalchemy, python-dotenv, jedi, asttokens, astor, stack-data, openai, ipython, pandasai
  Attempting uninstall: sqlalchemy
    Found existing installation: SQLAlchemy 2.0.20
    Uninstalling SQLAlchemy-2.0.20:
      Successfully uninstalled SQLAlchemy-2.0.20
  Attempting uninstall: ipython
    Found existing installation: ipython 7.34.0
    Uninstalling ipython-7.34.0:
      Successfully uninstalled ipython-7.34.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires ipython==7.34.0, but you have ipython 8.15.0 which is incompatible.
ipython-sql 0.5.0 requires sqlalchemy>=2.0, but you have sqlalchemy 1.4.49 which is incompatible.
Successfully installed astor-0.8.1 asttokens-2.4.0 executing-1.2.0 ipython-8.15.0 jedi-0.19.0 openai-0.27.10 pandasai-1.2.7 pure-eval-0.2.2 python-dotenv-1.0.0 sqlalchemy-1.4.49 stack-data-0.6.2

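The pip output above reports that installing PandasAI replaced ipython and SQLAlchemy with other versions. A minimal sketch for checking which versions ended up in the environment, using only the standard library (the helper name `installed_version` is an assumption of this sketch, not part of PandasAI):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages that appear in the pip output above.
for name in ("pandasai", "pandas", "ipython", "sqlalchemy"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this after the install cell makes it easy to see whether a later `pip install` has changed one of the pinned dependencies again.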

SmartDataframe

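A SmartDataframe is a pandas (or polars) dataframe that inherits from pd.DataFrame: it keeps all the properties and methods of pd.DataFrame and adds conversational features on top.

# Import the SmartDataframe class from the pandasai library
from pandasai import SmartDataframe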

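You can instantiate a SmartDataframe from several different sources (a pandas or polars dataframe, a CSV file, an XLSX file, or Google Sheets).

Importing from a pandas dataframe

To import data from a pandas dataframe, first import pandas and create a dataframe.

# Import the pandas library
import pandas as pd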

# Create a DataFrame with country, GDP, and happiness index data
df = pd.DataFrame({
    "country": [
        "United States",
        "United Kingdom",
        "France",
        "Germany",
        "Italy",
        "Spain",
        "Canada",
        "Australia",
        "Japan",
        "China",
    ],
    # GDP values, one per country in the same order
    "gdp": [
        19294482071552,
        2891615567872,
        2411255037952,
        3435817336832,
        1745433788416,
        1181205135360,
        1607402389504,
        1490967855104,
        4380756541440,
        14631844184064,
    ],
    # Happiness index values, one per country in the same order
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})



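Because PandasAI is powered by an LLM, you need to import the LLM you want to use for your use case. Here we will use OpenAI.

To use OpenAI, you need an API token. Follow these simple steps to generate your API_TOKEN:

  1. Go to https://openai.com/api/ and sign up with your email address or connect your Google account.
  2. On the left side of your personal account settings, click "View API Keys".
  3. Select "Create new Secret key".

Access to the OpenAI API is a paid service. Read the Pricing information before experimenting.

# Import the OpenAI class
from pandasai.llm import OpenAI

# Create an OpenAI object, passing your API token
llm = OpenAI(api_token="YOUR TOKEN")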
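Hardcoding the token in a notebook makes it easy to leak when the notebook is shared. One common alternative is to read it from an environment variable. The sketch below assumes a variable named `OPENAI_API_KEY` and a hypothetical helper `load_api_token`; neither name comes from this article.

```python
import os


def load_api_token(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API token from an environment variable instead of the source code."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token


# Usage, with the OpenAI class imported as shown above
# (the variable name is an assumption of this sketch):
# llm = OpenAI(api_token=load_api_token())
```

The helper fails early with a clear message when the variable is missing, instead of sending an empty token to the API.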

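Now that we have instantiated the LLM, we can finally instantiate the SmartDataframe.

# Create a SmartDataframe from the DataFrame df, passing the LLM through config={"llm": llm}
sdf = SmartDataframe(df, config={"llm": llm})

A SmartDataframe inherits all the methods and properties of the original dataframe. For example:

# Filter with a condition: return the rows where country is 'United States'
result = sdf[sdf['country'] == 'United States']

# Print the result
print(result)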

But you can also query it in natural language.


# Call chat with the question "Return the top 5 countries by GDP"
sdf.chat("Return the top 5 countries by GDP")
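Since the answer comes from LLM-generated code, it can be useful to cross-check it against plain pandas. A minimal sketch, rebuilding only the two columns this question needs from the data above:

```python
import pandas as pd

# Same country and GDP values as the dataframe defined earlier.
df = pd.DataFrame({
    "country": [
        "United States", "United Kingdom", "France", "Germany", "Italy",
        "Spain", "Canada", "Australia", "Japan", "China",
    ],
    "gdp": [
        19294482071552, 2891615567872, 2411255037952, 3435817336832,
        1745433788416, 1181205135360, 1607402389504, 1490967855104,
        4380756541440, 14631844184064,
    ],
})

# nlargest returns the rows with the 5 highest GDP values, largest first.
top5 = df.nlargest(5, "gdp").reset_index(drop=True)
print(top5["country"].tolist())
# ['United States', 'China', 'Japan', 'Germany', 'United Kingdom']
```

If the chat answer lists different countries, inspect the generated code before trusting the result.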


# Call chat with a question that requires sorting and aggregating
sdf.chat("What's the sum of the gdp of the 2 unhappiest countries?")
# Print the last code generated by the SmartDataframe
print(sdf.last_code_generated)
def analyze_data(dfs: list[pd.DataFrame]) ->dict:
    df_combined = pd.concat(dfs)
    df_sorted = df_combined.sort_values('happiness_index')
    sum_gdp = df_sorted.head(2)['gdp'].sum()
    return {'type': 'number', 'value': sum_gdp}


result = analyze_data(dfs)

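The generated function can also be run by hand to verify the arithmetic. The sketch below copies it unchanged and feeds it the same data as a one-element list; how PandasAI itself builds `dfs` internally is not shown in this article, so passing `[df]` is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "country": [
        "United States", "United Kingdom", "France", "Germany", "Italy",
        "Spain", "Canada", "Australia", "Japan", "China",
    ],
    "gdp": [
        19294482071552, 2891615567872, 2411255037952, 3435817336832,
        1745433788416, 1181205135360, 1607402389504, 1490967855104,
        4380756541440, 14631844184064,
    ],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})


def analyze_data(dfs: list[pd.DataFrame]) -> dict:
    # Copied from the generated code printed above.
    df_combined = pd.concat(dfs)
    df_sorted = df_combined.sort_values('happiness_index')
    sum_gdp = df_sorted.head(2)['gdp'].sum()
    return {'type': 'number', 'value': sum_gdp}


result = analyze_data([df])
# The two lowest happiness scores are China (5.12) and Japan (5.87).
print(int(result["value"]))  # 14631844184064 + 4380756541440 = 19012600725504
```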

Plotting charts

You can also easily plot charts with PandasAI.


# Call chat with the request "Plot a chart of the gdp by country"
sdf.chat("Plot a chart of the gdp by country")

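You can also give additional instructions. For example, suppose you want a different color for each bar. You only need to ask PandasAI for it.


# Ask for a chart of GDP by country through the same chat method,
# adding the extra instruction that each bar uses a different color
sdf.chat("Plot a histogram of the gdp by country, using a different color for each bar")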

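As an alternative, you can use shortcuts. A shortcut is a function that saves you from writing the prompt and does the "magic" for you behind the scenes.

For example, you can generate the same chart with .plot_bar_chart(), providing the fields:



# Plot a bar chart
sdf.plot_bar_chart(x="country", y="gdp")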

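Similarly, if we want to visualize the data as a pie chart, we can call the plot_pie_chart shortcut, passing the field to use for the labels and the field to use for the values.



# Plot a pie chart
sdf.plot_pie_chart(labels="country", values="gdp")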
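To make the idea of a shortcut concrete, here is a purely hypothetical sketch of how such a function could turn its arguments into a prompt. The function names and the prompt wording are assumptions for illustration; this article does not show PandasAI's actual implementation.

```python
def bar_chart_prompt(x: str, y: str) -> str:
    """Build the request a bar-chart shortcut could send (hypothetical wording)."""
    return f"Plot a bar chart with '{x}' on the x axis and '{y}' on the y axis"


def pie_chart_prompt(labels: str, values: str) -> str:
    """Build the request a pie-chart shortcut could send (hypothetical wording)."""
    return f"Plot a pie chart using '{labels}' as labels and '{values}' as values"


print(bar_chart_prompt("country", "gdp"))
print(pie_chart_prompt("country", "gdp"))
```

The point of the design is that the caller supplies only field names, while the wording of the request stays in one place and is the same on every call.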

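SmartDatalake

Sometimes you may want to work with several dataframes at once and let the LLM decide which ones to use to answer your query. In that case, use a SmartDatalake instead of a SmartDataframe.

The concept is very similar to the SmartDataframe, but it accepts multiple dataframes as input instead of just one.

# Import the SmartDatalake class
from pandasai import SmartDatalake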

In this example, we provide two different dataframes.
The first one lists the employee ID, name, and department of each employee.
The second one lists the employee ID and salary of each employee.

When asked, PandasAI joins the two dataframes on the employee ID and finds the name of the highest-paid employee.


# Create the dataframe with employee information
employees_df = pd.DataFrame(
    {
        "EmployeeID": [1, 2, 3, 4, 5],
        "Name": ["John", "Emma", "Liam", "Olivia", "William"],
        "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
    }
)

# Create the dataframe with salary information
salaries_df = pd.DataFrame(
    {
        "EmployeeID": [1, 2, 3, 4, 5],
        "Salary": [5000, 6000, 4500, 7000, 5500],
    }
)

# Create a SmartDatalake from the employee and salary dataframes
lake = SmartDatalake(
    [employees_df, salaries_df],
    config={"llm": llm}
)

# Call chat with the question "Who gets paid the most?"
lake.chat("Who gets paid the most?")

Here is the code that was generated:


# Print the last code executed by the lake
print(lake.last_code_executed)
def analyze_data(dfs: list[pd.DataFrame]) ->dict:
    """
    Analyze the data
    1. Prepare: Preprocessing and cleaning data if necessary
    2. Process: Manipulating data for analysis (grouping, filtering, aggregating, etc.)
    3. Analyze: Conducting the actual analysis (if the user asks to plot a chart save it to an image in exports/charts/temp_chart.png and do not show the chart.)
    4. Output: return a dictionary of:
    - type (possible values "text", "number", "dataframe", "plot")
    - value (can be a string, a dataframe or the path of the plot, NOT a dictionary)
    Example output: { "type": "text", "value": "The average loan amount is $15,000." }
    """
    merged_df = pd.merge(dfs[0], dfs[1], on='EmployeeID')
    max_salary_employee = merged_df.loc[merged_df['Salary'].idxmax()]
    employee_name = max_salary_employee['Name']
    return {'type': 'text', 'value': f'The employee who gets paid the most is {employee_name}.'}

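The docstring in the generated code describes an output contract: a dictionary with a `type` ("text", "number", "dataframe", or "plot") and a `value` that is not itself a dictionary. A minimal sketch of a check for that contract (the helper `validate_result` is hypothetical and not part of PandasAI):

```python
ALLOWED_TYPES = {"text", "number", "dataframe", "plot"}


def validate_result(result) -> dict:
    """Check a result against the contract stated in the generated docstring."""
    if not isinstance(result, dict):
        raise TypeError("result must be a dictionary")
    missing = {"type", "value"} - result.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if result["type"] not in ALLOWED_TYPES:
        raise ValueError(f"unexpected type: {result['type']!r}")
    if isinstance(result["value"], dict):
        raise ValueError("value must not be a dictionary")
    return result


ok = validate_result(
    {"type": "text", "value": "The employee who gets paid the most is Olivia."}
)
print(ok["type"])  # text
```

A check like this is handy when you run generated code yourself and want to fail fast on a malformed answer.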

Okay, that case was easy: both tables share a common column named EmployeeID, right?

Let's try something more complex.


# Create a DataFrame with user information
users_df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "name": ["John", "Emma", "Liam", "Olivia", "William"]
    }
)

# Wrap it in a SmartDataframe named "users"
users = SmartDataframe(users_df, name="users")

# Create a DataFrame with photo information
photos_df = pd.DataFrame(
    {
        "id": [31, 32, 33, 34, 35],
        "user_id": [1, 1, 2, 4, 5]
    }
)

# Wrap it in a SmartDataframe named "photos"
photos = SmartDataframe(photos_df, name="photos")

# Create a SmartDatalake from "users" and "photos", passing the LLM through config
lake = SmartDatalake([users, photos], config={"llm": llm})

# Ask the SmartDatalake how many photos John has uploaded
lake.chat("How many photos have been uploaded by John?")

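In this case, we gave each dataframe a table name, so the LLM has some context and can perform the join more reliably. As the generated code shows, it worked out the correct join: users.id matches photos.user_id. The user "John" does indeed have 2 photos.



# Print the last code executed by the lake
print(lake.last_code_executed)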
def analyze_data(dfs: list[pd.DataFrame]) ->dict:
    users = dfs[0]
    photos = dfs[1]
    merged_df = pd.merge(users, photos, left_on='id', right_on='user_id')
    john_photos = merged_df[merged_df['name'] == 'John']
    num_photos = john_photos.shape[0]
    return {'type': 'number', 'value': num_photos}


result = analyze_data(dfs)

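Both tables have a column called `id`, which is part of what makes this join harder than the previous one. A plain-pandas sketch of the same merge shows what happens to the clashing names: pandas keeps both columns and appends its default `_x` and `_y` suffixes.

```python
import pandas as pd

users = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5], "name": ["John", "Emma", "Liam", "Olivia", "William"]}
)
photos = pd.DataFrame({"id": [31, 32, 33, 34, 35], "user_id": [1, 1, 2, 4, 5]})

# Same join as in the generated code: users.id matches photos.user_id.
merged = pd.merge(users, photos, left_on="id", right_on="user_id")

# The overlapping "id" columns are renamed with the default suffixes.
print(merged.columns.tolist())  # ['id_x', 'name', 'id_y', 'user_id']

# Count photos per user; users without photos drop out of the inner join.
photos_per_user = merged.groupby("name")["id_y"].count()
print(photos_per_user["John"])  # 2
```

Here `id_x` is the user ID and `id_y` is the photo ID; passing `suffixes=("_user", "_photo")` to `pd.merge` would give them clearer names.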

Different LLMs

Although OpenAI GPT-3.5 and GPT-4 are currently the recommended models, we also support other models, such as Starcoder and Falcon.

You can use them as follows:

# Import the required classes
from pandasai import SmartDataframe
from pandasai.llm import Starcoder, Falcon

# Create a Starcoder object, passing the API token
starcoder_llm = Starcoder(api_token="YOUR TOKEN")

# Create a Falcon object, passing the API token
falcon_llm = Falcon(api_token="YOUR TOKEN")

# Create a SmartDataframe backed by Starcoder
df1 = SmartDataframe(df, config={"llm": starcoder_llm})

# Create a SmartDataframe backed by Falcon
df2 = SmartDataframe(df, config={"llm": falcon_llm})

# Print the answer from the Starcoder-backed dataframe
print(df1.chat("Which country has the highest GDP?"))

# Print the answer from the Falcon-backed dataframe
print(df2.chat("Which one is the unhappiest country?"))

LangChain LLMs

In some cases, you may want to use LangChain LLMs instead.

# Install pandasai with the langchain extra
!pip install pandasai[langchain]
Requirement already satisfied: pandasai[langchain] in /usr/local/lib/python3.10/dist-packages (1.1.1)
Requirement already satisfied: astor<0.9.0,>=0.8.1 in /usr/local/lib/python3.10/dist-packages (from pandasai[langchain]) (0.8.1)
Requirement already satisfied: ipython<9.0.0,>=8.13.1 in /usr/local/lib/python3.10/dist-packages (from pandasai[langchain]) (8.15.0)
Requirement already satisfied: matplotlib<4.0.0,>=3.7.1 in /usr/local/lib/python3.10/dist-packages (from pandasai[langchain]) (3.7.1)
Requirement already satisfied: openai<0.28.0,>=0.27.5 in /usr/local/lib/python3.10/dist-packages (from pandasai[langchain]) (0.27.10)
Requirement already satisfied: pandas==1.5.3 in /usr/local/lib/python3.10/dist-packages (from pandasai[langchain]) (1.5.3)
Requirement already satisfied: pydantic<2,>=1 in /usr/local/lib/python3.10/dist-packages (from pandasai[langchain]) (1.10.12)
Requirement already satisfied: python-dotenv<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from pandasai[langchain]) (1.0.0)
Requirement already satisfied: scipy<2.0.0,>=1.9.0 in /usr/local/lib/python3.10/dist-packages (from pandasai[langchain]) (1.10.1)
Collecting langchain<0.0.200,>=0.0.199 (from pandasai[langchain])
  Downloading langchain-0.0.199-py3-none-any.whl (1.0 MB)
Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas==1.5.3->pandasai[langchain]) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas==1.5.3->pandasai[langchain]) (2023.3)
Requirement already satisfied: numpy>=1.21.0 in /usr/local/lib/python3.10/dist-packages (from pandas==1.5.3->pandasai[langchain]) (1.23.5)
Requirement already satisfied: backcall in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[langchain]) (0.2.0)
Requirement already satisfied: decorator in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[langchain]) (4.4.2)
Requirement already satisfied: jedi>=0.16 in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[langchain]) (0.19.0)
Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[langchain]) (0.1.6)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[langchain]) (0.7.5)
Requirement already satisfied: prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30 in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[langchain]) (3.0.39)
Requirement already satisfied: pygments>=2.4.0 in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[langchain]) (2.16.1)
Requirement already satisfied: stack-data in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[langchain]) (0.6.2)
Requirement already satisfied: traitlets>=5 in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[langchain]) (5.7.1)
Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[langchain]) (1.1.3)
Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[langchain]) (4.8.0)
Requirement already satisfied: PyYAML>=5.4.1 in /usr/local/lib/python3.10/dist-packages (from langchain<0.0.200,>=0.0.199->pandasai[langchain]) (6.0.1)
Requirement already satisfied: SQLAlchemy<3,>=1.4 in /usr/local/lib/python3.10/dist-packages (from langchain<0.0.200,>=0.0.199->pandasai[langchain]) (2.0.20)
Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /usr/local/lib/python3.10/dist-packages (from langchain<0.0.200,>=0.0.199->pandasai[langchain]) (3.8.5)
Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from langchain<0.0.200,>=0.0.199->pandasai[langchain]) (4.0.3)
Collecting dataclasses-json<0.6.0,>=0.5.7 (from langchain<0.0.200,>=0.0.199->pandasai[langchain])
  Downloading dataclasses_json-0.5.14-py3-none-any.whl (26 kB)
Collecting langchainplus-sdk>=0.0.9 (from langchain<0.0.200,>=0.0.199->pandasai[langchain])
  Downloading langchainplus_sdk-0.0.20-py3-none-any.whl (25 kB)
Requirement already satisfied: numexpr<3.0.0,>=2.8.4 in /usr/local/lib/python3.10/dist-packages (from langchain<0.0.200,>=0.0.199->pandasai[langchain]) (2.8.5)
Collecting openapi-schema-pydantic<2.0,>=1.2 (from langchain<0.0.200,>=0.0.199->pandasai[langchain])
  Downloading openapi_schema_pydantic-1.2.4-py3-none-any.whl (90 kB)
Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.10/dist-packages (from langchain<0.0.200,>=0.0.199->pandasai[langchain]) (2.31.0)
Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.10/dist-packages (from langchain<0.0.200,>=0.0.199->pandasai[langchain]) (8.2.3)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai[langchain]) (1.1.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai[langchain]) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai[langchain]) (4.42.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai[langchain]) (1.4.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai[langchain]) (23.1)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai[langchain]) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai[langchain]) (3.1.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from openai<0.28.0,>=0.27.5->pandasai[langchain]) (4.66.1)
Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<2,>=1->pandasai[langchain]) (4.7.1)
Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain<0.0.200,>=0.0.199->pandasai[langchain]) (23.1.0)
Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain<0.0.200,>=0.0.199->pandasai[langchain]) (3.2.0)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain<0.0.200,>=0.0.199->pandasai[langchain]) (6.0.4)
Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain<0.0.200,>=0.0.199->pandasai[langchain]) (1.9.2)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain<0.0.200,>=0.0.199->pandasai[langchain]) (1.4.0)
Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain<0.0.200,>=0.0.199->pandasai[langchain]) (1.3.1)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.6.0,>=0.5.7->langchain<0.0.200,>=0.0.199->pandasai[langchain])
  Downloading marshmallow-3.20.1-py3-none-any.whl (49 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.6.0,>=0.5.7->langchain<0.0.200,>=0.0.199->pandasai[langchain])
  Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
Requirement already satisfied: parso<0.9.0,>=0.8.3 in /usr/local/lib/python3.10/dist-packages (from jedi>=0.16->ipython<9.0.0,>=8.13.1->pandasai[langchain]) (0.8.3)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.10/dist-packages (from pexpect>4.3->ipython<9.0.0,>=8.13.1->pandasai[langchain]) (0.7.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.10/dist-packages (from prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30->ipython<9.0.0,>=8.13.1->pandasai[langchain]) (0.2.6)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas==1.5.3->pandasai[langchain]) (1.16.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain<0.0.200,>=0.0.199->pandasai[langchain]) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain<0.0.200,>=0.0.199->pandasai[langchain]) (2.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain<0.0.200,>=0.0.199->pandasai[langchain]) (2023.7.22)
Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy<3,>=1.4->langchain<0.0.200,>=0.0.199->pandasai[langchain]) (2.0.2)
Requirement already satisfied: executing>=1.2.0 in /usr/local/lib/python3.10/dist-packages (from stack-data->ipython<9.0.0,>=8.13.1->pandasai[langchain]) (1.2.0)
Requirement already satisfied: asttokens>=2.1.0 in /usr/local/lib/python3.10/dist-packages (from stack-data->ipython<9.0.0,>=8.13.1->pandasai[langchain]) (2.3.0)
Requirement already satisfied: pure-eval in /usr/local/lib/python3.10/dist-packages (from stack-data->ipython<9.0.0,>=8.13.1->pandasai[langchain]) (0.2.2)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.6.0,>=0.5.7->langchain<0.0.200,>=0.0.199->pandasai[langchain])
  Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)
Installing collected packages: mypy-extensions, marshmallow, typing-inspect, openapi-schema-pydantic, langchainplus-sdk, dataclasses-json, langchain
Successfully installed dataclasses-json-0.5.14 langchain-0.0.199 langchainplus-sdk-0.0.20 marshmallow-3.20.1 mypy-extensions-1.0.0 openapi-schema-pydantic-1.2.4 typing-inspect-0.9.0


You can then use them as PandasAI LLMs.

# Import the required classes
from pandasai import SmartDataframe
from langchain.llms import OpenAI
# from langchain.llms import Anthropic
# from langchain.llms import LlamaCpp

# Create a LangChain OpenAI instance, passing your API key and the maximum number of tokens
langchain_llm = OpenAI(openai_api_key="YOUR TOKEN", max_tokens=1000)

# Create a SmartDataframe from the dataframe, passing the LangChain LLM through config
langchain_sdf = SmartDataframe(df, config={"llm": langchain_llm})

# Call chat to ask the model a question
langchain_sdf.chat("Which are the top 5 countries by GDP?")

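Connectors

PandasAI provides a number of connectors that let you connect to different data sources. These connectors are designed to be easy to use, even if you are not familiar with the data source or with PandasAI itself.

To use a connector, you first need to install the required dependencies. You can do so by running the following command:

# Install pandasai with the connectors extra
!pip install pandasai[connectors]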
Requirement already satisfied: pandasai[connectors] in /usr/local/lib/python3.10/dist-packages (1.2.7)
Requirement already satisfied: astor<0.9.0,>=0.8.1 in /usr/local/lib/python3.10/dist-packages (from pandasai[connectors]) (0.8.1)
Requirement already satisfied: duckdb<0.9.0,>=0.8.1 in /usr/local/lib/python3.10/dist-packages (from pandasai[connectors]) (0.8.1)
Requirement already satisfied: ipython<9.0.0,>=8.13.1 in /usr/local/lib/python3.10/dist-packages (from pandasai[connectors]) (8.15.0)
Requirement already satisfied: matplotlib<4.0.0,>=3.7.1 in /usr/local/lib/python3.10/dist-packages (from pandasai[connectors]) (3.7.1)
Requirement already satisfied: openai<0.28.0,>=0.27.5 in /usr/local/lib/python3.10/dist-packages (from pandasai[connectors]) (0.27.10)
Requirement already satisfied: pandas==1.5.3 in /usr/local/lib/python3.10/dist-packages (from pandasai[connectors]) (1.5.3)
Requirement already satisfied: pydantic<2,>=1 in /usr/local/lib/python3.10/dist-packages (from pandasai[connectors]) (1.10.12)
Requirement already satisfied: python-dotenv<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from pandasai[connectors]) (1.0.0)
Requirement already satisfied: scipy<2.0.0,>=1.9.0 in /usr/local/lib/python3.10/dist-packages (from pandasai[connectors]) (1.11.2)
Requirement already satisfied: sqlalchemy<2.0.0,>=1.4.49 in /usr/local/lib/python3.10/dist-packages (from pandasai[connectors]) (1.4.49)
Requirement already satisfied: psycopg2<3.0.0,>=2.9.7 in /usr/local/lib/python3.10/dist-packages (from pandasai[connectors]) (2.9.7)
Collecting pymysql<2.0.0,>=1.1.0 (from pandasai[connectors])
  Downloading PyMySQL-1.1.0-py3-none-any.whl (44 kB)
Collecting snowflake-sqlalchemy<2.0.0,>=1.5.0 (from pandasai[connectors])
  Downloading snowflake_sqlalchemy-1.5.0-py2.py3-none-any.whl (33 kB)
Collecting sqlalchemy-databricks<0.3.0,>=0.2.0 (from pandasai[connectors])
  Downloading sqlalchemy_databricks-0.2.0-py3-none-any.whl (4.3 kB)
Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas==1.5.3->pandasai[connectors]) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas==1.5.3->pandasai[connectors]) (2023.3.post1)
Requirement already satisfied: numpy>=1.21.0 in /usr/local/lib/python3.10/dist-packages (from pandas==1.5.3->pandasai[connectors]) (1.23.5)
Requirement already satisfied: backcall in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[connectors]) (0.2.0)
Requirement already satisfied: decorator in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[connectors]) (4.4.2)
Requirement already satisfied: jedi>=0.16 in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[connectors]) (0.19.0)
Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[connectors]) (0.1.6)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[connectors]) (0.7.5)
Requirement already satisfied: prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30 in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[connectors]) (3.0.39)
Requirement already satisfied: pygments>=2.4.0 in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[connectors]) (2.16.1)
Requirement already satisfied: stack-data in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[connectors]) (0.6.2)
Requirement already satisfied: traitlets>=5 in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[connectors]) (5.7.1)
Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[connectors]) (1.1.3)
Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.10/dist-packages (from ipython<9.0.0,>=8.13.1->pandasai[connectors]) (4.8.0)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai[connectors]) (1.1.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai[connectors]) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai[connectors]) (4.42.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai[connectors]) (1.4.5)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai[connectors]) (23.1)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai[connectors]) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4.0.0,>=3.7.1->pandasai[connectors]) (3.1.1)
Requirement already satisfied: requests>=2.20 in /usr/local/lib/python3.10/dist-packages (from openai<0.28.0,>=0.27.5->pandasai[connectors]) (2.31.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from openai<0.28.0,>=0.27.5->pandasai[connectors]) (4.66.1)
Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from openai<0.28.0,>=0.27.5->pandasai[connectors]) (3.8.5)
Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<2,>=1->pandasai[connectors]) (4.5.0)
Collecting snowflake-connector-python<4.0.0 (from snowflake-sqlalchemy<2.0.0,>=1.5.0->pandasai[connectors])
  Downloading snowflake_connector_python-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (24.6 MB)
Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from sqlalchemy<2.0.0,>=1.4.49->pandasai[connectors]) (2.0.2)
Collecting PyHive<1,>=0 (from sqlalchemy-databricks<0.3.0,>=0.2.0->pandasai[connectors])
  Downloading PyHive-0.7.0.tar.gz (46 kB)
  Preparing metadata (setup.py) ... done
Collecting databricks-sql-connector<3,>=2 (from sqlalchemy-databricks<0.3.0,>=0.2.0->pandasai[connectors])
  Downloading databricks_sql_connector-2.9.3-py3-none-any.whl (297 kB)
Collecting alembic<2.0.0,>=1.0.11 (from databricks-sql-connector<3,>=2->sqlalchemy-databricks<0.3.0,>=0.2.0->pandasai[connectors])
  Downloading alembic-1.12.0-py3-none-any.whl (226 kB)
Collecting lz4<5.0.0,>=4.0.2 (from databricks-sql-connector<3,>=2->sqlalchemy-databricks<0.3.0,>=0.2.0->pandasai[connectors])
  Downloading lz4-4.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Requirement already satisfied: oauthlib<4.0.0,>=3.1.0 in /usr/local/lib/python3.10/dist-packages (from databricks-sql-connector<3,>=2->sqlalchemy-databricks<0.3.0,>=0.2.0->pandasai[connectors]) (3.2.2)
Requirement already satisfied: openpyxl<4.0.0,>=3.0.10 in /usr/local/lib/python3.10/dist-packages (from databricks-sql-connector<3,>=2->sqlalchemy-databricks<0.3.0,>=0.2.0->pandasai[connectors]) (3.1.2)
Requirement already satisfied: pyarrow>=6.0.0 in /usr/local/lib/python3.10/dist-packages (from databricks-sql-connector<3,>=2->sqlalchemy-databricks<0.3.0,>=0.2.0->pandasai[connectors]) (9.0.0)
Collecting thrift<0.17.0,>=0.16.0 (from databricks-sql-connector<3,>=2->sqlalchemy-databricks<0.3.0,>=0.2.0->pandasai[connectors])
  Downloading thrift-0.16.0.tar.gz (59 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: urllib3>=1.0 in /usr/local/lib/python3.10/dist-packages (from databricks-sql-connector<3,>=2->sqlalchemy-databricks<0.3.0,>=0.2.0->pandasai[connectors]) (2.0.4)
Requirement already satisfied: parso<0.9.0,>=0.8.3 in /usr/local/lib/python3.10/dist-packages (from jedi>=0.16->ipython<9.0.0,>=8.13.1->pandasai[connectors]) (0.8.3)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.10/dist-packages (from pexpect>4.3->ipython<9.0.0,>=8.13.1->pandasai[connectors]) (0.7.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.10/dist-packages (from prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30->ipython<9.0.0,>=8.13.1->pandasai[connectors]) (0.2.6)
Requirement already satisfied: future in /usr/local/lib/python3.10/dist-packages (from PyHive<1,>=0->sqlalchemy-databricks<0.3.0,>=0.2.0->pandasai[connectors]) (0.18.3)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas==1.5.3->pandasai[connectors]) (1.16.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai<0.28.0,>=0.27.5->pandasai[connectors]) (3.2.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai<0.28.0,>=0.27.5->pandasai[connectors]) (3.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.20->openai<0.28.0,>=0.27.5->pandasai[connectors]) (2023.7.22)
Collecting asn1crypto<2.0.0,>0.24.0 (from snowflake-connector-python<4.0.0->snowflake-sqlalchemy<2.0.0,>=1.5.0->pandasai[connectors])
  Downloading asn1crypto-1.5.1-py2.py3-none-any.whl (105 kB)
Requirement already satisfied: cffi<2.0.0,>=1.9 in /usr/local/lib/python3.10/dist-packages (from snowflake-connector-python<4.0.0->snowflake-sqlalchemy<2.0.0,>=1.5.0->pandasai[connectors]) (1.15.1)
Requirement already satisfied: cryptography<42.0.0,>=3.1.0 in /usr/local/lib/python3.10/dist-packages (from snowflake-connector-python<4.0.0->snowflake-sqlalchemy<2.0.0,>=1.5.0->pandasai[connectors]) (41.0.3)
Collecting oscrypto<2.0.0 (from snowflake-connector-python<4.0.0->snowflake-sqlalchemy<2.0.0,>=1.5.0->pandasai[connectors])
  Downloading oscrypto-1.3.0-py2.py3-none-any.whl (194 kB)
Requirement already satisfied: pyOpenSSL<24.0.0,>=16.2.0 in /usr/local/lib/python3.10/dist-packages (from snowflake-connector-python<4.0.0->snowflake-sqlalchemy<2.0.0,>=1.5.0->pandasai[connectors]) (23.2.0)
Collecting pycryptodomex!=3.5.0,<4.0.0,>=3.2 (from snowflake-connector-python<4.0.0->snowflake-sqlalchemy<2.0.0,>=1.5.0->pandasai[connectors])
  Downloading pycryptodomex-3.19.0-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Requirement already satisfied: pyjwt<3.0.0 in /usr/lib/python3/dist-packages (from snowflake-connector-python<4.0.0->snowflake-sqlalchemy<2.0.0,>=1.5.0->pandasai[connectors]) (2.3.0)
Collecting urllib3>=1.0 (from databricks-sql-connector<3,>=2->sqlalchemy-databricks<0.3.0,>=0.2.0->pandasai[connectors])
  Downloading urllib3-1.26.16-py2.py3-none-any.whl (143 kB)
Requirement already satisfied: filelock<4,>=3.5 in /usr/local/lib/python3.10/dist-packages (from snowflake-connector-python<4.0.0->snowflake-sqlalchemy<2.0.0,>=1.5.0->pandasai[connectors]) (3.12.2)
Requirement already satisfied: sortedcontainers>=2.4.0 in /usr/local/lib/python3.10/dist-packages (from snowflake-connector-python<4.0.0->snowflake-sqlalchemy<2.0.0,>=1.5.0->pandasai[connectors]) (2.4.0)
Collecting platformdirs<3.9.0,>=2.6.0 (from snowflake-connector-python<4.0.0->snowflake-sqlalchemy<2.0.0,>=1.5.0->pandasai[connectors])
  Downloading platformdirs-3.8.1-py3-none-any.whl (16 kB)
Collecting tomlkit (from snowflake-connector-python<4.0.0->snowflake-sqlalchemy<2.0.0,>=1.5.0->pandasai[connectors])
  Downloading tomlkit-0.12.1-py3-none-any.whl (37 kB)
Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai<0.28.0,>=0.27.5->pandasai[connectors]) (23.1.0)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai<0.28.0,>=0.27.5->pandasai[connectors]) (6.0.4)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai<0.28.0,>=0.27.5->pandasai[connectors]) (4.0.3)
Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai<0.28.0,>=0.27.5->pandasai[connectors]) (1.9.2)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai<0.28.0,>=0.27.5->pandasai[connectors]) (1.4.0)
Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->openai<0.28.0,>=0.27.5->pandasai[connectors]) (1.3.1)
Requirement already satisfied: executing>=1.2.0 in /usr/local/lib/python3.10/dist-packages (from stack-data->ipython<9.0.0,>=8.13.1->pandasai[connectors]) (1.2.0)
Requirement already satisfied: asttokens>=2.1.0 in /usr/local/lib/python3.10/dist-packages (from stack-data->ipython<9.0.0,>=8.13.1->pandasai[connectors]) (2.4.0)
Requirement already satisfied: pure-eval in /usr/local/lib/python3.10/dist-packages (from stack-data->ipython<9.0.0,>=8.13.1->pandasai[connectors]) (0.2.2)
Collecting Mako (from alembic<2.0.0,>=1.0.11->databricks-sql-connector<3,>=2->sqlalchemy-databricks<0.3.0,>=0.2.0->pandasai[connectors])
  Downloading Mako-1.2.4-py3-none-any.whl (78 kB)
Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi<2.0.0,>=1.9->snowflake-connector-python<4.0.0->snowflake-sqlalchemy<2.0.0,>=1.5.0->pandasai[connectors]) (2.21)
Requirement already satisfied: et-xmlfile in /usr/local/lib/python3.10/dist-packages (from openpyxl<4.0.0,>=3.0.10->databricks-sql-connector<3,>=2->sqlalchemy-databricks<0.3.0,>=0.2.0->pandasai[connectors]) (1.1.0)
Requirement already satisfied: MarkupSafe>=0.9.2 in /usr/local/lib/python3.10/dist-packages (from Mako->alembic<2.0.0,>=1.0.11->databricks-sql-connector<3,>=2->sqlalchemy-databricks<0.3.0,>=0.2.0->pandasai[connectors]) (2.1.3)
Building wheels for collected packages: PyHive, thrift
  Building wheel for PyHive (setup.py) ... done
  Created wheel for PyHive: filename=PyHive-0.7.0-py3-none-any.whl size=53872 sha256=1d2a90767825eb44f25f15a386a3191b47df6b0fe2ee1c8b3718ad3c9e9c3592
  Stored in directory: /root/.cache/pip/wheels/d3/fc/31/6974270c69ccc5bf8f848e2e41b527d0e8f5b9b973696a29a9
  Building wheel for thrift (setup.py) ... done
  Created wheel for thrift: filename=thrift-0.16.0-cp310-cp310-linux_x86_64.whl size=373871 sha256=9b6ad9cfb506732a6582e3d6f87721c28d3255de8eeeadf47ed54445c360ad89
  Stored in directory: /root/.cache/pip/wheels/52/f8/d2/acfd995e8247eb0cad372fa6a640a5fcf279ab2ed7c5c4490e
Successfully built PyHive thrift
Installing collected packages: asn1crypto, urllib3, tomlkit, thrift, pymysql, pycryptodomex, platformdirs, oscrypto, Mako, lz4, PyHive, alembic, databricks-sql-connector, sqlalchemy-databricks, snowflake-connector-python, snowflake-sqlalchemy
  Attempting uninstall: urllib3
    Found existing installation: urllib3 2.0.4
    Uninstalling urllib3-2.0.4:
      Successfully uninstalled urllib3-2.0.4
  Attempting uninstall: platformdirs
    Found existing installation: platformdirs 3.10.0
    Uninstalling platformdirs-3.10.0:
      Successfully uninstalled platformdirs-3.10.0
Successfully installed Mako-1.2.4 PyHive-0.7.0 alembic-1.12.0 asn1crypto-1.5.1 databricks-sql-connector-2.9.3 lz4-4.3.2 oscrypto-1.3.0 platformdirs-3.8.1 pycryptodomex-3.19.0 pymysql-1.1.0 snowflake-connector-python-3.2.0 snowflake-sqlalchemy-1.5.0 sqlalchemy-databricks-0.2.0 thrift-0.16.0 tomlkit-0.12.1 urllib3-1.26.16

# Import SmartDatalake and the MySQL and PostgreSQL connector classes
from pandasai import SmartDatalake
from pandasai.connectors import MySQLConnector, PostgreSQLConnector

# Connect to a MySQL database
loan_connector = MySQLConnector(
    config={
        "host": "localhost",  # host name
        "port": 3306,  # port
        "database": "mydb",  # database name
        "username": "root",  # user name
        "password": "root",  # password
        "table": "loans",  # table name
        "where": [
            # Optional: filter rows to reduce the size of the dataframe
            ["loan_status", "=", "PAIDOFF"],  # filter condition
        ],
    }
)

# Connect to a PostgreSQL database
payment_connector = PostgreSQLConnector(
    config={
        "host": "localhost",  # host name
        "port": 5432,  # port
        "database": "mydb",  # database name
        "username": "root",  # user name
        "password": "root",  # password
        "table": "payments",  # table name
        "where": [
            # Optional: filter rows to reduce the size of the dataframe
            ["payment_status", "=", "PAIDOFF"],  # filter condition
        ],
    }
)

# Create a SmartDatalake from both connectors (assumes `llm` was created earlier in the notebook)
df_connector = SmartDatalake([loan_connector, payment_connector], config={"llm": llm})

# Ask a question with chat() and print the answer
response = df_connector.chat("How many loans are from the United States?")
print(response)
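
The `where` entry in the connector configs above is a list of `[column, operator, value]` triples. The following is a minimal sketch of what such a filter means, written in plain Python over a list of dicts. It is not PandasAI's implementation: the `apply_where` helper, the operator table, the sample rows, and the assumption that multiple triples are combined with AND are all illustrative.

```python
import operator

# Map the comparison symbols used in the "where" triples to Python operators.
# Only "=" appears in the original example; the other symbols are assumptions.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def apply_where(rows, where):
    """Keep the rows (dicts) that satisfy every [column, op, value] triple.

    Combining the triples with AND is an assumption made for this sketch.
    """
    kept = []
    for row in rows:
        if all(OPS[op](row[col], value) for col, op, value in where):
            kept.append(row)
    return kept


# Hypothetical sample rows standing in for the "loans" table.
loans = [
    {"loan_id": 1, "loan_status": "PAIDOFF"},
    {"loan_id": 2, "loan_status": "COLLECTION"},
    {"loan_id": 3, "loan_status": "PAIDOFF"},
]

filtered = apply_where(loans, [["loan_status", "=", "PAIDOFF"]])
print([row["loan_id"] for row in filtered])  # prints [1, 3]
```

In the connector itself the filter is meant to run on the database side, so only matching rows are loaded into the dataframe; the sketch only mirrors the selection logic.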
# Import SmartDataframe and the YahooFinanceConnector class
from pandasai import SmartDataframe
from pandasai.connectors.yahoo_finance import YahooFinanceConnector

# Create a connector for the stock ticker "MSFT"
yahoo_connector = YahooFinanceConnector("MSFT")

# Wrap the connector in a SmartDataframe, passing the config {"llm": llm}
df = SmartDataframe(yahoo_connector, config={"llm": llm})

# Ask for yesterday's closing price
response = df.chat("What is the closing price for yesterday?")

# Print the result
print(response)
The closing price for yesterday was $319.53.
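
The source does not show the code PandasAI generated to produce this answer. As a rough idea of what such a question amounts to in plain pandas, here is a small sketch that picks the most recent `Close` value from a price table. The column names `Date` and `Close`, the `latest_close` helper, and the sample rows are assumptions for illustration; only the final value 319.53 is taken from the output above.

```python
import pandas as pd

# Placeholder price history: only the last close (319.53) comes from the
# output shown above; the other values and all dates are made up.
prices = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-04"]),
        "Close": [310.00, 315.00, 319.53],
    }
)


def latest_close(df):
    """Return the Close value of the most recent row, ordered by Date."""
    return float(df.sort_values("Date")["Close"].iloc[-1])


print(f"The closing price for yesterday was ${latest_close(prices):.2f}.")
```

Treating "yesterday" as the most recent row is a simplification; on weekends or market holidays the latest available trading day is earlier than the calendar day before.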


# Create a connector for the stock ticker "TSLA"
yahoo_connector = YahooFinanceConnector("TSLA")

# Wrap the connector in a SmartDataframe, passing the config {"llm": llm}
df_connector = SmartDataframe(yahoo_connector, config={"llm": llm})

# Ask for a chart of Tesla over time and store the result in response
response = df_connector.chat("Plot the chart of tesla over time")

You can find more information about connectors, including additional connector types, here: https://docs.pandas-ai.com/en/latest/connectors/
