赞
踩
基于之前的文章Text2SQL研究-Chat2DB体验与剖析-CSDN博客,我们了解了Text2Sql基本原理和实现要点,下一步我们就对市面各个商业大模型做了SQL生成测试对比,我们挑选了9条语句,
序号 | 问题 |
1 | 2023年度,累计加油金额是多少元? |
2 | 2023年,每个季度的加油金额各是多少元? |
3 | 2023年,每个月的加油金额各是多少元? |
4 | 2023年,每个月的加油金额总体的变化趋势是怎样的? |
5 | 请列出2023年每个月加油金额相比上个月的环比变化百分比 |
6 | 请列出2023年加油金额最高的十台车,并显示每台车的年度加油费用 |
7 | 2023年加油金额最高的十台车,他们的使用性质各是什么 |
8 | 2023年,加油金额最高的月份是哪个月?这个月哪十辆车加油费用最高 |
9 | 请计算2023年车辆的平均油耗并列出油耗较低的前十辆车 |
并针对千帆Qianfan-Chinese-Llama-2-13B、通义千问-Max、讯飞星火-3.5、智普GLM-4做了测试结果分别如下:
千帆Qianfan-Chinese-Llama-2-13B | ||
语句 | 生成的SQL语句 | 准确性 |
1 | SELECT SUM(ROUND(fuel_consumption * cost, 2)) AS total_refueling_cost, SUM(ROUND(mileage * 100)) AS total_driven_distance FROM (SELECT SUM(fuel_consumption * cost) AS fuel_consumption_cost, SUM(mileage) AS mileage, organization_id, vehicle_id, license_plate, year FROM (SELECT id AS refueling_order_id, order_number, license_plate, organization_id,。。。 | 错误 |
2 | SELECT YEAR(refueling_order.refueling_time) as year,QUARTER(refueling_order.refueling_time) as quarter,SUM(refueling_order.cost) as total_amount FROM refueling_order GROUP BY YEAR(refueling_order.refueling_time), QUARTER(refueling_order.refueling_time)ORDER BY YEAR(refueling_order.refueling_time), quarter | 正确 |
3 | SELECT DATE_FORMAT(refueling_order.refueling_time, '%Y-%m') AS year, SUM(refueling_order.fuel_consumption * refueling_order.cost) AS total_amountFROM refueling_orderGROUP BY DATE_FORMAT(refueling_order.refueling_time, '%Y-%m') | 正确 |
4 | SELECT YEAR(refueling_order.refueling_time) as ‘Year’, MONTH(refueling_order.refueling_time) as ‘Month’,SUM(refueling_order.cost) as ‘Total_Refueling_Amount’FROM refueling_order WHERE YEAR(refueling_order.refueling_time) = 2023 GROUP BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time) ORDER BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time) | 错误 |
5 | SELECT YEAR(refueling_order.refueling_time) AS year, MONTH(refueling_order.refueling_time) AS month, SUM(refueling_order.cost) AS total_amount, SUM(refueling_order.cost - LAG(refueling_order.cost) OVER (PARTITION BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time)) AS prev_month_amount) AS prev_month_total_amount, (SUM(refueling_order.cost) - SUM(refueling_order.cost - LAG(refueling_order.cost) OVER (PARTITION BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time))) / SUM(refueling_order.cost - SUM(refueling_order.cost - LAG(refueling_order.cost) OVER (PARTITION BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time)))) * 100 AS percentage_change FROM refueling_order GROUP BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time) ORDER BY YEAR(refueling_order.refueling_time), | 错误 |
6 | SELECT YEAR(refueling_order.refueling_time) AS year, MONTH(refueling_order.refueling_time) AS month, SUM(refueling_order.cost) AS total_amount, SUM(refueling_order.cost - LAG(refueling_order.cost) OVER (PARTITION BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time)) AS prev_month_amount) AS prev_month_total_amount, (SUM(refueling_order.cost) - SUM(refueling_order.cost - LAG(refueling_order.cost) OVER (PARTITION BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time))) / SUM(refueling_order.cost - SUM(refueling_order.cost - LAG(refueling_order.cost) OVER (PARTITION BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time)))) * 100 AS percentage_change FROM refueling_order GROUP BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time) ORDER BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time) | 正确 |
7 | SELECT YEAR(refueling_order.refueling_time) AS year, MONTH(refueling_order.refueling_time) AS month, SUM(refueling_order.cost) AS total_amount, SUM(refueling_order.cost - LAG(refueling_order.cost) OVER (PARTITION BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time)) AS prev_month_amount) AS prev_month_total_amount, (SUM(refueling_order.cost) - SUM(refueling_order.cost - LAG(refueling_order.cost) OVER (PARTITION BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time))) / SUM(refueling_order.cost - SUM(refueling_order.cost - LAG(refueling_order.cost) OVER (PARTITION BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time)))) * 100 AS percentage_change FROM refueling_order GROUP BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time) ORDER BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time) | 错误 |
8 | SELECT YEAR(refueling_order.refueling_time) AS year, MONTH(refueling_order.refueling_time) AS month, SUM(refueling_order.cost) AS total_cost FROM refueling_order GROUP BY YEAR(refueling_order.refueling_time), MONTH(refueling_order.refueling_time) ORDER BY year DESC,month DESC LIMIT 10; | 错误 |
9 | SELECT r.license_plate, r.organization_name, r.vehicle_id, r.refueling_time, r.fuel_consumption, r.cost, r.year, AVG(r.mileage) AS avg_fuel_consumption FROM refueling_order r JOIN driven_distance d ON r.vehicle_id = d.vehicle_id WHERE YEAR(r.refueling_time) = 2023 GROUP BY r.vehicle_id ORDER BY avg_fuel_consumption ASC LIMIT 10; | 错误 |
SQLCoder-34B-Beta | ||
语句 | 生成的SQL语句 | 准确性 |
1 | SELECT SUM(r.cost) AS total_cost FROM refueling_order r WHERE EXTRACT(YEAR,FROM r.refueling_time) = 2023; | 正确 |
2 | SELECT EXTRACT(QUARTER FROM r.refueling_time) AS QUARTER,SUM(r.cost) AS total_cost FROM refueling_order r WHERE EXTRACT(YEAR FROM r.refueling_time) = 2023 GROUP BY QUARTER ORDER BY QUARTER NULLS LAST; | 错误 |
3 | SELECT date_trunc('month', r.refueling_time) AS MONTH, SUM(r.cost) AS total_cost FROM refueling_order r WHERE date_part('year', r.refueling_time) = 2023 GROUP BY MONTH ORDER BY MONTH; | 错误 |
4 | SELECT date_trunc('month', r.refueling_time) AS MONTH, SUM(r.cost) AS total_cost FROM refueling_order r WHERE date_part('year', r.refueling_time) = 2023 GROUP BY MONTH ORDER BY MONTH; | 错误 |
5 | SELECT date_trunc('month', r.refueling_time) AS MONTH, SUM(r.cost) AS total_cost, lag(SUM(r.cost)) OVER ( ORDER BY date_trunc('month', r.refueling_time)) AS previous_month_total_cost FROM refueling_order r WHERE date_part('year', r.refueling_time) = 2023 GROUP BY MONTH ORDER BY MONTH; | 错误 |
6 | SELECT r.license_plate,SUM(r.cost) AS total_cost FROM refueling_order r WHERE EXTRACT(YEAR FROM r.refueling_time) = 2023 GROUP BY r.license_plate ORDER BY total_cost DESC NULLS LAST LIMIT 10; | 错误 |
7 | SELECT v.usage_type,SUM(r.cost) AS total_cost FROM refueling_order r JOIN vehicle v ON r.vehicle_id = v.id WHERE EXTRACT(YEAR FROM r.refueling_time) = 2023 GROUP BY v.usage_type ORDER BY total_cost DESC LIMIT 10; | 错误 |
8 | SELECT date_trunc('month', o.refueling_time) AS MONTH, COUNT(*) AS number_of_orders,SUM(o.cost) AS total_cost FROM refueling_order o WHERE date_part('year', o.refueling_time) = 2023 GROUP BY MONTH ORDER BY total_cost DESC LIMIT 1; | 错误 |
9 | SELECT AVG(r.fuel_consumption) AS average_fuel_consumption FROM refueling_order r WHERE r.year = 2023 ORDER BY r.fuel_consumption ASC NULLS LAST LIMIT 10; | 错误 |
通义千问-Max | ||
语句 | 生成的SQL语句 | 准确性 |
1 | SELECT v.license_plate, SUM(ro.fuel_consumption) AS total_fuel_consumption, SUM(dd.mileage) AS total_mileage FROM vehicle v JOIN refueling_order ro ON v.id = ro.vehicle_id JOIN driven_distance dd ON v.id = dd.vehicle_id AND dd.year = [指定年份] WHERE v.organization_id = [某个组织ID] GROUP BY v.license_plate; | 错误 |
2 | SELECT v.license_plate, SUM(ro.fuel_consumption) AS total_fuel_consumption, SUM(dd.mileage) AS total_mileageFROM vehicle vJOIN refueling_order ro ON v.id = ro.vehicle_idJOIN driven_distance dd ON v.id = dd.vehicle_id AND dd.year = [指定年份]WHERE v.organization_id = [某个组织ID]GROUP BY v.license_plate; | 错误 |
3 | SELECT v.license_plate, SUM(ro.fuel_consumption) AS total_fuel_consumption, SUM(dd.mileage) AS total_mileageFROM vehicle vJOIN refueling_order ro ON v.id = ro.vehicle_idJOIN driven_distance dd ON v.id = dd.vehicle_id AND dd.year = [指定年份]WHERE v.organization_id = [某个组织ID]GROUP BY v.license_plate; | 错误 |
4 | SELECT v.license_plate, SUM(ro.fuel_consumption) AS total_fuel_consumption, MAX(dd.mileage) AS year_mileageFROM vehicle vJOIN refueling_order ro ON v.id = ro.vehicle_idJOIN driven_distance dd ON v.id = dd.vehicle_id AND dd.year = 2021WHERE ro.refueling_time BETWEEN '2021-01-01' AND '2021-12-31'GROUP BY v.license_plate; | 错误 |
5 | SELECT v.license_plate, SUM(ro.fuel_consumption) AS total_fuel_consumption, SUM(dd.mileage) AS total_mileageFROM vehicle vJOIN refueling_order ro ON v.id = ro.vehicle_idJOIN driven_distance dd ON v.id = dd.vehicle_id AND dd.year = [指定年份]WHERE v.organization_id = [某个组织ID]GROUP BY v.license_plate; | 错误 |
6 | SELECT v.license_plate, SUM(ro.fuel_consumption) AS total_fuel_consumption, SUM(dd.mileage) AS total_mileageFROM vehicle vJOIN refueling_order ro ON v.id = ro.vehicle_idJOIN driven_distance dd ON v.id = dd.vehicle_id AND dd.year = [指定年份]WHERE v.organization_id = [某个组织ID]GROUP BY v.license_plate; | 错误 |
7 | SELECT v.license_plate, SUM(ro.fuel_consumption) AS total_fuel_consumption, SUM(dd.mileage) AS total_mileageFROM vehicle vJOIN refueling_order ro ON v.id = ro.vehicle_idJOIN driven_distance dd ON v.id = dd.vehicle_id AND dd.year = [指定年份]WHERE v.organization_id = [某个组织ID]GROUP BY v.license_plate; | 错误 |
8 | SELECT v.license_plate, SUM(ro.fuel_consumption) AS total_fuel_consumption, SUM(dd.mileage) AS total_mileageFROM vehicle vJOIN refueling_order ro ON v.id = ro.vehicle_idJOIN driven_distance dd ON v.id = dd.vehicle_id AND dd.year = [指定年份]WHERE v.organization_id = [某个组织ID]GROUP BY v.license_plate; | 错误 |
9 | SELECT v.license_plate, SUM(ro.fuel_consumption) AS total_fuel_consumption, SUM(dd.mileage) AS total_mileageFROM vehicle vJOIN refueling_order ro ON v.id = ro.vehicle_idJOIN driven_distance dd ON v.id = dd.vehicle_id AND dd.year = [指定年份]WHERE v.organization_id = [某个组织ID]GROUP BY v.license_plate; | 错误 |
讯飞星火-3.5 | ||
语句 | 生成的SQL语句 | 准确性 |
1 | SELECT SUM(cost) FROM refueling_order WHERE YEAR(refueling_time) = 2023; | 正确 |
2 | SELECT QUARTER(refueling_time) AS quarter, SUM(cost) AS total_cost FROM refueling_order WHERE YEAR(refueling_time) = 2023 GROUP BY quarter; | 正确 |
3 | SELECT MONTH(refueling_time) as month, SUM(cost) as total_cost FROM refueling_order WHERE YEAR(refueling_time) = 2023 GROUP BY month | 正确 |
4 | SELECT MONTH(refueling_time) AS month, SUM(cost) AS total_cost FROM refueling_order WHERE YEAR(refueling_time) = 2023 GROUP BY month | 错误 |
5 | SELECT t1.month, ((t1.total_cost - t2.total_cost) / t2.total_cost) * 100 AS month_on_month_percentage FROM (SELECT MONTH(refueling_time) AS month, SUM(cost) AS total_cost FROM refueling_order WHERE YEAR(refueling_time) = 2023 GROUP BY month) t1 JOIN (SELECT MONTH(refueling_time) AS month, SUM(cost) AS total_cost FROM refueling_order WHERE YEAR(refueling_time) = 2023 AND MONTH(refueling_time) < 12 GROUP BY month) t2 ON t1.month = t2.month + 1; | 正确 |
6 | SELECT vehicle_id, SUM(cost) AS total_cost FROM refueling_order WHERE YEAR(refueling_time) = 2023 GROUP BY vehicle_id ORDER BY total_cost DESC LIMIT 10 | 正确 |
7 | SELECT v.license_plate, v.usage_type FROM vehicle v JOIN refueling_order r ON v.id = r.vehicle_id WHERE YEAR(r.refueling_time) = 2023 GROUP BY v.license_plate ORDER BY SUM(r.cost) DESC LIMIT 10 | 正确 |
8 | SELECT MONTH(refueling_time) AS month, license_plate, SUM(cost) AS total_cost FROM refueling_order WHERE YEAR(refueling_time) = 2023 GROUP BY month, license_plate ORDER BY total_cost DESC LIMIT 10; | 错误 |
9 | SELECT vehicle_id, AVG(fuel_consumption) as avg_fuel_consumption FROM refueling_order WHERE YEAR(refueling_time) = 2023 GROUP BY vehicle_id ORDER BY avg_fuel_consumption ASC LIMIT 10; | 错误 |
智普GLM-4 | ||
语句 | 生成的SQL语句 | 准确性 |
1 | SELECT SUM(cost) AS total_cost FROM refueling_order WHERE YEAR(refueling_time) = 2023 | 正确 |
2 | SELECT QUARTER(refueling_time) AS quarter, SUM(cost) AS total_costFROM refueling_orderWHERE YEAR(refueling_time) = 2023GROUP BY quarterORDER BY quarter | 正确 |
3 | SELECT DATE_FORMAT(refueling_time, '%Y-%m') AS month, -- 获取年-月 SUM(cost) AS total_cost -- 计算每个月的加油金额总和FROM refueling_orderWHERE YEAR(refueling_time) = 2023 -- 确保只选取2023年的数据GROUP BY monthORDER BY month | 正确 |
4 | SELECT DATE_FORMAT(refueling_time, '%Y-%m') AS month, SUM(cost) AS total_costFROM refueling_orderWHERE DATE_FORMAT(refueling_time, '%Y') = '2023' -- 仅选择2023年的数据GROUP BY monthORDER BY month | 正确 |
5 | SELECT DATE_FORMAT(refueling_time, '%Y-%m') AS month, -- 格式化日期到年-月 SUM(cost) AS monthly_cost, -- 每个月的加油总金额 LAG(SUM(cost)) OVER (ORDER BY DATE_FORMAT(refueling_time, '%Y-%m')) AS previous_month_cost, -- 上个月的加油总金额 ( (SUM(cost) - LAG(SUM(cost)) OVER (ORDER BY DATE_FORMAT(refueling_time, '%Y-%m'))) / NULLIF(LAG(SUM(cost)) OVER (ORDER BY DATE_FORMAT(refueling_time, '%Y-%m')), 0) ) * 100 AS percentage_change -- 计算环比变化百分比FROM refueling_orderWHERE YEAR(refueling_time) = 2023 -- 筛选2023年的数据GROUP BY monthORDER BY month | 正确 |
6 | SELECT ro.license_plate, SUM(ro.cost) AS annual_fueling_costFROM refueling_order roJOIN vehicle v ON ro.vehicle_id = v.idWHERE YEAR(ro.refueling_time) = 2023GROUP BY ro.license_plate ORDER BY annual_fueling_cost DESCLIMIT 10 | 正确 |
7 | SELECT vo.license_plate, v.usage_type, SUM(vo.cost) AS total_costFROM refueling_order voINNER JOIN vehicle v ON vo.license_plate = v.license_plate WHERE YEAR(vo.refueling_time) = 2023 GROUP BY vo.license_plateORDER BY total_cost DESCLIMIT 10 | 正确 |
8 | SELECT MAX(total_cost_by_month.month) AS max_cost_month, vehicles_with_max_cost.license_plate, vehicles_with_max_cost.total_costFROM ( SELECT EXTRACT(MONTH FROM refueling_time) AS month, SUM(cost) AS total_cost FROM refueling_order WHERE EXTRACT(YEAR FROM refueling_time) = 2023 GROUP BY month) AS total_cost_by_monthJOIN ( SELECT license_plate, SUM(cost) AS total_cost, EXTRACT(MONTH FROM refueling_time) AS month FROM refueling_order WHERE EXTRACT(YEAR FROM refueling_time) = 2023 GROUP BY license_plate, month) AS vehicles_with_max_cost ON total_cost_by_month.month = vehicles_with_max_cost.monthWHERE total_cost_by_month.total_cost = ( SELECT MAX(total_cost) FROM ( SELECT EXTRACT(MONTH FROM refueling_time) AS month, SUM(cost) AS total_cost FROM refueling_order WHERE EXTRACT(YEAR FROM refueling_time) = 2023 GROUP BY month ) AS max_cost_per_month )ORDER BY vehicles_with_max_cost.total_cost DESCLIMIT 10 | 错误 |
9 | SELECT v.id AS vehicle_id, v.license_plate, AVG(ro.fuel_consumption) AS average_fuel_consumption FROM vehicle v JOIN refueling_order ro ON v.license_plate = ro.license_plate WHERE YEAR(ro.refueling_time) = 2023 -- 确保只考虑2023年的数据 GROUP BY v.id, v.license_plate-- 然后选择油耗较低的前十辆车SELECT vehicle_id, license_plate, average_fuel_consumptionFROM VehicleAverageFuelConsumptionORDER BY average_fuel_consumption ASC -- 升序排列,以获取油耗最低的车辆LIMIT 10 | 正确 |
【结果评价】8对1错,准确率很高,有概率识别复合问题
根据您以上测试结果,对市面上几款商业大模型在SQL生成方面的性能进行了比较,可以看出各个模型的表现存在明显差异。
综合评价来看,各个模型在SQL生成任务上的表现各不相同,体现了在自然语言理解、问题转换、以及特定数据库知识方面的差异。智普GLM-4和讯飞星火-3.5在测试中表现较好,显示出较高的准确性和一定程度上处理复杂问题的能力。而其他模型如SQLCoder-34B-Beta和通义千问-Max则在这些测试中遇到了较大挑战。这些差异可能源于模型的设计、训练数据的丰富性、以及对特定数据库语言的理解等方面的不同。未来的发展可能会看到这些模型在性能上的进一步优化和改进。
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。