当前位置:   article > 正文

使用NLTK将自然语言查询转化为SQL查询_nltk sql

nltk sql

文章目录


假设我们有一个sql表,如下所示:

CityCountryPopulation
athensgreece1368
bangkokthailand1178
barcelonaspain1280
berlineast_germany3481
birminghamunited_kingdom1112

当我们要查找某个athens在哪个国家时,可以使用如下的sql语句

SELECT Country FROM city_table WHERE City = 'athens'
  • 1

如果用户使用了自然语言进行查询

What cities are located in China
  • 1

我们可以通过句法分析,将该自然语言转化为对应的sql语句。
首先,我们需要编写符合乔姆斯基范式的上下文无关文法:

% start S
S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]
NP[SEM='Country="greece"'] -> 'Greece'
NP[SEM='Country="china"'] -> 'China'
Det[SEM='SELECT'] -> 'Which' | 'What'
N[SEM='City FROM city_table'] -> 'cities'
IV[SEM=''] -> 'are'
A[SEM=''] -> 'located'
P[SEM=''] -> 'in'
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14

我们将该文法保存在文件sql0.fcfg文件中,然后使用NLTK来根据这个文法规则来解析用户输入的自然语言。

from nltk import load_parser
cfg_path='grammars/book_grammars/sql0.fcfg'
cp=load_parser(cfg_path,trace=3)
query = 'What cities are located in China'
trees = list(cp.parse(query.split()))
answer = trees[0].label()['SEM']
answer = [s for s in answer if s]
q = ' '.join(answer)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8

输出如下:

SELECT City FROM city_table WHERE Country="china"
  • 1

因为在load_parser指定了参数trace=3,语法树的构建过程也会被打印出来

Leaf Init Rule:
|[-] . . . . .| [0:1] 'What'
|. [-] . . . .| [1:2] 'cities'
|. . [-] . . .| [2:3] 'are'
|. . . [-] . .| [3:4] 'located'
|. . . . [-] .| [4:5] 'in'
|. . . . . [-]| [5:6] 'China'
Feature Bottom Up Predict Combine Rule:
|[-] . . . . .| [0:1] Det[SEM='SELECT'] -> 'What' *
Feature Bottom Up Predict Combine Rule:
|[-> . . . . .| [0:1] NP[SEM=(?det+?n)] -> Det[SEM=?det] * N[SEM=?n] {?det: 'SELECT'}
Feature Bottom Up Predict Combine Rule:
|. [-] . . . .| [1:2] N[SEM='City FROM city_table'] -> 'cities' *
Feature Single Edge Fundamental Rule:
|[---] . . . .| [0:2] NP[SEM=(SELECT, City FROM city_table)] -> Det[SEM='SELECT'] N[SEM='City FROM city_table'] *
Feature Bottom Up Predict Combine Rule:
|[---> . . . .| [0:2] S[SEM=(?np+WHERE+?vp)] -> NP[SEM=?np] * VP[SEM=?vp] {?np: (SELECT, City FROM city_table)}
Feature Bottom Up Predict Combine Rule:
|. . [-] . . .| [2:3] IV[SEM=''] -> 'are' *
Feature Bottom Up Predict Combine Rule:
|. . [-> . . .| [2:3] VP[SEM=(?v+?pp)] -> IV[SEM=?v] * PP[SEM=?pp] {?v: ''}
|. . [-> . . .| [2:3] VP[SEM=(?v+?ap)] -> IV[SEM=?v] * AP[SEM=?ap] {?v: ''}
Feature Bottom Up Predict Combine Rule:
|. . . [-] . .| [3:4] A[SEM=''] -> 'located' *
Feature Bottom Up Predict Combine Rule:
|. . . [-> . .| [3:4] AP[SEM=?pp] -> A[SEM=?a] * PP[SEM=?pp] {?a: ''}
Feature Bottom Up Predict Combine Rule:
|. . . . [-] .| [4:5] P[SEM=''] -> 'in' *
Feature Bottom Up Predict Combine Rule:
|. . . . [-> .| [4:5] PP[SEM=(?p+?np)] -> P[SEM=?p] * NP[SEM=?np] {?p: ''}
Feature Bottom Up Predict Combine Rule:
|. . . . . [-]| [5:6] NP[SEM='Country="china"'] -> 'China' *
Feature Bottom Up Predict Combine Rule:
|. . . . . [->| [5:6] S[SEM=(?np+WHERE+?vp)] -> NP[SEM=?np] * VP[SEM=?vp] {?np: 'Country="china"'}
Feature Single Edge Fundamental Rule:
|. . . . [---]| [4:6] PP[SEM=(, Country="china")] -> P[SEM=''] NP[SEM='Country="china"'] *
Feature Single Edge Fundamental Rule:
|. . . [-----]| [3:6] AP[SEM=(, Country="china")] -> A[SEM=''] PP[SEM=(, Country="china")] *
Feature Single Edge Fundamental Rule:
|. . [-------]| [2:6] VP[SEM=(, , Country="china")] -> IV[SEM=''] AP[SEM=(, Country="china")] *
Feature Single Edge Fundamental Rule:
|[===========]| [0:6] S[SEM=(SELECT, City FROM city_table, WHERE, , , Country="china")] -> NP[SEM=(SELECT, City FROM city_table)] VP[SEM=(, , Country="china")] *
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42

值得注意的是,这里使用的线图分析器自底向上地构建整棵语法树,生成(i,j)位置的树结构,是通过合并(i,k)和(k,j)的结果来实现的,直到规约到句子 S S S时结束。

为了更清楚的看到这棵语法树,我们可以将它打印出来

print(trees)
  • 1

输出如下(为了看的更清楚,我对输出进行了缩进处理,下载地址):

[
	Tree(
		S[
			SEM=(SELECT, City FROM city_table, WHERE, , , Country="china")
		], 
		[
			Tree(
				NP[SEM=(SELECT, City FROM city_table)], 
				[
					Tree(
						Det[SEM='SELECT'], 
						['What']
					), 
					Tree(
						N[SEM='City FROM city_table'], 
						['cities']
					)
				]
			), 
			Tree(
				VP[SEM=(, , Country="china")],
				[
					Tree(
						IV[SEM=''], 
						['are']
					), 
					Tree(
						AP[SEM=(, Country="china")], 
						[
							Tree(
								A[SEM=''], 
								['located']
							),
							Tree(
								PP[SEM=(, Country="china")], 
								[
									Tree(
										P[SEM=''], 
										['in']
									), 
									Tree(
										NP[SEM='Country="china"'], 
										['China']
									)
								]
							)
						]
					 )
				]
			)
		]
	)
 ]

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54

完整代码可以从我的github上下载
参考文献:

[1] Natural Language Processing with Python(http://www.nltk.org/book/ch10.html)

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/知新_RL/article/detail/342043
推荐阅读
相关标签
  

闽ICP备14008679号