Consider the following table, `city_table`:

| City | Country | Population |
|---|---|---|
| athens | greece | 1368 |
| bangkok | thailand | 1178 |
| barcelona | spain | 1280 |
| berlin | east_germany | 3481 |
| birmingham | united_kingdom | 1112 |
To find which country athens is in, we can use the following SQL statement:

```sql
SELECT Country FROM city_table WHERE City = 'athens'
```
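To try this query without setting up a database server, the table can be loaded into an in-memory SQLite database. The following is a minimal sketch using Python's standard `sqlite3` module. The column types, the `ROWS` constant and the helper `build_city_table` are assumptions made for illustration and are not part of the original setup.

```python
import sqlite3

# The five rows of city_table shown above (Population copied as listed).
ROWS = [
    ("athens", "greece", 1368),
    ("bangkok", "thailand", 1178),
    ("barcelona", "spain", 1280),
    ("berlin", "east_germany", 3481),
    ("birmingham", "united_kingdom", 1112),
]


def build_city_table() -> sqlite3.Connection:
    """Create an in-memory SQLite database holding a hypothetical copy of city_table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE city_table (City TEXT, Country TEXT, Population INTEGER)")
    conn.executemany("INSERT INTO city_table VALUES (?, ?, ?)", ROWS)
    return conn


conn = build_city_table()
rows = conn.execute("SELECT Country FROM city_table WHERE City = 'athens'").fetchall()
print(rows)  # [('greece',)]
```

None of these five rows is in China. Running the SQL generated from the natural-language example would therefore return an empty result on this data.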
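Suppose the user asks in natural language instead, for example "What cities are located in China". We can use syntactic parsing to translate that sentence into the corresponding SQL statement.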
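First, we write a feature-based context-free grammar. Each of its rules is in Chomsky normal form, with either two nonterminals or a single word on the right-hand side. The `SEM` feature on each category holds the SQL fragment that the category contributes: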
```
% start S
S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]
NP[SEM='Country="greece"'] -> 'Greece'
NP[SEM='Country="china"'] -> 'China'
Det[SEM='SELECT'] -> 'Which' | 'What'
N[SEM='City FROM city_table'] -> 'cities'
IV[SEM=''] -> 'are'
A[SEM=''] -> 'located'
P[SEM=''] -> 'in'
```
We save this grammar in a file named sql0.fcfg. We then use NLTK to parse the user's natural-language input according to these rules:
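```python
from nltk import load_parser

# Grammar from the NLTK book data (nltk.download('book_grammars'));
# a local copy can be loaded with a 'file:' URL such as 'file:sql0.fcfg'
cfg_path = 'grammars/book_grammars/sql0.fcfg'
cp = load_parser(cfg_path, trace=3)  # trace=3: print every chart edge
query = 'What cities are located in China'
trees = list(cp.parse(query.split()))  # parse() takes a list of tokens
answer = trees[0].label()['SEM']  # tuple of SQL fragments on the root S
answer = [s for s in answer if s]  # drop empty fragments from 'are', 'located', 'in'
q = ' '.join(answer)
print(q)
```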
The output is:

```
SELECT City FROM city_table WHERE Country="china"
```
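Why filter out empty strings? The `SEM` value on the root `S` node is a flat tuple, visible in the trace. In that tuple, the words "are", "located" and "in" each contribute an empty fragment. The plain-Python sketch below copies the tuple from the trace and compares a naive join with the filtered one. It does not call NLTK.

```python
# SEM of the root S node, copied from the parse trace: NP's fragments,
# the literal WHERE, then VP's fragments (two of them empty).
sem = ("SELECT", "City FROM city_table", "WHERE", "", "", 'Country="china"')

# Joining everything leaves extra spaces where the empty fragments were.
naive = " ".join(sem)

# Dropping empty strings first, as the code above does, gives clean SQL.
query = " ".join(s for s in sem if s)

print(repr(naive))
print(query)  # SELECT City FROM city_table WHERE Country="china"
```

Standard SQL reserves double quotes for identifiers. A strict database engine may therefore read `"china"` as a column name rather than as a string literal.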
Because `trace=3` was passed to `load_parser`, the parser also prints how it builds the chart:
```
Leaf Init Rule:
|[-] . . . . .| [0:1] 'What'
|. [-] . . . .| [1:2] 'cities'
|. . [-] . . .| [2:3] 'are'
|. . . [-] . .| [3:4] 'located'
|. . . . [-] .| [4:5] 'in'
|. . . . . [-]| [5:6] 'China'
Feature Bottom Up Predict Combine Rule:
|[-] . . . . .| [0:1] Det[SEM='SELECT'] -> 'What' *
Feature Bottom Up Predict Combine Rule:
|[-> . . . . .| [0:1] NP[SEM=(?det+?n)] -> Det[SEM=?det] * N[SEM=?n] {?det: 'SELECT'}
Feature Bottom Up Predict Combine Rule:
|. [-] . . . .| [1:2] N[SEM='City FROM city_table'] -> 'cities' *
Feature Single Edge Fundamental Rule:
|[---] . . . .| [0:2] NP[SEM=(SELECT, City FROM city_table)] -> Det[SEM='SELECT'] N[SEM='City FROM city_table'] *
Feature Bottom Up Predict Combine Rule:
|[---> . . . .| [0:2] S[SEM=(?np+WHERE+?vp)] -> NP[SEM=?np] * VP[SEM=?vp] {?np: (SELECT, City FROM city_table)}
Feature Bottom Up Predict Combine Rule:
|. . [-] . . .| [2:3] IV[SEM=''] -> 'are' *
Feature Bottom Up Predict Combine Rule:
|. . [-> . . .| [2:3] VP[SEM=(?v+?pp)] -> IV[SEM=?v] * PP[SEM=?pp] {?v: ''}
|. . [-> . . .| [2:3] VP[SEM=(?v+?ap)] -> IV[SEM=?v] * AP[SEM=?ap] {?v: ''}
Feature Bottom Up Predict Combine Rule:
|. . . [-] . .| [3:4] A[SEM=''] -> 'located' *
Feature Bottom Up Predict Combine Rule:
|. . . [-> . .| [3:4] AP[SEM=?pp] -> A[SEM=?a] * PP[SEM=?pp] {?a: ''}
Feature Bottom Up Predict Combine Rule:
|. . . . [-] .| [4:5] P[SEM=''] -> 'in' *
Feature Bottom Up Predict Combine Rule:
|. . . . [-> .| [4:5] PP[SEM=(?p+?np)] -> P[SEM=?p] * NP[SEM=?np] {?p: ''}
Feature Bottom Up Predict Combine Rule:
|. . . . . [-]| [5:6] NP[SEM='Country="china"'] -> 'China' *
Feature Bottom Up Predict Combine Rule:
|. . . . . [->| [5:6] S[SEM=(?np+WHERE+?vp)] -> NP[SEM=?np] * VP[SEM=?vp] {?np: 'Country="china"'}
Feature Single Edge Fundamental Rule:
|. . . . [---]| [4:6] PP[SEM=(, Country="china")] -> P[SEM=''] NP[SEM='Country="china"'] *
Feature Single Edge Fundamental Rule:
|. . . [-----]| [3:6] AP[SEM=(, Country="china")] -> A[SEM=''] PP[SEM=(, Country="china")] *
Feature Single Edge Fundamental Rule:
|. . [-------]| [2:6] VP[SEM=(, , Country="china")] -> IV[SEM=''] AP[SEM=(, Country="china")] *
Feature Single Edge Fundamental Rule:
|[===========]| [0:6] S[SEM=(SELECT, City FROM city_table, WHERE, , , Country="china")] -> NP[SEM=(SELECT, City FROM city_table)] VP[SEM=(, , Country="china")] *
```
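The bar at the start of each trace line shows which tokens an edge covers. The helper below regenerates those bars. It is a hypothetical reconstruction inferred only from the pattern in the trace above, not NLTK's own formatting code. The pattern is: two characters per token, `]` for a complete edge, `>` for an incomplete one, and `=` for a complete edge spanning the whole sentence. The helper also assumes `start < end`.

```python
def edge_diagram(start: int, end: int, n: int, complete: bool = True) -> str:
    """Rebuild the |...| bar for an edge over tokens [start:end] of an n-token
    sentence, following the layout seen in the trace (an assumption, not NLTK code)."""
    assert 0 <= start < end <= n, "zero-width edges are not handled here"
    # '=' marks a complete edge over the whole sentence, '-' everything else
    fill = "=" if complete and (start, end) == (0, n) else "-"
    # ']' closes a complete edge, '>' an incomplete (active) one
    close = "]" if complete else ">"
    body = "[" + fill * (2 * (end - start) - 1) + close
    # '. ' pads uncovered tokens on the left, ' .' on the right
    return "|" + ". " * start + body + " ." * (n - end) + "|"


print(edge_diagram(4, 6, 6))                  # |. . . . [---]|
print(edge_diagram(2, 3, 6, complete=False))  # |. . [-> . . .|
print(edge_diagram(0, 6, 6))                  # |[===========]|
```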
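Note that the chart parser used here builds the tree bottom-up. An edge spanning positions $(i, j)$ is formed by combining an incomplete edge over $(i, k)$ with a complete edge over $(k, j)$, which is the "Fundamental Rule" step in the trace. This continues until an edge for the start symbol $S$ covers the whole sentence.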
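Every rule in this grammar is either binary or rewrites to a single word. The grammar can therefore also be parsed with the CKY algorithm, which fills exactly the chart of spans described above by combining $(i, k)$ and $(k, j)$ into $(i, j)$. This sketch swaps in a different technique from NLTK's parser. NLTK's chart parser also keeps incomplete edges (the `[->` lines in the trace), whereas CKY only combines complete constituents. Feature unification is simplified here to plain tuple concatenation on `SEM`. The names `LEXICON`, `RULES` and `cky_parse` are hypothetical.

```python
from itertools import product

# Terminal rules of sql0.fcfg: word -> list of (category, SEM tuple).
LEXICON = {
    "What": [("Det", ("SELECT",))],
    "Which": [("Det", ("SELECT",))],
    "cities": [("N", ("City FROM city_table",))],
    "are": [("IV", ("",))],
    "located": [("A", ("",))],
    "in": [("P", ("",))],
    "China": [("NP", ('Country="china"',))],
    "Greece": [("NP", ('Country="greece"',))],
}

# Binary rules: (left child, right child) -> list of (parent, SEM combiner).
RULES = {
    ("NP", "VP"): [("S", lambda np, vp: np + ("WHERE",) + vp)],
    ("IV", "PP"): [("VP", lambda v, pp: v + pp)],
    ("IV", "AP"): [("VP", lambda v, ap: v + ap)],
    ("Det", "N"): [("NP", lambda det, n: det + n)],
    ("P", "NP"): [("PP", lambda p, np: p + np)],
    ("A", "PP"): [("AP", lambda a, pp: pp)],  # AP keeps only the PP's SEM
}


def cky_parse(words):
    """Fill chart[(i, j)] with every (category, SEM) derivable for words[i:j]."""
    n = len(words)
    chart = {(i, i + 1): list(LEXICON.get(w, [])) for i, w in enumerate(words)}
    for length in range(2, n + 1):           # shorter spans are always filled first
        for i in range(n - length + 1):
            j = i + length
            cell = []
            for k in range(i + 1, j):        # split point: (i, k) + (k, j)
                for (b, sb), (c, sc) in product(chart[(i, k)], chart[(k, j)]):
                    for parent, combine in RULES.get((b, c), []):
                        cell.append((parent, combine(sb, sc)))
            chart[(i, j)] = cell
    return chart


words = "What cities are located in China".split()
chart = cky_parse(words)
sems = [sem for cat, sem in chart[(0, len(words))] if cat == "S"]
print(sems[0])
print(" ".join(s for s in sems[0] if s))  # SELECT City FROM city_table WHERE Country="china"
```

The `SEM` tuple produced for the full span matches the one on the final `[0:6] S` line of the trace.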
To see the parse tree more clearly, we can print it:

```python
print(trees)
```
The output is shown below, indented for readability:

```
[
  Tree(
    S[SEM=(SELECT, City FROM city_table, WHERE, , , Country="china")],
    [
      Tree(
        NP[SEM=(SELECT, City FROM city_table)],
        [
          Tree(Det[SEM='SELECT'], ['What']),
          Tree(N[SEM='City FROM city_table'], ['cities'])
        ]
      ),
      Tree(
        VP[SEM=(, , Country="china")],
        [
          Tree(IV[SEM=''], ['are']),
          Tree(
            AP[SEM=(, Country="china")],
            [
              Tree(A[SEM=''], ['located']),
              Tree(
                PP[SEM=(, Country="china")],
                [
                  Tree(P[SEM=''], ['in']),
                  Tree(NP[SEM='Country="china"'], ['China'])
                ]
              )
            ]
          )
        ]
      )
    ]
  )
]
```
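The indentation above was done by hand. To produce a similar outline automatically without NLTK, the tree's shape can be held in nested tuples and printed recursively. This is a sketch under that assumption: `TREE`, `render` and `leaves` are hypothetical names, and the `SEM` values are left out so the nesting stays visible. Within NLTK itself, `Tree.pretty_print()` prints an ASCII drawing of a tree.

```python
# Hypothetical stand-in for trees[0]: (label, children); a child is either
# another (label, children) pair or a word string.
TREE = ("S", [
    ("NP", [("Det", ["What"]), ("N", ["cities"])]),
    ("VP", [
        ("IV", ["are"]),
        ("AP", [
            ("A", ["located"]),
            ("PP", [("P", ["in"]), ("NP", ["China"])]),
        ]),
    ]),
])


def render(node, depth=0):
    """Return one line per node, indented two spaces per tree level."""
    pad = "  " * depth
    if isinstance(node, str):  # a word at a leaf
        return [pad + repr(node)]
    label, children = node
    lines = [pad + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


def leaves(node):
    """Collect the words under node from left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


print("\n".join(render(TREE)))
```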
The complete code is available on my GitHub.