Python Graph Algorithms Series 6: Operating neo4j with Py2neo, pro (to be continued)

Author: 不正经 | 2024-02-28 19:19:41

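Overview

This post goes further with operating neo4j through py2neo, in preparation for building a set of methods for working with the graph database. Previous post: Python Graph Algorithms Series 5: Operating neo4j with Py2neo.

Relationships are the core; subgraphs are the unit of work.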

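1 Nodes

1.1 Direct creation

About the connection ports: by default neo4j serves HTTP on port 7474 (the Neo4j Browser web page and the HTTP API; the http:// URI below makes py2neo connect over this port) and the Bolt protocol on port 7687 (the Browser uses it to log in and run queries, and drivers, including py2neo with a bolt:// URI, can use it too).
Nodes can be created with py2neo: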

import pandas as pd
import numpy as np
import time
from py2neo import Graph, Node, Relationship, Subgraph, NodeMatcher, RelationshipMatcher

# Create a node
some_node_dict = {'name': 'andy', 'age':123}

graph = Graph("http://123.123.123.123:7474",
              username="neo4j",
              password="andy123")

some_node = Node( **some_node_dict)
graph.create(some_node)
graph.push(some_node)

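Alternatively, log in to the web page (Neo4j Browser) and create it directly with a Cypher statement (property keys in a Cypher map are not quoted):

create ({name:'andy', age:124})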

Or run the same statement through py2neo:

res = graph.run('''create ({name:'andy', age:124})''')

There are now two nodes with the same name = 'andy'.

Count the number of nodes.

Delete all the nodes:

# Empty the whole database
graph.delete_all() 

This removes the data, but the labels remain. (A more thorough way is to delete the database files.)

1.2 Match-or-create

This is similar to "create node if not exists":
if the node andy does not exist, create it.

# -- create if not exists
## Node matcher
nmatcher = NodeMatcher(graph)
## Match the node
some_node = nmatcher.match(name='andy').first()
print('Node exists:', some_node)
if some_node is None:
    some_node = Node(**some_node_dict)
    graph.create(some_node)
    graph.push(some_node)
# First run ---
Node exists: None
# Second run ---
Node exists: (_41 {age: 123, name: 'andy'})

After emptying the graph again, creating the node with Cypher gives the same result:

the_cypher = '''
merge (n{name:'andy'})
on create set n.age = 123
'''
res = graph.run(the_cypher)
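The same create-if-not-exists step can also be sent from py2neo as a parameterized statement instead of a string with the values written in. The sketch below is a minimal illustration under stated assumptions: merge_node_query is a hypothetical helper (not part of py2neo), the node gets a label (unlike the label-less example above, so that an index can be used later), and the label and key are inlined in backticks because Cypher does not accept them as parameters; only the key value and the remaining properties travel as $key_value and $props.

```python
def _quote(name):
    """Backtick-quote a label or property key; reject names containing backticks."""
    if not name or "`" in name:
        raise ValueError(f"invalid identifier: {name!r}")
    return f"`{name}`"


def merge_node_query(label, key, key_value, props):
    """Return (cypher, params) for a 'create the node if it does not exist' MERGE.

    Hypothetical helper: the node is matched on one key property, and the
    other properties are only written when the node is newly created.
    """
    cypher = (
        f"MERGE (n:{_quote(label)} {{{_quote(key)}: $key_value}}) "
        "ON CREATE SET n += $props "
        "RETURN n"
    )
    return cypher, {"key_value": key_value, "props": props}


cypher, params = merge_node_query("Person", "name", "andy", {"age": 123})
print(cypher)
print(params)
# With a live py2neo Graph (not created here): graph.run(cypher, params)
```

ON CREATE SET n += $props only writes the extra properties the first time, which mirrors the on create set clause above; running it twice leaves a single node.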

1.3 Input/Output

As a service, Flask + Py2neo form an I/O bridge to the database. The view functions provide several different operations:


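1 Query
2 Create
3 Delete
4 Modify

Request data is sent as JSON to the appropriate view function, which performs the operation. Creation was covered above; below are query, update, and delete.

Query the node andy:

# -- query
## Node matcher
nmatcher = NodeMatcher(graph)
## Match the node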
some_node = nmatcher.match(name='andy').first()
---
In [40]: some_node.nodes                                                                                             
Out[40]: ((_1 {age: 123, name: 'andy'}),)

In [41]: some_node.values()                                                                                          
Out[41]: dict_values(['andy', 123])

In [42]: some_node.keys()                                                                                            
Out[42]: dict_keys(['name', 'age'])

Query with Cypher (note that the result is a cursor: once a record has been consumed, it is gone):

the_cypher ='''
match (n{name:'andy'}) return n
'''
res = graph.run(the_cypher)
---
In [45]: res.to_table()                                                                                              
Out[45]: 
 n                             
-------------------------------
 (_1 {age: 123, name: 'andy'})
In [48]: res.to_data_frame()                                                                                         
Out[48]: 
                              n
0  {'name': 'andy', 'age': 123}

Modify
(change age to 124)

## Match and modify
nmatcher = NodeMatcher(graph)
## Match the node
some_node = nmatcher.match(name='andy').first()
if some_node is not None:
    some_node['age'] = 124
    print(some_node)
    # The node is already bound to the graph, so push() alone writes the change
    graph.push(some_node)

Modify with Cypher (change age to 125):

# cypher
the_cypher = '''
merge (n{name:'andy'})
on match set n.age = 125 
return true
'''
res = graph.run(the_cypher)

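Delete: a node can be deleted (covered under relationships: detach its edges first, then delete it), and so can a property.
First add a test_attr property:

# -- delete
## Add a property first, then delete it
## Match and modify
nmatcher = NodeMatcher(graph)
## Match the node
some_node = nmatcher.match(name='andy').first()
if some_node is not None:
    some_node['test_attr'] = 'I am test'
    # The node is already bound to the graph, so push() alone writes the change
    graph.push(some_node)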

Delete it:

## Delete test_attr
nmatcher = NodeMatcher(graph)
## Match the node
some_node = nmatcher.match(name='andy').first()
if some_node is not None:
    del some_node['test_attr']
    # push() writes the node's current properties back to the database
    graph.push(some_node)

Do the same with Cypher:

# cypher - add the test property
the_cypher = '''
merge (n{name:'andy'})
on match set n.test_attr = 'I am test'
return n.name
'''
res = graph.run(the_cypher)

# cypher - remove the test property
the_cypher = '''
match (n{name:'andy'}) remove n.test_attr
return n.name
'''
res = graph.run(the_cypher)

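2 Relationships (edges)

2.1 Direct creation

For py2neo, an edge here is in fact already stored as a subgraph. Without matching, new nodes are created every time (hence the need to declare uniqueness constraints and to match before creating).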

a = Node("Person", name="Alice")
b = Node("Person", name="Bob")
r = Relationship(a, "KNOWS", b)

rel_list = []
rel_list.append(r)

A = Subgraph(relationships=rel_list)
graph.create(A)
graph.push(A)

Create with Cypher:

the_cypher ='''
create (n:Person{name:'alice'})-[r:Knows]->(n1:Person{name:'Bob'})
'''
res = graph.run(the_cypher)

2.2 Match-or-create

Native Cypher is strongly recommended here; it is far more efficient.

2.2.1 Matching

py2neo's RelationshipMatcher has a match method.
The current graph:

The database now contains a "Knows" relationship. Searching by this relationship type returns one edge and its two nodes.

rmatcher = RelationshipMatcher(graph)
some_rel = rmatcher.match(r_type='Knows')
---
In [8]: some_rel.first()                                                                                              
Out[8]: (alice)-[:Knows {}]->(Bob)

In [9]: some_rel.first()                                                                                              
Out[9]: (alice)-[:Knows {}]->(Bob)

In [10]: a = some_rel.first()                                                                                         

In [11]: a.relationships                                                                                              
Out[11]: ((alice)-[:Knows {}]->(Bob),)

In [12]: a.keys()                                                                                                     
Out[12]: dict_keys([])

In [13]: a.nodes                                                                                                      
Out[13]: ((_41:Person {name: 'alice'}), (_42:Person {name: 'Bob'}))

Find with Cypher:

# cypher - find the relationship
the_cypher = '''
match (n1)-[r1:Knows]->(n2) return n1, r1, n2
'''
res = graph.run(the_cypher)
---
In [16]: res.data()                                                                                                   
Out[16]: 
[{'n1': (_41:Person {name: 'alice'}),
  'r1': (alice)-[:Knows {}]->(Bob),
  'n2': (_42:Person {name: 'Bob'})}]

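2.2.2 Create if not exists

A relationship match is identified by a triple (start node, end node, relationship); only when all three are unique is a single edge determined. To make the comparison clearer, we add one more edge of the aa-Knows-bb form:
Bob (an existing node) Knows cici (a node that does not exist yet). So far we have not introduced node IDs or relationship IDs; first, here is an incorrect approach:

1 Select all Knows relationships
2 Look up a and b to see whether they already exist
3 Build a new relationship
4 Check whether the relationship is already in the existing list; create it if not

rmatcher = RelationshipMatcher(graph)
some_rel = rmatcher.match(r_type='Knows')

# The new relationship
a = nmatcher.match('Person', name='Bob').first() if nmatcher.match('Person', name='Bob').first() else Node("Person", name="Bob")
b = nmatcher.match('Person', name='cici').first() if nmatcher.match('Person', name='cici').first() else Node("Person", name="cici")

new_rel = Relationship(a, "Knows", b)

# Loop over the relationships
for rel in some_rel:
    print(rel)
    print('Relationship exists:', rel == new_rel)
    # The mistake: the edge is created on every iteration, whatever the comparison says
    graph.create(new_rel)
    graph.push(new_rel)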

In essence, this approach cannot guarantee that a given Bob is the Bob we mean (there may be many); as long as the node IDs differ, the nodes are considered different. Before node IDs and relationship IDs are introduced, perhaps it can be handled like this (first empty the database, keeping only Bob and alice):

rmatcher = RelationshipMatcher(graph)
some_rel = rmatcher.match(r_type='Knows')

# Initial match state
is_bob_existed = False
is_cici_existed = False
is_rel_existed = False

# Scan the relationships
for rel in some_rel:
    print(rel)
    print('Start node:', rel.start_node)
    print('End node:', rel.end_node)
    # Start node
    if rel.start_node['name'] == 'Bob':
        is_bob_existed = True
        print('Bob exists')
        bob = rel.start_node
    if rel.start_node['name'] == 'cici':
        is_cici_existed = True
        print('cici exists')
        cici = rel.start_node
    # End node
    if rel.end_node['name'] == 'Bob':
        is_bob_existed = True
        print('Bob exists')
        bob = rel.end_node
    if rel.end_node['name'] == 'cici':
        is_cici_existed = True
        print('cici exists')
        cici = rel.end_node
    # Does the new relationship (Bob -> cici) already exist?
    if rel.start_node['name'] == 'Bob' and rel.end_node['name'] == 'cici':
        is_rel_existed = True
# Create whatever does not exist yet
if not is_bob_existed:
    bob = Node('Person', name='Bob')
if not is_cici_existed:
    cici = Node('Person', name='cici')
if not is_rel_existed:
    new_rel = Relationship(bob,'Knows', cici)
    graph.create(new_rel)
    graph.push(new_rel)
--- First run
(alice)-[:Knows {}]->(Bob)
Start node: (_20:Person {name: 'alice'})
End node: (_21:Person {name: 'Bob'})
Bob exists

--- Second run
(alice)-[:Knows {}]->(Bob)
Start node: (_20:Person {name: 'alice'})
End node: (_21:Person {name: 'Bob'})
Bob exists
(Bob)-[:Knows {}]->(cici)
Start node: (_21:Person {name: 'Bob'})
End node: (_6:Person {name: 'cici'})
Bob exists
cici exists



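This is the result we want. However, the process is far too verbose (which also means very poor database efficiency). With node IDs and relationship IDs, targeted operations become easy, but pattern matching is still not very efficient; Cypher may do better. Again empty the database, keeping only Bob and alice.

# 1 cypher - make sure both nodes exist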
the_cypher = '''
merge (n1:Person{name:'Bob'})
on create set n1.name='Bob'
'''
res = graph.run(the_cypher)
# -
the_cypher = '''
merge (n2: Person{name: 'cici'})
on create set n2.name='cici'
'''
res = graph.run(the_cypher)

# 2 cypher - match the two nodes and create the relationship
the_cypher = '''
match (n1:Person{name:'Bob'}),(n2:Person{name:'cici'}) 
merge (n1)-[r1:Knows]->(n2)
'''
res = graph.run(the_cypher)

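First, two MERGE statements make sure the nodes exist; then the two nodes are matched and a MERGE creates the edge. MERGE does not create duplicates, but trouble arises when the key MERGE uses is not unique (name is not unique). So in the end we still need IDs, together with the uniqueness constraints declared later, to avoid accidents.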

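2.3 Input/Output

Depending on the application's requirements, prepare parameterized interfaces that query specific relationships. Cypher can also query relationships a given number of hops away (variable-length paths); py2neo's matchers do not seem to support this.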
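As an illustration of such a hop query, here is a sketch that builds a parameterized variable-length path query. Assumptions: khop_query is a hypothetical helper, the start node is found by its name property, and the relationship type defaults to Knows from the examples above. The hop bounds have to be written into the query text because Cypher does not accept parameters there, so they are validated as integers first.

```python
def khop_query(rel_type="Knows", min_hops=1, max_hops=3):
    """Cypher listing nodes reachable from a named start node in min..max hops.

    Hypothetical helper: returns each reachable name with its shortest hop count.
    """
    if not isinstance(min_hops, int) or not isinstance(max_hops, int):
        raise TypeError("hop bounds must be integers")
    if not 0 <= min_hops <= max_hops:
        raise ValueError("require 0 <= min_hops <= max_hops")
    if not rel_type or "`" in rel_type:
        raise ValueError(f"invalid relationship type: {rel_type!r}")
    # Hop bounds are inlined (no parameters allowed there); $name stays a parameter.
    return (
        f"MATCH p = (a {{name: $name}})-[:`{rel_type}`*{min_hops}..{max_hops}]->(b) "
        "RETURN b.name AS name, min(length(p)) AS hops"
    )


print(khop_query("Knows", 1, 2))
# With a live py2neo Graph: graph.run(khop_query("Knows", 1, 2), name="alice").data()
```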

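3 Subgraphs

Large inserts are generally slow. Using a transaction speeds things up somewhat, but the insertion method matters just as much. Often we iterate over a file to build the graph: in the example below (from reference 2), each Node is first put into a list, the list is turned into a Subgraph instance, and then create() is called once, which is at least 10 times faster than inserting the nodes one by one.
The Cypher counterpart is UNWIND (looping over a list).

Generally the global graph is very large and inconvenient to store and use, so we usually operate on subgraphs. To make this easier we introduce two concepts, eid (node ID) and rid (relationship ID). Neo4j assigns its own IDs to nodes and relationships, but those are for the system's internal lookups and algorithms. From a business perspective it is better to define our own IDs; I tend to create IDs in an abc123 style.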

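# From reference 2 (lineLists: the lines read from the source file)
tx = graph.begin()
nodes = []
for line in lineLists:
    oneNode = Node()
    # ... here each line of the file is usually parsed into oneNode's labels/properties
    nodes.append(oneNode)
nodes = Subgraph(nodes)
tx.create(nodes)
tx.commit()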
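When a file is too large for one transaction or one UNWIND statement, one option is to cut it into fixed-size batches and send each batch separately. A minimal sketch: chunked is a hypothetical helper, and the batch size of 10000 in the usage comment is only an assumption within the 10k~50k range mentioned later in this article.

```python
from itertools import islice


def chunked(rows, size):
    """Yield lists of at most `size` consecutive items from any iterable."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Usage sketch (graph, records and the query are assumptions, not shown here):
# for batch in chunked(records, 10000):
#     graph.run("UNWIND $rows AS row MERGE (n {id: row.id})", rows=batch)
```

Because chunked consumes any iterable lazily, the file can be read line by line without loading all of it into memory first.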
11

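… My computer restarted and much of what I had added in the meantime was lost; I don't feel like rewriting it…
In short, py2neo's Relationship and Subgraph behave like MERGE: between A and B there is only one edge of relationship type R, much like the concept of master data in modeling.
With Cypher you can create several: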

match (n1:Person{name:'Bob'}),(n2:Person{name:'cici'}) 
create (n1)-[r1:Knows]->(n2)

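Next, create in bulk (neo4j version 3.5.8):

UNWIND spreads a large amount of data (up to 10k or 50k entries) into individual rows, each row containing all the information needed for one update.
The UNWIND statement: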

the_cypher = '''with [{id: 29243202, name: '大王', type: 2, innode: False},
         {id: 107606295, name: '小王', type: 1, innode: False,
          regno: '111', esdate: '2010-04-26', creditcode: '222',
          regcapcur_desc: '人民币元', regcapcur: '156', regcap: '10238.000000', islist: '0'}] as data
UNWIND data as row
merge (n{id:row.id})
on match set n.id = row.id , n.name= row.name, n.type = row.type
on create set n.id = row.id, n.name = row.name, n.type = row.type
'''
graph.run(the_cypher)


It is enough to apply DISTINCT to the output values and return a single true.
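Instead of writing the rows into the query text, the whole list can be passed as one parameter and unwound on the server side; the driver then takes care of quoting strings, booleans and None. A minimal sketch, assuming a py2neo Graph object named graph (not created here) and the same sample rows; upsert_nodes is a hypothetical helper. Since the ON MATCH SET and ON CREATE SET branches above set the same values, a plain SET after MERGE is equivalent.

```python
# One parameter ($rows) carries the whole batch; no hand-written quoting.
UPSERT_NODES = (
    "UNWIND $rows AS row "
    "MERGE (n {id: row.id}) "
    "SET n.name = row.name, n.type = row.type "
    "RETURN count(n) AS written"
)

rows = [
    {"id": 29243202, "name": "大王", "type": 2, "innode": False},
    {"id": 107606295, "name": "小王", "type": 1, "innode": False, "regno": "111"},
]


def upsert_nodes(graph, rows):
    """Run UPSERT_NODES for one batch; graph is assumed to be a py2neo Graph."""
    return graph.run(UPSERT_NODES, rows=rows).evaluate()
```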

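So the problem becomes how to turn data into Cypher. Here I plan to use Jinja (see my other article).

Jinja is a template language often paired with Flask. Here it generates Cypher statements in the format we want.

A quick look at the template file: the Python code passes in a list of dicts (node_list), each dict being one node.

1 loop.first tells whether this is the first list element (if so, no comma is put in front)
2 Whether a property is written depends on whether its value is empty. (Properties must be completed beforehand, with missing ones set to None.)
3 With for loops and ifs the data is assembled, and then the statement can simply be executed.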

with 
[
{% for node in node_list %}
{%if not loop.first%}
,
{%endif%}
{id:{{node.id}}
,name:'{{node.name}}'
{%if node['properties.regno']%}
,regno:'{{node['properties.regno']}}' 
{%endif%}
}
{% endfor %}
] as data 
UNWIND data as row
merge (n{id:row.id})
on match set n.id = row.id , n.name= row.name, n.regno = row.regno
on create set n.id = row.id, n.name = row.name, n.regno = row.regno
return distinct(true) as status

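If the two dicts have different properties, they can be completed like this:

# Flatten the properties of both nodes
node1 = dm.flat_dict(node1)
node2 = dm.flat_dict(node2)
# Collect every dict key
node_keys = set(node1.keys()) | set(node2.keys())
# Build a template dict with every property set to None
node_template = dict(zip(node_keys,[None]*len(node_keys)))

# Fill a copy of the template with node 1
node1d = node_template.copy()
node1d.update(node1)
# Fill a copy of the template with node 2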
node2d = node_template.copy()
node2d.update(node2)
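dm.flat_dict belongs to the author's own module and its code is not shown. The following is a hypothetical stand-in, assuming nested keys are joined with a dot (which matches keys such as node['properties.regno'] in the template above); align_keys is also hypothetical and does the same completion as the template-dict code above, for any number of records.

```python
def flat_dict(d, parent_key="", sep="."):
    """Flatten nested dicts into one level with dotted keys."""
    items = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flat_dict(value, new_key, sep))
        else:
            items[new_key] = value
    return items


def align_keys(*records):
    """Give every record the union of all keys, filling the gaps with None."""
    keys = set().union(*(r.keys() for r in records))
    return [{k: r.get(k) for k in keys} for r in records]


node1 = flat_dict({"id": 1, "name": "小王", "properties": {"regno": "111"}})
node2 = flat_dict({"id": 2, "name": "大王"})
node1d, node2d = align_keys(node1, node2)
```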

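Next, the earlier sample data is processed with the template method as follows. (This creates relationships as new: whenever the relationship ID differs, a new edge is created, so there may be several edges of the same type.)

1 Bulk-import nodes
2 Bulk-import relationships

3.1 Bulk-importing nodes

First classify the nodes according to the raw data, e.g. into persons and enterprises.
Prepare one j2 template for persons and another for enterprises.
Store them in neo4j through py2neo.

3.1.1 Persons

For relatively simple nodes, the template is: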

person_node.j2

with 
[
{% for node in node_list %}
{%if not loop.first%}
,
{%endif%}
{name:'{{node.name}}',id:{{node.id}}}
{% endfor %}
] as data 
UNWIND data as row
merge (n{id:row.id})
on match set n:`个人`, n.id = row.id , n.name= row.name
on create set n:`个人`,n.id = row.id, n.name = row.name
return distinct(true) as status

The corresponding Python program:

import DataManipulation as dm  # the author's own helper module (gen_by_j2, flat_dict), not shown

import json
import pandas as pd
import numpy as np
import time
from py2neo import Graph, Node, Relationship, Subgraph, NodeMatcher, RelationshipMatcher


graph = Graph("http://111.111.111.111:17000",
              username="neo4j",
              password="mima")


with open('企业族谱-xx.txt', 'r') as f:
    fconent = f.read()

fconent1 = json.loads(fconent)

print('Current node count', len(fconent1['nodes']))
print('Current edge count', len(fconent1['links']))

# 1 Set of person nodes (type == 2) - id and name must not be empty
person_attrs = ['name', 'id']
person_nodes_list = []
for n in fconent1['nodes']:
    tem_node = {}
    for a in person_attrs:
        tem_node[a] =  n.get(a)
    if n['type'] ==2:
        person_nodes_list.append(tem_node)

person_dict = {
    'searchpath': './',
    'template_name': 'person_node.j2',
    'node_list': person_nodes_list
}

# Render the Cypher statement
person_cypher = dm.gen_by_j2(**person_dict)
print(graph.run(person_cypher).data())
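dm.gen_by_j2 is also part of the author's module and is not shown. A plausible sketch with the jinja2 package, assuming the searchpath/template_name keywords used above and that every other keyword (node_list, rel_list) becomes a template variable; the trim_blocks and lstrip_blocks settings are my own choice and only change whitespace in the generated Cypher.

```python
from jinja2 import Environment, FileSystemLoader


def gen_by_j2(searchpath, template_name, **context):
    """Render template_name (looked up under searchpath) with the other keywords.

    Hypothetical stand-in for the author's dm.gen_by_j2.
    """
    env = Environment(
        loader=FileSystemLoader(searchpath),
        trim_blocks=True,    # drop the first newline after a block tag
        lstrip_blocks=True,  # strip whitespace before a block tag on its line
    )
    return env.get_template(template_name).render(**context)
```

With person_dict as defined above, gen_by_j2(**person_dict) would render person_node.j2 from the current directory with node_list bound to person_nodes_list.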

The result in neo4j:
Note: a node property can be cleared (deleted) like this:

MATCH (n:`企业`{id:129907004})  set n.islist=Null RETURN n LIMIT 25

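3.1.2 Enterprises

Enterprises (or, more generally, nodes with many properties) have many properties, so writing the j2 file by hand is tedious. Ideally Jinja would let a Jinja variable act as a macro, or as a dict key inside another expression. It turned out I was hoping for too much… In this respect SAS macro programming is still impressive.
The following does not work (Jinja does not evaluate a {{ }} nested inside another tag or inside a string key):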

{% for node in node_list %}
{%if not loop.first%}
,
{%endif%}
    {
        {# String properties come first, and there is at least a name #}
        {%for attr in str_attr_list%}
            {%if not loop.first%}
            ,
            {%endif%}

            {%if node.{{attr}}%}
            ,{{attr}}:'{{node.{{attr}}}}' 
            {%endif%}

        {%endfor%}

        {# If there are numeric properties #}
        {%if num_attr_list|length >0%}
        {%for attr in num_attr_list%}
            {%if node['{{attr}}']%}
            ,{{attr}}:{{node['{{attr}}']}}
            {%endif%}

        {%endfor%}
        {%endif%}

    }

{% endfor %}

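That leaves only one option: first generate a Jinja template with Python, then fill in that template.
Step 1: extract the node data from the sample data.

# 2 Set of enterprise nodes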
str_ent_attrs = ['name', 'regno', 'esdate', 'creditcode','regcapcur_desc','regcapcur','regcap','islist']
num_ent_attrs = []
ent_nodes_list = []
for n in fconent1['nodes']:
    tem_node = {}
    tem_node['id'] = n['id']
    ent_attrs = str_ent_attrs + num_ent_attrs
    for a in ent_attrs:
        tem_node[a] = n['properties'].get(a)
    if n['type'] == 1:
        ent_nodes_list.append(tem_node)
num_ent_attrs.append('id')

Step 2: generate the j2 template with Python.

# First generate the Jinja template with Python
j2head = '''
with 
[
{% for node in node_list %}
{%if not loop.first%}
,
{%endif%}
    {
'''

j2taila ='''
    }

{% endfor %}
] as data 
UNWIND data as row
merge (n{id:row.id})
'''

j2tailz = '''return distinct(true) as status'''

# One template for optional string properties, one for numeric properties
str_if_template = '''{%% if node['%s'] %%}
            %s%s:'{{node['%s']}}'
            {%% endif %%}'''
num_if_template = '''{%% if node['%s'] %%}
            %s%s:{{node['%s']}}
            {%% endif %%}'''
# --- body: the properties to set on each node
j2body =''
for i, v in enumerate(str_ent_attrs):
    if i ==0:
        tems = str_if_template % (v, '', v, v)
    else:
        tems = str_if_template % (v, ',', v, v)
    j2body += tems

for v in num_ent_attrs:
    tems = num_if_template % (v,',',v,v)
    j2body += tems
# set part: sets the label and all property values
set_part = ''
for x in str_ent_attrs:
    set_part += ',n.%s=row.%s' %(x,x)
for x in num_ent_attrs:
    set_part += ',n.%s=row.%s' % (x, x)
j2tailb = 'on match set n:`企业`' + set_part + '\n'
j2tailc = 'on create set n:`企业`' + set_part +'\n'

with open('ent_node1.j2', 'w', encoding='utf-8') as f:  # the label 企业 is non-ASCII
    f.write(j2head+j2body + j2taila + j2tailb + j2tailc + j2tailz)

Take a look at the generated j2; it saves a lot of typing (though building the template generator the first time takes a while):
ent_node1.j2


with 
[
{% for node in node_list %}
{%if not loop.first%}
,
{%endif%}
    {
{% if node['name'] %}
            name:'{{node['name']}}'
            {% endif %}{% if node['regno'] %}
            ,regno:'{{node['regno']}}'
            {% endif %}{% if node['esdate'] %}
            ,esdate:'{{node['esdate']}}'
            {% endif %}{% if node['creditcode'] %}
            ,creditcode:'{{node['creditcode']}}'
            {% endif %}{% if node['regcapcur_desc'] %}
            ,regcapcur_desc:'{{node['regcapcur_desc']}}'
            {% endif %}{% if node['regcapcur'] %}
            ,regcapcur:'{{node['regcapcur']}}'
            {% endif %}{% if node['regcap'] %}
            ,regcap:'{{node['regcap']}}'
            {% endif %}{% if node['islist'] %}
            ,islist:'{{node['islist']}}'
            {% endif %}{% if node['id'] %}
            ,id:{{node['id']}}
            {% endif %}
    }

{% endfor %}
] as data 
UNWIND data as row
merge (n{id:row.id})
on match set n:`企业`,n.name=row.name,n.regno=row.regno,n.esdate=row.esdate,n.creditcode=row.creditcode,n.regcapcur_desc=row.regcapcur_desc,n.regcapcur=row.regcapcur,n.regcap=row.regcap,n.islist=row.islist,n.id=row.id
on create set n:`企业`,n.name=row.name,n.regno=row.regno,n.esdate=row.esdate,n.creditcode=row.creditcode,n.regcapcur_desc=row.regcapcur_desc,n.regcapcur=row.regcapcur,n.regcap=row.regcap,n.islist=row.islist,n.id=row.id
return distinct(true) as status

Step 3: fill the generated template with the node data and submit it to the database.

ent_dict = {
    'searchpath': './',
    'template_name': 'ent_node1.j2',
    'node_list': ent_nodes_list,
}
# Render the Cypher statement
ent_cypher = dm.gen_by_j2(**ent_dict)
print(ent_cypher)
with open('tem.txt', 'w', encoding='utf-8') as f:
    f.write(ent_cypher)
print(graph.run(ent_cypher).data())

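Result: all 78 nodes were imported. (The server is on the public internet; storing the data took roughly 50 ms. I will try larger data later; in theory, 10,000 nodes per call should be fine.)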
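A side note on the template problem in 3.1.2: Jinja2 itself accepts a loop variable as a subscript, so node[attr] works; what fails in the attempted template is nesting {{ }} inside another tag or string. A minimal sketch of a single generic template (MAP_TEMPLATE is hypothetical) that renders Cypher map literals from an attribute list and skips None values; like the hand-written templates, it quotes every value as a string and does not escape it.

```python
from jinja2 import Template

# attrs is an ordinary Python list passed into the template; node[attr] uses
# the loop variable as the dict key, so one template covers any attribute list.
# The "if ... is not none" loop filter skips missing values, and loop.first
# then counts only the kept items, so the commas land in the right places.
MAP_TEMPLATE = Template(
    "{% for node in node_list %}{% if not loop.first %},{% endif %}"
    "{ {% for attr in attrs if node[attr] is not none %}"
    "{% if not loop.first %},{% endif %}{{ attr }}:'{{ node[attr] }}'"
    "{% endfor %} }"
    "{% endfor %}"
)

nodes = [
    {"name": "a", "regno": "1"},
    {"name": "b", "regno": None},
]
print(MAP_TEMPLATE.render(node_list=nodes, attrs=["name", "regno"]))
# { name:'a',regno:'1' },{ name:'b' }
```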

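3.2 Bulk-importing relationships

First classify the relationships, e.g. employment (任职) and investment (投资).
Prepare a template for each of the two types.
Import them through py2neo.

3.2.1 Employment

Example: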
{'id': 23036792,
 'from': 34488229,
 'to': 118525322,
 'position_desc': '董事',
 'position': '432A'}

match (n1{id:34488229}),(n2{id:118525322})
create (n1)-[:`任职`{id:23036792,position_desc:'董事',position:'432A'}]->(n2)

Treating employment as a relatively simple relationship, it is created as follows:
work_rel.j2

with 
[
{% for rel in rel_list %}
{%if not loop.first%}
,
{%endif%}

{id:{{rel.id}},
from:{{rel.from}},
to:{{rel.to}}

{%if rel['position_desc']%}
,position_desc:'{{rel['position_desc']}}' 
{%endif%}

{%if rel['position']%}
,position:'{{rel['position']}}' 
{%endif%}


}
{% endfor %}
] as data 
UNWIND data as row
match (n1{id:row.from}),(n2{id:row.to})
create (n1)-[:`任职`{id:row.id,position_desc:row.position_desc, position:row.position}]->(n2)
return distinct(true) as status


3.2.2 Investment

The investment relationship is somewhat more complex: some of its values are actually numeric but were saved as strings.

 {'id': 288095445,
  'from': 153951135,
  'to': 108535191,
  'currency_desc': '人民币元',
  'subconam': '3990813.182000',
  'conprop': '1.0000',
  'currency': '156'}
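If the numeric fields should be stored as numbers rather than strings, they can be converted before rendering or sending, so that Cypher can later compare and sum them. A minimal sketch; which fields count as numeric is an assumption (subconam and conprop here, while currency '156' is a code and stays a string), and coerce_numeric is a hypothetical helper.

```python
# Assumption: amount and ratio are numeric; currency is a code and stays a string.
NUMERIC_INVEST_FIELDS = ("subconam", "conprop")


def coerce_numeric(rel, fields=NUMERIC_INVEST_FIELDS):
    """Return a copy of rel with the given fields converted to float.

    Missing or empty values become None; a non-numeric string raises ValueError.
    """
    out = dict(rel)
    for field in fields:
        value = out.get(field)
        out[field] = None if value in (None, "") else float(value)
    return out


rel = {"id": 288095445, "from": 153951135, "to": 108535191,
       "currency_desc": "人民币元", "subconam": "3990813.182000",
       "conprop": "1.0000", "currency": "156"}
print(coerce_numeric(rel))
```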

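As with nodes, assume there is a list of string properties and a list of numeric properties, and fill in each property automatically according to its value.

# 4 Set of investment relationships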
j2_head = '''
with 
[
{% for rel in rel_list %}
{%if not loop.first%}
,
{%endif%}
    {
'''

j2_taila='''
}
{% endfor %}
] as data 
UNWIND data as row
match (n1{id:row.from}), (n2{id:row.to})
'''

j2_tailb='''
create (n1)-[:`%s`{%s}]->(n2)
'''

j2_tailc='''
return distinct(true) as status
'''
# String properties
str_invest_list = ['currency_desc', 'subconam', 'conprop', 'currency','condate']
# Numeric properties
num_invest_list = ['id','from','to']

# rel2_list: the investment relationship dicts (how it is built is not shown)
rel2_list1 = []
for rel in rel2_list:
    for k in str_invest_list:
        if rel.get(k) is not None:
            rel[k] = str(rel[k])
        else:
            rel[k] = None
    if rel['currency_desc'] is None:
        rel['currency_desc'] ='人民币'
    rel2_list1.append(rel)
# if-template for string properties
str_if_template_rel = '''{%% if rel['%s'] %%}
            %s%s:'{{rel['%s']}}'
            {%% endif %%}'''
# if-template for numeric properties
num_if_template_rel = '''{%% if rel['%s'] %%}
            %s%s:{{rel['%s']}}
            {%% endif %%}'''

# --- body
j2body = ''
for i, v in enumerate(str_invest_list):
    if i == 0:
        tems = str_if_template_rel % (v, '', v, v)
    else:
        tems = str_if_template_rel % (v, ',', v, v)
    j2body += tems

for v in num_invest_list:
    tems = num_if_template_rel % (v, ',', v, v)
    j2body += tems


# --- relationship properties (from/to only locate the end nodes, so they are dropped)
num_invest_list.remove('from')
num_invest_list.remove('to')
attr_str = ''
for i, attr in enumerate(str_invest_list+num_invest_list):
    if i ==0:
        tems = '{0}:row.{0}'.format(attr)
    else:
        tems = ',{0}:row.{0}'.format(attr)
    attr_str += tems


with open('invest_node1.j2', 'w', encoding='utf-8') as f:  # the type 投资 is non-ASCII
    j2_tailb_content = j2_tailb % ('投资', attr_str)
    f.write(j2_head+j2body + j2_taila + j2_tailb_content + j2_tailc)

invest_dict = {
    'searchpath': './',
    'template_name': 'invest_node1.j2',
    'rel_list': rel2_list1}
# Render the Cypher statement
invest_cypher = dm.gen_by_j2(**invest_dict)
print(invest_cypher)
print(graph.run(invest_cypher).data())

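To sum up, storing data this way is fairly convenient. A few small errors came up while debugging (one record had no currency description). With the following constraints/conventions such problems would not occur:

1 Every node must have an eid property, with a uniqueness constraint
2 Every relationship must have a rid property, with a uniqueness constraint

Some further considerations:

1 py2neo's Subgraph: nodes are unique, and each relationship type is unique between two nodes
2 Cypher (the current approach): nodes are unique, and relationships can be multiple
Therefore every record should also carry a create_time:
1 Each node may need the timeline changes described below, so that master data is preserved and can be traced back when necessary
2 Relationships must carry create_time, so they can be filtered by time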

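4 Indexes / uniqueness constraints

For creating constraints, you can refer to this article. The main kinds are:

1 Indexes
2 Primary-key constraints
3 Uniqueness constraints
4 Not-null constraints
5 Check constraints
6 Foreign keys (not really recommended)

Query the current indexes and constraints: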
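Create the index by logging in to the web page (Neo4j Browser).

Query the schema at this point.

Creating an index makes operations much faster. For example, a MERGE…CREATE node operation took on average 5~6 s per 1000 records (single-threaded) before the index was created, and about 1 s per 1000 records (single-threaded) afterwards.

A uniqueness constraint can be seen as a stronger index: creating the constraint also creates an index.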

CREATE CONSTRAINT ON (n:Person) ASSERT n.email IS UNIQUE

As you can see, the constraint also adds an index, which is equivalent to a primary key in MySQL.
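For the eid/rid convention, the constraint statements can be generated instead of typed into the Browser. A minimal sketch, assuming labels such as enterprise; unique_constraint is a hypothetical helper. Two caveats: the ON ... ASSERT form shown above is the Neo4j 3.5 syntax, while Neo4j 5 uses FOR ... REQUIRE; and Neo4j 3.5 only supports uniqueness constraints on node properties, so rid uniqueness cannot be enforced by a constraint there and has to come from the MERGE pattern.

```python
def unique_constraint(label, prop, legacy=True):
    """Build a node uniqueness-constraint statement (hypothetical helper).

    legacy=True : Neo4j 3.5 form, as used in this article (ON ... ASSERT)
    legacy=False: Neo4j 5 form (FOR ... REQUIRE)
    """
    for name in (label, prop):
        if not name or "`" in name:
            raise ValueError(f"invalid identifier: {name!r}")
    if legacy:
        return f"CREATE CONSTRAINT ON (n:`{label}`) ASSERT n.`{prop}` IS UNIQUE"
    return (
        f"CREATE CONSTRAINT IF NOT EXISTS FOR (n:`{label}`) "
        f"REQUIRE n.`{prop}` IS UNIQUE"
    )


# Usage sketch with a live py2neo Graph (not created here):
# for label in ("个人", "企业"):
#     graph.run(unique_constraint(label, "eid"))
```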

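4.1 Conventions

eid (Entity ID): entity ID, uniquely identifies a node.
rid (Relation ID): relationship ID, uniquely identifies a relationship.

5 Data conventions

Data may be added and modified, but never deleted (a deleted record is marked through the is_enable property).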
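A sketch of what "never delete" could look like in Cypher: the node is flagged instead of removed, and reads skip flagged nodes. Assumptions: records carry eid, is_enable and update_time as described in this article, timestamps are stored as ISO-8601 strings, and soft_delete_params is a hypothetical helper.

```python
from datetime import datetime

# Soft delete: keep the node, flag it, and record when that happened.
SOFT_DELETE_NODE = (
    "MATCH (n {eid: $eid}) "
    "SET n.is_enable = false, n.update_time = $now "
    "RETURN n.eid AS eid"
)

# Reads skip disabled records; coalesce treats a missing flag as enabled.
ACTIVE_NODES = "MATCH (n) WHERE coalesce(n.is_enable, true) RETURN n"


def soft_delete_params(eid, now=None):
    """Parameters for SOFT_DELETE_NODE; the timestamp is an ISO-8601 string."""
    now = now or datetime.now()
    return {"eid": eid, "now": now.isoformat(timespec="seconds")}


# With a live py2neo Graph (not created here):
# graph.run(SOFT_DELETE_NODE, soft_delete_params("some-eid"))
```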

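6 Timeline changes

Use the timeline field event_time to represent when a core event (relationship) happened.

create_time: the time the record was created in the database; often it can also express when the event happened
event_time: the time the event actually happened, which may differ from the creation time (e.g. at 12:00 we record something that happened at 11:30)
update_time: the last modification time; once the record changes, update_time > create_time
opr_trace_id: the log of the record's changes. One kind is a node's own "self-loop"; the other is corrections to relationships (opening and closing a transaction). The log can usually be kept in MongoDB.
opr_tags: whenever something changes, the node/relationship may check its own log and generate some tags.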
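The timestamp rules above can be applied in Python before a record is written. A minimal sketch, assuming ISO-8601 strings (which sort chronologically as plain strings) and a hypothetical stamp helper; opr_trace_id and opr_tags are left out.

```python
from datetime import datetime


def stamp(record, now, event_time=None):
    """Return a copy of record with create/update/event times filled in.

    create_time is set once; update_time is refreshed on every write;
    event_time defaults to create_time unless it is given explicitly.
    """
    out = dict(record)
    now_s = now.isoformat(timespec="seconds")
    out.setdefault("create_time", now_s)
    out["update_time"] = now_s
    if event_time is not None:
        out["event_time"] = event_time.isoformat(timespec="seconds")
    else:
        out.setdefault("event_time", out["create_time"])
    return out


# Recorded at 12:00 for an event that happened at 11:30 (the example above).
r = stamp({"rid": "r0"}, datetime(2020, 1, 1, 12, 0), event_time=datetime(2020, 1, 1, 11, 30))
# A later change refreshes only update_time.
r2 = stamp(r, datetime(2020, 1, 2, 9, 0))
```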

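6.1 Derived uses of the timeline

For time-series models this goes without saying; for ordinary cross-sectional (time-invariant) models it is also very handy.
For example, when we need to predict what will happen based on a customer's behavior over a past period, a time-invariant model usually assumes that the customer's characteristics do not change drastically within a period (e.g. one year), so a window is set aside both for extracting features and for observing performance.

observe point: any chosen time used as the cut-off, e.g. 2020-1-1
history period: the look-back period (e.g. one year), in this case 2019-1-1 ~ 2020-1-1
performance period: the performance window (e.g. three months), in this case 2020-1-1 ~ 2020-4-1
model point: the actual modeling time, e.g. 2020-9-30

If nodes and relationships carry the properties described above, selecting modeling samples becomes very convenient.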
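The sampling windows can be computed from the observe point. A minimal sketch using only the standard library, assuming month-based windows; add_months and sample_windows are hypothetical helpers, and a day that does not exist in the target month is clamped to that month's last day.

```python
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    next_first = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    last_day = (next_first - date(year, month, 1)).days
    return date(year, month, min(d.day, last_day))


def sample_windows(observe_point, history_months=12, performance_months=3):
    """Return (history_start, observe_point, performance_end)."""
    return (
        add_months(observe_point, -history_months),
        observe_point,
        add_months(observe_point, performance_months),
    )


print(sample_windows(date(2020, 1, 1)))
```

The three dates could then be passed as parameters to a filter such as WHERE r.event_time >= $history_start AND r.event_time < $observe_point, assuming event_time is stored as an ISO date string.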

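7 Testing

7.1 Scenario

7.2 Data

7.3 Throughput

Without indexes it is very slow (below is the time to store 1k nodes), and it keeps slowing down linearly, eventually taking as long as 50 s per 1k nodes.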

>>> 2
...
777e2f213e54'}, {'n.eid': 'e0ffe9325deaeacedc7bb2b21f18f536'}, {'n.eid': 'd5c7c05b577bace4a3de9fc51e7121e0'}, {'n.eid': '0be3b390c2adf585fe82562f4d620327'}]
takes 4565.4237270355225 ms
>>>> 3

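After creating the index, it takes roughly 800~1200 ms.

Because the server is remote, the network has a big effect. When I tested after work it was about 50 ms per 1k nodes. (Right after writing that I tested again; this time it was 100 ms per 1k nodes.)

7.4 Features

Miscellaneous

1 Deleting data

When there is little data: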

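# py2neo
graph.delete_all()

# cypher (equivalently, in one statement: MATCH (n) DETACH DELETE n;)
# 1 delete the relationships first (together with the nodes they connect)
MATCH (n)-[r]-(n1) delete n,r,n1;
# 2 then delete the remaining isolated nodes
MATCH (n) delete n;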

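When there is a lot of data (memory runs out): either edit the configuration file to increase the available memory, or use APOC (which requires adding the plugin jar and enabling it in the configuration file) to delete iteratively in small batches. You can refer to this article.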
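Without the APOC plugin, the same small-batch idea can be written in plain Cypher: delete a limited number of nodes per statement and repeat until nothing is left. (With APOC, CALL apoc.periodic.iterate('MATCH (n) RETURN n', 'DETACH DELETE n', {batchSize: 10000}) does the batching on the server.) A minimal sketch; graph is assumed to be a py2neo Graph whose run() calls each execute as their own auto-commit transaction, and delete_all_in_batches is a hypothetical helper.

```python
# Deletes at most $batch_size nodes (with their relationships) per statement,
# so each transaction stays small; the caller repeats until nothing is left.
BATCH_DELETE = (
    "MATCH (n) WITH n LIMIT $batch_size "
    "DETACH DELETE n "
    "RETURN count(*) AS deleted"
)


def delete_all_in_batches(graph, batch_size=10000):
    """Return the total number of deleted nodes.

    Only graph.run(...).evaluate() is used, as provided by py2neo's Graph/Cursor.
    """
    total = 0
    while True:
        deleted = graph.run(BATCH_DELETE, batch_size=batch_size).evaluate()
        total += deleted
        if deleted == 0:
            return total
```

Unlike deleting the database files, this keeps the schema (indexes and uniqueness constraints).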

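Brute-force method (not for production): after the database files are deleted, the schema (indexes and uniqueness constraints) is gone too.

# Delete the database files, then restart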
rm -rf graph.db

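2 Storing edges

# wrong: this creates duplicate nodes (if the whole pattern does not match, MERGE creates all of it, including both nodes)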
merge (n:enterprise{ eid:row.from_eid})-[r:invest{ rid:row.rid }]->(n1:enterprise{ eid:row.to_eid})
on match set r.rid=row.rid
on create set r.rid=row.rid
return r.rid
# correct - create a fused edge (only one edge per relationship type)
merge (n:enterprise{ eid:row.from_eid})

merge (n1:enterprise{ eid:row.to_eid})

create unique (n)-[r:invest]->(n1)
set r.rid=row.rid
return r.rid
# correct - create unique edges identified by relationship type + id (several edges of the same type are possible)
merge (n:enterprise{ eid:row.from_eid})
merge (n1:enterprise{ eid:row.to_eid})

create unique (n)-[r:invest{rid:row.rid}]->(n1)
return r.rid

Compare how the two methods create edges:

with 
[
    {   
            from_eid:'2ec2c37743315a007490c721ea582313'
            
            ,rid:'r0'
            
            ,to_eid:'886d1c9082615dde87500b8e2d0a3c9c'
                }
,
    {
            from_eid:'188915a0762628a99577774e1145d0b3'
            
            ,rid:'r1'
            
            ,to_eid:'886d1c9082615dde87500b8e2d0a3c9c'
  }
,
    {
            from_eid:'188915a0762628a99577774e1145d0b3'
            
            ,rid:'rmmm'
            
            ,to_eid:'886d1c9082615dde87500b8e2d0a3c9c'
  }

] as data 
UNWIND data as row
merge (n:enterprise{ eid:row.from_eid})
merge (n1:enterprise{ eid:row.to_eid})

create unique (n)-[r:invest{rid:row.rid}]->(n1)

return r.rid

Fused edges:
Multiple edges distinguished by their relationship ID (suitable for storing transactions).
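Note that CREATE UNIQUE, used above with Neo4j 3.5, is deprecated and was removed in Neo4j 4.0. A MERGE-based sketch of both variants, assuming the enterprise/invest names and the from_eid/to_eid/rid row keys from the example; merging each node first and then the relationship avoids the duplicate nodes of the "wrong" version. The query names are hypothetical.

```python
# Several edges of one type per node pair, told apart by rid (transactions).
MERGE_EDGES_BY_RID = (
    "UNWIND $rows AS row "
    "MERGE (a:enterprise {eid: row.from_eid}) "
    "MERGE (b:enterprise {eid: row.to_eid}) "
    "MERGE (a)-[r:invest {rid: row.rid}]->(b) "
    "RETURN r.rid AS rid"
)

# At most one edge per type and node pair; rid keeps the last value written.
MERGE_FUSED_EDGES = (
    "UNWIND $rows AS row "
    "MERGE (a:enterprise {eid: row.from_eid}) "
    "MERGE (b:enterprise {eid: row.to_eid}) "
    "MERGE (a)-[r:invest]->(b) "
    "SET r.rid = row.rid "
    "RETURN r.rid AS rid"
)

rows = [
    {"from_eid": "2ec2c37743315a007490c721ea582313", "to_eid": "886d1c9082615dde87500b8e2d0a3c9c", "rid": "r0"},
    {"from_eid": "188915a0762628a99577774e1145d0b3", "to_eid": "886d1c9082615dde87500b8e2d0a3c9c", "rid": "r1"},
    {"from_eid": "188915a0762628a99577774e1145d0b3", "to_eid": "886d1c9082615dde87500b8e2d0a3c9c", "rid": "rmmm"},
]
# With a live py2neo Graph (not created here):
# graph.run(MERGE_EDGES_BY_RID, rows=rows).data()
```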

References

1 Basic usage of py2neo
2 py2neo tutorial
3 Cypher syntax keywords (II): CREATE, MERGE, CREATE UNIQUE, SET

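Disclaimer: This article was contributed by a user and does not represent the position of the wpsshop blog; copyright belongs to the original author, and the site assumes no legal liability. If you find infringing content, please contact us. When reprinting, please credit the source: https://www.wpsshop.cn/w/不正经/article/detail/161194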