赞
踩
该种方式导入只适合小数据量的方式,并且节点的id创建困难的情况下。在实测的情况下,200万条数据,每条数据创建9个节点,8条边的情况下需要7~8分钟。
数据规则,一个订单有多个角色,一个客户可以属于多个订单,一个客户可谓不同角色,具体表结构如下:
字段 | 类型 | 注释 |
order_id | varchar(50) | 订单id |
order_dt | varchar(50) | 订单日期 |
cust_name | varchar(50) | 客户名称 |
id_no | varchar(50) | 证件号 |
role_type | varchar(50) | 角色 |
mobile_phone | varchar(50) | 手机号 |
home_area | varchar(50) | 家庭区域编码 |
home_detail | varchar(50) | 家庭详细地址 |
company_area | varchar(50) | 公司区域编码 |
company_detail | varchar(50) | 公司详细地址 |
导出的CVS文件anti_fraud.csv,字符集为UTF-8,内容如下:
- order_id,order_dt,cust_name,id_no,role_type,mobile_phone,home_area,home_detail,company_area,company_detail
- "201908010001","2019-08-01","张三","1001","01","10000010001","111101","家庭地址1","111101","公司地址1"
- "201908010001","2019-08-01","赵四","1002","02","10000010002","111101","家庭地址2","111101","公司地址2"
- "201908020002","2019-08-02","王一","1003","01","10000010003","111102","家庭地址3","111102","公司地址3"
- "201908020002","2019-08-02","李二","1004","02","10000010004","111102","家庭地址4","111102","公司地址4"
- "201908020003","2019-08-02","黄五","1005","01","10000010001","111102","家庭地址5","111103","公司地址5"
- "201908020003","2019-08-02","马六","1006","02","10000010006","111102","家庭地址6","111102","公司地址6"
- "201908020004","2019-08-02","张三","1001","02","10000010001","111102","家庭地址1","111103","公司地址5"
- "201908020004","2019-08-02","伟奇","1007","01","10000010007","111102","家庭地址7","111102","公司地址7"
将不同的信息放入不同的实体中,不同的实体中属性名称可以重复,但不推荐,具体实体如下:
订单实体 (Order{order_id,order_dt})
人实体 (Person{cust_name,id_no,role_type})
手机号码实体 (Mobilephone{mobile_phone})
家庭地址实体 (HomeAddress{home_area,home_detail})
公司地址实体 (CompanyAddress{company_area,company_detail})
边可以添加属性,用与区分实体之间的关系,边的信息如下:
人与订单关系 (Person)-[person_appl{order_id}]->(Order)
手机号与人关系 (Mobilephone)-[phone_appl{order_id}]->(Person)
家庭地址与人关系 (HomeAddress)-[homeAddr{order_id}]->(Person)
公司地址与人关系 (companyAddr)-[companyAddr{order_id}]->(Person)
首先将csv文件放到neo4j根目录下的import文件夹下,
唯一的节点创建从数据表中取出并不困难,将每一个节点的数据保存为一个文件,并且去重去空,5个节点分别保存5个文件。例如取出order节点的csv的sql语句为:
select distinct order_id,order_dt from anti_fraud where order_dt is not null and order_id is not null;
将节点导入neo4j数据库,在创建接节点时需要首先创建索引(注:实测过程中先创建索引后创建节点索引才生效),提高边的效率,创建语句如下:
- create index on :Order(order_id,order_dt) ;
- create index on :Person(cust_name,id_no,role_type) ;
- create index on :Mobilephone(mobile_phone);
- create index on :HomeAddress(home_area,home_detail);
- create index on :CompanyAddress(company_area,company_detail);
-
- LOAD CSV WITH HEADERS FROM "file:///order.csv" AS line
- CREATE (appl:Order{order_id:line.order_id,order_dt:line.order_dt});
-
- LOAD CSV WITH HEADERS FROM "file:///person.csv" AS line
- CREATE (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type});
-
- LOAD CSV WITH HEADERS FROM "file:///mobilephone.csv" AS line
- CREATE (m:Mobilephone{mobile_phone:line.mobile_phone});
-
- LOAD CSV WITH HEADERS FROM "file:///homeAddress.csv" AS line
- CREATE (h:HomeAddress{home_area:line.home_area,home_detail:line.home_detail});
-
- LOAD CSV WITH HEADERS FROM "file:///companyAddress.csv" AS line
- CREATE (c:CompanyAddress{company_area:line.company_area,company_detail:line.company_detail});
在创建边时关系以两个实体创建方案最优,如创建实体时多个match命令为交集,若一个实体未查询到,则关系不创建,创建边的关系如下:
- LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
- MATCH (appl:Order{order_id:line.order_id,order_dt:line.order_dt})
- MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
- MERGE (person)-[r:person_appl{order_id:line.order_id}]->(appl);
-
- LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
- MATCH (m:Mobilephone{mobile_phone:line.mobile_phone})
- MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
- MERGE (m)-[r:phone_appl{order_id:line.order_id}]->(person);
-
- LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
- MATCH (h:HomeAddress{home_area:line.home_area,home_detail:line.home_detail})
- MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
- MERGE (h)-[r:homeAddr{order_id:line.order_id}]->(person);
-
- LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
- MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
- MATCH (c:CompanyAddress{company_area:line.company_area,company_detail:line.company_detail})
- MERGE (c)-[r:companyAddr{order_id:line.order_id}]->(person);
以shell方式批量执行,到neo4j下home 执行./bin/cypher-shell -u neo4j -p neo4j,执行以上命令便可以创建,创建内容如下;
若在少量情况下可以用命令语句删除,删除语句如下:
MATCH data=(n)-[r1]->(person:Person)-[r2]->(appl:Appl) delete n,r1,person,r2,appl;
在实际情况中,由于服务器资源有限,大量的查询删除会出现内存不足的情况。因此在全量删除数据时可以将整个数据库删除重启,具体步骤如下:
- 1.停止neo4j服务
- ./bin/neo4j stop
-
- 2.删除neo4j数据库文件,最好最备份
- rm -rf $neo4jhome/data/databases/graph.db
-
- 3.启动neo4j服务
- ./bin/neo4j start
创建import_neo4j.cql文件,将执行的neo4j命令写入一个文件,前两行为批量删除,以全量导入的方式实现,若是增量导入,去掉前两行删除命令,文件内容如下:
- create index on :Order(order_id,order_dt) ;
- create index on :Person(cust_name,id_no,role_type) ;
- create index on :Mobilephone(mobile_phone);
- create index on :HomeAddress(home_area,home_detail);
- create index on :CompanyAddress(company_area,company_detail);
-
- LOAD CSV WITH HEADERS FROM "file:///order.csv" AS line
- CREATE (appl:Order{order_id:line.order_id,order_dt:line.order_dt});
-
- LOAD CSV WITH HEADERS FROM "file:///person.csv" AS line
- CREATE (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type});
-
- LOAD CSV WITH HEADERS FROM "file:///mobilephone.csv" AS line
- CREATE (m:Mobilephone{mobile_phone:line.mobile_phone});
-
- LOAD CSV WITH HEADERS FROM "file:///homeAddress.csv" AS line
- CREATE (h:HomeAddress{home_area:line.home_area,home_detail:line.home_detail});
-
- LOAD CSV WITH HEADERS FROM "file:///companyAddress.csv" AS line
- CREATE (c:CompanyAddress{company_area:line.company_area,company_detail:line.company_detail});
-
-
- LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
- MATCH (appl:Order{order_id:line.order_id,order_dt:line.order_dt})
- MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
- MERGE (person)-[r:person_appl{order_id:line.order_id}]->(appl);
-
- LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
- MATCH (m:Mobilephone{mobile_phone:line.mobile_phone})
- MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
- MERGE (m)-[r:phone_appl{order_id:line.order_id}]->(person);
-
- LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
- MATCH (h:HomeAddress{home_area:line.home_area,home_detail:line.home_detail})
- MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
- MERGE (h)-[r:homeAddr{order_id:line.order_id}]->(person);
-
- LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
- MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
- MATCH (c:CompanyAddress{company_area:line.company_area,company_detail:line.company_detail})
- MERGE (c)-[r:companyAddr{order_id:line.order_id}]->(person);
创建import_csv_neo4j.sh文件,删除读取出import_neo4j.cql文件内容,创建实体与边。
- #!/bin/bash
- #--------------------------------------
- # filename: import_neo4j.sh
- # description: 将csv导入neo4j
- # author: test-abc
- # date: 2019.08.05
- #--------------------------------------
- neo4jhome='/home/neo4j/neo4j-community-3.5.7'
- username=neo4j
- passwd=neo4j
- cd $neo4jhome
-
- source /etc/profile
- #stop neo4j service
- echo 'stop neo4j service'
- ./bin/neo4j stop
-
- #delete neo4j database
- echo 'delete neo4j database'
- rm -rf $neo4jhome/data/databases/graph.db
-
- #start neo4j service
- echo 'start neo4j service'
- ./bin/neo4j start
-
- sleep 30
-
- echo "start importing neo4j"
- cat $neo4jhome/import/import_neo4j.cql | ./bin/cypher-shell -u $username -p $passwd > $neo4jhome/import/import_csv_neo4j.log
- echo "end of import neo4j"
给import_csv_neo4j.sh文件添加执行权限,执行该shell脚本
- chmod +x import_csv_neo4j.sh
- ./import_csv_neo4j.sh
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。