当前位置:   article > 正文

MYSQL数据以csv文件到入neo4j:反欺诈简单案例_fraud.csv

fraud.csv

       该种方式导入只适合小数据量的方式,并且节点的id创建困难的情况下。在实测的情况下,200万条数据,每条数据创建9个节点,8条边的情况下需要7~8分钟。

1.测试数据准备

    数据规则,一个订单有多个角色,一个客户可以属于多个订单,一个客户可谓不同角色,具体表结构如下:

字段类型注释
order_idvarchar(50)订单id
order_dtvarchar(50)订单日期
cust_namevarchar(50)客户名称
id_novarchar(50)证件号
role_typevarchar(50)角色
mobile_phonevarchar(50)手机号
home_areavarchar(50)家庭区域编码
home_detailvarchar(50)家庭详细地址
company_areavarchar(50)公司区域编码
company_detailvarchar(50)公司详细地址

  导出的CVS文件anti_fraud.csv,字符集为UTF-8,内容如下:

  1. order_id,order_dt,cust_name,id_no,role_type,mobile_phone,home_area,home_detail,company_area,company_detail
  2. "201908010001","2019-08-01","张三","1001","01","10000010001","111101","家庭地址1","111101","公司地址1"
  3. "201908010001","2019-08-01","赵四","1002","02","10000010002","111101","家庭地址2","111101","公司地址2"
  4. "201908020002","2019-08-02","王一","1003","01","10000010003","111102","家庭地址3","111102","公司地址3"
  5. "201908020002","2019-08-02","李二","1004","02","10000010004","111102","家庭地址4","111102","公司地址4"
  6. "201908020003","2019-08-02","黄五","1005","01","10000010001","111102","家庭地址5","111103","公司地址5"
  7. "201908020003","2019-08-02","马六","1006","02","10000010006","111102","家庭地址6","111102","公司地址6"
  8. "201908020004","2019-08-02","张三","1001","02","10000010001","111102","家庭地址1","111103","公司地址5"
  9. "201908020004","2019-08-02","伟奇","1007","01","10000010007","111102","家庭地址7","111102","公司地址7"

2.neo4j关系设计

2.1 node设计

     将不同的信息放入不同的实体中,不同的实体中属性名称可以重复,但不推荐,具体实体如下:

订单实体          (Order{order_id,order_dt})
人实体             (Person{cust_name,id_no,role_type})
手机号码实体  (Mobilephone{mobile_phone})
家庭地址实体  (HomeAddress{home_area,home_detail})
公司地址实体  (CompanyAddress{company_area,company_detail})

2.2 label设计

    边可以添加属性,用与区分实体之间的关系,边的信息如下:

人与订单关系          (Person)-[person_appl{order_id}]->(Order)
手机号与人关系      (Mobilephone)-[phone_appl{order_id}]->(Person)
家庭地址与人关系  (HomeAddress)-[homeAddr{order_id}]->(Person)
公司地址与人关系  (companyAddr)-[companyAddr{order_id}]->(Person)

3.csv导入neo4j

    首先将csv文件放到neo4j根目录下的import文件夹下,

3.1创建node

     唯一的节点创建从数据表中取出并不困难,将每一个节点的数据保存为一个文件,并且去重去空,5个节点分别保存5个文件。例如取出order节点的csv的sql语句为:

select distinct order_id,order_dt from anti_fraud where order_dt is not null and order_id is not null;

   将节点导入neo4j数据库,在创建接节点时需要首先创建索引(注:实测过程中先创建索引后创建节点索引才生效),提高边的效率,创建语句如下:

  1. create index on :Order(order_id,order_dt) ;
  2. create index on :Person(cust_name,id_no,role_type) ;
  3. create index on :Mobilephone(mobile_phone);
  4. create index on :HomeAddress(home_area,home_detail);
  5. create index on :CompanyAddress(company_area,company_detail);
  6. LOAD CSV WITH HEADERS FROM "file:///order.csv" AS line
  7. CREATE (appl:Order{order_id:line.order_id,order_dt:line.order_dt});
  8. LOAD CSV WITH HEADERS FROM "file:///person.csv" AS line
  9. CREATE (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type});
  10. LOAD CSV WITH HEADERS FROM "file:///mobilephone.csv" AS line
  11. CREATE (m:Mobilephone{mobile_phone:line.mobile_phone});
  12. LOAD CSV WITH HEADERS FROM "file:///homeAddress.csv" AS line
  13. CREATE (h:HomeAddress{home_area:line.home_area,home_detail:line.home_detail});
  14. LOAD CSV WITH HEADERS FROM "file:///companyAddress.csv" AS line
  15. CREATE (c:CompanyAddress{company_area:line.company_area,company_detail:line.company_detail});

 

3.2 创建label

    在创建边时关系以两个实体创建方案最优,如创建实体时多个match命令为交集,若一个实体未查询到,则关系不创建,创建边的关系如下:

  1. LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
  2. MATCH (appl:Order{order_id:line.order_id,order_dt:line.order_dt})
  3. MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
  4. MERGE (person)-[r:person_appl{order_id:line.order_id}]->(appl);
  5. LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
  6. MATCH (m:Mobilephone{mobile_phone:line.mobile_phone})
  7. MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
  8. MERGE (m)-[r:phone_appl{order_id:line.order_id}]->(person);
  9. LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
  10. MATCH (h:HomeAddress{home_area:line.home_area,home_detail:line.home_detail})
  11. MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
  12. MERGE (h)-[r:homeAddr{order_id:line.order_id}]->(person);
  13. LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
  14. MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
  15. MATCH (c:CompanyAddress{company_area:line.company_area,company_detail:line.company_detail})
  16. MERGE (c)-[r:companyAddr{order_id:line.order_id}]->(person);

3.3 执行命令

    以shell方式批量执行,到neo4j下home 执行./bin/cypher-shell -u neo4j -p neo4j,执行以上命令便可以创建,创建内容如下;

 

4.将执行命令包装为shell脚本

4.1 创建neo4j命令文件

       若在少量情况下可以用命令语句删除,删除语句如下:

MATCH data=(n)-[r1]->(person:Person)-[r2]->(appl:Appl) delete n,r1,person,r2,appl;

     在实际情况中,由于服务器资源有限,大量的查询删除会出现内存不足的情况。因此在全量删除数据时可以将整个数据库删除重启,具体步骤如下:

  1. 1.停止neo4j服务
  2. ./bin/neo4j stop
  3. 2.删除neo4j数据库文件,最好最备份
  4. rm -rf $neo4jhome/data/databases/graph.db
  5. 3.启动neo4j服务
  6. ./bin/neo4j start

   创建import_neo4j.cql文件,将执行的neo4j命令写入一个文件,前两行为批量删除,以全量导入的方式实现,若是增量导入,去掉前两行删除命令,文件内容如下:

  1. create index on :Order(order_id,order_dt) ;
  2. create index on :Person(cust_name,id_no,role_type) ;
  3. create index on :Mobilephone(mobile_phone);
  4. create index on :HomeAddress(home_area,home_detail);
  5. create index on :CompanyAddress(company_area,company_detail);
  6. LOAD CSV WITH HEADERS FROM "file:///order.csv" AS line
  7. CREATE (appl:Order{order_id:line.order_id,order_dt:line.order_dt});
  8. LOAD CSV WITH HEADERS FROM "file:///person.csv" AS line
  9. CREATE (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type});
  10. LOAD CSV WITH HEADERS FROM "file:///mobilephone.csv" AS line
  11. CREATE (m:Mobilephone{mobile_phone:line.mobile_phone});
  12. LOAD CSV WITH HEADERS FROM "file:///homeAddress.csv" AS line
  13. CREATE (h:HomeAddress{home_area:line.home_area,home_detail:line.home_detail});
  14. LOAD CSV WITH HEADERS FROM "file:///companyAddress.csv" AS line
  15. CREATE (c:CompanyAddress{company_area:line.company_area,company_detail:line.company_detail});
  16. LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
  17. MATCH (appl:Order{order_id:line.order_id,order_dt:line.order_dt})
  18. MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
  19. MERGE (person)-[r:person_appl{order_id:line.order_id}]->(appl);
  20. LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
  21. MATCH (m:Mobilephone{mobile_phone:line.mobile_phone})
  22. MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
  23. MERGE (m)-[r:phone_appl{order_id:line.order_id}]->(person);
  24. LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
  25. MATCH (h:HomeAddress{home_area:line.home_area,home_detail:line.home_detail})
  26. MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
  27. MERGE (h)-[r:homeAddr{order_id:line.order_id}]->(person);
  28. LOAD CSV WITH HEADERS FROM "file:///anti_fraud.csv" AS line
  29. MATCH (person:Person{cust_name:line.cust_name,id_no:line.id_no,role_type:line.role_type})
  30. MATCH (c:CompanyAddress{company_area:line.company_area,company_detail:line.company_detail})
  31. MERGE (c)-[r:companyAddr{order_id:line.order_id}]->(person);

4.2 创建调用cql文件

    创建import_csv_neo4j.sh文件,删除读取出import_neo4j.cql文件内容,创建实体与边。

  1. #!/bin/bash
  2. #--------------------------------------
  3. # filename: import_neo4j.sh
  4. # description: 将csv导入neo4j
  5. # author: test-abc
  6. # date: 2019.08.05
  7. #--------------------------------------
  8. neo4jhome='/home/neo4j/neo4j-community-3.5.7'
  9. username=neo4j
  10. passwd=neo4j
  11. cd $neo4jhome
  12. source /etc/profile
  13. #stop neo4j service
  14. echo 'stop neo4j service'
  15. ./bin/neo4j stop
  16. #delete neo4j database
  17. echo 'delete neo4j database'
  18. rm -rf $neo4jhome/data/databases/graph.db
  19. #start neo4j service
  20. echo 'start neo4j service'
  21. ./bin/neo4j start
  22. sleep 30
  23. echo "start importing neo4j"
  24. cat $neo4jhome/import/import_neo4j.cql | ./bin/cypher-shell -u $username -p $passwd > $neo4jhome/import/import_csv_neo4j.log
  25. echo "end of import neo4j"

4.3 调用shell脚本

     给import_csv_neo4j.sh文件添加执行权限,执行该shell脚本

  1. chmod +x import_csv_neo4j.sh
  2. ./import_csv_neo4j.sh

 

 

本文内容由网友自发贡献,转载请注明出处:【wpsshop博客】
推荐阅读
相关标签
  

闽ICP备14008679号