上周, 我写了关于中间性中心性算法以及使用graphstream 理解它的尝试 ,在阅读源代码时,我意识到我可以使用neo4j的所有最短路径算法将某些东西放在一起。
概括地说,中间性中心度算法用于确定图中节点的负载和重要性。
在与Jen讨论这一点时,她指出,计算整个图上节点之间的中间性通常是没有意义的。 但是,了解在您感兴趣的较小子图中哪个节点最重要可能很有用。
在这种情况下,我有兴趣在一个非常小的有向图中确定节点的中间性:
让我们简要回顾一下算法:
[中间性中心]等于从所有顶点到经过该节点的所有其他顶点的最短路径数。
这意味着我们排除了直接在两个节点之间而不经过任何其他路径的任何路径,这是我最初没有掌握的。
如果我们手工确定适用的路径,我们将得到以下结果:
- A -> B: Direct Path Exists
- A -> C: B
- A -> D: E
- A -> E: Direct Path Exists
- B -> A: No Path Exists
- B -> C: Direct Path Exists
- B -> D: E or C
- B -> E: Direct Path Exists
- C -> A: No Path Exists
- C -> B: No Path Exists
- C -> D: Direct Path Exists
- C -> E: No Path Exists
- D -> A: No Path Exists
- D -> B: No Path Exists
- D -> C: No Path Exists
- D -> E: No Path Exists
- E -> A: No Path Exists
- E -> B: No Path Exists
- E -> C: No Path Exists
- E -> D: Direct Path Exists
给出以下中间性中心值:
- A: 0
- B: 1
- C: 0.5
- D: 0
- E: 1.5
我们可以针对最新版本的graphstream (考虑方向)编写测试,以确认我们的手动算法:
- @Test
- public void calculateBetweennessCentralityOfMySimpleGraph() {
- Graph graph = new SingleGraph("Tutorial 1");
-
- Node A = graph.addNode("A");
- Node B = graph.addNode("B");
- Node E = graph.addNode("E");
- Node C = graph.addNode("C");
- Node D = graph.addNode("D");
-
- graph.addEdge("AB", A, B, true);
- graph.addEdge("BE", B, E, true);
- graph.addEdge("BC", B, C, true);
- graph.addEdge("ED", E, D, true);
- graph.addEdge("CD", C, D, true);
- graph.addEdge("AE", A, E, true);
-
- BetweennessCentrality bcb = new BetweennessCentrality();
- bcb.computeEdgeCentrality(false);
- bcb.betweennessCentrality(graph);
-
- System.out.println("A="+ A.getAttribute("Cb"));
- System.out.println("B="+ B.getAttribute("Cb"));
- System.out.println("C="+ C.getAttribute("Cb"));
- System.out.println("D="+ D.getAttribute("Cb"));
- System.out.println("E="+ E.getAttribute("Cb"));
- }
输出是预期的:
- A=0.0
- B=1.0
- C=0.5
- D=0.0
- E=1.5
我想看看是否可以使用neo4j做同样的事情,所以我使用以下cypher语句在空白数据库中创建了图形:
- CREATE (A {name: "A"})
- CREATE (B {name: "B"})
- CREATE (C {name: "C"})
- CREATE (D {name: "D"})
- CREATE (E {name: "E"})
-
- CREATE A-[:TO]->E
- CREATE A-[:TO]->B
- CREATE B-[:TO]->C
- CREATE B-[:TO]->E
- CREATE C-[:TO]->D
- CREATE E-[:TO]->D
然后,我编写了一个查询,该查询找到了图中所有节点集之间的最短路径:
- MATCH p = allShortestPaths(source-[r:TO*]->destination)
- WHERE source <> destination
- RETURN NODES(p)
如果运行,它将返回以下内容:
- ==> +---------------------------------------------------------+
- ==> | NODES(p) |
- ==> +---------------------------------------------------------+
- ==> | [Node[1]{name:"A"},Node[2]{name:"B"}] |
- ==> | [Node[1]{name:"A"},Node[2]{name:"B"},Node[3]{name:"C"}] |
- ==> | [Node[1]{name:"A"},Node[5]{name:"E"},Node[4]{name:"D"}] |
- ==> | [Node[1]{name:"A"},Node[5]{name:"E"}] |
- ==> | [Node[2]{name:"B"},Node[3]{name:"C"}] |
- ==> | [Node[2]{name:"B"},Node[3]{name:"C"},Node[4]{name:"D"}] |
- ==> | [Node[2]{name:"B"},Node[5]{name:"E"},Node[4]{name:"D"}] |
- ==> | [Node[2]{name:"B"},Node[5]{name:"E"}] |
- ==> | [Node[3]{name:"C"},Node[4]{name:"D"}] |
- ==> | [Node[5]{name:"E"},Node[4]{name:"D"}] |
- ==> +---------------------------------------------------------+
- ==> 10 rows
我们仍在返回节点之间的直接链接,但是通过基于路径中节点的数量过滤结果可以很容易地纠正这一点:
- MATCH p = allShortestPaths(source-[r:TO*]->destination)
- WHERE source <> destination AND LENGTH(NODES(p)) > 2
- RETURN EXTRACT(n IN NODES(p): n.name)
- ==> +--------------------------------+
- ==> | EXTRACT(n IN NODES(p): n.name) |
- ==> +--------------------------------+
- ==> | ["A","B","C"] |
- ==> | ["A","E","D"] |
- ==> | ["B","C","D"] |
- ==> | ["B","E","D"] |
- ==> +--------------------------------+
- ==> 4 rows
如果我们稍微调整密码查询,我们可以获得每个源/目标的最短路径的集合:
- MATCH p = allShortestPaths(source-[r:TO*]->destination)
- WHERE source <> destination AND LENGTH(NODES(p)) > 2
- WITH EXTRACT(n IN NODES(p): n.name) AS nodes
- RETURN HEAD(nodes) AS source,
- HEAD(TAIL(TAIL(nodes))) AS destination,
- COLLECT(nodes) AS paths
- ==> +------------------------------------------------------+
- ==> | source | destination | paths |
- ==> +------------------------------------------------------+
- ==> | "A" | "D" | [["A","E","D"]] |
- ==> | "A" | "C" | [["A","B","C"]] |
- ==> | "B" | "D" | [["B","C","D"],["B","E","D"]] |
- ==> +------------------------------------------------------+
- ==> 3 rows
当我们有一种使用cypher来对集合进行切片的方法时,从这里获得节点间的中间性得分并不难,但是现在使用通用的编程语言要容易得多。
在这种情况下,我使用了Ruby并提出了以下代码:
- require 'neography'
- neo = Neography::Rest.new
-
- query = " MATCH p = allShortestPaths(source-[r:TO*]->destination)"
- query << " WHERE source <> destination AND LENGTH(NODES(p)) > 2"
- query << " WITH EXTRACT(n IN NODES(p): n.name) AS nodes"
- query << " RETURN HEAD(nodes) AS source, HEAD(TAIL(TAIL(nodes))) AS destination, COLLECT(nodes) AS paths"
-
- betweenness_centrality = { "A" => 0, "B" => 0, "C" => 0, "D" => 0, "E" => 0 }
-
- neo.execute_query(query)["data"].map { |row| row[2].map { |path| path[1..-2] } }.each do |potential_central_nodes|
- number_of_central_nodes = potential_central_nodes.size
- potential_central_nodes.each do |nodes|
- nodes.each { |node| betweenness_centrality[node] += (1.0 / number_of_central_nodes) }
- end
- end
-
- p betweenness_centrality
输出以下内容:
- $ bundle exec ruby centrality.rb
- {"A"=>0, "B"=>1.0, "C"=>0.5, "D"=>0, "E"=>1.5}
它似乎可以完成任务,但是我敢肯定,在某些情况下,它无法处理成熟的库需要处理的问题。 作为一个实验,看看有什么可能,我认为还算不错!
该图位于neo4j控制台上 ,以防有人感兴趣。