当前位置:   article > 正文

影响力最大化——CELF算法的简介与python实现

celf算法

        CELF算法是Leskovecl等人利用IC模型的子模特性对爬山贪心算法进一步改进得到的优化算法。子模函数的定义为:任意函数f(·)将有限集合映射为非负实数集并且满足收益递减特性即为子模函数。设集合s ∈T,任意元素v添加到集合S中获得的边际效益大于等于添加到集合T中所获得的边际效益。

        Kempe已经对独立级联模型和线性阈值模型的影响期望值函数加以证明,得出其满足子模特性。由子模特性,把一个节点v添加到结合S时,如果集合S越小,v节点的边际影响期望值就越大。因此,对于那些在前一轮中计算的边际影响期望值小于某些点在当前轮中所计算的结果,那么这些节点就不需要重新计算边际影响期望值,根据子模特性,可以省去大量节点的重复计算,从而在时间复杂度方面有了很大的提高,效率提升了约为爬山算法的700倍而精度上基本保持一致。

        CELF的原理是,假设已知A,B,C,D 四个节点的边际影响力范围增量分别是d(A)、d(B)、d(C)、d(D),且 d(A)> d(B)> d(C)> d(D),那么根据贪心策略,会将A加入到种子集合S当中之后,然后在下一轮继续计算B,C,D的边际影响力时,如果B此时的边际影响力d(B)>d(C),那么依据子模性,d(B)必定是这一轮所有节点的边际影响力当中最大的,因为C、D等其他节点在这一轮的边际增量必定小于它们各自在上一轮的增量,即d(C)<=d(C)<d(B),d(D)<=d(D)<d'(B),因此可直接把B加入到S当中去,这样就省去了计算C,D,E...等等节点边际影响力的步骤,大大提高了效率。

  1. import matplotlib.pyplot as plt
  2. from random import uniform, seed
  3. import numpy as np
  4. import time
  5. import networkx as nx
  6. from igraph import *
  7. def IC(g, S, p=0.5, mc=1000):
  8. spread = []
  9. for i in range(mc):
  10. # Simulate propagation process
  11. new_active, A = S[:], S[:]
  12. while new_active:
  13. # For each newly active node, find its neighbors that become activated
  14. new_ones = []
  15. for node in new_active:
  16. # Determine neighbors that become infected
  17. np.random.seed(i)
  18. success = np.random.uniform(0, 1, len(g.neighbors(node, mode="out"))) < p
  19. new_ones += list(np.extract(success, g.neighbors(node, mode="out")))
  20. new_active = list(set(new_ones) - set(A))
  21. # Add newly activated nodes to the set of activated nodes
  22. A += new_active
  23. spread.append(len(A))
  24. return (np.mean(spread))
  25. def celf(g, k, p=0.1, mc=1000):
  26. start_time = time.time()
  27. marg_gain = [IC(g, [node], p, mc) for node in range(g.vcount())]
  28. # Create the sorted list of nodes and their marginal gain
  29. Q = sorted(zip(range(g.vcount()), marg_gain), key=lambda x: x[1], reverse=True)
  30. # Select the first node and remove from candidate list
  31. S, spread, SPREAD = [Q[0][0]], Q[0][1], [Q[0][1]]
  32. Q, LOOKUPS, timelapse = Q[1:], [g.vcount()], [time.time() - start_time]
  33. for _ in range(k - 1):
  34. check, node_lookup = False, 0
  35. while not check:
  36. # Count the number of times the spread is computed
  37. node_lookup += 1
  38. # Recalculate spread of top node
  39. current = Q[0][0]
  40. # Evaluate the spread function and store the marginal gain in the list
  41. Q[0] = (current, IC(g, S + [current], p, mc) - spread)
  42. # Re-sort the list
  43. Q = sorted(Q, key=lambda x: x[1], reverse=True)
  44. # Check if previous top node stayed on top after the sort
  45. check = (Q[0][0] == current)
  46. # Select the next node
  47. spread += Q[0][1]
  48. S.append(Q[0][0])
  49. SPREAD.append(spread)
  50. LOOKUPS.append(node_lookup)
  51. timelapse.append(time.time() - start_time)
  52. # Remove the selected node from the list
  53. Q = Q[1:]
  54. return (S, SPREAD, timelapse, LOOKUPS)
  55. source = []
  56. target = []
  57. network_file = 'E:/DPSO_demo-master/CA-HepPh.txt'
  58. file = open(network_file, "r", encoding="utf-8")
  59. Vnumber, Enumber = file.readline().split()
  60. Enumber = int(Enumber)
  61. for i in range(0, Enumber):
  62. start, end = file.readline().split()
  63. source.append(int(start))
  64. target.append(int(end))
  65. print('之前', source)
  66. print('之前', target)
  67. A = set(source + target)
  68. g = Graph(directed=True)
  69. g.add_vertices(range(100000))
  70. g.add_edges(list(zip(source,target)))
  71. start_time = time.time()
  72. # Run algorithms
  73. celf_output = celf(g,50,p = 0.01,mc = 1000)
  74. end_time = time.time()
  75. runningtime1 = end_time - start_time
  76. print("总时间:", runningtime1)
  77. # Print results
  78. print("celf output: " + str(celf_output[0]))

CELF介绍与图的引用:胡怀雄. 基于独立级联模型的社交网络影响力最大化研究[D]. 深圳大学, 2018. 

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/运维做开发/article/detail/773903
推荐阅读
相关标签
  

闽ICP备14008679号