赞
踩
CELF算法是Leskovecl等人利用IC模型的子模特性对爬山贪心算法进一步改进得到的优化算法。子模函数的定义为:任意函数f(·)将有限集合映射为非负实数集并且满足收益递减特性即为子模函数。设集合s ∈T,任意元素v添加到集合S中获得的边际效益大于等于添加到集合T中所获得的边际效益。
Kempe已经对独立级联模型和线性阈值模型的影响期望值函数加以证明,得出其满足子模特性。由子模特性,把一个节点v添加到结合S时,如果集合S越小,v节点的边际影响期望值就越大。因此,对于那些在前一轮中计算的边际影响期望值小于某些点在当前轮中所计算的结果,那么这些节点就不需要重新计算边际影响期望值,根据子模特性,可以省去大量节点的重复计算,从而在时间复杂度方面有了很大的提高,效率提升了约为爬山算法的700倍而精度上基本保持一致。
CELF的原理是,假设已知A,B,C,D 四个节点的边际影响力范围增量分别是d(A)、d(B)、d(C)、d(D),且 d(A)> d(B)> d(C)> d(D),那么根据贪心策略,会将A加入到种子集合S当中之后,然后在下一轮继续计算B,C,D的边际影响力时,如果B此时的边际影响力d(B)>d(C),那么依据子模性,d(B)必定是这一轮所有节点的边际影响力当中最大的,因为C、D等其他节点在这一轮的边际增量必定小于它们各自在上一轮的增量,即d(C)<=d(C)<d(B),d(D)<=d(D)<d'(B),因此可直接把B加入到S当中去,这样就省去了计算C,D,E...等等节点边际影响力的步骤,大大提高了效率。
- import matplotlib.pyplot as plt
- from random import uniform, seed
- import numpy as np
- import time
- import networkx as nx
- from igraph import *
-
-
- def IC(g, S, p=0.5, mc=1000):
-
- spread = []
- for i in range(mc):
-
- # Simulate propagation process
- new_active, A = S[:], S[:]
- while new_active:
-
- # For each newly active node, find its neighbors that become activated
- new_ones = []
- for node in new_active:
- # Determine neighbors that become infected
- np.random.seed(i)
- success = np.random.uniform(0, 1, len(g.neighbors(node, mode="out"))) < p
- new_ones += list(np.extract(success, g.neighbors(node, mode="out")))
-
- new_active = list(set(new_ones) - set(A))
-
- # Add newly activated nodes to the set of activated nodes
- A += new_active
-
- spread.append(len(A))
-
- return (np.mean(spread))
-
-
-
- def celf(g, k, p=0.1, mc=1000):
-
- start_time = time.time()
- marg_gain = [IC(g, [node], p, mc) for node in range(g.vcount())]
-
- # Create the sorted list of nodes and their marginal gain
- Q = sorted(zip(range(g.vcount()), marg_gain), key=lambda x: x[1], reverse=True)
-
- # Select the first node and remove from candidate list
- S, spread, SPREAD = [Q[0][0]], Q[0][1], [Q[0][1]]
- Q, LOOKUPS, timelapse = Q[1:], [g.vcount()], [time.time() - start_time]
-
- for _ in range(k - 1):
-
- check, node_lookup = False, 0
-
- while not check:
- # Count the number of times the spread is computed
- node_lookup += 1
-
- # Recalculate spread of top node
- current = Q[0][0]
-
- # Evaluate the spread function and store the marginal gain in the list
- Q[0] = (current, IC(g, S + [current], p, mc) - spread)
-
- # Re-sort the list
- Q = sorted(Q, key=lambda x: x[1], reverse=True)
-
- # Check if previous top node stayed on top after the sort
- check = (Q[0][0] == current)
-
- # Select the next node
- spread += Q[0][1]
- S.append(Q[0][0])
- SPREAD.append(spread)
- LOOKUPS.append(node_lookup)
- timelapse.append(time.time() - start_time)
-
- # Remove the selected node from the list
- Q = Q[1:]
-
- return (S, SPREAD, timelapse, LOOKUPS)
-
- source = []
- target = []
- network_file = 'E:/DPSO_demo-master/CA-HepPh.txt'
- file = open(network_file, "r", encoding="utf-8")
- Vnumber, Enumber = file.readline().split()
- Enumber = int(Enumber)
- for i in range(0, Enumber):
- start, end = file.readline().split()
- source.append(int(start))
- target.append(int(end))
-
- print('之前', source)
- print('之前', target)
- A = set(source + target)
- g = Graph(directed=True)
- g.add_vertices(range(100000))
- g.add_edges(list(zip(source,target)))
-
- start_time = time.time()
-
- # Run algorithms
- celf_output = celf(g,50,p = 0.01,mc = 1000)
- end_time = time.time()
- runningtime1 = end_time - start_time
- print("总时间:", runningtime1)
- # Print results
- print("celf output: " + str(celf_output[0]))
CELF介绍与图的引用:胡怀雄. 基于独立级联模型的社交网络影响力最大化研究[D]. 深圳大学, 2018.
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。