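I have recently been researching spatiotemporal crime prediction and crime-distribution visualization, and graph neural networks are one of the essential tools for this work. To record how I learned PyG, I wrote this article by combining the official examples (which are very hard to follow) with the learning notes of various experts online, so that I don't forget it later. If there are mistakes, please go easy on me.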
The installation process is omitted here.
This article uses crime data from New York State, USA. The visualization is shown in the figure below:
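The data are split into training, validation, and test sets. They are four-dimensional with shape [16, 16, 608, 4]:
- the study area is divided into a 16×16 grid of regions;
- each region has 608 time steps;
- there are 4 crime types.

The input is the crime matrices of the past n days, and the output is the crime matrix of day n+1.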
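The sketch below makes the input/output relation concrete. It shows how sliding windows can be cut from the 608 time steps, and how a cell of the 16×16 grid maps to one of 256 nodes. It is written in pure Python.

- The helper names `sliding_windows` and `cell_to_node` are hypothetical.
- The window length of 7 is only an illustrative value for n; the article does not state n.
- The cell-to-node mapping assumes NumPy's default row-major (C-order) reshape.

```python
def sliding_windows(num_steps, seq_len, horizon=1, stride=1):
    """Hypothetical helper: (input_indices, target_indices) pairs over num_steps steps.

    Each sample uses seq_len consecutive steps as input and the next `horizon`
    steps as the target, e.g. days t..t+seq_len-1 -> day t+seq_len.
    """
    samples = []
    # the last valid start still leaves room for seq_len inputs plus horizon targets
    for start in range(0, num_steps - seq_len - horizon + 1, stride):
        inputs = list(range(start, start + seq_len))
        targets = list(range(start + seq_len, start + seq_len + horizon))
        samples.append((inputs, targets))
    return samples


def cell_to_node(row, col, width=16):
    """Hypothetical helper: grid cell (row, col) -> node id under a row-major reshape."""
    return row * width + col


if __name__ == "__main__":
    windows = sliding_windows(num_steps=608, seq_len=7)  # n = 7 is illustrative only
    print(len(windows))          # 601
    print(windows[0])            # ([0, 1, 2, 3, 4, 5, 6], [7])
    print(cell_to_node(15, 15))  # 255
```

With a stride of 1, every day after the first n days becomes a target exactly once. A larger stride trades sample count for less overlap between windows.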
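A key step in using a graph neural network is building the graph. How is it built? PyG is fairly user-friendly here. After the data are transformed, the KNN algorithm selects each node's nearest neighbours, and edges are created from them. Note that the input must follow the format PyG expects: the node features passed to knn must be a 2-D tensor of shape [num_nodes, num_features].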
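import torch
from torch_geometric.data import Data
from torch_geometric.nn import knn  # requires the torch-cluster package
node_feats = torch.from_numpy(train.reshape(256, -1)).float()  # [256 regions, time steps * 4 crime types]
edge_index = knn(node_feats, node_feats, 3)  # 3 nearest regions per region, itself included
graph = Data(x=TempTrain, edge_index=edge_index)  # TempTrain: node features prepared earlier (not shown)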
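The knn(..., 3) call links every region to the 3 regions whose flattened crime histories are closest in Euclidean distance. "Neighbour" here therefore means a similar crime history, not geographic adjacency.

The sketch below shows the same mechanism on four toy nodes, with no dependencies. `knn_edges` is a hypothetical brute-force helper, O(N² log N), which is fine for 256 nodes.

- To our understanding, its two-row output mirrors what torch_geometric.nn.knn returns when x and y are the same tensor. The first row holds the query node and the second row one of its neighbours.
- Each node counts as its own nearest neighbour, so k=3 gives each node itself plus 2 others.
- If those self-loops are unwanted, PyG's knn_graph(x, k, loop=False) is the usual alternative.

```python
import math


def knn_edges(features, k):
    """Hypothetical brute-force kNN edge list.

    features: list of equal-length feature vectors, one per node.
    Returns [sources, targets]: sources[i] is the query node and targets[i]
    one of its k nearest nodes (the node itself is included, distance 0).
    """
    sources, targets = [], []
    for i, fi in enumerate(features):
        # rank all nodes by Euclidean distance to node i, ties broken by index
        order = sorted(range(len(features)), key=lambda j: (math.dist(fi, features[j]), j))
        for j in order[:k]:
            sources.append(i)
            targets.append(j)
    return [sources, targets]


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
    print(knn_edges(feats, 2))
    # [[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 1, 0, 2, 3, 3, 2]]
```

GCNConv already adds self-loops by default, so the self-edges from knn mostly duplicate what the layer does anyway.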
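After the data have been turned into a graph structure, they need to be packaged for training. Readers familiar with torch will find this easy to follow, because PyG's packaging works much like torch's. The function below turns your own data into PyG format. Each sliding window becomes one Data object that shares the kNN edge_index, and the list is wrapped in a PyG DataLoader. It borrows the code structure of @Cyril_KI, with my own modifications: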
import functools
import operator

import torch
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader
from tqdm import tqdm

# Uses globals defined elsewhere in the script: seq_len (input window length),
# pred_step_size (prediction horizon), graph (the kNN graph above) and B (batch size).
def process(dataset, batch_size, step_size, shuffle):
    # [16, 16, T, 4] -> [256, T, 4]: one row per region, then time steps, then 4 crime types
    dataset = dataset.reshape(-1, dataset.shape[-2], dataset.shape[-1])
    nodes, timeLength = dataset.shape[0], dataset.shape[1]
    dataset = dataset.tolist()
    graphs = []
    # slide a window over time; the last start still leaves room for the targets
    for i in tqdm(range(0, timeLength - seq_len - pred_step_size + 1, step_size)):
        # inputs: the past seq_len time steps, each flattened to nodes * 4 values
        train_seq = []
        for j in range(i, i + seq_len):
            x = [dataset[c][j] for c in range(nodes)]
            train_seq.append(functools.reduce(operator.concat, x))
        # targets: the next pred_step_size time steps, flattened the same way
        train_labels = []
        for k in range(i + seq_len, i + seq_len + pred_step_size):
            y = [dataset[c][k] for c in range(nodes)]
            train_labels.append(functools.reduce(operator.concat, y))
        train_seq = torch.FloatTensor(train_seq)        # [seq_len, nodes * 4]
        train_labels = torch.FloatTensor(train_labels)  # [pred_step_size, nodes * 4]
        # transpose so each (region, crime type) row carries its seq_len history;
        # every sample shares the same kNN edge_index
        graphs.append(Data(x=train_seq.T, edge_index=graph.edge_index, y=train_labels))
    loader = DataLoader(graphs, batch_size=batch_size, shuffle=shuffle, drop_last=True)
    return loader, graphs


Dtr, Dtrgraphs = process(train, B, step_size=1, shuffle=True)
Val, Valgraphs = process(val, B, step_size=1, shuffle=True)
# keep test samples in time order for evaluation
Dte, Dtegraphs = process(test, B, step_size=pred_step_size, shuffle=False)
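Assume the array reaching process() has shape [nodes, time, 4]. Then functools.reduce(operator.concat, ...) concatenates the per-region 4-vectors node by node. After the transpose, train_seq.T has nodes × 4 rows (1,024 for 256 regions), and row r stands for region r // 4, crime type r % 4.

The toy sketch below checks that ordering. `flatten_step` and `row_meaning` are hypothetical helpers.

One consequence is worth checking against the intended design. The kNN edge_index only uses node ids 0 to 255, so:
- those ids point at (region, type) rows rather than at the 256 regions;
- rows 256 to 1,023 receive no edges at all.

```python
import functools
import operator


def flatten_step(step_values):
    """Hypothetical helper: concatenate per-node lists into one flat list,
    the same way process() uses functools.reduce(operator.concat, ...)."""
    return functools.reduce(operator.concat, step_values)


def row_meaning(row, num_types=4):
    """Hypothetical helper: row r of train_seq.T -> (node, crime type)."""
    return divmod(row, num_types)


if __name__ == "__main__":
    # 3 toy nodes x 4 crime types at one time step; value 10*n + t encodes (node n, type t)
    step = [[10 * n + t for t in range(4)] for n in range(3)]
    print(flatten_step(step))  # [0, 1, 2, 3, 10, 11, 12, 13, 20, 21, 22, 23]
    print(row_meaning(5))      # (1, 1)
```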
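Here we build a GCN+LSTM model. In short, the batch_size graphs in a mini-batch are combined into one large graph, which the network processes as a whole. The output is then split back into batch_size graphs.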
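import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class GCN(torch.nn.Module):
    def __init__(self, in_feats, h_feats, out_feats, batchsize):
        super().__init__()
        self.conv1 = GCNConv(in_feats, h_feats)
        self.conv2 = GCNConv(h_feats, out_feats)
        # single-layer LSTM (nn.LSTM's dropout only applies between stacked layers)
        self.lstm = nn.LSTM(out_feats, 128, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(128, out_feats),  # 128 = LSTM hidden size
            nn.ReLU(),
            nn.Linear(out_feats, 1),
        )
        self.batch = batchsize

    def forward(self, data):
        # data is a mini-batch: all graphs merged into one large disconnected graph
        x, edge_index = data.x.float(), data.edge_index
        x = F.elu(self.conv1(x, edge_index))
        x = self.conv2(x, edge_index)
        # each node's embedding becomes a length-1 sequence: [num_nodes, 1, out_feats]
        x = x.view(len(x), 1, -1)
        x, _ = self.lstm(x)
        out = self.fc(x)  # [num_nodes, 1, 1]
        # split the merged graph back into one row per graph: [batchsize, nodes_per_graph]
        return out.reshape(self.batch, -1)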
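The "merge the batch into one large graph, then split it back" idea can be illustrated without PyG. When torch_geometric.loader.DataLoader builds a mini-batch, it:
- concatenates the node features;
- shifts each graph's edge_index by the number of nodes in the graphs before it;
- records a batch vector that maps every node to its graph.

The pure-Python sketch below reproduces that bookkeeping on two copies of a 3-node graph. `merge_graphs` and `split_rows` are hypothetical helpers, not PyG functions.

```python
def merge_graphs(graphs):
    """Hypothetical helper: merge small graphs into one disconnected graph.

    graphs: list of (x, edge_index) with x a list of node-feature rows and
    edge_index a pair [sources, targets] of node ids local to that graph.
    Returns (x, edge_index, batch); batch[i] is the graph id of merged node i.
    """
    x, sources, targets, batch = [], [], [], []
    offset = 0
    for gid, (gx, (src, dst)) in enumerate(graphs):
        x.extend(gx)
        # shift local node ids so they point at this graph's rows in the merged x
        sources.extend(s + offset for s in src)
        targets.extend(t + offset for t in dst)
        batch.extend([gid] * len(gx))
        offset += len(gx)
    return x, [sources, targets], batch


def split_rows(values, num_graphs):
    """Hypothetical helper: per-node outputs -> [num_graphs][nodes_per_graph], equal-sized graphs only."""
    per_graph = len(values) // num_graphs
    return [values[g * per_graph:(g + 1) * per_graph] for g in range(num_graphs)]


if __name__ == "__main__":
    g = ([[1.0], [2.0], [3.0]], [[0, 1], [1, 2]])  # 3 nodes, edges 0->1 and 1->2
    x, edge_index, batch = merge_graphs([g, g])
    print(edge_index)                        # [[0, 1, 3, 4], [1, 2, 4, 5]]
    print(batch)                             # [0, 0, 0, 1, 1, 1]
    print(split_rows([r[0] for r in x], 2))  # [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
```

The model's final reshape to (batchsize, -1) plays the role of `split_rows`. It is only valid because every sample has the same number of nodes and drop_last=True keeps every batch full.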
For the training code, refer to the template @Cyril_KI wrote earlier; it is not repeated here.
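Because only a few experiments were run, the comparison covers just a conventional CNN, GCN, and GCN-LSTM. Their MAPE values differ greatly, and GCN gives a large improvement over the CNN; interested readers can try it themselves. Compared with GCN alone, adding an LSTM to handle the temporal sequence improves the score somewhat further.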
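The comparison is reported in MAPE, so it helps to pin down the formula: MAPE = 100/n * sum(|y - y_hat| / |y|). Many grid cells record zero crimes in a given time step, and the term is undefined for those cells. The article does not say how such cells were handled.

The sketch below uses a hypothetical `mape` helper and assumes one common convention: zero targets are skipped. Its numbers are therefore only comparable with results computed the same way.

```python
def mape(y_true, y_pred, skip_zero=True):
    """Hypothetical helper: mean absolute percentage error, in percent.

    Cells whose true value is 0 make |y - y_hat| / |y| undefined; by default
    they are skipped (an assumed convention), otherwise an error is raised.
    """
    terms = []
    for y, y_hat in zip(y_true, y_pred):
        if y == 0:
            if skip_zero:
                continue
            raise ZeroDivisionError("MAPE is undefined when a true value is 0")
        terms.append(abs(y - y_hat) / abs(y))
    if not terms:
        raise ValueError("no non-zero targets to evaluate")
    return 100.0 * sum(terms) / len(terms)


if __name__ == "__main__":
    print(mape([2, 4, 0, 5], [1, 5, 3, 5]))  # 25.0 (the zero target is skipped)
```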