In the era of big data, generating realistic time-series data is essential for tasks such as load balancing, load forecasting, and intelligent resource provisioning. Built on the capabilities of generative adversarial networks (GANs), a multivariate time-series generation model can produce statistically similar synthetic data without exposing the privacy of the real data. This article introduces a GAN-based model for generating multivariate time-series data [1].
Monitoring data from cloud and edge computing environments is usually a trade secret or protected by data regulations such as the GDPR, which makes it difficult to obtain real data for research and development. To work around this, researchers use synthetic data to fill the gap. In this setting, a GAN-based multivariate time-series generation model offers a new way to produce an arbitrary amount of time-series workload data [1]. Its goal is to learn the probability distribution of real production workloads and to generate statistically similar time-series data.
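Before such a model can be trained, the raw monitoring traces have to be turned into fixed-length multivariate training windows. The sketch below shows one common way to do this; the number of metrics, the sampling rate, and the window length are illustrative assumptions, and the random array merely stands in for a real monitoring trace.

import numpy as np
import torch

# Hypothetical example: slice a long multivariate monitoring trace (e.g. CPU,
# memory, and network utilisation sampled once per minute) into fixed-length
# training windows for the GAN.
trace = np.random.rand(10_000, 3)   # stand-in for a real (T, num_metrics) trace
window = 60                         # 60 time steps per training sample

windows = np.stack([trace[i:i + window] for i in range(len(trace) - window + 1)])
train_data = torch.tensor(windows, dtype=torch.float32)   # (num_windows, 60, 3)
print(train_data.shape)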
A GAN consists of two artificial neural networks, a discriminator and a generator, that are trained against each other in a minimax game [2].
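For reference, the two networks optimize the classical GAN minimax objective from Goodfellow et al. [2], where x denotes real samples and z the generator's noise input:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]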
In the time-series model considered here, the discriminator D maps a sequence representation h_output to a score y that indicates whether the sequence is real or synthetic:

y = D(h_{\text{output}})
The generator is split into two parts: a static network g_S that maps static noise z_S to a static latent code \hat{h}_S, and a recurrent network g_X that produces the latent state \hat{h}_t at each time step from the static code, the previous state \hat{h}_{t-1}, and step-wise noise z_t:

\hat{h}_S = g_S(z_S), \quad \hat{h}_t = g_X(\hat{h}_S, \hat{h}_{t-1}, z_t)
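A minimal PyTorch sketch of this two-part generator is shown below. The class and parameter names (TwoPartGenerator, static_noise_dim, step_noise_dim, latent_dim) and the choice of a GRU cell are illustrative assumptions, not the exact architecture used in [1]; the point is only to show how the recurrence \hat{h}_t = g_X(\hat{h}_S, \hat{h}_{t-1}, z_t) can be expressed in code.

import torch
import torch.nn as nn

class TwoPartGenerator(nn.Module):
    """Illustrative static (g_S) plus recurrent (g_X) generator sketch."""
    def __init__(self, static_noise_dim=16, step_noise_dim=8, latent_dim=32):
        super().__init__()
        self.static_noise_dim = static_noise_dim
        self.step_noise_dim = step_noise_dim
        self.latent_dim = latent_dim
        # g_S: maps static noise z_S to a static latent code h_S
        self.g_s = nn.Sequential(nn.Linear(static_noise_dim, latent_dim), nn.Tanh())
        # g_X: one recurrence step; input is [h_S, z_t], hidden state is h_{t-1}
        self.g_x = nn.GRUCell(latent_dim + step_noise_dim, latent_dim)

    def forward(self, batch_size, seq_len):
        z_s = torch.randn(batch_size, self.static_noise_dim)      # static noise z_S
        h_s = self.g_s(z_s)                                       # h_S = g_S(z_S)
        h_t = torch.zeros(batch_size, self.latent_dim)            # initial state h_0
        outputs = []
        for _ in range(seq_len):
            z_t = torch.randn(batch_size, self.step_noise_dim)    # step-wise noise z_t
            # h_t = g_X(h_S, h_{t-1}, z_t)
            h_t = self.g_x(torch.cat([h_s, z_t], dim=1), h_t)
            outputs.append(h_t)
        return torch.stack(outputs, dim=1)                        # (batch, seq_len, latent_dim)

# Example: latent sequences for 4 series of length 24
latent_seq = TwoPartGenerator()(4, 24)
print(latent_seq.shape)  # torch.Size([4, 24, 32])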
The following PyTorch example implements a simple LSTM-based generator and discriminator and trains them with the standard GAN loop; it uses random placeholder data but illustrates the overall training procedure.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader

# Generator network
class Generator(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(Generator, self).__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Make sure x is three-dimensional: (batch, seq_len, input_dim)
        if x.dim() == 2:
            x = x.unsqueeze(1)  # add a time dimension
        batch_size = x.size(0)
        h_0 = torch.zeros(2, batch_size, self.hidden_dim).to(x.device)
        c_0 = torch.zeros(2, batch_size, self.hidden_dim).to(x.device)
        out, _ = self.lstm(x, (h_0, c_0))
        out = self.fc(out[:, -1, :])  # use the output of the last time step
        return out

# Discriminator network
class Discriminator(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super(Discriminator, self).__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Make sure x is three-dimensional: (batch, seq_len, input_dim)
        if x.dim() == 2:
            x = x.unsqueeze(1)  # add a time dimension
        batch_size = x.size(0)
        h_0 = torch.zeros(2, batch_size, self.hidden_dim).to(x.device)
        c_0 = torch.zeros(2, batch_size, self.hidden_dim).to(x.device)
        out, _ = self.lstm(x, (h_0, c_0))
        out = self.fc(out[:, -1, :])  # use the output of the last time step
        out = self.sigmoid(out)
        return out

# Hyperparameters
input_dim = 100      # input (noise) dimension
hidden_dim = 64      # hidden layer dimension
output_dim = 100     # output dimension
batch_size = 32
num_epochs = 1000
learning_rate = 0.0002

# Instantiate the generator and the discriminator
generator = Generator(input_dim, hidden_dim, output_dim)
discriminator = Discriminator(output_dim, hidden_dim)

# Optimizers
g_optimizer = optim.Adam(generator.parameters(), lr=learning_rate)
d_optimizer = optim.Adam(discriminator.parameters(), lr=learning_rate)

# Loss function
criterion = nn.BCELoss()

# Load the data (random data is used here as a placeholder)
real_data = torch.randn(1000, input_dim).unsqueeze(1)  # make the input three-dimensional
dataloader = DataLoader(real_data, batch_size=batch_size, shuffle=True)

# GAN training loop
for epoch in range(num_epochs):
    for real_samples in dataloader:
        # Train the discriminator
        real_samples = real_samples.float()
        batch_size = real_samples.size(0)
        real_labels = torch.ones(batch_size, 1)
        fake_labels = torch.zeros(batch_size, 1)

        # Generate fake samples
        noise = torch.randn(batch_size, input_dim).unsqueeze(1)  # three-dimensional noise
        fake_samples = generator(noise)

        # Discriminator loss on real and fake samples
        d_real_loss = criterion(discriminator(real_samples), real_labels)
        d_fake_loss = criterion(discriminator(fake_samples.detach()), fake_labels)
        d_loss = d_real_loss + d_fake_loss

        d_optimizer.zero_grad()
        d_loss.backward()
        d_optimizer.step()

        # Train the generator
        noise = torch.randn(batch_size, input_dim).unsqueeze(1)  # three-dimensional noise
        fake_samples = generator(noise)
        g_loss = criterion(discriminator(fake_samples), real_labels)

        g_optimizer.zero_grad()
        g_loss.backward()
        g_optimizer.step()

    # Print the losses
    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], d_loss: {d_loss.item():.4f}, g_loss: {g_loss.item():.4f}')
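Once training finishes, the generator can be used on its own to sample synthetic series. The hypothetical snippet below reuses the generator, input_dim, and real_data defined above and performs only a very rough sanity check by comparing overall mean and standard deviation; in practice more thorough statistical and downstream evaluations would be needed.

# Sample synthetic data from the trained generator
with torch.no_grad():
    noise = torch.randn(256, input_dim).unsqueeze(1)   # same noise shape as during training
    synthetic = generator(noise)                       # (256, output_dim)

# Rough sanity check: compare global mean/std of real and synthetic data
real_flat = real_data.squeeze(1)                       # (1000, input_dim)
print('real  mean/std:', real_flat.mean().item(), real_flat.std().item())
print('synth mean/std:', synthetic.mean().item(), synthetic.std().item())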
[1] Leznik, M., Michalsky, P., Willis, P., Schanzel, B., Östberg, P., & Domaschka, J. (2021). Multivariate Time Series Synthesis Using Generative Adversarial Networks. In Proceedings of the 2021 ACM/SPEC International Conference on Performance Engineering (ICPE ’21), April 19–23, 2021, Virtual Event, France. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3427921.3450257
[2] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. Advances in neural information processing systems, 27.