https://github.com/facebookresearch/pytorch_GAN_zoo
1. Start visdom first, then run the command:
python -m visdom.server
2. Randomly generate 10 molecule images
The eval command is:
python eval.py visualization -n molecular -m PGAN --save_dataset data/output --size_dataset 100
The meaning of -m and -n, from eval.py's argument parser:
parser.add_argument('-m', '--module', help="Module to evaluate, available modules: PGAN, PPGAN, DCGAN", type=str, dest="module")
parser.add_argument('-n', '--name', help="Model's name", type=str, dest="name")
Notice 1: Be careful here: -n is the model's name "pubchem", not the full weight filename "pubchem_s5_i62000.pt".
This produces the output images.
Notice 2: Be careful here: the directory layout must look like the following:
Create a pubchem folder under output_networks and place the weight file inside it.
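Based on the folder and file names mentioned above, the expected layout would presumably look like this (a sketch inferred from the notes; the weight filename is the one from Notice 1):
output_networks/
└── pubchem/
    └── pubchem_s5_i62000.pt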
4. Inspirational generation
python eval.py inspirational_generation -n pubchem -m PGAN --input_image data/input/1.jpg -f data/output/feature_extractor
In this article, I will demonstrate how a GAN can be trained to generate molecular structures. This is a first step toward automating molecule synthesis. Similar work is being done by the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium (see "Can AI Create Molecules?"). I use this example to show how easily any individual can apply AI to cutting-edge research. The only limiting factor is imagination.
PubChem contains a list of ~96M chemical compounds in SMILES notation. SMILES strings can be classified into functional groups and converted into molecular structure images using the Python RDKit library. Out of 52 functional groups, only the Alcohol Aliphatic group was used, to limit the scope of this work. The first 50,000 compounds from the Alcohol Aliphatic group (their experiment used 50,000 images) were converted into images of size 128x128. The code to categorize and convert SMILES entries into images is available on GitHub.
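As a minimal illustration of that conversion step (my own sketch, not the author's exact script), RDKit can render a SMILES string to a 128x128 image like this; the SMILES value here is just ethanol as a stand-in example:

# Minimal sketch: render one SMILES string to a 128x128 PNG with RDKit.
from rdkit import Chem
from rdkit.Chem import Draw

smiles = "CCO"  # placeholder: ethanol, an aliphatic alcohol
mol = Chem.MolFromSmiles(smiles)  # returns None if the SMILES is invalid
if mol is not None:
    Draw.MolToFile(mol, "mol_0.png", size=(128, 128))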
The next step was to select a GAN model. After a few failed attempts, I selected the PyTorch implementation of Progressive GAN (PGAN) by Facebook Research (pytorch_GAN_zoo). The novelty of Progressive GAN is that it starts training with low-resolution images and adds new layers that introduce higher-resolution details as training progresses. According to its authors, these characteristics make PGAN more stable and faster to train.
The first step in training was to clone pytorch_GAN_zoo:
git clone https://github.com/facebookresearch/pytorch_GAN_zoo.git
The training process for PGAN is slightly different from that of other GANs. Before training starts, the data has to be resized into low-resolution images. This is done with the following command:
python datasets.py celeba $PATH_TO_IMAGES -o $OUTPUT_DIR -f
python datasets.py celeba256 data/train_datasets/opt_image256 -o data/train_datasets/opt_image256_out -f
In this command, "celeba" is the name of a pre-trained configuration. pytorch_GAN_zoo ships with several such configurations. The "celeba" configuration corresponds to 128x128 pixel images, the same size as the images used in this project. Unless the user is a hyper-parameter wizard, it is advisable to adapt the data to the hyper-parameters of the pre-trained model rather than tuning hyper-parameters to the data. After a first run, the hyper-parameters can be changed to fine-tune the model. The following table lists the pre-trained model names and supported image sizes.
datasets.py will also generate a config file, which contains:
1) the path to the resized images and the original images;
2) the number of iterations per scale. 'Scale' is a concept unique to PGAN: every scale is associated with a number of layers in the model, a number of iterations, and an image resolution. The maximum scale is calculated with the following formula:
image_size = 2**(2+max_scale)
The constant 2 is added because the training layers start from a resolution of 4x4. datasets.py will resize images in steps of (64, 128, 512, 1024) up to the image size of the data. "$PATH_TO_IMAGES" is the location of the training images; "$OUTPUT_DIR" is the location where the resized images will be saved; "-f" generates the resized images before training starts.
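As a quick sanity check (my own sketch, not code from the repo), the formula can be inverted to recover the maximum scale from an image size:

import math

def max_scale_for(image_size: int) -> int:
    # Inverts image_size = 2 ** (2 + max_scale); e.g. 128 -> 5, 256 -> 6.
    return int(math.log2(image_size)) - 2

assert max_scale_for(128) == 5  # scales correspond to 4, 8, 16, 32, 64, 128
assert max_scale_for(256) == 6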
Once the resized images and the config file are created, the next step was to start training with the following command:
python train.py PGAN -c $CONFIG_FILE -n $DATASET_NAME -d $WEIGHTS_DIR
python train.py PGAN -c config_celeba256.json -n molecular -d data/train_datasets/opt_image256_out
In this command, "PGAN" refers to Progressive GAN (pytorch_GAN_zoo also supports DCGAN). "$CONFIG_FILE" is the path to the config file generated by datasets.py. "$DATASET_NAME" is the name of the custom dataset. "$WEIGHTS_DIR" is the location where weights will be stored. More options are defined in train.py. I used the "--np_vis" option to use numpy-based visualization instead of installing the "visdom" package. I also set the "-e" and "-s" options to 2000.
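Put together with those options, the full invocation would look roughly like this (my reconstruction from the description above, reusing this project's paths):
python train.py PGAN -c config_celeba256.json -n molecular -d data/train_datasets/opt_image256_out --np_vis -e 2000 -s 2000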
This training ran for 9 days on a Tesla K80 based server. I stopped the training after 62,000 iterations on the last scale. Weights of the trained model can be downloaded using this link. One of the main reasons GAN training takes so long is the lack of transfer learning in generative models; transfer learning reduces the data requirement and convergence time in discriminative models, and there is a recent paper that proposes to address this issue in GANs. One way of reducing training time is to use the minimum image size relevant to the data.
In PGAN, training starts from a 4x4 resolution layer and goes up to the image size of the dataset. For an image size of 128, layers were added at scales of 4, 8, 16, 32, 64, and 128. Plotting training time vs. iterations clearly shows how the addition of new layers increased the training time exponentially at every scale.
pytorch_GAN_zoo provides tools to evaluate the performance of the model on generated images. The most popular one is the inception score. The inception score of generated images was calculated with the following command:
python eval.py inception -c $CONFIGURATION_FILE -n $modelName -m $modelType -d $WEIGHTS_DIR
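Following the generic/concrete pattern of the other commands in this post, a concrete version for this project would presumably be (the -c and -d values are my assumption, reusing the training paths from above):
python eval.py inception -c config_celeba256.json -n molecular -m PGAN -d data/train_datasets/opt_image256_out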
Sliced Wasserstein distance (SWD) is another method used to evaluate high-resolution GANs. The Laplacian SWD score was calculated with the following command. More information about evaluating GAN performance is given in this paper.
python eval.py laplacian_SWD -c $CONFIGURATION_FILE -n $modelName -m $modelType -d $WEIGHTS_DIR
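Again, a concrete version under the same assumptions about paths:
python eval.py laplacian_SWD -c config_celeba256.json -n molecular -m PGAN -d data/train_datasets/opt_image256_out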
pytorch_GAN_zoo implements another tool, "Inspirational Adversarial Image Generation". This tool takes an image as input and extracts an input vector using gradient descent. That input vector is then used to generate new images that share characteristics of the input image. Inspirational generation is a two-step process. First, a feature extractor is saved:
python save_feature_extractor.py {vgg16, vgg19} $PATH_TO_THE_OUTPUT_FEATURE_EXTRACTOR --layers 3 4 5
python save_feature_extractor.py vgg19 data/output/feature_extractor --layers 3 4 5
In this command, vgg16/vgg19 specifies the model to be used for input-vector generation. Once the feature extractor was saved, the following command was used to generate molecule structures based on an input image:
python eval.py inspirational_generation -n $modelName -m $modelType --input_image $pathTotheInputImage -f $PATH_TO_THE_OUTPUT_FEATURE_EXTRACTOR -d $WEIGHTS_DIR
python eval.py inspirational_generation -n molecular -m PGAN --input_image data/input/11.png -f data/output/feature_extractor
The image below shows the input and generated molecule structures. It is admirable how well PGAN learned the input vector and generated the pattern.
The model trained above will generate random molecular structures.
LC-GAN: https://arxiv.org/abs/1711.05772
https://towardsdatascience.com/molecule-synthesis-using-ai-10e0e1f89568
Updated 2022-10-06 15:17
I made the molecules' bonds and letters bolder, then retrained to see the effect:
This run used exactly 100k images; the previous, non-bold run used 920K images.
1. Prepare the data
python datasets.py celeba256 data/train_datasets/mol_image256bold -o data/train_datasets/mol_image256bold_out -f
2. Start training
python train.py PGAN -c config_celeba256.json -n molecularBold -d data/train_datasets/mol_image256bold_out