
Painting Subject Recognition in Python with The Paintings Dataset


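The Paintings Dataset is a painting-themed dataset released by the Visual Geometry Group (VGG). Each painting is annotated with its subjects, which may be a single subject or a mix of several. Image recognition on this dataset is therefore a classic multi-label classification problem: the model has to predict every subject present in an image, not just one. I covered the same setup in my earlier article on cloud-pattern recognition, so I won't repeat the details here.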

First, here is the official introduction to the dataset, shown in the screenshot below:

The dataset statistics are as follows:

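Unlike datasets such as VOC, the official release does not ship the images themselves. It is a list of links, and the images have to be downloaded separately. Three yearly versions of the dataset are provided, as shown below: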

I used the 2021 version, which is the most recent one. A sample of the downloaded file looks like this:

    Image URL,Web page URL,Subset,Labels
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NID/QUB/NID_QUB_QUB_264-001.jpg,https://artuk.org/discover/artworks/and-the-cow-jumped-over-the-moon-168957,'test',' cow'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/GMIII/MOSI/GMIII_MOSI_A1978_72_3-001.jpg,https://artuk.org/discover/artworks/0-6-00-6-0-garratt-locomotive-203965,'train',' train'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1979_7964-001.jpg,https://artuk.org/discover/artworks/044t-locomotive-no-1431-passing-mosley-siding-signal-box-9593,'train',' train'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/CHE/CRHC/CHE_CRHC_PCF40-001.jpg,https://artuk.org/discover/artworks/080-locomotive-on-freight-duty-103049,'test',' train'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NOT/NTMAG/NOT_NTMAG_1997_31-001.jpg,https://artuk.org/discover/artworks/17th-and-21st-lancers-46478,'test',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/STF/STRM/STF_STRM_832-001.jpg,https://artuk.org/discover/artworks/1st-south-staffords-on-the-march-in-burma-1944-19642,'test',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1986_9418-001.jpg,https://artuk.org/discover/artworks/222-locomotive-built-by-george-forrester-8695,'test',' train'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_2004_7349-001.jpg,https://artuk.org/discover/artworks/222-locomotive-jenny-lind-9616,'test',' train'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1986_9421-001.jpg,https://artuk.org/discover/artworks/222-locomotive-patentee-robert-stephensons-patent-locomotive-9409,'train',' train'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1996_7374-001.jpg,https://artuk.org/discover/artworks/264t-locomotive-alice-9530,'test',' train'
    https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/LLR/RLRH/LLR_RLRH_L_H38_1988_3_0-001.jpg,https://artuk.org/discover/artworks/2nd-battalion-the-leicestershire-regiment-as-chindits-during-operations-against-the-japanese-at-indaw-lake-burma-1944-80060,'train',' aeroplane horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/IWM/IWM/IWM_IWM_LD_5509-001.jpg,https://artuk.org/discover/artworks/43-repair-group-air-frame-repair-service-lincoln-repairing-liberator-aircraft-7481,'test',' aeroplane'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1977_5834-001.jpg,https://artuk.org/discover/artworks/460-locomotive-no-1306-mayflower-next-to-unit-m77165-in-the-paint-shop-at-horwich-works-1975-9765,'train',' train'
    https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/CW/MTE/CW_MTE_45-001.jpg,https://artuk.org/discover/artworks/6th-earl-and-countess-of-mount-edgcumbe-in-coronation-robes-14840,'validation',' chair'
    https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/NY/YAM/NY_YAM_260367-001.jpg,https://artuk.org/discover/artworks/87-squadron-gladiators-tied-together-k7967-k8027-k7972-10402,'test',' aeroplane'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/YAG/NY_YAG_YORAG_326-001.jpg,https://artuk.org/discover/artworks/a-seventeenth-century-dutch-interior-with-a-seated-lady-8374,'test',' chair diningtable dog'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/LNE/RAFM/LNE_RAFM_FA03538-001.jpg,https://artuk.org/discover/artworks/a-20-havoc-light-bomber-136117,'test',' aeroplane'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/DUR/DBM/DUR_DBM_770-001.jpg,https://artuk.org/discover/artworks/a-basket-of-flowers-with-a-dog-chasing-a-bird-44680,'train',' bird dog'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/DUR/DBM/DUR_DBM_769-001.jpg,https://artuk.org/discover/artworks/a-basket-of-flowers-with-birds-44726,'train',' bird'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/YAG/NY_YAG_YORAG_66-001.jpg,https://artuk.org/discover/artworks/a-bather-7936,'test',' chair'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/LW/NARM/LW_NARM_131900-001.jpg,https://artuk.org/discover/artworks/a-battery-of-the-royal-horse-artillery-galloping-to-a-fresh-position-182844,'train',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/SYO/CG/SYO_CG_CP_TR_156-001.jpg,https://artuk.org/discover/artworks/a-bavarian-lake-with-fishing-boats-68811,'test',' boat'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_262-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-4097,'test',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/KT/MMB/KT_MMB_06_029-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-76966,'test',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w800h800/collection/NG/NG/NG_NG_NG983-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-a-cow-a-goat-and-three-sheep-near-a-building-113901,'test',' cow horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTI/TRR/NTI_TRR_337062-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-and-a-pony-in-a-landscape-101086,'train',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIII/CAL/NTIII_CAL_290260-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-called-fleacatcher-169526,'test',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTI/PNC/NTI_PNC_1420370-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-chance-and-a-jockey-102326,'train',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_258-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-landscape-3778,'test',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_590-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-landscape-4061,'test',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIV/UPP/NTIV_UPP_138312-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-landscape-with-his-groom-and-two-hounds-220307,'test',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIII/CAL/NTIII_CAL_290484-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-stable-169372,'train',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIV/KHA/NTIV_KHA_445331-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-stable-220614,'test',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_184-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-stable-3599,'test',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/SFK/NHM/SFK_NHM_1986_004-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-wooded-landscape-11244,'train',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_455-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-near-a-building-3946,'train',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIII/FEL/NTIII_FEL_1401221-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-pony-bloodhound-and-dachshund-outside-felbrigg-hall-171257,'train',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTII/BKH/NTII_SKH_1196043-001.jpg,https://artuk.org/discover/artworks/a-bay-hunter-and-a-pug-dog-in-a-landscape-132043,'test',' dog'
    https://d3d00swyhr67nd.cloudfront.net/w800h800/collection/NG/NG/NG_NG_NG818-001.jpg,https://artuk.org/discover/artworks/a-beach-scene-with-fishermen-113903,'test',' boat'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/STF/WAG/STF_WAG_OP536-001.jpg,https://artuk.org/discover/artworks/a-belgian-school-19183,'test',' chair diningtable'
    https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/VA/PC/VA_PC_2007BP2389-001.jpg,https://artuk.org/discover/artworks/a-bird-32686,'train',' bird'
    https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/LAN/TURT/LAN_TURT_PCF21-001.jpg,https://artuk.org/discover/artworks/a-bird-in-a-tree-152732,'validation',' bird'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/CDN/LBCN/CDN_LBCN_426-001.jpg,https://artuk.org/discover/artworks/a-bird-in-half-123669,'test',' bird'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTI/LDK/NTI_LDK_884912-001.jpg,https://artuk.org/discover/artworks/a-black-dog-100725,'test',' dog'
    https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/SYO/BHA/SYO_BHA_90009742-001.jpg,https://artuk.org/discover/artworks/a-black-dog-68634,'train',' dog'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_443-001.jpg,https://artuk.org/discover/artworks/a-black-horse-at-newmarket-3596,'test',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTII/ATT/NTII_ATT_609052-001.jpg,https://artuk.org/discover/artworks/a-black-horse-called-bishop-with-his-groom-in-a-landscape-131049,'test',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIII/ERH/NTIII_ERH_201463-001.jpg,https://artuk.org/discover/artworks/a-black-horse-in-a-courtyard-167313,'test',' horse'
    https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/TATE/TATE/TATE_TATE_T00888_10-001.jpg,https://artuk.org/discover/artworks/a-black-horse-with-two-dogs-201729,'test',' dog horse'

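As the sample shows, the file has four columns. The first is the image URL, the second is the URL of the web page (of little use here), the third is the subset the image belongs to (train, test, or validation), and the fourth is the set of subjects in the image. The values in the last two columns are wrapped in single quotes, and the labels carry a leading space, so both need cleaning before use.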
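As a minimal sketch of that column layout, the snippet below parses two rows copied from the sample with the standard-library `csv` module and strips the quotes and padding. The helper name `parse_rows` is made up for illustration and is not part of the original code.

```python
import csv
import io

# Header line plus two rows copied from the sample above.
SAMPLE = """Image URL,Web page URL,Subset,Labels
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NID/QUB/NID_QUB_QUB_264-001.jpg,https://artuk.org/discover/artworks/and-the-cow-jumped-over-the-moon-168957,'test',' cow'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/TATE/TATE/TATE_TATE_T00888_10-001.jpg,https://artuk.org/discover/artworks/a-black-horse-with-two-dogs-201729,'test',' dog horse'
"""


def parse_rows(text):
    """Yield (image_url, subset, labels) with quotes and padding removed."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        # The csv module only treats double quotes as quoting characters,
        # so the single quotes arrive as part of the value.
        subset = row["Subset"].replace("'", "").strip()
        labels = row["Labels"].replace("'", "").split()
        yield row["Image URL"].strip(), subset, labels


rows = list(parse_rows(SAMPLE))
for url, subset, labels in rows:
    print(subset, labels, url.rsplit("/", 1)[-1])
```

Splitting the label field on whitespace gives one list entry per subject, which is the form the later label-encoding step needs.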

With the meaning of the raw data clear, the first step is to download the images behind the links. The core implementation is shown below:

    import os

    import pandas as pd


    def loadUrls2Img(data="painting_dataset_2021.csv", resDir="data/"):
        """
        Read the image URLs from the CSV and download each image to disk.
        """
        # URLs handled in a previous run (lets an interrupted download resume)
        img_urls = set()
        if os.path.exists("img_url.txt"):
            with open("img_url.txt") as f:
                img_urls = {one.strip() for one in f if one.strip()}
        print("img_urls_length: ", len(img_urls))
        df = pd.read_csv(data)
        print(df.head(10))
        data_list = df.values.tolist()
        print("data_list_length: ", len(data_list))
        left_length = len(data_list) - len(img_urls)
        print("left_length: ", left_length)
        # read_csv already consumed the header row, so iterate over every row
        for one_list in data_list:
            try:
                img_url, web_page_url, Subset, Labels = one_list
                img_url = img_url.strip()
                # Values are wrapped in single quotes, e.g. 'train' and ' cow horse'
                Subset = Subset.replace("'", "").strip()
                Labels = Labels.replace("'", "").strip()
                oneDir = os.path.join(resDir, Subset, Labels)
                os.makedirs(oneDir, exist_ok=True)
                if img_url and img_url not in img_urls:
                    print("img_url: ", img_url)
                    one_path = os.path.join(oneDir, str(len(os.listdir(oneDir)) + 1) + ".jpg")
                    downloadSingleImg(img_url, save_path=one_path)
                    # Record the URL only after the download succeeded
                    with open("img_url.txt", "a") as f:
                        f.write(img_url + "\n")
            except Exception as e:
                print("Exception: ", e)

I wrapped the download itself in a separate function so that it can easily be swapped for another method:

    import requests


    def downloadSingleImg(img_url, save_path="data/a.jpg"):
        """
        Download a single image and write it to save_path.
        """
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"
        }
        r = requests.get(img_url, headers=headers, timeout=30)
        # Fail on HTTP errors so that an error page is never saved as a .jpg
        r.raise_for_status()
        # Write the image bytes to disk
        with open(save_path, "wb") as f:
            f.write(r.content)

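During the download, the URL of every processed image is recorded in a local file, so an interrupted run can resume where it stopped. A screenshot of the record file: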

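The download takes quite a while because nothing is done to speed it up, so it is a matter of waiting. The finished download looks like this: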
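The article downloads the images one at a time. If the wait is a problem, one possible speed-up is a thread pool, since the work is dominated by network I/O. The sketch below is not what the article used. `download_many` is a hypothetical helper, and `fake_download` is a stand-in so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_many(jobs, download_fn, max_workers=8):
    """Run download_fn(url, save_path) for every (url, save_path) job in parallel.

    Returns (succeeded_urls, failed), where failed maps each url to its exception.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_fn, url, path): url for url, path in jobs}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                succeeded.append(url)
            except Exception as exc:
                failed[url] = exc
    return succeeded, failed


def fake_download(url, save_path):
    """Stand-in downloader: fails for URLs containing 'missing', otherwise does nothing."""
    if "missing" in url:
        raise IOError("simulated failure for " + url)


jobs = [
    ("https://example.com/a.jpg", "data/1.jpg"),
    ("https://example.com/missing.jpg", "data/2.jpg"),
    ("https://example.com/c.jpg", "data/3.jpg"),
]
done, failed = download_many(jobs, fake_download, max_workers=2)
print(sorted(done), sorted(failed))
```

In real use the second argument would be the article's `downloadSingleImg`. The save paths have to be fixed before the jobs are submitted, because naming files by counting the entries already in a directory is not safe when several threads write at once.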

Because the label-combination categories are not the same across the subsets, I merged them. The result is shown below:

Each sub-directory name is the full set of subjects shared by the images inside it, which makes the data easy to process.
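The article does not show how the subsets were merged. One way to do it, assuming the `<subset>/<labels>/` directory layout produced by the download step, is to copy every image into a single directory per label combination. Everything in this sketch is hypothetical, including the name `merge_subsets` and the target directory layout.

```python
import os
import shutil


def merge_subsets(src_root, dst_dir, subsets=("train", "test", "validation")):
    """Copy <src_root>/<subset>/<labels>/* into <dst_dir>/<labels>/.

    Files are renamed to <subset>_<original name> so that files with the same
    name in different subsets cannot overwrite each other.
    Returns the number of files copied.
    """
    copied = 0
    for subset in subsets:
        subset_dir = os.path.join(src_root, subset)
        if not os.path.isdir(subset_dir):
            continue  # this subset was not downloaded
        for labels in sorted(os.listdir(subset_dir)):
            label_dir = os.path.join(subset_dir, labels)
            if not os.path.isdir(label_dir):
                continue
            target = os.path.join(dst_dir, labels)
            os.makedirs(target, exist_ok=True)
            for name in sorted(os.listdir(label_dir)):
                shutil.copy2(
                    os.path.join(label_dir, name),
                    os.path.join(target, subset + "_" + name),
                )
                copied += 1
    return copied
```

A call such as `merge_subsets("data", "data_merged")` would then leave one folder per label combination. Both paths are placeholders. Merging discards the official train/test/validation split, so a new split has to be made before training.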

First, determine the set of individual subjects and how many there are:

    import json
    import os

    # Each directory name is a space-separated combination of subjects
    classes_list = os.listdir("data/train/")
    print("classes_list: ", classes_list)
    print("classes_list_length: ", len(classes_list))
    labels_list = []
    for one_class in classes_list:
        # split() without arguments also drops empty strings from repeated spaces
        one_c2l_list = one_class.split()
        labels_list += one_c2l_list
    # Deduplicate and sort so that every subject gets a stable index
    labels_list = sorted(set(labels_list))
    print("labels_list: ", labels_list)
    print("labels_list_length: ", len(labels_list))
    with open("labels_list.json", "w") as f:
        f.write(json.dumps(labels_list))
    numbers = len(labels_list)
    print("numbers: ", numbers)

Next, generate the label vectors:

    def generateLabel(one):
        """
        Build the multi-hot label vector for one directory name.
        """
        one_label_list = one.split()
        print("one_label_list: ", one_label_list)
        # 1 where the subject is present, 0 otherwise, in labels_list order
        one_y_list = [1 if label in one_label_list else 0 for label in labels_list]
        print("one_y_list: ", one_y_list)
        print("one_y_list_length: ", len(one_y_list))
        assert len(one_y_list) == numbers
        return one_y_list
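To make the encoding concrete, here is a small self-contained sketch of the same idea without the global variables. The names `build_vocab` and `multi_hot` are illustrative only, and the directory names are a handful of label combinations taken from the sample rows.

```python
def build_vocab(dir_names):
    """Collect the sorted set of single subjects from space-separated directory names."""
    return sorted({label for name in dir_names for label in name.split()})


def multi_hot(dir_name, vocab):
    """Return a 0/1 vector with one position per subject in vocab."""
    present = set(dir_name.split())
    return [1 if label in present else 0 for label in vocab]


dirs = ["aeroplane horse", "bird dog", "chair diningtable dog", "train"]
vocab = build_vocab(dirs)
print(vocab)
# ['aeroplane', 'bird', 'chair', 'diningtable', 'dog', 'horse', 'train']
print(multi_hot("chair diningtable dog", vocab))
# [0, 0, 1, 1, 1, 0, 0]
```

Every image in the `chair diningtable dog` folder gets the same vector, with three positions set to 1.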

Build the model; adapt it to your own needs:

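    from keras.layers import Activation, Conv2D, Dense, Dropout, Flatten, MaxPooling2D
    from keras.models import Sequential
    from keras.optimizers import SGD

    # h, w, way: input height, width and number of channels, set elsewhere in the script
    model = Sequential()
    input_shape = (h, w, way)
    model.add(Conv2D(64, (3, 3), input_shape=input_shape))
    model.add(Activation("relu"))
    model.add(Dropout(0.3))
    model.add(Conv2D(64, (3, 3)))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(Dropout(0.3))
    # One sigmoid output per subject: each label is predicted independently
    model.add(Dense(numbers))
    model.add(Activation("sigmoid"))
    lrate = 0.01
    decay = lrate / 100
    # lr and decay are the legacy Keras argument names; newer releases use learning_rate
    sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
    model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=["accuracy"])
    # summary() prints by itself and returns None
    model.summary()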
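The sigmoid output combined with `binary_crossentropy` is what makes this a multi-label model: each output is an independent yes/no probability, so several subjects can be active at once. The following plain-Python sketch shows the computation for one sample. It illustrates the formula and is not the Keras implementation.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over the label positions of one sample."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


logits = [2.0, -1.0, 0.0]  # made-up raw scores for three labels
probs = [sigmoid(z) for z in logits]
print(probs, sum(probs))  # unlike softmax, the probabilities need not sum to 1
print(binary_crossentropy([1, 0, 0], probs))
```

With softmax the outputs would compete for a total probability of 1, which suits single-label classification but not images that contain, say, both a dog and a horse.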

Fit the model:

    import matplotlib.pyplot as plt
    from keras.callbacks import ModelCheckpoint

    # Fit the model, keeping only the checkpoint with the lowest validation loss
    checkpoint = ModelCheckpoint(
        filepath=saveDir + "best.h5",
        monitor="val_loss",
        verbose=1,
        mode="auto",
        save_best_only=True,  # a real boolean; the string "True" only worked because it is truthy
        period=1,  # legacy argument: check after every epoch
    )
    history = model.fit(
        X_train,
        y_train,
        validation_data=(X_test, y_test),
        callbacks=[checkpoint],
        epochs=nepochs,
        batch_size=32,
    )
    print(history.history.keys())
    # The metric key is "acc" in older Keras releases and "accuracy" in newer ones
    acc_key = "acc" if "acc" in history.history else "accuracy"
    # Plot the accuracy curves
    plt.clf()
    plt.plot(history.history[acc_key])
    plt.plot(history.history["val_" + acc_key])
    plt.title("model accuracy")
    plt.ylabel("accuracy")
    plt.xlabel("epochs")
    plt.legend(["train", "test"], loc="upper left")
    plt.savefig(saveDir + "train_validation_acc.png")
    # Plot the loss curves
    plt.clf()
    plt.plot(history.history["loss"])
    plt.plot(history.history["val_loss"])
    plt.title("model loss")
    plt.ylabel("loss")
    plt.xlabel("epochs")
    plt.legend(["train", "test"], loc="upper left")
    plt.savefig(saveDir + "train_validation_loss.png")
    # scores[0] is the loss, scores[1] the accuracy metric
    scores = model.evaluate(X_test, y_test, verbose=0)
    print("Accuracy: %.2f%%" % (scores[1] * 100))
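A note on reading the accuracy figure: as far as I know, when the loss is `binary_crossentropy`, Keras resolves `"accuracy"` to a per-label binary accuracy, so every correctly predicted 0 counts as a hit. Most labels are 0 in a multi-label vector, so this number can look high even for a weak model. The sketch below contrasts it with the stricter exact-match ratio. The data is made up for illustration and the function names are hypothetical.

```python
def binarize(probs, threshold=0.5):
    """Turn probabilities into 0/1 predictions."""
    return [1 if p >= threshold else 0 for p in probs]


def per_label_accuracy(y_true, y_pred):
    """Fraction of individual label positions predicted correctly."""
    hits = sum(
        t == p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    total = sum(len(row) for row in y_true)
    return hits / total


def exact_match_ratio(y_true, y_pred):
    """Fraction of samples whose whole label vector is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Three made-up samples with four labels each
y_true = [[1, 0, 0, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
y_prob = [[0.9, 0.1, 0.2, 0.1], [0.2, 0.8, 0.1, 0.3], [0.1, 0.2, 0.4, 0.1]]
y_pred = [binarize(row) for row in y_prob]
print(per_label_accuracy(y_true, y_pred))
print(exact_match_ratio(y_true, y_pred))
```

Here 10 of the 12 label positions are right, but only one of the three samples is predicted completely correctly, so the two figures differ considerably.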

The accuracy and loss curves are shown below:

I also built a simple GUI for visualization; here is a quick look at the result:
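Whatever front end displays the result, the model's output vector first has to be mapped back to subject names. The article does not show that step, so the following is only a sketch under assumptions: `decode_prediction` is a hypothetical helper, the 0.5 threshold is a common default and not a value taken from the article, and the short label list stands in for the contents of `labels_list.json`.

```python
def decode_prediction(probs, labels, threshold=0.5):
    """Return (label, probability) pairs at or above the threshold, highest first."""
    picked = [(label, p) for label, p in zip(labels, probs) if p >= threshold]
    return sorted(picked, key=lambda item: item[1], reverse=True)


# Stand-in for json.load(open("labels_list.json")); order must match training
labels = ["bird", "chair", "dog", "horse", "train"]
# Made-up sigmoid outputs for one image
probs = [0.05, 0.20, 0.91, 0.67, 0.02]
print(decode_prediction(probs, labels))
# [('dog', 0.91), ('horse', 0.67)]
```

The label order must be the one saved during training, because the position in the output vector is the only link between a probability and its subject.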
 

 

 

 

 
