https://blog.csdn.net/qq_16964363/article/details/79224776
This post is mainly based on the article above; it will be taken down on request in case of infringement.
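Codeforces recently launched a difficulty rating feature that quantifies the difficulty of every problem. Inspired by that blogger's post, I wrote a crawler to analyze the difficulty of each tag. Here is the crawler code first: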
# -*- coding: utf-8 -*-
import urllib.request
from bs4 import BeautifulSoup

sum_difficulty = {}
avg_difficulty = {}
min_difficulty = {}
max_difficulty = {}
problems_count = {}
max_page = 48

# range() excludes its upper bound, so this visits pages 1 .. max_page - 1
for i in range(1, max_page):
    print('parsing page %d' % i)
    url = 'http://codeforces.com/problemset/page/%d' % i
    data = urllib.request.urlopen(url).read()  # send the request and read the response
    data = data.decode('UTF-8')
    soup = BeautifulSoup(data, 'html.parser')
    for p in soup.find(class_='problems').find_all('tr'):
        tds = p.find_all('td')
        if len(tds) != 5:  # skip rows without the expected five cells (e.g. the header row)
            continue
        difficulty_span = tds[3].span
        if difficulty_span is None:  # the problem has no difficulty rating yet
            continue
        difficulty = int(difficulty_span.string)
        for notice in tds[1].find_all(class_='notice'):
            tag = notice.string
            if tag not in problems_count:
                # initialize the counters for a newly seen tag
                problems_count[tag] = 0
                sum_difficulty[tag] = 0
                min_difficulty[tag] = 10000
                max_difficulty[tag] = -1
            # update the running statistics
            problems_count[tag] += 1
            sum_difficulty[tag] += difficulty
            min_difficulty[tag] = min(min_difficulty[tag], difficulty)
            max_difficulty[tag] = max(max_difficulty[tag], difficulty)

for tag in problems_count:
    avg_difficulty[tag] = sum_difficulty[tag] // problems_count[tag]  # floor average

print('tag,problems,avg difficulty,max difficulty,min difficulty')
# sort tags alphabetically by their full name
for tag in sorted(problems_count):
    print(tag, problems_count[tag], avg_difficulty[tag], max_difficulty[tag], min_difficulty[tag], sep=',')
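The per-tag bookkeeping in the loop above (count, running sum, min, max, then a floor average) can be pulled out into a pure function so it can be checked without fetching any pages. This is a minimal sketch: the function name `aggregate_by_tag`, the `(difficulty, tags)` input format and the sample ratings below are hypothetical and not part of the original crawler.

```python
from collections import defaultdict


def aggregate_by_tag(rows):
    """Per-tag statistics from (difficulty, tags) pairs.

    Hypothetical input format: an iterable of (int, list of tag strings).
    Returns tag -> (count, floor average, max, min), the same column order
    the crawler prints.
    """
    count = defaultdict(int)
    total = defaultdict(int)
    lo = {}
    hi = {}
    for difficulty, tags in rows:
        for tag in tags:
            count[tag] += 1
            total[tag] += difficulty
            # the first value seen for a tag initializes both min and max
            lo[tag] = min(lo.get(tag, difficulty), difficulty)
            hi[tag] = max(hi.get(tag, difficulty), difficulty)
    return {
        tag: (count[tag], total[tag] // count[tag], hi[tag], lo[tag])
        for tag in count
    }


# Synthetic example data (made-up ratings, not real Codeforces problems)
rows = [
    (800, ["math", "greedy"]),
    (1500, ["greedy"]),
    (2100, ["dp", "greedy"]),
]
for tag, (n, avg, mx, mn) in sorted(aggregate_by_tag(rows).items()):
    print(tag, n, avg, mx, mn, sep=",")
```

Because a problem carries several tags, it is counted once under each of them, exactly as in the crawler's inner loop.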
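For each tag (problem type) I analyzed the number of problems, the average difficulty, the maximum difficulty and the minimum difficulty. The conclusions are as follows (as of 2018-11-12):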
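To turn per-tag statistics like these into a difficulty ranking, one option is to sort tags by average difficulty and write them out with the standard `csv` module. This is only a sketch: the `stats_to_csv` helper, its assumed `tag -> (count, avg, max, min)` input shape and the example numbers are illustrative, not the author's data.

```python
import csv
import io


def stats_to_csv(stats):
    """Render per-tag statistics as CSV, hardest tags (by average) first.

    Assumed input shape: tag -> (count, avg, max, min).
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["tag", "count", "avg", "max", "min"])
    # Sort by average difficulty descending; ties are broken by tag name
    for tag, row in sorted(stats.items(), key=lambda kv: (-kv[1][1], kv[0])):
        writer.writerow([tag, *row])
    return buf.getvalue()


# Illustrative numbers only
example = {
    "math": (1, 800, 800, 800),
    "greedy": (3, 1466, 2100, 800),
    "dp": (1, 2100, 2100, 2100),
}
print(stats_to_csv(example), end="")
```

Using `csv.writer` instead of joining strings by hand means a tag containing a comma would be quoted automatically.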