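The GitHub API can be used without registering. Its usage information and limits are documented online:
Unauthenticated search requests are limited to 10 per minute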
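The rate limit can be respected in code by reading the rate-limit headers GitHub returns with each response. The following is a minimal sketch, assuming the standard `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers; the helper name `seconds_until_reset` is hypothetical and not part of the original post.

```python
import time

def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before sending the next request.

    `headers` is any mapping of response headers, for example `r.headers`
    from a requests response. Returns 0 while requests remain in the window.
    """
    # Assumption: a missing header is treated as "requests still available".
    remaining = int(headers.get('X-RateLimit-Remaining', 1))
    if remaining > 0:
        return 0
    # X-RateLimit-Reset is the time the window resets, in UTC epoch seconds.
    reset_at = int(headers.get('X-RateLimit-Reset', 0))
    now = time.time() if now is None else now
    return max(0, reset_at - now)
```

A caller could pass the returned value to `time.sleep` before retrying a search.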
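The API call used here has the following form:
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
/search/repositories searches repositories on GitHub
? introduces the query parameter q, and = sets the search term; language:python returns repositories whose language is Python
&sort=stars ranks the projects by the number of stars they have received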
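The parts of the URL described above can also be assembled programmatically. This is a sketch using the standard library's `urlencode`; the helper name `build_search_url` is hypothetical. It also shows why encoding matters: a literal `+` in a query string is decoded as a space, so a language such as `C++` needs to be percent-encoded.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.github.com/search/repositories'

def build_search_url(language):
    """Build the search URL for the most-starred repositories of a language."""
    # q carries the search term, sort selects the ranking criterion.
    params = {'q': 'language:%s' % language, 'sort': 'stars'}
    return BASE_URL + '?' + urlencode(params)

print(build_search_url('python'))
# https://api.github.com/search/repositories?q=language%3Apython&sort=stars
print(build_search_url('C++'))
# https://api.github.com/search/repositories?q=language%3AC%2B%2B&sort=stars
```

The encoded `%3A` is equivalent to the `:` written by hand in the URL above.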
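Import the libraries used
import requests
# Fetch the most-starred repositories for a language (the API returns 30 per page by default);
# returns the parsed JSON as a dict, or None if the request fails
def get_info(language):
    # Pass the query through params so that requests URL-encodes it;
    # a literal '+' in a URL is decoded as a space, so 'C++' would not be sent as written
    params = {'q': 'language:%s' % language, 'sort': 'stars'}
    r = requests.get('https://api.github.com/search/repositories', params=params)
    if r.status_code == 200:
        print('success')
        return r.json()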
# Fetch the most popular repositories for Python, Java, and C++
response_python = get_info('python')
response_java = get_info('java')
response_c_dplus = get_info('C++')
success
success
success
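Keys of the dictionary returned by the fetch function
print(response_python.keys())
response_java['total_count'] # number of Java repositories on GitHub
response_python['items'][0].keys() # 'items' is a list of dicts, each holding the details of one repository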
[u'issues_url', u'deployments_url', u'stargazers_count', u'forks_url', u'mirror_url', u'subscription_url', u'notifications_url', u'collaborators_url', ........ u'watchers', u'name', u'language', u'url', u'created_at', u'pushed_at', u'forks_count', u'default_branch', u'teams_url', u'trees_url', u'branches_url', u'subscribers_url', u'stargazers_url']
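incomplete_results: when making more complex API calls, check this value, since GitHub sets it to true when a query times out before all matches are found. For simple calls it can be ignored.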
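Checking that flag takes one line. The sketch below assumes the response has already been parsed into a dict as in the function above; the helper name `is_complete` and the sample dict are illustrative only.

```python
def is_complete(response_dict):
    """Return True when GitHub reports that the search did not time out."""
    # Assumption: a missing key is treated as a complete result.
    return not response_dict.get('incomplete_results', False)

# Illustrative response skeleton, not real API output.
sample = {'total_count': 0, 'incomplete_results': False, 'items': []}

if not is_complete(sample):
    print('warning: search timed out, results may be partial')
```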
Process the dictionary to build a DataFrame
# Return a DataFrame
import pandas as pd
def ret_df(response_dict):
    columns = ['created_at', 'updated_at', 'name', 'forks', 'stars', 'size']
    # Collect the rows in a list first; DataFrame.append was removed in pandas 2.0
    rows = []
    for resp_dict in response_dict['items']:
        rows.append({
            'created_at': resp_dict['created_at'],
            'updated_at': resp_dict['updated_at'],
            'name': resp_dict['name'],
            'forks': resp_dict['forks'],
            'stars': resp_dict['stargazers_count'],
            'size': resp_dict['size']})
    return pd.DataFrame(rows, columns=columns)
| | created_at | updated_at | name | forks | stars | size |
|---|---|---|---|---|---|---|
| 0 | 2014-06-27T21:00:06Z | 2017-08-04T04:06:23Z | awesome-python | 6999 | 37000 | 3198 |
| 1 | 2012-02-25T12:39:13Z | 2017-08-04T03:42:53Z | httpie | 2069 | 30840 | 3657 |
| 2 | 2015-04-08T15:08:04Z | 2017-08-04T04:44:20Z | thefuck | 1476 | 29363 | 1914 |
| 3 | 2010-04-06T11:11:59Z | 2017-08-04T03:55:18Z | flask | 9091 | 28759 | 4929 |
| 4 | 2010-10-31T14:35:07Z | 2017-08-04T23:17:48Z | youtube-dl | 5281 | 27920 | 48358 |
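The same table can be built more compactly by letting pandas pick the wanted keys out of each item. This is an alternative sketch, not the method used in this post; the function name `items_to_df` is hypothetical, and the sample dict reuses two rows from the table above with an extra `language` key added to show that unlisted keys are dropped.

```python
import pandas as pd

# Keys to keep from each item; stargazers_count is renamed to stars afterwards.
COLUMNS = ['created_at', 'updated_at', 'name', 'forks', 'stargazers_count', 'size']

def items_to_df(response_dict):
    """Build a DataFrame from the 'items' list of a search response."""
    df = pd.DataFrame(response_dict['items'], columns=COLUMNS)
    return df.rename(columns={'stargazers_count': 'stars'})

sample = {'items': [
    {'created_at': '2014-06-27T21:00:06Z', 'updated_at': '2017-08-04T04:06:23Z',
     'name': 'awesome-python', 'forks': 6999, 'stargazers_count': 37000,
     'size': 3198, 'language': 'Python'},
    {'created_at': '2012-02-25T12:39:13Z', 'updated_at': '2017-08-04T03:42:53Z',
     'name': 'httpie', 'forks': 2069, 'stargazers_count': 30840,
     'size': 3657, 'language': 'Python'},
]}

df_sample = items_to_df(sample)
print(df_sample)
```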
Visualizing the repositories
# Visualize the repository information with pygal
import pygal
# Build the data
df_python = ret_df(response_python)
df_java = ret_df(response_java)
df_c_dplus = ret_df(response_c_dplus)
Stars and forks for each repository
def show(df, language):
    line_chart = pygal.Line(x_label_rotation=45)
    line_chart.title = 'Most-Starred %s projects on Github' % language
    # Convert the pandas columns to plain lists before handing them to pygal
    line_chart.x_labels = df['name'].tolist()
    line_chart.add('forks', df['forks'].tolist())
    line_chart.add('stars', df['stars'].tolist())
    line_chart.render_to_file(language + '_projects.svg')
show(df_c_dplus, 'C++')
Comparing different languages
# Compare different languages
from pygal.style import TurquoiseStyle
def compare_show(df_py, df_j, df_c, colu):
    chart = pygal.Bar()
    chart.title = 'Compare different language'
    # Rank of each project by stars (1 to 30)
    chart.x_labels = [str(i) for i in range(1, 31)]
    chart.add('Python', df_py[colu].tolist())
    chart.add('Java', df_j[colu].tolist())
    chart.add('C++', df_c[colu].tolist())
    chart.render_to_file(colu + '.svg')
compare_show(df_python, df_java, df_c_dplus, 'stars')
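The bar chart lines the languages up by rank, which can also be checked numerically before plotting. The sketch below puts one column per language into a single table indexed by rank; the function name `compare_table` is hypothetical, the Python values come from the table earlier in this post, and the Java values are placeholders chosen only for illustration.

```python
import pandas as pd

def compare_table(frames, column):
    """Combine one column from several DataFrames into a rank-indexed table.

    `frames` maps a language name to a DataFrame already sorted by stars.
    """
    table = pd.DataFrame({
        lang: df[column].reset_index(drop=True) for lang, df in frames.items()
    })
    # Rank 1 is the most-starred project of each language.
    table.index = range(1, len(table) + 1)
    table.index.name = 'rank'
    return table

py = pd.DataFrame({'stars': [37000, 30840]})   # values from the table above
jv = pd.DataFrame({'stars': [100, 90]})        # placeholder values, not real data

ranked = compare_table({'Python': py, 'Java': jv}, 'stars')
print(ranked)
```

If the DataFrames have different lengths, the shorter columns are padded with NaN, which makes a missing rank easy to spot.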