Batch-Downloading GitHub Code: How to Automatically Download All the Code on GitHub

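With Huawei being targeted by the Trump administration, even GitHub and Apache are trembling.

To preserve the open-source projects hosted on GitHub, here is the code I use to batch-download them.

1. The first script collects the URLs of all open-source projects belonging to the target account.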

# -*- coding: utf-8 -*-
"""
Created on Tue May 21 14:23:04 2019
@author: 慕蒿
@email: muhaocs@yeah.net
"""

import requests

def getProject(url):
    # Fetch one page of the target's repository list.
    response = requests.get(url)
    content = response.text
    # The markers below match GitHub's page HTML as of May 2019.
    # Narrow the page down to the repository list, then cut it into one chunk per project.
    content_ul = content.split('<div class="org-repos repo-list">')[1]
    content_li = content_ul.split('<ul>')[1].split('</ul>')[0]
    projects = content_li.split('<h3 class="wb-break-all">')[1:]
    page_info = []
    for project in projects:
        project_info = []

        # Name and URL come from the first link in the chunk.
        project_name = project.split('>')[1].split('<')[0].strip()
        project_url = 'https://github.com' + project.split('href="')[1].split('"')[0].strip()
        project_info.append(project_name)
        project_info.append(project_url)

        # The remaining fields are optional; a missing marker raises IndexError,
        # in which case the string 'None' is stored instead.
        try:
            project_language = project.split('"programmingLanguage">')[1].split('<')[0].strip()
        except IndexError:
            project_language = 'None'
        project_info.append(project_language)

        try:
            project_star = project.split('stargazers')[1].split('</span>')[1].split('</a>')[0].strip()
        except IndexError:
            project_star = 'None'
        project_info.append(project_star)

        try:
            project_memb = project.split('members')[1].split('</span>')[1].split('</a>')[0].strip()
        except IndexError:
            project_memb = 'None'
        project_info.append(project_memb)

        page_info.append(project_info)
    return page_info

# Write a tab-separated txt file with five fields per project:
# name, URL, programming language, star count, and share count.
with open('GitHub-google.txt', 'w', encoding='utf-8') as fp:
    fp.write('project_name\tproject_url\tproject_language\tproject_star\tproject_memb\n')
    # This run covers pages 41 to 50; adjust the range to the pages you need.
    for page in range(41, 51):
        # Change this to your own target; I started with the 1,774 open-source projects contributed by Google.
        url = 'https://github.com/google?page=' + str(page)
        page_info = getProject(url)
        for li in page_info:
            fp.write('\t'.join(li) + '\n')
        print(page, 'Done!')
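
Scraping the page by string markers stops working as soon as GitHub changes its HTML. As an alternative sketch (not the method used in this post), the same five columns can be read from GitHub's REST API, which lists an organization's repositories as JSON. The helper names `fetch_page`, `repo_to_row` and `list_org_repos` and the `User-Agent` value are my own assumptions for illustration. Unauthenticated API requests are rate-limited, so a large organization may need an access token.

```python
import json
from urllib import parse, request


def fetch_page(org, page, per_page=100):
    """Fetch one page of an organization's public repositories as a list of dicts."""
    query = parse.urlencode({'per_page': per_page, 'page': page})
    url = 'https://api.github.com/orgs/{}/repos?{}'.format(org, query)
    # 'repo-lister' is a hypothetical client name; any non-empty User-Agent will do.
    req = request.Request(url, headers={'User-Agent': 'repo-lister'})
    with request.urlopen(req, timeout=30) as response:
        return json.load(response)


def repo_to_row(repo):
    """Map one repository object to the five columns used in the txt file."""
    return [
        repo['name'],
        repo['html_url'],
        repo.get('language') or 'None',
        str(repo.get('stargazers_count', 'None')),
        str(repo.get('forks_count', 'None')),
    ]


def list_org_repos(org, fetch=fetch_page):
    """Yield one row per repository, requesting pages until an empty one comes back."""
    page = 1
    while True:
        repos = fetch(org, page)
        if not repos:
            break
        for repo in repos:
            yield repo_to_row(repo)
        page += 1

# Example (performs network requests):
# for row in list_org_repos('google'):
#     print('\t'.join(row))
```

Passing `fetch` as a parameter keeps the paging loop separate from the network call, so the loop can be checked with canned data before it is pointed at the real API.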

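2. Open the txt file produced by the script above in Excel and sort it however you like; I downloaded the projects in descending order of stars. Copy the project URL column into a new txt file that contains nothing but the project URLs, one per line, as shown below: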

https://github.com/google/physical-web
https://github.com/google/lovefield
https://github.com/google/blockly
https://github.com/google/go-cloud
https://github.com/google/android-classyshark
https://github.com/google/j2objc
https://github.com/google/flutter-desktop-embedding
https://github.com/google/closure-compiler
https://github.com/google/seesaw
https://github.com/google/seq2seq
https://github.com/google/error-prone
......
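
The Excel step can also be done in code. The following is a minimal sketch, assuming the tab-separated file keeps the header written by the first script and that star counts appear as plain numbers, numbers with thousands separators, a `k` suffix, or the string `None`. The function names `star_key`, `urls_by_stars` and `write_url_list` are hypothetical.

```python
import csv


def star_key(text):
    """Turn a scraped star string such as '1,234', '5.6k' or 'None' into a number."""
    text = text.strip().lower().replace(',', '')
    factor = 1.0
    if text.endswith('k'):
        text, factor = text[:-1], 1000.0
    try:
        return float(text) * factor
    except ValueError:
        # 'None' and anything else unparseable sorts last.
        return 0.0


def urls_by_stars(lines):
    """Return the project URLs from tab-separated lines, most-starred first."""
    rows = list(csv.DictReader(lines, delimiter='\t'))
    rows.sort(key=lambda row: star_key(row['project_star']), reverse=True)
    return [row['project_url'] for row in rows]


def write_url_list(tsv_path, out_path):
    """Read the scraped txt file and write a URL-only file, one URL per line."""
    with open(tsv_path, newline='', encoding='utf-8') as src:
        urls = urls_by_stars(src)
    with open(out_path, 'w', encoding='utf-8') as dst:
        dst.write('\n'.join(urls) + '\n')
```

Calling `write_url_list('GitHub-google.txt', 'projects(google).txt')` would produce the URL-only file that the download script reads.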

Then use the following code to batch-download the source code:

# -*- coding: utf-8 -*-
"""
Created on Tue May 21 14:23:04 2019

@author: 慕蒿
@email: muhaocs@yeah.net
"""

import os
import requests
import urllib3
from urllib import request

def download(url, save_path):
    # Name the archive after the last segment of the repository URL.
    file_name = url.split('/')[-1]
    zip_path = save_path + file_name + '.zip'

    # Locate the "Download ZIP" link on the repository page
    # (the marker matches GitHub's page HTML as of May 2019).
    response = requests.get(url)
    content = response.text
    btn = content.split('btn btn-outline get-repo-btn js-anon-download-zip-link ')[1]
    zip_url = 'https://github.com' + btn.split('href="')[1].split('"')[0].strip()

    # Try three download methods in turn: urllib, then requests, then urllib3.
    try:
        request.urlretrieve(zip_url, zip_path)
    except Exception:
        try:
            with open(zip_path, 'wb') as code:
                code.write(requests.get(zip_url).content)
        except Exception:
            try:
                http = urllib3.PoolManager()
                r = http.request('GET', zip_url)
                with open(zip_path, 'wb') as code:
                    code.write(r.data)
            except Exception:
                return file_name + ' download failed!'
    return file_name + ' download complete!'

save_path = "projects(google)/"
# The target directory must exist before any archive can be written into it.
os.makedirs(save_path, exist_ok=True)
with open('projects(google).txt', 'r') as fp:
    for line in fp:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        try:
            print(download(url, save_path))
        except Exception:
            print(url, 'download failed!')
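
The script above depends on the CSS classes of the "Download ZIP" button and holds each archive in memory in two of its three fallbacks. As a hedged alternative (again not the method used in this post), GitHub's REST API exposes a zipball endpoint that redirects to an archive of the default branch, and the response can be streamed to disk. The function names `zipball_url` and `download_zip` and the `User-Agent` value are assumptions for illustration; unauthenticated API requests are rate-limited.

```python
import os
import shutil
from urllib import parse, request


def zipball_url(repo_url):
    """Build the REST API archive URL from a repository page URL."""
    path = parse.urlparse(repo_url.strip()).path.strip('/')
    owner, repo = path.split('/')[:2]
    return 'https://api.github.com/repos/{}/{}/zipball'.format(owner, repo)


def download_zip(repo_url, save_dir):
    """Stream a repository's default-branch archive into save_dir and return its path."""
    os.makedirs(save_dir, exist_ok=True)
    repo_name = repo_url.strip().rstrip('/').split('/')[-1]
    target = os.path.join(save_dir, repo_name + '.zip')
    # 'repo-backup' is a hypothetical client name.
    req = request.Request(zipball_url(repo_url), headers={'User-Agent': 'repo-backup'})
    # urlopen follows the redirect to the archive; copyfileobj writes it in chunks.
    with request.urlopen(req, timeout=60) as response, open(target, 'wb') as out:
        shutil.copyfileobj(response, out)
    return target

# Example (performs a network request):
# download_zip('https://github.com/google/blockly', 'projects(google)/')
```

Streaming with `shutil.copyfileobj` avoids loading a large repository archive into memory in one piece.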

3. The download results are shown below:
[Image: download results]

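Open source should stand on its own feet. Whatever happens, stay alert to danger even in times of safety.

This is a personal backup of GitHub code, for learning purposes only.

Learn from others' strengths in order to hold our own against them. Our country will have its own low-level operating system, and a strong ecosystem will grow on top of it too!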
