A simple script for batch-downloading data files from a website, built mainly on urllib2 and the standard library.
Reference: https://wulc.me/2015/12/02/python%E6%89%B9%E9%87%8F%E4%B8%8B%E8%BD%BD%E6%96%87%E4%BB%B6/

#!/usr/bin/python
#-*- coding: utf-8 -*-
import urllib2
import os

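def getLegalUrl(year,mon,day,time):
    # Build the URL for one hourly file and check that the server has it.
    # Returns the URL if it can be opened, or an empty string otherwise.
    base_url="http://ftp.cpc.ncep.noaa.gov/precip/CMORPH_V0.x/RAW/8km-30min/"
    url_preletter='CMORPH_V0.x_RAW_8km-30min_'
    url=base_url+str(year)+'/'+str(year)+str(mon).zfill(2)+'/'+url_preletter+str(year)+str(mon).zfill(2)+str(day).zfill(2)+str(time).zfill(2)+'.gz'
    try:
        f=urllib2.urlopen(url)
        f.close()  # only probing for existence; download() fetches the data
        return url
    except urllib2.URLError:
        return ""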

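def download(url,year,mon,day,time):
    # Read the whole file into memory, then write it to the folder for its month.
    f=urllib2.urlopen(url)
    data=f.read()
    f.close()
    url_preletter = 'CMORPH_V0.x_RAW_8km-30min_'
    pathgz = "E:\\CMORPH\\" + str(year) + str(mon).zfill(2) + "\\"
    # Create the target folder first; open() fails if it does not exist.
    if not os.path.isdir(pathgz):
        os.makedirs(pathgz)
    with open(os.path.join(pathgz,url_preletter+str(year)+str(mon).zfill(2)+str(day).zfill(2)+str(time).zfill(2)+'.gz'),'wb') as out:
        out.write(data)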

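if __name__ == '__main__':
    # Number of days in May through September (months 5 to 9).
    days=[31,30,31,31,30]
    for year in range(2017,2018):
        for mon in range(5,10):
            # days[0] is May, so index with mon-5; the +1 includes the last day.
            for day in range(1,days[mon-5]+1):
                for time in range(0,24):
                    url=getLegalUrl(year,mon,day,time)
                    # getLegalUrl returns a blank string when the file cannot be opened.
                    if not url.strip():
                        with open("download.log",'a') as log:
                            log.write(str(year)+str(mon).zfill(2)+str(day).zfill(2)+str(time).zfill(2)+' not found\n')
                    else:
                        download(url,year,mon,day,time)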
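
The script relies on a hand-written table of month lengths and rebuilds the same URL string in several places. As a rough sketch of an alternative, the month lengths could come from the standard `calendar` module and the URL pattern could live in a single helper. The names `BASE_URL`, `PREFIX`, `build_url`, and `iter_hours` below are hypothetical and not part of the original script; the URL layout is copied from `getLegalUrl`, and the code makes no network requests.

```python
import calendar

# Same server path and file prefix as in the script above.
BASE_URL = "http://ftp.cpc.ncep.noaa.gov/precip/CMORPH_V0.x/RAW/8km-30min/"
PREFIX = "CMORPH_V0.x_RAW_8km-30min_"


def build_url(year, mon, day, hour):
    """Return the URL of one hourly file, following the script's naming pattern."""
    stamp = "%04d%02d%02d%02d" % (year, mon, day, hour)
    return "%s%04d/%04d%02d/%s%s.gz" % (BASE_URL, year, year, mon, PREFIX, stamp)


def iter_hours(year, first_mon, last_mon):
    """Yield (year, mon, day, hour) for every hour in the given months, inclusive."""
    for mon in range(first_mon, last_mon + 1):
        # monthrange returns (weekday of the first day, number of days in the month)
        ndays = calendar.monthrange(year, mon)[1]
        for day in range(1, ndays + 1):
            for hour in range(24):
                yield year, mon, day, hour
```

Looping over `iter_hours(2017, 5, 9)` and passing each tuple to `build_url` would cover the same period as the main loop, without the index arithmetic on the `days` list and with leap years handled by `calendar`.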