A Python script to fetch your liked ("hearted") songs from Douban FM
2011-04-13 18:37
Update 2012-02-29: link to isnowfy's method
Written by piecing together various references. One small drawback: you have to tell it how many pages of liked songs you have.
With some tweaks to the output format, it should work with the E音乐盒 batch downloader introduced on 小众软件 (appinn).
I didn't know about that tool at the time, so I had to write another script to crawl Google Music and then download everything by hand. What a pain.
Python 2.7 + BeautifulSoup
```python
# coding: utf-8
from BeautifulSoup import BeautifulSoup
import urllib, urllib2, cookielib

loginurl = 'http://www.douban.com/accounts/login'
url = 'http://douban.fm/mine?start=%d&type=liked'
captchaurl = 'http://www.douban.com/misc/captcha?id=%s&size=s'

email = raw_input('email:')
passwd = raw_input('passwd:')
pages = int(raw_input('pages:'))

# install a cookie-aware opener so the login session is kept across requests
cookie_support = urllib2.HTTPCookieProcessor(cookielib.CookieJar())
opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
urllib2.install_opener(opener)

content = urllib2.urlopen(loginurl).read()
postdata = urllib.urlencode({
    'source': 'simple',
    'form_email': email,
    'form_password': passwd,
    #'captcha-id': captchaid,
    #'captcha-solution': raw_input('code in pic:'),
    #'user_login': '登录',
})
req = urllib2.Request(url=loginurl, data=postdata)
content = urllib2.urlopen(req).read()

# a successful login redirects to the douban.com front page
if opener.open(req).geturl() == 'http://www.douban.com/':
    print 'login success!'
    for i in range(pages):  # each page lists 9 liked songs
        content = urllib2.urlopen(url % (i * 9,)).read()
        soup = BeautifulSoup(str(content))
        for tr in soup.findAll('tr')[1:]:  # skip the header row
            td = tr.findAll('td')
            print td[0].string, td[1].span.string  #, td[1].a.string  # title artist album
else:
    print 'login failed'
```
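The scraping step above is just "walk every table row after the header and print the cells". For readers on Python 3 without the old BeautifulSoup, here is a minimal offline sketch of that same table walk using only the standard library's `html.parser`. The `SAMPLE` HTML is a made-up stand-in; Douban FM's real markup differs.

```python
from html.parser import HTMLParser

# Fabricated sample of a liked-songs table; the real page structure may differ.
SAMPLE = """
<table>
  <tr><th>Time</th><th>Song</th></tr>
  <tr><td>2011-04-13</td><td><span>Artist A - Song A</span></td></tr>
  <tr><td>2011-04-12</td><td><span>Artist B - Song B</span></td></tr>
</table>
"""

class LikedSongsParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by row."""
    def __init__(self):
        super().__init__()
        self.rows = []       # list of rows, each a list of cell texts
        self._in_td = False
        self._cells = None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._cells = []
        elif tag == 'td':
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_td = False
        elif tag == 'tr' and self._cells:  # header row has no <td>, so it is skipped
            self.rows.append(self._cells)
            self._cells = None

    def handle_data(self, data):
        if self._in_td and data.strip():
            self._cells.append(data.strip())

parser = LikedSongsParser()
parser.feed(SAMPLE)
for date, song in parser.rows:
    print(date, song)
```

This mirrors the `soup.findAll('tr')[1:]` loop in the script: the header row falls out naturally because it contains `<th>` rather than `<td>` cells.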
Update: the login flow has changed. Douban now shows a captcha to IPs that log in too often. In short, this code has a shelf life; treat it as reference only.
Also attached: a script that crawls song URLs from Baidu. Save the output of the script above to a file, run this script to get the URLs, then batch-download with DownThemAll or 迅雷 (Thunder). Again, what a pain.
```python
# coding: utf-8
import sys, os, urllib2, xml
from xml.dom import minidom

lc = sys.getfilesystemencoding()
# op=7 searches for a song; op=12 fetches its download locations
baseurl0 = "http://box.zhangmen.baidu.com/x?op=7&mode=1&count=1&listid=&title=%s"
baseurl1 = "http://box.zhangmen.baidu.com/x?op=12&count=1&mtype=4&title=%s"

f = open('list.txt', 'r')
for eachline in f.readlines():
    song = eachline  # raw_input('Song name:').decode(lc).encode('gbk')
    try:
        data = urllib2.urlopen(baseurl0 % urllib2.quote(song))
        # the XML declares gb2312 but is really gbk; fix the header and transcode
        data = data.read().replace('gb2312', 'utf-8').decode('gbk').encode('utf8')
    except:
        print 'failed on ' + eachline
        continue
    doc = minidom.parseString(data)  # the parser is picky about the declared encoding
    names = doc.getElementsByTagName('name')
    if not names:
        print 'failed on ' + eachline
        continue
    count = 0
    #for i in names:
    #    print count, i.firstChild.nodeValue.replace('$', ' ').encode(lc)
    #    count += 1
    choice = 0  # int(raw_input('\nyour choice:'))
    data = urllib2.urlopen(baseurl1 % urllib2.quote(names[choice].firstChild.nodeValue.encode('gbk')))
    data = data.read().replace('gb2312', 'utf-8').decode('gbk').encode('utf8')
    doc = minidom.parseString(data)
    urls = doc.getElementsByTagName('url')
    #print '\n song urls:'
    #for i in urls:
    i = urls[0]  # take the first result only
    try:
        t = i.childNodes
        # first child: encoded path; second child: the real file name
        pos = t[0].firstChild.nodeValue.rfind('/')
        prefix = t[0].firstChild.nodeValue[0:pos + 1]
        suffix = t[1].firstChild.nodeValue
        print prefix + suffix
    except:
        print 'failed on ' + eachline
raw_input('\nEnter to exit ...')
```
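The only non-obvious part of that script is the URL assembly at the end: the `<url>` element carries an obfuscated path in one child and the real file name in another, and the script splices them together at the last `/`. Here is an offline sketch of that step, runnable on Python 3. The sample XML and the `<encode>`/`<decode>` tag names are fabricated for illustration; the real response format may differ.

```python
from xml.dom import minidom

# Fabricated stand-in for Baidu's op=12 response.
SAMPLE = """<result>
  <url>
    <encode>http://example.com/m/xyz123abc.mp3</encode>
    <decode>song.mp3</decode>
  </url>
</result>"""

def assemble_url(xml_text):
    """Splice the directory part of the encoded path onto the real file name."""
    doc = minidom.parseString(xml_text)
    url = doc.getElementsByTagName('url')[0]
    # keep only element children; the whitespace between tags becomes text nodes
    parts = [n for n in url.childNodes if n.nodeType == n.ELEMENT_NODE]
    encoded = parts[0].firstChild.nodeValue
    real_name = parts[1].firstChild.nodeValue
    # everything up to and including the last '/', plus the real file name
    return encoded[:encoded.rfind('/') + 1] + real_name

print(assemble_url(SAMPLE))  # -> http://example.com/m/song.mp3
```

Note that the original script indexes `t[0]` and `t[1]` directly; filtering for element nodes, as above, makes the same logic robust when the XML is pretty-printed.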