A Python Crawler for Scraping Novels
scturtle
posted @ April 14, 2010 06:52
in python
, 3711 views
Python is amazing! So this is what a crawler looks like! Fetching a web page turns out to be this simple!
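# Python 2 with BeautifulSoup 3
import urllib2
from BeautifulSoup import BeautifulSoup

# Download the raw HTML of one chapter page
content = urllib2.urlopen('http://www.feiku.com/html/book/130/159571/4747141.shtm').read()
soup = BeautifulSoup(content)
# The chapter text sits inside <div id="BookText">
soup = soup.find('div', id="BookText")
f = open('book.txt', 'w')
for i in soup.findAll(text=True):
    # Presumably meant to turn '&nbsp;' entities, which BeautifulSoup 3 leaves undecoded, into plain spaces
    i = i.replace('&nbsp;', ' ')
    # Write each text node on its own line, encoded as UTF-8
    f.write(i.encode('utf-8') + '\n')
f.close()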
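For anyone on Python 3, where `urllib2` and the old `BeautifulSoup` module are gone, here is a rough sketch of the same extraction step using only the standard library. This is not the code from the post: `DivTextExtractor` and `extract_book_text` are hypothetical names made up for illustration, and the sketch assumes the page has already been downloaded and decoded into a string.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text nodes inside the first <div> with a given id."""

    def __init__(self, div_id):
        # convert_charrefs=True decodes entities, so &nbsp; arrives as U+00A0
        super().__init__(convert_charrefs=True)
        self.div_id = div_id
        self.depth = 0      # <div> nesting depth, counted from the target div
        self.done = False   # set once the target div has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth > 0:
            self.depth += 1                      # a nested div inside the target
        elif dict(attrs).get("id") == self.div_id:
            self.depth = 1                       # entering the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True                 # left the target div

    def handle_data(self, data):
        if self.depth > 0:
            # Replace non-breaking spaces with plain spaces
            self.chunks.append(data.replace("\xa0", " "))


def extract_book_text(html, div_id="BookText"):
    """Return the text inside <div id=div_id>, one text node per line."""
    parser = DivTextExtractor(div_id)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

To use it on a real page, the HTML could be fetched with `urllib.request.urlopen(url).read()` and decoded before being passed to `extract_book_text`. The post does not say which encoding the site uses, so that part is left to the reader.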