A Python Crawler for Scraping Novels

scturtle posted @ April 14, 2010 06:52 in python, 3711 views

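Python is amazing! So this is what a crawler looks like, and fetching a web page turns out to be this simple! The script downloads one chapter page, pulls the text out of the div with id "BookText", and writes it to book.txt.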

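# Python 2 script; uses BeautifulSoup 3 (the `BeautifulSoup` module).
import urllib2
from BeautifulSoup import BeautifulSoup

# Download the raw HTML of one chapter page.
content = urllib2.urlopen(
    'http://www.feiku.com/html/book/130/159571/4747141.shtm'
    ).read()

# Parse the page and keep only the <div id="BookText"> element.
soup = BeautifulSoup(content)
book_text = soup.find('div', id="BookText")

# Write every text node in that div to book.txt, one per line.
with open('book.txt', 'w') as f:
    for text in book_text.findAll(text=True):
        # Replace leftover &nbsp; entities with plain spaces.
        text = text.replace('&nbsp;', ' ')
        f.write(text.encode('utf-8') + '\n')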
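
For readers on Python 3, where urllib2 and the BeautifulSoup 3 module are not available, the same idea can be sketched with only the standard library. This is not the original script, just a rough equivalent under some assumptions. The names DivTextExtractor, extract_div_text and save_chapter are made up for illustration. The page encoding is passed in by the caller because the site's real encoding is not stated here. Unlike the script above, it strips whitespace around each text fragment and skips empty ones.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class DivTextExtractor(HTMLParser):
    """Collect the text fragments inside the first <div> with a given id.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, div_id):
        # convert_charrefs=True decodes entities such as &nbsp; for us.
        super().__init__(convert_charrefs=True)
        self.div_id = div_id
        self.depth = 0      # <div> nesting depth inside the target; 0 = outside
        self.done = False   # True once the target div has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth > 0:
            self.depth += 1          # nested div inside the target
        elif dict(attrs).get("id") == self.div_id:
            self.depth = 1           # entering the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True     # ignore everything after the target

    def handle_data(self, data):
        if self.depth > 0:
            # &nbsp; arrives as U+00A0; turn it into a plain space.
            text = data.replace("\xa0", " ").strip()
            if text:
                self.chunks.append(text)


def extract_div_text(html, div_id="BookText"):
    """Return the non-empty text fragments found inside the given div."""
    parser = DivTextExtractor(div_id)
    parser.feed(html)
    parser.close()
    return parser.chunks


def save_chapter(url, path, encoding="utf-8"):
    """Download one page and write the div's text to a file, one line each.

    `encoding` is an assumption about the page; pass the real one if known.
    """
    html = urlopen(url).read().decode(encoding, errors="replace")
    with open(path, "w", encoding="utf-8") as f:
        for line in extract_div_text(html):
            f.write(line + "\n")
```

Calling save_chapter with the chapter URL from the script above and 'book.txt' would mirror the original flow, assuming the page is still reachable and the encoding argument matches the site. Tracking only the nesting depth of div tags keeps the parser small, at the cost of relying on the page having balanced div tags.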
