
A small tool for searching and previewing exported Google Reader data

scturtle posted @ 2011-11-02 15:06 in python, 2021 views

Updated 2013/3/14: R.I.P. Google Reader.

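Google Reader has been redesigned. Whether that is good or bad is up for debate, but either way the articles we used to share have turned into cold, lifeless data dumps. Until Google repents, we can't just sit there staring at those dumps, especially when we suddenly want to dig up some old article. So, inspired by the great @isnowfy, here is this little tool ^_^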

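Note that it targets the "Reader JSON" kind of export, such as shared-items.json and starred-items.json. Press Enter in the search box to search by title (the script also matches against the item body); click an entry in the list box to see its details on the right, such as the URL and the page content with HTML tags stripped. Tag stripping uses BeautifulSoup, so please have BeautifulSoup4 available yourself (originally: BeautifulSoup.py in the same directory). Since I'm not familiar with Tk (Tkinter), that ancient and nasty GUI toolkit, please go easy on the interface ^_^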
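
To make the data shape concrete, here is a minimal sketch of the title/body search run against a made-up sample, with no GUI involved. The sample only mimics the fields the script reads (items, title, summary or content, author, origin, alternate); a real export carries more keys, and SAMPLE, item_body and search_items are names invented for this illustration, not part of the tool.

```python
import json

# Hypothetical sample: only the fields the script reads are included.
SAMPLE = json.loads('''
{
  "items": [
    {"title": "Hello Tkinter",
     "origin": {"title": "Some Blog", "htmlUrl": "http://example.com/"},
     "alternate": [{"href": "http://example.com/hello"}],
     "summary": {"content": "<p>A short <b>GUI</b> note</p>"}},
    {"title": "Parsing JSON",
     "author": "someone",
     "origin": {"title": "Other Blog", "htmlUrl": "http://example.org/"},
     "content": {"content": "<div>json.load in practice</div>"}},
    {"origin": {"title": "No title here", "htmlUrl": "http://example.net/"}}
  ]
}
''')


def item_body(item):
    """Return the raw HTML body, preferring 'summary' over 'content'."""
    part = item.get('summary') or item.get('content') or {}
    return part.get('content', '')


def search_items(data, query):
    """Return the titles of items whose title or body contains query."""
    items = data['items'] if 'items' in data else data
    items = [i for i in items if 'title' in i]  # untitled items are skipped
    return [i['title'] for i in items
            if query in i['title'] or query in item_body(i)]


print(search_items(SAMPLE, 'GUI'))  # ['Hello Tkinter']
print(search_items(SAMPLE, ''))     # ['Hello Tkinter', 'Parsing JSON']
```

An empty query matches every titled item, which is why the full list appears right after a file is loaded.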

Screenshot and script attached:
reader.png

# coding: utf-8
import json
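try:  # Python 2 module names
    from Tkinter import *
    import tkFileDialog
except ImportError:  # Python 3 renamed the modules
    from tkinter import *
    from tkinter import filedialog as tkFileDialog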
from bs4 import BeautifulSoup

root = Tk()
e = StringVar()
li, te, data, items = [None]*4
def sanitize_all_html(value):
    # Drop the tags and keep only the text; name the parser explicitly
    return BeautifulSoup(value, 'html.parser').get_text()


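def selectFile():
    global data, items
    filename = tkFileDialog.askopenfilename(parent=root, initialdir='.',
                                            filetypes=[('GR data', '.json')])
    if not filename:  # the dialog was cancelled
        return
    # Read bytes so that json takes care of the UTF-8 decoding
    with open(filename, 'rb') as fp:
        data = json.load(fp)
    # Use the 'items' list when present, otherwise the data itself
    items = data['items'] if 'items' in data else data
    # Keep only titled entries, as a list so it can be scanned repeatedly
    items = [i for i in items if 'title' in i]
    filterItems()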


def filterItems(event=None):
    li.delete(0, END)
    s = e.get()
    for i in items or []:  # items is None until a file has been loaded
        title = i['title']
        try:
            # The body lives under 'summary' or, failing that, 'content'
            content = (i.get('summary') or i['content'])['content']
        except (KeyError, TypeError):
            content = ''
        if s in title or s in content:
            li.insert(END, title)


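def showItemInfo(event=None):
    sel = li.curselection()
    if not sel:  # nothing is selected, so there is nothing to show
        return
    te.delete(1.0, END)
    title = li.get(sel[0])
    # Items are looked up by title; with duplicate titles the first one wins
    item = [i for i in items if i['title'] == title][0]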
    te.insert(1.0, item['title']+'\n')
    te.insert(2.0, ('author' in item and item['author'] or 'NO AUTHOR')+'\n')
    te.insert(3.0, item['origin']['title']+'\n')
    te.insert(4.0, item['origin']['htmlUrl']+'\n')
    te.insert(5.0, ('alternate' in item and item['alternate'][0]['href'] or
                    'NO URL')+'\n\n')
    if 'summary' in item:
        te.insert(7.0, sanitize_all_html(item['summary']['content'])+'\n')
    else:
        te.insert(7.0, sanitize_all_html(item['content']['content'])+'\n')

f = Frame(root)
lf = Frame(f)
rf = Frame(f)
f.pack(fill='both', expand=1)
lf.pack(side='left', fill='both')
rf.pack(side='left', fill='both', expand=1)

fbt = Button(lf, text="Select file", height=1, command=selectFile)
fbt.pack(side='top', fill=X)

en = Entry(lf, textvariable=e, width=30)
en.bind('<Return>', filterItems)
en.pack(side='top', fill=X)

liy = Scrollbar(lf, orient=VERTICAL)
li = Listbox(lf, yscrollcommand=liy.set)
li.bind('<<ListboxSelect>>', showItemInfo)
liy['command'] = li.yview
li.pack(side='left', fill='both', expand=1)
liy.pack(side='left', fill=Y)

tey = Scrollbar(rf, orient=VERTICAL)
te = Text(rf, yscrollcommand=tey.set, height=30,
          font=('Microsoft Yahei', '12'))
tey['command'] = te.yview
te.pack(side='left', fill='both', expand=1)
tey.pack(side='left', fill=Y)

root.title('GR Data Reader')
root.mainloop()
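
The script relies on BeautifulSoup4 for tag stripping. If it is not available, a rough stand-in can be sketched with the standard library alone, assuming Python 3. This is a swapped-in technique rather than what the script does: TextExtractor and strip_tags are hypothetical names, and the output may differ from BeautifulSoup's get_text() on messy markup.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text of html with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts)


print(strip_tags('<p>A short <b>GUI</b> note &amp; more</p>'))
# A short GUI note & more
```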
