Scraping Yahoo! Search with Web::Scraper in Python
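Next, I'll try to do the same thing as http://menno.b10m.net/blog/blosxom/perl/scraping-yahoo-search-with-web-scraper.html.
I've put the Perl code alongside to make the comparison easier.
Neither the amount of code nor the way it looks differs much.
Perl looks clean without the parentheses, and Python looks clean without the semicolons.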
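Until now the Python version did not support the "spam[]" notation,
so I made it loop when a keyword argument has the form 'spam__list'.
I prefer how "[]" looks, but the characters allowed in Python keyword arguments are limited, so I'll put up with it being a little ugly. (Django's lookup_type works the same way.)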
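As a rough illustration of the 'spam__list' convention described above, here is a minimal sketch of how a keyword-argument suffix could switch between a single value and a loop over all matches. It is not the code of the scraper module used in this post: `split_key` and `collect` are hypothetical names, the "nodes" are plain strings standing in for matched elements, and it is written for current Python 3 rather than python2.5.

```python
# Hypothetical sketch; none of these names come from the post's scraper module.
LIST_SUFFIX = "__list"


def split_key(name):
    """Return (key, is_list) for a keyword-argument name.

    'results__list' -> ('results', True); 'title' -> ('title', False)
    """
    if name.endswith(LIST_SUFFIX) and len(name) > len(LIST_SUFFIX):
        return name[:-len(LIST_SUFFIX)], True
    return name, False


def collect(nodes, **fields):
    """Toy stand-in for process(): apply each extractor to the matched nodes.

    A plain key uses only the first node; a key ending in '__list'
    applies the extractor to every node and stores a list.
    """
    out = {}
    for name, extract in fields.items():
        key, is_list = split_key(name)
        if is_list:
            out[key] = [extract(n) for n in nodes]
        elif nodes:
            out[key] = extract(nodes[0])
    return out


# Plain strings stand in for the elements a selector would match.
nodes = ["Perl.com", "Perl Mongers"]
print(collect(nodes, first=str.upper, titles__list=str.lower))
```

The double underscore keeps the argument name a valid Python identifier, which is the constraint that rules out writing "spam[]" directly.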
As for the output, the Python result is nested one level deeper.
    use Data::Dumper;
    use URI;
    use Web::Scraper;

    my $yahoo = scraper {
        process "/html/body/div[5]/div/div/div[2]/ol/li", 'results[]' => scraper {
            process "a.yschttl", 'title' => 'TEXT', 'url' => '@href';
            process "div.yschabstr", 'description' => "TEXT";
            result 'description', 'title', 'url';
        };
        result 'results';
    };

    print Dumper $yahoo->scrape( URI->new("http://search.yahoo.com/search?p=Perl") );

    $ time ./ysearch.pl | head
    $VAR1 = [
              {
                'url' => 'http://www.perl.com/',
                'title' => 'Perl.com',
                'description' => 'Central resource for Perl developers. It contains the Perl Language, edited by Tom Christiansen, and the Perl Reference, edited by Clay Irving.'
              },
              {
                'url' => 'http://www.perl.org/',
                'title' => 'Perl Mongers',
                'description' => 'Nonprofit organization, established to support the Perl community.'

    real 0m1.391s
    user 0m0.282s
    sys 0m0.038s

    #!/usr/bin/env python2.5
    #-*- coding: utf-8 -*-
    from scraper import scraper, process
    import codecs, sys
    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)

    yahoo = scraper(
        process('/html/body/div[5]/div/div/div[2]/ol/li',
            results__list=scraper(
                process('a.yschttl', title='TEXT', url='@href'),
                process('div.yschabstr', description="TEXT"))
        )
    )

    from pprint import pprint
    pprint(yahoo.scrape('http://search.yahoo.com/search?p=Perl'))

    $ time ./ysearch.py |head
    {'results': [{'description': 'Central resource for Perl developers. It contains the Perl Language, edited by Tom Christiansen, and the Perl Reference, edited by Clay Irving.',
                  'title': 'Perl.com',
                  'url': 'http://www.perl.com/'},
                 {'description': 'Nonprofit organization, established to support the Perl community.',
                  'title': 'Perl Mongers',
                  'url': 'http://www.perl.org/'},
                 {'description': 'Instructions on downloading a Perl interpreter for your computer platform. ... On CPAN, you will find Perl source in the /src directory. ...',
                  'title': 'Getting Perl',
                  'url': 'http://www.perl.com/download.csp'},
                 {'description': 'Perl borrows features from a variety of other languages including C, shell ... Perl 3, released in 1989, added support for binary data streams. ...',

    real 0m1.500s
    user 0m0.112s
    sys 0m0.268s
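
To make the nesting difference concrete: the Perl scraper ends with result 'results', which appears to be why its dump starts directly with a list, while the Python output is a dict with a single 'results' key. The sketch below uses two entries copied from the output above (descriptions omitted) and a hypothetical `unwrap` helper that is not part of the scraper module. It is written for current Python 3.

```python
# Sample shaped like the Python output shown above (two entries only).
python_result = {
    'results': [
        {'title': 'Perl.com', 'url': 'http://www.perl.com/'},
        {'title': 'Perl Mongers', 'url': 'http://www.perl.org/'},
    ]
}


def unwrap(data, key):
    """Hypothetical helper: return the value stored under one key,
    roughly what the Perl version's `result 'results'` line does."""
    return data[key]


# After unwrapping, the structure is a plain list, like the Perl dump.
perl_like = unwrap(python_result, 'results')
for entry in perl_like:
    print(entry['title'], entry['url'])
```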