Following suit
http://use.perl.org/~miyagawa/journal/34461
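Imitating that post once again, I made it possible to change the User-Agent and to specify a handler.
The example below uses an HTTPHandler with debuglevel set to 1 to check that
the User-Agent change actually takes effect.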
#!/usr/bin/env python2.5
#-*- coding: utf-8 -*-
import urllib2

from scraper import scraper, process

url = 'http://www.example.com/'

# Collect the text and href attribute of the matched <a> element.
s = scraper(
    process('a', text='text', href='@href'),
)

# debuglevel=1 makes the handler print the raw request and response headers.
http_handler = urllib2.HTTPHandler(debuglevel=1)
s.user_agent = 'Mozilla/5.0'
print s.scrape(url, http_handler)
$ ./example.py
connect: (www.example.com, 80)
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.example.com\r\nConnection: close\r\nUser-Agent: Mozilla/5.0\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Sun, 16 Sep 2007 07:04:25 GMT
header: Server: Apache/2.2.3 (CentOS)
header: Last-Modified: Tue, 15 Nov 2005 13:24:10 GMT
header: ETag: "280100-1b6-80bfd280"
header: Accept-Ranges: bytes
header: Content-Length: 438
header: Connection: close
header: Content-Type: text/html; charset=UTF-8
{'text': 'RFC \r\n 2606', 'href': 'http://www.rfc-editor.org/rfc/rfc2606.txt'}
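
For reference, here is a rough sketch of how a User-Agent string and an extra handler could be wired into an opener. This is only an assumption about the general approach, not the scraper module's actual code: the helper name build_opener_with_agent is hypothetical, and the sketch uses Python 3's urllib.request in place of urllib2.

```python
import urllib.request


def build_opener_with_agent(user_agent, *handlers):
    """Hypothetical helper: build an opener that sends `user_agent`
    and uses any extra handlers supplied by the caller."""
    # A handler instance passed here replaces the default handler
    # of the same class (e.g. the default HTTPHandler).
    opener = urllib.request.build_opener(*handlers)
    # Replace the default "Python-urllib/x.y" User-agent header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


# debuglevel=1 makes the handler print the raw HTTP exchange on use.
http_handler = urllib.request.HTTPHandler(debuglevel=1)
opener = build_opener_with_agent("Mozilla/5.0", http_handler)
```

Calling opener.open(url) on an http:// URL with this opener should then print a trace similar to the one above, including the User-Agent header that was set. No request is made in the sketch itself.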