Approach:
The simplest idea is to let a browser engine do the work. urllib cannot process dynamic content, but a browser can, and what a browser displays is in fact a fully processed HTML document. That gives us a good way to scrape dynamic pages. Python has a well-known GUI library, PyQt. Although PyQt is a GUI library, it includes QtWebKit, which is very useful here. Google's Chrome and Apple's Safari were both developed on the WebKit engine, so we can use PyQt's QtWebKit to load a page into an HTML document, parse that document, and extract the information we want from it.
What you need:
I use Mac OS X; the same method should also work on Windows and Linux.
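1. Qt4 library
Install the library, not Qt Creator. Leave the library in its default installation path on the Mac, which should be /home/username/Developor/; changing Qt4's default installation path may cause the installation to fail.
Official site:
http://qt-project.org/downloads
2. SIP and PyQt4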
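Both packages are available from the PyQt website. The downloads are source archives, so on Mac and Linux you have to compile them yourself.
Download page:
http://www.riverbankcomputing.co.uk/software/pyqt/download
In a terminal, change to the directory where the archive was unpacked and enter: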
python configure.py
make
sudo make install
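to compile and install the package.
SIP and PyQt4 are installed the same way, but PyQt4 depends on SIP, so install SIP first and then PyQt4.
Once steps 1 and 2 are done, the PyQt4 module is installed for Python. Type import PyQt4 in a Python shell to check that the module can be found.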
3. Spynner
Spynner is a QtWebKit client that simulates a browser: it can load pages, trigger events, fill in forms and so on.
The module is available from PyPI, the Python Package Index.
Download page:
https://pypi.python.org/pypi/spynner/2.5
After unpacking, cd into the unpacked directory and run sudo python setup.py install to install the module.
Spynner is now installed. Try import spynner in a Python shell to confirm that the installation worked.
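The three import checks above can be bundled into one small script. This is only a convenience sketch, not part of any of the packages: module_available is a hypothetical helper name, and the list of module names simply mirrors steps 1 to 3 (it assumes SIP installs a Python module called sip).

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported, False on ImportError."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Module names assumed from steps 1 to 3 of this article.
for name in ("sip", "PyQt4", "spynner"):
    status = "ok" if module_available(name) else "MISSING"
    print("%-8s %s" % (name, status))
```

Any line reporting MISSING points at the step that needs to be repeated.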
Basic Spynner usage
Spynner can do a great deal, but I will limit myself here to showing how to display a page's source.
#!/usr/bin/python
# -*- coding: utf-8 -*-
import spynner

# Create a browser object.
browser = spynner.Browser()
# Keep the browser window hidden.
browser.hide()
# load() is a Browser method that loads a page through WebKit;
# its argument is the URL of the page, as a string.
browser.load("http://www.baidu.com")
# browser.html holds the page source after WebKit has processed it;
# encode it as UTF-8 before printing.
print browser.html.encode("utf-8")
# You can also write it to a file and open that file in a browser.
open("Test.html", 'w+').write(browser.html.encode("utf-8"))
# Close the browser.
browser.close()
With this program it is fairly easy to display the HTML source of a page after WebKit has processed it.
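The open("Test.html", 'w+').write(...) call in the script leaves closing the file to the interpreter. A minimal sketch of a more explicit alternative follows, assuming the HTML is a unicode string. save_html is a hypothetical helper name, and the sample string stands in for a real page so that the snippet runs without Spynner.

```python
import io
import os
import tempfile


def save_html(html, path):
    """Write a unicode HTML string to `path` as UTF-8 and close the file."""
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(html)


# Stand-in for browser.html; a real run would pass the loaded page instead.
sample_html = u"<p>\u767e\u5ea6</p>"
demo_path = os.path.join(tempfile.mkdtemp(), "Test.html")
save_html(sample_html, demo_path)

# Read the file back to confirm the round trip preserved the text.
with io.open(demo_path, encoding='utf-8') as handle:
    round_trip = handle.read()
```

Passing the encoding to io.open keeps the encode step and the write in one place, and the with block closes the file even if the write fails.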
A Spynner application
Here is a simple application of Spynner: a short program that fetches every image on a page as you would see it in a browser. HTMLParser, BeautifulSoup and similar libraries can all parse the HTML document; I chose HTMLParser.
#!/usr/bin/python
import spynner
import HTMLParser
import os
import urllib


class MyParser(HTMLParser.HTMLParser):
    # Called for every opening tag; download the image behind each <img>.
    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            url = dict(attrs).get('src')
            if not url:
                return  # <img> tag without a src attribute
            name = os.path.basename(url)
            if name.endswith(('.jpg', '.png', '.gif')):
                print "Download.....", name
                urllib.urlretrieve(url, name)


if __name__ == "__main__":
    browser = spynner.Browser()
    browser.show()
    browser.load("http://www.artist.cn/snakewu1994/StyleBasis_Four/en_album_607236.shtml")
    Parser = MyParser()
    # Feed the WebKit-processed HTML to the parser.
    Parser.feed(browser.html)
    print "Done"
    browser.close()
This program downloads every image you see on the page. A few short lines take care of what would otherwise be a tedious job and handle the images in bulk. That is the real strength of Python: however hard the task, hand it to a third-party library.
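One thing the downloader does not handle is a relative src such as /pics/a.jpg, which urllib.urlretrieve cannot fetch on its own. The sketch below is one possible way to resolve each src against the page URL before downloading. ImageLinkParser, collect_image_urls, the sample HTML and the example.com URLs are all hypothetical, and the try/except import is only there so the snippet runs on both Python 2 and Python 3.

```python
import os

try:                      # Python 2, as used in this article
    from HTMLParser import HTMLParser
    from urlparse import urljoin, urlparse
except ImportError:       # Python 3
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse

IMAGE_EXTENSIONS = ('.jpg', '.png', '.gif')


class ImageLinkParser(HTMLParser):
    """Collect absolute image URLs instead of downloading them right away."""

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        src = dict(attrs).get('src')
        if not src:
            return  # <img> tag without a src attribute
        # Resolve relative paths against the URL of the page.
        absolute = urljoin(self.base_url, src)
        # Check the extension on the path only, ignoring any query string.
        name = os.path.basename(urlparse(absolute).path)
        if name.lower().endswith(IMAGE_EXTENSIONS):
            self.image_urls.append(absolute)


def collect_image_urls(html, base_url):
    """Return the absolute URLs of all jpg/png/gif images found in `html`."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls


# Hypothetical sample standing in for browser.html.
sample = ('<img src="/pics/a.jpg"><img src="b.PNG">'
          '<img src="http://cdn.example.com/c.gif?x=1">'
          '<img alt="no src"><img src="d.bmp">')
found = collect_image_urls(sample, "http://example.com/album/index.html")
print(found)
```

Each URL in the returned list can then be passed to urllib.urlretrieve as in the program above. Collecting first and downloading afterwards also makes it easy to inspect the list before any network traffic happens.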