Python crawler mini-program: a small program for learning web crawling with Python

#coding:utf-8

# name: module1

# purpose:

# author: mrwang

# created: 18/04/2014

# licence:

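import urllib  # Python 2; in Python 3, urlopen lives in urllib.request

def main():

    url = ""

    html = urllib.urlopen(url)

    # print html.read()  # read the page content

    # print html.read().decode("gbk").encode("utf-8")  # fix garbled text: decode from GBK, re-encode as UTF-8

    # print html.read().decode("gbk", "ignore").encode("utf-8")  # when one page mixes several encodings, add "ignore" to skip characters that cannot be decoded

    # print html.info()  # view the response headers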

    # sample output of html.info():
    #
    #     connection: close
    #     date: fri, 18 apr 2014 03:13:46 gmt
    #     server: microsoft-iis/6.0
    #     microsoftofficewebserver: 5.0_pub
    #     pragma: no-cache
    #     cache-control: private
    #     content-length: 50853
    #     content-type: text/html
    #     expires: thu, 17 apr 2014 03:13:44 gmt
    #     set-cookie: web%5fid=9952508807; path=/
    #     set-cookie: aspsessionidqctqrbqa=njfijebaifpplgfkelicddel; path=/
    #     cache-control: no-cache

    # print html.getcode()  # return the HTTP status code

    # print html.geturl()  # return the URL of the page that was fetched

    # html.close()  # close the connection

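    # Using urllib.urlretrieve
    #
    # 1. Pass in the URL.
    # 2. Pass in the local path and file name to save to.
    # 3. Pass in a callback function. It can be defined freely, but it must
    #    accept three parameters:
    #      parameter 1: the number of data blocks transferred so far
    #      parameter 2: the size of each data block, in bytes
    #      parameter 3: the total size of the file being fetched; sometimes -1 is returned

    urllib.urlretrieve(url, "c:", callback)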

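def callback(a, b, c):
    """Print the download progress as a percentage.

    @param a: number of data blocks transferred so far
    @param b: size of each data block, in bytes
    @param c: total size of the file being fetched; sometimes -1
    """
    if c <= 0:
        return  # total size unknown, so no percentage can be computed

    down_progress = 100.0 * a * b / c

    if down_progress > 100:
        down_progress = 100

    print "%.2f%%" % down_progress,  # the trailing comma suppresses the newline

# sample output:
# 0.00% 16.11% 32.22% 48.33% 64.44% 80.55% 96.66% 100.00%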

if __name__ == "__main__":

    main()
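
The listing above targets Python 2, where `urlopen` and `urlretrieve` live directly in `urllib`. As a rough sketch of how the same steps might look in Python 3, the block below uses `urllib.request` instead. The function names (`decode_page`, `progress_percent`, `report`, `fetch`, `download`) are made up for this illustration and are not part of the original script, and the `gbk` default is only an assumption carried over from the comments above.

```python
# Python 3 sketch of the listing above; all function names are illustrative.
import urllib.request


def decode_page(raw, encoding="gbk"):
    """Decode response bytes, dropping bytes that are invalid in `encoding`."""
    return raw.decode(encoding, "ignore")


def progress_percent(blocks, block_size, total_size):
    """Return the download progress in percent, or None if the size is unknown."""
    if total_size <= 0:  # urlretrieve reports -1 when the size is unknown
        return None
    return min(100.0, 100.0 * blocks * block_size / total_size)


def report(blocks, block_size, total_size):
    """Reporthook for urlretrieve: print the progress without a newline."""
    percent = progress_percent(blocks, block_size, total_size)
    if percent is not None:
        print("%.2f%%" % percent, end=" ")


def fetch(url):
    """Return (status code, final URL, headers, decoded body) for `url`."""
    with urllib.request.urlopen(url) as response:
        body = decode_page(response.read())
        # status, url and headers correspond to getcode(), geturl() and info()
        return response.status, response.url, response.headers, body


def download(url, filename):
    """Save `url` to `filename`, reporting progress through `report`."""
    return urllib.request.urlretrieve(url, filename, report)
```

Calling `fetch(url)` or `download(url, filename)` with a real address and a writable file name performs the network request; nothing runs on import. Keeping the percentage calculation in its own function makes the unknown-size case (`-1`) easy to check without downloading anything.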
