Python 3 urllib Crawler Basics

Adding a request header with add_header()

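import urllib.request

url = ""  # target URL goes here
req = urllib.request.Request(url)
# Present the request as coming from a desktop Firefox browser
req.add_header("User-Agent", "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0")
data = urllib.request.urlopen(req).read()
print(data)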
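
As a rough offline illustration of what add_header() does, the sketch below builds a Request and reads the header back without sending anything over the network. The URL http://example.com/ and the helper name build_request are placeholders chosen for this example and do not come from the original post.

```python
import urllib.request

# Firefox User-Agent string, same value as in the snippet above
FIREFOX_UA = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) "
              "Gecko/20100101 Firefox/56.0")


def build_request(url, user_agent=FIREFOX_UA):
    """Hypothetical helper: return a Request that carries a User-Agent header."""
    req = urllib.request.Request(url)
    req.add_header("User-Agent", user_agent)
    return req


if __name__ == "__main__":
    req = build_request("http://example.com/")  # placeholder URL
    # Request stores header names in capitalized form, i.e. "User-agent"
    print(req.get_header("User-agent"))
    print(req.header_items())
```

Inspecting the Request this way can be a convenient check that the header is set before the request is actually sent with urlopen().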

GET requests

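import urllib.parse
import urllib.request

keyword = "hello"  # keyword to search for
url = ""
req = urllib.request.Request(url)
data = urllib.request.urlopen(req).read()
with open("./result.txt", "wb") as fd:
    fd.write(data)

# The code above fails with an encoding error when the keyword is Chinese,
# so percent-encode the keyword before appending it to the URL.
keyword = "你好"
key_code = urllib.parse.quote(keyword)  # percent-encode the keyword as UTF-8
url = "" + key_code
print(url)  # the keyword part is %E4%BD%A0%E5%A5%BD
req = urllib.request.Request(url)
data = urllib.request.urlopen(req).read()
with open("./result.txt", "wb") as fd:
    fd.write(data)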
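
The encoding step can be tried on its own, without any network access. The sketch below is only an illustration: the base URL http://example.com/s, the parameter name wd, and the helper build_search_url are assumptions made for this example, not values from the original post.

```python
import urllib.parse


def build_search_url(base_url, keyword, param="wd"):
    """Hypothetical helper: append one percent-encoded query parameter."""
    # urlencode() encodes non-ASCII text as UTF-8 and turns spaces into "+"
    query = urllib.parse.urlencode({param: keyword})
    return base_url + "?" + query


if __name__ == "__main__":
    print(urllib.parse.quote("你好"))
    print(build_search_url("http://example.com/s", "你好"))
```

Using urlencode() on a dict instead of concatenating a quoted string by hand may be easier to extend once a query needs more than one parameter.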

POST requests

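import urllib.parse
import urllib.request

# Address of the PHP page that receives the request
url = ""
# Build the form data and encode it; the form fields go in this dict
postdata = urllib.parse.urlencode({}).encode("utf-8")
# Create the Request object from the URL and the data to send
req = urllib.request.Request(url, postdata)
# Add header information
req.add_header("User-Agent", "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0")
data = urllib.request.urlopen(req).read()
with open("./post.txt", "wb") as fd:
    fd.write(data)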
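
To see how the form data ends up in the request, the following sketch builds a POST Request and inspects it without sending it. The field names name and pass, the URL http://example.com/post.php, and the helper build_post_request are all made up for illustration.

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields):
    """Hypothetical helper: return a Request that will be sent as a POST."""
    # The request body must be bytes, so encode the urlencoded string
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=body)


if __name__ == "__main__":
    req = build_post_request(
        "http://example.com/post.php",          # placeholder URL
        {"name": "alice", "pass": "secret"},    # placeholder form fields
    )
    print(req.get_method())  # a Request with data defaults to POST
    print(req.data)
```

Passing data is what switches the request from GET to POST; a Request created without data reports GET from get_method().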

Printing debug logs while the program runs (enabling the debug log)

import urllib.request

httpd = urllib.request.urlopen("")
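
One common way to turn on the debug log is to build an opener whose HTTP and HTTPS handlers have debuglevel set to 1 and then install it globally. The sketch below assumes that approach; the helper name make_debug_opener is hypothetical, and HTTPSHandler is only available when Python was built with SSL support.

```python
import urllib.request


def make_debug_opener(level=1):
    """Hypothetical helper: build an opener whose HTTP(S) handlers log traffic."""
    http_handler = urllib.request.HTTPHandler(debuglevel=level)
    https_handler = urllib.request.HTTPSHandler(debuglevel=level)
    return urllib.request.build_opener(http_handler, https_handler)


if __name__ == "__main__":
    opener = make_debug_opener()
    # After install_opener(), plain urllib.request.urlopen() calls use this opener
    urllib.request.install_opener(opener)
```

With the opener installed, subsequent urlopen() calls should print the request and response headers to standard output as they are exchanged.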

Exceptions

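import urllib.error
import urllib.request

# URLError is raised when: 1. the remote server cannot be reached,
# 2. the remote URL does not exist, 3. there is no network connection,
# 4. an HTTPError was triggered
try:
    data = urllib.request.urlopen("").read()
    print(data)
except urllib.error.URLError as e:
    # print(e.code)  # only available when e is an HTTPError
    # print("-----------------------")
    print(e.reason)
# A URL that cannot be reached at all (for example, a nonexistent address) raises an
# exception that cannot be caught as HTTPError; catch URLError instead.
# URLError is the parent class of HTTPError.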
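
Because HTTPError is a subclass of URLError, a handler that wants the status code usually catches HTTPError first and falls back to URLError. The sketch below shows that ordering; the function name fetch, its return convention, and the file path in the demo are assumptions for illustration only.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Hypothetical helper: return (body, error_message); one of the two is None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read(), None
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500
        return None, f"HTTP error {e.code}: {e.reason}"
    except urllib.error.URLError as e:
        # No usable response at all: bad host, no network, missing file, ...
        return None, f"URL error: {e.reason}"


if __name__ == "__main__":
    # A file: URL pointing at a missing path triggers URLError without any network
    print(fetch("file:///no/such/dir/urllib-demo-missing.txt"))
```

Placing the HTTPError clause first matters: if URLError came first it would also catch every HTTPError, and the more specific branch would never run.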
