Web Crawler Request Modules

1 Versions

Python 2: urllib and urllib2

Python 3: urllib and urllib2 were merged into a single urllib package; requests are made with urllib.request

2 Common methods

2.1.1 Byte stream = response.read()

String = response.read().decode("utf-8")

encode(): str ---> bytes

decode(): bytes ---> str
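A quick illustration of the two directions. The Chinese sample string is an arbitrary choice, not from the original. Each of its characters takes three bytes in UTF-8, which is why read() gives you bytes that must be decoded before you can treat them as text.

```python
# str -> bytes with encode(), bytes -> str with decode()
text = "爬蟲"                     # arbitrary sample string
data = text.encode("utf-8")      # str -> bytes (3 bytes per character here)
print(data)                      # b'\xe7\x88\xac\xe8\x9f\xb2'

restored = data.decode("utf-8")  # bytes -> str, the same step as response.read().decode("utf-8")
print(restored == text)          # True
```

Decoding must use the same encoding the page was sent in; utf-8 is the common case but not guaranteed for every site.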

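2.2 Overriding the User-Agent

2.2.1 No custom User-Agent: urlopen() called with a plain URL string

2.2.2 Custom User-Agent supported: urllib.request.Request(url, headers=dict), where headers is a dictionary

The User-Agent is the first front in the fight between crawlers and anti-crawler measures, so every request you send must carry one. Without it, urllib identifies itself as Python-urllib/3.x, which is easy for a site to block.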
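A minimal sketch contrasting the default header with a custom one. The browser string and the example.com URL are assumptions chosen for illustration; nothing is sent over the network here, because a Request object is only built, not opened.

```python
import urllib.request

# Example browser User-Agent string (an assumption; any realistic one works the same way)
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36")

# The default opener announces itself as Python-urllib/<version>
default_headers = dict(urllib.request.build_opener().addheaders)
print(default_headers["User-agent"])  # e.g. Python-urllib/3.12

# Request() takes headers as a dict; nothing is sent until urlopen() is called
request = urllib.request.Request("https://www.example.com/",
                                 headers={"User-Agent": BROWSER_UA})

# urllib normalizes header names with str.capitalize(), so the key is stored as "User-agent"
print(request.get_header("User-agent") == BROWSER_UA)  # True
```

When the request already has its own User-agent header, urllib does not add the opener's default, so the custom value is the one that gets sent.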

2.2.2.1 Workflow

2.2.2.1.1 Build a request object with Request()

2.2.2.1.2 Get a response object with urlopen()

2.2.2.1.3 Get the content with the response object's read().decode("utf-8")
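The three steps above can be sketched as a single helper. The function name fetch_html, the User-Agent string, and the timeout are hypothetical choices; the calls themselves are the standard urllib.request ones named in the steps.

```python
import urllib.request

# Hypothetical example User-Agent; substitute any realistic browser string.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"}


def fetch_html(url, headers=HEADERS, encoding="utf-8", timeout=10):
    """Fetch url and return its body as text, following the three steps above."""
    # Step 1: build the request object with a custom User-Agent
    request = urllib.request.Request(url, headers=headers)
    # Step 2: send it and get the response object
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Step 3: read the raw bytes and decode them into a str
        return response.read().decode(encoding)


# Usage (needs network access):
# html = fetch_html("https://www.example.com/")
# print(html[:200])
```

Using the response in a with block closes the connection as soon as the body has been read.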

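2.2.2.2 Methods of the response object

2.2.2.2.1 read(): reads the content of the server's response (as bytes)

2.2.2.2.2 getcode()

Purpose: returns the HTTP status code, e.g. print(response.getcode())

200: success

4xx: client-side error, e.g. 404 when the page does not exist; 5xx: server-side error. With urlopen(), these codes are raised as urllib.error.HTTPError, whose .code attribute holds the status.

2.2.2.2.3 geturl()

Purpose: returns the URL the data actually came from, which can differ from the requested URL after a redirect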
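To see read(), getcode() and geturl() without depending on a real site, the sketch below starts a throwaway local server (a hypothetical setup, not part of the original): /old redirects to /new, and any other unknown path returns 404. It uses an opener built with an empty ProxyHandler so that a system proxy cannot intercept the local address; apart from that, opener.open() returns the same kind of response object as urlopen().

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new, /new is a page, anything else is 404."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.end_headers()
        elif self.path == "/new":
            body = "<html>new page</html>".encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence the per-request log lines


def probe(opener, url):
    """Return (status code, final URL, body text or None) for a GET of url."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener.open(request, timeout=5) as response:
            body = response.read().decode("utf-8")  # read(): raw bytes of the body
            return response.getcode(), response.geturl(), body
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the status is in err.code
        return err.code, url, None


# Port 0 asks the OS for any free port on the loopback interface.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

# ProxyHandler({}) disables system proxies so the local server is reached directly.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

code, final_url, body = probe(opener, base + "/old")
print(code, final_url == base + "/new", body)  # 200 True <html>new page</html>
print(probe(opener, base + "/missing")[0])     # 404
```

The redirect is followed automatically, so getcode() reports the 200 of the final page while geturl() reveals that the content came from /new rather than the /old that was requested.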

3 The urllib.parse module

3.1 urlencode(dict)

urlencode() converts a dict into a percent-encoded query string such as wd=%E4%B8%AD...
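A small example of building a search URL with urlencode(). The "wd" key comes from the text above; the "pn" parameter and the example.com base URL are hypothetical.

```python
from urllib.parse import urlencode

# "wd" is the key used above; "pn" and the base URL are hypothetical examples.
params = {"wd": "中文", "pn": 10}
query = urlencode(params)          # non-ASCII values are UTF-8 encoded, then percent-encoded
print(query)                       # wd=%E4%B8%AD%E6%96%87&pn=10

base_url = "https://www.example.com/s?"  # hypothetical search endpoint
url = base_url + query
print(url)                         # https://www.example.com/s?wd=%E4%B8%AD%E6%96%87&pn=10
```

urlencode() adds the key= part and joins multiple pairs with &, so it suits a whole query string; for a single bare value, quote() below is enough.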

3.2 quote(str)

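import urllib.parse

baseurl = ""  # base URL of the search page goes here
key = input("Enter what you want to search for: ")
# Percent-encode the keyword with quote()
key = urllib.parse.quote(key)
url = baseurl + key
print(url)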
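A short comparison of quote() with its relatives, using a fixed keyword in place of input() so it runs unattended. The keyword is an arbitrary example with a space in it, because spaces are where quote() and quote_plus() differ.

```python
from urllib.parse import quote, quote_plus, unquote

keyword = "爬蟲 教程"  # stands in for input(); the space is deliberate

encoded = quote(keyword)         # spaces become %20
print(encoded)                   # %E7%88%AC%E8%9F%B2%20%E6%95%99%E7%A8%8B
print(quote_plus(keyword))       # %E7%88%AC%E8%9F%B2+%E6%95%99%E7%A8%8B  (the form urlencode() uses)
print(unquote(encoded) == keyword)  # True: unquote() reverses quote()
```

quote() leaves "/" unescaped by default, which is convenient for path segments; pass safe="" if a slash inside the keyword must be encoded too.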

Overall flow: send request -> response (HTML source) -> parse
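The whole flow fits in one small function. The original does not say which parser to use, so this sketch assumes the standard library's html.parser and, as a hypothetical target, extracts only the page title; crawl_title and TitleParser are illustrative names.

```python
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside <title> (a stand-in for the parse step)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def crawl_title(url):
    """Send request -> response (HTML source) -> parse, for a single page."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:  # send request
        html = response.read().decode("utf-8")                      # response: HTML source
    parser = TitleParser()                                          # parse
    parser.feed(html)
    parser.close()
    return parser.title.strip()


# Usage (needs network access):
# print(crawl_title("https://www.example.com/"))
```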
