Python Web Scraping Study Notes 1 (Basics)

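First, an introduction to using urllib.

urllib provides a set of functions for working with URLs.

Commonly used modules:

urllib.request: the request module

urllib.error: the exception handling module

urllib.parse: the URL parsing module

The request module of urllib makes it easy to fetch the contents of a URL: it sends a GET request to the specified page and returns the HTTP response: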
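As a quick illustration of the parsing module, here is a minimal sketch that splits a URL into its parts and decodes its query string. The URL is a made-up placeholder, not one from this post.

```python
from urllib import parse

# Hypothetical URL used only for illustration.
url = "http://example.com/search?query=python&page=2"

parts = parse.urlparse(url)            # split into scheme, netloc, path, query, ...
params = parse.parse_qs(parts.query)   # decode the query string into a dict of lists

print(parts.scheme, parts.netloc, parts.path)
print(params)
```

Note that parse_qs returns every value as a list, because a query string may repeat the same key.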

from urllib import request
response = request.urlopen("")

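.read() reads the entire response body in one call and returns it as bytes; it is typically used to load the whole content into a single variable.

html = response.read()

However, this alone is not enough, because the returned value is binary data (bytes). Most web page source is encoded as UTF-8, so a decode() step usually follows.

html = html.decode("utf-8")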
print(html)
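The fetch, read, and decode steps can be combined with basic error handling from urllib.error. The sketch below is one possible arrangement, not the only one: the helper name fetch_text is hypothetical, the UTF-8 fallback is an assumption, and a data: URL stands in for a real page so the demo runs offline.

```python
from urllib import error, request


def fetch_text(url, timeout=10):
    """Return the decoded body of `url`, or None if the request fails."""
    try:
        with request.urlopen(url, timeout=timeout) as response:
            raw = response.read()  # bytes
            # Prefer the charset declared in the Content-Type header;
            # fall back to UTF-8 (an assumption, not a guarantee).
            charset = response.headers.get_content_charset() or "utf-8"
            return raw.decode(charset, errors="replace")
    except error.HTTPError as exc:   # the server answered with an error status
        print("HTTP error:", exc.code)
    except error.URLError as exc:    # DNS failure, refused connection, ...
        print("Failed to reach the server:", exc.reason)
    return None


# A data: URL keeps the demo offline; replace it with a real page address.
print(fetch_text("data:text/plain;charset=utf-8,hello"))
```

HTTPError is caught before URLError because it is a subclass of URLError.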

This prints the page's HTML source.

The following code downloads an image and saves it to a local file:

from urllib import request

response = request.urlopen("")
cat_img = response.read()
# Open the file in binary mode ('wb') because the image data is bytes, not text.
with open("d:\\實驗樓\\cat_500_600.jpg", "wb") as f:
    f.write(cat_img)
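For larger files, reading the whole response into memory with read() may be wasteful. The sketch below shows a chunked alternative using shutil.copyfileobj; the helper name save_stream and the chunk size are hypothetical choices, not part of the original example.

```python
import shutil
from pathlib import Path


def save_stream(source, dest_path, chunk_size=64 * 1024):
    """Copy a binary file-like object to dest_path in chunks.

    Returns the number of bytes written to disk.
    """
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    with open(dest, "wb") as out_file:
        shutil.copyfileobj(source, out_file, chunk_size)
    return dest.stat().st_size
```

Because the response returned by urlopen is a file-like object, it could be passed directly as source, for example inside a `with request.urlopen(image_url) as response:` block, where image_url is whatever address you are downloading from.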

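This is where urllib.parse comes in. Encoding the POST data with urllib.parse.urlencode() and converting the result to bytes lets you pass it as the data argument of urllib.request.urlopen, which completes a POST request.

So when the data argument is supplied, the request is sent as a POST; without the data argument, it is sent as a GET.

Concrete example 2: translating text with Youdao Translate.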
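The rule can be checked without touching the network, since a Request object reports which method it would use. In this sketch the endpoint and form fields are hypothetical placeholders.

```python
from urllib import parse, request

# Hypothetical endpoint and form fields, for illustration only.
url = "http://example.com/form"
form = {"query": "中文", "submit": "search"}

# urlencode() builds a percent-encoded string; encode() turns it into bytes.
body = parse.urlencode(form).encode("utf-8")

get_req = request.Request(url)               # no data: sent as GET
post_req = request.Request(url, data=body)   # data present: sent as POST

print(get_req.get_method())
print(post_req.get_method())
print(body)
```

Nothing is sent until one of these objects is passed to urlopen.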

from urllib import request
from urllib import parse
import json

content = input("Enter the text to translate: ")
url = ""
head = {}  # request headers (left empty here)
data = {}
data["from"] = "auto"
data["to"] = "auto"
data["i"] = content
data["client"] = "fanyideskweb"
data["sign"] = "c8f3a6d3a2e68a5ba21a0c36de9ed9cd"
data["salt"] = "1539962031171"
data["smartresult"] = "dict"
data["doctype"] = "json"
data["version"] = 2.1
data["keyfrom"] = "fanyi.web"
data["action"] = "FY_BY_REALTIME"
data["typoResult"] = "false"
# urllib.parse.urlencode() turns a dict into a query string. Characters
# outside the ASCII set are percent-encoded from their UTF-8 bytes:
# >>> params = {"query": "中文", "submit": "search"}
# >>> data = urllib.parse.urlencode(params)
# >>> data
# 'query=%E4%B8%AD%E6%96%87&submit=search'
# .encode("utf-8") then converts the string into the bytes that Request expects.
data = parse.urlencode(data).encode("utf-8")
req = request.Request(url, data, head)
response = request.urlopen(req)
# Alternatively, drop head from the Request call.
html = response.read().decode("utf-8")
# json.loads decodes JSON data and returns the matching Python object (here a dict).
target = json.loads(html)
res = target["translateResult"][0][0]["tgt"]
print("Translation:", res)
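The last indexing step raises an exception if the service returns an error or a differently shaped reply. A defensive version might look like the sketch below. The sample string is a hypothetical response body written for this illustration, the field name translateResult is assumed from the example above, and extract_translation is a hypothetical helper name.

```python
import json

# Hypothetical response body, invented for illustration only.
sample = '{"translateResult": [[{"src": "hello", "tgt": "你好"}]]}'


def extract_translation(body):
    """Return the first translated string, or None if the shape is unexpected."""
    try:
        return json.loads(body)["translateResult"][0][0]["tgt"]
    except (ValueError, KeyError, IndexError, TypeError):
        # ValueError covers invalid JSON; the rest cover missing or
        # wrongly typed fields in an otherwise valid reply.
        return None


print(extract_translation(sample))
```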

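How to use proxy IPs:

1. What is the difference between build_opener and urlopen, and what are the advantages?

Answer: The pages you want to scrape vary widely. Some require a CAPTCHA, some require cookies, and many have more advanced features that get in the way of scraping. My simple understanding of urlopen is that it just opens a web page: it takes a URL, which can be either a string or a Request object. build_opener adds handlers on top of that, so it can deal with these obstacles in a more specialized and customizable way.

2. How do you write this with the default handlers?

Example 1: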
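To make the role of handlers concrete, here is a minimal sketch of an opener that keeps cookies between requests. The User-Agent value is a placeholder, and no request is actually sent.

```python
import http.cookiejar
import urllib.request

# The jar stores cookies that servers set, so later requests can send them back.
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar)
)
# Placeholder User-Agent; it is added to every request made through this opener.
opener.addheaders = [("User-Agent", "Mozilla/5.0")]

# opener.open(url) would now behave like urlopen(url), plus cookie handling.
```

build_opener still includes its default handlers (redirects, HTTP errors, and so on); the ones passed in are added to that set.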

prohandler=

request = urllib.request.Request(url, data, headers or {})
iplist = ['119.6.144.73:81', ...........]
proxy_support = urllib.request.ProxyHandler()
opener = urllib.request.build_opener(proxy_support)

Example 2:

# install_opener returns None; it makes opener the default used by later urlopen calls.
urllib.request.install_opener(opener)
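Putting the pieces together, one possible way to pick a proxy at random and build an opener around it is sketched below. The helper name make_proxy_opener is hypothetical, the choice of random.choice is an assumption about how the list is meant to be used, and the address from the list above is unlikely to still be live.

```python
import random
import urllib.request


def make_proxy_opener(proxy_list):
    """Build an opener that routes HTTP traffic through a randomly chosen proxy."""
    proxy = random.choice(proxy_list)
    # ProxyHandler takes a mapping of URL scheme to proxy address.
    handler = urllib.request.ProxyHandler({"http": proxy})
    return urllib.request.build_opener(handler), proxy


opener, chosen = make_proxy_opener(["119.6.144.73:81"])
print("Using proxy:", chosen)

# opener.open(url) uses the proxy for that call only, while
# urllib.request.install_opener(opener) makes urlopen() use it everywhere.
```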
