Python Language Learning (12)

167. Crawler example 1:

import urllib.request
import urllib.parse
import json

content = input('Enter the text to translate: ')
url = ''  # Request URL (left blank in the original)

# Form fields copied from the POST request shown in the browser.
data = {}
data['i'] = content
data['from'] = 'auto'
data['to'] = 'auto'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['salt'] = '1536137270843'
data['sign'] = '22bea357b7e89f725e8975b62fce9993'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_CLICKBUTTION'
data['typoResult'] = 'false'

# urlopen() needs the form data as URL-encoded bytes.
data = urllib.parse.urlencode(data).encode('utf-8')
response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')
target = json.loads(html)
print('Translation: %s' % (target['translateResult'][0][0]['tgt']))

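Here url is the address we want to access; it is the Request URL shown for the POST request in the browser's Inspect Element tool. data is a dictionary holding every field listed under the request's Form Data. urlopen() returns a response object rather than text: call read() to get the raw bytes, then decode() them as UTF-8 so the content can be read normally.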
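The encode and decode steps described above can be tried without touching the network. The sketch below is only an illustration: the two form fields and the response body are made up for the example, not taken from a real server.

```python
import json
import urllib.parse

# Hypothetical form fields; a real request copies them from the browser.
form = {'i': 'hello', 'doctype': 'json'}

# urlencode() builds the "key=value&key=value" string, and encode() turns it
# into the bytes that urlopen() expects for its data argument.
payload = urllib.parse.urlencode(form).encode('utf-8')
print(payload)  # b'i=hello&doctype=json'

# Made-up bytes standing in for what response.read() would return.
raw = b'{"translateResult": [[{"src": "hello", "tgt": "bonjour"}]]}'

# decode() turns the bytes into text; json.loads() turns the text into
# nested Python lists and dictionaries.
target = json.loads(raw.decode('utf-8'))
translated = target['translateResult'][0][0]['tgt']
print(translated)  # bonjour
```

The nested indexing mirrors the shape of the parsed JSON: a list of lists of dictionaries.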

168. The urllib.request.Request() class. Its format is:

obj = urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)

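url must be a string: the address we want to access.

data is optional and is usually built from a dictionary. It specifies additional data to send to the server; pass None if no such data is needed. HTTP requests are currently the only ones that use data. The supported object types include bytes, file-like objects, and iterables.

headers is an optional dictionary. Setting it lets the request imitate a normal browser visit. The dictionary can be passed in directly when the Request is created, or a header can be added afterwards with add_header(). The header to set is User-Agent, and its value is the text that follows User-Agent in the browser's request headers.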
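A Request object can be built and inspected without sending anything, which makes the effect of the data and method parameters easy to see. This is a small sketch; the example.com address and the single form field are placeholders chosen for the illustration.

```python
import urllib.parse
import urllib.request

# Placeholder address, used only to build the objects; nothing is sent.
url = 'http://example.com/translate'

# Without data, the request defaults to GET.
plain = urllib.request.Request(url)
print(plain.get_method())  # GET

# With data (which must be bytes), the request becomes a POST.
payload = urllib.parse.urlencode({'i': 'hello'}).encode('utf-8')
post = urllib.request.Request(url, data=payload)
print(post.get_method())  # POST
print(post.data)          # b'i=hello'

# The method argument overrides that default choice.
head = urllib.request.Request(url, method='HEAD')
print(head.get_method())  # HEAD
```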

169. Crawler example 2:

import urllib.request
import urllib.parse
import json

content = input('Enter the text to translate: ')
url = ''  # Request URL (left blank in the original)

'''Method 1: build the headers dictionary first and pass it to Request()
head = {}
head['User-Agent'] = ''
'''

data = {}
data['i'] = content
data['from'] = 'auto'
data['to'] = 'auto'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['salt'] = '1536137270843'
data['sign'] = '22bea357b7e89f725e8975b62fce9993'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_CLICKBUTTION'
data['typoResult'] = 'false'
data = urllib.parse.urlencode(data).encode('utf-8')

# Method 2: create the Request first, then add the header.
rep = urllib.request.Request(url, data)
rep.add_header('User-Agent', '')  # value omitted in the original; copy it from the browser
response = urllib.request.urlopen(rep)
html = response.read().decode('utf-8')
target = json.loads(html)
print('Translation: %s' % (target['translateResult'][0][0]['tgt']))

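The header can be supplied through the headers argument when the Request object is created, or added after creation with add_header(key, val).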
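The two ways of setting the header end up with the same result, which can be checked on the Request objects themselves. In this sketch the URL and the User-Agent string are made-up placeholders, not values from the original post.

```python
import urllib.request

url = 'http://example.com/'   # placeholder address; nothing is sent
ua = 'Mozilla/5.0 (example)'  # made-up User-Agent string

# Way 1: pass the headers dictionary when the Request is created.
req1 = urllib.request.Request(url, headers={'User-Agent': ua})

# Way 2: create the Request first, then add the header.
req2 = urllib.request.Request(url)
req2.add_header('User-Agent', ua)

# Request stores header names capitalized, so the key becomes 'User-agent'.
print(req1.get_header('User-agent'))  # Mozilla/5.0 (example)
print(req2.header_items())            # [('User-agent', 'Mozilla/5.0 (example)')]
print(req1.headers == req2.headers)   # True
```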

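170. A server usually caps how many requests a single IP address may make per unit of time. There are two ways around this: delay each submission, or use a proxy. The delay needs nothing more than the time module.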
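The delay approach can be sketched as a loop that sleeps between calls. The helper name visit_all and its parameters are hypothetical, and str.upper stands in for the function that would actually send a request, so the example runs without a network.

```python
import time


def visit_all(items, handle, delay_seconds=2.0):
    """Call handle(item) for each item, pausing between consecutive calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            # Wait before every call except the first one.
            time.sleep(delay_seconds)
        results.append(handle(item))
    return results


# str.upper is a stand-in for a function that would fetch a page.
start = time.monotonic()
out = visit_all(['hello', 'world'], str.upper, delay_seconds=0.1)
elapsed = time.monotonic() - start
print(out)  # ['HELLO', 'WORLD']
```

A real crawler would pass a function that calls urlopen() and pick a delay long enough to stay under the server's limit.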

(1) Delay: put the requests in one large loop and pause with time.sleep() for a few seconds on each iteration.

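(2) Proxy: the request goes out from the proxy's IP address instead of yours, which hides your own address. First set up a dictionary that maps a protocol name such as 'http' to the proxy's address and port; proxy IP addresses can be looked up on sites that list them. The format is:

proxy_support = urllib.request.ProxyHandler(proxies)

where proxies is that dictionary.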

Then create a customized opener. The format is:

opener = urllib.request.build_opener(proxy_support)

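Then install the opener globally. This is a once-and-for-all approach: after installation, every call to urlopen() goes through the customized opener. The format is: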

urllib.request.install_opener(opener)

If you do not want to install it and replace the default opener, call opener.open(url) for individual requests instead.
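Put together, the proxy steps might look like the sketch below. The helper name make_proxy_opener and the address 127.0.0.1:8080 are hypothetical; a working proxy address has to be substituted before any request will succeed, so the calls that touch the network are left as comments.

```python
import urllib.request


def make_proxy_opener(proxy_address):
    """Build an opener that routes http requests through proxy_address."""
    # The dictionary maps a protocol name to the proxy that handles it.
    proxy_support = urllib.request.ProxyHandler({'http': proxy_address})
    return urllib.request.build_opener(proxy_support)


# Hypothetical proxy address; replace it with a real one.
opener = make_proxy_opener('http://127.0.0.1:8080')

# The opener now carries our ProxyHandler among its handlers.
proxy_handlers = [h for h in opener.handlers
                  if isinstance(h, urllib.request.ProxyHandler)]
print(proxy_handlers[0].proxies)  # {'http': 'http://127.0.0.1:8080'}

# One-off use, leaving the default opener untouched:
#     response = opener.open('http://example.com/')
# Global use, so that urlopen() also goes through the proxy:
#     urllib.request.install_opener(opener)
```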
