How do you handle problems with the requests library in a Python crawler?

1. requests exceptions

Network access fails in many different ways. The main exceptions raised by the requests library are:

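requests.URLRequired: a valid URL was not supplied

requests.TooManyRedirects: the maximum number of redirects was exceeded

requests.ConnectionError: a network connection error, such as a failed DNS lookup or a refused connection

requests.Timeout: the request to the URL timed out

requests.HTTPError: an HTTP error occurred

requests.ConnectTimeout: the attempt to connect to the remote server timed out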

Every exception that requests raises explicitly inherits from requests.exceptions.RequestException.
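The inheritance relationships can be checked directly. The following is a minimal sketch, assuming requests 2.x is installed; it needs no network access, and the helper name mro_names is made up for illustration.

```python
import requests


def mro_names(exc_class):
    """Return the names of exc_class and its ancestors that are defined by requests."""
    return [c.__name__ for c in exc_class.__mro__
            if c.__module__.startswith("requests")]


# The exceptions listed above.
listed = [
    requests.URLRequired,
    requests.TooManyRedirects,
    requests.ConnectionError,
    requests.Timeout,
    requests.HTTPError,
    requests.ConnectTimeout,
]

# True when every listed class derives from RequestException.
all_derive = all(
    issubclass(e, requests.exceptions.RequestException) for e in listed
)
print(all_derive)

# ConnectTimeout derives from both ConnectionError and Timeout, so an
# `except requests.ConnectionError` or `except requests.Timeout` clause
# placed earlier in a try statement would catch it first.
print(mro_names(requests.ConnectTimeout))
```

One practical consequence: put the most specific except clause first, and use RequestException as the final catch-all for anything requests itself raises.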

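2. Handling requests exceptions

Whether the request is a GET or a POST, the call either returns a response object, which we assign to the variable response, or raises an exception. The response object also provides the method response.raise_for_status(), which raises requests.HTTPError when the HTTP status code indicates an error (4xx or 5xx). Because requests follows redirects by default for GET and POST, this removes the need to test for each non-200 status code separately. When crawling pages in bulk, simply log or skip any URL that raises HTTPError, and deal with those URLs after all the other data has been collected.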
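To see which status codes actually trigger the exception, the sketch below calls raise_for_status() on hand-built Response objects. Building a Response by hand is only an illustration device (real responses come from requests.get() or requests.post()), and the helper name status_raises is hypothetical.

```python
import requests


def status_raises(status_code):
    """Report whether raise_for_status() raises HTTPError for a status code.

    The Response is constructed manually so that no network access is needed.
    """
    response = requests.Response()
    response.status_code = status_code
    try:
        response.raise_for_status()
    except requests.HTTPError:
        return True
    return False


for code in (200, 204, 304, 404, 500):
    print(code, status_raises(code))
```

Only the 4xx and 5xx codes raise; a 204 or 304 passes silently, so code that needs exactly 200 still has to check response.status_code itself.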

3. A small exception-handling example

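# -*- coding: utf-8 -*-
"""
Created on Tue Apr 16 23:08:21 2019

@author: www.lizenghai.com
"""
import requests

# Assumed URLs, inferred from the output below (the host is taken from the
# error message): a valid page, a missing page, and a host that does not resolve.
urls = [
    'http://www.lizenghai.com',
    'http://www.lizenghai.com/404',
    'http://www.lizenghai.comxx/',
]

for i, u in enumerate(urls):
    print(u)
    timeout = 3
    if i == 0:
        # Only the first (valid) URL is used for the timeout test.
        # A connection cannot be established within one millisecond,
        # so this request raises a timeout error.
        timeout = 0.001
    try:
        response = requests.get(u, timeout=timeout)
        response.raise_for_status()  # raises HTTPError for 4xx/5xx status codes
    except requests.ConnectTimeout:
        print('Timed out!')
    except requests.HTTPError:
        print('HTTP status code is not 200')
    except Exception as e:
        print('Case without specific error handling:', e)

Running the example above produces the following output:

http://www.lizenghai.com
Timed out!
http://www.lizenghai.com/404
HTTP status code is not 200
http://www.lizenghai.comxx/
Case without specific error handling: HTTPConnectionPool(host='www.lizenghai.comxx', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 11004] getaddrinfo failed'))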
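The bulk-crawling approach from section 2 (record the failure, move on, process it later) could look roughly like the sketch below. The function name crawl and its fetch parameter are assumptions introduced here so the logic can be tried without network access; they are not part of requests. It catches requests.exceptions.RequestException rather than a bare Exception, so only errors raised by requests are treated as crawl failures.

```python
import requests


def crawl(urls, fetch=requests.get, timeout=3):
    """Fetch every URL, collecting failures instead of stopping at the first.

    `fetch` is injectable (hypothetical parameter) so the function can be
    exercised offline. Returns (pages, failures): pages maps url -> body text,
    failures maps url -> the exception that was raised.
    """
    pages, failures = {}, {}
    for url in urls:
        try:
            response = fetch(url, timeout=timeout)
            response.raise_for_status()
        except requests.exceptions.RequestException as exc:
            # Covers timeouts, connection errors and HTTP errors alike.
            failures[url] = exc
            continue
        pages[url] = response.text
    return pages, failures
```

After the run, failures can be written to a log or fed back into crawl() for a retry, while pages holds everything that succeeded.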
