Building a Distributed Crawler in 21 Days: The requests Library (Part 2)

Basic usage

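import requests

response = requests.get("...")  # URL elided in the original
# response.text is a decoded (Unicode) str; it can come out garbled if requests guesses the wrong charset
#print(response.text)
# response.content is raw bytes and must be decoded explicitly
print(response.content.decode('utf-8'))
#print(response.url)
#print(response.status_code)  # 200
#print(response.encoding)  # iso-8859-1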
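The sketch below shows why `response.text` can come out garbled. It reproduces the effect offline with plain bytes and the standard library only. It assumes the page is UTF-8 encoded while the charset guessed from the headers is ISO-8859-1, which matches the `iso-8859-1` printed above. The sample string is made up for illustration.

```python
# Bytes as a server might send them; the page itself is UTF-8 encoded.
raw = "職位資訊".encode("utf-8")  # plays the role of response.content

# Decoding with the wrong charset (what response.text does when the guessed
# encoding is ISO-8859-1) never raises an error, but it produces mojibake.
garbled = raw.decode("iso-8859-1")

# Decoding with the correct charset recovers the original text.
fixed = raw.decode("utf-8")

print(garbled == "職位資訊")  # False
print(fixed)  # 職位資訊
```

With requests itself, you get the same result by setting `response.encoding = "utf-8"` before reading `response.text`, or by decoding `response.content` explicitly as the code above does.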

Adding headers and params

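import requests

params = {}  # query-string parameters (elided in the original)
headers = {}  # request headers (elided in the original)
response = requests.get("...s", params=params, headers=headers)  # URL elided in the original
# response.content is raw bytes and must be decoded
print(response.content.decode('utf-8'))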
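requests appends the `params` dict to the URL as a percent-encoded query string. Headers, by contrast, travel as HTTP request headers and never appear in the URL. The standard-library sketch below approximates the query-string step with `urllib.parse.urlencode`. The URL `https://example.com/s` and the `wd`/`pn` parameters are hypothetical placeholders, because the original values are not shown.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical URL and parameters; the original values are elided.
base_url = "https://example.com/s"
params = {"wd": "中國", "pn": 10}

# Roughly what requests does with params=...: UTF-8 percent-encoding joined with "&".
full_url = base_url + "?" + urlencode(params)
print(full_url)  # https://example.com/s?wd=%E4%B8%AD%E5%9C%8B&pn=10

# Parsing the query string again recovers the parameters (values come back as strings).
print(parse_qs(urlsplit(full_url).query))  # {'wd': ['中國'], 'pn': ['10']}
```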

Scraping job listings from Lagou

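import requests

url = "..."  # Lagou endpoint URL (elided in the original)
data = {}  # form fields (elided in the original)
headers = {}  # request headers (elided in the original)
response = requests.post(url, data=data, headers=headers)
#print(response.text)
print(type(response.text))  # <class 'str'>
print(type(response.json()))  # <class 'dict'>
print(response.json())  # the JSON body, parsed into a dict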
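The Lagou example does two things. `data=` sends the dict as a form-encoded body (`application/x-www-form-urlencoded`), and `response.json()` parses the JSON text of the reply into Python objects. The sketch below reproduces both steps offline with the standard library. The form fields and the JSON payload are invented stand-ins, not Lagou's real parameters or response format.

```python
import json
from urllib.parse import urlencode

# Hypothetical form fields; the real ones are elided in the original.
data = {"keyword": "python", "page": 1}

# requests.post(url, data=data) sends a body encoded like this, with the header
# Content-Type: application/x-www-form-urlencoded
body = urlencode(data)
print(body)  # keyword=python&page=1

# A made-up JSON reply standing in for response.text.
text = '{"success": true, "result": {"total": 2, "jobs": ["A", "B"]}}'

# response.json() roughly amounts to parsing response.text like this.
result = json.loads(text)
print(type(text))  # <class 'str'>
print(type(result))  # <class 'dict'>
print(result["result"]["total"])  # 2
```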

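Using a proxy

import requests

proxy = {}  # maps a scheme such as "http" to a proxy URL (elided in the original)
response = requests.get("...", proxies=proxy)  # URL elided in the original
print(response.content.decode('utf-8'))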
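The `proxies` argument maps the scheme of the URL being requested to the proxy that should carry that request. The helper below is a hypothetical convenience function, not part of requests, that builds such a mapping. The host `127.0.0.1` and port `8888` are placeholders for whatever proxy you actually use.

```python
from urllib.parse import quote


def build_proxies(host: str, port: int, user: str = "", password: str = "") -> dict:
    """Hypothetical helper: build a requests-style proxies dict.

    Both http:// and https:// target URLs are routed through the same HTTP proxy.
    """
    # Percent-encode credentials so characters like "@" or ":" don't break the URL.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}@" if user else ""
    proxy_url = f"http://{auth}{host}:{port}"
    # Keys are the scheme of the URL being requested; values are the proxy's URL.
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("127.0.0.1", 8888)
print(proxies)
# {'http': 'http://127.0.0.1:8888', 'https': 'http://127.0.0.1:8888'}

# With requests installed and a proxy actually listening there:
# response = requests.get(url, proxies=proxies)
```

The `https` key reuses the `http://` proxy URL because a typical HTTP proxy tunnels HTTPS traffic with `CONNECT`. Adjust this if your proxy expects something else.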

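Keeping a login with a Session

# _*_ coding:utf-8 _*_
import requests

# 1. Create a Session object; it stores cookies across requests
session = requests.Session()
# 2. Set up the headers (contents elided in the original); the session sends them with every request
headers = {}
session.headers.update(headers)
# 3. The username and password needed to log in (elided in the original)
data = {}
# 4. Send the login request with the username and password; the cookies set after login are stored in the session
session.post("...", data=data)  # login URL elided in the original
# 5. The session now holds the post-login cookies, so it can fetch pages that require login
response = session.get("...")  # URL elided in the original
# 6. Print the response content
print(response.text)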
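`requests.Session` keeps one cookie jar that every request it sends shares. That is why the cookie set by the login in step 4 is reused in step 5. The standard-library sketch below shows the same mechanism offline. A throwaway local server (a hypothetical toy site, not a real login page) sets a cookie on `POST /login`, and a `urllib` opener with a `CookieJar` plays the role of the session. The cookie name `sessionid` and its value are made up.

```python
import threading
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import HTTPCookieProcessor, ProxyHandler, build_opener


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical toy site: POST sets a login cookie, GET checks for it."""

    def do_POST(self):
        # Read and discard the login form; a real site would check credentials.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Set-Cookie", "sessionid=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        logged_in = "sessionid=abc123" in self.headers.get("Cookie", "")
        body = b"welcome back" if logged_in else b"please log in"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# Like requests.Session(): one opener whose cookie jar is shared by every request.
# ProxyHandler({}) ignores any proxy environment variables for this local demo.
jar = CookieJar()
opener = build_opener(ProxyHandler({}), HTTPCookieProcessor(jar))
opener.open(base + "/login", data=b"user=demo&password=demo").close()
page = opener.open(base + "/profile").read().decode("utf-8")
print(page)  # welcome back

# A fresh opener without the jar behaves like a plain requests.get().
anonymous = build_opener(ProxyHandler({})).open(base + "/profile").read().decode("utf-8")
print(anonymous)  # please log in

server.shutdown()
server.server_close()
```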