Writing a Web Crawler in Python with the Requests Library (Part 1)

2021-08-13 19:38:30

# -*- coding: utf-8 -*-
import requests

url = ''  # target address (elided in the original)

r = requests.get(url)
print(r.text)

# -*- coding: utf-8 -*-
import requests

url = ''  # target address (elided in the original)
payload = {}  # query-string parameters (contents elided in the original)

r = requests.get(url, params=payload)
print(r.text)
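To see how `params` is turned into a query string without going over the network, a prepared request can be inspected. The URL and parameters below are placeholders for illustration:

```python
import requests

# Build (but do not send) a GET request with query-string parameters.
req = requests.Request('GET', 'https://example.com/search',
                       params={'q': 'python', 'page': 2})
prepped = req.prepare()

# The params dict is urlencoded and appended to the URL.
print(prepped.url)  # https://example.com/search?q=python&page=2
```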

# -*- coding: utf-8 -*-
import requests

url1 = ''  # login address (elided in the original)
url2 = ""  # an address that requires login (elided in the original)

data = {}  # login form fields (elided in the original)
headers = {
    "accept-encoding": "gzip",
    "accept-language": "zh-cn,zh;q=0.8",
    "referer": "",
}

res1 = requests.post(url1, data=data, headers=headers)
res2 = requests.get(url2, cookies=res1.cookies, headers=headers)

print(res2.content)      # raw binary response body
print(res2.raw)          # raw socket response; requires stream=True on the request
print(res2.raw.read(50))
print(type(res2.text))   # body decoded to unicode
print(res2.url)
print(res2.history)      # tracks redirects
print(res2.cookies)
print(res2.cookies['example_cookie_name'])
print(res2.headers)
print(res2.headers['content-type'])
print(res2.headers.get('content-type'))
print(res2.json())       # body parsed as JSON; note json() is a method
print(res2.encoding)     # encoding of the response body
print(res2.status_code)  # HTTP status code
res2.raise_for_status()  # raises an HTTPError for 4xx/5xx status codes
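The difference between `content`, `text`, and `json()` can be sketched offline by filling in a `Response` object by hand. This touches the private `_content` attribute purely so the demo needs no network; real code would never set it:

```python
import requests

# Construct a Response manually instead of going over the network.
resp = requests.models.Response()
resp.status_code = 200
resp.encoding = 'utf-8'
resp._content = b'{"ok": true}'  # private attribute, set here only for the demo

print(resp.content)      # b'{"ok": true}'  -- raw bytes
print(resp.text)         # '{"ok": true}'   -- bytes decoded with resp.encoding
print(resp.json())       # {'ok': True}     -- parsed JSON; json() is a method
resp.raise_for_status()  # 200 is not an error, so nothing is raised
```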

# -*- coding: utf-8 -*-
import requests

s = requests.session()

url1 = ''  # login address (elided in the original)
url2 = ""  # an address that requires login (elided in the original)

data = {}  # login form fields (elided in the original)
headers = {
    "accept-encoding": "gzip",
    "accept-language": "zh-cn,zh;q=0.8",
    "referer": "",
}

# note: requests.Request (the class) is what gets prepared here;
# requests.request (the function) would send immediately and return a Response
prepped1 = requests.Request('POST', url1,
                            data=data,
                            headers=headers
                            ).prepare()
s.send(prepped1)

'''This can also be written as:
req = requests.Request('POST', url1,
                       data=data,
                       headers=headers
                       )
prepared = s.prepare_request(req)
# do something with prepared.body
# do something with prepared.headers
s.send(prepared)
'''

# s.prepare_request (rather than a bare prepare()) merges in the session's
# state, so the login cookies from the first request are carried along
prepped2 = s.prepare_request(requests.Request('POST', url2,
                                              headers=headers))
res2 = s.send(prepped2)
print(res2.content)
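What `prepare()` actually produces can be inspected without sending anything. The URL and form fields below are placeholders:

```python
import requests

req = requests.Request('POST', 'https://example.com/login',
                       data={'user': 'alice'},
                       headers={'referer': ''})
prepped = req.prepare()

# The form data has been urlencoded into the body,
# and a Content-Type header was added automatically.
print(prepped.method)                   # POST
print(prepped.body)                     # user=alice
print(prepped.headers['Content-Type'])  # application/x-www-form-urlencoded
```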

# -*- coding: utf-8 -*-
import requests

s = requests.session()

url1 = ''  # login address (elided in the original)
url2 = ""  # a page address that requires login (elided in the original)

data = {}  # login form fields (elided in the original)
headers = {
    "accept-encoding": "gzip",
    "accept-language": "zh-cn,zh;q=0.8",
    "referer": "",
}
s.headers.update(headers)  # apply the headers to every request on this session

res1 = s.post(url1, data=data)
res2 = s.get(url2)  # the session carries the login cookies automatically
print(res2.content)
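The point of the session is that cookies stored on it are attached to every subsequent request automatically. This can be verified offline with `prepare_request`; the cookie name and value here are made up:

```python
import requests

s = requests.Session()
# Simulate a cookie that a login response would have stored on the session.
s.cookies.set('sessionid', 'abc123')

# prepare_request merges session state (including cookies) into the request.
prepped = s.prepare_request(requests.Request('GET', 'https://example.com/profile'))
print(prepped.headers.get('Cookie'))  # sessionid=abc123
```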

The Requests API supports the other HTTP verbs as well:

>>> r = requests.put("")

>>> r = requests.delete("")

>>> r = requests.head("")

>>> r = requests.options("")
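These helpers differ only in the HTTP verb they send, which a quick offline check with prepared requests makes visible (placeholder URL):

```python
import requests

# Each helper maps directly onto the corresponding HTTP method.
for method in ('PUT', 'DELETE', 'HEAD', 'OPTIONS'):
    prepped = requests.Request(method, 'https://example.com/resource').prepare()
    print(prepped.method, prepped.url)
```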

Running this from cmd, I hit a small error:

UnicodeEncodeError: 'gbk' codec can't encode character u'\xbb' in
position 23460: illegal multibyte sequence

Analysis:

1. Was it encoding or decoding that failed?

UnicodeEncodeError
Clearly, the error occurred while encoding.

2. Which codec was used?

'gbk' codec can't encode character
The failure happened while encoding with gbk.

Next, determine the encoding of the string at hand, for example:

# -*- coding: utf-8 -*-
import requests

url = ''  # target address (elided in the original)

r = requests.get(url)
print(r.encoding)

>utf-8

Having confirmed that the HTML string is utf-8, it can simply be encoded as utf-8:

print(r.text.encode('utf-8'))
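The error can be reproduced in isolation: gbk has no mapping for u'\xbb' (the character named in the traceback above), while utf-8 can encode any unicode character:

```python
# u'\xbb' is the character from the traceback above.
text = u'\xbb'

try:
    text.encode('gbk')
except UnicodeEncodeError as e:
    # Same failure as printing to a gbk console.
    print('gbk failed:', e.reason)

# utf-8 handles it without complaint.
print(text.encode('utf-8'))  # b'\xc2\xbb'
```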
