Crawler in Practice: Multi-Page Scraping of Qiushibaike Jokes

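Suppose we want to scrape the jokes on Qiushibaike (http://www.qiushibaike.com/). We can write a Python web crawler to do it.

The approach and steps for this Qiushibaike crawler project are as follows:

Analyze the URL pattern across pages, construct a URL variable, and use a for loop to crawl the content of multiple pages.

Build a custom function dedicated to scraping the jokes on a single page. Each joke has two parts: the user who posted it and the joke text. The function works as follows. First, it accesses the page while posing as a browser; by examining the page source, the format of the user information and the format of the joke content are each written as a regular expression. Next, it uses these regular expressions to extract all users and all joke contents on the page. It then loops over the contents with a for loop and assigns each one to its own variable; the variable names follow a pattern, "content" plus a sequence number. Finally, it loops over the users with another for loop and prints each user together with the corresponding content.

Use a for loop to obtain the URL of each page in turn, and call getcontent(url, page) once per page.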
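
The multi-page step above can be sketched as follows. This is a minimal sketch, assuming each page is addressed by a page number appended to a base URL; the `/8hr/page/` path and the helper name `build_page_urls` are hypothetical, and the real pattern should be read off the site's pagination links.

```python
def build_page_urls(base_url, first_page, last_page):
    """Return (page_number, url) pairs for an inclusive page range."""
    # Assumption: the page number is simply appended to the base URL.
    return [(page, base_url + str(page))
            for page in range(first_page, last_page + 1)]


# Hypothetical base URL; adjust it to the pattern observed on the site.
BASE_URL = "http://www.qiushibaike.com/8hr/page/"

for page, url in build_page_urls(BASE_URL, 1, 3):
    # In the crawler, this is where getcontent(url, page) would be called.
    print(page, url)
```

Keeping URL construction separate from fetching makes the page range easy to check before any request is sent.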

Users

Inspect the element and view the page source. After inspecting a few more users, we can define the rule:

userpat = str('')

Content

So we can define the regular expression for the content:

contentpat = '(.*?)'
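
To illustrate how two such patterns are applied, here is a minimal sketch on a hand-written HTML fragment. The markup, and therefore both patterns, are assumptions for illustration only; the real patterns must come from the page source inspected above.

```python
import re

# Hypothetical markup; the real site's structure may differ.
SAMPLE_HTML = """
<div class="author"><h2>
alice
</h2></div>
<div class="content"><span>
first joke
</span></div>
<div class="author"><h2>
bob
</h2></div>
<div class="content"><span>
second joke,<br/>two lines
</span></div>
"""

# Hypothetical patterns that match the sample above.
userpat = r'<h2>(.*?)</h2>'
contentpat = r'<div class="content"><span>(.*?)</span>'

# re.S lets "." match newlines, so a capture can span several lines.
users = [u.strip() for u in re.compile(userpat, re.S).findall(SAMPLE_HTML)]
contents = [c.strip() for c in re.compile(contentpat, re.S).findall(SAMPLE_HTML)]

for user, content in zip(users, contents):
    print(user, "->", content)
```

The non-greedy `(.*?)` stops at the first closing tag, which keeps each capture limited to one user or one joke.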

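import re
import urllib.request


def getcontent(url, page):
    # Pose as a browser. The User-Agent value is a placeholder;
    # substitute the string of a real browser.
    headers = ("User-Agent", "Mozilla/5.0")
    opener = urllib.request.build_opener()
    opener.addheaders = [headers]
    # Install the opener globally
    urllib.request.install_opener(opener)
    url_request = urllib.request.Request(url)
    html1 = urllib.request.urlopen(url_request, timeout=10)
    data = html1.read().decode('utf-8')
    # Build the user regular expression
    # (the surrounding HTML tags must be taken from the page source)
    userpat = str('')
    # Build the content regular expression
    # (the surrounding HTML tags must be taken from the page source)
    contentpat = '(.*?)'
    # Find all users
    userlist = re.compile(userpat, re.S).findall(data)
    # Find all contents
    contentlist = re.compile(contentpat, re.S).findall(data)
    # Explicit namespace, so that names created by exec() persist
    # between calls even inside a function
    namespace = {}
    x = 1
    # Loop over the jokes and assign each one to its own variable
    for content in contentlist:
        content = content.replace('\n', '')
        # Use a string as the variable name: build the name first
        name = "content" + str(x)
        namespace['content'] = content
        exec(name + '=content', namespace)
        x += 1
    y = 1
    # Loop over the users and print the content that belongs to each one
    for user in userlist:
        name = 'content' + str(y)
        print('User ' + str(page) + str(y) + ' is: ' + user)
        print('Content is:')
        exec('print(' + name + ')', namespace)
        y += 1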
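
exec() is an interesting and useful built-in function. Unlike eval(), which only evaluates a single expression and returns its result, exec() can dynamically execute arbitrary Python code, including statements, which makes it very powerful.

Here is a simple example first. The code is as follows:

i = 12
j = 13
exec("answer=i*j")
print("answer is %s" % answer)

Output: answer is 156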
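
Names created by exec() can be hard to track, so it can help to run the code in an explicit namespace dictionary. The sketch below is one hedged variant of the example above, not the original code; the names `namespace` and `contents` are made up for illustration. It also shows a plain dictionary producing the same "content" plus sequence number naming without exec().

```python
# Run the statement in an explicit namespace instead of the current scope.
namespace = {"i": 12, "j": 13}
exec("answer = i * j", namespace)
print("answer is %s" % namespace["answer"])

# A dict keyed by generated names avoids exec() entirely.
contents = {}
for index, text in enumerate(["first joke", "second joke"], start=1):
    contents["content" + str(index)] = text
print(contents["content2"])
```

The dictionary version is usually easier to debug, since every generated name and its value can be inspected in one place.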