A simple Python scraper for the Maoyan movie rankings

1 Import the required libraries

import requests
import re
import json

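requests sends the request to a URL, much like typing a web address into a browser and loading the page.

re extracts the data we want with regular expressions.

json serializes each parsed record into a JSON string, so that it can be written to a text file one record per line.

2 To scrape the data we want, the first step is of course to fetch the HTML source of the target page.

Here we use the Maoyan movie ranking board as the example; the URL of its page is [URL missing].

Without further ado, here is the code.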

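def get_one_page(url):
    # The header value is missing from the source; use your browser's User-Agent string.
    headers = {'User-Agent': '...'}
    response = requests.get(url, headers=headers)
    # 200 means the request succeeded; any other status returns None.
    if response.status_code == 200:
        return response.text
    return None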

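The code above defines a function that takes a URL and returns the HTML document of the target page.

headers carries a User-Agent field, which identifies the client as a browser. In effect, we disguise the script as a browser when making the request.

The get() function of the requests library then sends a GET request.

response.status_code == 200 means the page request succeeded.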
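The request step described above can fail in ways other than a non-200 status (timeouts, dropped connections). The following is a minimal sketch of a more defensive variant, not the original author's code: the name fetch_html, the DEFAULT_HEADERS value, and the optional session parameter are assumptions introduced for illustration.

```python
import requests

# Placeholder header value (an assumption); substitute a real browser's User-Agent.
DEFAULT_HEADERS = {'User-Agent': 'Mozilla/5.0'}


def fetch_html(url, session=None, timeout=10):
    """Return the page body as text, or None if the request fails."""
    if session is None:
        session = requests.Session()
    try:
        response = session.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    except requests.RequestException:
        # Covers connection errors, timeouts, and malformed URLs.
        return None
    if response.status_code == 200:
        return response.text
    # Any other status (403, 404, 500, ...) is treated as a failed fetch.
    return None
```

Passing a session in makes it possible to reuse one connection across pages, and also lets the function be exercised with a stand-in object instead of the network.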

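Finally, get_one_page returns the HTML document of the requested page via return response.text; part of the returned result is shown in the figure below:

2 Once the HTML of the target page has been fetched, it has to be parsed to get the data we want.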

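def parse_one_page(html):
    # The HTML tags inside this pattern were stripped from the source text;
    # <dd>, </i> and </a> are restored here as an assumption.
    pattern = re.compile(
        '<dd>.*?board-index.*?>(.*?)</i>.*?name.*?a.*?>(.*?)</a>', re.S)
    items = re.findall(pattern, html)
    print(items)
    for item in items:
        # The yielded value is missing from the source; these keys are assumed.
        yield {'index': item[0], 'title': item[1]}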

The code above is the function we wrote to parse the HTML.

Here a regular expression matches the data: it captures each movie's rank and its title.
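To see how a pattern of this kind behaves, here is a minimal sketch run against a made-up snippet. The markup in SAMPLE_HTML, the tag names written into the pattern, and the parse_rows helper are all assumptions for illustration; the real page may be structured differently.

```python
import re

# Hypothetical markup, loosely modeled on a ranking list.
SAMPLE_HTML = """
<dd>
  <i class="board-index board-index-1">1</i>
  <p class="name"><a href="/films/1" title="Film A">Film A</a></p>
</dd>
<dd>
  <i class="board-index board-index-2">2</i>
  <p class="name"><a href="/films/2" title="Film B">Film B</a></p>
</dd>
"""

PATTERN_TEXT = r'<dd>.*?board-index.*?>(.*?)</i>.*?name.*?a.*?>(.*?)</a>'

# re.S (DOTALL) lets "." match newlines, so one match can span several lines.
pattern = re.compile(PATTERN_TEXT, re.S)


def parse_rows(html):
    """Yield one dict per list entry: the rank and the title."""
    for index, title in pattern.findall(html):
        yield {'index': index.strip(), 'title': title.strip()}


for row in parse_rows(SAMPLE_HTML):
    print(row)

# Without re.S the same pattern finds nothing here, because every entry
# contains line breaks between <dd> and the rank element.
print(re.findall(PATTERN_TEXT, SAMPLE_HTML))
```

Each (.*?) is a non-greedy group, so it stops at the first closing tag that follows it rather than running on to the last one in the document.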

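parse_one_page then yields the results one at a time as a generator; part of the output is shown in the figure below:

3 With the parsed results in hand, the next step is to save the data.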

def write_to_file(content):
    # Append mode: each call adds one JSON record on its own line.
    with open('result1.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')

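The dumps function of the json library serializes a Python object (here, one parsed record) into a JSON string. Passing ensure_ascii=False keeps Chinese characters readable in the file instead of escaping them as \uXXXX sequences.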
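The effect of ensure_ascii is easiest to see side by side. This is a small sketch rather than part of the original script: the sample record and the append_record and read_records helpers are assumptions made for illustration.

```python
import json


def append_record(path, record):
    """Append one record to the file as a single JSON line."""
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(record, ensure_ascii=False) + '\n')


def read_records(path):
    """Read the file back into a list of Python objects."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


# Sample record for illustration only.
record = {'index': '1', 'title': '電影'}

escaped = json.dumps(record)                       # non-ASCII becomes \uXXXX
readable = json.dumps(record, ensure_ascii=False)  # characters kept as-is

print(escaped)
print(readable)
```

Both strings decode to the same object with json.loads, so the choice only affects how readable the saved file is. Opening the file with encoding='utf-8' matters once ensure_ascii=False is used, since the platform default encoding may not be able to represent the characters.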

4 Write the main function, which calls the functions above.

def main(offset):
    # The base URL is missing from the source; prepend the ranking page's URL.
    url = '?offset=' + str(offset)
    html = get_one_page(url)
    tt = parse_one_page(html)
    for n in tt:
        write_to_file(n)
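The offset parameter is what pages through the ranking: each page is requested with a different offset value. As a hedged sketch, the page URLs can be generated up front. The base URL below is a placeholder, not the real address, and page_urls is a hypothetical helper; the page size of 10 is assumed from the script's step of i*10.

```python
from urllib.parse import urlencode

# Placeholder address (an assumption); replace with the real ranking page URL.
BASE_URL = 'https://example.com/board'


def page_urls(base_url, pages=10, page_size=10):
    """Yield one URL per page: offset=0, 10, 20, ..."""
    for page in range(pages):
        query = urlencode({'offset': page * page_size})
        yield base_url + '?' + query


for url in page_urls(BASE_URL, pages=3):
    print(url)
```

Building the query string with urlencode instead of string concatenation keeps the URL valid if more parameters are added later.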

5 Part of the scraped results

6 The complete code

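import requests
import re
import json


def get_one_page(url):
    # The header value is missing from the source; use your browser's User-Agent string.
    headers = {'User-Agent': '...'}
    response = requests.get(url, headers=headers)
    # 200 means the request succeeded; any other status returns None.
    if response.status_code == 200:
        return response.text
    return None


def write_to_file(content):
    # Append mode: each call adds one JSON record on its own line.
    with open('result1.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def main(offset):
    # The base URL is missing from the source; prepend the ranking page's URL.
    url = '?offset=' + str(offset)
    html = get_one_page(url)
    tt = parse_one_page(html)
    for n in tt:
        write_to_file(n)


def parse_one_page(html):
    # The HTML tags inside this pattern were stripped from the source text;
    # <dd>, </i> and </a> are restored here as an assumption.
    pattern = re.compile(
        '<dd>.*?board-index.*?>(.*?)</i>.*?name.*?a.*?>(.*?)</a>', re.S)
    items = re.findall(pattern, html)
    print(items)
    for item in items:
        # The yielded value is missing from the source; these keys are assumed.
        yield {'index': item[0], 'title': item[1]}


if __name__ == "__main__":
    # Ten pages of ten entries each: offsets 0, 10, ..., 90.
    for i in range(10):
        main(offset=i * 10)
        print(i)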
