Scraping the ZOL Sony Camera Rankings

2022-09-08 20:06:12 · 1342 words · 3201 reads


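import requests
import re
import json
from bs4 import BeautifulSoup  # imported in the original but not used below

def get_one_page(url):
    # The original User-Agent string was lost; this browser-like value is a placeholder
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
    headers = {'User-Agent': user_agent}
    # Pass headers by keyword: the second positional parameter of requests.get is params
    response = requests.get(url, headers=headers)
    return response.text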

Fetch the page content.
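Note that `requests.get(url, headers)` would pass the dict positionally, and the second positional parameter of `requests.get` is `params`, so the User-Agent would end up in the query string instead of the request headers; it has to be passed as `headers=headers`. As a stdlib-only sketch of the same step (swapping `urllib.request` in for `requests`; the User-Agent value, the default `utf-8` encoding, and the names `build_request`/`fetch_html` are assumptions), the header is attached like this:

```python
import urllib.request

# Placeholder User-Agent string (an assumption; any realistic browser UA works)
DEFAULT_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url: str, user_agent: str = DEFAULT_USER_AGENT) -> urllib.request.Request:
    """Attach the User-Agent as an HTTP header, not as a query parameter."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, timeout: float = 10.0, encoding: str = "utf-8") -> str:
    """Download a page and decode it; raises urllib.error.HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode(encoding, errors="replace")


# Building the request does not touch the network
req = build_request("https://example.com/rank.html")
```

The URL stays free of query parameters and the header travels separately. If the page declares a charset other than UTF-8, pass it as `encoding`.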

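def get_information(html_text):
    # Regex over the raw HTML. The HTML tags between the capture groups were lost
    # from the original pattern; the '<' anchors below are an assumption, so check
    # the pattern against the actual ZOL page source before relying on it.
    pattern = re.compile(r'shtml">(.*?)<.*?"rank__price">(.*?)<.*?>(.*?)<', re.S)
    items = re.findall(pattern, html_text)
    for item in items:
        # The yielded fields were lost from the original; these keys are hypothetical
        yield {'name': item[0], 'price': item[1], 'extra': item[2]}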

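Regex matching: findall returns a list of matches, and each element is a tuple of the captured groups; yield assembles each tuple into a data structure.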
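To see what `findall` and `yield` do together, here is a small self-contained sketch run against made-up HTML. The markup, the `</a>`/`</span>` closing tags, and the names `SAMPLE_HTML`, `PATTERN` and `parse_items` are assumptions for illustration; only the `shtml">` prefix and the `"rank__price"` class come from the original pattern.

```python
import re

# Hypothetical markup: only the 'shtml">' link text and the "rank__price" class
# come from the original pattern; ZOL's real page structure may differ.
SAMPLE_HTML = """
<li>
  <a href="/camera/1.shtml">Camera A</a>
  <span class="rank__price">¥1999</span>
</li>
<li>
  <a href="/camera/2.shtml">Camera B</a>
  <span class="rank__price">¥2999</span>
</li>
"""

# Two capture groups, so findall returns a list of 2-tuples.
# re.S lets '.' match newlines, which the gap between the link and the price needs.
PATTERN = re.compile(r'shtml">(.*?)</a>.*?"rank__price">(.*?)</span>', re.S)


def parse_items(html_text):
    """Yield one dict per (name, price) tuple found by findall."""
    for name, price in PATTERN.findall(html_text):
        yield {"name": name, "price": price}


matches = PATTERN.findall(SAMPLE_HTML)
items = list(parse_items(SAMPLE_HTML))
# Without re.S the pattern cannot cross the line break, so nothing matches
matches_without_dotall = re.findall(PATTERN.pattern, SAMPLE_HTML)
```

The last line shows why `re.S` matters when the fields sit on different lines of the page source.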

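def recording(information):
    # Append each record as one JSON line. The filename ('豆瓣top250.txt', i.e.
    # "Douban Top 250") is kept from the original; it appears to be left over
    # from an earlier Douban scraper.
    with open('豆瓣top250.txt', 'a', encoding='utf-8') as f:
        # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes
        f.write(json.dumps(information, ensure_ascii=False) + '\n')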

Write the scraped information to a file.
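A minimal sketch of the same approach (append one JSON object per line), writing to a temporary file so it can run anywhere. The helper names `append_record`/`read_records` and the sample records are hypothetical; the last line shows what `ensure_ascii` changes.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single JSON line (JSON Lines style)."""
    with open(path, "a", encoding="utf-8") as f:
        # ensure_ascii=False writes non-ASCII text as-is instead of \uXXXX escapes
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Read the file back, parsing each non-empty line as JSON."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "records.txt")
append_record(path, {"name": "索尼", "price": "¥1999"})  # made-up sample data
append_record(path, {"name": "Camera B", "price": "¥2999"})

records = read_records(path)
with open(path, encoding="utf-8") as f:
    raw_text = f.read()

escaped = json.dumps({"name": "索尼"})  # default ensure_ascii=True escapes the characters
```

Reopening the file for every record, as `recording` does, is simple but slow for long lists. Opening it once and passing the file handle in would avoid that.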

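def main():
    for i in range(0, 1):  # only one page (page 0) is scraped
        # The ranking-page URL was elided in the original; fill it in before running
        response = get_one_page('')
        items = get_information(response)
        for m in items:
            recording(m)
        print('Scraping page ' + str(i))
    print('Scraping finished!')

if __name__ == '__main__':
    main()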
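The loop in `main()` runs over `range(0, 1)`, so only one page is fetched. If more pages are needed, a sketch like the following keeps the fetch, parse and save steps separate and pauses between requests. The `crawl` function and its parameters are hypothetical. The page URLs are left to the caller because the original URL is not given, and the fake pages below only let the sketch run offline.

```python
import time
from typing import Callable, Iterable, Iterator


def crawl(
    page_urls: Iterable[str],
    fetch: Callable[[str], str],
    parse: Callable[[str], Iterator[dict]],
    save: Callable[[dict], None],
    delay: float = 1.0,
) -> int:
    """Fetch, parse and save every page in turn; return the number of records saved."""
    saved = 0
    for i, url in enumerate(page_urls):
        if i > 0:
            time.sleep(delay)  # pause between requests to avoid hammering the site
        print(f"Scraping page {i}")
        for record in parse(fetch(url)):
            save(record)
            saved += 1
    print("Scraping finished!")
    return saved


# Offline demo: fake "pages" stand in for downloaded HTML
fake_pages = {"page-0": "a,b", "page-1": "c"}
out = []
saved = crawl(
    ["page-0", "page-1"],
    fetch=fake_pages.__getitem__,
    parse=lambda text: ({"name": n} for n in text.split(",")),
    save=out.append,
    delay=0,
)
```

With the real functions, `get_one_page`, `get_information` and `recording` would slot in as `fetch`, `parse` and `save`.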
