Crawling Dangdang's Top 500 Five-Star Books

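Open the address of this book ranking list. You will see the page below, which shows 20 books per page.

As you turn the pages, you will notice that the address changes: whichever page you move to, the last parameter of the link changes with it. So in Python we can use a variable to fetch the content of different pages.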
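The page count follows from simple arithmetic: 500 books at 20 per page is 25 pages, which is why the loop at the end runs over `range(1, 26)`. Below is a minimal sketch of building the per-page URLs. It assumes a hypothetical `BASE_URL` (a placeholder, not Dangdang's real address) to which the page number is appended as the last parameter.

```python
import math

# Hypothetical base URL: the real ranking-list address is not reproduced here.
# Check in the browser that the page number really is the last parameter.
BASE_URL = "https://example.com/bestsellers/fivestars/page-"

BOOKS_WANTED = 500   # Top 500
BOOKS_PER_PAGE = 20  # each ranking page shows 20 books


def page_count(total, per_page):
    """Pages needed to cover `total` items when each page holds `per_page`."""
    return math.ceil(total / per_page)


def page_urls(base_url, pages):
    """One URL per page: the page number is appended as the last parameter."""
    return [base_url + str(page) for page in range(1, pages + 1)]


pages = page_count(BOOKS_WANTED, BOOKS_PER_PAGE)  # 25, hence range(1, 26)
urls = page_urls(BASE_URL, pages)
print(urls[0])   # https://example.com/bestsellers/fivestars/page-1
print(urls[-1])  # https://example.com/bestsellers/fivestars/page-25
```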

You can also see the request headers of our GET request and the data returned by the server.

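For each of the top 500 books, what we want is its rank, title, address, author, recommendation index, number of five-star ratings, and price.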

Looking at the page source, we can see that this information sits inside HTML tags.
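The regex approach is easiest to follow on a small piece of markup. The snippet below is a hypothetical, simplified stand-in for a ranking page: only the `list_num` class and the `¥` price prefix come from this article's pattern, while the `name` and `author` classes are assumptions. The pattern uses non-greedy `.*?` to jump from one landmark to the next inside each entry, and `re.S` so that `.` also matches newlines.

```python
import re

# Hypothetical, simplified markup for two entries of a ranking page.
# Only list_num and the ¥ prefix come from the article; other classes are assumed.
SAMPLE_HTML = """
<ul>
<li>
  <div class="list_num">1.</div>
  <div class="name"><a href="#" title="Book One">Book One</a></div>
  <div class="author">Author One</div>
  <p><span class="price">¥39.50</span></p>
</li>
<li>
  <div class="list_num">2.</div>
  <div class="name"><a href="#" title="Book Two">Book Two</a></div>
  <div class="author">Author Two</div>
  <p><span class="price">¥25.00</span></p>
</li>
</ul>
"""

# Each .*? skips as little as possible, so one match stays inside one <li>;
# re.S lets "." cross the line breaks within an entry.
PATTERN = re.compile(
    r'<li>.*?list_num.*?(\d+)\.</div>'   # rank
    r'.*?class="name".*?title="(.*?)"'   # title
    r'.*?class="author">(.*?)</div>'     # author
    r'.*?¥(.*?)</span>'                  # price
    r'.*?</li>',
    re.S,
)

items = PATTERN.findall(SAMPLE_HTML)
for item in items:
    print(item)
# ('1', 'Book One', 'Author One', '39.50')
# ('2', 'Book Two', 'Author Two', '25.00')
```

`findall` returns one tuple per match, with one element per capture group, in the order the groups appear in the pattern.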

Main approach

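We use a page variable to turn the pages, request Dangdang with requests, and parse the returned HTML with regular expressions. Since we have not covered databases yet, the parsed content is simply saved to a file.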

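def main(page):
    url = '' + str(page)  # base URL omitted in the original; the page number is the last parameter
    html = request_dandan(url)
    if html is None:  # request failed, nothing to parse
        return
    items = parse_result(html)  # parse and keep only the information we want
    for item in items:
        write_item_to_file(item)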

First we need to request Dangdang: use the get method of the requests module to send a GET request.

import requests

def request_dandan(url):
    try:
        response = requests.get(url)
        if response.status_code == 200:
            return response.text
    except requests.RequestException:  # connection errors, timeouts, etc.
        return None
    return None  # any non-200 status

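This gives us the response body returned by the server, that is, the page source, which we now need to parse.

We use a regular expression to extract the key information we want, and then wrap each result up as a record.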

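import re

def parse_result(html):
    # The HTML tags in this pattern were lost when the article was extracted;
    # only these fragments survive (rank after list_num, price after ¥),
    # so the pattern is incomplete as shown.
    pattern = re.compile(r'.*?list_num.*?(\d+)..*?.*?¥(.*?).*?', re.S)
    items = re.findall(pattern, html)
    for item in items:
        # wrap each match tuple into a dict (only the surviving groups)
        yield {'rank': item[0], 'price': item[1]}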
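One way to do the "wrap the data" step is with named groups: each match's `groupdict()` is then already the dict you want to store. This is a minimal sketch on hypothetical, simplified markup; the field names and the `title` attribute are assumptions, not the original article's pattern.

```python
import re

# Hypothetical, simplified markup for two list entries (not the real page).
SAMPLE_HTML = """
<li><div class="list_num">1.</div><a title="Book One"></a><span class="price">¥39.50</span></li>
<li><div class="list_num">2.</div><a title="Book Two"></a><span class="price">¥25.00</span></li>
"""

# Named groups (?P<name>...) label each field, so no manual item[0], item[1] indexing.
PATTERN = re.compile(
    r'<li>.*?list_num.*?(?P<rank>\d+)\.</div>'
    r'.*?title="(?P<title>.*?)"'
    r'.*?¥(?P<price>[\d.]+)'
    r'.*?</li>',
    re.S,
)


def parse_result(html):
    """Yield one dict per book entry found in `html`."""
    for match in PATTERN.finditer(html):
        yield match.groupdict()


records = list(parse_result(SAMPLE_HTML))
print(records)
```

Compared with building the dict by index, a named group keeps the field name next to the part of the pattern that captures it, so adding or reordering fields cannot silently shift the keys.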

If you have learned BeautifulSoup, parsing this source is much easier.
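BeautifulSoup is a third-party package (`bs4`). To keep this sketch dependency-free it swaps in the standard library's `html.parser.HTMLParser`, an event-based parser with a lower-level API than BeautifulSoup, to show what parsing by tags and attributes instead of by regex looks like. The markup and class names are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup (not the real page).
SAMPLE_HTML = """
<ul>
<li><div class="list_num">1.</div><div class="name"><a title="Book One">Book One</a></div></li>
<li><div class="list_num">2.</div><div class="name"><a title="Book Two">Book Two</a></div></li>
</ul>
"""


class RankingParser(HTMLParser):
    """Collect {'rank', 'title'} records from the hypothetical markup."""

    def __init__(self):
        super().__init__()
        self.books = []        # finished records
        self._current = {}     # record for the <li> being parsed
        self._in_rank = False  # True while inside <div class="list_num">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self._current = {}
        elif tag == "div" and "list_num" in (attrs.get("class") or "").split():
            self._in_rank = True
        elif tag == "a" and "title" in attrs:
            self._current["title"] = attrs["title"]

    def handle_data(self, data):
        if self._in_rank:
            self._current["rank"] = data.strip().rstrip(".")

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_rank = False
        elif tag == "li" and self._current:
            self.books.append(self._current)
            self._current = {}


parser = RankingParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.books)
```

Because it reacts to tags and attributes rather than to character positions, this kind of parser is less sensitive than a single long regex to extra whitespace or attribute order in the page.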

Finally, write the results to a file.

import json

def write_item_to_file(item):
    print('Writing data ====> ' + str(item))
    # append one JSON object per line; the with-block closes the file itself
    with open('book.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(item, ensure_ascii=False) + '\n')
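Writing one `json.dumps` result per line produces a "JSON Lines" file, which can be read back one `json.loads` per line. This sketch shows the round trip and what `ensure_ascii=False` does: Chinese text is stored as-is instead of as `\uXXXX` escapes. The `path` parameter and `read_items` helper are illustrative additions, and the sample records are made up.

```python
import json
import os
import tempfile


def write_item_to_file(item, path):
    """Append one record as a single line of JSON."""
    with open(path, "a", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese characters readable in the file
        f.write(json.dumps(item, ensure_ascii=False) + "\n")


def read_items(path):
    """Read the records back, one json.loads() per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "book.txt")  # throwaway location
write_item_to_file({"rank": "1", "title": "範例書名"}, path)
write_item_to_file({"rank": "2", "title": "Example Title"}, path)

records = read_items(path)
with open(path, encoding="utf-8") as f:
    raw = f.read()
print(raw)
```

Since the file is opened in append mode (`'a'`), running the crawler twice appends a second copy of every record. Delete the file first if you want a clean run.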

Putting it all together:

import requests
import re
import json


def write_item_to_file(item):
    print('Writing data ====> ' + str(item))
    with open('dangdang.txt', 'a', encoding='utf-8') as f:  # the with-block closes the file
        f.write(json.dumps(item, ensure_ascii=False) + '\n')


def func2(page):
    url = '' + str(page)  # base URL omitted in the original
    html = request_dandan(url)
    if html is None:  # request failed, skip this page
        return
    items = parse_result(html)
    for item in items:
        write_item_to_file(item)


def request_dandan(url):
    try:
        response = requests.get(url)
        if response.status_code == 200:
            return response.text
    except requests.RequestException:
        return None
    return None  # any non-200 status


def parse_result(html):
    # The HTML tags in this pattern were lost in extraction; as shown it is incomplete.
    pattern = re.compile(r'.*?list_num.*?(\d+)..*?.*?¥(.*?).*?', re.S)
    items = re.findall(pattern, html)
    print(items)
    return items


if __name__ == '__main__':
    # 25 pages x 20 books per page = the Top 500
    for i in range(1, 26):
        func2(i)
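The `__main__` loop above talks to the network directly, which makes it hard to check. One way to make the loop testable offline is to pass the fetch, parse, and save steps in as functions. The sketch below does that and also pauses between pages. `crawl`, `fake_fetch`, `fake_parse`, and the one-second default delay are hypothetical illustrations, not part of the original article.

```python
import re
import time

BOOKS_PER_PAGE = 20


def crawl(pages, fetch, parse, save, delay=1.0):
    """Fetch each page, parse it, and save every record.

    fetch(page) returns HTML text or None; parse(html) returns an iterable
    of records; save(record) stores one record.
    """
    for page in pages:
        html = fetch(page)
        if html is None:  # request failed: skip this page
            continue
        for record in parse(html):
            save(record)
        time.sleep(delay)  # pause between requests to go easy on the server


def fake_fetch(page):
    """Stand-in for the network: page 3 "fails", other pages have two entries."""
    if page == 3:
        return None
    return "".join(
        f'<li><div class="list_num">{(page - 1) * BOOKS_PER_PAGE + i}.</div></li>'
        for i in (1, 2)
    )


def fake_parse(html):
    return re.findall(r'list_num">(\d+)\.', html)


saved = []
crawl(range(1, 5), fake_fetch, fake_parse, saved.append, delay=0)
print(saved)  # ['1', '2', '21', '22', '61', '62']
```

With the article's functions, the real run would look roughly like `crawl(range(1, 26), lambda page: request_dandan('' + str(page)), parse_result, write_item_to_file)`, with the base URL filled in.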
