Python scraping exercise: scraping a novel

2021-10-12 07:02:06 · 1145 characters · 8863 views

import requests
from bs4 import BeautifulSoup

if __name__ == '__main__':
    # Request headers (e.g. a User-Agent); the original values were omitted
    headers = {}
    # Fetch the homepage (the novel's table of contents)
    url = ''  # homepage URL omitted in the original
    page_text = requests.get(url=url, headers=headers).text

    # Parse the chapter titles and detail-page URLs from the homepage
    # 1. Instantiate a BeautifulSoup object and load the page source into it
    soup = BeautifulSoup(page_text, 'lxml')
    # 2. Select the <li> entries of the chapter list
    li_list = soup.select('.book-mulu > ul > li')
    with open('./sanguoyanyi.txt', 'w', encoding='utf-8') as fp:
        for li in li_list:
            title = li.a.string
            detail_url = '' + li.a['href']  # site base URL omitted in the original
            # Request the detail page for this chapter
            detail_page_text = requests.get(url=detail_url, headers=headers).text
            # Parse the chapter text out of the detail page
            detail_soup = BeautifulSoup(detail_page_text, 'lxml')
            div_tag = detail_soup.find('div', class_='chapter_content')
            content = div_tag.text
            fp.write(title + ':' + content + '\n')
            print(title + ' scraped successfully!!')
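To check what the two selectors pick out without hitting the live site, here is a minimal sketch that runs the same parsing steps against small hand-written HTML snippets. The markup is an assumption modeled only on the class names used above (`book-mulu`, `chapter_content`), not the real site's pages, and the helper names `parse_toc` and `parse_chapter` are hypothetical. It uses Python's built-in `html.parser` so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical table-of-contents markup, modeled on the selector '.book-mulu > ul > li'
TOC_HTML = """
<div class="book-mulu">
  <ul>
    <li><a href="/book/1.html">Chapter 1</a></li>
    <li><a href="/book/2.html">Chapter 2</a></li>
  </ul>
</div>
"""

# Hypothetical chapter page, modeled on the <div class="chapter_content"> lookup
CHAPTER_HTML = '<div class="chapter_content"><p>First paragraph.</p><p>Second paragraph.</p></div>'


def parse_toc(html):
    """Return (title, href) pairs for every chapter link in the table of contents."""
    soup = BeautifulSoup(html, 'html.parser')
    return [(li.a.string, li.a['href']) for li in soup.select('.book-mulu > ul > li')]


def parse_chapter(html):
    """Return the plain text of the chapter body."""
    soup = BeautifulSoup(html, 'html.parser')
    div_tag = soup.find('div', class_='chapter_content')
    return div_tag.text


if __name__ == '__main__':
    for title, href in parse_toc(TOC_HTML):
        print(title, href)
    print(parse_chapter(CHAPTER_HTML))
```

Note that `div_tag.text` joins the paragraphs with no separator ("First paragraph.Second paragraph."); calling `div_tag.get_text('\n')` instead puts a newline between them, which usually makes the saved text file easier to read.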

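In the listing, each detail-page URL is built by prepending a base URL (omitted in the original) to the link's `href`. If the links are relative, the standard library's `urllib.parse.urljoin` can resolve them against the page they came from instead of relying on string concatenation. A minimal sketch; `TOC_URL` is a made-up placeholder and `chapter_url` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Hypothetical URL of the table-of-contents page (the real one was omitted in the original)
TOC_URL = 'https://example.com/book/sanguoyanyi.html'


def chapter_url(toc_url, href):
    """Resolve a chapter link (root-relative, relative, or absolute) against the TOC page URL."""
    return urljoin(toc_url, href)


if __name__ == '__main__':
    print(chapter_url(TOC_URL, '/book/sanguoyanyi/1.html'))
    print(chapter_url(TOC_URL, 'sanguoyanyi/2.html'))
```

A root-relative `href` keeps only the scheme and host of the base URL, a plain relative `href` replaces the last path segment, and an absolute `href` is returned unchanged.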