Scraping a Novel from 新筆趣閣 with Python

1. First, install the third-party library requests: open cmd, type pip install requests, press Enter, and wait for the installation to finish. Then test it with:

import requests
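If you would rather check the installation from a script than from the interactive prompt, the standard library's importlib can look a module up without importing it. This is only a minimal sketch; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located (it is not imported)."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("requests"):
    print("requests is available")
else:
    print("requests is missing; run: pip install requests")
```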

2. Now you can start writing the program. First, fetch the page source; you can also view the source in the browser and compare the two.

s = requests.session()
url = ''
html = s.get(url)
html.encoding = 'utf-8'

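Running this prints the page source. Press F12 in the browser to view the same page's source there. If the two match, the request worked.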
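The fetch step above can be wrapped in a small function that also fails early on HTTP errors. This is a sketch under assumptions: the name `fetch_html` and the 10-second timeout are illustrative choices, not part of the original program, and `session` is expected to behave like a `requests.Session`.

```python
def fetch_html(session, url, timeout=10):
    """Fetch a page and return its text decoded as UTF-8.

    `session` is assumed to behave like requests.Session.
    """
    response = session.get(url, timeout=timeout)
    response.raise_for_status()   # raise on 4xx/5xx instead of parsing an error page
    response.encoding = "utf-8"   # same encoding the tutorial sets
    return response.text


# Usage (needs network access and the requests library):
#   import requests
#   page_source = fetch_html(requests.Session(), url)
```

Passing the session in as a parameter lets one session be reused for every chapter instead of creating a new one per request.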

3. Next, extract the URL of each chapter from the page source.

caption_title_1 = re.findall(r'.*?',html.text)
print(caption_title_1)

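The output is too long to show in full, so only part of it appears here. You may wonder why these URLs are incomplete: the links in the page itself are relative, so each one has to be joined to the site's address to form a complete URL.

Once that is done, you have the complete URLs.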
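As an illustration of extracting relative links and completing them, here is a small sketch. The HTML fragment, the link pattern, and `BASE_URL` are all hypothetical stand-ins, since the real markup and site address are not shown in this tutorial. It uses `urllib.parse.urljoin` rather than plain string concatenation.

```python
import re
from urllib.parse import urljoin

# Hypothetical index-page fragment and base URL, for illustration only.
BASE_URL = "https://www.example.com/"
sample_index = """
<dd><a href="/1_1/100.html">Chapter 1</a></dd>
<dd><a href="/1_1/101.html">Chapter 2</a></dd>
"""

# Capture the href value of every chapter link.
relative_urls = re.findall(r'<a href="(.*?)">.*?</a>', sample_index)

# urljoin resolves each relative path against the base URL.
chapter_urls = [urljoin(BASE_URL, path) for path in relative_urls]
print(chapter_urls)
```

Compared with `base + path`, `urljoin` avoids doubled or missing slashes when the base ends in `/` and the path starts with one.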

4. Next, get the chapter title and the chapter content.

# Get the chapter title
name = re.findall(r'', r1.text)[0]  # extract the chapter title
print(name)
file_name.write(name)
file_name.write('\n')
# Get the chapter content
chapters = re.findall(r'(.*?)', r1.text, re.S)[0]  # extract the chapter content
chapters = chapters.replace(' ', '')  # the lines below clean up the data
chapters = chapters.replace('readx();', '')
chapters = chapters.replace('&lt;!--go--&gt;', '')
chapters = chapters.replace('', '')
chapters = chapters.replace('()', '')
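The patterns in the snippet above appear without the HTML tags they are meant to match, so the following sketch shows the same idea on made-up markup. The `<h1>` and `<div id="content">` tags and the sample text are assumptions for illustration, not the site's confirmed structure.

```python
import re

# Hypothetical chapter-page fragment, for illustration only.
sample_chapter = (
    '<h1>Chapter 1 The Beginning</h1>'
    '<div id="content">&nbsp;&nbsp;First line.<br />\n'
    '&nbsp;&nbsp;Second line.readx();</div>'
)

# The title sits on one line, so no flags are needed.
title = re.findall(r"<h1>(.*?)</h1>", sample_chapter)[0]

# re.S lets "." match newlines, so the body may span several lines.
body = re.findall(r'<div id="content">(.*?)</div>', sample_chapter, re.S)[0]

# Remove leftovers one by one, as the tutorial does with replace().
for junk in ("&nbsp;", "readx();"):
    body = body.replace(junk, "")

print(title)
print(body)
```

Note that indexing with `[0]` raises `IndexError` when nothing matches, so a page with a different layout stops the program at this point.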

5. Convert to a string and save to the file.

# Convert to a string
s = str(chapters)
s_replace = s.replace('<br />', "\n")  # assumption: the tag is missing in the source; a line-break tag is the likely original
# Remove every remaining "<...>" tag
while True:
    index_begin = s_replace.find("<")
    index_end = s_replace.find(">", index_begin + 1)
    if index_begin == -1:
        break
    s_replace = s_replace.replace(s_replace[index_begin:index_end + 1], "")
pattern = re.compile(r' ', re.I)
fiction = pattern.sub(' ', s_replace)
file_name.write(fiction)
file_name.write('\n')
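The tag-stripping loop can also be expressed with regular expressions, which avoids a pitfall of the loop: if the text contains a "<" with no ">" after it, the slice is empty and the loop never ends. The following is an alternative sketch, not the original author's method; the function name `html_to_text` is made up.

```python
import html
import re


def html_to_text(fragment: str) -> str:
    """Turn an HTML fragment into plain text."""
    # Convert <br>, <br/> and <br /> (any letter case) to newlines first.
    text = re.sub(r"<br\s*/?>", "\n", fragment, flags=re.I)
    # Drop every remaining tag.
    text = re.sub(r"<[^>]+>", "", text)
    # Decode entities such as &lt; and &gt; into the characters they stand for.
    return html.unescape(text)


print(html_to_text("<p>Line one<br/>Line <b>two</b> &lt;end&gt;</p>"))
```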

6. The complete code

import requests
import re

s = requests.session()
url = ''
html = s.get(url)
html.encoding = 'utf-8'
# Get the chapter links
caption_title_1 = re.findall(r'.*?', html.text)
# Open the output file
path = r'c:\users\administrator\pycharmprojects\untitled\title.txt'  # this is where I store it; change it as needed
file_name = open(path, 'a', encoding='utf-8')
for i in caption_title_1:
    chapter_url = '' + i  # prepend the site address to get the full URL
    # Fetch the chapter page source
    s1 = requests.session()
    r1 = s1.get(chapter_url)
    r1.encoding = 'utf-8'
    # Get the chapter title
    name = re.findall(r'', r1.text)[0]
    print(name)
    file_name.write(name)
    file_name.write('\n')
    # Get the chapter content
    chapters = re.findall(r'(.*?)', r1.text, re.S)[0]
    chapters = chapters.replace(' ', '')
    chapters = chapters.replace('readx();', '')
    chapters = chapters.replace('&lt;!--go--&gt;', '')
    chapters = chapters.replace('', '')
    chapters = chapters.replace('()', '')
    # Convert to a string
    s = str(chapters)
    s_replace = s.replace('<br />', "\n")  # assumption: the tag is missing in the source; a line-break tag is the likely original
    # Remove every remaining "<...>" tag
    while True:
        index_begin = s_replace.find("<")
        index_end = s_replace.find(">", index_begin + 1)
        if index_begin == -1:
            break
        s_replace = s_replace.replace(s_replace[index_begin:index_end + 1], "")
    pattern = re.compile(r' ', re.I)
    fiction = pattern.sub(' ', s_replace)
    file_name.write(fiction)
    file_name.write('\n')
file_name.close()

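7. Change the URL to that of the novel you want to scrape, then run the program. If an error occurs, the save location may be wrong; change the file path to the location where you want to store the file. That's all.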
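One common cause of a save-location error is that the target folder does not exist yet. A minimal sketch of a more forgiving way to write the file is shown here; the helper name `save_text` and the example path are hypothetical.

```python
from pathlib import Path


def save_text(path, lines):
    """Append lines to a UTF-8 text file, creating parent folders if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # The with-block closes the file even if a write fails.
    with target.open("a", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return target


# Usage (hypothetical path):
#   save_text("novels/title.txt", ["Chapter 1", "First paragraph..."])
```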

That is the complete scraping code. Simple, isn't it? I hope it helps.
