Python crawler (fetching a novel)

Using 《筆趣閣》 (Biquge) **** as the example

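Requirement: Python 3 or later.

Installation is as follows:

First install python3-pip, then check its version. If a newer version is available, upgrade pip with --upgrade pip, and then install beautifulsoup4.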

sudo apt-get install python3-pip
pip3 --version
pip3 install --upgrade pip

pip3 install beautifulsoup4
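
After the install steps above, a quick check can confirm that the interpreter and the package are in place. This is only a minimal sketch and not part of the original script: the helper name check_environment is hypothetical, and it relies only on the standard library.

```python
import importlib.util
import sys


def check_environment(module_name="bs4"):
    """Return (python_ok, module_ok) for the running interpreter."""
    python_ok = sys.version_info >= (3, 0)
    # find_spec returns None when the top-level module cannot be found.
    module_ok = importlib.util.find_spec(module_name) is not None
    return python_ok, module_ok


if __name__ == "__main__":
    python_ok, module_ok = check_environment()
    print("Python 3:", python_ok)
    print("beautifulsoup4 installed:", module_ok)
```

If the second line prints False, the pip3 install step most likely ran against a different Python installation than the one running the check.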

The code is as follows:

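#!/usr/bin/env python3
import sys
import time
from urllib import parse, request

from bs4 import BeautifulSoup

# The search URL prefix is blank in the source; set it before running.
SEARCH_URL = ''


def search_book(bookname):
    """Search for a book and return the entry the user picks, or {} if none."""
    url = SEARCH_URL + parse.quote(bookname)
    response = request.urlopen(url)
    content = response.read().decode('gbk')
    soup = BeautifulSoup(content, 'html.parser')
    menu = []
    key = 0
    # Column 1 of each result row holds the title link, column 3 the author.
    for row in soup.find('table').find_all('tr'):
        td1 = row.select('td:nth-of-type(1)')
        td3 = row.select('td:nth-of-type(3)')
        if td1 and td3:
            name = td1[0].find('a').string
            href = td1[0].find('a').get('href')
            author = td3[0].string
            # Reconstructed: the source lost the lines that store and print
            # each result. urljoin is an assumption to handle relative links.
            menu.append({'name': name, 'href': parse.urljoin(url, href),
                         'author': author})
            print('%d: %s (%s)' % (key, name, author))
            key += 1
    if menu:
        select_key = -1
        while select_key >= key or select_key < 0:
            # Reconstructed prompt: the source lost the line that reads the choice.
            try:
                select_key = int(input('Enter the number of the book: '))
            except ValueError:
                select_key = -1
        return menu[select_key]
    return {}


def get_novel_menu(url):
    """Return the chapter list as dicts with 'title' and 'href'."""
    response = request.urlopen(url)
    content = response.read().decode('gbk')
    soup = BeautifulSoup(content, 'html.parser')
    chapters = []
    # Read every <dd> that follows the second <dt> inside <div id="list">.
    second_dt = soup.find('div', id='list').find('dt').find_next('dt')
    for dd in second_dt.find_all_next('dd'):
        title = dd.find('a').string
        href = dd.find('a').get('href')
        # Reconstructed: the source lost the line that stores each chapter.
        chapters.append({'title': title, 'href': parse.urljoin(url, href)})
    return chapters


def get_novel_content(title, url):
    """Fetch one chapter and return its title followed by its text."""
    # The header values are blank in the source; add them here if the site
    # requires any (for example a User-Agent).
    headers = {}
    response = request.urlopen(request.Request(url, headers=headers))
    content = response.read().decode('gbk')
    soup = BeautifulSoup(content, 'html.parser')
    text = soup.find('div', id='content').get_text()
    return title + "\r\n" + text


if __name__ == '__main__':
    info = {}
    # Keep asking until a search returns a selectable result.
    while not info:
        bookname = input('Enter the title of the book to search for: ')
        info = search_book(bookname)
    menu_lists = get_novel_menu(info['href'])
    if not menu_lists:
        sys.exit(0)
    for chapter in menu_lists:
        content = get_novel_content(chapter['title'], chapter['href'])
        # Append each chapter to a single file.
        with open('/home/novel.txt', 'a', encoding='utf-8') as f:
            f.write(content)
        time.sleep(0.1)  # short pause between requests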
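
One detail of search_book is worth a closer look: parse.quote percent-encodes the book name as UTF-8 by default, while the script decodes every page as GBK. The site may therefore expect the query in GBK too, but that is an assumption about the site and not something the original states. The sketch below only shows how to produce both forms for comparison. The helper name encode_query and the sample title are hypothetical.

```python
from urllib import parse


def encode_query(bookname, encoding="utf-8"):
    """Percent-encode a book name using the given character encoding."""
    return parse.quote(bookname, encoding=encoding)


if __name__ == "__main__":
    name = "東宮"  # sample title, for illustration only
    utf8_query = encode_query(name)
    gbk_query = encode_query(name, encoding="gbk")  # assumption: site wants GBK
    print(utf8_query)
    print(gbk_query)
    # Decoding with the matching encoding recovers the original name.
    print(parse.unquote(utf8_query) == name)
    print(parse.unquote(gbk_query, encoding="gbk") == name)
```

If a search returns no rows for a title that clearly exists on the site, trying the GBK form of the query is a cheap first experiment.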
