Python Crawler 10: Crawler Example (6)

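```python
# -*- coding: utf-8 -*-
import re
import time

import requests

# Request header; the original value was not preserved, so this User-Agent is an assumed example
header = {'User-Agent': 'Mozilla/5.0'}

# Append mode: each chapter is added to the end of the file, which is created if missing
f = open('鬥破蒼穹.txt', 'a+', encoding='utf-8')


def get_info(url):
    """Fetch one page and append the text that matches the pattern to the file."""
    response = requests.get(url, headers=header, timeout=10)
    if response.status_code == 200:
        # The original regex pattern was not preserved here; fill it in before running
        contents = re.findall('', response.content.decode('utf-8'), re.S)
        for content in contents:
            f.write(' ' + content + '\n')
    else:
        # Non-200 responses (e.g. 404 or 403) are skipped
        pass


if __name__ == '__main__':
    # Pages 1 to 1664; the original URL prefix was not preserved here
    for i in range(1, 1665):
        url = '' + str(i) + '.html'
        get_info(url)
        time.sleep(1)  # pause one second between requests
    f.close()
```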
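get_info() mixes the network request with the parsing, so it is hard to try out without hitting the site. The sketch below is an offline stand-in for its parsing half. extract_text() is a hypothetical helper and the sample page is made up. The pattern r'<p>(.*?)</p>' is only an assumed example, because the article's own pattern is not shown. The sketch shows the 200 check and what re.S changes: without the flag, . stops at a newline, so a paragraph broken across lines is missed.

```python
import re
from http import HTTPStatus

# Illustrative pattern (an assumption): capture the body of each <p> element
PATTERN = r'<p>(.*?)</p>'


def extract_text(status_code, html, flags=re.S):
    """Hypothetical offline stand-in for get_info(): parse only on 200 OK."""
    if status_code != HTTPStatus.OK:
        # e.g. 404 Not Found or 403 Forbidden: skip the page
        return []
    return re.findall(PATTERN, html, flags)


# Made-up page whose second paragraph spans two lines
page = '<p>First line.</p>\n<p>Second\nline.</p>'

print(extract_text(200, page, flags=0))  # ['First line.']
print(extract_text(200, page))           # ['First line.', 'Second\nline.']
print(extract_text(404, page))           # []
print(HTTPStatus(403).phrase, HTTPStatus(404).phrase)  # Forbidden Not Found
```

Keeping the parsing in a function that takes plain values (a status code and a string) makes it testable without requests or a network connection.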

**Analysis**

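(1) The first lines import the required modules: re and time from the standard library, plus the third-party requests package. They also define the request header, header. Finally, they open a text file named '鬥破蒼穹.txt' in append mode, which creates the file if it does not already exist;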

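(2) The get_info() function fetches and parses one page. A status code of 200 means the request succeeded, so the re module extracts the text. re.findall() runs over the decoded page text, response.content.decode('utf-8') (a string), and returns every piece of text that matches the regex pattern.

The re.S flag (also spelled re.DOTALL) lets . match newline characters as well, so a match can span several lines. The flag must be written in upper case, since re.s does not exist. Finally, each extracted passage is written to the file;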

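If the status code is anything other than 200, the page was not served normally (for example 404 Not Found or 403 Forbidden), so the function simply passes and does nothing;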

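(3) The if __name__ == '__main__': block is the program's entry point. It calls get_info() for each of the 1,664 pages that may contain chapter text; range(1, 1665) yields page numbers 1 through 1664. After each request it calls time.sleep(1), pausing one second between pages so the crawler imitates the pace of a human reader instead of flooding the server. Finally, the file is closed.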
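The page loop, the append-mode file and the one-second pause can also be exercised offline. This is a minimal sketch under a few assumptions:

- BASE is a hypothetical placeholder, because the article's URL prefix is not shown.
- crawl() takes the fetch function as a parameter, so a fake fetcher can replace the real request.
- fake_fetch() is likewise hypothetical.

```python
import os
import tempfile
import time

# Hypothetical placeholder; the article's real URL prefix is not shown
BASE = 'https://example.com/book/'


def page_urls(base, first, last):
    """Build one URL per page number, first to last inclusive."""
    return [base + str(i) + '.html' for i in range(first, last + 1)]


def crawl(urls, fetch, path, delay=1.0):
    """Append the paragraphs returned by fetch(url) to path, pausing between pages."""
    # 'a' keeps whatever is already in the file and writes new text at the end
    with open(path, 'a', encoding='utf-8') as f:
        for url in urls:
            for paragraph in fetch(url):
                f.write(' ' + paragraph + '\n')
            time.sleep(delay)  # throttle: one request per `delay` seconds


def fake_fetch(url):
    """Hypothetical stand-in for a real request: one paragraph per URL."""
    return ['text from ' + url]


urls = page_urls(BASE, 1, 1664)  # the same pages as range(1, 1665)
print(len(urls), urls[0], urls[-1])
# 1664 https://example.com/book/1.html https://example.com/book/1664.html

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
crawl(urls[:2], fake_fetch, path, delay=0)
crawl(urls[2:3], fake_fetch, path, delay=0)  # a second run appends, it does not overwrite
with open(path, encoding='utf-8') as f:
    print(len(f.read().splitlines()))  # 3
```

The with block closes the file even if a request raises an exception. The module-level open()/close() pair in the original script does not guarantee this.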
