Scraping Notices from a School's Official Website

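Requirement: for students, waiting for scores after every major exam is nerve-wracking, especially when the results arrive as notices on the school's official website. This note focuses on that case; the implementation follows.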

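# Implemented with the urllib and BeautifulSoup libraries
import urllib.request
import urllib.parse
from bs4 import BeautifulSoup

# 1. Fetch the HTML source ** of ** (response.read() returns it as bytes, not a list)
# The URL is left blank in the original post; fill in the target page before running.
request = urllib.request.Request('')

# 2. ** has anti-scraping measures, so a request header is added
# to imitate a person visiting the page with a browser
request.add_header('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')
response = urllib.request.urlopen(request)
html = response.read()

# 3. Parse the HTML page with a BeautifulSoup object, using Python's built-in parser 'html.parser'
bs = BeautifulSoup(html, 'html.parser')

# 4. Locate the table tag that holds the notices with find_all(), selecting by class
# Two ways to write this: the class_= keyword used here, or attrs={'class': 'in_list2'}
tables = bs.find_all('table', class_="in_list2")

# 5. Find the row tags (tr) under the table **; the result is a list-like object
tab = tables[0].find_all('tr')
# print(tab)

# 6. Loop over the tr rows to get every notice on this page
print('--------------------------------')
for tr in tab:
    for td in tr.find_all('td'):
        # The label argument of each print call is missing in the original post
        print(td.find_all('p')[1].get_text())
        print(td.p.get_text()[2:])
        print('' + td.a['href'][2:])  # the URL prefix is left blank in the original
        print('-----------divider----------------')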
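
Because the target URL is left blank above, the parsing steps (3 to 6) cannot be tried directly. The sketch below runs the same find_all() logic against an inline HTML sample instead. The sample markup, the notice titles and dates, and the function name extract_notices are all invented for illustration; the real page may be structured differently, so treat this as an assumption about what the code above expects and not as the actual site layout.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page is not shown in the post, so this sample
# only mimics the structure the scraping code appears to expect.
SAMPLE_HTML = """
<table class="in_list2">
  <tr><td>
    <p>* <a href="../info/1001.htm">Final exam results released</a></p>
    <p>2018-01-15</p>
  </td></tr>
  <tr><td>
    <p>* <a href="../info/1002.htm">Winter break schedule</a></p>
    <p>2018-01-12</p>
  </td></tr>
</table>
"""


def extract_notices(html):
    """Return a list of (title, date, link) tuples from the notice table."""
    soup = BeautifulSoup(html, 'html.parser')
    tables = soup.find_all('table', class_='in_list2')
    if not tables:
        return []  # no notice table on this page
    notices = []
    for tr in tables[0].find_all('tr'):
        for td in tr.find_all('td'):
            paragraphs = td.find_all('p')
            link = td.find('a')
            if len(paragraphs) < 2 or link is None:
                continue  # skip cells that do not look like a notice
            title = paragraphs[0].get_text()[2:]  # drop the 2-character bullet prefix
            date = paragraphs[1].get_text()
            href = link['href'][2:]  # drop the leading '..' of the relative link
            notices.append((title, date, href))
    return notices


for title, date, href in extract_notices(SAMPLE_HTML):
    print(date, title, href)
```

Returning tuples instead of printing inside the loop keeps the parsing separate from the output, and the length and None checks avoid an IndexError or TypeError on rows that lack a second p tag or a link.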

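The above is a simple page scraper, and many features are still missing: for example, storing the notices in a database, or pushing the newest notice to my own phone so that nobody has to sit refreshing the page. I am still learning and will add these later. If any of the approach is wrong, corrections are welcome. Thank you for reading.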
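
As a possible starting point for the database idea mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the function names open_store and save_new, the sample notices, and the (title, date, link) tuple shape are all assumptions for illustration and do not come from the original post. The idea is that a notice counts as new only if its link has not been stored before, so the returned list is what a later notification step would send.

```python
import sqlite3


def open_store(path=':memory:'):
    """Open (or create) the notice store; ':memory:' keeps it in RAM for testing."""
    conn = sqlite3.connect(path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS notices ('
        ' link TEXT PRIMARY KEY, title TEXT NOT NULL, date TEXT NOT NULL)'
    )
    return conn


def save_new(conn, notices):
    """Insert notices not seen before and return only those new ones."""
    fresh = []
    for title, date, link in notices:
        cur = conn.execute(
            'INSERT OR IGNORE INTO notices (link, title, date) VALUES (?, ?, ?)',
            (link, title, date),
        )
        if cur.rowcount == 1:  # 0 means the link was already stored
            fresh.append((title, date, link))
    conn.commit()
    return fresh


conn = open_store()
first = save_new(conn, [
    ('Final exam results released', '2018-01-15', '/info/1001.htm'),
])
second = save_new(conn, [
    ('Final exam results released', '2018-01-15', '/info/1001.htm'),
    ('Winter break schedule', '2018-01-16', '/info/1002.htm'),
])
print(second)  # only the notice that was not stored on the first run
```

Passing a file path instead of ':memory:' would keep the stored notices between runs, which is what a script started periodically by a scheduler would need.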
