Python: Targeted Crawler for Chinese University Rankings

From the Python teaching team of Beijing Institute of Technology on China University MOOC (中國大學MOOC):

1. Function version

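# Targeted crawler for Chinese university rankings
import requests
from bs4 import BeautifulSoup
import bs4


def gethtmltext(url):
    """Download a page and return its text, or "" if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # raise HTTPError for 4xx/5xx responses
        r.encoding = r.apparent_encoding  # guess the encoding so Chinese text decodes correctly
        return r.text
    except requests.RequestException:
        return ""


def fillunivlist(ulist, html):
    """Append [rank, name, total score] for every table row under <tbody>."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find('tbody')
    if tbody is None:  # empty page or unexpected layout
        return
    for tr in tbody.children:
        if isinstance(tr, bs4.element.Tag):  # skip the whitespace strings between rows
            tds = tr('td')  # calling a tag is shorthand for tr.find_all('td')
            # Column indices assume the layout: rank, name, province, total score
            ulist.append([tds[0].string, tds[1].string, tds[3].string])


def printunivlist(ulist, num):
    """Print the first num universities as an aligned table."""
    # {1:{3}^10} centers the name and pads it with argument 3, chr(12288),
    # the full-width space, so that Chinese names line up
    tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
    print(tplt.format("排名", "學校名稱", "總分", chr(12288)))  # rank, university, total score
    for i in range(min(num, len(ulist))):
        u = ulist[i]
        print(tplt.format(u[0], u[1], u[2], chr(12288)))


def main():
    uinfo = []
    # The ranking page URL was elided in the source; fill it in before running
    url = ''
    html = gethtmltext(url)
    fillunivlist(uinfo, html)
    printunivlist(uinfo, 20)  # 20 univs


main()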
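The extraction step in fillunivlist relies on the page keeping one university per <tr> inside <tbody>. A sketch of the same tbody → tr → td walk over a hard-coded snippet lets you check the column logic offline. It deliberately swaps BeautifulSoup for the standard library's html.parser.HTMLParser so it needs no third-party packages; the sample rows, the class RowCollector and the function fill_univ_list are hypothetical, and the column order (rank, name, province, total score) is an assumption about the real page.

```python
from html.parser import HTMLParser

# Made-up sample in the shape the crawler expects (assumed column order:
# rank, name, province, total score). Not real ranking data.
SAMPLE_HTML = """
<table><tbody>
  <tr><td>1</td><td>University A</td><td>Province X</td><td>90.0</td></tr>
  <tr><td>2</td><td>University B</td><td>Province Y</td><td>85.5</td></tr>
</tbody></table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <td> in each <tr> that sits inside <tbody>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self.in_tbody = False
        self.in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.rows.append([])
        elif tag == "td" and self.in_tbody and self.rows:
            self.in_td = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False
        elif tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.rows[-1][-1] += data  # a cell's text may arrive in several chunks


def fill_univ_list(html):
    """Return [rank, name, score] per row, keeping columns 0, 1 and 3."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    cells = [[c.strip() for c in tds] for tds in parser.rows]
    return [[tds[0], tds[1], tds[3]] for tds in cells if len(tds) >= 4]


print(fill_univ_list(SAMPLE_HTML))
```

Rows outside <tbody> and rows with fewer than four cells are skipped, which mirrors the guard the crawler needs before indexing tds[3].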

2. Modified version without functions, for learning

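# Targeted crawler for Chinese university rankings
import requests
from bs4 import BeautifulSoup
import bs4

ulist = []
# The ranking page URL was elided in the source; fill it in before running
url = ''
try:
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    r.encoding = r.apparent_encoding  # decode Chinese text correctly
    html = r.text
except requests.RequestException:
    print("Crawl failed")
    html = ""  # continue with an empty page instead of failing on an undefined r
soup = BeautifulSoup(html, "html.parser")
tbody = soup.find('tbody')
if tbody is not None:
    for tr in tbody.children:
        if isinstance(tr, bs4.element.Tag):  # skip the whitespace strings between rows
            tds = tr('td')
            # Keep rank, name and total score (columns 0, 1 and 3 are assumed)
            ulist.append([tds[0].string, tds[1].string, tds[3].string])
tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
print(tplt.format("排名", "學校名稱", "總分", chr(12288)))  # pad with full-width spaces so Chinese text aligns
num = 20
for i in range(min(num, len(ulist))):  # print the top 20
    u = ulist[i]
    print(tplt.format(u[0], u[1], u[2], chr(12288)))
print("Crawl finished")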

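The chr(12288) argument is what the comment about aligning Chinese text refers to. Python's format mini-language counts characters, not screen columns, and CJK characters usually render two columns wide, so padding a name with ASCII spaces (one column each) leaves names of different lengths misaligned, while padding with the full-width space U+3000 keeps every cell the same visual width. A small standard-library sketch of the nested {1:{3}^10} spec; name_cell is a hypothetical helper, not part of the original code.

```python
# U+3000 IDEOGRAPHIC SPACE: the full-width space that chr(12288) returns
FULL_WIDTH_SPACE = chr(12288)

# Same template as the crawler. {1:{3}^10} is a nested format spec: center
# field 1 in 10 characters and use argument 3 as the fill character.
tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"


def name_cell(name, fill):
    """Center name in a 10-character cell, padding with fill (hypothetical helper)."""
    return "{0:{1}^10}".format(name, fill)


ascii_cell = name_cell("學校名稱", " ")
wide_cell = name_cell("學校名稱", FULL_WIDTH_SPACE)

# Both strings are 10 characters long; the difference only shows on screen.
print(repr(ascii_cell))
print(repr(wide_cell))
print(tplt.format("排名", "學校名稱", "總分", FULL_WIDTH_SPACE))
```

On a terminal that renders East Asian wide characters at double width, wide_cell occupies 20 columns, the same as ten CJK characters, while ascii_cell occupies 14 (6 spaces plus 4 × 2).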