Scraping the 2023 Best Chinese Universities Ranking with Python

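While working through the Python Web Crawling and Information Extraction course on the China University MOOC site (中國大學慕課網), I came across an exercise that asks us to scrape the 2016 Best Chinese Universities Ranking. Following the exercise's instructions, I quickly got the ranking order I needed. The code is as follows: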

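import requests
from bs4 import BeautifulSoup
import bs4


def getedhtml(url, code='utf-8'):
    # If the site has anti-scraping measures, change the headers to disguise the request.
    # The original header value was lost; a browser-like User-Agent is assumed here.
    kv = {'user-agent': 'Mozilla/5.0'}
    try:
        r = requests.get(url, headers=kv, timeout=30)
        r.raise_for_status()
        r.encoding = code
        return r.text
    except requests.RequestException:
        return ''


def returned(html, ulist, num):
    count = 0  # number of entries added to the list so far
    soup = BeautifulSoup(html, 'html.parser')
    info = soup.find('tbody', 'hidden_zhpm').children
    for tr in info:
        if count >= num:
            break  # stop once enough universities have been collected
        if isinstance(tr, bs4.element.Tag):
            count += 1
            tds = tr.find_all('td')
            # This line was missing from the post; the column indices are assumed.
            ulist.append([tds[0].string, tds[1].string, tds[3].string])


def printed(ulist, num):
    # chr(12288) is the full-width space used to pad the Chinese university names.
    template = '{0:^10}\t{1:{3}^10}\t{2:^10}'
    print(template.format('排名', '高校', '分數', chr(12288)))  # rank, university, score
    for u in ulist[:num]:
        print(template.format(u[0], u[1], u[2], chr(12288)))


def main():
    ulist = []
    url = ''  # the ranking page URL was omitted in the original post
    num = int(input('How many of the top universities of 2016 would you like to list? '))
    html = getedhtml(url)
    returned(html, ulist, num)
    printed(ulist, num)


if __name__ == '__main__':
    main()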

But when I went on to scrape the 2017 ranking, the program did not fetch the ranks as expected and reported the following error:

TypeError: unsupported format string passed to NoneType.__format__
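The message itself is a hint: a value handed to format() was None. As a minimal sketch, the same error can be reproduced without any scraping. The helper try_format is hypothetical and exists only for this illustration; it is not part of the original script.

```python
def try_format(value):
    """Center value in 10 columns; return the error text if formatting fails."""
    try:
        return '{0:^10}'.format(value)
    except TypeError as exc:
        return 'TypeError: {}'.format(exc)


# A string accepts the '^10' alignment spec.
print(repr(try_format('1')))

# None does not support any non-empty format spec, so this reports the TypeError.
print(try_format(None))
```

So the thing to look for is which scraped field came back as None.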

When I located the ranking section in the page source, I found the problem.

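In the page source, the rank "1" is not a complete child node of its own: it is not enclosed by a complete pair of <td> tags. As a result, tds[0] is actually everything enclosed by the first <td> tag, which means all of the following cells are packed into it as well. In Beautiful Soup, .string is None whenever a tag has more than one child, so tds[0].string cannot give the rank, and passing that None to format() produces the error above.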
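The effect can be seen on a small hand-written row. This is only a sketch: it assumes the beautifulsoup4 package is installed and the html.parser backend is used (other parsers such as lxml or html5lib may repair the markup differently). The two rows are hypothetical, modelled on the description above rather than copied from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical rows for illustration only.
GOOD_ROW = '<table><tr><td>1</td><td>清華大學</td><td>94.0</td></tr></table>'
BAD_ROW = '<table><tr><td>1<td>清華大學</td><td>94.0</td></tr></table>'  # first <td> never closed


def first_cell(html):
    """Return the first <td> of the first row, parsed with html.parser."""
    soup = BeautifulSoup(html, 'html.parser')
    return soup.find('tr').find_all('td')[0]


good = first_cell(GOOD_ROW)
bad = first_cell(BAD_ROW)

print(repr(good.string))  # the cell has exactly one child, so .string is that text
print(repr(bad.string))   # the cell has several children, so .string is None
print(bad.contents)       # the rank text followed by the cells nested inside it
```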

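Since the character "1" is still the first child node under the first <td> tag, I decided to get the rank through the contents attribute of the bs4 library. Because it is the first child node, tds[0].contents[0] is exactly the rank we need. The revised code is as follows: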

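# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup
import bs4


def getedhtml(url, code='utf-8'):
    # The site has anti-scraping measures, so change the headers to disguise the request.
    # The original header value was lost; a browser-like User-Agent is assumed here.
    kv = {'user-agent': 'Mozilla/5.0'}
    try:
        r = requests.get(url, headers=kv, timeout=30)
        r.raise_for_status()
        r.encoding = code
        return r.text
    except requests.RequestException:
        return ''


def returned(html, ulist, num):
    count = 0  # number of entries added to the list so far
    soup = BeautifulSoup(html, 'html.parser')
    info = soup.find('tbody', 'hidden_zhpm').children
    for tr in info:
        if count >= num:
            break  # stop once enough universities have been collected
        if isinstance(tr, bs4.element.Tag):
            count += 1
            tds = tr.find_all('td')
            # This line was missing from the post; it is rebuilt from the text above.
            # contents[0] is the rank; the other column indices are assumed.
            ulist.append([tds[0].contents[0], tds[1].string, tds[3].string])


def printed(ulist, num):
    # chr(12288) is the full-width space used to pad the Chinese university names.
    template = '{0:^10}\t{1:{3}^10}\t{2:^10}'
    print(template.format('排名', '高校', '分數', chr(12288)))  # rank, university, score
    for u in ulist[:num]:
        print(template.format(u[0], u[1], u[2], chr(12288)))


def main():
    ulist = []
    url = ''  # the ranking page URL was omitted in the original post
    num = int(input('How many of the top universities of 2017 would you like to list? '))
    html = getedhtml(url)
    returned(html, ulist, num)
    printed(ulist, num)


if __name__ == '__main__':
    main()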

Running the revised code gives:

排名　　　　高校　　　　分數
1 　　　清華大學　　　 94.0
2 　　　北京大學　　　 81.2
3 　　　浙江大學　　　 77.8
4 　　上海交通大學　　 77.5
5 　　　復旦大學　　　 71.1
6 　中國科學技術大學　 65.9
7 　　　南京大學　　　 65.3
8 　　華中科技大學　　 63.0
9 　　　中山大學　　　 62.7
10 　哈爾濱工業大學　　 61.6
11 　　　同濟大學　　　 60.8
12 　　　東南大學　　　 59.8
13 　　　武漢大學　　　 58.4
14 　北京航空航天大學　 58.3
15 　　　南開大學　　　 58.2
16 　　　四川大學　　　 57.4
16 　　西安交通大學　　 57.4
18 　　　天津大學　　　 56.2
19 　　華南理工大學　　 56.1
20 　　北京師範大學　　 55.1

As the output above shows, the ranking data was retrieved successfully.
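The centered university column depends on padding with chr(12288), the full-width space (U+3000), which in typical CJK fonts is as wide as a Chinese character. The sketch below shows the idea in isolation. The template is an assumed reconstruction of the one used by printed(), and format_row is a hypothetical helper introduced only for this illustration.

```python
FULLWIDTH_SPACE = chr(12288)  # U+3000 IDEOGRAPHIC SPACE


def format_row(rank, name, score):
    """Center three fields in 10 columns; pad the name with full-width spaces."""
    # {3} inside the second field's spec supplies the fill character.
    template = '{0:^10}\t{1:{3}^10}\t{2:^10}'
    return template.format(rank, name, score, FULLWIDTH_SPACE)


print(format_row('排名', '高校', '分數'))
print(format_row('1', '清華大學', '94.0'))
print(format_row('6', '中國科學技術大學', '65.9'))
```

With an ordinary ASCII space as the fill character, the Chinese names would be padded with half-width characters and the columns would drift out of alignment.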
