Scraping the News on the Campus News Homepage

2022-06-01 17:42:07 · 1908 words · 6631 views

1. Use the requests and BeautifulSoup libraries to scrape the title, link, body text, and show-info of each news item on the campus news homepage.

2. Parse the info string to get each news item's publication time, author, photographer, and similar details.

import requests
from bs4 import BeautifulSoup
from datetime import datetime

url = ""  # list-page URL; the value is missing in the source
res = requests.get(url)
res.encoding = "utf-8"
soup = BeautifulSoup(res.text, "html.parser")

for news in soup.select("li"):
    if len(news.select(".news-list-title")) > 0:  # skip <li> elements that are not news items
        time = news.select(".news-list-info")[0].contents[0].text
        title = news.select(".news-list-title")[0].text
        description = news.select(".news-list-description")[0].text
        a = news.select('a')[0].attrs['href']

        # Fetch the detail page that the list item links to
        detail_res = requests.get(a)
        detail_res.encoding = "utf-8"
        detail_soup = BeautifulSoup(detail_res.text, "html.parser")
        print(detail_soup.select("#content")[0].text)  # body text
        print(time, title, description, a)

        content = detail_soup.select("#content")[0].text
        info = detail_soup.select(".show-info")[0].text
        # The lstrip() argument is missing in the source; the next 19
        # characters are a "YYYY-MM-DD HH:MM:SS" timestamp
        date_time = info.lstrip('')[:19]
        print(info)
        break  # stop after the first news item
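The loop above passes each `href` straight to `requests.get`. That works only if the list page uses absolute links, which the post does not show. As a minimal sketch, assuming some links may be relative, `urllib.parse.urljoin` can resolve them against the list-page URL first. `LIST_URL` and `absolute_link` are hypothetical names, and the example.edu address is a placeholder, not the site from the post.

```python
from urllib.parse import urljoin

# Placeholder list-page URL for illustration only; the post's real URL is not shown.
LIST_URL = "http://news.example.edu/campus/index.html"


def absolute_link(href, base=LIST_URL):
    """Resolve an href taken from the list page into an absolute URL."""
    return urljoin(base, href)


print(absolute_link("/campus/2018/9183.html"))   # root-relative link
print(absolute_link("9183.html"))                # link relative to the list page
print(absolute_link("http://other.example.org/x.html"))  # already absolute, unchanged
```

In the loop, this would mean calling `requests.get(absolute_link(a, url))` instead of `requests.get(a)`.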

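# Sample show-info strings for testing; the literals (and the lstrip()/find()
# arguments) are missing in the source, so this part cannot run as shown
info = ''
detail_time = info.lstrip('')[:19]
# "審核" is the reviewer label on the page
sh = info[info.find("審核"):].split()[0].lstrip('')
print(detail_time, sh)

info1 = ''
# "作者" is the author label on the page
info1 = info1[info1.find("作者"):info1.find('')].lstrip('').split()[1]
print(info1)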
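The slicing above depends on the exact position of each label. One alternative, swapped in here for illustration and not the author's method, is to pull the fields out with regular expressions. This is a rough sketch under stated assumptions: `SAMPLE_INFO` is a made-up string (the real show-info text is not preserved in the post), the `label:value` layout with a full-width or ASCII colon is assumed, and `parse_info` is a hypothetical helper.

```python
import re

# Made-up show-info text for illustration; names and layout are assumptions.
SAMPLE_INFO = "發布時間:2018-04-01 11:57:00 作者:張三 審核:李四 攝影:王五"


def parse_info(info):
    """Return a dict of the fields found in a show-info string."""
    fields = {}
    # The timestamp contains a space, so it is matched on its own.
    stamp = re.search(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", info)
    if stamp:
        fields["time"] = stamp.group()
    # Other fields: a known label, a colon, then a value without whitespace.
    for label, value in re.findall(r"(作者|審核|攝影)[::](\S+)", info):
        fields[label] = value
    return fields


print(parse_info(SAMPLE_INFO))
```

A field that is absent from the string is simply left out of the dict, so callers can use `fields.get("攝影")` instead of guarding against an `IndexError`.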

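now_time = datetime.now()
now_time.year  # bare expression; evaluates the year but prints nothing

# Format codes are case-sensitive: %Y-%m-%d is the date, %H:%M:%S is the time
print(datetime.strptime(date_time, "%Y-%m-%d %H:%M:%S"))
print(now_time.strftime('%Y\\%m\\%d'))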
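Since the case of each format code matters (`%m` is the month while `%M` is the minute, and `%y` is a two-digit year while `%Y` has four digits), a small round trip helps to check them. The sketch below uses a made-up timestamp in the `YYYY-MM-DD HH:MM:SS` layout assumed above; `parse_stamp` is a hypothetical helper name.

```python
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # four-digit year, 24-hour clock


def parse_stamp(text):
    """Convert a 'YYYY-MM-DD HH:MM:SS' string into a datetime object."""
    return datetime.strptime(text, STAMP_FORMAT)


stamp = parse_stamp("2018-04-01 11:57:00")  # made-up example value
print(stamp.year, stamp.month, stamp.minute)
print(stamp.strftime("%Y/%m/%d"))           # 2018/04/01
print(stamp.strftime(STAMP_FORMAT))         # back to the original text
```

If the text does not match the format, `strptime` raises `ValueError`, so wrapping the call in `try`/`except` is worth considering when the page layout may vary.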

Screenshot of the run:
