Scraping the News on the Campus News Homepage

2022-06-02 05:48:11

1. Use the requests and BeautifulSoup libraries to scrape the title, link, and body text of each news item on the campus news homepage.

# -*- coding: utf-8 -*-
# -*- author: wf -*-
import requests
from bs4 import BeautifulSoup
from datetime import datetime

url = ""  # campus news homepage URL (left empty in the original)
res = requests.get(url)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

for news in soup.select('li'):
    if len(news.select('.news-list-title')) > 0:  # skip <li> elements that are not news items
        data = news.select('.news-list-info')[0].contents[0].text  # date
        title = news.select('.news-list-title')[0].text  # title
        description = news.select('.news-list-description')[0].text  # summary shown under the title
        a = news.select('a')[0].attrs['href']  # link to the article
        # print(data, title, description, a)
        detail_res = requests.get(a)
        detail_res.encoding = 'utf-8'
        detail_soup = BeautifulSoup(detail_res.text, 'html.parser')  # article detail page
        content = detail_soup.select('#content')[0].text  # article body
        info = detail_soup.select('.show-info')[0].text  # show-info line (publish time, author, etc.)
        print(info)
        break  # stop after the first article while testing
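To check the list-page selectors without touching the network, the extraction loop can be wrapped in a function and run against a small HTML string. This is a sketch under assumptions: the markup, the example.com link, and the function name `extract_news_items` are hypothetical, built only to mirror the class names `.news-list-title`, `.news-list-description` and `.news-list-info` used above; the real homepage may be laid out differently.

```python
from bs4 import BeautifulSoup

# Hypothetical list-page markup mirroring the class names used above;
# the real homepage structure may differ.
SAMPLE_LIST_HTML = """
<ul>
  <li><a href="http://example.com/news/1.html">
        <div class="news-list-title">Sports day announced</div>
        <div class="news-list-description">The annual sports day will be held...</div>
        <div class="news-list-info"><span>2018-04-01</span><span>Campus News</span></div>
  </a></li>
  <li class="nav">Home</li>
</ul>
"""


def extract_news_items(html):
    """Return one dict per <li> that contains a .news-list-title element."""
    soup = BeautifulSoup(html, 'html.parser')
    items = []
    for li in soup.select('li'):
        title_tag = li.select_one('.news-list-title')
        if title_tag is None:  # skip <li> elements that are not news items
            continue
        items.append({
            # first child of .news-list-info holds the date in this sample
            'date': li.select_one('.news-list-info').contents[0].text.strip(),
            'title': title_tag.text.strip(),
            'description': li.select_one('.news-list-description').text.strip(),
            'link': li.select_one('a')['href'],
        })
    return items


if __name__ == '__main__':
    for item in extract_news_items(SAMPLE_LIST_HTML):
        print(item)
```

Returning plain dicts keeps parsing separate from fetching, so the same function can later be fed `res.text` from the live page.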

2. Parse the info string to get each article's publish time, author, source, photographer, and similar details.
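One way to do step 2 is with regular expressions over the `info` text. The sketch below assumes the show-info line uses labels such as 作者 (author), 审核 (reviewer), 来源 (source) and 摄影 (photographer), each followed by a full-width or ASCII colon; those labels, the sample string, and the name `parse_show_info` are assumptions, not taken from the real page.

```python
import re

# Hypothetical show-info text; the labels (发布时间 publish time, 作者 author,
# 审核 reviewer, 来源 source, 摄影 photographer) are assumptions about the page.
SAMPLE_INFO = '发布时间：2018-04-01 11:57:00      作者：张三  审核：李四  来源：宣传部  摄影：王五  点击：'

LABELS = {'author': '作者', 'reviewer': '审核', 'source': '来源', 'photographer': '摄影'}


def parse_show_info(info):
    """Split the show-info string into a dict of fields; missing fields become ''."""
    result = {}
    # The timestamp contains a space, so match it by its shape rather than by label.
    m = re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', info)
    result['publish_time'] = m.group(0) if m else ''
    for key, label in LABELS.items():
        # Label, a full-width or ASCII colon, at most one space, then the value
        # (a run of characters that are neither whitespace nor a colon).
        m = re.search(label + r'[：:]\s?([^\s：:]*)', info)
        result[key] = m.group(1) if m else ''
    return result


print(parse_show_info(SAMPLE_INFO))
```

Matching the value as "no whitespace, no colon" means an empty field such as `摄影：` yields `''` instead of swallowing the next label.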

3. Convert the publish time from a str to a datetime.

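# get the current time
now_time = datetime.now()
now_time.year  # current year as an int (shown only in an interactive shell)
# publish_time is the time string extracted in step 2
exchangetime = datetime.strptime(publish_time, "%Y-%m-%d %H:%M:%S")  # string -> datetime
exchangestring = now_time.strftime(r'%Y\%m\%d')  # datetime -> string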
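The format codes in `strptime`/`strftime` are case-sensitive: `%Y` is a four-digit year, `%m` the month, `%d` the day, and `%H`, `%M`, `%S` the hour, minute, and second. A minimal round trip, using a made-up timestamp in the assumed `YYYY-MM-DD HH:MM:SS` shape (the helper names are illustrative):

```python
from datetime import datetime


def to_datetime(publish_time):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string into a datetime."""
    # Lowercase %y is a 2-digit year and %m is the month, so the
    # time part must use uppercase %H:%M:%S.
    return datetime.strptime(publish_time, '%Y-%m-%d %H:%M:%S')


def to_string(dt):
    """Format a datetime as text; the raw string keeps the backslashes literal."""
    return dt.strftime(r'%Y\%m\%d')


published = to_datetime('2018-04-01 11:57:00')  # sample value in the assumed format
print(published)             # 2018-04-01 11:57:00
print(published.year)        # 2018
print(to_string(published))  # 2018\04\01
```

Using a raw string for the output pattern avoids Python's invalid-escape warning for `'\%'` without changing what is printed.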

4. Post the complete code and a screenshot of the run results to the assignment.

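# -*- coding: utf-8 -*-
# -*- author: wf -*-
import re
import requests
from bs4 import BeautifulSoup
from datetime import datetime


def field(label, text):
    """Value after 'label：' in text, or '' if absent (hypothetical helper, see note below)."""
    m = re.search(label + r'[：:]\s?([^\s：:]*)', text)
    return m.group(1) if m else ''


url = ""  # campus news homepage URL (left empty in the original)
res = requests.get(url)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

for news in soup.select('li'):
    if len(news.select('.news-list-title')) > 0:  # skip <li> elements that are not news items
        data = news.select('.news-list-info')[0].contents[0].text  # date
        title = news.select('.news-list-title')[0].text  # title
        description = news.select('.news-list-description')[0].text  # summary shown under the title
        a = news.select('a')[0].attrs['href']  # link to the article
        print(data, title, description, a)

        detail_res = requests.get(a)
        detail_res.encoding = 'utf-8'
        detail_soup = BeautifulSoup(detail_res.text, 'html.parser')  # article detail page
        content = detail_soup.select('#content')[0].text  # article body
        info = detail_soup.select('.show-info')[0].text  # show-info line
        print(info)

        # The original does not show how publish_time, sh, zz, ly are extracted.
        # ASSUMPTION: info contains a 'YYYY-MM-DD HH:MM:SS' timestamp and the labels
        # 审核 (reviewer), 作者 (author), 来源 (source), as the variable names suggest.
        m = re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', info)
        publish_time = m.group(0) if m else ''
        sh, zz, ly = field('审核', info), field('作者', info), field('来源', info)
        print(publish_time, sh, zz, ly)

        now_time = datetime.now()  # get the current time
        now_time.year  # current year (shown only in an interactive shell)
        if publish_time:
            exchangetime = datetime.strptime(publish_time, "%Y-%m-%d %H:%M:%S")  # string -> datetime
            print(exchangetime)
        exchangestring = now_time.strftime(r'%Y\%m\%d')  # datetime -> string
        print(exchangestring)
        break  # stop after the first article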
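The detail-page part of the script can also be tested offline by wrapping it in a function that folds steps 2 and 3 together. This is an illustration under assumptions: the sample HTML and the name `parse_detail` are hypothetical, and the only thing assumed about `.show-info` is that it contains the publish time as `YYYY-MM-DD HH:MM:SS`.

```python
import re
from datetime import datetime

from bs4 import BeautifulSoup

# Hypothetical detail-page markup using the #content and .show-info selectors.
SAMPLE_DETAIL_HTML = """
<div class="show-info">发布时间：2018-04-01 11:57:00 作者：张三 审核：李四 来源：宣传部</div>
<div id="content"><p>First paragraph.</p><p>Second paragraph.</p></div>
"""

TIME_PATTERN = re.compile(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}')


def parse_detail(html):
    """Return body text, raw info text, and the publish time as a datetime (None if absent)."""
    soup = BeautifulSoup(html, 'html.parser')
    content_tag = soup.select_one('#content')
    info_tag = soup.select_one('.show-info')
    # Join paragraphs with newlines and trim surrounding whitespace.
    content = content_tag.get_text('\n', strip=True) if content_tag else ''
    info = info_tag.get_text(strip=True) if info_tag else ''
    m = TIME_PATTERN.search(info)
    publish_time = datetime.strptime(m.group(0), '%Y-%m-%d %H:%M:%S') if m else None
    return {'content': content, 'info': info, 'publish_time': publish_time}


if __name__ == '__main__':
    print(parse_detail(SAMPLE_DETAIL_HTML))
```

Using `select_one` instead of `select(...)[0]` returns `None` for a missing element rather than raising `IndexError`, so one malformed article does not stop the whole loop.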
