Scraping the News on the Campus Network News Homepage: Using Regular Expressions and Extracting Functions

2022-06-04 05:09:11

import requests
from bs4 import BeautifulSoup
from datetime import datetime
import re

# Campus news homepage URL (left blank here)
res = requests.get('')
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

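# Get the click count of a news article
def getnewsid(url):
    # The news id is the last 4 characters between '_' and '.html' in the article URL
    newsid = re.findall(r'_(.*)\.html', url)[0][-4:]
    # Click-count URL template (left blank here); it takes newsid as its one format argument
    clickurl = ''.format(newsid)
    clickres = requests.get(clickurl)
    # Use a regular expression to extract the click count from the response text
    clickcount = int(re.search(r"hits'\)\.html\('(.*)'\);", clickres.text).group(1))
    return clickcount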
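The two regular expressions used for the click count can be tried offline on sample strings. The following is only a minimal sketch, not the original code: the sample URL and the sample response text are made-up assumptions (the real URLs are not shown in this post), the helper names `extract_news_id` and `extract_click_count` are hypothetical, and the click pattern captures `\d+` instead of `.*` as a stricter variant.

```python
import re


def extract_news_id(url):
    """Return the last 4 characters found between '_' and '.html' in url."""
    found = re.findall(r'_(.*)\.html', url)
    if not found:
        raise ValueError('no news id found in {!r}'.format(url))
    return found[0][-4:]


def extract_click_count(text):
    """Return the number inside hits').html('...'); as an int."""
    match = re.search(r"hits'\)\.html\('(\d+)'\);", text)
    if match is None:
        raise ValueError('no click count found')
    return int(match.group(1))


# Assumed sample inputs, for illustration only
sample_url = 'http://example.com/html/2018/news_0404/9183.html'
sample_response = "$('#hits').html('1234');"

print(extract_news_id(sample_url))           # 9183
print(extract_click_count(sample_response))  # 1234
```

Checking for a missing match before calling `group(1)` turns a page-layout change into a clear error message instead of an `AttributeError` on `None`.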

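# Get the details of a news article
def getnewsdetail(newsurl):
    resd = requests.get(newsurl)
    resd.encoding = 'utf-8'
    soupd = BeautifulSoup(resd.text, 'html.parser')
    content = soupd.select('#content')[0].text
    info = soupd.select('.show-info')[0].text
    # Call getnewsid() to get the click count
    count = getnewsid(newsurl)
    print(info)
    # Match the timestamp, which must agree with the strptime format below
    date = re.search(r'(\d{4}.\d{2}.\d{2}\s\d{2}.\d{2}.\d{2})', info).group(1)
    # Extract one to three fields: author, reviewer, source
    # (the three patterns are left blank here; each needs one capture group)
    author = re.search('', info).group(1)
    check = re.search('', info).group(1)
    sources = re.search('', info).group(1)
    # Convert the time string to a datetime object; the result is named dt
    # so that it does not shadow the imported datetime class
    dt = datetime.strptime(date, '%Y-%m-%d %H:%M:%S')
    # Use format() to build the output string (template left blank here)
    print(''.format(dt, author, check, sources, count))
    print(content)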
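Parsing the info string can also be tested without any network access. The sketch below assumes an info line that contains a `YYYY-MM-DD HH:MM:SS` timestamp and fields written as `Label: value`; the sample text, the label names, and the helpers `parse_publish_time` and `find_field` are all hypothetical, since the real field patterns are not shown in this post.

```python
import re
from datetime import datetime


def parse_publish_time(info):
    """Return the first 'YYYY-MM-DD HH:MM:SS' timestamp in info as a datetime, or None."""
    match = re.search(r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})', info)
    if match is None:
        return None
    return datetime.strptime(match.group(1), '%Y-%m-%d %H:%M:%S')


def find_field(label, info):
    """Return the non-space text after 'label:' in info, or None if the field is absent."""
    match = re.search(re.escape(label) + r':\s*(\S+)', info)
    return match.group(1) if match else None


# Assumed sample text, for illustration only
sample_info = 'Published: 2018-04-01 11:57:00  Author: someone'

print(parse_publish_time(sample_info))  # 2018-04-01 11:57:00
print(find_field('Author', sample_info))  # someone
print(find_field('Source', sample_info))  # None
```

Because an article may carry only some of the fields, returning `None` for a missing one avoids the `AttributeError` that `re.search(...).group(1)` raises when nothing matches.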

for new in soup.select('li'):
    # Only list items that contain a news title are news entries
    if len(new.select('.news-list-title')) > 0:
        title = new.select('.news-list-title')[0].text
        description = new.select('.news-list-description')[0].text
        newsurl = new.select('a')[0]['href']
        # Output template left blank here
        print(''.format(title, description, newsurl))
        # Call getnewsdetail() to get the news details
        getnewsdetail(newsurl)
        # Stop after the first news entry
        break
