Using Regular Expressions, Getting the Click Count, and Extracting Functions

2022-08-26 00:57:14

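1. Use a regular expression to check whether an email address was entered correctly.

    import re

    def validateemail(email):
        # Return 1 if the address looks valid, otherwise 0.
        if len(email) > 7:
            # Quantifiers restored as an assumption: as extracted, the pattern
            # accepted only a one-character final label, so "a@b.com" failed.
            pattern = r"^.+@(\[?)[a-zA-Z0-9\-.]+\.([a-zA-Z]{2,}|[0-9]{1,3})(\]?)$"
            if re.match(pattern, email) is not None:
                return 1
        return 0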
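As a rough illustration of the same idea, the check can also be written with `re.fullmatch` and a boolean result. The pattern and the name `is_valid_email` below are assumptions made for this sketch; the pattern is deliberately simplified and is not a complete email validator.

```python
import re

# Simplified pattern for illustration only (an assumption, not a full validator):
# local part, "@", one or more dot-separated labels, and an alphabetic final label.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")

def is_valid_email(email):
    """Return True when the whole string looks like an email address."""
    return EMAIL_RE.fullmatch(email) is not None

for candidate in ["user@example.com", "user@example", "no-at-sign.com"]:
    print(candidate, is_valid_email(candidate))
```

`re.fullmatch` requires the entire string to match, so no `^` or `$` anchors are needed.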

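2. Use a regular expression to find all phone numbers.

    import re

    text = " 1561515818"
    # The extracted pattern r"1\d" matched only two characters at a time;
    # \d+ is used here on the assumption that a quantifier was lost.
    m = re.findall(r"1\d+", text)
    if m:
        print(m)
    else:
        print('not match')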
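If the target is 11-digit mobile numbers that start with 1 (an assumption; the post does not state the format), a stricter pattern could look like the sketch below. The sample sentence and the name `MOBILE_RE` are made up for illustration.

```python
import re

# Assumption: a mobile number is exactly 11 digits and starts with 1.
# The lookarounds stop the pattern from matching inside a longer digit run.
MOBILE_RE = re.compile(r"(?<!\d)1\d{10}(?!\d)")

sample = "Call 13812345678 or 15900001111; ignore 1234 and 138123456789."
numbers = MOBILE_RE.findall(sample)
print(numbers)
```

The 12-digit run at the end is skipped because of the lookahead, which a bare `1\d{10}` would not do.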

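3. Use a regular expression to tokenize English text: re.split('', news)

    import re

    news = '''it's true that we don't know what we've got until we lose it, but it's also true that we don't know what we've been losing until it arrives. '''
    # Split on runs of whitespace and punctuation; a raw string keeps \s and \- literal.
    new = re.split(r'[\s,.?!\-]+', news)
    print(new)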
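Because the text ends with a delimiter, `re.split` leaves an empty string at the end of the result. A minimal sketch of filtering those out, using a shortened sample sentence chosen for this example:

```python
import re

news = "it's true that we don't know what we've got until we lose it. "

# Split on whitespace and punctuation, then drop the empty strings that
# appear when the text starts or ends with a delimiter.
tokens = [t for t in re.split(r"[\s,.?!\-]+", news) if t]
print(tokens)
print(len(tokens))
```

The apostrophe is not in the delimiter class, so contractions such as "don't" stay as single tokens.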

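4. Use a regular expression to get the news ID.

The code shown for this step splits a headline into words; the ID extraction itself appears in step 6.

    import re

    news = ("facebook? informs data leak victims whether they "
            "need to burn down house, cut off fingerprints, start anew,")
    word = re.split(r"[\s,.?\-]+", news)
    print(word)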
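For the news ID itself, one possible approach is to capture the digits just before `.html` at the end of the URL. The URL below is a hypothetical placeholder (the post's real URLs are missing), and `extract_news_id` is a name introduced only for this sketch.

```python
import re

def extract_news_id(news_url):
    """Return the numeric ID before '.html' at the end of the URL, or None."""
    match = re.search(r"/(\d+)\.html$", news_url)
    return match.group(1) if match else None

# Hypothetical URL, used only to show the shape this sketch expects.
example_url = "http://news.example.com/html/2018/campus_0404/9183.html"
print(extract_news_id(example_url))
```

Returning `None` for a URL that does not match avoids the `AttributeError` that calling `.group(1)` on a failed search would raise.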

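5. Build the request URL for the click count.

    import requests

    # The request URL is empty in the source; fill it in before running.
    res = requests.get('')
    res.encoding = 'utf-8'
    # Keep what follows the last '.html', then strip the surrounding ('  and ');
    b = res.text.split('.html')[-1].lstrip("(')").rstrip("');")
    print(b)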

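6. Get the click count.

    import requests
    import re

    newsurl = ''  # the news URL is empty in the source
    # Take the text between '_' and '.html', then keep the last path segment as the ID.
    newsid = re.search(r'_(.*)\.html', newsurl).group(1).split('/')[-1]
    # The URL template is empty in the source; it should contain a {} placeholder for the ID.
    res = requests.get(''.format(newsid))
    b = res.text.split('.html')[-1].lstrip("(')").rstrip("');")
    print(b)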
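The strip-based parsing above implies that the response is a small piece of JavaScript ending in something like `.html('1234');`. Under that assumption (the sample response below is invented for illustration), the count could also be pulled out with a regular expression. `parse_click_count` is a hypothetical helper name.

```python
import re

def parse_click_count(js_text):
    """Return the number inside the last .html('N') call, or None if absent."""
    counts = re.findall(r"\.html\('(\d+)'\)", js_text)
    return int(counts[-1]) if counts else None

# Invented sample response; the real endpoint's output is not shown in the post.
sample = "$('#todaydowns').html('5');$('#hits').html('1234');"
print(parse_click_count(sample))
```

Taking the last match mirrors the `split('.html')[-1]` step, and converting to `int` makes the value usable in arithmetic.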

7. Wrap steps 4, 5, and 6 in a function: def getclickcount(newsurl):

    def getclickcount(newsurl):
        newsid = re.search(r'_(.*)\.html', newsurl).group(1).split('/')[-1]
        # The URL template is empty in the source.
        res = requests.get(''.format(newsid))
        b = res.text.split('.html')[-1].lstrip("(')").rstrip("');")
        print(b)
        return b

8. Wrap the code that fetches news details in a function: def getnewdetail(newsurl):

    import requests
    from bs4 import BeautifulSoup

    def getnewdetail(newsurl):
        res = requests.get(newsurl)
        res.encoding = 'utf-8'
        soup = BeautifulSoup(res.text, 'html.parser')
        global soupdetail
        for news in soup.select('li'):
            if len(news.select('.news-list-title')) > 0:
                pert = news.select('.news-list-title')[0].text  # title of each news item
                perdt = news.select('.news-list-info')[0].contents[0].text  # detail text of each news item
                perhref = news.select('a')[0].attrs['href']  # source link of each news item
                # ---------- crawl the linked page ----------
                perdetail = requests.get(perhref)
                perdetail.encoding = 'utf-8'
                soupdetail = BeautifulSoup(perdetail.text, 'html.parser')
                textcontent = soupdetail.select('#content')[0].text
                # ---------- print the results ----------
                print('Title:', pert)
                print('Source page:', perhref)
                print('Body:', textcontent)
                break

9. Fetch every news item on one list page, wrapped in a function: def getlistpage(pageurl):

    def getlistpage(pageurl):
        res = requests.get(pageurl)
        res.encoding = 'utf-8'
        soup = BeautifulSoup(res.text, 'html.parser')
        for news in soup.select('li'):
            if len(news.select('.news-list-title')) > 0:
                g = news.select('a')[0].attrs['href']
                print(g)
                getnewdetail(g)

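10. Get the total number of news items and compute the total number of pages, wrapped in a function: def getpagen():

    def getpagen():
        # The request URL is empty in the source.
        res = requests.get('')
        res.encoding = 'utf-8'
        soup = BeautifulSoup(res.text, 'html.parser')
        # The element text is the item count followed by the counter word '條'.
        pagenumber = int(soup.select('.a1')[0].text.rstrip('條'))
        # 10 items per page; note this gives one extra page when the count is an exact multiple of 10.
        page = pagenumber // 10 + 1
        return page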
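The page count is a ceiling division of the item count by the page size. A small sketch that handles exact multiples, with `page_count` as a name introduced for this example and the 10-items-per-page value taken from the code above:

```python
def page_count(total_items, per_page=10):
    """Ceiling division: the number of pages needed to show total_items."""
    return (total_items + per_page - 1) // per_page

for total in (95, 100, 101):
    # Compare with the simpler total // 10 + 1 formula.
    print(total, page_count(total), total // 10 + 1)
```

The two formulas agree except when the count is an exact multiple of the page size, where `total // 10 + 1` reports one page too many.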

11. Get the details of every news item on every list page.

    def crawlonepageschoolnews(page_url):
        res = requests.get(page_url)
        res.encoding = 'utf-8'
        soup = BeautifulSoup(res.text, 'html.parser')
        news = soup.select('.news-list > li')
        for n in news:
            # print(n)
            print('**' * 5 + 'List page info' + '**' * 10)
            print('Description: ' + n.a.select('.news-list-description')[0].text)
            getnewdetail(n.a.attrs['href'])
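The function above handles a single list page; covering all pages needs a loop over the page URLs. The sketch below only builds those URLs. The URL scheme (first page at the base URL, later pages at `<base>N.html`) and the name `list_page_urls` are assumptions, since the post's real URLs are missing.

```python
def list_page_urls(base_url, total_pages):
    """Build list-page URLs under an assumed scheme:
    page 1 is base_url itself, page N (N >= 2) is base_url + 'N.html'."""
    urls = [base_url]
    urls += [f"{base_url}{i}.html" for i in range(2, total_pages + 1)]
    return urls

# Hypothetical base URL, for illustration only.
for url in list_page_urls("http://news.example.com/list/", 3):
    print(url)
    # crawlonepageschoolnews(url)  # would crawl each page with the function above
```

In a full run, `total_pages` would come from `getpagen()`.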
