Python Web Crawlers: Regular Expressions

Regular expression basics

The re module (Python provides regular expression support through the re module)

The main functions used:

# Returns a Pattern object
re.compile(pattern[, flags])
# The following functions perform matching
re.match(pattern, string[, flags])
re.search(pattern, string[, flags])
re.split(pattern, string[, maxsplit])
re.findall(pattern, string[, flags])
re.finditer(pattern, string[, flags])
re.sub(pattern, repl, string[, count])
re.subn(pattern, repl, string[, count])
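
As a rough illustration of how the matching functions listed above differ, here is a minimal sketch. The sample string and the variable names are made up for this example; only the re functions themselves come from the list.

```python
import re

# Hypothetical sample text, chosen only for illustration
text = "apple=1, banana=22; cherry=333"

# re.search scans the whole string and returns the first match (or None)
first = re.search(r"\d+", text)

# re.findall returns every non-overlapping match as a list of strings
numbers = re.findall(r"\d+", text)

# re.finditer yields Match objects, which also carry positions
spans = [m.span() for m in re.finditer(r"\d+", text)]

# re.split cuts the string wherever the pattern matches
parts = re.split(r"[,;]\s*", text)

# re.sub replaces every match; re.subn also reports how many were replaced
masked = re.sub(r"\d+", "#", text)
masked_with_count = re.subn(r"\d+", "#", text)

print(first.group(), numbers, spans, parts, masked, masked_with_count, sep="\n")
```

In crawler code, findall and finditer tend to be the most useful of these, since a page usually contains many occurrences of the item being extracted.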

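Steps for using re:

Step 1: Compile the string form of the regular expression into a Pattern instance.

Step 2: Use the Pattern instance to process the text and obtain a match result (a Match instance).

Step 3: Use the Match instance to get information and carry out further operations.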

import re  # import the module

# Compile the regular expression into a Pattern object. The r prefix on
# 'hello' marks a raw string: its characters are taken exactly as written.
pattern = re.compile(r'hello')

# Use the Pattern object to match text and obtain the match results
match1 = pattern.match('hello world')
match2 = pattern.match('helloo world')
match3 = pattern.match('helllo world')

if match1:  # the match succeeded
    print(match1.group())  # use the Match object to get the matched text
else:
    print('not match1')

if match2:
    print(match2.group())
else:
    print('not match2')

if match3:
    print(match3.group())
else:
    print('no match3')
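
Step 3 mentions getting information from the Match instance. The sketch below shows a few of the accessors a Match object provides, and how match differs from search. The two-group pattern and the sample strings are assumptions made for this example, not part of the original listing.

```python
import re

# Hypothetical pattern with two groups: a word, a space, another word
pattern = re.compile(r"(\w+) (\w+)")
match = pattern.match("hello world, again")

if match:
    whole = match.group()     # the entire matched text
    first = match.group(1)    # text captured by the first group
    second = match.group(2)   # text captured by the second group
    both = match.groups()     # all captured groups as a tuple
    span = match.span()       # (start, end) positions of the match
    print(whole, first, second, both, span)

# match() only tries at the beginning of the string; search() scans forward
at_start = re.match(r"world", "hello world")    # None: 'world' is not at index 0
anywhere = re.search(r"world", "hello world")   # finds 'world' later in the string
print(at_start, anywhere.start())
```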

Below is a closer look at the key methods.

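★ re.compile(pattern[, flags]):

This function acts as a factory for Pattern objects: it compiles the string form of a regular expression into a Pattern object.

The second argument, flags, sets the matching mode. Values can be combined with the bitwise OR operator '|' so that several take effect at once, for example re.I | re.M.

You can also specify the mode inside the regex string itself.

For example, re.compile('pattern', re.I | re.M) is equivalent to re.compile('(?im)pattern').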
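
The equivalence between flag arguments and inline flags can be checked with a short sketch. The sample text and the '^hello' pattern are assumptions chosen so that both re.I and re.M make a visible difference.

```python
import re

# Hypothetical two-line text with different capitalisation on each line
text = "Hello\nhello world"

# Flags passed as an argument, combined with bitwise OR
with_flags = re.compile(r"^hello", re.I | re.M)

# The same flags written inline at the start of the pattern
inline = re.compile(r"(?im)^hello")

# Without flags, '^' matches only at the very start and case matters
no_flags = re.compile(r"^hello")

print(with_flags.findall(text))
print(inline.findall(text))
print(no_flags.findall(text))
```

Inline flags are handy when the pattern is stored as a plain string, for instance in a configuration file, and there is no place to pass a flags argument.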

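Commonly used values include:

re.I (re.IGNORECASE): ignore case when matching.
re.M (re.MULTILINE): '^' and '$' also match at the start and end of each line.
re.S (re.DOTALL): '.' matches any character, including a newline.
re.X (re.VERBOSE): whitespace and comments inside the pattern are ignored, so the pattern can be spread over several lines.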

import re

# With re.X (verbose mode) the pattern can span several lines and carry comments
a = re.compile(r"""\d+  # integer part
                   \.   # decimal point
                   \d*  # fractional digits""", re.X)
# The same pattern written on one line
b = re.compile(r'\d+\.\d*')

match1 = a.match('3.1415')
match2 = a.match('33')
match3 = b.match('3.1415')
match4 = b.match('33')

if match1:
    print(match1.group())
else:
    print('match1 is not a decimal number')

if match2:
    print(match2.group())
else:
    print('match2 is not a decimal number')

if match3:
    print(match3.group())
else:
    print('match3 is not a decimal number')

if match4:
    print(match4.group())
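
In the listing above, '33' does not match either pattern because both require a decimal point. If plain integers should be accepted too, one possible variation (an assumption for illustration, not part of the original example) is to make the fractional part optional and use fullmatch so the whole string has to be a number:

```python
import re

# Hypothetical variation: the decimal point and fraction are optional
number = re.compile(r"""
    \d+            # integer part
    (?: \. \d* )?  # optional decimal point and fractional digits
""", re.X)

candidates = ["3.1415", "33", "3.", "abc"]

# fullmatch succeeds only if the entire string fits the pattern
results = {c: bool(number.fullmatch(c)) for c in candidates}

for candidate, ok in results.items():
    print(candidate, "is a number" if ok else "is not a number")
```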
