Python web scraping: the re library (regular expressions)

1. re.match

re.match tries to match a pattern at the start of the string. If the pattern does not match at the start, it returns None.

re.match(pattern, string, flags=0)
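The start-anchored behavior described above can be seen in a minimal sketch. The sample strings here are made up for illustration.

```python
import re

# re.match only succeeds when the pattern matches at position 0.
at_start = re.match(r'hello', 'hello world')
not_at_start = re.match(r'world', 'hello world')

print(at_start)      # a Match object
print(not_at_start)  # None, because 'world' is not at the start
```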

2. The most basic match

import re
content = 'hello 123 4567 world_this is a regex demo'
# \d{3} and \d{4} match the two digit runs; \w{10} matches 'world_this'
result = re.match(r'^hello\s\d{3}\s\d{4}\s\w{10}.*demo$', content)
print(result)

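3. Generic matching

import re
content = 'hello 123 4567 world_this is a regex demo'
# .* stands in for everything between 'hello' and 'demo'
result = re.match(r'^hello.*demo$', content)
print(result)
print(result.group())  # the full matched text
print(result.span())   # (start, end) indices of the match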

4. Capturing a target

import re
content = 'hello 1234567 world_this is a regex demo'
# parentheses mark the part of the match to extract
result = re.match(r'hello\s(\d+)\sworld.*demo$', content)
print(result)
print(result.group(1))  # 1234567
print(result.span())

5. Greedy matching

import re
content = 'hello 1234567 world_this is a regex demo'
# .* is greedy: it consumes as much as it can, leaving only one digit for (\d+)
result = re.match(r'^he.*(\d+).*demo$', content)
print(result)
print(result.group(1))  # 7
print(result.span())

6. Non-greedy mode

import re
content = 'hello 1234567 world_this is a regex demo'
# .*? is non-greedy: it stops as soon as (\d+) can start matching
result = re.match(r'^he.*?(\d+).*demo$', content)
print(result)
print(result.group(1))  # 1234567
print(result.span())

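7. Match flags

import re
# triple quotes let the string literal span two lines
content = '''hello 1234567 world_this
is a regex demo'''
# re.S (re.DOTALL) makes . match newlines as well
result = re.match(r'^he.*?(\d+).*demo$', content, re.S)
print(result)
print(result.group(1))
print(result.span())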
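To see what the flag changes, the same pattern can be run with and without re.S. This is a small sketch that reuses the multi-line content from this section; the variable names are chosen here for illustration.

```python
import re

content = 'hello 1234567 world_this\nis a regex demo'
pattern = r'^he.*?(\d+).*demo$'

# Without re.S, . cannot cross the newline, so 'demo' is never reached.
without_flag = re.match(pattern, content)
# With re.S, . also matches the newline and the pattern succeeds.
with_flag = re.match(pattern, content, re.S)

print(without_flag)        # None
print(with_flag.group(1))  # 1234567
```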

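8. Escaping

import re
content = 'price is $500'
result = re.match('price is $500', content)
print(result)
# None: an unescaped $ is the end-of-string anchor, not a literal dollar sign

import re
content = 'price is $5.00'
# escape $ and . with backslashes to match them literally
result = re.match(r'price is \$5\.00', content)
print(result)  # returns a match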

Tip: prefer generic matching, use parentheses to capture the target, prefer non-greedy mode, and use re.S when the text contains newlines.
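The four tips can be combined in one short sketch. The HTML fragment below is hypothetical and only serves as illustration.

```python
import re

# Hypothetical multi-line HTML fragment.
html = '''<li>
<a href="/song/1">First Song</a>
</li>'''

# Generic .*? instead of spelling out every character, parentheses around
# the two targets, non-greedy quantifiers, and re.S because of the newlines.
result = re.match(r'<li>.*?href="(.*?)">(.*?)</a>', html, re.S)
print(result.group(1))  # /song/1
print(result.group(2))  # First Song
```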

9. re.search

re.search scans the whole string and returns the first successful match.

import re
content = 'extra stings hello 1234567 world_this is a regex demo extra stings'
result = re.match(r'hello.*?(\d+).*?demo', content)
print(result)  # None: re.match only matches at the start of the string

import re
content = 'extra stings hello 1234567 world_this is a regex demo extra stings'
result = re.search(r'hello.*?(\d+).*?demo', content)
print(result.group(1))  # 1234567

10. re.findall

Searches the string and returns all matching substrings as a list.
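A minimal sketch of re.findall follows; the sample text is invented for illustration. When the pattern has no capture group, each list item is the whole match. With several groups, each item is a tuple of the groups.

```python
import re

content = 'order 12 shipped, order 345 pending, order 6789 cancelled'

# No capture group: each list item is the whole match.
numbers = re.findall(r'\d+', content)
print(numbers)  # ['12', '345', '6789']

# Two capture groups: each list item is a tuple of the groups.
pairs = re.findall(r'order (\d+) (\w+)', content)
print(pairs)  # [('12', 'shipped'), ('345', 'pending'), ('6789', 'cancelled')]
```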

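11. re.sub

Replaces every match in the string and returns the resulting string.

import re
content = 'extra stings hello 1234567 world_this is a regex demo extra stings'
# re.sub takes (pattern, replacement, string); here every digit run is
# replaced with an empty string, i.e. removed
content = re.sub(r'\d+', '', content)
print(content)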
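The replacement can also refer back to a captured group. The sketch below assumes the goal is to keep the digits and wrap them in brackets. The brackets and the variable name are chosen only for illustration.

```python
import re

content = 'extra stings hello 1234567 world_this is a regex demo extra stings'

# \1 in the replacement stands for the text captured by the first group,
# so the digits are kept and wrapped in square brackets.
wrapped = re.sub(r'(\d+)', r'[\1]', content)
print(wrapped)
```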
