Java Regular Expressions Study, Part 3

8. Capturing Groups

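A capturing group is a way of treating multiple characters as a single unit. You create one by placing the characters inside a pair of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g". The part of the input string that matches the capturing group is saved in memory and can be recalled later with a backreference.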
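A minimal sketch of "treating several characters as one unit" (the class name CapturingGroupDemo is hypothetical): the parentheses let the + quantifier repeat the whole word "dog", and the captured text can be read back with group().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapturingGroupDemo {
    public static void main(String[] args) {
        // (dog)+ repeats the whole group "dog", not just the final "g"
        Matcher grouped = Pattern.compile("(dog)+").matcher("dogdog");
        System.out.println(grouped.matches()); // true: the whole input matches
        System.out.println(grouped.group(0));  // dogdog (the entire match)
        // A repeated group keeps only the text from its last iteration
        System.out.println(grouped.group(1));  // dog

        // Without parentheses, + applies only to the single character "g"
        System.out.println(Pattern.matches("dog+", "dogdog")); // false
        System.out.println(Pattern.matches("dog+", "doggg"));  // true
    }
}
```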

8.1 Numbering

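As described in the Pattern API documentation, capturing groups are numbered by counting their opening parentheses from left to right. For example, the expression ((a)(b(c))) contains the following four groups: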

1. ((a)(b(c)))

2. (a)

3. (b(c))

4. (c)
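To see this numbering in action, here is a small sketch (the class and method names are hypothetical) that matches ((a)(b(c))) against "abc" and lists what each numbered group captured. It loops up to groupCount(), the method described in the next paragraph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns the text captured by groups 1..groupCount(),
    // or an empty list if the whole input does not match
    static List<String> capturedGroups(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.matches()) {
            // Group numbers follow the order of the opening parentheses
            for (int i = 1; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> groups = capturedGroups("((a)(b(c)))", "abc");
        for (int i = 0; i < groups.size(); i++) {
            System.out.println((i + 1) + ": " + groups.get(i));
        }
    }
}
```

Running it should print "1: abc", "2: a", "3: bc", and "4: c", one per line, matching the list above.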

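To find out how many groups an expression contains, call the groupCount method on a Matcher object. groupCount returns an int giving the number of capturing groups in the matcher's pattern. For the expression above, groupCount returns 4, showing that the pattern contains 4 capturing groups.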

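There is also a special group, group 0, which always represents the entire expression. It is not included in the total reported by groupCount. Groups beginning with (? are pure non-capturing groups: they do not capture text and do not count toward the group total. (The exception, in Java 7 and later, is a named group such as (?<name>X), which does capture and is counted.)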
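A short sketch (class name hypothetical) of how groupCount treats group 0, a (?:...) non-capturing group, and a named group. Note that groupCount() depends only on the pattern, so it can be called before any match attempt.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    public static void main(String[] args) {
        // Four opening parentheses, four capturing groups
        Matcher nested = Pattern.compile("((a)(b(c)))").matcher("");
        System.out.println(nested.groupCount()); // 4

        // (?:a) groups "a" without capturing it, so only (b) is counted
        Matcher m = Pattern.compile("(?:a)(b)").matcher("ab");
        System.out.println(m.groupCount()); // 1
        if (m.matches()) {
            System.out.println(m.group(0)); // ab (group 0 is always the entire match)
            System.out.println(m.group(1)); // b
        }

        // A named group does capture, so it is counted
        System.out.println(Pattern.compile("(?<word>\\w+)").matcher("").groupCount()); // 1
    }
}
```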

Several Matcher methods take an int group number as a parameter, so it is especially important to understand how groups are numbered.

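public int start(int group): Returns the start index of the subsequence captured by the given group during the previous match operation.

public int end(int group): Returns the index of the last character of the subsequence captured by the given group during the previous match operation, plus one.

public String group(int group): Returns the input subsequence captured by the given group during the previous match operation.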
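A minimal sketch of these three methods together (class name and sample input are hypothetical): it finds two pairs of digits separated by a hyphen and prints each group's text with its start and end index.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    public static void main(String[] args) {
        Matcher m = Pattern.compile("(\\d\\d)-(\\d\\d)").matcher("id: 12-34");
        if (m.find()) {
            for (int g = 0; g <= m.groupCount(); g++) {
                // end(g) is one past the last captured character,
                // like the exclusive end index of String.substring
                System.out.printf("group %d: \"%s\" start=%d end=%d%n",
                        g, m.group(g), m.start(g), m.end(g));
            }
        }
    }
}
```

For the input "id: 12-34" this prints group 0 as "12-34" (start=4, end=9), group 1 as "12" (start=4, end=6), and group 2 as "34" (start=7, end=9).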

8.2 Backreferences

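The part of the input string that matches a capturing group is saved in memory and can be recalled later through a backreference. In a regular expression, a backreference is written as a backslash (\) followed by a digit giving the number of the group to recall. For example, the expression (\d\d) defines a capturing group that matches two digits in a row, and the backreference \1 recalls that group later in the expression.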

To match any two digits followed by exactly the same two digits, use (\d\d)\1 as the regular expression:

Enter your regex: (\d\d)\1
Enter input string to search: 1212
I found the text "1212" starting at index 0 and ending at index 4.

If you change the last two digits, the match fails:

Enter your regex: (\d\d)\1
Enter input string to search: 1234
No match found.
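The transcripts in this article come from an interactive test harness whose source is not shown here. The following is a non-interactive sketch that prints results in the same format; the class and method names (RegexSearch, search) are hypothetical, not part of any library. Note that inside a Java string literal every regex backslash must itself be escaped, so \1 is written "\\1".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSearch {
    // Returns one report line per match, or "No match found." if there is none
    static List<String> search(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            lines.add(String.format(
                    "I found the text \"%s\" starting at index %d and ending at index %d.",
                    m.group(), m.start(), m.end()));
        }
        if (lines.isEmpty()) {
            lines.add("No match found.");
        }
        return lines;
    }

    public static void main(String[] args) {
        search("(\\d\\d)\\1", "1212").forEach(System.out::println);
        search("(\\d\\d)\\1", "1234").forEach(System.out::println);
    }
}
```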

For nested capturing groups, backreferences work in exactly the same way: write a backslash followed by the number of the group to be recalled.
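A small sketch with a nested group (class name hypothetical). In ((\d)\d), group 1 is the outer pair of digits and group 2 is the inner first digit, so \2 repeats only that first digit while \1 repeats the whole pair.

```java
import java.util.regex.Pattern;

class NestedBackrefDemo {
    public static void main(String[] args) {
        // Group 1 = ((\d)\d), group 2 = the inner (\d)
        String innerRef = "((\\d)\\d)\\2";
        System.out.println(Pattern.matches(innerRef, "121")); // true: \2 recalls "1"
        System.out.println(Pattern.matches(innerRef, "122")); // false

        // \1 still refers to the outer group, i.e. both digits
        System.out.println(Pattern.matches("((\\d)\\d)\\1", "1212")); // true
    }
}
```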

9. Boundary Matchers

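You can make pattern matches more precise by specifying boundary matchers. For example, you may be interested in a particular word only if it appears at the beginning or end of a line. Or you may want to know whether a match occurs on a word boundary, or at the end of the previous match.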

The following table lists the boundary matchers and describes each one.

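| Boundary matcher | Description |
|---|---|
| ^ | The beginning of a line |
| $ | The end of a line |
| \b | A word boundary |
| \B | A non-word boundary |
| \A | The beginning of the input |
| \G | The end of the previous match |
| \Z | The end of the input, but before the final line terminator, if any |
| \z | The end of the input |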
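The difference between \Z and \z only shows up when the input ends with a line terminator. A minimal sketch (class and method names hypothetical):

```java
import java.util.regex.Pattern;

class InputEndDemo {
    // True if the pattern is found anywhere in the input
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "dog\n"; // ends with a line terminator
        // \Z may match just before the final line terminator ...
        System.out.println(found("dog\\Z", input)); // true
        // ... but \z matches only at the absolute end of the input
        System.out.println(found("dog\\z", input)); // false
        // Without the MULTILINE flag, $ behaves like \Z here
        System.out.println(found("dog$", input));   // true
    }
}
```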

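The following examples demonstrate the ^ and $ boundary matchers. As the table above shows, ^ matches the beginning of a line and $ matches the end of a line.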

Enter your regex: ^dog$
Enter input string to search: dog
I found the text "dog" starting at index 0 and ending at index 3.

Enter your regex: ^dog$
Enter input string to search:     dog
No match found.

Enter your regex: \s*dog$
Enter input string to search:                           dog
I found the text "                          dog" starting at index 0 and ending at index 29.

Enter your regex: ^dog\w*
Enter input string to search: dogblahblah
I found the text "dogblahblah" starting at index 0 and ending at index 11.

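The first example succeeds because the pattern occupies the entire input string. The second fails because the input string contains extra whitespace at the beginning. The third specifies an expression that allows any amount of whitespace followed by "dog" at the end of the line. The fourth requires "dog" at the beginning of a line, followed by any number of word characters.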
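One detail worth checking in Java: by default, ^ and $ anchor to the start and end of the whole input, and they only match at the start and end of every line when the Pattern.MULTILINE flag is set. \A, by contrast, always means the start of the input. A minimal sketch (class and method names hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineAnchorDemo {
    // Counts how many times find() succeeds on the input
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "dog\ncat\ndog";
        // Default: ^ and $ refer to the whole input, so no line matches alone
        System.out.println(countMatches("^dog$", 0, text));                  // 0
        // MULTILINE: ^ and $ refer to each line
        System.out.println(countMatches("^dog$", Pattern.MULTILINE, text));  // 2
        // \A ignores MULTILINE and still means "start of input"
        System.out.println(countMatches("\\Adog", Pattern.MULTILINE, text)); // 1
    }
}
```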

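To check whether a pattern begins and ends on a word boundary (rather than matching a substring inside a longer word), put \b on both sides, for example \bdog\b.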

Enter your regex: \bdog\b
Enter input string to search: The dog plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.

Enter your regex: \bdog\b
Enter input string to search: The doggie plays in the yard.
No match found.
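A typical use of \b is finding a word only where it stands on its own. This sketch (class and method names hypothetical) collects the start index of each match, with and without the boundaries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Collects the start index of every match in the input
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "dog doggie hotdog dog.";
        // Only the standalone words qualify; "doggie" and "hotdog"
        // each fail the boundary check on one side
        System.out.println(matchStarts("\\bdog\\b", text)); // [0, 18]
        // Without \b every occurrence of the letters counts
        System.out.println(matchStarts("dog", text));       // [0, 4, 14, 18]
    }
}
```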

To match an expression on a non-word boundary, use \B instead:

Enter your regex: \bdog\B
Enter input string to search: The dog plays in the yard.
No match found.

Enter your regex: \bdog\B
Enter input string to search: The doggie plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.
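\B can also be used on its own side of a word, for example to find "dog" only when it is part of a longer word. A minimal sketch (class and method names hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonWordBoundaryDemo {
    // Collects the start index of every match in the input
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // "dog" preceded by another word character, as in "hotdog"
        System.out.println(matchStarts("\\Bdog", "dog hotdog")); // [7]
        // "dog" followed by another word character, as in "doggie"
        System.out.println(matchStarts("dog\\B", "dog doggie")); // [4]
    }
}
```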

To require that a match occur only at the end of the previous match, use \G:

Enter your regex: dog
Enter input string to search: dog dog
I found the text "dog" starting at index 0 and ending at index 3.
I found the text "dog" starting at index 4 and ending at index 7.

Enter your regex: \Gdog
Enter input string to search: dog dog
I found the text "dog" starting at index 0 and ending at index 3.

The second example finds only one match, because the second occurrence of "dog" does not start at the end of the previous match.
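Because \G only matches where the previous match ended (or at the start of the input for the first find()), it can be used to consume a run of adjacent tokens and stop at the first gap. A minimal sketch (class and method names hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContiguousMatchDemo {
    // Collects the start index of every match in the input
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "dogdog dog";
        // Plain search finds every occurrence
        System.out.println(matchStarts("dog", text));    // [0, 3, 7]
        // \G stops at the space, since the third "dog" does not
        // begin where the second match ended
        System.out.println(matchStarts("\\Gdog", text)); // [0, 3]
    }
}
```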
