Java Regular Expressions

2021-08-29 15:31:48

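Two questions

b. How to extract the posting time from these pages

Analysis:

The pages turn out to be very regular. The rule is roughly this:

Every posting time looks like this: [2008-08-09 14:51:35]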

Rule: \[(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d)\]
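As a minimal sketch, the rule can be applied with `java.util.regex` as follows. The pattern assumes there is no space between `[` and the date, as in the sample timestamp; the class name, method name, and sample page fragment are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PostTimeExtractor {
    // Group 1 captures the timestamp without the surrounding brackets.
    private static final Pattern POST_TIME =
            Pattern.compile("\\[(\\d\\d\\d\\d-\\d\\d-\\d\\d \\d\\d:\\d\\d:\\d\\d)\\]");

    // Returns every timestamp found in the page text, in order of appearance.
    static List<String> extractPostTimes(String page) {
        List<String> times = new ArrayList<>();
        Matcher m = POST_TIME.matcher(page);
        while (m.find()) {
            times.add(m.group(1));
        }
        return times;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment, not taken from a real site.
        String page = "<td>[2008-08-09 14:51:35]</td><td>[2008-08-10 09:05:00]</td>";
        System.out.println(extractPostTimes(page));
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the expression for every page.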

Regular expression syntax

Syntax                          Explanation

Characters
c                               The character c
\unnnn, \xnn, \0n, \0nn, \0nnn  The code unit with the given hex or octal value
\t, \n, \r, \f, \a, \e          The control characters tab, newline, return, form feed, alert, and escape
\cc                             The control character corresponding to the character c
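A small sketch of the character escapes, assuming the letter `A` (code unit 0x41, octal 101) as the test character. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class CharEscapeDemo {
    // True if the whole input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\x41", "A"));        // two-digit hex escape
        System.out.println(matches("\\u0041", "A"));      // four-digit hex (Unicode) escape
        System.out.println(matches("\\0101", "A"));       // octal escape, 0101 is 65
        System.out.println(matches("a\\tb", "a\tb"));     // tab control character
        System.out.println(matches("\\cA", "\u0001"));    // control-A is code unit 1
    }
}
```

Each backslash is doubled because the regex is written inside a Java string literal; every call in `main` should print `true`.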

Character classes
[c1c2...]                       Any of the characters represented by c1, c2, ...; the ci are characters, character ranges (c1-c2), or character classes
[^...]                          Complement of a character class
[...&&...]                      Intersection of two character classes
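The three character class forms can be tried out with `String.replaceAll`. This is only an illustrative sketch; the helper name and sample strings are made up.

```java
class CharClassDemo {
    // Deletes every character of the input that matches the given class.
    static String remove(String charClass, String input) {
        return input.replaceAll(charClass, "");
    }

    public static void main(String[] args) {
        // Plain class: remove the vowels.
        System.out.println(remove("[aeiou]", "regular expression"));
        // Complement: remove everything that is not a digit.
        System.out.println(remove("[^0-9]", "tel: 555-0199"));
        // Intersection: lowercase letters that are not vowels.
        System.out.println(remove("[a-z&&[^aeiou]]", "regex"));
    }
}
```

The intersection form is specific to Java-style regex and is handy for "a range minus a few characters".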

Predefined character classes
.                               Any character except line terminators (or any character if the DOTALL flag is set)
\d                              A digit [0-9]
\D                              A nondigit [^0-9]
\s                              A whitespace character [ \t\n\r\f\x0B]
\S                              A non-whitespace character
\w                              A word character [a-zA-Z0-9_]
\W                              A nonword character
\p{name}                        A named character class (see table 12-9)
\P{name}                        The complement of a named character class
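A short sketch of the predefined classes in use. The method names are hypothetical, and `\p{Upper}` is used here as one example of a named class (ASCII uppercase letters).

```java
class PredefinedClassDemo {
    // Splits on runs of whitespace.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // Keeps only the digits by deleting every nondigit.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // True if the input consists only of word characters.
    static boolean isWordOnly(String s) {
        return s.matches("\\w+");
    }

    // Deletes ASCII uppercase letters using a named character class.
    static String dropUppercase(String s) {
        return s.replaceAll("\\p{Upper}", "");
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", words(" a  b\tc ")));
        System.out.println(digitsOnly("2008-08-09"));
        System.out.println(isWordOnly("post_time1"));
        System.out.println(dropUppercase("JavaRegex"));
    }
}
```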

Boundary matchers
^ $                             Beginning, end of input (or beginning, end of line in multiline mode)
\b                              A word boundary
\B                              A nonword boundary
\A                              Beginning of input
\z                              End of input
\Z                              End of input except final line terminator
\G                              End of previous match
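The boundary matchers are easiest to see side by side. The following is a sketch under simple assumptions (hypothetical helper names, literal search strings quoted with `Pattern.quote`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts how often the pattern is found in the text.
    private static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Counts occurrences of word as a whole word, using word boundaries.
    static int countWord(String text, String word) {
        return count(Pattern.compile("\\b" + Pattern.quote(word) + "\\b"), text);
    }

    // Counts lines that start with prefix; MULTILINE makes ^ match at each line start.
    static int countLinesStartingWith(String text, String prefix) {
        return count(Pattern.compile("^" + Pattern.quote(prefix), Pattern.MULTILINE), text);
    }

    // True if the text ends with suffix, ignoring one final line terminator.
    static boolean endsWithIgnoringFinalNewline(String text, String suffix) {
        return Pattern.compile(Pattern.quote(suffix) + "\\Z").matcher(text).find();
    }

    // True if the text ends with suffix at the very end of input.
    static boolean endsWithExactly(String text, String suffix) {
        return Pattern.compile(Pattern.quote(suffix) + "\\z").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(countWord("cat concat cat.", "cat"));
        System.out.println(countLinesStartingWith("a1\nb2\na3", "a"));
        System.out.println(endsWithIgnoringFinalNewline("abc\n", "abc"));
        System.out.println(endsWithExactly("abc\n", "abc"));
    }
}
```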

Quantifiers
X?                              Optional X
X*                              X, 0 or more times
X+                              X, 1 or more times
X{n}, X{n,}, X{n,m}             X n times, at least n times, between n and m times

Quantifier suffixes
?                               Turn default (greedy) match into reluctant match
+                               Turn default (greedy) match into possessive match
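The difference between greedy, reluctant, and possessive matching shows up clearly on a tag-like string. A minimal sketch, with a hypothetical helper that returns the first match or `null`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring matching the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "<b>bold</b>";
        // Greedy: takes as much as possible, then backtracks to the last '>'.
        System.out.println(firstMatch("<.+>", s));
        // Reluctant: stops at the first '>'.
        System.out.println(firstMatch("<.+?>", s));
        // Possessive: consumes the rest of the input and never gives it back,
        // so the trailing '>' can no longer match.
        System.out.println(firstMatch("<.++>", s));
        // Counted repetition: exactly four digits.
        System.out.println(firstMatch("\\d{4}", "2008-08-09"));
    }
}
```

Possessive quantifiers are mainly useful to avoid needless backtracking when it is known that giving characters back cannot help.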

Set operations
XY                              Any string from X, followed by any string from Y
X|Y                             Any string from X or Y

Grouping
(X)                             Capture the string matching X as a group
\n                              The match of the nth group

Escapes
\c                              The character c (must not be an alphabetic character)
\Q...\E                         Quote ... verbatim
(?...)                          Special construct (see API notes of the Pattern class)
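Alternation, back-references, and verbatim quoting can be sketched as follows. The class and method names are hypothetical; the repeated-word pattern is just one common use of `\1`.

```java
import java.util.regex.Pattern;

class GroupingDemo {
    // True if the whole input matches the regular expression.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if some word is immediately repeated, e.g. "is is".
    // Group 1 captures a word; \1 must match the same text again.
    static boolean hasRepeatedWord(String s) {
        return Pattern.compile("\\b(\\w+)\\s+\\1\\b").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("cat|dog", "dog"));            // alternation
        System.out.println(hasRepeatedWord("it is is fine"));       // back-reference
        // Without quoting, '+' is a quantifier, so the literal text does not match.
        System.out.println(fullMatch("1+1=2", "1+1=2"));
        // Between \Q and \E every character is taken verbatim.
        System.out.println(fullMatch("\\Q1+1=2\\E", "1+1=2"));
    }
}
```

When the text to quote comes from user input, `Pattern.quote` is the safer choice because it also copes with input that itself contains `\E`.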

A regular expression for stripping tags from HTML and extracting the body text:

||]*>||
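A minimal sketch of tag stripping in Java, assuming the simple pattern `<[^>]*>`. This pattern is an assumption chosen for illustration and is not necessarily the expression given above; it does not handle comments, script or style bodies, or a `>` inside attribute values, so a real HTML parser is more robust for untrusted pages.

```java
import java.util.regex.Pattern;

class HtmlTextDemo {
    // Assumed pattern: '<', then any characters other than '>', then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Removes everything that looks like a tag and trims the result.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // Hypothetical HTML fragment.
        System.out.println(stripTags("<p>Hello <b>world</b></p>"));
    }
}
```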

Uploading a regular expression testing tool:
