Learning Boost: Regular Expressions with regex

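If you only want to build the regex library, there are two ways (see the reference link):

In the Boost root directory, run bjam --toolset=<compiler name> --with-regex <other arguments>

Go to \libs\regex\build, find the makefile for your compiler, then run make -f ***x.mak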

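boost.regex comes armed with seven weapons and two magic treasures.

The seven weapons are:

the regex_match function
the regex_search function
the regex_replace function
the regex_format function
the regex_grep function
the regex_split function
the RegEx class

Each weapon comes in several variations (every function is overloaded to take C strings, std::string, or iterators as arguments), but the last four have fallen into disrepair over the years and are no longer recommended.

The two magic treasures are:

the regex_iterator iterator
the regex_token_iterator iterator

These two treasures are the soul of boost.regex. Once you have mastered them, you can "wound an opponent with a plucked flower or a flying leaf".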

#include <boost/regex.hpp>

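First, prepare some test data to use throughout. If you are in the mood, you can follow this site's other article "google testing" and run the experiment with the google testing framework, learning two things in the time of one.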

#include <iostream>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, char* argv[])

To check whether a whole string matches a given regular expression, use regex_match.

The following code verifies whether the string szstr (defined above) matches szreg.
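A whole-string check can be sketched as follows. The pattern and URL are hypothetical sample data standing in for szreg and szstr, and the helper name matches_whole is made up for illustration. The sketch uses the C++11 standard library <regex>, whose interface was modeled on Boost.Regex, purely so that it builds without Boost; I am assuming that including <boost/regex.hpp> and swapping std:: for boost:: gives the equivalent Boost call.

```cpp
#include <regex>
#include <string>

// Hypothetical sample data, assumed for illustration only.
// Groups: 1 = scheme, 2 = host, 4 = directory path, 6 = file name.
static const char* const kSampleReg =
    "(\\w+)://((\\w+\\.)*\\w+)((/\\w*)*)(/\\w+\\.\\w+)?";
static const char* const kSampleStr =
    "http://www.example.com/2009/0112/48.html";

// Returns true only when the WHOLE of `text` matches `pattern`.
bool matches_whole(const char* text, const char* pattern) {
    std::regex reg(pattern);
    return std::regex_match(text, reg);
}
```

Calling matches_whole(kSampleStr, kSampleReg) succeeds, while a string with extra text around the URL fails, because regex_match never accepts a partial match.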

The boost::regex constructor also accepts flag arguments that control its behavior, for example:

// Use Perl syntax (the default), ignoring case.
boost::regex reg1( szreg, boost::regex::perl|boost::regex::icase );

// Use POSIX extended syntax (in practice much the same).
boost::regex reg2( szreg, boost::regex::extended );
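To see the effect of such flags, here is a small sketch built on the standard library <regex> so that it compiles without Boost. That choice is an assumption made for convenience: std::regex has no perl flag, so its default grammar (ECMAScript) stands in for it, while icase and extended exist under the same names. The two helper names are invented for this example.

```cpp
#include <regex>
#include <string>

// Case-insensitive whole-string match with the default (ECMAScript) grammar.
bool matches_ignoring_case(const std::string& text, const std::string& pattern) {
    std::regex reg(pattern, std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(text, reg);
}

// Whole-string match with the POSIX extended grammar (case-sensitive).
bool matches_posix_extended(const std::string& text, const std::string& pattern) {
    std::regex reg(pattern, std::regex::extended);
    return std::regex_match(text, reg);
}
```

With these helpers, "HTTP" matches the pattern "http" only through matches_ignoring_case; the extended-grammar helper stays case-sensitive.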

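The following code not only verifies the match but also extracts the substrings corresponding to the parenthesized groups of the regular expression.

// You can also read a specific position directly.
if(mat[4].matched) cout << "path is" << mat[4] << endl;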
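Pulling a numbered group out of a match might look like the sketch below. The helper capture is a made-up name, and the code relies on the standard library <regex> instead of Boost so that it builds anywhere; I am assuming the same calls work with boost::regex, boost::cmatch and boost::regex_match.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Returns sub-match `index` of `pattern` matched against the whole of `text`.
// Returns an empty string when the match fails or that group did not take part.
std::string capture(const char* text, const char* pattern, std::size_t index) {
    std::regex reg(pattern);
    std::cmatch mat;  // match_results specialized for C strings
    if (!std::regex_match(text, mat, reg)) return std::string();
    if (index >= mat.size() || !mat[index].matched) return std::string();
    return mat[index].str();
}
```

Index 0 is the whole match and indexes from 1 upward follow the order of the opening parentheses in the pattern.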

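Here boost::cmatch is a specialization for C strings. It has three siblings, as follows:

typedef match_results<const char*> cmatch;
typedef match_results<std::string::const_iterator> smatch;
typedef match_results<const wchar_t*> wcmatch;
typedef match_results<std::wstring::const_iterator> wsmatch;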

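You can think of match_results as a container of sub_match objects. It also provides a format method that replaces the regex_format function.

A sub_match is a substring. It derives from std::pair<iterator, iterator>, and the first and second members of this iterator pair point to the start of the substring and just past its end. sub_match also provides str() to return the whole substring and length() to return its length.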
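The iterator-pair nature of sub_match and the format method can be illustrated like this. SubInfo, describe and reformat are hypothetical names, and the sketch is written against the standard library <regex> (an assumption made so it compiles without Boost; the Boost classes expose the same members used here).

```cpp
#include <cstddef>
#include <regex>
#include <string>

struct SubInfo {
    std::size_t offset;  // distance from the start of the text to sub.first
    std::size_t length;  // value of sub_match::length()
    std::string text;    // characters in the range [first, second)
};

// Describes sub-match `index` of the first match of `pattern` in `text`.
SubInfo describe(const std::string& text, const std::string& pattern,
                 std::size_t index) {
    SubInfo info{0, 0, std::string()};
    std::regex reg(pattern);
    std::smatch mat;
    if (std::regex_search(text, mat, reg) && index < mat.size() &&
        mat[index].matched) {
        const std::ssub_match& sub = mat[index];
        info.offset = static_cast<std::size_t>(sub.first - text.begin());
        info.length = static_cast<std::size_t>(sub.length());
        info.text = std::string(sub.first, sub.second);
    }
    return info;
}

// Builds a new string from the first match using match_results::format.
std::string reformat(const std::string& text, const std::string& pattern,
                     const std::string& fmt) {
    std::regex reg(pattern);
    std::smatch mat;
    return std::regex_search(text, mat, reg) ? mat.format(fmt) : std::string();
}
```

Because first and second are plain iterators into the original text, no characters are copied until str() is called or a string is built from the pair.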

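regex_match only verifies a complete match. To find a short matching piece inside a larger string (for example, hyperlinks in a web page), use regex_search instead.

The following code finds the numbers in szstr.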
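One way to collect every number with repeated regex_search calls is sketched below. find_numbers is an invented helper, the pattern \d+ is my own choice, and the standard library <regex> is used in place of Boost on the assumption that the corresponding boost:: calls behave the same.

```cpp
#include <cstring>
#include <regex>
#include <string>
#include <vector>

// Collects every run of digits in `text`, scanning left to right.
std::vector<std::string> find_numbers(const char* text) {
    std::vector<std::string> found;
    std::regex reg("\\d+");
    std::cmatch mat;
    const char* start = text;
    const char* end = text + std::strlen(text);
    while (std::regex_search(start, end, mat, reg)) {
        found.push_back(mat[0].str());
        start = mat[0].second;  // resume just past the previous match
    }
    return found;
}
```

Each search restarts from mat[0].second, so the loop always moves forward and stops when no further digits remain.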

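regex_replace offers a simple way to replace parts of the source string.

In the format string, $1 to $9 (or \1 to \9) stand for the numbered sub-matches, $& for the whole match, $` for the text before the match, and $' for the text after the match that has not yet been processed.

In the format string, (?Nnew-string), with N from 1 to 9, means: if sub-match N took part in the match, output new-string in its place. In Boost this conditional form belongs to the extended format syntax, which is switched on with the boost::format_all flag.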
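The $N and $& placeholders can be tried out with a sketch like this one. rewrite is a made-up helper name, and the code calls std::regex_replace from the standard library so that it builds without Boost. That is an assumption of convenience: the standard version understands $1 and $&, but not the \1 form or the Boost-only (?N...) conditional.

```cpp
#include <regex>
#include <string>

// Replaces every match of `pattern` in `text` according to the format `fmt`.
std::string rewrite(const std::string& text, const std::string& pattern,
                    const std::string& fmt) {
    std::regex reg(pattern);
    return std::regex_replace(text, reg, fmt);
}
```

For instance, the format "$3/$2/$1" reorders three captured groups, and "[$&]" wraps each whole match in brackets while leaving the unmatched text untouched.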

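regex_iterator likewise has four specializations, for C strings, C++ strings, and their wide-character counterparts:

typedef regex_iterator<const char*> cregex_iterator;
typedef regex_iterator<std::string::const_iterator> sregex_iterator;
typedef regex_iterator<const wchar_t*> wcregex_iterator;
typedef regex_iterator<std::wstring::const_iterator> wsregex_iterator;

The value_type of this iterator is a match_results.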
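Walking over all matches with a regex_iterator could look like the following sketch. all_matches is a hypothetical helper, and std::sregex_iterator from the standard library stands in for boost::sregex_iterator, on the assumption that both are used the same way.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the text of every match of `pattern` in `text`, in order.
std::vector<std::string> all_matches(const std::string& text,
                                     const std::string& pattern) {
    std::vector<std::string> out;
    std::regex reg(pattern);
    std::sregex_iterator it(text.begin(), text.end(), reg);
    std::sregex_iterator end;  // a default-constructed iterator marks the end
    for (; it != end; ++it) {
        out.push_back(it->str());  // *it is a match_results (std::smatch)
    }
    return out;
}
```

Since each dereference yields a full match_results, the loop body could equally read (*it)[1] to pick out a group instead of the whole match.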

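boost.regex also provides the make_regex_iterator function to simplify constructing a regex_iterator. For example, an iterator named itrbegin can be created with:

itrbegin = make_regex_iterator(szstr,reg);

regex_token_iterator likewise has four specializations, in the same form as above, so I will not list them again just to pad the article.

The value_type of this iterator is a sub_match.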

boost.regex also provides the make_regex_token_iterator function to simplify constructing a regex_token_iterator. A final argument of -1 splits the string using reg as the separator. Any other number selects that numbered sub-match, and an array can be used to take several sub-matches at once, for example:

int subs[] = {1,2}; // the first and second sub-matches
boost::cregex_token_iterator itrbegin = make_regex_token_iterator(szstr,reg,subs); // -1 splits the string; other numbers select a sub-match; an array selects several
boost::cregex_token_iterator itrend;
for(boost::cregex_token_iterator itr=itrbegin; itr!=itrend; ++itr)
    cout << *itr << endl;
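Both modes of regex_token_iterator, splitting with -1 and picking groups with an array, are sketched below. The helpers split and fields are invented names, and the iterators are constructed directly from the standard library <regex> because make_regex_token_iterator exists only in Boost. I am assuming the Boost iterator accepts the same constructor arguments.

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits `text` wherever `separator` matches (sub-match index -1).
std::vector<std::string> split(const std::string& text,
                               const std::string& separator) {
    std::vector<std::string> out;
    std::regex reg(separator);
    std::sregex_token_iterator it(text.begin(), text.end(), reg, -1);
    std::sregex_token_iterator end;
    for (; it != end; ++it) out.push_back(it->str());  // *it is a sub_match
    return out;
}

// Yields groups 1 and 2 of every match of `pattern`, interleaved in order.
std::vector<std::string> fields(const std::string& text,
                                const std::string& pattern) {
    std::vector<std::string> out;
    std::regex reg(pattern);
    const int subs[] = {1, 2};  // first and second sub-matches
    std::sregex_token_iterator it(text.begin(), text.end(), reg, subs);
    std::sregex_token_iterator end;
    for (; it != end; ++it) out.push_back(it->str());
    return out;
}
```

With an array of indexes the iterator emits the selected groups of the first match, then those of the second match, and so on.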

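Complete test code:

#include <iostream>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, char* argv[])
{
    ...
    // You can also read a specific position directly.
    if(mat[4].matched) cout << "path is" << mat[4] << endl;
    ...
    int subs[] = {1,2}; // the first and second sub-matches
    boost::cregex_token_iterator itrbegin = make_regex_token_iterator(szstr,reg,subs); // -1 splits the string; other numbers select a sub-match; an array selects several
    boost::cregex_token_iterator itrend;
    for(boost::cregex_token_iterator itr=itrbegin; itr!=itrend; ++itr)
        cout << *itr << endl;
    ...
    cin.get();
    return 0;
}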
