Parsing HTML Data with HTMLParser

1. First, we fetch the page with

ns::data data(ns::url(...));

which gives us the target HTML file shown below:

<… id="tab">
<… id="tab">
<… id="tab2">
divno attr
<… class="cls red">hello
<… id="abc">id
<… id="abc" class="blue">id class
<… style="#a1:.">style a1
<… style="a2">style a2

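2. The usual approach is to walk every node with an HTML parser and test each one until the target data is found, or to analyze the raw string with regular expressions. Both have drawbacks: the traversal logic gets complicated, and efficient, robust regular expressions are hard to write.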
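The node-by-node approach can be sketched as follows. This is only an illustration: `Node`, `has_class`, `collect`, and `find_li_links` are hypothetical names invented for this example and are not part of wsi. The sketch hard-codes one question ("every `a` with a given class somewhere under an `li`"), which hints at how much code each new query needs when it is written by hand.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, minimal DOM node used only for this sketch (not a wsi type).
struct Node {
    std::string tag;         // element name, e.g. "li" or "a"
    std::string class_attr;  // raw class attribute, e.g. "cls red"
    std::string text;        // text content of the element
    std::vector<Node> children;
};

// True if the space-separated class attribute contains `name`.
bool has_class(const Node& n, const std::string& name) {
    std::istringstream in(n.class_attr);
    std::string cls;
    while (in >> cls) {
        if (cls == name) return true;
    }
    return false;
}

// Depth-first walk: collect every <a> carrying class `cls` that has an <li> ancestor.
void collect(const Node& n, bool inside_li, const std::string& cls,
             std::vector<const Node*>& out) {
    if (inside_li && n.tag == "a" && has_class(n, cls)) out.push_back(&n);
    const bool child_inside_li = inside_li || n.tag == "li";
    for (const Node& child : n.children) collect(child, child_inside_li, cls, out);
}

// Convenience wrapper; the returned pointers stay valid only while `root` is alive.
std::vector<const Node*> find_li_links(const Node& root, const std::string& cls) {
    assert(!cls.empty());  // precondition: there must be a class name to look for
    std::vector<const Node*> out;
    collect(root, false, cls, out);
    return out;
}
```

Calling `find_li_links(root, "red")` on a tree built from such nodes returns pointers to the matching elements. Any change to the question (another tag, an attribute test, a deeper nesting rule) means editing the traversal itself.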

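3. In wsi, the ::wsi::html::document class does the same kind of analysis as an HTML parser. In addition, ::wsi::html::document offers jQuery-style queries, so jQuery selector syntax can be used to find the target nodes quickly and reliably.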

4. jQuery syntax: htmltag.cssclass[attribute="value"] htmltag.cssclass[attribute="value"] ....

Note: the selectors are separated by spaces and read from parent tag to child tag.
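To make the syntax concrete, here is a minimal sketch that splits a selector of this form into its steps. It does not show how wsi parses selectors; `SelectorStep`, `parse_step`, and `parse_selector` are hypothetical names for this example. It assumes attribute values contain neither spaces nor `]`.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One space-separated step of a selector: tag, CSS classes, attribute tests.
struct SelectorStep {
    std::string tag;
    std::vector<std::string> classes;
    std::map<std::string, std::string> attrs;
};

// Parse a single step such as `a.red` or `div.cls[id="abc"]`.
SelectorStep parse_step(const std::string& text) {
    assert(text.find(' ') == std::string::npos);  // precondition: exactly one step
    SelectorStep step;
    std::size_t i = 0;
    // Read characters up to the next '.' or '[' (or the end of the step).
    auto read_name = [&]() {
        std::string name;
        while (i < text.size() && text[i] != '.' && text[i] != '[') name += text[i++];
        return name;
    };
    step.tag = read_name();
    while (i < text.size()) {
        if (text[i] == '.') {
            ++i;
            step.classes.push_back(read_name());
        } else if (text[i] == '[') {
            const std::size_t close = text.find(']', i);
            if (close == std::string::npos) throw std::invalid_argument("missing ']'");
            const std::string body = text.substr(i + 1, close - i - 1);
            const std::size_t eq = body.find('=');
            std::string value = (eq == std::string::npos) ? "" : body.substr(eq + 1);
            // Drop surrounding double quotes, if present.
            if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
                value = value.substr(1, value.size() - 2);
            step.attrs[body.substr(0, eq)] = value;
            i = close + 1;
        } else {
            throw std::invalid_argument("unexpected character in selector step");
        }
    }
    return step;
}

// Split on whitespace: each token is one step, ordered from parent to child.
std::vector<SelectorStep> parse_selector(const std::string& selector) {
    std::vector<SelectorStep> steps;
    std::istringstream in(selector);
    std::string token;
    while (in >> token) steps.push_back(parse_step(token));
    return steps;
}
```

With this sketch, `parse_selector("li a.red")` yields two steps: a parent step with tag `li`, and a child step with tag `a` and class `red`.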

5. Example:

::wsi::html::document doc(::wsi::core::type_cast<::wsi::core::data>(data)); // Convert the nsdata into the core::data class supported on the C++ side, then open the document.

To get the class="cls red">hello node:

::wsi::html::document::query_result result; // Query results are stored in result as ::wsi::html::element* pointers.
doc.jquery("li a.red", result);

result can now be processed:

::wsi::html::node const* node = (::wsi::html::node const*)result[0];
ns::string str = node->value(); // yields "hello"
