A Few Notes on HtmlAgilityPack

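I recently used the HtmlAgilityPack library at work, and overall it has been very convenient. If nothing else, the Load() method, which seems to complete HTML tags automatically, is a real plus (strictly speaking it does not complete anything; it normalizes incomplete tags). But is that not enough? Would you really want to match tags with regular expressions yourself? If a match turns out to be text the HTML author wrote as content rather than markup, all that effort is wasted.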
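To make the regular-expression risk concrete, here is a minimal sketch using only the .NET standard library. The class name `RegexPitfall`, the method `NaiveDivContents`, and the sample markup are all made up for illustration; they are not part of HtmlAgilityPack or of the code in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexPitfall
{
    // Naive approach: treat anything that looks like <div>...</div> as an element.
    // The pattern has no idea whether the match sits inside a script string,
    // a comment, or real markup.
    public static List<string> NaiveDivContents(string html)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(html, "<div>(.*?)</div>", RegexOptions.Singleline))
        {
            found.Add(m.Groups[1].Value);
        }
        return found;
    }
}
```

Calling `RegexPitfall.NaiveDivContents("<script>var s = '<div>just text</div>';</script><div>real</div>")` returns two entries, "just text" and "real", although the page contains only one real div element. A parser that builds a node tree does not have to guess in this way.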

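This post uses formatting an HTML file as an example to show a little of how the library is used. The many other methods and properties are left for you to explore.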

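What does formatting HTML mean? Suppose you come across someone's HTML written like this:

no line breaks at all

Doesn't that give you a headache? Wouldn't it be better written as a tidier tree structure? (This does change the original text by adding many \r, \n and \t characters, but it is what the reader wants to see.)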

______

____________

__________________

no line breaks at all

____________

______

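Of course, a capable IDE and various tools will do this for you. But in real-world development you are rarely free to pull in other, non-open-source tools, whereas the open-source HtmlAgilityPack can do it in just a few lines of code.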

Enough talk; here is the code:

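private HtmlDocument LoadWebsite(string path)
{
    HtmlDocument hdoc = new HtmlDocument();
    hdoc.Load(path); // assumed: the body was lost from the listing; path is taken to be a local file
    return hdoc;
}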

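The method that counts parent nodes: taking the root node as the top-level parent, a node gets as many \t characters (indent levels) as it has parents.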

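private int ParentNumbers(HtmlNode node, int temp)
{
    int result;
    // Assumed reconstruction (the branch bodies were lost): recurse until no parent is left.
    if (node.ParentNode != null) result = ParentNumbers(node.ParentNode, temp + 1);
    else result = temp;
    return result;
}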
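The same idea can be sketched without recursion and without the library. The example below is only an illustration: `Node` is a hypothetical stand-in for HtmlNode that keeps nothing but a parent link, and `Indent.Depth` and `Indent.LineBreakFor` are names invented here, not methods from this post or from HtmlAgilityPack.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for HtmlNode: only the parent link matters here.
public sealed class Node
{
    public string Name { get; }
    public Node Parent { get; }

    public Node(string name, Node parent = null)
    {
        Name = name;
        Parent = parent;
    }
}

public static class Indent
{
    // Counts the ancestors of a node; the root has depth 0.
    public static int Depth(Node node)
    {
        int depth = 0;
        for (Node p = node.Parent; p != null; p = p.Parent)
        {
            depth++;
        }
        return depth;
    }

    // Builds the text to insert before a node: a line break plus one tab per ancestor.
    public static string LineBreakFor(Node node)
    {
        return new StringBuilder("\r\n").Append('\t', Depth(node)).ToString();
    }
}
```

For a chain html > body > p, `Indent.Depth` gives 0, 1 and 2, and `Indent.LineBreakFor` on the p node gives "\r\n\t\t". With a real HtmlAgilityPack document the count will depend on where you decide the root is, since the document node itself sits above the html element.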

Those two methods are the groundwork. The method that actually does the formatting is here:

private void ConvertHtml(string filePath)
{
    // ... (hdoc and htmlContent are set up here)
    // Load twice to make sure the tags are complete
    hdoc.LoadHtml(htmlContent.ToString());
    StringBuilder content = new StringBuilder(hdoc.DocumentNode.OuterHtml);
    hdoc.LoadHtml(content.ToString());
    HtmlNodeCollection lineCollection = hdoc.DocumentNode.SelectNodes("/html");
    // Key: the index to insert at; value: the text to insert, i.e. \r\n, \r\n\t, \r\n\t\t...\t
    Dictionary<int, string> insContent = new Dictionary<int, string>();
    StringBuilder tempBuilder = new StringBuilder();
    foreach (HtmlNode htmlNode in lineCollection)
    {
        foreach (HtmlNode hNode in htmlNode.Descendants()) // walk every descendant node
        {
            if (hNode.PreviousSibling != null) // previous sibling
            {
                // ...
                insContent.Add(hNode.StreamPosition, tempBuilder.ToString());
            }
            if (hNode.NextSibling != null) // next sibling
            {
                // ...
                insContent.Add(hNode.NextSibling.StreamPosition, tempBuilder.ToString());
            }
            if (hNode.HasChildNodes)
            {
                // ...
                insContent.Add(hNode.FirstChild.StreamPosition, tempBuilder.ToString());
            }
            if (hNode.LastChild != null) // last child
            {
                // ...
                insContent.Add(hNode.LastChild.StreamPosition + hNode.LastChild.OuterHtml.Length, tempBuilder.ToString());
            }
        }
    }
    // Insert in reverse order so the positions in the original HTML are not disturbed
    foreach (KeyValuePair<int, string> item in insContent.OrderByDescending(n => n.Key))
    {
        content.Insert(item.Key, item.Value); // assumed: the loop body was lost from the listing
    }
    File.WriteAllText(filePath, content.ToString(), Encoding.UTF8);
}
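The reason for inserting in reverse order is easier to see in isolation. The sketch below uses only the standard library; `ReverseInsert.Apply` is a name invented for this illustration, and the offsets are written by hand instead of coming from StreamPosition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class ReverseInsert
{
    // Inserts each value at its key, where every key is an offset into the ORIGINAL text.
    // Working from the highest offset down means an insertion never shifts
    // a position that still has to be processed.
    public static string Apply(string original, IDictionary<int, string> insertions)
    {
        var content = new StringBuilder(original);
        foreach (var item in insertions.OrderByDescending(n => n.Key))
        {
            content.Insert(item.Key, item.Value);
        }
        return content.ToString();
    }
}
```

With the text `<a><b>x</b></a>` and the insertions { 3: "\n\t", 11: "\n" }, `Apply` returns `<a>`, a line break and a tab, `<b>x</b>`, a line break, and then `</a>`. Going in ascending order instead would put the second insertion two characters too early, because the first one has already lengthened the string.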
