Parsing HTML files with NekoHTML

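I recently worked on a small HTML-parsing project using NekoHTML. It parses static HTML pages, extracts the needed information into JSON objects, and writes them to a file.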
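The article does not show how the extracted data becomes JSON. A real project would normally use a JSON library such as Jackson or Gson. The following is only a minimal dependency-free sketch, assuming the extracted records are plain strings grouped by record type. The class `JsonSketch` and its `quote`/`toJson` methods are hypothetical names, not the author's code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JsonSketch {
    // Wrap a string in quotes, escaping the characters JSON forbids inside string literals
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // other control characters become \\uXXXX escapes
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serialize {type -> list of values} as one JSON object of string arrays
    static String toJson(Map<String, List<String>> records) {
        StringBuilder sb = new StringBuilder("{");
        boolean firstKey = true;
        for (Map.Entry<String, List<String>> e : records.entrySet()) {
            if (!firstKey) sb.append(',');
            firstKey = false;
            sb.append(quote(e.getKey())).append(":[");
            List<String> values = e.getValue();
            for (int i = 0; i < values.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(quote(values.get(i)));
            }
            sb.append(']');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the keys in insertion order in the output
        Map<String, List<String>> records = new LinkedHashMap<>();
        records.put("A", List.of("192.0.2.1"));
        records.put("MX", List.of("10 mail.example.com"));
        System.out.println(toJson(records));
    }
}
```

Running `main` prints `{"A":["192.0.2.1"],"MX":["10 mail.example.com"]}`; the resulting string can then be written to the output file with any `Writer`.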

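The approach: first use NekoHTML to quickly pull out the contents of the specific tags you need, for example a table. Write that content to a temporarily generated temp.html page, then parse it again into Node objects. You can then navigate the structure to get the content under a given node.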
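The second pass of this approach (parse the extracted markup back into nodes, then navigate by structure) can be sketched with only the JDK, assuming the extracted markup is well-formed XML. This swaps the article's NekoHTML `DOMParser` and Xalan `XPathAPI` for the built-in `javax.xml.parsers` and `javax.xml.xpath` APIs; `TableRows` and `rows` are hypothetical names. Because the sketch parses from a `StringReader`, it does not need a temporary file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TableRows {
    // Parse a well-formed table fragment and return each <tr> as a list of cell texts
    static List<List<String>> rows(String markup) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        // The JDK parser keeps element names as written, so lowercase "tr" matches here
        NodeList trs = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//tr", doc, XPathConstants.NODESET);
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < trs.getLength(); i++) {
            List<String> cells = new ArrayList<>();
            NodeList children = trs.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node c = children.item(j);
                // Keep only element children (td/th); skip whitespace text nodes
                if (c.getNodeType() == Node.ELEMENT_NODE) {
                    cells.add(c.getTextContent().trim());
                }
            }
            result.add(cells);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String markup = "<table><tr><th>Type</th><th>Value</th></tr>"
                + "<tr><td>A</td><td>192.0.2.1</td></tr></table>";
        System.out.println(rows(markup)); // [[Type, Value], [A, 192.0.2.1]]
    }
}
```

Real-world HTML is usually not well-formed XML, which is why the first pass through NekoHTML (or a similar tolerant parser) still matters.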

The core code is as follows:


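```java
import java.io.File;
import java.io.FileWriter;
import java.io.Writer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.xerces.xni.parser.XMLInputSource;
import org.apache.xerces.xni.parser.XMLParserConfiguration;
import org.apache.xpath.XPathAPI;
import org.cyberneko.html.HTMLConfiguration;
import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Excerpt: filters, filteredDescription, cr, trContent and the record lists
// (aList, mxList, soaList, nxList) are defined elsewhere in the author's class.
public static CustomerRecord convertFileToObj(String filePath) throws Exception {
    // Pass 1: create the HTML parser and run the filter chain that collects the wanted tags
    XMLParserConfiguration parser = new HTMLConfiguration();
    parser.setProperty("http://cyberneko.org/html/properties/filters", filters);
    XMLInputSource source = new XMLInputSource(null, filePath, null);
    parser.parse(source);
    String description = filteredDescription.toString();
    // Remove all whitespace (\s already covers \t, \r and \n)
    Pattern p = Pattern.compile("\\s*|\t|\r|\n");
    Matcher m = p.matcher(description);
    description = m.replaceAll("");
    // Write the filtered markup to temp.html next to the input file
    File temp = new File(new File(filePath).getParentFile(), "temp.html");
    try (Writer out = new FileWriter(temp, false)) {
        out.write(description);
    }
    try {
        // Pass 2: parse temp.html into a DOM and walk its table rows
        DOMParser parser2 = new DOMParser();
        parser2.parse(temp.getPath());
        Document document = parser2.getDocument();
        int a = 0;
        // NekoHTML upper-cases element names by default, so match "TR"
        NodeList nodeList = XPathAPI.selectNodeList(document, "//TR");
        for (int i = 0; i < nodeList.getLength(); i++) {
            if (i == (a + 2)) {
                // ...
            }
            // SOA
            if (trContent.startsWith("soarecord")) {
                // ...
            }
            // A
            if (trContent.startsWith("arecords")) {
                // ...
            }
            // MX records
            if (trContent.startsWith("mxrecords")) {
                // ...
            }
            // NS records
            if (trContent.startsWith("nsrecords")) {
                // ...
            }
        }
        cr.setARecList(aList);
        cr.setMxRecList(mxList);
        cr.setSoaRecList(soaList);
        cr.setNxRecList(nxList);
        return cr;
    } finally {
        // Remove the temporary file even if parsing fails
        temp.delete();
    }
}
```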
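The whitespace regex in the code above deletes every whitespace character, not just the line breaks between tags. That also merges words inside cell text and can glue a tag name to its attributes. The small demo below shows this; the narrower `>\s+<` pattern is an alternative suggested here as an assumption, not the author's code, and `WhitespaceDemo`, `stripAll` and `stripBetweenTags` are hypothetical names.

```java
import java.util.regex.Pattern;

public class WhitespaceDemo {
    // The article's pattern: \s already covers \t, \r and \n, so every whitespace character is removed
    static final Pattern ALL_WS = Pattern.compile("\\s*|\t|\r|\n");
    // Narrower alternative: remove only whitespace runs that sit between two tags
    static final Pattern BETWEEN_TAGS = Pattern.compile(">\\s+<");

    static String stripAll(String html) {
        return ALL_WS.matcher(html).replaceAll("");
    }

    static String stripBetweenTags(String html) {
        return BETWEEN_TAGS.matcher(html).replaceAll("><");
    }

    public static void main(String[] args) {
        String html = "<table border=\"1\">\n  <tr><td>mail server</td></tr>\n</table>";
        // <tableborder="1"><tr><td>mailserver</td></tr></table>
        System.out.println(stripAll(html));
        // <table border="1"><tr><td>mail server</td></tr></table>
        System.out.println(stripBetweenTags(html));
    }
}
```

If the filter chain only ever emits bare tag names and cell text without meaningful spaces, the original pattern is harmless; otherwise the narrower pattern keeps the markup parseable.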
