HTML Parsing, Part 5: XPath Parsing with lxml

#coding:utf8

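# BeautifulSoup can use lxml as its parser, and lxml can also be used on its own.

# Comparing BeautifulSoup and lxml:

# (1)

# BeautifulSoup is DOM-based: it loads the entire document and builds the full DOM tree, which costs more memory and time.

# lxml queries and processes HTML/XML documents with XPath and only traverses the parts it needs, so it is faster.

# BeautifulSoup can now use lxml as its default parsing library.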

# (2)

# BeautifulSoup is simpler, has a very user-friendly API, and supports CSS selectors.

# lxml's XPath is more cumbersome, so development is slower than with BeautifulSoup.
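To make the "XPath is more cumbersome" point concrete, here is a small sketch (assuming lxml is installed) that writes the XPath equivalents of two simple CSS selectors. The sample markup is hypothetical, modeled on the three-sisters example used below, with one link deliberately given two classes. Matching a single CSS class in XPath 1.0 needs the padded `concat`/`normalize-space` idiom, because a plain `@class='sister'` test misses elements that carry more than one class.

```python
from lxml import etree

# Hypothetical sample markup; the second link has two classes on purpose.
doc = etree.HTML(
    '<p class="story">'
    '<a class="sister" id="link1" href="http://example.com/elsie">Elsie</a>'
    '<a class="sister extra" id="link2" href="http://example.com/lacie">Lacie</a>'
    '<a class="sister" id="link3" href="http://example.com/tillie">Tillie</a>'
    '</p>'
)

# CSS "a#link1"  ->  XPath: compare the id attribute directly.
by_id = doc.xpath("//a[@id='link1']/text()")

# CSS "a.sister" ->  a naive XPath equality test only matches class="sister" exactly,
# so it skips the link whose class attribute is "sister extra".
naive = doc.xpath("//a[@class='sister']/text()")

# The usual XPath 1.0 idiom: pad the class string with spaces and look for the whole word.
by_class = doc.xpath(
    "//a[contains(concat(' ', normalize-space(@class), ' '), ' sister ')]/text()"
)

print(by_id)
print(naive)
print(by_class)
```

Here `by_id` is `['Elsie']`, `naive` is `['Elsie', 'Tillie']`, and `by_class` is `['Elsie', 'Lacie', 'Tillie']`; a CSS selector engine would handle the multi-class case without the extra idiom.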

# Example: parsing a web page with lxml

from lxml import etree

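html_str = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2"><!-- Lacie --></a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

html = etree.HTML(html_str)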

result = etree.tostring(html)

print(result.decode("utf-8"))
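`etree.tostring()` returns `bytes` by default, which is why the result is decoded before printing. A minimal sketch (assuming lxml is installed; the one-line input is hypothetical) comparing the default output with `encoding="unicode"`, which returns a `str` directly:

```python
from lxml import etree

html = etree.HTML("<p>Once upon a time</p>")

# Default: serialized markup as bytes.
raw = etree.tostring(html)

# encoding="unicode" asks lxml for a Python str instead, so no decode() is needed.
text = etree.tostring(html, encoding="unicode")

print(type(raw).__name__, raw)
print(type(text).__name__, text)
```

Both calls serialize the same tree; only the return type differs.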

# lxml also repairs malformed HTML automatically (for example, it adds missing closing tags).
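A small sketch of that repair behavior, assuming lxml is installed. The input is a hypothetical fragment with no `<html>`/`<body>` wrapper and no closing `</p>` tags; the HTML parser adds the wrapper elements and closes the first paragraph when the second one opens, so the two paragraphs end up as siblings.

```python
from lxml import etree

# Hypothetical malformed input: no wrapper elements, no closing </p> tags.
broken = "<p>one<p>two"

doc = etree.HTML(broken)

# Show the repaired markup produced by the HTML parser.
print(etree.tostring(doc, encoding="unicode"))

# Both paragraphs are now direct children of the generated <body>.
paragraphs = doc.xpath("/html/body/p")
print(len(paragraphs), [p.text for p in paragraphs])
```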

# Besides parsing strings, lxml can also read an HTML file directly.

# Save html_str as index.html and parse it with the parse() method:

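from lxml import etree

# etree.parse() uses an XML parser by default, which can fail on HTML that is not well-formed XML; pass etree.HTMLParser() to parse HTML.
html = etree.parse('index.html', etree.HTMLParser())

result = etree.tostring(html, pretty_print=True)

print(result.decode("utf-8"))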
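The difference between the default parser and `etree.HTMLParser()` is easy to see on a file whose tags are not closed. This sketch assumes lxml is installed and writes a hypothetical document to a temporary file (standing in for `index.html`), then parses it both ways.

```python
import os
import tempfile

from lxml import etree

# Hypothetical file content; the <p> tags are deliberately left unclosed.
content = "<p class='story'>Once upon a time<p class='story'>..."

with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False, encoding="utf-8") as f:
    f.write(content)
    path = f.name

try:
    # Without a parser argument, parse() uses the strict XML parser,
    # which rejects the unclosed tags with an XMLSyntaxError.
    try:
        etree.parse(path)
        xml_ok = True
    except etree.XMLSyntaxError:
        xml_ok = False

    # With an HTMLParser, the same file parses and can be queried with XPath.
    tree = etree.parse(path, etree.HTMLParser())
    stories = tree.xpath("//p[@class='story']")
finally:
    os.remove(path)

print(xml_ok)
print(len(stories), stories[0].text)
```

The XML parse fails, while the HTML parse recovers two `story` paragraphs.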

# Extract all URLs with XPath:

html = etree.HTML(html_str)

urls = html.xpath(".//*[@class='sister']/@href")

print(urls)
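`@href` is only one kind of XPath query. A sketch of a few other common patterns, assuming lxml is installed and using a hypothetical, trimmed version of the three-sisters markup (here the links contain their names as text): `text()` for element text, an attribute predicate to select one element, a positional predicate, element results you can keep navigating from, and a numeric expression.

```python
from lxml import etree

# Hypothetical trimmed markup modeled on the three-sisters example.
html_str = """<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""

html = etree.HTML(html_str)

# Text content of every element whose class is exactly "sister".
names = html.xpath("//*[@class='sister']/text()")

# Attribute of one specific element, selected by its id.
lacie_url = html.xpath("//a[@id='link2']/@href")

# Positional predicate: the last <a> child of the story paragraph.
last = html.xpath("//p[@class='story']/a[last()]/text()")

# Node queries return Element objects, so .get() and .text work on the results.
pairs = [(a.get("id"), a.text) for a in html.xpath("//a[@class='sister']")]

# Numeric XPath expressions such as count() return a float.
count = html.xpath("count(//a)")

print(names, lacie_url, last, pairs, count, sep="\n")
```

Note the return types: string and attribute queries give lists of strings, node queries give lists of elements, and `count()` gives a single float.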
