A Detailed Guide to the Python Standard Library xml Package

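For simple XML parsing, the standard library xml package is enough. Unlike the third-party lxml library, it needs no extra installation. lxml, which is built on the C library libxml2, is typically faster and supports more features.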

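Parsing is handled mainly by the xml.etree.ElementTree module, which contains two key classes: ElementTree represents the whole XML document, and Element represents a single node in that document.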

Sample data, saved as book.xml

<?xml version="1.0"?>
<bookstore>
    <book name="西遊記">
        <author>吳承恩</author>
        <dynasty>明朝</dynasty>
        <similar name="封神演義" author="許仲琳"/>
    </book>
    <book name="紅樓夢">
        <author>曹雪芹</author>
        <dynasty>清朝</dynasty>
    </book>
    <book name="三國演義">
        <author>羅貫中</author>
        <dynasty>明末清初</dynasty>
        <similar name="三國志" author="陳壽"/>
    </book>
</bookstore>

Load the XML document to parse and obtain its root node

import xml.etree.ElementTree as et

tree = et.parse("./book.xml")
root = tree.getroot()

A string can also be parsed directly

with open("./book.xml", encoding="utf-8") as fp:
    # fromstring returns the root Element directly
    root = et.fromstring(fp.read())
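The difference between the two entry points can be sketched as follows. This is a minimal illustration rather than part of the original walkthrough: XML_TEXT is a hypothetical cut-down version of book.xml, and io.StringIO stands in for a file on disk so that the snippet runs on its own.

```python
import io
import xml.etree.ElementTree as et

# Hypothetical inline sample: a cut-down version of book.xml
XML_TEXT = """<bookstore>
    <book name="西遊記"><author>吳承恩</author></book>
</bookstore>"""

# parse() takes a filename or file-like object and returns an ElementTree
tree = et.parse(io.StringIO(XML_TEXT))
root_from_tree = tree.getroot()

# fromstring() takes the text itself and returns the root Element directly
root_from_string = et.fromstring(XML_TEXT)

print(type(tree).__name__)                       # ElementTree
print(type(root_from_string).__name__)           # Element
print(root_from_tree.tag, root_from_string.tag)  # bookstore bookstore
```

In other words, getroot() is only needed after parse(); the result of fromstring() is already the root node.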

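For each Element node:

Iterating over the node yields its direct children, and indexing it (for example root[0]) accesses a child by position.

The attrib attribute is a dictionary of the node's attributes, and get(key) returns the value of a single attribute.

In addition, the tag attribute holds the tag name, and text holds the text content the node contains.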

# Iterate over the direct children
for book in root:
    print(book.tag, book.attrib, book.get("name"))

# Access the 2nd child of the root, then the text of its 1st child, i.e. "曹雪芹"
author = root[1][0].text
print(type(author), author)

Printed output

book {'name': '西遊記'} 西遊記
book {'name': '紅樓夢'} 紅樓夢
book {'name': '三國演義'} 三國演義
<class 'str'> 曹雪芹

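The text obtained here is a plain str. This differs from lxml, where text selected with an XPath text() query comes back by default as a special str subclass rather than str itself.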
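The node properties described above can be tried on a single element. The following is a small sketch that assumes a one-book fragment instead of the full book.xml; the "price" attribute is hypothetical and deliberately absent, to show the default argument of get().

```python
import xml.etree.ElementTree as et

# Hypothetical fragment: one <book> node taken from the sample data
book = et.fromstring(
    '<book name="紅樓夢"><author>曹雪芹</author><dynasty>清朝</dynasty></book>'
)

print(book.tag)                       # book
print(book.attrib)                    # {'name': '紅樓夢'}
print(len(book))                      # 2 (number of direct children)
print(book[0].tag, book[0].text)      # author 曹雪芹
print([child.tag for child in book])  # ['author', 'dynasty']

# get() returns None, or the given default, when the attribute is missing
print(book.get("price"))              # None
print(book.get("price", "unknown"))   # unknown
```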

The iter() method walks the tree recursively, so it can visit all descendant nodes

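# Recursively select every node whose tag is "similar"
for similar in root.iter("similar"):
    print(similar.attrib)

Printed output

{'name': '封神演義', 'author': '許仲琳'}
{'name': '三國志', 'author': '陳壽'}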
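To make the contrast with plain iteration concrete, here is a short sketch on a hypothetical two-book fragment of the sample data. It also shows one detail worth knowing: iter() without a tag argument starts with the node it is called on.

```python
import xml.etree.ElementTree as et

# Hypothetical cut-down version of book.xml
root = et.fromstring("""<bookstore>
    <book name="西遊記">
        <author>吳承恩</author>
        <similar name="封神演義" author="許仲琳"/>
    </book>
    <book name="紅樓夢">
        <author>曹雪芹</author>
    </book>
</bookstore>""")

# Plain iteration only sees direct children
direct = [child.tag for child in root]
print(direct)  # ['book', 'book']

# iter() visits the node itself and then every descendant, in document order
all_tags = [node.tag for node in root.iter()]
print(all_tags)  # ['bookstore', 'book', 'author', 'similar', 'book', 'author']

# iter(tag) filters that traversal by tag name
similar_names = [node.get("name") for node in root.iter("similar")]
print(similar_names)  # ['封神演義']
```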

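XPath syntax

An XPath expression resembles a file path: the last component of the path names what to extract. There are two separators: "/" selects direct children, and "//" selects descendants at any depth. ElementTree supports only a limited subset of XPath.

Syntax               Meaning
tag                  Matches elements with the given tag
*                    Matches all elements
.                    The current node, used for relative paths
..                   The parent node
[@attrib]            Matches nodes that have an attrib attribute
[@attrib='value']    Matches nodes whose attrib attribute equals value
[tag]                Matches nodes that have a direct child named tag
[tag='text']         Matches nodes that have a direct child named tag whose text content is text
[n]                  Matches the node at position n (positions start at 1)

A predicate in square brackets must be preceded by a tag name (an asterisk or another predicate also works). For example, book[@name][similar] matches book nodes that have both a name attribute and a direct child named similar. The predicate expression is then placed inside an XPath path, for example "./book[@name][similar]" relative to the bookstore root. Paths that start with "/" are not accepted by the Element methods.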
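Each row of the syntax table can be checked against the sample data. The sketch below embeds an inline copy of the sample so that it runs without book.xml; the names() helper is hypothetical and exists only to keep the output short.

```python
import xml.etree.ElementTree as et

# Inline copy of the sample data so the sketch runs without book.xml
BOOK_XML = """<bookstore>
    <book name="西遊記">
        <author>吳承恩</author>
        <dynasty>明朝</dynasty>
        <similar name="封神演義" author="許仲琳"/>
    </book>
    <book name="紅樓夢">
        <author>曹雪芹</author>
        <dynasty>清朝</dynasty>
    </book>
    <book name="三國演義">
        <author>羅貫中</author>
        <dynasty>明末清初</dynasty>
        <similar name="三國志" author="陳壽"/>
    </book>
</bookstore>"""

root = et.fromstring(BOOK_XML)


def names(path):
    """Hypothetical helper: the name attribute of every node matched by path."""
    return [node.get("name") for node in root.findall(path)]


has_name = names("book[@name]")             # ['西遊記', '紅樓夢', '三國演義']
by_attr = names("book[@name='紅樓夢']")      # ['紅樓夢']
has_child = names("book[similar]")          # ['西遊記', '三國演義']
by_text = names("book[dynasty='明朝']")      # ['西遊記']
second = names("book[2]")                   # ['紅樓夢']
descendants = names(".//similar")           # ['封神演義', '三國志']
parents = names(".//similar/..")            # ['西遊記', '三國演義']
combined = names("book[@name][similar]")    # ['西遊記', '三國演義']

for label, result in [
    ("[@attrib]", has_name), ("[@attrib='value']", by_attr),
    ("[tag]", has_child), ("[tag='text']", by_text), ("[n]", second),
    ("//", descendants), ("..", parents), ("chained", combined),
]:
    print(label, result)
```

Note that [tag='text'] compares the whole text, so '明朝' does not match the book whose dynasty is '明末清初'.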

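XPath syntax is used through the Element methods findall(path) and find(path); in that case the path starts at the node the Element represents. The same methods can also be called on an ElementTree object, in which case the path starts at the root node.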

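When nodes match, findall returns a list of all matching nodes and find returns the first match. When nothing matches, findall returns an empty list and find returns None.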

# "." refers to the bookstore node
author_1 = tree.find("./book[@name='紅樓夢']/author").text
author_2 = tree.findtext("./book[@name='紅樓夢']/author")
print(author_1, author_2)

# Read the author attribute of the matching similar node
author_3 = root.find("./book/similar[@name='三國志']").get("author")
print(author_3)

Printed result

曹雪芹 曹雪芹
陳壽

findtext is similar to find, but returns the text content of the matched node directly

books_1 = root.findall("./book[similar]")
# For direct children, the leading ./ can be omitted
books_2 = root.findall("book[similar]")
print(books_1 == books_2)

# Print the author and dynasty of each book that has a similar child
for book in books_1:
    print(book[0].text, book[1].text)

Printed result

True
吳承恩 明朝
羅貫中 明末清初
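Because find returns None when nothing matches, chaining .text or .get() onto it fails for missing nodes. One defensive pattern is sketched below, assuming a one-book fragment of the sample data; the title '水滸傳' is a hypothetical value chosen because it is not in the data.

```python
import xml.etree.ElementTree as et

# Hypothetical fragment containing a single book
root = et.fromstring(
    '<bookstore><book name="紅樓夢"><author>曹雪芹</author></book></bookstore>'
)

# No book with this (hypothetical) name exists in the fragment
missing = root.find("book[@name='水滸傳']")
print(missing)  # None

empty = root.findall("book[@name='水滸傳']")
print(empty)  # []

# findtext accepts a default that is returned when nothing matches
fallback = root.findtext("book[@name='水滸傳']/author", default="unknown")
print(fallback)  # unknown

# Compare against None explicitly: an Element's truth value depends on
# whether it has children, so "if node:" is unreliable for leaf nodes
node = root.find("book[@name='紅樓夢']/author")
author = node.text if node is not None else None
print(author)  # 曹雪芹
```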
