BeautifulSoup Selector Syntax

Sample HTML used throughout this post:


<div id="content">
  <h1>title</h1>
  <div class="clear fix">
    <div class="article">
      <div class="indent">
        <p class="ul first"></p>
        <table width="100%">
          <tr class="item">
            <td width="100" valign="top">
              <a class="nbg" href="somelink">
                <img src="link_to_the_picture" width="90"/>
              </a>
            </td>
            <td valign="top">
              <div class="pl2">
                <a href="somelink" title="title">information 1</a>
              </div>
              <p class="pl">information 2</p>
              <div class="star clearfix">
                <span class="allstar50"></span>
                <span class="rating">information 3</span>
                <span class="pl">information 4</span>
              </div>
              <p class="quote" style="somestyle">
                <span class="inq">information 5</span>
              </p>
            </td>
          </tr>
        </table>
        <h2>information 6</h2>
      </div>
    </div>
  </div>
</div>

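Assume a bs4 object has been initialized from the HTML above:

from bs4 import BeautifulSoup
import re  # needed later for the regular-expression lookups

soup = BeautifulSoup(html, 'lxml')

In everything below, the text after # is the output; print() is omitted for brevity.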

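soup.h1  # <h1>title</h1>

soup.p  # <p class="ul first"></p> (when the document has several p nodes, only the first one is returned)

soup.div  # the whole HTML document (everything nested inside a node is included in the output)

soup.p.name  # p (the node's tag name; fetching it this way seems of little use)

soup.a.attrs  # {'class': ['nbg'], 'href': 'somelink'} (all attributes of the a node, as a dict)

soup.a.attrs['href']  # somelink

soup.table.div.a.string  # information 1 (.string returns the node's text content)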
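
The tag-name access shown above can be tried in isolation. This is a minimal sketch, assuming a trimmed copy of the sample HTML and the standard-library html.parser in place of lxml (both are assumptions made so the snippet runs without extra packages):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample HTML; whitespace between tags is left out on purpose.
html = (
    '<div id="content"><h1>title</h1>'
    '<p class="ul first"></p>'
    '<table width="100%"><tr class="item">'
    '<td width="100" valign="top">'
    '<a class="nbg" href="somelink"><img src="link_to_the_picture" width="90"/></a>'
    '</td>'
    '<td valign="top"><div class="pl2">'
    '<a href="somelink" title="title">information 1</a></div>'
    '<p class="pl">information 2</p></td>'
    '</tr></table></div>'
)

# Assumption: html.parser stands in for the lxml parser used in the post.
soup = BeautifulSoup(html, "html.parser")

first_p = soup.p                     # only the first <p> in document order
p_name = soup.p.name                 # tag name as a string
link_attrs = soup.a.attrs            # dict of attributes on the first <a>
link_text = soup.table.div.a.string  # text of the <a> inside the first <div> under <table>

print(first_p)     # <p class="ul first"></p>
print(p_name)      # p
print(link_attrs)  # {'class': ['nbg'], 'href': 'somelink'}
print(link_text)   # information 1
```

Note that class is returned as a list, because an element can carry several class names.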

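soup.table.tr.td.contents  # ['\n', <a class="nbg" href="somelink">...</a>, '\n'] (.contents returns a list of all direct children of the node; note that newline strings are included)

soup.table.tr.td.children  # (also the direct children of the node, but returned as an iterable rather than a list; traverse it with for...in)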

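soup.table.descendants  # (every descendant of the node, at any depth; iterate with for...in. The sample HTML has no ul tag, only a class named "ul first", so soup.ul would be None)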
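
The difference between direct children and all descendants is easier to see on a small fragment. A rough sketch, assuming a cut-down version of the first table cell and html.parser instead of lxml:

```python
from bs4 import BeautifulSoup

# Cut-down version of the first <td>; the newlines are kept on purpose,
# because they show up as text nodes among the children.
html = (
    '<table><tr class="item">\n'
    '<td valign="top">\n'
    '<a class="nbg" href="somelink"><img src="link_to_the_picture" width="90"/></a>\n'
    '</td>\n'
    '</tr></table>'
)

# Assumption: html.parser stands in for the lxml parser used in the post.
soup = BeautifulSoup(html, "html.parser")
td = soup.table.tr.td

contents = td.contents  # a real list: newline string, <a> tag, newline string

# Text nodes have no tag name, so .name is None for them.
child_names = [child.name for child in td.children]

# descendants walks the whole subtree; keep only the tags here.
descendant_names = [node.name for node in td.descendants if node.name]

print(len(contents))      # 3
print(child_names)        # [None, 'a', None]
print(descendant_names)   # ['a', 'img']
```

The img tag appears only among the descendants, since it is a child of the a tag and not of the td itself.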

soup.p.parent.name  # div (name of the node's direct parent)

soup.p.parents  # (all ancestors of the node; iterate with for...in)

soup.table.previous_siblings  # (all siblings that come before the node; iterate with for...in)
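
Upward and sideways navigation (parent, parents, and the two sibling generators) can be sketched the same way. The fragment below is a simplified stand-in for the sample HTML, with the table body reduced to a single hypothetical cell, and it uses html.parser instead of lxml:

```python
from bs4 import BeautifulSoup

# Simplified stand-in: the <table> keeps its neighbours from the sample HTML,
# but its content is a single placeholder cell.
html = (
    '<div id="content"><h1>title</h1>'
    '<div class="indent">'
    '<p class="ul first"></p>'
    '<table><tr><td>cell</td></tr></table>'
    '<h2>information 6</h2>'
    '</div></div>'
)

# Assumption: html.parser stands in for the lxml parser used in the post.
soup = BeautifulSoup(html, "html.parser")

parent_name = soup.p.parent.name

# parents climbs from the nearest ancestor up to the BeautifulSoup object itself.
ancestor_names = [tag.name for tag in soup.p.parents]

# Sibling generators only look at nodes that share the same parent.
before = [sib.name for sib in soup.table.previous_siblings]
after = [sib.name for sib in soup.table.next_siblings]

print(parent_name)  # div
print(before)       # ['p']
print(after)        # ['h2']
```

With indented HTML the sibling generators also yield the whitespace strings between tags, so real pages usually need a filter on node type.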

soup.table.next_siblings  # (all siblings that come after the node; iterate with for...in)

find_all(): pass in attributes or text; returns all matching nodes

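soup.find_all(name="p")  # [<p class="ul first"></p>, <p class="pl">information 2</p>, <p class="quote" style="somestyle"><span class="inq">information 5</span></p>] (all nodes with the given tag name)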

soup.find_all(attrs={"class": "inq"})  # [<span class="inq">information 5</span>] (all nodes whose attributes match the attrs dict)

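soup.find_all(class_="inq")  # same as above (with this form and the previous one, a node matches as long as its class attribute contains the given class name)

soup.find_all(id='content')  # the whole HTML document (all nodes with the given id)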
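
The attribute-based find_all() calls above can be checked against a small fragment. A minimal sketch, assuming a shortened copy of the sample HTML and html.parser in place of lxml:

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample HTML, enough to exercise each lookup.
html = (
    '<div id="content">'
    '<p class="ul first"></p>'
    '<p class="pl">information 2</p>'
    '<div class="star clearfix"><span class="rating">information 3</span></div>'
    '<p class="quote"><span class="inq">information 5</span></p>'
    '</div>'
)

# Assumption: html.parser stands in for the lxml parser used in the post.
soup = BeautifulSoup(html, "html.parser")

paragraphs = soup.find_all(name="p")              # every <p>, in document order
by_attrs = soup.find_all(attrs={"class": "inq"})  # match through an attrs dict
by_class = soup.find_all(class_="inq")            # class_ avoids the Python keyword
partial = soup.find_all(class_="star")            # one class name out of "star clearfix"
by_id = soup.find_all(id="content")

print(len(paragraphs))   # 3
print(by_attrs)          # [<span class="inq">information 5</span>]
print(partial[0].name)   # div
print(len(by_id))        # 1
```

The partial match works on whole class names only: class_="star" finds class="star clearfix", while class_="sta" would not.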

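soup.find_all(text=re.compile('information'))  # ['information 1', 'information 2', 'information 3', 'information 4', 'information 5', 'information 6'] (all text strings that contain the given pattern; re is the regular expression module, and newer bs4 releases name this argument string=)

find(): pass in attributes or text; returns the first matching element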

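soup.find(name="p")  # <p class="ul first"></p> (the first node with the given tag name)

soup.find(text=re.compile('information'))  # information 1 (the first text string that contains the given pattern; re is the regular expression module)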
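
A short comparison of find() and find_all() on the same input may help. This sketch assumes a three-element fragment, html.parser in place of lxml, and the string= argument, which is the newer name for text=:

```python
import re

from bs4 import BeautifulSoup

# Small fragment with one empty <p> and two text-bearing nodes.
html = (
    '<div><p class="ul first"></p>'
    '<a href="somelink">information 1</a>'
    '<p class="pl">information 2</p></div>'
)

# Assumption: html.parser stands in for the lxml parser used in the post.
soup = BeautifulSoup(html, "html.parser")

first_p = soup.find(name="p")  # first match only
first_text = soup.find(string=re.compile("information"))
all_texts = soup.find_all(string=re.compile("information"))
missing = soup.find(name="h3")          # no <h3> in the fragment
missing_all = soup.find_all(name="h3")

print(first_p)      # <p class="ul first"></p>
print(first_text)   # information 1
print(all_texts)    # ['information 1', 'information 2']
print(missing)      # None
print(missing_all)  # []
```

Because find() returns None when nothing matches, chaining attribute access onto its result needs a None check first.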

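The remaining attribute and text lookups use the same syntax as find_all(), except that only the first matching node is returned, so they are not repeated here.

select(): match nodes with CSS selector syntax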

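soup.select('table tr img')  # [<img src="link_to_the_picture" width="90"/>]
soup.select('div.star span')  # [<span class="allstar50"></span>, <span class="rating">information 3</span>, <span class="pl">information 4</span>]
soup.select('table tr img')[0].attrs['src']  # link_to_the_picture (reading an attribute value; select returns a list, so take an element out of it first)
soup.select('table div.pl2')[0].text  # information 1 (reading the node text; as above, take the element out of the list first)
soup.select('table div.pl2')[0].get_text()  # same result as above
for a in soup.select('div.star span'):
    print(a['class'])  # value of the class attribute
    print(a.attrs['class'])  # equivalent to the line above (these spans have no href, so a.attrs['href'] would raise KeyError)
    print(a.get_text())  # text content of the element; matches a.string when the node holds exactly one string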
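
The select() calls above can be run end to end on a reduced table. A rough sketch, assuming a shortened copy of the sample HTML and html.parser in place of lxml; select_one() is added here as a convenience for the single-result case and does not appear in the examples above:

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample table.
html = (
    '<table><tr class="item">'
    '<td><a class="nbg" href="somelink"><img src="link_to_the_picture" width="90"/></a></td>'
    '<td><div class="pl2"><a href="somelink" title="title">information 1</a></div>'
    '<div class="star clearfix"><span class="allstar50"></span>'
    '<span class="rating">information 3</span>'
    '<span class="pl">information 4</span></div></td>'
    '</tr></table>'
)

# Assumption: html.parser stands in for the lxml parser used in the post.
soup = BeautifulSoup(html, "html.parser")

# select() always returns a list, so index into it before reading attributes.
img_src = soup.select("table tr img")[0]["src"]
title_text = soup.select("table div.pl2")[0].get_text()

# select_one() returns the first match directly, or None if there is none.
first_span = soup.select_one("div.star span")

spans = soup.select("div.star span")
span_texts = [span.get_text() for span in spans]  # '' for the empty span
span_strings = [span.string for span in spans]    # None for the empty span

print(img_src)        # link_to_the_picture
print(title_text)     # information 1
print(span_texts)     # ['', 'information 3', 'information 4']
print(span_strings)   # [None, 'information 3', 'information 4']
```

The last two lines show where get_text() and .string differ: an empty element gives an empty string from get_text() but None from .string.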
