Web Scraping: CSS Selectors in Beautiful Soup

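I. Key points

Beautiful Soup also provides another kind of selector: the CSS selector. Anyone familiar with web development will already know CSS selectors. If not, a CSS selector reference is worth consulting first.

To use CSS selectors, just call the select() method and pass in the CSS selector string.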

II. Basic usage

1 Code

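from bs4 import BeautifulSoup

html = '''
hello
'''
soup = BeautifulSoup(html, 'lxml')
# Every node with class panel-heading inside a node with class panel
print(soup.select('.panel .panel-heading'))
# Every li node under a ul node
print(soup.select('ul li'))
# Every node with class element inside the node whose id is list-2
print(soup.select('#list-2 .element'))
# Type of a single selected node
print(type(soup.select('ul')[0]))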

2 Result

e:\webspider\venv\scripts\python.exe e:/webspider/4_2.py
[hello
]
[foo, bar, jay, foo, bar]
[foo, bar]
<class 'bs4.element.Tag'>

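3 Explanation

Here we use CSS selectors three times, and each call returns a list of the nodes that match the selector. For example, select('ul li') selects every li node under every ul node, so the result is a list of all li nodes.

The last statement prints the type of an element in the list. As the output shows, it is still the Tag type.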
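
The markup of the html string is not preserved in the listing above, so the following is a minimal sketch built on an assumed document. The element names, classes and ids are a reconstruction inferred from the selectors and the printed text, not the original HTML. It uses the built-in 'html.parser' instead of 'lxml' so that it runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Assumed markup: a reconstruction for illustration, not the original document.
html = '''
<div class="panel">
  <div class="panel-heading"><h4>Hello</h4></div>
  <div class="panel-body">
    <ul class="list" id="list-1">
      <li class="element">Foo</li>
      <li class="element">Bar</li>
      <li class="element">Jay</li>
    </ul>
    <ul class="list list-small" id="list-2">
      <li class="element">Foo</li>
      <li class="element">Bar</li>
    </ul>
  </div>
</div>
'''

# 'html.parser' ships with Python; the article itself uses 'lxml'.
soup = BeautifulSoup(html, 'html.parser')

# Descendant selector on classes: .panel-heading nodes inside .panel nodes
headings = soup.select('.panel .panel-heading')
# Descendant selector on tag names: li nodes inside ul nodes
all_items = soup.select('ul li')
# id selector combined with a class selector
list2_items = soup.select('#list-2 .element')

print(len(headings))
print([li.get_text() for li in all_items])
print([li.get_text() for li in list2_items])
# Each selected node is a Tag object
print(type(soup.select('ul')[0]))
```

Printing the text of each node rather than the node itself keeps the output short; printing the lists directly, as the article does, shows the full markup of every match.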

III. Nested selection

1 Code

from bs4 import BeautifulSoup

html = '''
hello
'''
soup = BeautifulSoup(html, 'lxml')
# select() also supports nested selection: first select all ul nodes,
# then iterate over them and select the li nodes inside each one
for ul in soup.select('ul'):
    print(ul.select('li'))

2 Result

e:\webspider\venv\scripts\python.exe e:/webspider/4_2.py
[foo, bar, jay]
[foo, bar]

3 Explanation

As the output shows, this prints the list of li nodes under each ul node.
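
To make the scoping concrete, here is a minimal sketch with an assumed two-list document (the markup is hypothetical, and 'html.parser' stands in for 'lxml'). It illustrates that calling select() on a Tag searches only inside that node.

```python
from bs4 import BeautifulSoup

# Assumed markup for illustration only.
html = '''
<ul id="list-1"><li>Foo</li><li>Bar</li><li>Jay</li></ul>
<ul id="list-2"><li>Foo</li><li>Bar</li></ul>
'''
soup = BeautifulSoup(html, 'html.parser')

items_per_list = []
for ul in soup.select('ul'):
    # select() on a Tag only matches descendants of that Tag
    items = [li.get_text() for li in ul.select('li')]
    items_per_list.append(items)
    print(items)
```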

IV. Getting attributes

1 Code

from bs4 import BeautifulSoup

html = '''
hello
'''
soup = BeautifulSoup(html, 'lxml')
# Try to get the id attribute of each ul node in two ways
for ul in soup.select('ul'):
    print(ul['id'])
    print(ul.attrs['id'])

2 Result

e:\webspider\venv\scripts\python.exe e:/webspider/4_2.py
list-1
list-1
list-2
list-2

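3 Explanation

As the output shows, both approaches work: indexing the node with square brackets and the attribute name, and reading the value through the attrs attribute.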
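
Two details are easy to trip over when reading attributes, sketched below on an assumed document (the markup is hypothetical, and 'html.parser' stands in for 'lxml'). Square-bracket access raises KeyError when the attribute is absent, whereas get() returns None, and a multi-valued attribute such as class comes back as a list.

```python
from bs4 import BeautifulSoup

# Assumed markup for illustration only; the second ul deliberately has no id.
html = '''
<ul class="list" id="list-1"><li>Foo</li></ul>
<ul class="list list-small"><li>Bar</li></ul>
'''
soup = BeautifulSoup(html, 'html.parser')
uls = soup.select('ul')

# get() returns None for a missing attribute instead of raising KeyError
ids = [ul.get('id') for ul in uls]
print(ids)

# class is multi-valued, so its value is a list of class names
classes = [ul['class'] for ul in uls]
print(classes)

# Square-bracket access on a missing attribute raises KeyError
try:
    uls[1]['id']
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```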

V. Getting text

1 Code

from bs4 import BeautifulSoup

html = '''
hello
'''
soup = BeautifulSoup(html, 'lxml')
# The string attribute and get_text() return the same result here
for li in soup.select('li'):
    print('get text:', li.get_text())
    print('string:', li.string)

2 Result

e:\webspider\venv\scripts\python.exe e:/webspider/4_2.py
get text: foo
string: foo
get text: bar
string: bar
get text: jay
string: jay
get text: foo
string: foo
get text: bar
string: bar
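
The two approaches agree in the output above because each li node holds a single piece of text. The following minimal sketch, on an assumed document (hypothetical markup, with 'html.parser' standing in for 'lxml'), shows where they differ: for a node with more than one child, string is None while get_text() concatenates all the text inside it.

```python
from bs4 import BeautifulSoup

# Assumed markup for illustration only; the second li mixes text and a child tag.
html = '''
<ul>
  <li>Foo</li>
  <li>Bar <b>Baz</b></li>
</ul>
'''
soup = BeautifulSoup(html, 'html.parser')
simple, mixed = soup.select('li')

# A single text child: both approaches give the same text
print(simple.string, '|', simple.get_text())

# Several children: string is None, get_text() joins all nested text
print(mixed.string, '|', mixed.get_text())
```

When the structure of the matched nodes is not known in advance, get_text() is usually the safer choice, since it never returns None.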
