Parsing a Page with BeautifulSoup

@Install beautifulsoup4 (for example, with pip install beautifulsoup4):

@Import the library

# Import the soup class from bs4 (the class name is case-sensitive)
from bs4 import BeautifulSoup

@Create a BeautifulSoup object

# Use lxml as the parser to get a bowl of soup; lxml must be installed as well

bsp = BeautifulSoup(page_text, 'lxml')
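
Because the call above fails when lxml is not installed, a small fallback can be handy. The following is a minimal sketch, not part of the original post: `make_soup` is a hypothetical helper, the sample `page_text` is made up for illustration, and the fallback uses Python's built-in `html.parser`, which may build a slightly different tree than lxml for malformed markup.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical stand-in for the page_text variable used above.
page_text = (
    "<html><head><title>Demo</title></head>"
    "<body><div id='divid'>hi</div></body></html>"
)


def make_soup(markup):
    """Parse with lxml when available, otherwise use the built-in html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # bs4 raises FeatureNotFound when the requested parser is not installed.
        return BeautifulSoup(markup, "html.parser")


bsp = make_soup(page_text)
print(bsp.title.string)
```

Either parser should produce the same result for well-formed markup like this sample.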

@Get specific page elements

# Get the title element
# print(bsp.title)
# print(type(bsp.title))
# Get the text of the title element
# print(bsp.title.text)
# print(bsp.title.string)
# Get the first div element
# print(bsp.div)
# Get all div elements
# print(bsp.find_all('div'))
# print(bsp.select('div'))
# List of all div elements that have an id attribute
# print(bsp.select('div[id]'))
# All elements whose class is div_classname
# print(bsp.select('.div_classname'))
# Same, but div elements only, and the class attribute must be exactly div_classname
# print(bsp.select('div[class=div_classname]'))
# All elements whose id is divid
# print(bsp.select('#divid'))
# Same, but div elements only
# print(bsp.select('div[id=divid]'))
# The first 2 div elements
# print( bsp.find_all('div', limit=2) )
# href attribute of the first a element
# print( bsp.a.get('href') )
# print( bsp.a.attrs['href'] )
# href attribute of the second a element
# print( bsp.a.find_next('a').attrs['href'] )
# print( bsp.select('a')[1].attrs['href'] )
# Direct child a elements of the div with id=divid
# print( bsp.select('div[id=divid] > a') )
# a elements at any depth under the div with id=divid
# print( bsp.select('div[id=divid] a') )
# id attribute of the first span under the div with id=divid
# print( bsp.select('div[id=divid] span')[0].attrs['id'] )
# List of the href attributes of all a elements
# print( [a.attrs['href'] for a in bsp.select('a')] )
# List of all div elements that have at least one attribute
# print( [div for div in bsp.select('div') if div.attrs] )
# List of all div elements that have no attributes
# print( [div for div in bsp.select('div') if not div.attrs] )
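
To see what a few of these queries return, here is a small self-contained sketch run against made-up markup. The HTML, the variable names other than `bsp`, and the use of the built-in `html.parser` (so the example runs without lxml) are assumptions for illustration and are not from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: one div with id and class, one bare div.
page_text = """
<html><head><title>Demo page</title></head>
<body>
  <div id="divid" class="div_classname">
    <a href="/first">first</a>
    <p><a href="/nested">nested</a></p>
    <span id="span1">s</span>
  </div>
  <div>
    <a href="/third">third</a>
  </div>
</body></html>
"""

bsp = BeautifulSoup(page_text, "html.parser")

title_text = bsp.title.string                       # text of the title element
divs_with_id = bsp.select("div[id]")                # div elements that have an id
direct_links = bsp.select("div[id=divid] > a")      # direct child links only
nested_links = bsp.select("div[id=divid] a")        # links at any depth
hrefs = [a.get("href") for a in bsp.select("a")]    # href of every link
second_href = bsp.a.find_next("a").get("href")      # href of the second link
bare_divs = [div for div in bsp.select("div") if not div.attrs]

print(title_text)
print(len(direct_links), len(nested_links))
print(hrefs)
```

Note that `a.get("href")` returns `None` for a link without an href, whereas `a.attrs["href"]` raises `KeyError`, so `get` is the safer choice when scanning arbitrary pages.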

