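Parsing HTML with the BeautifulSoup module

Beautiful Soup is a third-party module for extracting information from HTML pages (for this purpose it is much easier to use than regular expressions).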

Installation and import

pip install beautifulsoup4  # install

import bs4  # import

Now let's start learning the module.

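The bs4.BeautifulSoup() function is called with a string containing the HTML to be parsed.

bs4.BeautifulSoup() returns a BeautifulSoup object, and with that your soup object is ready to use.

An internet connection is needed only if you download the page first (for example with requests); parsing a local file or a string works offline.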

exampleSoup = bs4.BeautifulSoup(exampleFile, 'html.parser')
# Example: exampleFile is an open file object or a string of HTML
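As a minimal sketch of the call described above, the snippet below parses an in-memory string instead of a file, so it runs without example.html or a network connection. The variable names and the sample markup are made up for illustration and do not come from the original post.

```python
import bs4

# Hypothetical sample markup, not the article's example.html.
html = '<p>Hello, <b>world</b>!</p>'

# Naming the parser explicitly keeps the result the same on every machine.
soup = bs4.BeautifulSoup(html, 'html.parser')

print(type(soup).__name__)   # BeautifulSoup
print(soup.b.get_text())     # world
```

get_text() is the snake_case spelling of the getText() method used in the examples further down.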

Once you have a BeautifulSoup object, you can use its methods to locate specific parts of the HTML.

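Selectors passed to the select() method, and what each one matches:

soup.select('div')

All elements named div

soup.select('#author')

The element with an id attribute of author

soup.select('.notice')

All elements whose CSS class attribute is notice

soup.select('div span')

All span elements inside a div element

soup.select('input[name]')

All elements named input that have a name attribute, whatever its value

soup.select('input[type="button"]')

All elements named input that have a type attribute with the value button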
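The selectors listed above can be tried side by side. The following is only a sketch: the markup is a hypothetical fragment written so that each selector matches exactly one element, and it is not taken from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical markup chosen so that every selector below matches something.
html = '''
<div id="box"><span>inside div</span></div>
<span>outside div</span>
<p class="notice">Read this</p>
<p>By <span id="author">Al Sweigart</span></p>
<input name="q" type="text">
<input type="button" value="Go">
'''
soup = BeautifulSoup(html, 'html.parser')

selectors = ['div', '#author', '.notice', 'div span',
             'input[name]', 'input[type="button"]']

# Count how many elements each selector matches.
counts = {sel: len(soup.select(sel)) for sel in selectors}
for sel, n in counts.items():
    print(f'{sel!r} matched {n} element(s)')
```

Note that 'div span' does not match the author span, because that span sits inside a p element rather than a div.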

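The soup.select() method returns a list of Tag objects. A Tag can be passed to the str() function to show the HTML tag it represents. A Tag also has an attrs attribute, which holds all of the tag's HTML attributes as a dictionary.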
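A short sketch of what a Tag offers, assuming a made-up link element (the markup and names here are illustrative only):

```python
from bs4 import BeautifulSoup

# Hypothetical one-element document.
soup = BeautifulSoup('<a class="nav main" href="/index.html">Home</a>',
                     'html.parser')
tag = soup.select('a')[0]

print(str(tag))           # the tag rendered back as HTML text
print(tag.attrs)          # every attribute, as a dict
print(tag.get('href'))    # a single attribute
print(tag.get('class'))   # class can hold several values, so it is a list
print(tag.get('title'))   # a missing attribute gives None instead of an error
```

Because class is returned as a list, reading the first class name takes an index, as in tag.get('class')[0].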

<p>Download my <strong>Python</strong> book from <a href="">my website</a>.</p>
<p class="slogan">Learn Python the easy way!</p>
<p>By <span id="author">Al Sweigart</span></p>

The example.html used in Example 1 below contains the markup above.

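import bs4

exampleFile = open('example.html')
# .read() is optional here: BeautifulSoup also accepts the file object itself.
# For a page fetched with requests, pass the response's .text instead.
exampleSoup = bs4.BeautifulSoup(exampleFile.read(), 'html.parser')
elems = exampleSoup.select('#author')
print(type(elems))        # the type of the result
print(len(elems))         # how many elements matched
print(elems[0].getText()) # the text inside the element
print(str(elems[0]))      # the element as HTML
print(elems[0].attrs)     # the element's attributes as a dict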

The output is as follows

d:\recent\code\venv\scripts\python.exe d:/recent/code/venv/test.py
<class 'bs4.element.ResultSet'>
1
Al Sweigart
<span id="author">Al Sweigart</span>
{'id': 'author'}
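The comment in Example 1 says that .read() can be left out. The sketch below illustrates this under one assumption: io.StringIO stands in for an open file so that nothing has to exist on disk, and the one-line markup is a shortened, hypothetical version of example.html.

```python
import io
import bs4

html = '<p>By <span id="author">Al Sweigart</span></p>'

# Parse from a plain string.
from_string = bs4.BeautifulSoup(html, 'html.parser')

# Parse from a file-like object; io.StringIO plays the role of open('example.html').
from_file = bs4.BeautifulSoup(io.StringIO(html), 'html.parser')

print(from_string.select('#author')[0].get_text())
print(from_file.select('#author')[0].get_text())
```

Both calls yield the same parsed document, so the choice mostly depends on whether the HTML is already in memory.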

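Example 2:

from bs4 import BeautifulSoup

# The tags in this string were lost in the original post. The markup is
# reconstructed from the output; the dt tags are an assumption.
# 今開 = today's open, 成交量 = volume, 萬手 = ten thousand lots
s = '''<dl><dt>今開</dt><dd class="s-down">3.87</dd></dl>
<dl><dt>成交量</dt><dd>85.12萬手</dd></dl>'''
soup = BeautifulSoup(s, 'html.parser')
dls = soup.select('dl')  # every dl element in s
print(dls[0].getText())
print(dls[1].getText())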

Output

d:\recent\code\venv\scripts\python.exe d:/recent/code/venv/test.py
今開3.87
成交量85.12萬手

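Example 3: a small variation on the example above (it seems to have little practical use)

from bs4 import BeautifulSoup

# Markup reconstructed as in Example 2; the dt tags are an assumption.
s = '''<dl><dt>今開</dt><dd class="s-down">3.87</dd></dl>
<dl><dt>成交量</dt><dd>85.12萬手</dd></dl>'''
soup = BeautifulSoup(s, 'html.parser')
dd = soup.select('dd')[0]  # the first dd element
print(str(dd))
print(dd.get('class')[0])  # class is a list, so take its first entry
print(dd.attrs)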

Output

d:\recent\code\venv\scripts\python.exe d:/recent/code/venv/test.py
<dd class="s-down">3.87</dd>
s-down
{'class': ['s-down']}

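To summarize, working with bs4 takes three steps

1. Create a BeautifulSoup object from the HTML: soup = bs4.BeautifulSoup(html_string, 'html.parser')

2. Use the select() method to find elements

3. Read the data from the elements' text and attributes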
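The three steps above can be sketched as one small helper. The function name extract_texts and the sample list markup are hypothetical, introduced only for illustration.

```python
import bs4

def extract_texts(html, selector):
    """Return the text of every element in html that matches the CSS selector."""
    soup = bs4.BeautifulSoup(html, 'html.parser')   # step 1: build the object
    elems = soup.select(selector)                   # step 2: find the elements
    return [e.get_text() for e in elems]            # step 3: read the data

# Hypothetical sample page.
page = '<ul><li class="item">tea</li><li class="item">milk</li><li>salt</li></ul>'
print(extract_texts(page, 'li.item'))  # ['tea', 'milk']
```

A selector that matches nothing simply gives an empty list, so callers can check the result before indexing into it.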
