Web Scraping with BeautifulSoup: A Beginner's Guide (Part 2)

Getting Started with BeautifulSoup

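Before using BeautifulSoup, we need to choose a parser. This post and the ones that follow use "html.parser", the parser that ships with Python. For an introduction to the other parsers and how they compare, see my blog post on extracting table data with Beautiful Soup 4. We will use one of the most common examples to show how the library works: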

html_doc = """
<html><head><title>the dormouse's story</title></head>
<body>
<p class="title"><b>the dormouse's story</b></p>
<p class="story">once upon a time there were three little sisters; and their names were
<a href="" class="sister" id="link1">elsie</a>,
<a href="" class="sister" id="link2">lacie</a> and
<a href="" class="sister" id="link3">tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

Use BeautifulSoup to parse this snippet:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

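We first import the library, then create a BeautifulSoup object named soup. The two arguments in the parentheses are the markup to parse and the parser to use. Later we will pass more arguments, for example to configure the encoding, but these two are all we need for now. From here on, everything is done by operating on this object.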
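As a rough sketch of the "more arguments" mentioned above: when the markup arrives as bytes rather than as a string, the `from_encoding` keyword can tell BeautifulSoup which encoding to assume. The variable names `raw` and `soup_from_bytes` and the shortened markup are illustrative assumptions, not part of the original example.

```python
from bs4 import BeautifulSoup

# Hypothetical input: raw bytes, as they might arrive from a network response.
raw = "<html><head><title>the dormouse's story</title></head></html>".encode("utf-8")

# from_encoding only matters for bytes input; it tells the parser
# which encoding to try when decoding the markup.
soup_from_bytes = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")

# The decoded document can then be used like any other soup object.
print(soup_from_bytes.title.string)
```

If the markup is already a Python string, the two-argument form shown earlier is enough.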

# Example 1
soup.title
# <title>the dormouse's story</title>
# Example 2
soup.a
# <a class="sister" href="" id="link1">elsie</a>
# Example 3
soup.find_all('a')
# [<a class="sister" href="" id="link1">elsie</a>,
#  <a class="sister" href="" id="link2">lacie</a>,
#  <a class="sister" href="" id="link3">tillie</a>]

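The code above parses the document and then retrieves the title tag, the first a tag, and all of the a tags. That is the simplest possible use. Some readers may wonder: Examples 2 and 3 both fetch a tags, so why are the results so different? The answer lies in the kinds of objects BeautifulSoup provides, along with their attributes and methods.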
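A minimal sketch of that difference, using a shortened version of the sample markup (the names `first_link` and `all_links` are illustrative): attribute access such as `soup.a` returns a single `Tag` for the first match, while `find_all` returns a list-like `ResultSet` containing every match.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html_doc = """
<p class="story">
<a href="" class="sister" id="link1">elsie</a>,
<a href="" class="sister" id="link2">lacie</a> and
<a href="" class="sister" id="link3">tillie</a>;
</p>
"""
soup = BeautifulSoup(html_doc, "html.parser")

first_link = soup.a              # a single Tag: the first <a> in the document
all_links = soup.find_all("a")   # a list-like ResultSet holding every <a> Tag

print(type(first_link).__name__)           # class name of the single result
print(type(all_links).__name__)            # class name of the collection
print(len(all_links))                      # number of <a> tags found
print([link["id"] for link in all_links])  # id attribute of each match
```

Because the result of `find_all` behaves like a list, it can be indexed, iterated, and measured with `len`, whereas `soup.a` gives direct access to one tag's attributes.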

2. Objects
