Crawling the Baidu homepage with Python 3

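import urllib.request

# The address is empty in the source; set it to the page to fetch.
url = ""
html = urllib.request.urlopen(url)
# read() returns bytes; decode them into a str.
content = html.read().decode('utf-8')
# html_text = bytes.decode(html.read())
# print(html_text)
print(content)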
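
The commented-out line calls bytes.decode as a plain function instead of as a method. A minimal sketch of the two forms follows. It uses a hard-coded byte string in place of html.read(), an assumption made here so the example runs without a network connection; the sample text is made up.

```python
# Stand-in for the bytes returned by html.read(); the sample text is
# invented for illustration and is not taken from any real page.
raw = "<title>百度一下</title>".encode("utf-8")

# Method form, as used in the script above.
text_a = raw.decode("utf-8")

# Function form, as in the commented-out line; the encoding
# defaults to UTF-8 when it is not given.
text_b = bytes.decode(raw)

print(type(raw).__name__, type(text_a).__name__)
print(text_a == text_b)

# Bytes in another encoding (GBK here) are not valid UTF-8, so a
# strict decode would raise UnicodeDecodeError. Passing
# errors="replace" substitutes U+FFFD for the undecodable bytes.
gbk_raw = "百度".encode("gbk")
safe_text = gbk_raw.decode("utf-8", errors="replace")
print(safe_text)
```

If a page comes out garbled or raises UnicodeDecodeError, the encoding passed to decode is probably not the one the page was served in.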

Run pip install bs4 in the console to install BeautifulSoup.

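from urllib.request import urlopen
from bs4 import BeautifulSoup as bf

# The address is empty in the source; set it to the page to fetch.
html = urlopen("")
# Parse the downloaded bytes with the built-in HTML parser.
obj = bf(html.read(), 'html.parser')
# Print the <title> tag found inside <head>.
print(obj.head.title)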

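from urllib.request import urlopen
from bs4 import BeautifulSoup as bf

# The address is empty in the source; set it to the page to fetch.
html = urlopen("")
obj = bf(html.read(), 'html.parser')
title = obj.head.title
# Collect every <img> tag on the page.
pic_info = obj.find_all('img')
# Print the information for each image separately.
for i in pic_info:
    print(i)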

Running this prints every img tag on the page, including all of its attributes.
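
Each element returned by find_all is a Tag object, so its attributes can also be read one at a time. A small sketch, assuming bs4 is installed and using a hand-written HTML snippet (hypothetical, not the real Baidu markup) so that it runs offline:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
sample_html = """
<html><head><title>Sample page</title></head>
<body>
  <img src="//example.com/img/logo.png" width="270" height="129">
  <img src="/img/banner.gif" alt="banner">
</body></html>
"""

obj = BeautifulSoup(sample_html, "html.parser")

# find_all returns every <img> tag in document order.
pic_info = obj.find_all("img")

for tag in pic_info:
    # tag.attrs is a dict holding all attributes of the tag.
    print(tag.attrs)

# .get returns None instead of raising when an attribute is absent.
sources = [tag.get("src") for tag in pic_info]
alts = [tag.get("alt") for tag in pic_info]
print(sources)
print(alts)
```

Reading src this way gives the image address directly, without having to pick it out of the printed tag by hand.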

The address of the logo obtained is shown below.

The logo is downloaded successfully and saved as logo.png.
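
The code for this step did not survive in the post. One possible way to do it is sketched here under assumptions: the src of an image may be relative or protocol-relative (starting with //), so it is first resolved against the page address, and the file is then saved with urllib.request.urlretrieve. The function names resolve_src and save_image and the example.com addresses are hypothetical.

```python
from urllib.parse import urljoin
from urllib.request import urlretrieve


def resolve_src(page_url, src):
    """Turn a relative or protocol-relative src into an absolute URL."""
    return urljoin(page_url, src)


def save_image(page_url, src, filename="logo.png"):
    """Download the image referenced by src and write it to filename."""
    image_url = resolve_src(page_url, src)
    # urlretrieve returns a (local_path, headers) tuple.
    local_path, _headers = urlretrieve(image_url, filename)
    return local_path


# Placeholder addresses, not the ones from the post.
absolute_a = resolve_src("https://example.com/index.html", "//example.com/img/logo.png")
absolute_b = resolve_src("https://example.com/index.html", "/img/logo.png")
print(absolute_a)
print(absolute_b)
```

Calling save_image with the page address and the src taken from the img tag would write logo.png to the current directory; that call is left out here because it needs network access.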

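request.urlopen from urllib reads the content of a web page.

bytes.decode converts the page content from bytes into a string (str).

bs4 parses the page content into a structured tree that is easy to read from.

The find_all method in BeautifulSoup extracts the information contained in tags.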
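
The first two points can be combined into one small helper. This is only a sketch: the name fetch_text, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration, not part of the original scripts. It takes the encoding from the Content-Type header when the server provides one.

```python
import urllib.request


def fetch_text(url, timeout=10, fallback_encoding="utf-8"):
    """Fetch url and return its body decoded into a str."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes
        # Prefer the charset declared in the Content-Type header.
        encoding = response.headers.get_content_charset() or fallback_encoding
    # errors="replace" avoids an exception on undecodable bytes.
    return raw.decode(encoding, errors="replace")


# A data: URL keeps the demonstration offline; urlopen supports
# this scheme through its built-in data handler.
demo = fetch_text("data:text/plain;charset=utf-8,hello%20crawler")
print(demo)
```

The returned string can be passed straight to BeautifulSoup in place of html.read().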

1. What new skills can you master in about ten days without leaving home? - answer by 朱衛軍 - Zhihu
