Web Scraping Notes (Part 1)

The code is as follows:

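import urllib.request


def load_data():
    url = ""  # target URL
    response = urllib.request.urlopen(url)
    # HTTP response object
    print(response)
    # Read the content; the result is of type bytes
    data = response.read()
    print(data)
    # Convert the bytes to a string
    str_data = data.decode("utf-8")
    print(str_data)
    # Write the fetched content to a local HTML file:
    # create the file with a "with open(...) as" statement (a path can be given before the file name),
    # then write the fetched content to it


load_data()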
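The closing comments in load_data describe writing the fetched content to a local HTML file, but the listing stops before that step. A minimal sketch of it might look like the following; the helper name save_html, its parameters, and the file name are hypothetical and do not come from the original notes.

```python
def save_html(data: bytes, path: str, encoding: str = "utf-8") -> str:
    """Decode fetched bytes and write them to a local HTML file.

    `save_html` and its parameters are hypothetical names used for illustration.
    """
    text = data.decode(encoding)
    # `path` may include a directory, e.g. "pages/index.html";
    # the directory has to exist already.
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
    return text
```

Inside load_data this would be called as save_html(data, "page.html") after response.read(), where "page.html" is a placeholder file name.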

The code is as follows:

import urllib.request
import urllib.parse  # the search term contains Chinese characters, and a URL passed to urlopen must be ASCII-only (code points 0-127)
import string


def load_data02():
    url = ""  # target URL
    search_para = "超跑"
    # Python strings can hold Chinese text, but urlopen cannot send non-ASCII characters in a URL,
    # so percent-encode them first with urllib.parse.quote(...)
    final_url = url + search_para
    encoded_finalurl = urllib.parse.quote(final_url, safe=string.printable)
    print(encoded_finalurl)
    response = urllib.request.urlopen(encoded_finalurl)
    print(response)
    decoded_data = response.read().decode()  # reading and decoding can be combined in one step
    # print(data_search)
    # decoded_data = data_search.decode("utf-8")
    print(decoded_data)
    with open("supercar01.html", "w", encoding="utf-8") as f:
        f.write(decoded_data)


load_data02()
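To see what the quote call does without making a network request, the encoding step can be tried on its own. This is a small sketch; the helper name encode_url and the example.com URL are placeholders and are not from the original notes.

```python
import string
import urllib.parse


def encode_url(url: str) -> str:
    """Percent-encode only the non-ASCII characters in a URL."""
    # safe=string.printable leaves every printable ASCII character
    # (including ":", "/", "?" and "=") untouched, so only non-ASCII
    # characters are turned into %XX escapes of their UTF-8 bytes.
    return urllib.parse.quote(url, safe=string.printable)


# "https://example.com/s?wd=" is a placeholder, not the URL used in the notes.
encoded = encode_url("https://example.com/s?wd=" + "超跑")
print(encoded)
# unquote reverses the encoding and gives back the original string.
print(urllib.parse.unquote(encoded))
```

Because whitespace is also part of string.printable, spaces in the URL are left as they are; this call only takes care of non-ASCII characters.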

# Summary:

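Steps for fetching a web page and saving it locally:

1. Determine the URL.

2. Depending on the search term, decide whether the URL needs percent-encoding (use urllib.parse.quote).

3. Get the response object (urllib.request.urlopen).

4. Read the response and decode it as UTF-8 (response.read().decode("utf-8")).

5. Save it as an HTML file: open the file with open("file name", "w", encoding="utf-8") as f, then write the decoded response with f.write(...).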
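The five steps above can be put together in one function. This is only a sketch under assumptions: the name fetch_and_save and its parameters are made up for illustration, and it has no error handling for failed requests.

```python
import string
import urllib.parse
import urllib.request


def fetch_and_save(url: str, filename: str, encoding: str = "utf-8") -> str:
    """Fetch `url`, decode the body, save it to `filename`, and return the text."""
    # Steps 1-2: take the URL and percent-encode any non-ASCII characters.
    encoded_url = urllib.parse.quote(url, safe=string.printable)
    # Step 3: get the response object; `with` closes it when the block ends.
    with urllib.request.urlopen(encoded_url) as response:
        # Step 4: read the bytes and decode them into a string.
        text = response.read().decode(encoding)
    # Step 5: save the decoded text as an HTML file.
    with open(filename, "w", encoding=encoding) as f:
        f.write(text)
    return text
```

A call would look like fetch_and_save("https://example.com/", "example.html"), where both arguments are placeholders.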
