Crawler basics: writing a simple crawler

First, build the request headers and send the request with the requests module:

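```python
import requests


def request_data(url):
    # Assumed placeholder: the original header values were not preserved
    headers = {'User-Agent': 'Mozilla/5.0'}
    try:
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            # Decode as GBK; bytes that are not valid GBK are silently dropped
            return response.content.decode('gbk', 'ignore')
    except requests.RequestException:
        return None
    # Non-200 responses fall through and also return None
```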
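`request_data` relies on `bytes.decode` with the `'ignore'` error handler. The code assumes the target pages are GBK-encoded, and with `'ignore'` any byte sequence that is not valid GBK is dropped instead of raising `UnicodeDecodeError`. The offline sketch below shows that behaviour on a made-up byte string. `decode_page` is a hypothetical helper, not part of the original code.

```python
def decode_page(raw: bytes, encoding: str = "gbk") -> str:
    """Decode page bytes the way request_data does: invalid sequences are dropped."""
    return raw.decode(encoding, "ignore")


if __name__ == "__main__":
    good = "定喘湯".encode("gbk")   # valid GBK bytes
    broken = good + b"\xff"         # a trailing byte that is not valid GBK
    print(decode_page(good))
    print(decode_page(broken))      # the bad byte is silently dropped
    try:
        broken.decode("gbk")        # strict mode (the default) raises instead
    except UnicodeDecodeError:
        print("strict decoding raised UnicodeDecodeError")
```

The trade-off is that `'ignore'` hides encoding problems. If a page comes back with missing characters, decoding in strict mode once is a quick way to confirm the page really is GBK.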

Next, parse the HTML page with bs4 (BeautifulSoup):

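```python
from bs4 import BeautifulSoup

# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(html, 'lxml')
```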

Find the tag in the page that holds the data we need, and extract it:

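```python
def get_item(soup):
    # Every <li> inside the element whose class is "listbox"
    items = soup.find(class_='listbox').find_all('li')
    for item in items:
        link = item.find('a')
        # Skip <li> without a link; .string is None when <a> has nested tags
        if link is not None and link.string is not None:
            write_item(link.string)
```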
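The `soup.find(class_='listbox').find_all('li')` lookup assumes each name is the text of the first `<a>` in an `<li>` under an element with class `listbox`. You can check that assumption without installing bs4 and lxml by approximating the same extraction with the standard library's `html.parser`. This is a swapped-in technique, not the article's code, and it comes with several limits:

- `ListboxParser` and `extract_items` are hypothetical names.
- It only follows the first `listbox` element, as `find` does.
- It ignores common void tags when tracking nesting.
- It expects reasonably well-formed markup.
- Unlike `.string`, it also keeps text from tags nested inside the `<a>`.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ListboxParser(HTMLParser):
    """Collect the text of the first <a> in each <li> under class="listbox"."""

    def __init__(self):
        super().__init__()
        self.depth = 0            # > 0 while inside the listbox element
        self.found = False        # only the first listbox is used, like soup.find()
        self.link_taken = False   # already captured an <a> in the current <li>
        self.in_link = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if not self.found and "listbox" in classes:
                self.found = True
                self.depth = 1
            return
        self.depth += 1
        if tag == "li":
            self.link_taken = False
        elif tag == "a" and not self.link_taken:
            self.link_taken = True
            self.in_link = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.items[-1] += data


def extract_items(html):
    """Return the non-empty link texts found in the first listbox."""
    parser = ListboxParser()
    parser.feed(html)
    parser.close()
    return [name for name in parser.items if name]


if __name__ == "__main__":
    sample = ('<ul class="listbox"><li><a href="1.html">定喘湯</a></li>'
              '<li><a href="2.html">射干麻黃湯</a></li></ul>')
    print(extract_items(sample))
```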

Write each item to a file:

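```python
def write_item(item):
    # Prints "開始寫入資料" ("start writing data") followed by the item
    print('開始寫入資料 =======>' + str(item))
    # Append mode: each run adds lines to 56.txt instead of overwriting it;
    # the with block closes the file, so no explicit f.close() is needed
    with open('56.txt', 'a', encoding='utf-8') as f:
        f.write(str(item) + '\n')
```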
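`write_item` reopens `56.txt` in append mode (`'a'`) for every single item. Existing lines are never overwritten, which also means re-running the crawler appends the same names again. If the per-item open becomes a bottleneck, one option is to write a whole batch under a single `with open(...)`. `append_lines` below is a hypothetical helper sketching that variant. The temporary directory only keeps the demo from touching a real `56.txt`.

```python
import os
import tempfile


def append_lines(path, lines):
    """Append each line to a UTF-8 text file, one per line ('a' mode never truncates)."""
    with open(path, "a", encoding="utf-8") as f:   # closed automatically on exit
        for line in lines:
            f.write(str(line) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "56.txt")
        append_lines(out, ["定喘湯", "射干麻黃湯"])
        append_lines(out, ["黛蛤散"])   # a second call appends, it does not overwrite
        with open(out, encoding="utf-8") as f:
            print(f.read().splitlines())
```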

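```python
def main(page):
    # The site prefix is left blank in the original article
    url = '' + str(page) + '.html'
    html = request_data(url)
    if html is None:  # request failed or returned a non-200 status
        return
    soup = BeautifulSoup(html, 'lxml')
    get_item(soup)
```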
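`main` handles exactly one listing page, and the site prefix in `url` is blank in the original. The sketch below shows one way to drive it over several pages. It assumes a hypothetical `BASE_URL`; `https://example.com/list/` is a placeholder, not the real site. The fetch and handling steps are passed in as functions so the loop can be tried offline. The one-second pause between requests is an assumed politeness delay, not something the original specifies.

```python
import time
from typing import Callable, Iterable, List, Optional

# Hypothetical placeholder: the article leaves the real site prefix blank.
BASE_URL = "https://example.com/list/"


def page_url(page: int, base: str = BASE_URL) -> str:
    """Build a listing URL the same way main() does: prefix + page number + '.html'."""
    return base + str(page) + ".html"


def crawl(pages: Iterable[int],
          fetch: Callable[[str], Optional[str]],
          handle: Callable[[str], None],
          delay: float = 1.0) -> List[str]:
    """Fetch each page, pass its HTML to handle(), and return the URLs that failed."""
    failed = []
    for page in pages:
        url = page_url(page)
        html = fetch(url)      # e.g. request_data, which returns None on failure
        if html is None:
            failed.append(url)
        else:
            handle(html)
        time.sleep(delay)      # assumed politeness delay between requests
    return failed
```

With the article's own functions, a real run would look something like `crawl(range(1, 6), request_data, lambda html: get_item(BeautifulSoup(html, 'lxml')))`, once `BASE_URL` points at the actual listing pages. Returning the failed URLs rather than stopping makes it easy to retry only those pages later.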

That is all it takes for a simple little crawler. Here is the output:

```text
開始寫入資料 =======>定喘湯
開始寫入資料 =======>射干麻黃湯
開始寫入資料 =======>黛蛤散
開始寫入資料 =======>二母散
開始寫入資料 =======>貝母瓜蔞散
開始寫入資料 =======>清燥救肺湯
```
