Python crawler for 我要個性網: downloading avatars

Python web-scraping practice

Disclaimer up front: for personal study only; do not use it for anything else.

Modules used:

import requests
import re
import os

These are fairly standard and suit scraping beginners; lxml and bs4 are basics too. Long story short:
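lxml and bs4 are mentioned but not used below. As a rough stand-in that needs no third-party install, this sketch uses the standard library's `html.parser.HTMLParser` (swapped in for bs4/lxml) to collect the `src` of every `<img>` tag. The class name, helper name and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names;
        # attrs is a list of (name, value) pairs and value may be None
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def img_srcs(html: str) -> list:
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs


sample = '<div><img src="/a.jpg" alt="a"><img alt="no src"><IMG SRC="/b.png"></div>'
print(img_srcs(sample))  # ['/a.jpg', '/b.png']
```

A parser does not care about attribute order or tag case, which would break a fixed regular expression; that robustness is the usual reason to reach for bs4 or lxml on HTML.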

headers = {}  # fill in the request headers
link = ""  # fill in the URL to crawl
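Write the request headers and the URL to fetch. I call it `link`; `url` is the more common name, but it's only a short label, personal habit.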
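requests takes the headers as a plain dict, which is what `headers` holds. To show the idea without sending anything, this sketch attaches such a dict to a standard-library `urllib.request.Request` (swapped in for requests so it runs offline). The User-Agent, Referer and URL values are placeholders, not the post's originals.

```python
from urllib.request import Request

# Placeholder values (hypothetical); replace them with your own
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Referer": "https://example.com/",
}


def build_request(url: str, headers: dict) -> Request:
    """Attach the header dict to a GET request without sending it."""
    return Request(url, headers=headers, method="GET")


req = build_request("https://example.com/touxiang/", HEADERS)
# urllib normalizes header names with str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))  # Mozilla/5.0 (Windows NT 10.0; Win64; x64)
print(req.get_method())  # GET
```

With requests, the same dict goes straight into `requests.get(link, headers=headers)`, as in the code here.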
Next, analyze the page. This time we use re: a regular expression picks out the text we need.
title = re.findall(r'', html)  # fill in the title pattern
divs = re.compile(r'')  # fill in the pattern for the list entries
divs = re.findall(divs, html)
# print(divs)
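Since the patterns themselves are not shown, here is a sketch of how `re.findall` behaves with a capture group, run against hypothetical markup (the site's real HTML structure is an assumption here). It also shows why the title needs `title[0]` later: `findall` always returns a list.

```python
import re

# Hypothetical markup; not the site's real page structure
html = """
<h1 class="title">QQ avatars</h1>
<div class="item"><a href="/touxiang/1.html">one</a></div>
<div class="item"><a href="/touxiang/2.html">two</a></div>
"""

# With one capture group, findall returns only the captured text, as a list
title = re.findall(r'<h1 class="title">(.*?)</h1>', html)

# Compile once when a pattern is reused; .*? is non-greedy, so each match
# stops at the first closing quote instead of running to the end of the line
link_pattern = re.compile(r'<div class="item"><a href="(.*?)"')
divs = link_pattern.findall(html)

print(title)  # ['QQ avatars']
print(divs)   # ['/touxiang/1.html', '/touxiang/2.html']
```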
Print it to test, then loop over the matches to reach the pages we actually want to crawl:
for div in divs:
    links = '' + div  # fill in the site prefix; div is the relative part of the URL
    resp = requests.get(links, headers=headers)
    htmls = resp.text
    # print(htmls)
Once we have them, each match is completed with the site prefix into `links`, the URL we really want, and we send the second request.
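Gluing a prefix onto a match with `+` breaks when a link is already absolute or protocol-relative. A sketch of the same "complete the URL" step with the standard library's `urllib.parse.urljoin`; the base URL and paths are hypothetical placeholders.

```python
from urllib.parse import urljoin

# Hypothetical base URL; use the page the links were scraped from
BASE = "https://example.com/touxiang/"


def complete_url(href: str, base: str = BASE) -> str:
    """Resolve a scraped href against the page it came from."""
    return urljoin(base, href)


print(complete_url("/touxiang/qinglv/123.html"))  # https://example.com/touxiang/qinglv/123.html
print(complete_url("qinglv/123.html"))            # https://example.com/touxiang/qinglv/123.html
print(complete_url("//img.example.com/a.jpeg"))   # https://img.example.com/a.jpeg
print(complete_url("https://cdn.example.org/b.jpeg"))  # unchanged
```

Root-relative, relative, protocol-relative and absolute links all come out as full URLs, so one call replaces the string concatenation.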
First, use regular expressions to pull out, from this second page, the URLs we want to crawl:
    hrefs = re.compile(r'')  # fill in the image-URL pattern
    hrefs = re.findall(hrefs, htmls)
    ids = re.findall(r'', htmls)  # fill in the image-name pattern
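`hrefs` and `ids` come from two separate `findall` calls, so nothing guarantees they line up one-to-one. A sketch of an alternative: one pattern with two capture groups, for which `findall` returns `(src, alt)` tuples that keep each image URL paired with its name. The markup and pattern are hypothetical.

```python
import re

# Hypothetical detail-page markup; not the site's real structure
htmls = """
<img class="lazy" src="//img.example.com/a1.jpeg" alt="cat/dog">
<img class="lazy" src="//img.example.com/a2.jpeg" alt="sunset">
"""

# Two capture groups: findall returns a list of (src, alt) tuples
img_pattern = re.compile(r'<img class="lazy" src="(.*?)" alt="(.*?)"')
pairs = img_pattern.findall(htmls)

for href, name in pairs:
    print(name, href)
```

Note the `/` in `cat/dog`: names like this are exactly what must be cleaned before they become file or folder names.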
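Meanwhile, set up the save path with the os module. Some names contain "/", which would break the path, so it has to be replaced: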
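    base_path = 'f:/我要個性網/%s' % title[0]  # findall returns a list; use the first match
    for img_id in ids:
        img_id = re.sub('[/]+', '--', img_id)  # a "/" in the name would break the save path, so replace it
        path = os.path.join(base_path, img_id)
        # create the path
        os.makedirs(path, exist_ok=True)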
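The code builds the folder path but does not write any images. A minimal sketch of that last step, assuming the image bytes are already downloaded (with requests that would be `resp.content`). It widens `re.sub('[/]+', '--', id)` to every character Windows reserves in file names (`\ / : * ? " < > |`), creates the folder with `os.makedirs`, and writes in binary mode. `safe_name`, `save_image` and the `.jpg` extension are hypothetical.

```python
import os
import re
import tempfile


def safe_name(name: str) -> str:
    """Replace runs of Windows-reserved characters with '--' (hypothetical helper)."""
    return re.sub(r'[\\/:*?"<>|]+', '--', name).strip()


def save_image(base_path: str, name: str, content: bytes, ext: str = ".jpg") -> str:
    """Create base_path if needed, write the image bytes, return the file path."""
    os.makedirs(base_path, exist_ok=True)  # no error if the folder already exists
    path = os.path.join(base_path, safe_name(name) + ext)
    with open(path, "wb") as f:  # binary mode: image data is bytes, not text
        f.write(content)
    return path


# Demo in a temporary folder standing in for f:/我要個性網/<title>
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(os.path.join(tmp, "album"), "cat/dog", b"\xff\xd8 fake jpeg bytes")
    saved_name = os.path.basename(saved)
    with open(saved, "rb") as f:
        saved_bytes = f.read()

print(saved_name)  # cat--dog.jpg
```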
And that's a wrap.
Full code:
import requests
import re
import os
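headers = {}  # fill in the request headers
link = ""  # fill in the URL to crawl
r = requests.get(link, headers=headers)
html = r.text
# print(html)
title = re.findall(r'', html)  # fill in the title pattern
divs = re.compile(r'')  # fill in the pattern for the list entries
divs = re.findall(divs, html)
# print(divs)
for div in divs:
    links = '' + div  # fill in the site prefix; div is the relative part of the URL
    resp = requests.get(links, headers=headers)
    htmls = resp.text
    # print(htmls)
    hrefs = re.compile(r'')  # fill in the image-URL pattern
    hrefs = re.findall(hrefs, htmls)
    ids = re.findall(r'', htmls)  # fill in the image-name pattern
    base_path = 'f:/我要個性網/%s' % title[0]  # findall returns a list; use the first match
    for img_id in ids:
        img_id = re.sub('[/]+', '--', img_id)  # a "/" in the name would break the save path
        path = os.path.join(base_path, img_id)
        # create the path
        os.makedirs(path, exist_ok=True)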
Give it a run!