A guide to scraping Lagou with Python

Without further ado, here is the code. The scraped data is stored in MongoDB.

import requests
import pymongo
import time
import random

mycon = pymongo.MongoClient('127.0.0.1', 27017)  # create the connection
mydb = mycon['lagou_data']  # database name


class lagouspider():
    def __init__(self, city, kd):
        self.headers = {}  # request headers are elided in the source; fill in before running
        self.city = city
        self.max_pn = 1
        self.kd = kd

    def get_start(self):
        mycol = mydb[self.kd]  # collection name (one collection per search keyword)
        # The URL prefix before the city parameter is elided in the source.
        url = "" + self.city + "&needAddtionalResult=false"
        for page in range(1, 10):
            data = {}  # POST form fields are elided in the source
            s = requests.Session()
            # Visit the search page first to obtain cookies (URL elided in the source).
            s.get(url="", headers=self.headers)
            cookies = s.cookies
            response = s.post(url=url, data=data, cookies=cookies, headers=self.headers).json()
            content = response.get('content')
            if content:
                # Key names below use the camelCase spelling of Lagou's JSON response.
                result = content['positionResult']['result']
                print('Position: {}, city: {}, scraping page {}\n'.format(self.kd, self.city, page))
                for i in result:
                    lagou_data = {}
                    lagou_data['positionName'] = i['positionName']  # job title
                    lagou_data['companyFullName'] = i['companyFullName']  # full company name
                    lagou_data['workYear'] = i['workYear']  # required work experience
                    lagou_data['education'] = i['education']  # required education
                    lagou_data['jobNature'] = i['jobNature']  # nature of the job
                    lagou_data['salary'] = i['salary']  # salary
                    lagou_data['city'] = i['city']  # city
                    lagou_data['financeStage'] = i['financeStage']  # financing stage
                    lagou_data['industryField'] = i['industryField']  # industry field
                    lagou_data['companyShortName'] = i['companyShortName']  # short company name
                    lagou_data['positionAdvantage'] = i['positionAdvantage']  # position advantages
                    lagou_data['companySize'] = i['companySize']  # company size
                    lagou_data['companyLabelList'] = i['companyLabelList']  # benefit labels
                    lagou_data['district'] = i['district']  # district
                    lagou_data['positionLables'] = i['positionLables']  # skill labels
                    lagou_data['firstType'] = i['firstType']  # position category
                    lagou_data['createTime'] = i['createTime']  # posting time
                    print(lagou_data)
                    mycol.insert_one(lagou_data)  # store one document per job posting
            time.sleep(random.uniform(3, 7))  # sleep for a random interval between pages


if __name__ == '__main__':
    lagou = lagouspider('北京', 'python')  # city: Beijing, keyword: python
    lagou.get_start()
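
The field-by-field copying in get_start can also be written as a single helper. The following is only a minimal sketch, not the author's method: the name pick_fields and the sample record are hypothetical, and only keys that appear in the listing above are used.

```python
def pick_fields(record, fields):
    """Return a new dict holding only the wanted keys.

    record.get() is used instead of record[key], so a key that is
    missing from the response becomes None rather than raising KeyError.
    """
    return {key: record.get(key) for key in fields}


# Hypothetical sample record, for illustration only.
sample = {"salary": "15k-25k", "city": "北京", "education": "本科", "extra": 1}
row = pick_fields(sample, ["salary", "city", "education", "district"])
print(row)
```

Inside the loop, each job record could then be reduced with pick_fields(i, wanted_keys) and stored with PyMongo's insert_one, where wanted_keys is a list you define yourself.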

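Summary: Lagou's anti-scraping measures are moderate. You first fetch the cookies from the search results page, then send them along with the request to the endpoint that returns the JSON data.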
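
The two-step flow described above can be isolated into one function. This is a sketch under assumptions: the function name fetch_json_with_cookies and its parameters are hypothetical, and the session argument is assumed to behave like requests.Session, which stores cookies from earlier responses and sends them automatically on later requests.

```python
def fetch_json_with_cookies(session, page_url, api_url, form, headers):
    """Visit page_url to collect cookies, then POST form to api_url.

    `session` is expected to behave like requests.Session: cookies set by
    the first response are kept on the session and attached to the
    second request without any extra code.
    """
    # Step 1: load the ordinary search page so the server sets its cookies.
    session.get(page_url, headers=headers, timeout=10)
    # Step 2: call the JSON endpoint with the same session (and its cookies).
    response = session.post(api_url, data=form, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()
```

With the real library, the call would look like fetch_json_with_cookies(requests.Session(), page_url, api_url, form, headers), where page_url and api_url are the search page and the JSON endpoint that you supply yourself. Creating a fresh session per page, as the script does, means new cookies are requested each time.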
