Scraping Lagou Job Listings with Python

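I've recently been teaching myself web scraping and wanted somewhere to keep a record of my code, so I came to 51CTO to try it out. This is my first post and I'm still finding my way around, so here's the code to start with.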

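First, open the Lagou home page, type the keyword python into the search box, and start a packet-capture tool. I'm on macOS, so I used the timeline recording in the developer tools built into Safari. Capturing the POST request shows you the full request URL.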
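The script below is Python 2 (urllib2). For Python 3, a minimal sketch of building the same kind of POST with the standard library's urllib.request might look like this; supplying a data argument is what turns the request into a POST. The helper name make_post_request, the example.com URL, and the User-Agent value are placeholders assumed for illustration, not details from the capture.

```python
import urllib.parse
import urllib.request


def make_post_request(url, form, user_agent="Mozilla/5.0"):
    """Build, but do not send, a form-encoded POST request (hypothetical helper).

    url:  the POST URL captured in the browser
    form: dict of form fields to send
    The User-Agent value is an assumed browser-like placeholder.
    """
    body = urllib.parse.urlencode(form).encode("utf-8")  # request data must be bytes
    req = urllib.request.Request(url, data=body)  # data present -> get_method() is "POST"
    req.add_header("User-Agent", user_agent)
    # urllib fills in Content-Type: application/x-www-form-urlencoded when sending
    return req


if __name__ == "__main__":
    req = make_post_request("https://example.com/search",  # placeholder URL
                            {"first": "true", "kd": "python", "pn": 1})
    print(req.get_method(), req.data)
    # To actually send it:
    #   with urllib.request.urlopen(req) as resp:
    #       raw = resp.read()  # bytes
```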

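You'll then see that the POST body carries three fields: first, kd, and pn. first appears to flag whether this is the first page, kd is the keyword you typed, and pn is the page number. first is true for page 1 and false for every other page, so a simple if is enough to decide what to post on each request.

If you open that URL in the browser, what comes back is a JSON string, so it has to be parsed with json.loads(). The parsed JSON is nested dicts and lists that you index one level at a time, much like a multidimensional array (for example hjson['content']['result']). Finally, I pull out the fields I need and write them to a text file.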
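The if/else that picks the form fields can be folded into one small function. A sketch in Python 3 (the name build_post_data is hypothetical; the field names and values are the ones described above):

```python
import urllib.parse


def build_post_data(page, keyword="python"):
    """Form fields for one results page (hypothetical helper).

    first: 'true' only for page 1, 'false' otherwise
    kd:    the search keyword
    pn:    the page number
    """
    return {
        "first": "true" if page == 1 else "false",
        "kd": keyword,
        "pn": page,
    }


if __name__ == "__main__":
    for page in range(1, 5):  # pages 1-4, the same range as `while page < 5`
        print(urllib.parse.urlencode(build_post_data(page)))
```

urlencode percent-encodes non-ASCII keywords as UTF-8, so a Chinese kd value can be passed in directly.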

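#coding=utf-8
import json
import sys
import urllib
import urllib2

# Python 2 workaround: lets the unicode values from json.loads be written
# to the file as UTF-8 instead of raising UnicodeEncodeError
reload(sys)
sys.setdefaultencoding('utf-8')

url = ''    # the full POST URL captured in the browser goes here
page = 1    # page number, sent as pn
length = 0  # total number of postings fetched
index = 1   # running row number written to the file

f = open('lagoudata.txt', 'a+')

while page < 5:
    # first is 'true' only for page 1; kd is the keyword, pn the page number
    if page == 1:
        post_data = {'first': 'true', 'kd': 'python', 'pn': page}
    else:
        post_data = {'first': 'false', 'kd': 'python', 'pn': page}
    page = page + 1

    # passing form data makes urllib2 send a POST request
    r = urllib2.Request(url, urllib.urlencode(post_data))
    html = urllib2.urlopen(r).read()

    # the body is JSON; the postings are the list under content -> result
    hjson = json.loads(html)
    result = hjson['content']['result']
    # print result
    length = length + len(result)

    for job in result:
        # one comma-separated line per posting
        string = (str(index) + ',' + job['companyName'] + ',' + job['financeStage'] + ','
                  + job['positionAdvantage'] + ',' + job['education'] + ','
                  + job['workYear'] + ',' + job['city'] + ',' + job['salary'])
        f.write(string)
        f.write('\r\n')
        index = index + 1
        # print string

f.close()
print length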
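Indexing hjson['content']['result'] directly raises KeyError if a response comes back without that structure (for instance an error reply; the post doesn't say what Lagou returns in that case). A minimal Python 3 sketch that parses the body and falls back to an empty list, assuming the content → result layout the script uses; extract_jobs and the sample body are made up for illustration:

```python
import json


def extract_jobs(body):
    """Return the list of job dicts under content -> result, or [] if absent.

    Hypothetical helper; body may be str or the bytes that urlopen().read() returns.
    """
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    data = json.loads(body)
    content = data.get("content") or {}
    return content.get("result") or []


# Made-up response body with the same nesting the script expects
SAMPLE = ('{"content": {"result": [{"companyName": "Example Co", '
          '"city": "Shanghai", "salary": "15k-25k"}]}}')

if __name__ == "__main__":
    for job in extract_jobs(SAMPLE):
        print(job["companyName"], job["city"], job["salary"])
```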

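Because Lagou returns JSON here, the response has to be parsed before the fields can be used. The screenshot below shows the data I ended up scraping.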
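The script joins fields with ',' by hand, which produces a broken row whenever a value such as positionAdvantage itself contains a comma. A sketch using Python 3's csv module, which quotes such values automatically; write_jobs is a hypothetical helper and the field names are the ones the script reads:

```python
import csv
import io

# Keys read by the script (assumed to match the JSON field names)
FIELDS = ["companyName", "financeStage", "positionAdvantage",
          "education", "workYear", "city", "salary"]


def write_jobs(jobs, out, start_index=1):
    """Write one CSV row per job dict to the text stream `out` (hypothetical helper).

    Missing keys become empty strings instead of raising KeyError.
    Returns the next free row number so pages can be written one after another.
    """
    writer = csv.writer(out)  # default line terminator is \r\n, like the script
    index = start_index
    for job in jobs:
        writer.writerow([index] + [job.get(k, "") for k in FIELDS])
        index += 1
    return index


if __name__ == "__main__":
    buf = io.StringIO()
    sample = [{"companyName": "Example Co", "city": "Beijing", "salary": "10k-15k",
               "positionAdvantage": "flexible hours, free lunch"}]  # made-up record
    write_jobs(sample, buf)
    print(buf.getvalue())
```

For a real file, open it with open('lagoudata.csv', 'a', newline='', encoding='utf-8') and pass it as out; the csv documentation recommends newline='' so the writer's own line endings are not translated, and the explicit encoding makes the setdefaultencoding workaround unnecessary.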
