Scraping Tencent Job Postings with a Python Crawler (Static Crawler)

Environment:

Windows 7, Python 3.4

Code (tested, runs correctly):

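import requests
from bs4 import BeautifulSoup
from math import ceil

# The original header values are missing from this listing;
# set the request headers (for example a User-Agent) here.
header = {}
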
# Get the number of result pages
def getjobpage(url):
    ret = requests.get(url, headers=header)
    ret.encoding = "utf-8"  # avoid garbled (mis-decoded) text
    html = ret.text
    soup = BeautifulSoup(html, 'html.parser')
    # Total number of postings, e.g. <span class="lightblue total">512</span>
    totaljob = soup.select('span[class="lightblue total"]')[0].text
    # Each result page lists 10 postings
    jobpage = ceil(int(totaljob) / 10)
    return jobpage

# Get the responsibilities and requirements from a posting's detail page
def getjoborder(url):
    ret = requests.get(url, headers=header)
    ret.encoding = "utf-8"  # avoid garbled (mis-decoded) text
    html = ret.text
    soup = BeautifulSoup(html, 'html.parser')
    # Job responsibilities
    jobrequests = soup.select('ul[class="squareli"]')[0].text
    # Job requirements
    joborder = soup.select('ul[class="squareli"]')[1].text
    return jobrequests, joborder

# Get the posting information from one result page
def getjobinfo(url):
    # gb18030 with errors='ignore' avoids garbled text in the output file;
    # the with-statement closes the file when the page is done
    with open("tencent_job.txt", "a", encoding='gb18030', errors='ignore') as myfile:
        ret = requests.get(url, headers=header)
        ret.encoding = "utf-8"  # avoid garbled (mis-decoded) text
        html = ret.text
        soup = BeautifulSoup(html, 'html.parser')
        joblist = soup.find_all('tr', class_=['even', 'odd'])
        for job in joblist:
            # URL (the site prefix string is empty in this listing)
            joburl = "" + job.select('td:nth-of-type(1) > a')[0]['href']
            # Job title
            jobname = job.select('td:nth-of-type(1) > a')[0].text
            # Number of openings
            jobpeople = job.select('td:nth-of-type(3)')[0].text
            # Location
            jobaddre = job.select('td:nth-of-type(4)')[0].text
            # Publication date
            jobtime = job.select('td:nth-of-type(5)')[0].text
            # Responsibilities and requirements (one request per detail page)
            jobrequests, joborder = getjoborder(joburl)

            # print(jobname, joburl, jobaddre, jobpeople, jobtime, jobrequests, joborder)

            tt = jobname + " " + joburl + " " + jobaddre + " " + jobpeople + " " + jobtime + " " + jobrequests + " " + joborder
            myfile.write(tt + "\n")

if __name__ == '__main__':
    # The site prefix of both URLs below is missing from this listing
    mainurl = 'position.php?keywords=python'
    jobpage = getjobpage(mainurl)
    print(jobpage)
    for page in range(jobpage):
        # start is the offset of the first posting on the page (10 per page)
        pageurl = 'position.php?keywords=python&start=' + str(page * 10) + '#a'
        print("Page " + str(page + 1))
        getjobinfo(pageurl)

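The script joins the fields of each posting into a single line of text. Because the responsibilities and requirements are taken from whole `<ul>` blocks, they can contain line breaks, which would spread one posting over several lines of the output file. One possible alternative, shown here as a sketch and not as the original author's method, is to let Python's `csv` module quote such fields. The sample row, the tab delimiter and the helper name `write_rows` are assumptions made for illustration.

```python
import csv
import io


def write_rows(stream, rows):
    """Write each posting as one tab-separated record, quoting where needed."""
    writer = csv.writer(stream, delimiter="\t")
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample data; the responsibilities field contains a line break.
    sample = [
        ["Python Engineer", "https://example.com/job/1", "Shenzhen", "2",
         "2018-01-01", "Build crawlers\nMaintain pipelines", "3 years of Python"],
    ]
    buffer = io.StringIO(newline="")
    write_rows(buffer, sample)

    # Reading the text back yields the same single record.
    buffer.seek(0)
    restored = list(csv.reader(buffer, delimiter="\t"))
    print(restored == sample)
```

When writing to a real file instead of an in-memory buffer, the `csv` documentation recommends opening it with `newline=""`, for example `open("tencent_job.txt", "a", newline="", encoding="gb18030")`.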