Scraping Zhilian Zhaopin with Python

2021-09-27 00:03:44 | 1557 words | 7205 views

A small crawler that scrapes job listings from Zhilian Zhaopin.

Python version: Python 3.7

Dependencies: selenium, pyquery

Enough talk, here is the code:

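from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from pyquery import PyQuery as pq
import time


class zhilian:

    def __init__(self):
        # Run Chrome in headless mode (no visible window)
        self.chrome_options = Options()
        self.chrome_options.add_argument('--headless')
        self.chrome_options.add_argument('--disable-gpu')
        self.driver = webdriver.Chrome(options=self.chrome_options)

    def get_url(self, search='python'):
        """Search for a job title and return the HTML of the result page.

        :param search: keyword to search for; the demo defaults to "python"
        :return: outer HTML of the rendered result page
        """
        # The start URL is blank in the source; supply the search page URL here
        self.driver.get("")
        element = self.driver.find_element(By.CLASS_NAME, "zp-search__input")
        element.send_keys(search)
        element.send_keys(Keys.ENTER)
        # The results open in a new window, so switch to it
        self.driver.switch_to.window(self.driver.window_handles[1])
        # Wait for the JavaScript rendering to finish before reading the HTML
        time.sleep(4)
        html = self.driver.find_element(By.XPATH, "//*").get_attribute("outerHTML")
        return html

    def data_processing(self):
        """Parse the result page and yield one tuple per job listing.

        :return: generator of (jobname, companyname, saray, demand) tuples
        """
        html = self.get_url()
        doc = pq(html)
        # The statements that build `contents` from `doc` and that read jobname,
        # companyname, saray and demand from each item are missing from the source
        for content in contents.items():
            yield jobname, companyname, saray, ",".join(demand.split("\n"))


datas = zhilian().data_processing()
for data in datas:
    print(data)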
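The crawler pauses with `time.sleep(4)` so that the JavaScript-rendered page can finish loading before the HTML is read. A fixed pause is either too long or too short depending on the network. One possible alternative is to poll for a condition with a timeout. The sketch below is library-independent and is not part of the original post; `wait_until`, its parameters and the demo predicate are hypothetical names chosen for illustration.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Demo with a stand-in condition that becomes true on the third check.
calls = {"n": 0}


def ready_on_third_call():
    calls["n"] += 1
    return calls["n"] >= 3


wait_until(ready_on_third_call, timeout=2.0, interval=0.01)
print(calls["n"])  # 3
```

In the crawler, the predicate could be something like `lambda: driver.find_elements(By.CSS_SELECTOR, selector)`, where `selector` is whatever matches the result items on the page (an assumption, since the post does not show it). Selenium also provides `WebDriverWait` in `selenium.webdriver.support.ui`, which implements the same polling pattern.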

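The last element of each yielded tuple, `",".join(demand.split("\n"))`, turns the multi-line `demand` text into a single comma-separated string. If the extracted text contains blank lines or stray spaces, that expression keeps them. A slightly more defensive variant could look like the sketch below; `flatten_demand` is a hypothetical helper name and the sample string is made up, not taken from a real listing.

```python
def flatten_demand(demand: str) -> str:
    """Join the non-empty, whitespace-trimmed lines of `demand` with commas."""
    parts = (line.strip() for line in demand.split("\n"))
    return ",".join(part for part in parts if part)


# Made-up text standing in for one listing's requirement block.
sample = "Beijing\n3-5 years\n\n Bachelor "

print(",".join(sample.split("\n")))  # Beijing,3-5 years,, Bachelor
print(flatten_demand(sample))        # Beijing,3-5 years,Bachelor
```

The first line of output shows what the expression from the crawler produces for this input; the second shows the trimmed variant.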
