Crawling 2019 Postgraduate Admission Adjustment (調劑) Information for Universities Nationwide

**Implementation:**

# kaoyan.py
# -*- coding: utf-8 -*-
from copy import deepcopy

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class KaoyanSpider(CrawlSpider):
    name = 'kaoyan'
    allowed_domains = ['kaoyan365.cn']
    start_urls = ['']
    rules = (
        # Extract the URL of each province's page
        Rule(LinkExtractor(allow=r''), callback='parse_list',
             follow=False),
    )

    def parse_list(self, response):
        # Extract each university's name and link
        td_list = response.xpath('//div[@class="zg_list_left01_cont"]//td')
        for td in td_list:
            item = {}
            item["university"] = td.xpath('.//text()').extract_first()
            item["href"] = td.xpath('./a/@href').extract_first()
            if item["href"]:
                yield scrapy.Request(
                    item["href"],
                    callback=self.parse_university,
                    # Pass a copy of the item on to the detail-page callback
                    meta={"item": deepcopy(item)}
                )

    def parse_university(self, response):
        # Get the detailed content of the university's page
        item = response.meta["item"]
        item["content"] = response.xpath("//div[@class='zg_list_left01_cont']//text()").extract()
        yield item
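
The spider imports `deepcopy` and the detail callback reads the item back from `response.meta["item"]`. The following is a minimal sketch of why copying a dict before handing it to a later callback is a reasonable precaution: Scrapy runs callbacks asynchronously, so a dict that is reused and mutated after the request is created would show its latest state in every callback. The example uses plain dicts and no Scrapy at all; `queue_shared`, `queue_copied`, and the university names are hypothetical and only serve the illustration.

```python
from copy import deepcopy


def queue_shared(names):
    """Queue the same dict object for every name (the pitfall)."""
    item = {}
    queued = []
    for name in names:
        item["university"] = name
        queued.append({"item": item})  # every entry points at one dict
    return queued


def queue_copied(names):
    """Queue an independent snapshot of the dict for every name."""
    item = {}
    queued = []
    for name in names:
        item["university"] = name
        queued.append({"item": deepcopy(item)})  # snapshot at queue time
    return queued


names = ["University A", "University B"]
shared = [meta["item"]["university"] for meta in queue_shared(names)]
copied = [meta["item"]["university"] for meta in queue_copied(names)]
print(shared)
print(copied)
```

In `parse_list` the `item` dict is created fresh on each loop iteration, so there the copy is defensive rather than strictly required.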

# pipelines.py
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See:
import re


class TiaojiPipeline(object):
    def process_item(self, item, spider):
        # Write the university name and link to the file
        with open("考研調劑資訊.txt", "a", encoding="utf-8") as f:
            f.write("***" + item["university"] + "：" + item["href"] + "\n")
        # Clean the invalid data and write the page content
        self.clear_item(item["content"])
        return item

    def clear_item(self, content_list):
        """Strip invalid characters and append the content to the file."""
        for content in content_list:
            # Remove ideographic (full-width) spaces
            content = re.sub("\u3000", "", content)
            with open("考研調劑資訊.txt", "a", encoding="utf-8") as f:
                f.write(content.strip() + "\n")
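
The cleaning step can be tried out without running the crawler. The following is a minimal sketch, assuming the intent of `clear_item` is to drop ideographic spaces (`\u3000`) and surrounding whitespace. `clean_lines` is a hypothetical helper, not part of the original pipeline, and the sample strings are made up. Unlike `clear_item`, it also skips fragments that end up empty.

```python
def clean_lines(content_list):
    """Remove U+3000, strip whitespace, and drop fragments that become empty."""
    cleaned = []
    for content in content_list:
        text = content.replace("\u3000", "").strip()
        if text:
            cleaned.append(text)
    return cleaned


# Made-up fragments resembling what an XPath //text() query might return
sample = ["\u3000\u3000Adjustment notice\r\n", "\n", "  Contact: admissions office  "]
result = clean_lines(sample)
print(result)
```

With a helper like this, the pipeline could open the output file once per item and write all cleaned lines in one pass instead of reopening the file for every fragment.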

