XPath in Practice: Scraping 猿著 on 學習猿地 (Part 1)

import requests, json
from lxml import etree

# Wrapped in a class for easier management
class xp_test():
    # Request URL for 猿著 (left blank in the source)
    url = ''
    # Request headers (elided in the source; an empty dict is a placeholder)
    headers = {}
    # Scraped data
    data = ''
    # Where the results are stored
    filepath = './yq.json'

    # Initialisation
    def __init__(self):
        # Send the request
        res = requests.get(url=self.url, headers=self.headers)
        if res.status_code == 200:
            # Write the response body to a file
            with open('./yq.html', 'wb') as fp:
                fp.write(res.content)
            if self.parth_data():
                self.write_data()

    def parth_data(self):
        # Parse the saved page
        html = etree.parse('./yq.html', etree.HTMLParser())
        authors = html.xpath('//div[contains(@class,"old_content")]//div[contains(@class,"list-group-item-action")]//strong/a/text()')
        titles = html.xpath('//div[contains(@class,"old_content")]//div[contains(@class,"list-group-item-action")]//div[contains(@class,"flex-fill")]//div/text()')
        titleurl = html.xpath('//div[contains(@class,"old_content")]//div[contains(@class,"list-group-item-action")]//div[contains(@class,"flex-fill")]//a/@href')
        # Collate the data
        data = []
        for i in range(0, len(authors)):
            # The record literal was lost from the source; these key names are an assumption
            res = {'author': authors[i], 'title': titles[i], 'url': titleurl[i]}
            data.append(res)
        self.data = data
        return True

    def write_data(self):
        # Write the data to disk
        print(self.data)
        with open(self.filepath, 'w', encoding='utf-8') as fp:
            json.dump(self.data, fp, ensure_ascii=False)

# Instantiate the object, which runs the scrape
xp_test()
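
To see what the three XPath expressions in parth_data select, here is a minimal sketch that runs them against a small hand-written HTML fragment instead of the live page. The markup, the hrefs, and the names alice and bob are all made up for illustration; only the class names (old_content, list-group-item-action, flex-fill) come from the expressions themselves, so the real page may well be structured differently.

```python
from lxml import etree

# Hypothetical markup: only the class names are taken from the XPath expressions.
SAMPLE_HTML = """
<html><body>
<div class="old_content row">
  <div class="list-group-item list-group-item-action">
    <div class="flex-fill">
      <a href="/book/1">cover</a>
      <div>First title</div>
    </div>
    <strong><a href="/user/1">alice</a></strong>
  </div>
  <div class="list-group-item list-group-item-action">
    <div class="flex-fill">
      <a href="/book/2">cover</a>
      <div>Second title</div>
    </div>
    <strong><a href="/user/2">bob</a></strong>
  </div>
</div>
</body></html>
"""

# Shared prefix: every list item inside the old_content container.
ITEM = ('//div[contains(@class,"old_content")]'
        '//div[contains(@class,"list-group-item-action")]')

html = etree.HTML(SAMPLE_HTML)  # parse from a string rather than a file
authors = html.xpath(ITEM + '//strong/a/text()')
titles = html.xpath(ITEM + '//div[contains(@class,"flex-fill")]//div/text()')
titleurl = html.xpath(ITEM + '//div[contains(@class,"flex-fill")]//a/@href')

# zip stops at the shortest list, so a missing field cannot raise IndexError.
records = [{'author': a, 'title': t, 'url': u}
           for a, t, u in zip(authors, titles, titleurl)]
print(records)
```

contains(@class, ...) is used rather than an exact @class match because each element carries several classes. It is a substring test, though, so it would also match a class name that merely contains the given text.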

4. Scrape results (yq.json):

[ ... ]
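
write_data passes ensure_ascii=False to json.dump so that Chinese text is stored in yq.json as readable characters rather than as \uXXXX escapes. A minimal sketch of the difference, using a made-up record (the field names and values are illustrative, not real scrape output):

```python
import json

# Made-up sample record, not actual scraped data.
record = [{'author': '猿著', 'title': '學習猿地'}]

escaped = json.dumps(record)                       # default: non-ASCII is escaped
readable = json.dumps(record, ensure_ascii=False)  # non-ASCII written as-is

print(escaped)
print(readable)

# Both forms decode back to the same Python object.
assert json.loads(escaped) == json.loads(readable) == record
```

Because the readable form contains raw non-ASCII characters, the file should be opened with encoding='utf-8', as write_data does.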

