Multi-level crawling of Lianjia rental listings with jsoup

I have been looking for a place to rent recently, so I analyzed Lianjia's **.

##**

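for (int i = 0; i < 50; i++) {
    // The try block that stood here was lost from the post; only its
    // "catch (IOException e)" clause survives. It loaded list page i into `document`.
    Elements elements = document.select("div[class=content__list--item]");
    for (Element element : elements) {
        // The try block here was lost as well. Per the note further down, it fetched
        // the second-level (detail) page; how `allurl` and `zu` were built is not shown.
        try {
            doc = Jsoup.connect(allurl).get();
        } catch (IOException e) {
            e.printStackTrace(); // original handler not preserved
            continue;
        }
        String title = doc.select("p[class=content__title]").text(); // title
        zu.setTitle(title);
        String description = doc.select("p[class=content__aside--tags]").text(); // feature tags
        zu.setDescription(description);
        String brand = "鏈家"; // brand
        zu.setBrand(brand);
        String time = doc.select("div[class=content__subtitle]").text(); // publication time
        zu.setTime(time);
        String price = doc.select("p[class=content__aside--title]").select("span").text(); //**
        zu.setPrice(price);
        String feature = doc.select("p[class=content__article__table]").text();
        zu.setFeature(feature);
        // Entries 7 and 8 of the info list; this throws if the page has fewer entries.
        String floor = doc.select("li[class=fl oneline]").eachText().get(7) + "-------"
                + doc.select("li[class=fl oneline]").eachText().get(8);
        zu.setFloor(floor);
        String around = doc.select("div[id=around]").select("ul").text();
        zu.setAround(around);
        // attr(key, value) would set the attribute rather than filter on it,
        // so the <p> is selected by its data-el attribute instead.
        String houseComent = doc.select("div[class=content__article__info3]")
                .select("p[data-el=housecomment]").attr("data-desc");
        zu.setHouseComent(houseComent);
        // Contact: name and phone of the first agent in the list.
        String lxr = doc.select("ul[id=agentlist]").select("li:nth-child(1)").select("div[class=desc]")
                .select("div[class=title]").select("a[class=name]").text() + "--------"
                + doc.select("ul[id=agentlist]").select("li:nth-child(1)").select("div[class=desc]")
                .select("div[class=phone]").text();
        zu.setLxr(lxr);
    }
}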
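
The floor lookup above indexes positions 7 and 8 of `eachText()` directly, which throws an `IndexOutOfBoundsException` when a detail page lists fewer entries than expected. A minimal sketch of a guarded lookup follows; `SafeText` and `textAt` are hypothetical names that do not appear in the original code.

```java
import java.util.List;

// Hypothetical helper (not part of the original post): returns the text at
// `index`, or an empty string when the list is missing or too short.
class SafeText {
    static String textAt(List<String> texts, int index) {
        if (texts == null || index < 0 || index >= texts.size()) {
            return "";
        }
        return texts.get(index);
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b");
        System.out.println(textAt(items, 1));           // in range
        System.out.println(textAt(items, 7).isEmpty()); // out of range, no exception
    }
}
```

With such a helper, the floor value could be built as `textAt(items, 7) + "-------" + textAt(items, 8)`, where `items` holds the result of `eachText()`.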

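I did not read the total page count, because Lianjia only displays 100 pages; here I crawl the first 50 pages, sorted by publication time.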

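The most important part of a crawler is usually analyzing the page structure, but for this crawler the key question is how to crawl the second-level pages. The main approach is doc = Jsoup.connect(allurl).get();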

In other words, just use jsoup's own request instead of an HttpClient request.
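
The two-level flow can be sketched roughly as follows. This is an illustration, not the original code: the class `TwoLevelCrawl` and the method `detailUrls` are hypothetical, and it assumes that the first `a[href]` inside each list item links to the detail page, which the post does not show. The list-page URL is taken from the command line, since the original URL is not given.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TwoLevelCrawl {
    // Collects absolute detail-page URLs from a parsed list page.
    // Assumption: the first a[href] inside each list item is the detail link.
    static List<String> detailUrls(Document listPage) {
        List<String> urls = new ArrayList<>();
        for (Element item : listPage.select("div[class=content__list--item]")) {
            Element link = item.selectFirst("a[href]");
            if (link == null) {
                continue; // list item without a link
            }
            String url = link.absUrl("href"); // resolved against the page's base URI
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: TwoLevelCrawl <list-page-url>");
            return;
        }
        // First level: fetch the list page with jsoup's own request.
        Document listPage = Jsoup.connect(args[0]).get();
        for (String url : detailUrls(listPage)) {
            // Second level: fetch each detail page the same way.
            Document doc = Jsoup.connect(url).get();
            System.out.println(doc.select("p[class=content__title]").text());
        }
    }
}
```

Keeping the link extraction in its own method means it can be checked offline against a saved page with `Jsoup.parse(html, baseUri)` before any network request is made.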
