Scraping 58同城 second-hand housing listings with XPath and saving them to a CSV file

2021-10-07 03:18:24

import csv

import requests
from lxml import etree

# Scrape second-hand housing listings from 58同城
if __name__ == "__main__":
    # Request headers; the value is not given in the listing, so an empty dict stands in
    headers = {}
    with open('d:/python/58.csv', 'w', encoding='utf8', newline='') as fp:
        csv_writer = csv.writer(fp)
        csv_writer.writerow(["Details", "Area", "Price (10k CNY)", "Community", "Location"])
        for i in range(1, 2):
            # The base of the listing URL is not given in the listing and is left empty
            url = "" + str(i) + "/?pgtid=0d30000c-0000-1993-57e1-4d75c7a389ad&clickid=1"
            # Fetch the page source
            page_text = requests.get(url=url, headers=headers).text
            # Parse the data by loading the page into an etree object
            tree = etree.HTML(page_text)
            # li_list holds the li tag objects
            li_list = tree.xpath('//ul[@class="house-list-wrap"]/li')
            print(li_list)
            # Loop over each li tag
            for li in li_list:
                # Follow the link to the listing's detail page
                a_url = li.xpath('./div[2]/h2/a/@href')[0]
                page_text1 = requests.get(url=a_url, headers=headers).text
                tree1 = etree.HTML(page_text1)
                li_list1 = tree1.xpath('//ul[@class="house-basic-item3"]')
                # li.xpath('//div[@class="list-info"]/h2/a')
                title = li.xpath('./div[2]/h2/a/text()')[0]
                mianji = li.xpath('./div[2]/p/span[2]/text()')[0]
                jiage = li.xpath('./div[3]/p/b/text()')[0]
                jiage1 = li.xpath('./div[3]/p/text()')[0]
                # print(title, mianji, jiage, jiage1)
                for li1 in li_list1:
                    # Community name, or "null" when the detail page has none
                    xiaoqu = li1.xpath('./li[1]/span[2]/a/text()')
                    if len(xiaoqu) > 0:
                        xiaoqu = xiaoqu[0].replace("\n ", "")
                    else:
                        xiaoqu = "null"
                    # Location, or "null" when the detail page has none
                    weizhi = li1.xpath('./li[2]/span[2]/a/text()')
                    if len(weizhi) > 0:
                        weizhi = weizhi[0].replace("\n ", "")
                    else:
                        weizhi = "null"
                    csv_writer.writerow([title, mianji, jiage, xiaoqu, weizhi])
                    print(title, mianji, jiage, xiaoqu, weizhi)
            print("Page %s scraped" % i)
    print("File saved")
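The `xiaoqu` and `weizhi` checks in the script follow the same "take the first match, otherwise use "null"" pattern, so they could be folded into one helper. The following is a minimal sketch and not part of the original script: `first_text` and `rows_to_csv` are hypothetical names, the sample values are made-up placeholders rather than scraped data, and `strip()` is swapped in for the original `replace("\n ", "")`. It only assumes that an XPath `text()` query returns a list of strings, and it writes to an in-memory buffer so it runs without network access.

```python
import csv
import io


def first_text(results, default="null"):
    """Return the first XPath text result with surrounding whitespace
    stripped, or `default` when the query matched nothing."""
    if results:
        return results[0].strip()
    return default


def rows_to_csv(header, rows):
    """Write a header and rows to an in-memory CSV and return the text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Lists shaped like the return value of li1.xpath('.../text()');
# the values are placeholders, not real listing data.
xiaoqu = first_text(["\n  Example Community\n"])
weizhi = first_text([])  # no match on the detail page

csv_text = rows_to_csv(["Community", "Location"], [[xiaoqu, weizhi]])
print(csv_text)
```

In the script itself, the two branches would then reduce to `xiaoqu = first_text(li1.xpath('./li[1]/span[2]/a/text()'))` and the equivalent call for `weizhi`.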
