Scraping Anjuke agent information with Python

Python 2.7.15

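Today we will scrape agent information from Anjuke. This time we use BeautifulSoup instead of regular expressions. If you have not used it before, read through its documentation first; it will make the rest easier to follow.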

for page in range(1, 8):
    # The base URL is blank in the original post; put the listing URL in the empty string.
    url = "" + str(page) + "/"
    response = urllib2.urlopen(url)
    content = response.read()

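This is the usual urllib2 routine: build the URL for each of pages 1 to 7, open it, and read the response body.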
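The page loop boils down to appending a page number and a trailing slash to a fixed prefix. Here is a minimal sketch of just that step, with no network access. The function name build_page_urls and the BASE_URL value are placeholders invented for illustration; they are not the address used in this post.

```python
def build_page_urls(base_url, first_page, last_page):
    """Return one listing URL per page, in the form <base_url><page>/."""
    return ["%s%d/" % (base_url, page)
            for page in range(first_page, last_page + 1)]


# BASE_URL is a placeholder, not the real listing address.
BASE_URL = "https://example.com/agents/p"
urls = build_page_urls(BASE_URL, 1, 7)
for url in urls:
    print(url)
```

Building the list up front makes it easy to check the URLs by eye before sending any requests.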

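First inspect the page source and find the tags that hold the agent information, then query them with BeautifulSoup. Here html.parser names the parser that BeautifulSoup should use.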

soup = BeautifulSoup(content, 'html.parser')
a = soup.find_all('h3')
b = soup.find_all(class_=re.compile("brokercard-sd-cont clearfix"))
# The attrs values for c and d are missing from the original post.
c = soup.find_all("p", attrs=)
d = soup.find_all("p", attrs=)
e = soup.find_all(class_=re.compile("broker-tags clearfix"))
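To make concrete what a call such as find_all('h3') followed by get_text(strip=True) produces, here is a rough stand-in that collects the text of every h3 element in document order. It deliberately uses the standard library's HTMLParser rather than BeautifulSoup so it runs without third-party packages, and it is written for Python 3. The class name TagTextCollector and the sample HTML are made up for illustration, and it only approximates BeautifulSoup's behaviour.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the stripped text of every element with the given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # how many matching tags are currently open
        self.parts = []     # text pieces of the element being read
        self.texts = []     # one entry per finished element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.parts = []

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.texts.append("".join(self.parts).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


# Made-up markup, not Anjuke's real page structure.
sample_html = ('<div><h3> Agent A </h3><p>Store 1</p>'
               '<h3><a href="#">Agent B</a></h3></div>')
collector = TagTextCollector("h3")
collector.feed(sample_html)
collector.close()
names = collector.texts
print(names)
```

The result is a plain list with one entry per matching tag, which is why the fields can later be read by index.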

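a, b, c, d and e hold the agent name, rating, store, familiar area and line of business, respectively.

Each one is a list.

Loop over them to output the values.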

n = 0
for jjr in a:
    # get_text(strip=True) returns unicode; encode it to UTF-8 bytes.
    o = jjr.get_text(strip=True).encode('utf-8')
    p = b[n].get_text(strip=True).encode('utf-8')
    # c is read at every second index.
    q = c[2*n].get_text(strip=True).encode('utf-8')
    r = d[n].get_text(strip=True).encode('utf-8')
    s = e[n].get_text(strip=True).encode('utf-8')
    n += 1
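The manual counter n works only while the five lists stay aligned. As an alternative sketch (not the method used in this post), the lists can be combined with zip after checking that their lengths agree. The function name pair_fields and the sample values are invented for illustration. It assumes, as the index 2*n implies, that the store list holds two entries per agent with the store first.

```python
def pair_fields(names, ratings, store_paragraphs, areas, services):
    """Combine the five parallel lists into one tuple per agent."""
    # Assumption: two entries per agent, store first, so keep indexes 0, 2, 4, ...
    stores = store_paragraphs[::2]
    columns = [names, ratings, stores, areas, services]
    lengths = [len(col) for col in columns]
    if len(set(lengths)) != 1:
        raise ValueError("field lists have different lengths: %r" % lengths)
    return list(zip(*columns))


# Made-up sample values standing in for the extracted text.
rows = pair_fields(
    ["Agent A", "Agent B"],
    ["4.8", "4.5"],
    ["Store 1", "extra", "Store 2", "extra"],
    ["North district", "South district"],
    ["Resale", "Rental"],
)
for row in rows:
    print(row)
```

Failing early on mismatched lengths avoids silently pairing one agent's name with another agent's store when the page layout changes.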

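Watch the encoding here. Under Python 2, the text that BeautifulSoup returns after parsing is unicode. Printed directly it can come out garbled, and in that form it cannot be written to a file or a database, so append encode('utf-8') to re-encode it.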
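As a small illustration of the encode step, the snippet below turns a text string into UTF-8 bytes and back again. The sample string is arbitrary, and the snippet runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
text = u"經紀人"                     # a text (unicode) string, like get_text() returns
encoded = text.encode("utf-8")      # UTF-8 bytes, ready for a file or a database driver
decoded = encoded.decode("utf-8")   # decoding the bytes gives the original text back

print(len(text))
print(len(encoded))
```

The three characters become nine bytes, because each of these characters takes three bytes in UTF-8.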

insert_agent = ("insert into agent(姓名,評價,門店,熟悉,業務)" "values(%s,%s,%s,%s,%s)")
data_agent = (o,p,q,r,s)
cursor.execute(insert_agent, data_agent)

Remember to open the database connection and create the target table before running the insert.
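The order of operations (connect, create the table, run a parameterized insert, commit) can be tried out without a MySQL server. The sketch below swaps in the standard library's sqlite3 module as a stand-in for MySQLdb, so two things differ from this post's script: sqlite3 uses ? placeholders where MySQLdb uses %s, and the English column names are hypothetical substitutes for the Chinese ones.

```python
import sqlite3

# Stand-in for MySQLdb.connect(...): a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Create the target table before any insert is attempted.
cursor.execute("DROP TABLE IF EXISTS agent")
cursor.execute(
    "CREATE TABLE agent (name TEXT, rating TEXT, store TEXT, area TEXT, services TEXT)"
)

# Values are passed separately from the SQL text, so the driver handles quoting.
insert_agent = ("INSERT INTO agent (name, rating, store, area, services) "
                "VALUES (?, ?, ?, ?, ?)")
data_agent = ("Agent A", "4.8", "Store 1", "North district", "Resale")
cursor.execute(insert_agent, data_agent)
conn.commit()

cursor.execute("SELECT COUNT(*) FROM agent")
row_count = cursor.fetchone()[0]
print(row_count)
conn.close()
```

Passing the values as a separate tuple, as the post's insert also does, keeps scraped text from being interpreted as SQL.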

# coding=utf-8
from bs4 import BeautifulSoup
import urllib2
import re
import MySQLdb

conn = MySQLdb.connect(host="127.0.0.1", user="root", passwd="199855pz", db="pz", charset='utf8')
print 'Connected'
cursor = conn.cursor()

# Recreate the table. Columns: name, rating, store, familiar area, business.
cursor.execute("drop table if exists agent")
sql = '''create table agent(姓名 char(4) ,評價 char(50) ,門店 char(50) ,熟悉 char(50) ,業務 char(50))'''
cursor.execute(sql)

for page in range(1, 8):
    # The base URL is blank in the original post; put the listing URL in the empty string.
    url = "" + str(page) + "/"
    response = urllib2.urlopen(url)
    content = response.read()
    soup = BeautifulSoup(content, 'html.parser')
    a = soup.find_all('h3')
    b = soup.find_all(class_=re.compile("brokercard-sd-cont clearfix"))
    # The attrs values for c and d are missing from the original post.
    c = soup.find_all("p", attrs=)
    d = soup.find_all("p", attrs=)
    e = soup.find_all(class_=re.compile("broker-tags clearfix"))
    n = 0
    for jjr in a:
        # Encode each unicode field to UTF-8 bytes before writing it.
        o = jjr.get_text(strip=True).encode('utf-8')
        p = b[n].get_text(strip=True).encode('utf-8')
        q = c[2*n].get_text(strip=True).encode('utf-8')
        r = d[n].get_text(strip=True).encode('utf-8')
        s = e[n].get_text(strip=True).encode('utf-8')
        n += 1
        # Insert one row per agent; the driver fills in the %s placeholders.
        insert_agent = ("insert into agent(姓名,評價,門店,熟悉,業務)" "values(%s,%s,%s,%s,%s)")
        data_agent = (o, p, q, r, s)
        cursor.execute(insert_agent, data_agent)

# Commit once after all pages have been processed.
conn.commit()

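P.S. Anjuke has since been updated and its page source has changed somewhat, but the same approach still works for scraping the information.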
