Scraping Lawyers' Phone Numbers Across China with Python

2022-01-18 07:59:01

[This article is by 天外歸雲]

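This script scrapes the phone numbers of lawyers across the country from 64365, using Python's lxml library to parse the HTML page content. To obtain the XPath expressions and verify that they are correct, install the Firebug and FirePath plugins in Firefox. The goal is to scrape the "name + phone number" pair for each lawyer listed on the page.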

The code is as follows:

# coding:utf-8
import os

import lxml.html
import requests


class myerror(Exception):
    """Raised when a page's name count and phone count differ."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return repr(self.value)


def get_lawyers_info(url):
    """Return a list of "name: phone" lines for one listing page."""
    r = requests.get(url)
    html = lxml.html.fromstring(r.content)
    phones = html.xpath('//span[@class="law-tel"]')
    names = html.xpath('//div[@class="fl"]/p/a')
    if len(phones) == len(names):
        # Names and phone numbers are matched purely by position.
        phone_infos = [(name.text, phone.text_content())
                       for name, phone in zip(names, phones)]
    else:
        error = ("the number of lawyers does not match "
                 "the number of phone numbers: " + url)
        raise myerror(error)
    phone_infos_list = []
    for phone_info in phone_infos:
        if phone_info[1] == "":
            info = phone_info[0] + ": " + "no phone number provided\r\n"
        else:
            info = phone_info[0] + ": " + phone_info[1] + "\r\n"
        print(info)
        phone_infos_list.append(info)
    return phone_infos_list


def get_pages_num(url):
    """Return the page count shown in the pager as a string, or None."""
    r = requests.get(url)
    html = lxml.html.fromstring(r.content)
    # The second-to-last pager link holds the number of the last page.
    result = html.xpath('//div[@class="u-page"]/a[last()-1]')
    if not result:
        return None
    pages_num = result[0].text
    if pages_num and pages_num.isdigit():
        return pages_num
    return None


def get_all_lawyers(cities):
    """Scrape every listing page of every city into lawyers_info.txt."""
    dir_path = os.path.abspath(os.path.dirname(__file__))
    print(dir_path)
    file_path = os.path.join(dir_path, "lawyers_info.txt")
    print(file_path)
    # Start from a clean file, because the loop below appends to it.
    if os.path.exists(file_path):
        os.remove(file_path)
    with open("lawyers_info.txt", "ab") as file:
        for city in cities:
            # The site's base URL is missing from the source text, so the
            # leading string literal is left empty here.
            pages_num = get_pages_num("" + city + "/lawyer/page_1.aspx")
            if pages_num:
                for i in range(int(pages_num)):
                    url = "" + city + "/lawyer/page_" + str(i + 1) + ".aspx"
                    info = get_lawyers_info(url)
                    for each in info:
                        file.write(each.encode("gbk"))


if __name__ == '__main__':
    cities = ['beijing', 'shanghai', 'guangdong', 'guangzhou', 'shenzhen',
              'wuhan', 'hangzhou', 'ningbo', 'tianjin', 'nanjing', 'jiangsu',
              'zhengzhou', 'jinan', 'changsha', 'shenyang', 'chengdu',
              'chongqing', 'xian']
    get_all_lawyers(cities)
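The XPath expressions used by the script can also be checked offline, without a browser plugin. The following is only a minimal sketch under several assumptions: the sample markup is hypothetical and merely imitates the structure that the selectors imply, the names and numbers are invented, and the helper names `extract_pairs` and `extract_page_count` are not part of the original script. It uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without third-party packages. ElementTree supports only a subset of XPath, requires well-formed markup, and needs the relative `.//` prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup imitating the structure the selectors
# expect; the names and phone numbers are made up.
SAMPLE_PAGE = """
<html><body>
  <div class="fl"><p><a href="#">Lawyer A</a></p></div>
  <span class="law-tel">000-0000-0001</span>
  <div class="fl"><p><a href="#">Lawyer B</a></p></div>
  <span class="law-tel"></span>
  <div class="u-page">
    <a>1</a><a>2</a><a>3</a><a>Next</a>
  </div>
</body></html>
"""


def extract_pairs(root):
    """Pair each name link with the phone span at the same position."""
    names = root.findall('.//div[@class="fl"]/p/a')
    phones = root.findall('.//span[@class="law-tel"]')
    if len(names) != len(phones):
        raise ValueError("name count and phone count differ")
    return [(n.text, "".join(p.itertext())) for n, p in zip(names, phones)]


def extract_page_count(root):
    """Return the second-to-last pager link as an int, or None."""
    links = root.findall('.//div[@class="u-page"]/a[last()-1]')
    if links and links[0].text and links[0].text.isdigit():
        return int(links[0].text)
    return None


root = ET.fromstring(SAMPLE_PAGE.strip())
pairs = extract_pairs(root)
page_count = extract_page_count(root)
print(pairs)
print(page_count)
```

Because names and phone numbers are matched by position only, the length check is what guards against a page where one of the two lists is missing an entry.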

Only popular cities are scraped here. The output is saved to the file "lawyers_info.txt" in the current directory.
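The shape of the saved file can be illustrated with a short sketch. It assumes one "name: phone" line per lawyer, terminated by `\r\n` and encoded as GBK, as in the script. The records below are placeholders rather than scraped data, the `format_line` helper and the wording used for a missing number are assumptions, and a temporary directory stands in for the real output location.

```python
import os
import tempfile


def format_line(name, phone):
    """Build one output line in the "name: phone" shape."""
    shown = phone if phone else "no phone number provided"
    return name + ": " + shown + "\r\n"


# Placeholder records, not real scraped data.
records = [("律師甲", "000-0000-0001"), ("律師乙", "")]
lines = [format_line(name, phone) for name, phone in records]

# Write in binary append mode with GBK encoding, as the script does.
path = os.path.join(tempfile.mkdtemp(), "lawyers_info.txt")
with open(path, "ab") as f:
    for line in lines:
        f.write(line.encode("gbk"))

# Read the bytes back and decode them to confirm the round trip.
with open(path, "rb") as f:
    decoded = f.read().decode("gbk")
print(decoded)
```

A text editor must open the file as GBK, not UTF-8, for the Chinese names to display correctly.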
