Web Scraping in Practice: Hefei Weather Data from a Weather Site

Scrape all weather data for the Hefei area from 2011 through 2018 from the weather site, then visualize the output.

The code:

import re
import time

import requests
from bs4 import BeautifulSoup


def get_response(url):
    """Fetch a page and return its HTML text, or None if the request fails."""
    try:
        r = requests.get(url)
        r.raise_for_status()
        return r.text
    except requests.RequestException:
        print('Request failed:', url)
        return None


# Collect the link to each month's data for every year and store them in u_list
def get_year_href(r):
    u_list = []
    soup = BeautifulSoup(r, 'lxml')
    # The div with class "tqtongji1" holds the month links
    res = soup.find('div', 'tqtongji1')
    for each in res.find_all('a'):
        u_list.append(each.get('href'))
    return u_list


# For each monthly link, scrape the data on the page it points to
def write_month_weather(url):
    r = requests.get(url).text
    soup = BeautifulSoup(r, 'lxml')
    # The div with class "tqtongji2" holds the data for each day
    weather_res = soup.find('div', class_='tqtongji2')
    # Each <ul> inside it is one row (the header row or one day)
    ul_list = weather_res.find_all('ul') if weather_res else []

    # A monthly URL ends in a file name such as 201712.html.
    # split('/') isolates that last part, and a regular expression
    # pulls out 201712, which is used to name the output file.
    date = re.compile(r'\d+')
    mo = date.search(url.split('/')[-1])
    file_name = mo.group() + '.csv'

    # Save each month's data to a CSV file that Excel can open.
    # GBK matches the encoding used when the file is read back for plotting.
    with open(file_name, 'w+', encoding='gbk') as f:
        for ul in ul_list:
            li_list = ul.find_all('li')
            try:
                for each in li_list:
                    # Running the code raised an error once it reached
                    # the data for August 2023:
                    #   TypeError: unsupported operand type(s) for +:
                    #   'NoneType' and 'str'
                    # From that month on, some cells can be None.
                    # Two ways to handle it:
                    #   (1) replace None with '空':
                    #         if each.string is None:
                    #             each.string = '空'
                    #   (2) let exception handling skip the row (used here).
                    # Cells are separated by commas
                    str_w = each.string + ','
                    f.write(str_w)
            except TypeError:
                f.write('\n')
                # continue is only valid inside a loop; it moves on to the next row
                continue
            f.write('\n')


def main():
    url = ""  # URL of the index page listing the monthly links
    r = get_response(url)
    if r is None:
        return
    u_list = get_year_href(r)
    for u in u_list:
        write_month_weather(u)
        print(u)
        time.sleep(1)  # pause between requests


if __name__ == '__main__':
    main()
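
The file-naming step and handling option (1) from the comments above can be pulled out into two small helpers and tried without touching the network. This is only a sketch under stated assumptions: `month_file_name` and `cells_to_row` are hypothetical names that do not appear in the original script, the `example.com` URL stands in for the real one, and the sample row is made up. It uses the standard-library `csv` module rather than joining strings by hand.

```python
import csv
import io
import re


def month_file_name(url):
    """Derive a CSV file name such as '201712.csv' from a monthly URL."""
    last_part = url.rstrip('/').split('/')[-1]
    # Require exactly six digits (YYYYMM) so an unexpected URL fails loudly
    match = re.search(r'\d{6}', last_part)
    if match is None:
        raise ValueError('no YYYYMM stamp found in ' + repr(url))
    return match.group() + '.csv'


def cells_to_row(cells, placeholder='空'):
    """Replace missing (None) cell values with a placeholder string."""
    return [placeholder if cell is None else cell.strip() for cell in cells]


# Hypothetical inputs, used only to exercise the helpers
sample_url = 'http://example.com/hefei/201712.html'
sample_cells = ['2017-12-01', '12', '3', None, 'north wind']

# csv.writer takes care of separators and quoting
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(cells_to_row(sample_cells))

print(month_file_name(sample_url))
print(buffer.getvalue(), end='')
```

Compared with catching the `TypeError`, substituting a placeholder keeps the rest of the row instead of cutting it off at the first missing cell, and `csv.writer` avoids the trailing comma that manual concatenation leaves on every line.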

2. Visualizing the generated data

The data is shown as a bar chart and as a line chart (using the 201607 data as an example).

# -*- coding: gbk -*-
# Source-file encoding declaration (works around an encoding problem)
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# When Chinese text is used for labels it fails to render: every Chinese
# label turns into a small box. The default font settings contain no
# Chinese font, so naming one manually fixes the problem.
plt.rcParams['font.sans-serif'] = ['SimHei']

## a = np.loadtxt(open("201607.csv", "rb"), delimiter=",")
file = pd.read_csv('201607.csv', encoding="gbk")

# Select the daily high and daily low temperature columns
df1 = file.iloc[:, 1:2]
df2 = file.iloc[:, 2:3]

# x and y values for the line chart; the column names must match
# the header row of the CSV file
x = df1.index
y1 = df1[u"最高氣溫"]
y2 = df2[u"最低氣溫"]

# A bar chart can show the temperature difference instead:
# plt.bar(x, y1, width=0.8, facecolor="blue", edgecolor="white")
# plt.bar(x, y2, width=0.8, facecolor="#9999ff", edgecolor="white")
# plt.show()

plt.title('晝夜溫差')
plt.xlabel('day (row index)')
plt.ylabel('temperature')
plt.plot(x, y1, 'r', label='最高氣溫')
plt.plot(x, y2, 'b', label='最低氣溫')
plt.legend(bbox_to_anchor=[0.3, 1])
plt.grid()
plt.show()
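
The chart is titled as the day-night temperature difference, but the plotted series are the daily high and low themselves. If the difference is what you want to plot, it can be computed first. The sketch below uses only the standard library; the function name `daily_ranges`, the column names `date`, `high`, and `low`, and the three sample rows are all made up for illustration, so the column arguments need to be changed to match the header row of a real file.

```python
import csv
import io

# Made-up sample rows; the third row has a missing high temperature
SAMPLE_CSV = (
    'date,high,low\n'
    '2016-07-01,30,24\n'
    '2016-07-02,33,25\n'
    '2016-07-03,,26\n'
)


def daily_ranges(csv_text, date_col='date', high_col='high', low_col='low'):
    """Return (date, high - low) pairs, skipping rows with missing values."""
    ranges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            high = float(row[high_col])
            low = float(row[low_col])
        except (TypeError, ValueError):
            continue  # missing or non-numeric temperature
        ranges.append((row[date_col], high - low))
    return ranges


result = daily_ranges(SAMPLE_CSV)
print(result)
```

The resulting differences could then be passed to `plt.bar` as the bar heights, giving one bar per day.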

The results are shown in the figures below:

(1) Bar chart

(2) Line chart
