Using Python to fetch content from a weather web page and store it in a database

Use Python to fetch content from a weather website and store it in a database.

For example: fetch the city, level, quality, and pm2.5 values from the page.

The page looks like this:

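Source code:

import ast
import hashlib
import os
import re

import mysql.connector
import requests
from bs4 import BeautifulSoup

# Connect to the local MySQL server
conn = mysql.connector.connect(user="root", password="your database password", host="localhost")
cursor = conn.cursor()
cursor.execute("create database if not exists pm25")  # create the pm25 database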

# Connect to the pm25 database
conn = mysql.connector.connect(user="root", password="your database password", host="localhost", database="pm25")
cursor = conn.cursor()  # create a cursor object so its methods can run SQL statements
# Run the SQL statement that creates a table with the fields no, sitename, and so on
cursor.execute("create table if not exists table_pm25 (no smallint primary key not null ,sitename char(20) not null ,level varchar(10) ,quality char(4),pm25 double)")
conn.commit()  # commit the change

url = ""  # URL of the page to scrape
htmltext = requests.get(url).text.encode("utf-8-sig")  # fetch the page source and set the encoding
md5 = hashlib.md5(htmltext).hexdigest()  # MD5 digest used to tell whether the page has been updated

if not os.path.exists("pm25.txt"):
    # No pm25.txt in the current directory yet: create it to store the MD5 digest
    old_md5 = ""  # no earlier digest, so the page counts as updated
    with open("pm25.txt", "w") as f:
        f.write(md5)  # write the digest to the text file
else:
    # pm25.txt exists: read the digest saved last time, then store the new one
    with open("pm25.txt", "r") as f:
        old_md5 = f.read()
    with open("pm25.txt", "w") as f:
        f.write(md5)

if md5 != old_md5:
    # The old and new digests differ, so the page has been updated
    print("old_md5={}".format(old_md5))
    print("new_md5={}".format(md5))
    print("Data updated, fetching.....")
    cursor.execute("delete from table_pm25")  # delete every row in the table
    conn.commit()
    sp = BeautifulSoup(htmltext, "html.parser")
    content = sp.text  # all text inside the HTML tags (the tags themselves excluded)
    # The exclusion class [^$] grabs everything between the two fixed markers
    content = re.findall(r"aqidata[^$]+欄位說明", content)
    # The page source is messy, so several more regex passes are needed to filter it
    strs = re.findall(r"", str(content))
    # What remains happens to be valid Python dict literals
    strs = re.findall(r"^]+}", str(strs))
    n = 1  # row number for each record
    for item in strs:
        # Take each record in turn and convert it to a dict
        jor = ast.literal_eval(item)
        # Print the record, then insert it into the table
        print("City: {} level:{} quality:{} pm2.5:{}".format(jor["city"], jor["level"], jor["quality"], jor["pm2_5"]))
        # Pass the values as parameters so the driver handles quoting
        sql = "insert into table_pm25 values(%s,%s,%s,%s,%s)"
        cursor.execute(sql, (n, jor["city"], jor["level"], jor["quality"], jor["pm2_5"]))
        n += 1
    conn.commit()
else:
    # The page has not changed: read back the rows stored earlier and print them
    print("Data not updated....")
    cursor.execute("select * from table_pm25")
    jor = cursor.fetchall()
    for row in jor:
        print("City: {} level:{} quality:{} pm2.5:{}".format(row[1], row[2], row[3], row[4]))

cursor.close()
conn.close()

Final output:

Contents inserted into the database:
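The stored rows can be read back with a plain select. Below is a minimal sketch of the insert-then-read-back step for readers without a MySQL server at hand. It is not the program above: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows are made up, and the first column is named row_no purely for illustration. Note that sqlite3 uses ? as its placeholder where mysql.connector uses %s.

```python
import sqlite3

# Made-up sample rows in the same column order as table_pm25 (not real measurements)
rows = [
    (1, "CityA", "1", "good", 12.0),
    (2, "CityB", "2", "fair", 48.5),
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
cursor = conn.cursor()
# Same column layout as table_pm25; row_no is a hypothetical name for the first column
cursor.execute(
    "create table if not exists table_pm25 ("
    "row_no smallint primary key not null, sitename char(20) not null, "
    "level varchar(10), quality char(4), pm25 double)"
)
# Placeholders let the driver quote the values instead of string formatting
cursor.executemany("insert into table_pm25 values (?, ?, ?, ?, ?)", rows)
conn.commit()

# Read the rows back in insertion order and print them
cursor.execute("select * from table_pm25 order by row_no")
stored = cursor.fetchall()
for row in stored:
    print("City: {} level:{} quality:{} pm2.5:{}".format(row[1], row[2], row[3], row[4]))

cursor.close()
conn.close()
```

Passing the values separately from the SQL text also avoids broken statements when a city name happens to contain a quote character.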

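To do this task I went and learned some database basics, hitting a pitfall at every step. The start is always the hardest part: at first I understood nothing, and just connecting to the database took half a day. After I found some tutorials, my thinking gradually became clearer. Personally, I feel that once you understand how to connect to the database and the basic usage of SQL statements, the exercise is half done; what remains is scraping the site and extracting the characters you want. Page structure varies from site to site. Some pages have clean, tidy source and are a joy to scrape, and others do not. This one is on the messy side, so stripping out the extra characters and keeping only the part I wanted took a large share of my time.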
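The extraction step is the part that took the most time, so here is a minimal sketch of the general idea: narrow the text between two fixed markers, pull out each {...} group, and convert it with ast.literal_eval. Everything in it is an assumption for illustration: the sample text is made up, the end marker "trailing" is hypothetical, and the brace pattern is one plausible choice, not necessarily the pattern used in the program above.

```python
import ast
import re

# Made-up text standing in for the messy page content (not real measurements)
sample = (
    "header noise aqidata = ["
    "{'city': 'CityA', 'level': '1', 'quality': 'good', 'pm2_5': 12}, "
    "{'city': 'CityB', 'level': '2', 'quality': 'fair', 'pm2_5': 48}"
    "] trailing noise"
)

# Pass 1: keep only the text between the two fixed markers
block = re.findall(r"aqidata[^$]+trailing", sample)

# Pass 2: pull out each {...} group; [^}]+ stops at the first closing brace
records = re.findall(r"\{[^}]+\}", block[0])

# Each group is a valid Python dict literal, so literal_eval can parse it safely
rows = [ast.literal_eval(record) for record in records]

for n, row in enumerate(rows, start=1):
    print(n, row["city"], row["level"], row["quality"], row["pm2_5"])
```

ast.literal_eval only accepts literals such as dicts, strings, and numbers, which makes it a safer choice than eval for text taken from a web page.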
