Practice: filtering scraped web page content with a regex and storing it in a database

import pymysql
import re
import time


class catchstockindexdata(object):
    def __init__(self, fund_data, index_name, index_code):
        self.id = 0
        self.index_name = index_name
        self.index_code = index_code
        self.fund_data = fund_data

    def re_handle_funddata(self):
        # Capture every run of text between ">" and "<" that contains no
        # full-width colon and no whitespace: the values, not the labels.
        re_template = r">([^：\s]+)<"
        ret = re.findall(re_template, self.fund_data)
        return ret

    def combine_fund_datalist(self, catched_data):
        # Row layout: id, index code, index name, captured values, capture time
        name_code_list = [self.id, self.index_code, self.index_name]
        catch_time = [str(time.ctime())]
        fund_data_list = name_code_list + catched_data + catch_time
        return fund_data_list

    def thread_data_to_mysql(self, combined_data_list):
        conn = pymysql.connect(host='localhost', port=3306, user='root',
                               password='mysql', database='stock_info',
                               charset='utf8')
        cc = conn.cursor()
        # Nine placeholders: the combined list must hold exactly nine items
        sql = """insert into stock_information values(%s,%s,%s,%s,%s,%s,%s,%s,%s)"""
        cc.execute(sql, combined_data_list)
        conn.commit()
        print("Data inserted successfully")
        cc.close()
        conn.close()

    def run(self):
        # 1. Process the dynamic data from the web page
        catched_data = self.re_handle_funddata()
        # 2. Build the combined list of index data
        combined_data_list = self.combine_fund_datalist(catched_data)
        # 3. Pass the combined list to the database
        self.thread_data_to_mysql(combined_data_list)


if __name__ == "__main__":
    # NOTE: the HTML tags of this sample were lost when the page was extracted.
    # The regex above only matches values that sit between ">" and "<".
    fund_data = """今開：
3134.75
最高：3138.46
漲跌幅：
-0.43%
換手：0.47%
成交量：
1.65億手
"""
    eastmoney_shangzhen50 = catchstockindexdata(fund_data, index_name="上證50", index_code="000016")
    eastmoney_shangzhency = catchstockindexdata(fund_data, index_name="創業板指數", index_code="399006")
    eastmoney_shangzhen300 = catchstockindexdata(fund_data, index_name="滬深300", index_code="000300")
    eastmoney_shangzhena = catchstockindexdata(fund_data, index_name="a股指數", index_code="000002")

    eastmoney_shangzhena.run()
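
To show what the regex step is meant to capture, here is a minimal sketch run against made-up markup. The `<td>`/`<span>` layout and the name `extract_values` are assumptions for illustration only, since the real page structure is not shown in this post. The pattern is also a slightly stricter variant than the one in the class: it additionally excludes `<` and `>`, because the original character class can match across tags when the page source has no whitespace between them.

```python
import re

# Hypothetical markup: the real tags are not shown in this post, so this
# <td>/<span> layout is only an assumption for illustration.
sample_html = (
    "<td>今開：<span>3134.75</span></td>"
    "<td>最高：<span>3138.46</span></td>"
    "<td>漲跌幅：<span>-0.43%</span></td>"
    "<td>換手：<span>0.47%</span></td>"
    "<td>成交量：<span>1.65億手</span></td>"
)


def extract_values(html):
    """Return text nodes that contain no full-width colon, whitespace or tag characters."""
    # Labels such as "今開：" are skipped because they contain "：".
    # Excluding "<" and ">" keeps a capture from running across tags.
    return re.findall(r">([^：\s<>]+)<", html)


print(extract_values(sample_html))
# → ['3134.75', '3138.46', '-0.43%', '0.47%', '1.65億手']
```

Five captured values plus the three leading fields and the timestamp give the nine items that the insert statement expects.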

Note: the page content was copied in by hand, which is pretty low-tech (≧▽≦)/
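
One possible way to avoid the manual copy is to download the page with the standard library. This is only a sketch: the helper name `fetch_html`, the `User-Agent` value and the default encoding are assumptions, and the page may well load its figures dynamically, in which case the downloaded HTML would not contain them.

```python
import urllib.request


def fetch_html(url, encoding="utf-8", timeout=10):
    """Download a page and return its body as text.

    The encoding is an assumption; check what the target page actually declares.
    """
    # Some sites reject the default urllib User-Agent, so send a browser-like one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode(encoding, errors="replace")


# Usage (the URL is a placeholder, not the page used in this post):
# fund_data = fetch_html("https://example.com/index-page")
```

The returned string could then be passed as `fund_data` when creating a `catchstockindexdata` instance.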

