Scraping Data from Xueqiu (雪球網) and Storing It in a Database

2021-08-23 14:36:26 | Words: 1617 | Reads: 1284

First, a small helper class that wraps the pymysql connection and commits each write:

```python
from urllib import request
import json
import pymysql


class mysql_connect(object):
    # constructor: open the connection and get a cursor
    def __init__(self):
        self.db = pymysql.connect(host='127.0.0.1', user='root', password='yao123',
                                  port=3306, database='pachong')
        self.cursor = self.db.cursor()

    # execute a write statement and commit it
    def mysql_do(self, sql):
        self.cursor.execute(sql)
        self.db.commit()

    # destructor: release the cursor and the connection
    def __del__(self):
        self.cursor.close()
        self.db.close()
```
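The scraper below builds its SQL by string formatting, which is exactly what breaks when `description` contains quotes. A safer pattern is a parameterized query, where the driver escapes the values itself. A minimal sketch using the standard-library sqlite3 module as a stand-in for pymysql (so it runs without a MySQL server; with pymysql the placeholder would be `%s` instead of `?`, and the table layout here is assumed to mirror the `snowball` table used below):

```python
import sqlite3

# in-memory database standing in for the MySQL 'pachong' database
db = sqlite3.connect(':memory:')
cursor = db.cursor()
cursor.execute('CREATE TABLE snowball (id INTEGER, title TEXT, description TEXT, target TEXT)')

# the driver escapes the bound values, so quotes in description are harmless
row = (101, 'Example post', 'a "quoted" description', '/link/101')
cursor.execute('INSERT INTO snowball VALUES (?, ?, ?, ?)', row)
db.commit()

print(cursor.execute('SELECT description FROM snowball').fetchone()[0])
```

With placeholders, there is no need to replace `description` with `None` the way the original code does.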

The scraping function itself requests one page at a time, parses the JSON response, inserts each entry, then recurses with the last entry's id as the next `max_id`:

```python
url = ''  # Xueqiu list API URL (left blank in the source post)


# start scraping from the first page by default
def xueqiu(number=1, max_id=None, count=None):
    if max_id is None:
        full_url = url.format(-1, 10)
    else:
        full_url = url.format(max_id, count)
        count = 15
    headers = {}  # request headers (User-Agent, Cookie, ...)
    # maximum page count
    if number <= 4:
        print('Page %d:' % number)
        number += 1
        req = request.Request(full_url, headers=headers)
        response = request.urlopen(req)
        result = response.read().decode('utf-8')
        # parse the JSON response
        j = json.loads(result)
        m = mysql_connect()
        for i in j['list']:
            detail = json.loads(i['data'])
            print(i['id'], detail['title'])
            description = detail['description']
            # special characters in description would escape out of the formatted
            # SQL statement and only the first few rows would go through,
            # so None is stored in its place
            sql = 'insert into snowball values ("{}","{}","{}","{}");'.format(
                detail['id'], detail['title'], None, detail['target'])
            m.mysql_do(sql)
        print(j['list'][0])
        xueqiu(number, j['list'][-1]['id'], count)


if __name__ == '__main__':
    xueqiu(1, -1, 10)
```
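The recursive call at the end of `xueqiu` is really a pagination loop: each response's last id becomes the `max_id` of the next request, and each item's `data` field is itself a JSON string that must be decoded a second time. A self-contained sketch of that flow, with a stubbed `fetch_page` in place of the real Xueqiu request (the field names mirror the ones above, but the pages themselves are made up):

```python
import json

# stand-in for the HTTP request: fake "pages" keyed by max_id
FAKE_PAGES = {
    -1: {'list': [{'id': 3, 'data': json.dumps({'title': 'post 3'})},
                  {'id': 2, 'data': json.dumps({'title': 'post 2'})}]},
    2:  {'list': [{'id': 1, 'data': json.dumps({'title': 'post 1'})}]},
    1:  {'list': []},  # an empty page ends the crawl
}


def fetch_page(max_id):
    return FAKE_PAGES[max_id]


def crawl(max_id=-1, max_pages=4):
    titles = []
    for _ in range(max_pages):                 # page cap, like number <= 4 above
        page = fetch_page(max_id)
        if not page['list']:
            break
        for item in page['list']:
            detail = json.loads(item['data'])  # 'data' is a nested JSON string
            titles.append(detail['title'])
        max_id = page['list'][-1]['id']        # last id drives the next request
    return titles


print(crawl())  # → ['post 3', 'post 2', 'post 1']
```

Writing the loop iteratively instead of recursively also avoids hitting Python's recursion limit if the page cap is ever raised.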
