Crawling Sina Weibo

What I learned.

1 Get in the habit of using logger instead of print

self.logger.debug('Start parsing {}'.format(response.url))
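In Scrapy, `self.logger` is an ordinary Python logger named after the spider, so the same habit works in plain scripts. Below is a minimal standard-library sketch of what a logger offers over print: a level and a name on every line, and the ability to silence debug output without deleting the calls. The logger name `weibo_demo`, the URL, and the in-memory stream are illustrative assumptions, not part of the original spider.

```python
import io
import logging

# An in-memory stream keeps the demo inspectable; in practice you would log
# to stderr or a file instead.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(name)s %(levelname)s: %(message)s'))

logger = logging.getLogger('weibo_demo')  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

url = 'https://example.com/page/1'  # placeholder URL
logger.debug('Start parsing %s', url)

# Raising the level hides debug lines without touching the call sites.
logger.setLevel(logging.INFO)
logger.debug('This line is filtered out')

output = stream.getvalue()
print(output, end='')
```

With print, removing the noise means editing or deleting every call; with a logger it is a one-line level change.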

2 Get in the habit of using regular expressions

This came up when cleaning data in the pipeline.

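import re
import time

s = '5分鐘前'  # "5 minutes ago"
if re.match(r'\d+分鐘前', s):
    # Pull out the number of minutes and subtract it from the current time.
    minute = re.match(r'(\d+)', s).group(1)
    timestamp = time.strftime('%Y-%m-%d %H:%M', time.localtime(time.time() - float(minute) * 60))
    print(timestamp)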
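The same idea can be wrapped in a small helper so the pipeline handles more than one relative format. This is only a sketch: the function name `normalize_time`, the "hours ago" pattern, and the choice to accept both Traditional and Simplified characters are my assumptions; the original note only covers "N minutes ago".

```python
import re
import time


def normalize_time(text, now=None):
    """Convert a relative timestamp such as '5分鐘前' to 'YYYY-MM-DD HH:MM'.

    Strings that match no known pattern are returned unchanged.
    """
    now = time.time() if now is None else now
    # (pattern, seconds per unit); the character classes accept both the
    # Traditional and the Simplified spelling.
    patterns = [
        (r'(\d+)分[鐘钟]前', 60),    # "N minutes ago"
        (r'(\d+)小[時时]前', 3600),  # "N hours ago" (assumed format)
    ]
    cleaned = text.strip()
    for pattern, seconds in patterns:
        match = re.fullmatch(pattern, cleaned)
        if match:
            moment = now - int(match.group(1)) * seconds
            return time.strftime('%Y-%m-%d %H:%M', time.localtime(moment))
    return text


# A fixed reference time makes the result reproducible.
reference = time.mktime(time.strptime('2019-06-01 12:00', '%Y-%m-%d %H:%M'))
print(normalize_time('5分鐘前', now=reference))
```

Passing `now` explicitly is a small design choice that makes the helper easy to test; in the pipeline it can simply be left out.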

3 The time module: I had almost completely forgotten it
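As a refresher, here is a short sketch of the round trip between text, `struct_time`, and epoch seconds using only the standard `time` module. The sample date is arbitrary.

```python
import time

FMT = '%Y-%m-%d %H:%M'

# Text -> struct_time -> seconds since the epoch (interpreted as local time).
parsed = time.strptime('2019-06-01 12:00', FMT)
epoch_seconds = time.mktime(parsed)

# Arithmetic is done on the float; here we step back five minutes.
earlier = epoch_seconds - 5 * 60

# Seconds -> struct_time -> text.
formatted = time.strftime(FMT, time.localtime(earlier))
print(formatted)
```

Note that the format codes are case-sensitive: `%Y` is the four-digit year and `%y` the two-digit year, `%H` is the hour, and `%m` is the month while `%M` is the minute.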

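4 A neat use of eval: I didn't know it could be used like this!

The commented-out assignments are the clumsy way to do it. With more fields they would look even clumsier.

Lesson learned.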

def parse_detail(self, response):
    self.logger.debug('Start parsing {}'.format(response.url))
    item = yqtem()
    title = response.css('body > div.wrap > div.mainbox > div.main2 > div.left > div.title > strong > a::text').extract_first()
    author = response.css('body > div.wrap > div.mainbox > div.main2 > div.right > div.autherinfo > div.au_name > p:nth-child(2) > a::text').extract_first()
    popularity = response.css('body > div.wrap > div.mainbox > div.main2 > div.left > div.num > table > tbody > tr > td:nth-child(2)::text').extract_first()
    count = response.css('body > div.wrap > div.mainbox > div.main2 > div.left > div.num > table > tbody > tr > td:nth-child(4)::text').extract_first()
    # The clumsy way: one assignment per field.
    # item['title'] = title
    # item['author'] = author
    # item['popularity'] = popularity
    # item['count'] = count
    # Each local variable has the same name as an item field,
    # so eval(field) returns the value of the matching variable.
    for field in item.fields:
        item[field] = eval(field)
    yield item
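To see why the loop works outside of Scrapy, here is a self-contained sketch in which a plain dict stands in for the Item. The function names and sample values are made up for illustration. The second function shows an alternative that I believe gives the same result without eval, by looking the names up in `locals()`.

```python
def build_item(fields, title, author, popularity, count):
    """Fill a dict from local variables whose names match the field names."""
    item = {}
    for field in fields:
        # eval(field) evaluates the field name as an expression, which
        # reads the local variable of the same name.
        item[field] = eval(field)
    return item


def build_item_with_locals(fields, title, author, popularity, count):
    """Same result without eval: look the names up in a locals() snapshot."""
    scope = locals()
    return {field: scope[field] for field in fields}


fields = ['title', 'author', 'popularity', 'count']
item = build_item(fields, 'Some title', 'Some author', '1200', '35')
print(item)
```

The trick depends entirely on the variable names matching the field names exactly. Since eval runs whatever string it is given, it is only comfortable when the field names come from your own Item definition, as they do here.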

5 How to use FormRequest

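from scrapy import FormRequest

# Inside a spider callback. The form fields assigned to data are missing from the source.
data =
yield FormRequest(url, callback=self.parse_index, formdata=data)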
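As far as I understand it, FormRequest url-encodes the `formdata` dict into the request body and sends it as a POST unless another method is given. The sketch below mimics that with the standard library only, without sending anything, to show what ends up on the wire. The URL and the form fields are hypothetical, since the original `data` value is not available.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = 'https://example.com/search'        # placeholder URL
data = {'keyword': 'weibo', 'page': '1'}  # hypothetical form fields

# Form fields are serialized as key=value pairs joined by "&".
body = urlencode(data).encode('utf-8')

# Build (but do not send) the equivalent POST request.
request = Request(
    url,
    data=body,
    headers={'Content-Type': 'application/x-www-form-urlencoded'},
    method='POST',
)
print(request.get_method(), request.data)
```

One practical consequence: keep the values in `formdata` as strings (for example `'1'` rather than `1`), since they are encoded directly into the body.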
