scrapy-splash: Scraping Dynamic Data, Example 4

This example uses scrapy-splash to scrape news items from 微眾圈 (Weizhongquan) that match a set of given keywords.
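A SplashRequest is only rendered by Splash if the Scrapy project knows where Splash runs and has the scrapy-splash middlewares enabled. The original post does not show its settings. The snippet below is a sketch of the configuration documented in the scrapy-splash README, assuming a Splash instance reachable at localhost port 8050.

```python
# settings.py: scrapy-splash wiring as documented in the scrapy-splash README.
# Assumption: a Splash instance is listening on localhost:8050.
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Make duplicate filtering and the HTTP cache aware of Splash arguments.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```

A matching local instance can be started with `docker run -p 8050:8050 scrapinghub/splash`.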

Keywords: 打通 ("connect/integrate"); 融合 ("convergence"); 電視 ("TV")
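The spider's start_requests recovers each keyword with `url.rfind('=')`, so every start URL must end with its keyword as the last query value. Below is a minimal sketch of building such URLs from the three keywords and reading them back. The search endpoint is a hypothetical placeholder, because the source does not give 微眾圈's search URL.

```python
from urllib.parse import quote, unquote

# Hypothetical search endpoint: the real 微眾圈 search URL is not given in the source.
SEARCH_URL = 'https://example.com/search?keyword='

KEYWORDS = ['打通', '融合', '電視']


def build_start_urls(keywords):
    """Append each percent-encoded keyword as the final query value."""
    return [SEARCH_URL + quote(k) for k in keywords]


def keyword_from_url(url):
    """Recover the keyword like start_requests does: take the text after the last '='."""
    index = url.rfind('=')
    return unquote(url[index + 1:])


start_urls = build_start_urls(KEYWORDS)
for url in start_urls:
    print(url, keyword_from_url(url))
```

Percent-encoding keeps the URLs ASCII-safe. The matching `unquote()` matters because parse later searches for the keyword inside Chinese titles, and an encoded keyword would never match.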

The following fields are scraped:

1. Title

2. Link

3. Date

4. Source

Each of these fields is extracted as follows:

1. First, grab the list of news entries

Code: sels = site.xpath('//li[@class="itemtitle"]')

2. Grab the title

Code: titles = sel.xpath('.//text()')

title = str(titles[1].extract())

3. Grab the link

Code: url = '' + str(sel.xpath('.//a/@href')[0].extract())

4. Grab the date

Code: flag, date = self.date_isvalid(str(titles[2].extract()))

5. Grab the source (from the article's detail page)

Code: sources = site.xpath('//div[@class="wxshare"]/span/a/text()')

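import re
import time

import scrapy
from scrapy.selector import Selector
from scrapy_splash import SplashRequest


# Assumed item definition (normally in items.py; the original does not show it).
# Field names are taken from parse_item below.
class SplashTestItem(scrapy.Item):
    title = scrapy.Field()
    url = scrapy.Field()
    date = scrapy.Field()
    keyword = scrapy.Field()
    source = scrapy.Field()


class WeizhongquanSpider(scrapy.Spider):
    # Hypothetical class and spider name: the original shows only the methods.
    name = 'weizhongquan'
    # Not shown in the original: one search URL per keyword, each ending in '=<keyword>'.
    start_urls = []

    # Requests must be wrapped in SplashRequest so that Splash renders the page's JavaScript.
    def start_requests(self):
        for url in self.start_urls:
            index = url.rfind('=')
            yield SplashRequest(url, self.parse,
                                args={'wait': 1},  # assumption: the original args dict was lost
                                meta={'keyword': url[index + 1:]})  # keyword = text after the last '='

    def compare_to_days(self, leftdate, rightdate):
        '''Compare two date strings: how many days leftdate is later than rightdate.

        :param leftdate: format 2017-04-15
        :param rightdate: format 2017-04-15
        :return: number of days
        '''
        l_time = time.mktime(time.strptime(leftdate, '%Y-%m-%d'))
        r_time = time.mktime(time.strptime(rightdate, '%Y-%m-%d'))
        # round() absorbs the 23- or 25-hour days around daylight-saving switches
        return round((l_time - r_time) / 86400)

    def date_isvalid(self, strdatetext):
        '''Check whether a date string is valid, i.e. falls within the allowed range of the
        current time. This implementation only accepts dates equal to today.

        :param strdatetext: four formats: '2小時前' (2 hours ago); '2天前' (2 days ago);
                            '昨天' (yesterday); '2017.2.12 '
        :return: (True, current date) if valid; (False, '') otherwise
        '''
        currentdate = time.strftime('%Y-%m-%d')
        # Only yyyy-m-d dates are matched; the other formats fall through to (False, '').
        datepattern = re.compile(r'\d{4}-\d{1,2}-\d{1,2}')
        strdate = re.findall(datepattern, strdatetext)
        if len(strdate) == 1:
            if self.compare_to_days(currentdate, strdate[0]) == 0:
                return True, currentdate
        return False, ''

    def parse(self, response):
        site = Selector(response)
        keyword = response.meta['keyword']
        sels = site.xpath('//li[@class="itemtitle"]')
        for sel in sels:
            titles = sel.xpath('.//text()')
            title = str(titles[1].extract())
            flag, date = self.date_isvalid(str(titles[2].extract()))
            if flag and keyword in title:
                # The prefix string is empty in the source; for relative hrefs,
                # response.urljoin(href) would build an absolute URL instead.
                url = '' + str(sel.xpath('.//a/@href')[0].extract())
                yield SplashRequest(url, self.parse_item,
                                    args={'wait': 1},  # assumption, as above
                                    meta={'title': title, 'url': url,
                                          'date': date, 'keyword': keyword})

    def parse_item(self, response):
        site = Selector(response)
        it = SplashTestItem()
        it['title'] = response.meta['title']
        it['url'] = response.meta['url']
        it['date'] = response.meta['date']
        it['keyword'] = response.meta['keyword']
        sources = site.xpath('//div[@class="wxshare"]/span/a/text()')
        if len(sources) > 0:
            it['source'] = sources[0].extract()
        return it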
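The date_isvalid docstring lists four input formats, but its regex only matches dash-separated dates. As a result, '2小時前', '2天前', '昨天' and the dotted '2017.2.12' all come back invalid. The following stdlib-only sketch shows one way to cover all four formats, and it swaps the day-comparison helper's mktime arithmetic for `datetime.date` subtraction. The `now` and `max_days` parameters are hypothetical additions that make the window configurable and the function testable. Unlike the original, this version returns the article's own date rather than today's.

```python
import re
from datetime import date, datetime, timedelta


def normalize_date(text, now):
    """Turn one of the four listed formats into a datetime.date, or None if unrecognised."""
    text = text.strip()
    m = re.fullmatch(r'(\d+)小時前', text)  # "N hours ago"
    if m:
        return (now - timedelta(hours=int(m.group(1)))).date()
    m = re.fullmatch(r'(\d+)天前', text)  # "N days ago"
    if m:
        return (now - timedelta(days=int(m.group(1)))).date()
    if text == '昨天':  # "yesterday"
        return (now - timedelta(days=1)).date()
    m = re.fullmatch(r'(\d{4})[.-](\d{1,2})[.-](\d{1,2})', text)  # "2017.2.12" or "2017-04-15"
    if m:
        try:
            return date(*map(int, m.groups()))
        except ValueError:  # e.g. month 13
            return None
    return None


def days_between(left, right):
    """How many days `left` is later than `right` (datetime stand-in for the mktime helper)."""
    return (left - right).days


def is_recent(text, now=None, max_days=0):
    """Return (True, 'YYYY-MM-DD') if the date is at most max_days before `now`, else (False, '')."""
    now = now or datetime.now()
    d = normalize_date(text, now)
    if d is None or days_between(now.date(), d) > max_days:
        return False, ''
    return True, d.isoformat()


now = datetime(2017, 4, 15, 12, 0)
print(is_recent('2小時前', now))  # (True, '2017-04-15')
print(is_recent('昨天', now))     # (False, '')
```

With `max_days=0` this keeps the original "today only" rule. Raising it widens the window without touching the parsing code.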
