Python coroutine crawler: scraping the Douban Movie Top 250

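from lxml import etree  # HTML parsing library
from time import time  # wall-clock timing
import asyncio  # coroutine library
import aiohttp  # asynchronous HTTP requests

# The URL and the header values are blank in the source; fill them in before running.
url = ""
headers = {}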

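# Fetches the HTML of a page
async def fetch_content(url):
    await asyncio.sleep(2)  # wait 2 seconds so requests are not sent too quickly
    # The request code is missing from the source; this aiohttp call is an assumed reconstruction.
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url) as response:
            return await response.text()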

# Parses the HTML pages
async def parse(url):
    page = await fetch_content(url)
    html = etree.HTML(page)

    xpath_movie = '//*[@id="content"]/div/div[1]/ol/li'
    xpath_title = './/span[@class="title"]'
    xpath_pages = '//*[@id="content"]/div/div[1]/div[2]/a'

    pages = html.xpath(xpath_pages)  # links to the other pages come from the pager at the bottom
    fetch_list = []  # complete URLs of the remaining pages
    result = []  # movie elements collected from every page

    for element_movie in html.xpath(xpath_movie):
        result.append(element_movie)

    for p in pages:
        # join each pager link's href onto the base URL to get the full page URL
        fetch_list.append(url + p.get("href"))

    # fetch all remaining pages concurrently
    tasks = [fetch_content(url) for url in fetch_list]
    pages = await asyncio.gather(*tasks)

    for page in pages:
        html = etree.HTML(page)
        for element_movie in html.xpath(xpath_movie):
            result.append(element_movie)

    for i, movie in enumerate(result, 1):
        title = movie.find(xpath_title).text
        print(i, title)

async def main():
    start = time()
    await parse(url)
    end = time()
    # main() runs once, so report the elapsed time directly
    print("cost {} seconds".format(end - start))

if __name__ == "__main__":
    asyncio.run(main())
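
The speed-up in the script above comes from handing all the pagination requests to asyncio.gather at once instead of awaiting them one by one. The following is a minimal sketch of that idea that runs without any network access. The names fake_fetch, fetch_sequential, fetch_concurrent and the limit parameter are hypothetical and not part of the original script; fake_fetch simply sleeps to stand in for a request, and the semaphore is an optional addition that caps how many requests are in flight, which can be gentler on the target site than a fixed delay alone.

```python
import asyncio
from time import perf_counter


async def fake_fetch(page_no, delay=0.1):
    """Hypothetical stand-in for a network request: waits, then returns a fake body."""
    await asyncio.sleep(delay)
    return "page {}".format(page_no)


async def fetch_sequential(page_numbers):
    # Await each request in turn: total time is roughly len(page_numbers) * delay.
    return [await fake_fetch(n) for n in page_numbers]


async def fetch_concurrent(page_numbers, limit=3):
    # The semaphore caps how many simulated requests run at the same time.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(n):
        async with semaphore:
            return await fake_fetch(n)

    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(bounded(n) for n in page_numbers))


async def timed(coro):
    # Await a coroutine and return its result together with the elapsed seconds.
    start = perf_counter()
    result = await coro
    return result, perf_counter() - start


async def demo():
    pages = list(range(1, 7))
    seq_result, seq_time = await timed(fetch_sequential(pages))
    con_result, con_time = await timed(fetch_concurrent(pages))
    return seq_result, seq_time, con_result, con_time


if __name__ == "__main__":
    seq_result, seq_time, con_result, con_time = asyncio.run(demo())
    print("sequential: {:.2f}s, concurrent: {:.2f}s".format(seq_time, con_time))
```

Both variants return the same pages in the same order; only the waiting overlaps in the concurrent one. In a real crawler, fake_fetch would be replaced by the actual request coroutine.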

