Web crawling with coroutines

from urllib import request
import gevent, time
# Without this import and the patch_all() call below, the gevent run is in theory
# no faster than the plain loop, because gevent cannot detect the blocking I/O in urllib
from gevent import monkey
monkey.patch_all()

def f(url):
    print('get:%s' % url)
    resp = request.urlopen(url)
    data = resp.read()
    print('%d bytes received from %s' % (len(data), url))

# Crawl with a plain loop, i.e. serially
urls = ['','']
start_time = time.time()
for url in urls:
    f(url)
print('the total time is {time}'.format(time=time.time() - start_time))

# Crawl with coroutines
async_time = time.time()
gevent.joinall([gevent.spawn(f,''),
                gevent.spawn(f,''),
                ])
print('the asynchronous total time is {time}'.format(time=time.time() - async_time))

The output is as follows:

get:

48835 bytes received from

get:

498399 bytes received from

the total time is 12.665598630905151

get:

48835 bytes received from

498546 bytes received from

the asynchronous total time is 5.80000114440918
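
The gap between the two timings above comes from overlapping the waits: the serial loop pays for each download in turn, while the coroutine version starts all requests and waits for them together. The sketch below reproduces that effect without any network access. Note that it is not the gevent code from this post: it swaps in the standard-library asyncio, uses asyncio.sleep as a stand-in for download latency, and the page names and delays are made up for illustration.

```python
import asyncio
import time

# Hypothetical page names and delays (seconds); the sleep stands in for network latency.
DELAYS = {'page-a': 0.2, 'page-b': 0.3}

async def fake_fetch(name, delay):
    """Pretend to download a page; awaiting the sleep lets other coroutines run."""
    print('get:%s' % name)
    await asyncio.sleep(delay)
    print('done:%s' % name)
    return name

async def run_serial():
    # One request at a time: total time is roughly the sum of the delays.
    start = time.perf_counter()
    for name, delay in DELAYS.items():
        await fake_fetch(name, delay)
    return time.perf_counter() - start

async def run_concurrent():
    # All requests in flight at once: total time is roughly the longest delay.
    start = time.perf_counter()
    await asyncio.gather(*(fake_fetch(n, d) for n, d in DELAYS.items()))
    return time.perf_counter() - start

serial_time = asyncio.run(run_serial())
concurrent_time = asyncio.run(run_concurrent())
print('serial: %.2fs, concurrent: %.2fs' % (serial_time, concurrent_time))
```

In the concurrent run both "get:" lines are printed before either "done:" line, which is the same interleaving that the coroutine half of the output above shows.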
