Python multithreaded crawler

First, for the record, here is an ordinary single-threaded Qiushibaike crawler:

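import re
import urllib.error
import urllib.request

# Send a browser-like User-Agent with every request made through urlopen.
headers = ('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0')
opener = urllib.request.build_opener()
opener.addheaders = [headers]
urllib.request.install_opener(opener)

for i in range(1, 3):
    # The listing URL prefix is blank here; the page number is appended to it.
    url = '' + str(i)
    try:
        pagedata = urllib.request.urlopen(url).read().decode('utf-8', 'ignore')
    except urllib.error.URLError as e:
        print('Page ' + str(i) + ' failed: ' + str(e))
        continue
    # The HTML tags that delimit the capture group are not shown in this pattern.
    pat = '.*?(.*?).*?'
    # re.S makes "." match newlines, so one joke can span several lines.
    datalist = re.compile(pat, re.S).findall(pagedata)
    for j in range(0, len(datalist)):
        print('Page ' + str(i) + ', joke ' + str(j) + ':')
        print(datalist[j])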
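
The extraction step depends on a non-greedy pattern compiled with re.S. The following is a minimal sketch of that step on its own. Both the HTML fragment and the tag names in the pattern are assumptions made for illustration, since the real page markup is not given above.

```python
import re

# Hypothetical HTML fragment; the real page markup is an assumption.
sample_html = """
<div class="content">
<span>first joke
spans two lines</span>
</div>
<div class="content">
<span>second joke</span>
</div>
"""

# Hypothetical pattern: capture the text inside each <span> of a content div.
pat = r'<div class="content">.*?<span>(.*?)</span>.*?</div>'

# re.S lets "." match newlines, so a capture can span several lines.
jokes = re.compile(pat, re.S).findall(sample_html)

for n, joke in enumerate(jokes):
    print('joke ' + str(n) + ':', joke.strip())
```

Without re.S the same pattern finds nothing in this fragment, because every match has to cross at least one line break.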

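With multithreading, separately instantiated crawler classes can run at the same time. Below, one thread handles the odd-numbered pages and the other handles the even-numbered pages.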

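import re
import threading
import urllib.error
import urllib.request

# Send a browser-like User-Agent with every request made through urlopen.
headers = ('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0')
opener = urllib.request.build_opener()
opener.addheaders = [headers]
urllib.request.install_opener(opener)


class One(threading.Thread):
    """Crawls the odd-numbered pages: 1, 3, ..., 35."""

    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        for i in range(1, 36, 2):
            # The listing URL prefix is blank here; the page number is appended to it.
            url = '' + str(i)
            try:
                pagedata = urllib.request.urlopen(url).read().decode('utf-8', 'ignore')
            except urllib.error.URLError as e:
                print('Page ' + str(i) + ' failed: ' + str(e))
                continue
            # The HTML tags that delimit the capture group are not shown in this pattern.
            pat = '.*?(.*?).*?'
            datalist = re.compile(pat, re.S).findall(pagedata)
            for j in range(0, len(datalist)):
                print('Page ' + str(i) + ', joke ' + str(j) + ':')
                print(datalist[j])


class Two(threading.Thread):
    """Crawls the even-numbered pages: 0, 2, ..., 34."""

    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        for i in range(0, 36, 2):
            url = '' + str(i)
            try:
                pagedata = urllib.request.urlopen(url).read().decode('utf-8', 'ignore')
            except urllib.error.URLError as e:
                print('Page ' + str(i) + ' failed: ' + str(e))
                continue
            pat = '.*?(.*?).*?'
            datalist = re.compile(pat, re.S).findall(pagedata)
            for j in range(0, len(datalist)):
                print('Page ' + str(i) + ', joke ' + str(j) + ':')
                print(datalist[j])


# Use instance names that do not shadow the class names.
thread_one = One()
thread_one.start()
thread_two = Two()
thread_two.start()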
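
The two thread classes above differ only in their starting page, so the same odd/even split can be written with a single parameterised class. This is a rough sketch, not the code from this post: PageWorker, fake_fetch, and the shared results dictionary are hypothetical names, and the fetch function is stubbed out so the example runs without network access.

```python
import threading


class PageWorker(threading.Thread):
    """Handles every `step`-th page from `first_page` (hypothetical helper)."""

    def __init__(self, first_page, stop, step, fetch, results, lock):
        super().__init__()
        self.pages = range(first_page, stop, step)
        self.fetch = fetch
        self.results = results
        self.lock = lock

    def run(self):
        for page in self.pages:
            data = self.fetch(page)
            # Guard the shared dictionary while writing to it.
            with self.lock:
                self.results[page] = data


def fake_fetch(page):
    # Stand-in for urllib.request.urlopen(...).read(); assumed for the demo.
    return 'content of page ' + str(page)


results = {}
lock = threading.Lock()
workers = [
    PageWorker(1, 36, 2, fake_fetch, results, lock),  # odd pages
    PageWorker(0, 36, 2, fake_fetch, results, lock),  # even pages
]
for w in workers:
    w.start()
for w in workers:
    w.join()  # wait until both threads have finished

print(len(results), 'pages collected')
```

Calling join() on each thread makes the main program wait for both workers, which matters once the results are used after the crawl rather than printed inside run().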
