Python crawler practice project: scraping asynchronously loaded content

Project code

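from bs4 import BeautifulSoup
import requests

# Prefix of the paginated URL; it is empty in the source and must be set before running
url_prefix = ''
# Collects one dict per scraped item
infos = []

# Fetch and parse a single page
def getapage(url, data=None):
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    # print(soup)
    images = soup.select('header > a > img')
    titles = soup.select('section > h4 > a')
    links = soup.select('a.cover-inner')
    likes = soup.select('span.fanciers_count')
    if data is None:
        for image, title, link, like in zip(images, titles, links, likes):
            # The dict body is missing in the source. The keys follow
            # getinfosbylikes below; how each value is extracted is an assumption.
            data = {
                'image': image.get('src'),
                'title': title.get_text(),
                'link': link.get('href'),
                'like': int(like.get_text()),  # assumes the count is a plain integer
            }
            infos.append(data)
            print(data)

# Fetch several of the pages that the site loads on demand
def getmorepages(start, end):
    for url_suffix in range(start, end):
        getapage(url_prefix + str(url_suffix))
        print('---------------Fetched {} records so far---------------'.format(len(infos)))

# Print the items with the most likes
def getinfosbylikes(order, infos=infos):
    infos = sorted(infos, key=lambda info: info['like'], reverse=True)
    for info in infos[:order]:
        print(info['like'], info['title'], info['image'], info['link'])

getmorepages(1, 4)
getinfosbylikes(5)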
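
The ranking step at the end of the script can be tried without any network access. The following is a minimal standalone sketch of that idea, not the original author's code: the helpers `parse_like_count` and `top_by_likes` and the sample records are hypothetical, and it assumes like counts may arrive as strings containing spaces or thousands separators.

```python
def parse_like_count(text):
    """Convert a scraped like count such as ' 1,024 ' to an int (0 if it has no digits)."""
    digits = ''.join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else 0


def top_by_likes(records, n):
    """Return the n records with the highest like count, highest first."""
    return sorted(
        records,
        key=lambda record: parse_like_count(str(record['like'])),
        reverse=True,
    )[:n]


# Hypothetical sample data standing in for scraped results
sample = [
    {'title': 'A', 'like': '12'},
    {'title': 'B', 'like': '1,024'},
    {'title': 'C', 'like': ' 7 '},
]

for record in top_by_likes(sample, 2):
    print(record['like'], record['title'])
```

Returning the sorted slice instead of printing inside the function makes the ranking easy to check separately from the scraping. Note that this simple parser only keeps digits, so an abbreviated count such as "1.2k" would need extra handling.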

Project features:

Synchronous loading, asynchronous loading, lazy loading
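
With asynchronous loading, the browser requests further pages of a listing in the background, so a crawler can often request those pages directly by number, which is what `getmorepages` does with its numeric suffix. The sketch below shows one way to structure that loop so it stops when a page comes back empty. Everything named here is an assumption for illustration: `crawl_pages`, the `fetch_items` callable, and the example.com URLs are hypothetical, and a fake fetcher replaces the real HTTP request so the sketch runs offline.

```python
def crawl_pages(fetch_items, url_prefix, start, max_pages):
    """Request consecutive pages until one is empty or max_pages have been tried."""
    collected = []
    for page in range(start, start + max_pages):
        items = fetch_items(url_prefix + str(page))
        if not items:
            # An empty page is taken to mean there is nothing more to load
            break
        collected.extend(items)
    return collected


# Hypothetical stand-in for a real fetch-and-parse step
FAKE_SITE = {
    'https://example.com/things?page=1': ['item-1', 'item-2'],
    'https://example.com/things?page=2': ['item-3'],
}


def fake_fetch(url):
    """Return the items for a known URL, or an empty list for any other page."""
    return FAKE_SITE.get(url, [])


results = crawl_pages(fake_fetch, 'https://example.com/things?page=', 1, 10)
print(results)
```

Passing the fetch function as a parameter keeps the paging logic independent of `requests`, so the same loop could be driven by a real fetcher such as `getapage` once that returns its parsed items.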

Crawled site:

knewone

