Building a Lightweight Framework for Collecting Basic Company Information (Part 4)

Without further ado, let's go straight to the code.

import random
import time

import requests

# get_header(), valid_status_code and logger are defined elsewhere in the framework.


def req_data(url, data, page, keyword, timeout=10, proxies=None):
    """Request a page from ** for collection.

    :param data: request parameters
    :param url: url
    :param page: page number
    :return: html
    """
    count = 1
    while count < 6:
        try:
            req = requests.post(url + str(page), headers=get_header(),
                                timeout=timeout, data=data, proxies=proxies)
            if req.status_code in valid_status_code and len(req.text) > 100:
                return req.text
            # logger.error("Retrying request, attempt " + str(count))
        except Exception:
            pass  # treat any request error as a failed attempt
        count += 1
        time.sleep(random.randint(1, 2))
    # All 5 attempts failed.
    logger.debug(keyword + " page %s collection failed" % str(page))
    return None

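A brief explanation. We define a function, req_data, that mainly takes four parameters; url, data and page are described in the docstring above, and keyword is only used in the failure log message. The ** part makes up to 5 attempts for each request: if the returned status code is not 200 or 201, or the response body is not longer than 100 characters, we retry. If the request still fails after 5 attempts, the function returns None. If the response is valid, it is handed to the parser, which parses the returned content. The logic is simple.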
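The retry rule described above can be isolated from the HTTP call itself. The following is only a minimal sketch under stated assumptions, not the framework's own code: `fetch_with_retry`, `is_valid` and `fake_send` are hypothetical names, and `send` stands in for whatever function actually performs the request and returns a `(status, body)` pair.

```python
import time
from typing import Callable, Optional, Tuple

VALID_STATUS_CODES = (200, 201)  # the two status codes named in the text
MIN_BODY_LENGTH = 100            # the body must be longer than this


def is_valid(status: int, body: str) -> bool:
    """A response counts as valid only if both checks pass."""
    return status in VALID_STATUS_CODES and len(body) > MIN_BODY_LENGTH


def fetch_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    delay: float = 0.0,
) -> Optional[str]:
    """Call send() up to max_attempts times; return the first valid body, else None."""
    for attempt in range(1, max_attempts + 1):
        try:
            status, body = send()
        except Exception:
            # Any error raised by send() is treated as a failed attempt.
            status, body = 0, ""
        if is_valid(status, body):
            return body
        if attempt < max_attempts:
            time.sleep(delay)  # pause before the next attempt
    return None


# Hypothetical stand-in for a real HTTP call: two invalid responses, then a valid one.
responses = iter([(500, ""), (200, "short"), (200, "x" * 150)])
calls = []


def fake_send() -> Tuple[int, str]:
    calls.append(1)
    return next(responses)


result = fetch_with_retry(fake_send)
```

Passing the request in as a callable keeps the retry policy testable without touching the network, which is the main reason for the design in this sketch.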

Below is the aiohttp version of the request. I have not used it much, so I will just paste it here briefly:

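import asyncio
import random

import aiohttp

# get_headers2(), valid_status_code and logger are defined elsewhere in the framework.


async def fetch(session, data, url, entername, timeout=5):
    count = 1
    while count < 6:
        try:
            async with session.post(url=url, data=data,
                                    timeout=aiohttp.ClientTimeout(total=timeout),
                                    allow_redirects=False,
                                    headers=get_headers2()) as response:
                html = await response.text(errors='ignore')
                await asyncio.sleep(random.randint(1, 2))
                if response.status in valid_status_code and len(html) > 100:
                    return html
                # logger.error("Retrying request " + url)
        except Exception:
            # logger.info("Retrying request " + url)
            pass
        count += 1
        # asyncio.sleep instead of time.sleep, so the event loop is not blocked
        await asyncio.sleep(1)
    # All 5 attempts failed.
    logger.debug("Company name … " + entername + " detail collection failed")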

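For the aiohttp `fetch` coroutine, the retry loop only pays off when several requests run concurrently and the waits between attempts do not block the event loop. The sketch below illustrates that idea using only the standard library. It is an assumption-laden illustration, not the author's code: `fetch_with_retry_async`, `fetch_all` and `fake_post` are hypothetical names, and `fake_post` stands in for a real `session.post` call.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

VALID_STATUS_CODES = (200, 201)

PostFn = Callable[[str], Awaitable[Tuple[int, str]]]


async def fetch_with_retry_async(
    post: PostFn,
    name: str,
    max_attempts: int = 5,
    delay: float = 0.0,
) -> Optional[str]:
    """Retry post(name) until the response passes both checks, or give up."""
    for _ in range(max_attempts):
        try:
            status, html = await post(name)
        except Exception:
            status, html = 0, ""
        if status in VALID_STATUS_CODES and len(html) > 100:
            return html
        # asyncio.sleep yields to the event loop, so other tasks keep running;
        # time.sleep here would stall every coroutine at once.
        await asyncio.sleep(delay)
    return None


async def fetch_all(post: PostFn, names: List[str]) -> Dict[str, Optional[str]]:
    """Run one retrying task per company name concurrently."""
    results = await asyncio.gather(
        *(fetch_with_retry_async(post, n) for n in names)
    )
    return dict(zip(names, results))


# Hypothetical stand-in for session.post: "bad" always fails,
# every other name succeeds on its second attempt.
attempts: Dict[str, int] = {}


async def fake_post(name: str) -> Tuple[int, str]:
    attempts[name] = attempts.get(name, 0) + 1
    if name == "bad" or attempts[name] < 2:
        return 500, ""
    return 200, name * 60


pages = asyncio.run(fetch_all(fake_post, ["acme", "bad"]))
```

In a real crawler, `post` would wrap a shared `aiohttp.ClientSession`, and a failed name would map to `None` in the result, mirroring the synchronous version.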