Python Crawlers: Connecting to a Web Server

I. How a web server handles a request

1. Input: the URL

http(https)://domain:port/directory/filename.extension

http(https)://domain:port/directory/
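To make the two URL forms concrete, here is a minimal sketch that splits a URL into its parts with the standard library. The URLs and the helper name effective_port are hypothetical and serve only as illustration; the fallback ports are the usual defaults for http (80) and https (443).

```python
from urllib.parse import urlsplit

# Hypothetical URLs, used only to illustrate the two forms.
file_url = "https://www.example.com:8080/docs/index.html"
dir_url = "http://www.example.com/docs/"

def effective_port(parts):
    """Return the explicit port, or the scheme's default when none is given."""
    defaults = {"http": 80, "https": 443}
    return parts.port or defaults[parts.scheme]

for url in (file_url, dir_url):
    parts = urlsplit(url)
    # scheme, domain, port, and directory/file path
    print(parts.scheme, parts.hostname, effective_port(parts), parts.path)
```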

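2. Processing

When a crawler fetches several pages from the same site, it only needs to parse robots.txt once. In HTTP/1.1, setting the Connection header to keep-alive means the connection is kept open: the server does not close it right after each response, although it may still drop a connection that stays idle past its timeout.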
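The "parse robots.txt once" idea can be sketched with the standard library's urllib.robotparser by caching one parser per host. The robots.txt content, the load_robots stand-in, and the example.com URLs are assumptions made for illustration; a real crawler would download the file from the site instead.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it once per host.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

_parsers = {}    # host -> parsed RobotFileParser
fetch_count = 0  # how many times robots.txt was loaded

def load_robots(host):
    """Stand-in for the network fetch; returns the sample text."""
    global fetch_count
    fetch_count += 1
    return SAMPLE_ROBOTS

def can_fetch(url, user_agent="*"):
    """Check a URL against the cached rules, parsing robots.txt only on first use."""
    host = urlsplit(url).netloc
    parser = _parsers.get(host)
    if parser is None:
        parser = RobotFileParser()
        parser.parse(load_robots(host).splitlines())
        _parsers[host] = parser
    return parser.can_fetch(user_agent, url)

print(can_fetch("https://www.example.com/index.html"))
print(can_fetch("https://www.example.com/private/page.html"))
print(fetch_count)
```

For the connection side, a requests.Session object reuses the underlying connection across requests to the same host, so a crawler that sends all its requests through one session gets the keep-alive behaviour without extra code.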

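2. Using requests and the response

requests.request(): builds and sends a request; the methods below are built on it

requests.get(): fetches an HTML page with the GET method

requests.head(): fetches only the header information of an HTML page

requests.post(): submits a POST request to an HTML page

requests.put(): submits a PUT request to an HTML page

requests.patch(): submits a partial-modification (PATCH) request to an HTML page

requests.delete(): submits a DELETE request to an HTML page

requests.session(): keeps certain parameters with the web server across separate requests

Request parameters:

params: extra parameters appended to the URL

proxies: a dict that sets the proxy servers to go through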

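import requests

url = ''  # target URL (left empty in the original)
headers = {}  # request headers (value missing in the original), e.g. a User-Agent
kw = {}  # extra URL parameters passed via params (not defined in the original)
response = requests.get(url, headers=headers, timeout=10, params=kw)
response.encoding = 'utf-8'  # decode the response body as UTF-8
print(response.text)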
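To see what params and a session actually contribute, the sketch below prepares a request without sending it and inspects the result. The endpoint, the parameter values, and the User-Agent string are all hypothetical; only requests.Session, requests.Request, and Session.prepare_request are taken from the requests library.

```python
import requests

# Hypothetical endpoint and query parameters, for illustration only.
url = "https://www.example.com/search"
kw = {"q": "python", "page": 2}

session = requests.Session()
# Headers set on the session are kept for every request made through it.
session.headers.update({"User-Agent": "demo-crawler/0.1"})  # hypothetical value

# Build the request and merge in the session's settings; nothing is sent yet.
request = requests.Request("GET", url, params=kw)
prepared = session.prepare_request(request)

print(prepared.method)
print(prepared.url)                    # params are encoded into the query string
print(prepared.headers["User-Agent"])  # header inherited from the session

# To actually send it: response = session.send(prepared, timeout=10)
```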

3. Handling errors and exceptions

import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

url = ''  # ""
url2 = ''
try:
    req = requests.get(url2, timeout=5)
    print(req.status_code)
except ReadTimeout:
    # The server did not send data within the timeout
    print('timeout')
except ConnectionError:
    # The connection could not be established
    print('connection error')
except RequestException:
    # Any other error raised by requests
    print('error')
else:
    if req.status_code == 200:
        print("Access OK!")
        # Save the fetched page to a local file
        with open('t.html', 'wb') as fb:
            fb.write(req.content)
    elif req.status_code == 404:
        print("Page not found")
    elif req.status_code == 403:
        print("Access to the page is forbidden!")
    elif req.status_code == 503:
        print("Page temporarily unavailable!")
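The chain of status-code checks covers only four codes. As one possible alternative, the standard library's http.HTTPStatus can turn any known code into its standard reason phrase. The helper name describe_status is hypothetical and is not part of requests.

```python
from http import HTTPStatus

def describe_status(code):
    """Return 'code phrase' for a known HTTP status, or a fallback for unknown codes."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Not a status code the standard library knows about
        return f"{code} (unknown status)"
    return f"{status.value} {status.phrase}"

for code in (200, 403, 404, 503):
    print(describe_status(code))
```

In the handler above, print(describe_status(req.status_code)) could stand in for the individual branches, while the file-saving step stays under the 200 check.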
