Scraping web pages that require login: obtaining the cookie

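First, obtain the page's cookie. For example, to scrape a Renren (人人網) page after logging in, first find the cookie, as shown in the figure below.

Finally, code like the following can simulate a logged-in session.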

import urllib.request
import urllib.parse

# URL of the page to fetch after login (left empty in the original)
url = ''
# Send the cookie copied from the logged-in browser session
headers = {
    'Cookie': ' anonymid=jxsntfqs-ofu0t6; depovince=zgqt; _r01_=1; jebe_key=37a5e620-420b-42e4-8361-ab88024b3324%7cfe293cb2ffcb374252a27291355dc10f%7c1562486156712%7c1%7c1562486158570; jebe_key=37a5e620-420b-42e4-8361-ab88024b3324%7cfe293cb2ffcb374252a27291355dc10f%7c1562486156712%7c1%7c1562486158576; wp=0; ick_login=7a1fbbfe-cf38-498b-b031-18cd95c838e0; jebecookies=d3a55224-0887-4972-b301-fde9306964d5|||||; jsessionid=abcc8n0zwgktj9euehlvw; _de=c5e0e40596487205091539360ea5d908; p=f90a637b5d70675e5226acc6ef2d753a9; first_login_flag=1; ln_uact=18701902391; ln_hurl= t=e69c70422cb6a8c8f0a57170308239c19; societyguester=e69c70422cb6a8c8f0a57170308239c19; id=971405629; xnsid=3479f0f9; ver=7.0; loginfrom=null; wp_fold=0',
}
# Build the request with the cookie header and fetch the page
request = urllib.request.Request(url=url, headers=headers)
response = urllib.request.urlopen(request)
# Save the raw response bytes to a local file
with open('renren.html', 'wb') as fp:
    fp.write(response.read())
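
The same steps can be wrapped in small helpers so that the cookie copied from the browser is passed in as an argument rather than hard-coded. This is only a minimal sketch, not the original author's code: the names build_request and save_page are hypothetical, and any URL or cookie value passed to them is a placeholder you would replace with your own.

```python
import urllib.request


def build_request(url, cookie):
    """Return a Request carrying the given Cookie header (hypothetical helper)."""
    # Strip stray whitespace that often comes along when copying from the browser
    headers = {'Cookie': cookie.strip()}
    return urllib.request.Request(url=url, headers=headers)


def save_page(url, cookie, path):
    """Fetch url with the cookie attached and write the raw bytes to path."""
    request = build_request(url, cookie)
    with urllib.request.urlopen(request) as response, open(path, 'wb') as fp:
        fp.write(response.read())
```

Calling save_page(url, cookie, 'renren.html') with the page URL and the cookie string from a logged-in browser session would then do the same as the script above. The cookie stops working once the session expires, so it has to be copied again at that point.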
