Text-Obfuscation Anti-Scraping: CSS Offset

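Description: CSS styles rearrange characters that are stored out of order in the HTML, so the page displays them in the order a human expects to read, while a crawler that reads the markup gets the scrambled sequence.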

Example: flight ticket prices on Qunar (去哪兒網).
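To make the effect concrete, here is a minimal sketch of what a tag-stripping crawler would read. The markup in SAMPLE_HTML is hypothetical, modelled on the em/b/i structure that the code in this post parses; the real page's attributes and pixel values may differ, and naive_text is only an illustrative helper name.

```python
import re

# Hypothetical markup (an assumption, not copied from the real site).
SAMPLE_HTML = (
    '<em class="rel">'
    '<b style="width:48px;left:-48px">'
    '<i style="width:16px">1</i>'
    '<i style="width:16px">3</i>'
    '<i style="width:16px">6</i>'
    '</b>'
    '<b style="width:16px;left:-32px">8</b>'
    '<b style="width:16px;left:-16px">5</b>'
    '</em>'
)


def naive_text(html: str) -> str:
    """Strip every tag and join what is left, as a simple crawler would."""
    return re.sub(r'<[^>]+>', '', html)


print(naive_text(SAMPLE_HTML))  # prints 13685: source order, not display order
```

Under the offset rule this post applies (each left offset divided by a 16px digit width gives the position to overwrite), the same markup would be displayed as 185, so the stripped text 13685 is not the price.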

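Steps:

1. Analyze the pattern in how the digits are laid out.

2. Locate the tag that holds the digits and extract the baseline data.

3. Extract the offset and the digit from each of the other tags.

4. Use each offset to decide which element of the baseline list gets overwritten.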
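The four steps can be tried offline before touching a browser. The sketch below uses only the standard library on a hypothetical HTML fragment; SAMPLE_HTML, DIGIT_WIDTH and decode_price are names assumed for illustration, and the 16px digit width is taken from the divisor used in this post's code rather than measured on the live page.

```python
import re

# Hypothetical markup (an assumption, not copied from the real site).
SAMPLE_HTML = (
    '<em class="rel">'
    '<b style="width:48px;left:-48px">'
    '<i style="width:16px">1</i>'
    '<i style="width:16px">3</i>'
    '<i style="width:16px">6</i>'
    '</b>'
    '<b style="width:16px;left:-32px">8</b>'
    '<b style="width:16px;left:-16px">5</b>'
    '</em>'
)

DIGIT_WIDTH = 16  # pixels per digit; assumed from the post's "/16"


def decode_price(html: str) -> str:
    # Steps 1-2: the first <b> wraps the baseline digits in <i> tags.
    blocks = re.findall(r'<b style="([^"]*)">(.*?)</b>', html)
    base_style, base_body = blocks[0]
    width = int(re.search(r'width:\s*(\d+)px', base_style).group(1))
    # Keep only as many digits as fit inside the wrapper's width.
    digits = re.findall(r'<i[^>]*>(\d)</i>', base_body)[: width // DIGIT_WIDTH]

    # Step 3: every later <b> carries one digit and a left offset.
    for style, value in blocks[1:]:
        left = int(re.search(r'left:\s*(-?\d+)px', style).group(1))
        # Step 4: offset / digit width is the slot to overwrite; a negative
        # result counts from the right end through Python's negative indexing.
        digits[int(left / DIGIT_WIDTH)] = value
    return ''.join(digits)


print(decode_price(SAMPLE_HTML))  # prints 185
```

Here the baseline is 1, 3, 6; the offset -32px overwrites the second digit from the right with 8, and -16px overwrites the last digit with 5.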

import re
from parsel import Selector
from selenium import webdriver

driver_path = ''  # path to the chromedriver binary (not given in the original post)
# Selenium 3 style call; newer Selenium 4 releases expect service=Service(driver_path).
driver = webdriver.Chrome(executable_path=driver_path)
url = ''
driver.get(url)  # get() returns None, so read the HTML from driver.page_source
sel = Selector(driver.page_source)
span = sel.css('span.prc_wp')
em = sel.css('em.rel').extract()  # locate the em tags

for element in em:
    # Locate the b tags inside this em
    element = Selector(element)
    element_b = element.css('b').extract()
    b1 = Selector(element_b.pop(0))
    # The first b tag gives base_price. If the baseline list holds more
    # elements than fit in the width of the b tag wrapping the i tags, slice it.
    b1_style = b1.css('b::attr("style")').extract_first()
    b1_width = ''.join(re.findall('width:(.*)px;', b1_style))
    number = int(int(b1_width) / 16)
    base_price = b1.css('i::text').extract()[:number]

    # Extract the offset and digit from each of the other b tags
    alternate_price = []
    for eb in element_b:
        eb = Selector(eb)
        style = eb.css('b::attr("style")').get()  # style attribute of the b tag
        position = ''.join(re.findall('left:(.*)px', style))  # offset in pixels
        value = eb.css('b::text').get()
        alternate_price.append({'position': position, 'value': value})

    # Use each offset to decide which baseline element gets overwritten
    for al in alternate_price:
        position = int(al.get('position'))
        value = al.get('value')
        # A negative offset gives a negative index, which counts from the end
        index = int(position / 16)  # index of the element to replace
        base_price[index] = value
    print(base_price)
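Inline styles can list their properties in any order, and a greedy pattern such as left:(.*)px captures too much when another px value follows it (for "left:-32px;width:16px" it returns "-32px;width:16"). A stricter parser is sketched below; px_value is a hypothetical helper name, and it assumes whole-pixel values.

```python
import re


def px_value(style, prop):
    """Return the integer pixel value of `prop` in an inline style, or None."""
    # Anchor on the start of the string or a ';' so that asking for "left"
    # does not match inside a longer property name such as "margin-left".
    pattern = r'(?:^|;)\s*' + re.escape(prop) + r'\s*:\s*(-?\d+)px'
    match = re.search(pattern, style)
    return int(match.group(1)) if match else None


print(px_value('left:-32px;width:16px', 'left'))   # prints -32
print(px_value('width: 48px;left:-48px', 'width'))  # prints 48
```

Returning None for a missing property lets the caller skip a tag instead of failing on int('').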

