Python CSS-Offset Anti-Scraping (Part 1)

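**Target**:

For a detailed explanation of the CSS approach, see the write-up by 崔佬 (Cui). Many thanks to him; I learned a lot from it.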

```python
import re

import requests
from parsel import Selector


# CSS-selector approach
def spider():
    url = ''  # the target URL is omitted in the original
    resp = requests.get(url)
    sel = Selector(resp.text)
    em = sel.css('em.rel')
    for element in em:
        # Collect every <b> tag inside this price
        element_b = element.css('b').extract()
        b1 = Selector(element_b.pop(0))
        b1_style = b1.css('b::attr("style")').get()
        # print(b1_style)  # width:48px;left:-48px  or  width:64px;left:-64px
        b1_width = ''.join(re.findall(r'width:(\d+)px', b1_style))
        number = int(int(b1_width) / 16)  # 3 4
        # Values inside the first <b> tag pair (a list), trimmed to the visible digits
        base_price = b1.css('i::text').extract()[:number]
        # print(base_price)
        alternate_price = []
        for eb in element_b:
            eb = Selector(eb)
            # Read the tag's style attribute
            style = eb.css('b::attr("style")').get()
            # Get the tag's exact offset
            position = ''.join(re.findall(r'left:(-?\d+)px', style))
            # Get the digit inside this tag
            value = eb.css('b::text').get()
            # Store the offset and digit as a dict in the replacement list
            alternate_price.append({'position': position, 'value': value})
        # print(alternate_price)
        for al in alternate_price:
            position = int(al.get('position'))
            value = al.get('value')
            # Compute the index, using 16px as the unit
            index = int(position / 16)
            # Overwrite the matching element of the base list
            base_price[index] = value
        print(base_price)


# spider()
```
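Every version ends with the same overwrite loop. Here is a minimal sketch of that step, assuming the first `<b>` always has `left` equal to minus its `width`, as the printed styles in the comments suggest. Its digits then occupy the slots just left of 0. A `<b>` at `left:-k*16px` therefore covers the k-th digit from the right, which is exactly Python's negative index `-k`. The function `decode_price` and the sample digits are hypothetical and are not taken from the site.

```python
DIGIT_PX = 16  # assumed width of one rendered digit, as in the code above


def decode_price(base_digits, base_width, overlays):
    """Rebuild the displayed price from a CSS-offset layout (hypothetical helper).

    base_digits: digits of the <i> tags inside the first <b>
    base_width:  width of the first <b> in px (its left is assumed to be -base_width)
    overlays:    (left_px, digit) pairs taken from the remaining <b> tags
    """
    # Only as many digits as fit in the first <b>'s width are visible
    visible = list(base_digits[: base_width // DIGIT_PX])
    for left, digit in overlays:
        # left = -k * 16 lands on the k-th slot from the right edge,
        # which is Python's negative index -k
        visible[int(left / DIGIT_PX)] = digit
    return ''.join(visible)


print(decode_price(['1', '5', '9', '4'], 48, [(-32, '2'), (-16, '8')]))  # 128
```

In this example the width of 48px keeps only three digits, so the decoy `'4'` is dropped. The overlays at `-32` and `-16` then replace the second and third digits.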
```python
import re

import requests
from parsel import Selector


# XPath approach 1
def spider1():
    url = ''  # the target URL is omitted in the original
    resp = requests.get(url)
    sel = Selector(resp.text)
    em = sel.xpath('//em[@class="rel"]')
    for element in em:
        element_b = element.xpath('./b').extract()
        b1 = Selector(element_b.pop(0))
        b1_style = b1.xpath('//@style').extract_first()
        b1_width = re.search(r'width:(\d+)px', b1_style).group(1)
        number = int(int(b1_width) / 16)  # 3 4
        base_price = b1.xpath('//i/text()').extract()[:number]
        # print(base_price)
        alternate_price = []
        for eb in element_b:
            eb = Selector(eb)
            style = eb.xpath('//@style').extract_first()
            position = re.search(r'left:(-?\d+)px', style).group(1)
            value = eb.xpath('//text()').extract_first()
            alternate_price.append({'position': position, 'value': value})
        # print(alternate_price)
        for al in alternate_price:
            position = int(al.get('position'))
            value = al.get('value')
            index = int(position / 16)
            base_price[index] = value
        print(base_price)


# spider1()
```
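Both the CSS and the XPath versions pull `width` and `left` out of the style string with regular expressions. A greedy pattern such as `width:(.*)px;` is fragile. On `width:48px;left:-48px;`, which has a trailing semicolon, it captures `48px;left:-48`. A hypothetical helper that parses the whole inline style into a dict avoids depending on declaration order. This is a sketch, and it keeps only integer `px` values, which is all the code above needs.

```python
import re


def parse_style(style):
    """Parse an inline style like 'width:48px;left:-48px' into {'width': 48, 'left': -48}.

    Hypothetical helper: only declarations whose value is an integer px length are kept.
    """
    result = {}
    for declaration in style.split(';'):
        name, sep, value = declaration.partition(':')
        if not sep:
            continue  # empty chunk, e.g. after a trailing ';'
        match = re.fullmatch(r'\s*(-?\d+)px\s*', value)
        if match:
            result[name.strip()] = int(match.group(1))
    return result


print(parse_style('width:48px;left:-48px;'))  # {'width': 48, 'left': -48}
```

With this helper, `parse_style(b1_style)['width']` and `parse_style(style)['left']` could stand in for the regex calls in any of the three functions.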
```python
import re

import requests
from parsel import Selector


# XPath approach 2
def spider2():
    url = ''  # the target URL is omitted in the original
    resp = requests.get(url)
    sel = Selector(resp.text)
    em = sel.xpath('//em[@class="rel"]')
    for element in em:
        b1_style = element.xpath('./b[1]/@style').extract_first()
        b1_width = re.search(r'width:(\d+)px', b1_style).group(1)
        number = int(int(b1_width) / 16)  # 3 4
        base_price = element.xpath('./b[1]/i/text()').extract()[:number]
        # Take every sibling tag that follows the first <b>
        element_b = element.xpath('./b[1]/following-sibling::*').extract()
        # Same result as the previous line, as long as all siblings are <b> tags
        # element_b = element.xpath('./b[position()>1]').extract()
        alternate_price = []
        for eb in element_b:
            eb = Selector(eb)
            style = eb.xpath('//@style').extract_first()
            position = re.search(r'left:(-?\d+)px', style).group(1)
            value = eb.xpath('//text()').extract_first()
            alternate_price.append({'position': position, 'value': value})
        # print(alternate_price)
        for al in alternate_price:
            position = int(al.get('position'))
            value = al.get('value')
            index = int(position / 16)
            base_price[index] = value
        print(base_price)


spider2()
```
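Because the URL is left blank, none of the three functions runs as written. To check the parsing logic offline, here is a sketch that follows the same steps as XPath approach 2 on a hypothetical `<em class="rel">` fragment. The fragment is built to match the structure the code expects and is not copied from the site. The sketch uses the standard library's `xml.etree.ElementTree` instead of parsel. ElementTree's limited XPath support has no `following-sibling` axis, so the sketch gets the later `<b>` tags by slicing the list of children.

```python
import re
import xml.etree.ElementTree as ET

DIGIT_PX = 16  # per-digit width the original code assumes

# Hypothetical markup: the first <b> holds a decoy 4th digit, later <b>s overlay digits
SAMPLE = (
    '<em class="rel">'
    '<b style="width:48px;left:-48px"><i>1</i><i>5</i><i>9</i><i>4</i></b>'
    '<b style="width:16px;left:-32px">2</b>'
    '<b style="width:16px;left:-16px">8</b>'
    '</em>'
)


def px(style, prop):
    """Return the integer px value of `prop` in an inline style string."""
    return int(re.search(rf'{prop}:\s*(-?\d+)px', style).group(1))


def decode_em(em):
    b_tags = em.findall('b')
    # b_tags[0] plays the role of ./b[1]; the rest stand in for following-sibling::*
    first, rest = b_tags[0], b_tags[1:]
    count = px(first.get('style'), 'width') // DIGIT_PX
    digits = [i.text for i in first.findall('i')][:count]
    for b in rest:
        # Negative offsets map onto negative list indices
        digits[int(px(b.get('style'), 'left') / DIGIT_PX)] = b.text
    return ''.join(digits)


print(decode_em(ET.fromstring(SAMPLE)))  # 128
```

Against a live page you would feed the real `<em>` HTML in place of `SAMPLE`. Real pages are HTML rather than well-formed XML, though, so parsel (or an HTML parser) is the safer choice there.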

