Python for Beginners 12: Web Crawler

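Before writing a crawler:

1. Define a clear goal.

2. Find the web page that contains the data.

3. Analyze the page structure and locate the tags that hold the data.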

Procedure: simulate an HTTP request, send it to the server, and receive the page the server returns.
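The request step can be sketched with the standard library's `urllib.request`. This is a minimal sketch, not the author's code: `build_request` and `fetch` are hypothetical helper names, and sending a browser-like `User-Agent` is an assumption (some sites reject urllib's default agent string).

```python
from urllib import request


def build_request(url: str) -> request.Request:
    # Hypothetical helper. Assumption: the site may block urllib's default
    # "Python-urllib/3.x" agent, so we send a browser-like one instead.
    return request.Request(url, headers={"User-Agent": "Mozilla/5.0"})


def fetch(url: str, timeout: float = 10.0) -> str:
    """Send a GET request and return the page as decoded text."""
    with request.urlopen(build_request(url), timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Building the `Request` separately from `urlopen` makes the headers easy to inspect or change without touching the network code.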

Then use regular expressions to extract the data we need.
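The extraction step relies on two regex details that the code below also uses: `[\s\S]` matches any character including a newline (plain `.` does not by default), and `*?` is non-greedy, so each match stops at the nearest closing tag. A small illustration on made-up markup:

```python
import re

html = "<i>Alice</i><i>Bob</i><i>Line\nbreak</i>"

# Greedy: one match that runs to the LAST </i> in the string.
greedy = re.findall(r"<i>([\s\S]*)</i>", html)
# -> ['Alice</i><i>Bob</i><i>Line\nbreak']

# Non-greedy: stops at the nearest </i>, one match per tag.
lazy = re.findall(r"<i>([\s\S]*?)</i>", html)
# -> ['Alice', 'Bob', 'Line\nbreak']

# "." does not cross the newline, so the third tag is missed.
dot_lazy = re.findall(r"<i>(.*?)</i>", html)
# -> ['Alice', 'Bob']
```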

Target data: streamer names and popularity counts on the Huya homepage.

107.3萬 ------------ data item 2 (the popularity count; 萬 means 10,000)
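Popularity values such as `107.3萬` must become numbers before they can be sorted. A minimal sketch of the conversion; `parse_popularity` is a hypothetical name, and accepting the simplified form 万 alongside 萬 is an assumption in case the page serves simplified characters.

```python
import re

# An integer or decimal number, e.g. "107.3" or "8562".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_popularity(text: str) -> int:
    """Convert '107.3萬' -> 1073000 and '8562' -> 8562."""
    match = _NUMBER.search(text)
    if match is None:
        raise ValueError(f"no number in {text!r}")
    value = float(match.group())
    # 萬 (simplified: 万) = 10,000; accepting both forms is an assumption.
    if "萬" in text or "万" in text:
        value *= 10_000
    return int(round(value))
```

Note that a pattern like `\d*` would only capture `107` from `107.3萬` and drop the decimal part, which is why the decimal is matched explicitly.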

**Implementation**

```python
# encoding: utf-8
'''
Created on June 14, 2023
@author: administrator
'''
import re
from urllib import request


class Spider():
    # Page to crawl (the URL was omitted in the original post)
    url = ''
    # The HTML tags around each capture group were lost when this post was
    # published; fill them in from the page source.
    # [\s\S] matches any character, including newlines.
    root_pattern = r'([\s\S]*)'
    name_pattern = r'([\s\S]*?)'
    num_pattern = r'([\s\S]*?)'

    def __fetch_content(self):  # private method: a leading __ triggers name mangling
        r = request.urlopen(self.url)
        htmls = r.read()
        htmls = str(htmls, encoding='utf-8')
        return htmls

    def __analysis(self, htmls):
        root_html = re.findall(self.root_pattern, htmls)
        data_name = []
        data_num = []
        for html in root_html:
            name = re.findall(self.name_pattern, html)
            num = re.findall(self.num_pattern, html)
            anchors = {'name': name, 'num': num}
            for anchor_name in anchors['name']:
                data_name.append(anchor_name.strip())
            for anchor_num in anchors['num']:
                # Keep the decimal part: '107.3萬' -> 107.3
                r = re.findall(r'\d+(?:\.\d+)?', anchor_num)
                value = float(r[0])
                if '萬' in anchor_num:  # 萬 = 10,000
                    value *= 10000
                # Store a number (not a string) so sorting is numeric
                data_num.append(value)
        # Pair each name with its popularity count
        datas = list(map(lambda name, num: {'name': name, 'num': num},
                         data_name, data_num))
        return datas

    def __sortdata(self, datas):
        datas.sort(key=lambda x: x['num'], reverse=True)
        return datas

    def __showdata(self, datas):
        for data in datas:
            print(data)

    def go(self):
        htmls = self.__fetch_content()
        datas = self.__analysis(htmls)
        sort_datas = self.__sortdata(datas)
        self.__showdata(sort_datas)


if __name__ == '__main__':
    spider = Spider()
    spider.go()
```
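The class above collects all names and all numbers as two separate lists and pairs them by position, which misaligns if any entry lacks one of the fields. A swapped-in alternative, sketched here under assumptions, is to match one block per streamer with a non-greedy root pattern and pull the name and number from inside that block. The markup and the class names `anchor`, `nick`, and `num` are hypothetical stand-ins, since the real Huya tags were lost from the post.

```python
import re

# Hypothetical markup; the real Huya tag and class names are not known here.
SAMPLE_HTML = """
<li class="anchor"><i class="nick">Alice</i><i class="num">107.3萬</i></li>
<li class="anchor"><i class="nick">Bob</i><i class="num">8562</i></li>
<li class="anchor"><i class="nick">Carol</i><i class="num">2萬</i></li>
"""

# Non-greedy root pattern: one match per streamer block.
ROOT_PATTERN = re.compile(r'<li class="anchor">([\s\S]*?)</li>')
NAME_PATTERN = re.compile(r'<i class="nick">([\s\S]*?)</i>')
NUM_PATTERN = re.compile(r'<i class="num">([\s\S]*?)</i>')


def to_number(text):
    """'107.3萬' -> 1073000, '8562' -> 8562 (萬 = 10,000)."""
    value = float(re.search(r"\d+(?:\.\d+)?", text).group())
    return int(round(value * 10_000)) if "萬" in text else int(value)


def analyze(html):
    """Return [{'name': ..., 'num': ...}] sorted by popularity, highest first."""
    anchors = []
    for block in ROOT_PATTERN.findall(html):
        name = NAME_PATTERN.search(block)
        num = NUM_PATTERN.search(block)
        if name and num:  # skip blocks missing either field
            anchors.append({"name": name.group(1).strip(),
                            "num": to_number(num.group(1))})
    return sorted(anchors, key=lambda a: a["num"], reverse=True)


if __name__ == "__main__":
    for anchor in analyze(SAMPLE_HTML):
        print(anchor)
```

Because each name and number come from the same block, a streamer with a missing field is skipped instead of shifting every later pair.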
