Scraping NetEase Cloud Music Playlist Tags

import re
import urllib.request
import urllib.error
import urllib.parse

import jieba

def get_all_hotsong(url):
    headers = {}  # request headers are left empty in the source
    request = urllib.request.Request(url=url, headers=headers)
    html = urllib.request.urlopen(request).read().decode('utf-8')  # open the URL
    html = str(html)  # convert to str
    pat1 = r'playlist\?id=(\d*?)" class="t'
    result_id = re.compile(pat1).findall(html)  # use a regex to pick out the playlist ids
    pat2 = r''  # the name pattern is blank in the source
    result_name = re.compile(pat2).findall(html)  # use a regex to pick out the playlist names
    return result_name, result_id

def get_lables(url):
    headers = {}  # request headers are left empty in the source
    request = urllib.request.Request(url=url, headers=headers)
    html = urllib.request.urlopen(request).read().decode('utf-8')  # open the URL
    html = str(html)  # convert to str
    w1 = '標籤：'  # text that precedes the tags
    w2 = '，簡介'  # text that follows the tags
    pat = re.compile(w1 + '(.*?)' + w2, re.S)
    result = pat.findall(html)
    # print(result)
    return result

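f = open('result3.txt', 'a', encoding='utf-8')  # output file for the tags
for i in range(0, 1):
    # The list-page URL prefix is blank in the source; i * 35 is appended as the page offset.
    url = '' + str(i * 35)
    name, id = get_all_hotsong(url)
    num = 0
    for j in id:  # iterate over the playlists
        # The playlist URL prefix is blank in the source; j is the playlist id.
        t_url = '' + j
        label_ = get_lables(t_url)
        k = 0
        if len(label_):
            # print(label_[k])
            f.write(label_[k])
            f.write('\n')
f.close()
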
f = open("result3.txt", 'r', encoding='utf-8').read()
f2 = open('result4.txt', 'a', encoding='utf-8')  # output file for the word counts
counts = {}
wordslist = jieba.lcut(f)
for word in wordslist:
    # Remove punctuation and surrounding whitespace from each token.
    word = word.replace("，", "").replace("！", "").replace("「", "") \
        .replace("」", "").replace("。", "").replace("？", "").replace("：", "") \
        .replace("...", "").replace("、", "").strip(' ').strip('\r\n')
    if len(word) == 1 or word == "":
        continue
    else:
        counts[word] = counts.get(word, 0) + 1  # count the word
items = list(counts.items())  # convert the dict to a list
items.sort(key=lambda x: x[1], reverse=True)  # sort by frequency, descending
# Print and save each word with its count, most frequent first.
for item in items:
    word, counter = item
    print("Word: {}, count: {}".format(word, counter))
    f2.write("Word: {}, count: {}".format(word, counter))
    f2.write('\n')
f2.close()
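
The extraction and counting steps of the script above can be tried offline, without fetching any pages. What follows is only a minimal sketch under stated assumptions: `sample_pages` is made-up text standing in for downloaded playlist pages, `extract_tags` and `count_tags` are hypothetical helper names, and the tags are assumed to be separated by `、`. It reuses the same `標籤：` ... `，簡介` regex idea as `get_lables`, but swaps the jieba segmentation for a plain delimiter split and `collections.Counter`, so it is not the script's own method.

```python
import re
from collections import Counter

# Made-up text standing in for fetched playlist pages (assumption, not real data).
sample_pages = [
    "歌單名。標籤：華語、流行、快樂，簡介：第一個歌單",
    "歌單名。標籤：華語、搖滾，簡介：第二個歌單",
    "這個頁面沒有標籤",
]

# Same marker-based pattern as get_lables: capture the text between the markers.
TAG_PATTERN = re.compile("標籤：(.*?)，簡介", re.S)


def extract_tags(page_text):
    """Return the individual tags found between the two markers."""
    tags = []
    for block in TAG_PATTERN.findall(page_text):
        # Assumption: tags inside the block are separated by "、".
        tags.extend(t.strip() for t in block.split("、") if t.strip())
    return tags


def count_tags(pages):
    """Count how often each tag appears across all pages."""
    counts = Counter()
    for page in pages:
        counts.update(extract_tags(page))
    return counts


tag_counts = count_tags(sample_pages)
for tag, n in tag_counts.most_common():
    print("Word: {}, count: {}".format(tag, n))
```

`Counter.most_common()` already returns the entries sorted by descending count, which replaces the manual `items.sort(...)` step. Pages without the markers simply contribute no tags.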

