A simple crawler that collects the addresses of all posts on 52pojie (52破解)

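I have just started learning to write crawlers in Python, so I picked a small example to practice on.

The same code runs perfectly on Linux, but on Windows it fails with all sorts of encoding errors.

The encoding problems left me thoroughly confused, so I ended up doing everything in Python 3.x.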
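One likely cause of the Linux/Windows difference (an assumption, since the post does not show the actual error) is that `open()` without an `encoding` argument falls back to the platform's preferred encoding, which is usually UTF-8 on Linux but often a legacy code page on Windows. The sketch below passes the encoding explicitly so that both systems behave the same. The helper names `save_lines` and `read_lines` are hypothetical and are not part of the original script.

```python
import locale
import os
import tempfile


def save_lines(path, lines, encoding="utf-8"):
    """Append lines to a file with an explicit encoding, not the platform default."""
    with open(path, "a", encoding=encoding) as f:
        for line in lines:
            f.write(line + "\n")


def read_lines(path, encoding="utf-8"):
    """Read the file back with the same explicit encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read().splitlines()


if __name__ == "__main__":
    # This is the encoding open() would use if none were passed; it differs by platform.
    print("platform default encoding:", locale.getpreferredencoding(False))

    demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    save_lines(demo_path, ["帖子標題", "thread-1-1.html"])
    print(read_lines(demo_path))
```

Because the encoding is fixed in both the writer and the reader, the round trip does not depend on the machine's locale.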

The code is as follows:

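# -*- coding:utf-8 -*-
import time

import requests
from bs4 import BeautifulSoup

# Each entry is a one-item dict {board name: URL of the board's first page}.
# The entries (numbered 0 to 28 in the original) are not reproduced in this listing.
title_list = []

# Attribute filter that selects the thread-title links on a board page.
# Its value is missing from this listing and must be supplied before running.
thread_link_attrs = {}


def get_html(url):
    """Fetch a page, retrying every 10 seconds until the request succeeds."""
    while True:
        try:
            response = requests.get(url)
            return response.text
        except Exception:
            time.sleep(10)
            continue


# Get the total number of pages in a board
def get_page(url):
    html = get_html(url)
    soup = BeautifulSoup(html, 'lxml')
    label_list = soup.find_all('label')
    # The fourth <label> holds the page count; the slice strips the text around the number
    page = int(label_list[3].span.string[3:-2])
    return page


def page_down(url):
    page = get_page(url)
    print("Total pages: " + str(page))
    txt = input("Enter the output file name (include the extension): ")
    for j in range(1, page + 1):
        # Drop the last 7 characters of the first-page URL and append the page number
        html = get_html(url[:-7] + '-' + str(j) + '.html')
        soup = BeautifulSoup(html, 'lxml')
        a_list = soup.find_all('a', attrs=thread_link_attrs)
        # Write each title and link to the file
        for a in a_list:
            with open(txt, 'a+', encoding='utf-8') as f:
                f.write(a.get_text())
                f.write('\n')
                # The URL prefix string is empty in this listing
                f.write("" + a.attrs['href'])
                f.write('\n')


def main():
    i = 0
    col = 0  # column position within the current output row
    # Print the board list, two entries per row
    for title in title_list:
        for key in title:
            if col == 1:
                print((str(i) + ':' + key).ljust(20))
                col = 0
            else:
                print((str(i) + ':' + key).ljust(20), end=" ")
                col += 1
        i += 1
    # Check that the input is in range
    while True:
        try:
            print()
            num = int(input('Enter the number of the board to crawl: '))
            if num >= len(title_list) or num < 0:
                print('Invalid input, please try again')
                continue
            else:
                break
        except ValueError:
            print('Invalid input, please try again')
            continue
    # Get the board link
    dict_t = title_list[num]
    for key in dict_t:
        print(dict_t[key])
        page_down(dict_t[key])


if __name__ == '__main__':
    main()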
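The two slices in the listing, `url[:-7]` and `span.string[3:-2]`, only work when the strings have exactly the expected length. Here is a slightly more defensive sketch, assuming that board URLs end in `-1.html` and that the page label reads something like ` / 123 頁`. Both are guesses that happen to be consistent with the slices, since the post does not show the real URLs or markup. The names `page_url` and `parse_page_count` and the example.com address are hypothetical.

```python
import re

FIRST_PAGE_SUFFIX = "-1.html"  # assumed ending of a board's first-page URL


def page_url(first_page_url, page):
    """Build the URL of a given page from the board's first-page URL."""
    if not first_page_url.endswith(FIRST_PAGE_SUFFIX):
        raise ValueError("unexpected board URL: " + first_page_url)
    base = first_page_url[:-len(FIRST_PAGE_SUFFIX)]
    return base + "-" + str(page) + ".html"


def parse_page_count(label_text):
    """Extract the first run of digits from the page label text."""
    match = re.search(r"\d+", label_text)
    if match is None:
        raise ValueError("no page count found in: " + repr(label_text))
    return int(match.group())


if __name__ == "__main__":
    print(page_url("https://example.com/forum-2-1.html", 3))
    print(parse_page_count(" / 123 頁"))
```

Raising `ValueError` on unexpected input makes a change in the site's layout show up as a clear error rather than as a wrong page number or a malformed URL.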
