Scraping 學習猿地 article titles, links, times, and authors with Python

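'''
Target: 學習猿地 猿圈
Scraped fields: article title, article link, author, time
Tools: bs4, requests
Result: written to a file
'''
from bs4 import BeautifulSoup
import requests, json

# 1. Define the request headers and URL
url = ''      # left empty in the source; set it to the page you want to scrape
headers = {}  # value missing in the source; supply your own request headers
alldata = []  # value missing in the source; a list of records is assumed

# 2. Send the request and fetch the data
res = requests.get(url=url, headers=headers)
if res.status_code == 200:
    # 3. Parse the data
    soup = BeautifulSoup(res.text, 'lxml')
    # Get every article on the page
    divclass = soup.find_all('div', class_="list-group-item list-group-item-action p-06")
    for i in divclass:
        my_title = i.find('div', class_="topic_title mb-0 lh-180")
        if my_title:
            # Keep only the first line of the title text
            my_title = my_title.text.split("\n")[0]
            my_url = i.a["href"]
            my_time = i.span['title']
            my_author = i.strong.a.text
            print(my_author)
            print(my_url)
            print(my_time)
            print(my_title)
            # The body of temp is missing in the source; this dict of the
            # four scraped fields, and the append below, are assumptions.
            temp = {'title': my_title, 'url': my_url, 'time': my_time, 'author': my_author}
            alldata.append(temp)
    # 4. Write to a file
    with open('c:/users/lsy/desktop/lsy.json', 'w') as fp:
        json.dump(alldata, fp)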
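
Step 4 writes the collected records with `json.dump` using its defaults, which escape non-ASCII characters as `\uXXXX`, so Chinese titles are hard to read in the output file. The sketch below shows one possible way to keep them readable. The helper names `save_articles` and `load_articles`, the record keys, the output file name, and the sample record are all hypothetical placeholders, not part of the original script or real scraped data.

```python
import json
import os
import tempfile


def save_articles(records, path):
    """Write a list of article dicts to `path` as UTF-8 JSON (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fp:
        # ensure_ascii=False keeps non-ASCII text as-is instead of \uXXXX escapes
        json.dump(records, fp, ensure_ascii=False, indent=2)


def load_articles(path):
    """Read the records back from `path` (hypothetical helper)."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


# Placeholder record for illustration only; not real scraped data.
sample = [
    {
        "title": "示例標題",
        "url": "/example",
        "time": "2020-01-01 00:00:00",
        "author": "someone",
    }
]

# Write to the system temp directory so the example runs on any machine.
out_path = os.path.join(tempfile.gettempdir(), "articles_example.json")
save_articles(sample, out_path)
restored = load_articles(out_path)
```

Passing `encoding="utf-8"` explicitly matters mainly on Windows, where the default file encoding may not be UTF-8 and writing unescaped Chinese text could otherwise fail.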
