Web scraping with the BeautifulSoup library

pip install beautifulsoup4
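
Note that the package is installed as beautifulsoup4 but imported as bs4, and the class name BeautifulSoup is case-sensitive. A quick way to check the install, as a minimal sketch using a made-up one-line HTML string:

```python
from bs4 import BeautifulSoup

# Parse a tiny, made-up HTML string with Python's built-in parser.
soup = BeautifulSoup("<h1>hello</h1>", "html.parser")
heading = soup.h1.string
print(heading)
```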

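Requirement:

Scrape the 7-day weather forecast from the page.

Inspecting the page elements shows that all the information we need is inside the div with id="7d".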
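
As a rough sketch of that lookup, the snippet below finds a div by its id in a small hand-written HTML fragment. The fragment is an assumption made up for illustration, not the real page's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the real page's markup will differ.
sample_html = """
<html><body>
  <div id="other">not this one</div>
  <div id="7d"><ul><li>day 1</li><li>day 2</li></ul></div>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
container = soup.find("div", id="7d")
items = [li.get_text() for li in container.find_all("li")]
print(items)
```

Because find() returns None when nothing matches, it is worth checking the result before calling further methods on it.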

Code:

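#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup

url = ''


# Fetch the page with requests
def gethtmltext(url, timeout=30):
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()  # raises an exception on a 4xx or 5xx status code
        return r.text
    except requests.RequestException:
        # Return None rather than an error string, so the caller never parses it as HTML
        return None


def get_data(html):
    final_list = []
    soup = BeautifulSoup(html, 'html.parser')
    body = soup.body
    data = body.find('div', {'id': '7d'})  # find the div whose id is 7d
    ul = data.find('ul')
    lis = ul.find_all('li')
    for day in lis:
        temp_list = []
        date = day.find('h1').string  # the date
        temp_list.append(date)
        # all p tags for this day
        info = day.find_all('p')
        # the first p tag has no i tag, so take its text directly
        temp_list.append(info[0].string)
        # high temperature; may be missing
        weather_high = None
        if info[1].find('span'):
            weather_high = info[1].find('span').string
        # low temperature
        weather_low = None
        if info[1].find('i'):
            weather_low = info[1].find('i').string
        # wind scale
        wind_scale = info[2].find('i').string
        temp_list.extend([weather_high, weather_low, wind_scale])
        final_list.append(temp_list)
    return final_list


def main():
    html = gethtmltext(url)
    if html is None:
        print('Request failed')
        return
    weather = get_data(html)  # avoid shadowing the built-in name list
    print(weather)


if __name__ == '__main__':
    main()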
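
The parsing step can be tried without any network access. The markup below is a hypothetical fragment shaped the way the code above expects (an h1 for the date followed by three p tags); it is not copied from the real page, and the helper text_or_none and the dictionary keys are names made up for this sketch. It shows one way to handle a day that has no span for the high temperature.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, inferred from what the parsing code expects.
sample_html = """
<div id="7d"><ul>
  <li><h1>Day 1</h1><p>Sunny</p><p><span>25</span>/<i>14</i></p><p><i>level 3</i></p></li>
  <li><h1>Day 2</h1><p>Cloudy</p><p><i>12</i></p><p><i>level 2</i></p></li>
</ul></div>
"""


def text_or_none(tag):
    """Return the tag's text, or None when the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_days(html):
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id="7d")
    rows = []
    for day in container.find("ul").find_all("li"):
        info = day.find_all("p")
        rows.append({
            "date": text_or_none(day.find("h1")),
            "weather": text_or_none(info[0]),
            "high": text_or_none(info[1].find("span")),  # may be missing
            "low": text_or_none(info[1].find("i")),
            "wind": text_or_none(info[2].find("i")),
        })
    return rows


rows = parse_days(sample_html)
for row in rows:
    print(row)
```

Returning None for a missing field keeps every row the same shape, which makes the result easier to write to a file or table later.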

Result:
