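Prometheus alerting rule configuration

Create the alerting rule file first_rules.yml. It is best placed in the same directory as prometheus.yml.

Edit prometheus.yml and add the alerting rule file to it under rule_files. Pay attention to the path.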

global:
  scrape_interval: 15s # how often metrics are collected from targets
  evaluation_interval: 15s # how often alerting rules are evaluated

# alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'

rule_files:
  - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs: # controls which resources Prometheus monitors
  - job_name: prometheus # Prometheus itself, present by default
    static_configs:
      - targets: ['localhost:9090'] # the service is exposed on port 9090 by default

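In an alerting rule file, a set of related rules can be defined under one group, and each group can contain multiple alerting rules (rule). An alerting rule consists mainly of the following parts:

alert: the name of the alerting rule.

expr: the trigger condition, written as a PromQL expression. It is evaluated to determine whether any time series satisfies the condition.

for: the evaluation wait time, an optional parameter. The alert is sent only after the trigger condition has held for this long. During the wait, a newly triggered alert is in the pending state.

labels: custom labels. Lets the user specify a set of additional labels to attach to the alert.

annotations: a set of additional information, such as text describing the alert in detail. The annotations are sent to Alertmanager along with the alert when it fires. summary gives an overview of the alert, and description gives its details. The Alertmanager UI also displays alert information based on these two values.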
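
To show how these parts fit together, here is a minimal sketch of a complete rule file. The group name example.rules and the alert name InstanceDown are hypothetical and chosen only for illustration. The `{{ $labels.<name> }}` placeholders are Prometheus template variables that expand to the label values of the series that triggered the alert.

```yaml
# first_rules.yml (sketch; group and alert names are hypothetical)
groups:
  - name: example.rules
    rules:
      - alert: InstanceDown
        expr: up == 0          # matches every target whose last scrape failed
        for: 1m                # alert stays pending for 1m before it fires
        labels:
          severity: critical   # extra label attached to the alert
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been unreachable for more than 1 minute."
```

Because rule_files in prometheus.yml lists first_rules.yml, a file shaped like this would be picked up the next time Prometheus starts or reloads its configuration.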

In this configuration, acs-ms is the application name, that is, the name of the job.

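- alert: serverdown
  expr: up == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    description: 'Instance: } is down'

    description: 'Instance: } interface } raised a } exception'
    summary: Monitors the number of interface request exceptions within a given period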

exception!="serviceexception" excludes custom exceptions that the application throws deliberately.
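
A rule using this kind of matcher might look like the sketch below. It is not the original rule: the metric http_server_requests_seconds_count and its exception and uri labels are assumed to come from Micrometer's Spring Boot HTTP metrics (where requests without an exception carry exception="None"), and the alert name, the 1m window and the "> 0" threshold are hypothetical.

```yaml
groups:
  - name: example.http.rules            # hypothetical group name
    rules:
      - alert: InterfaceExceptions      # hypothetical alert name
        # Count requests that ended in an exception during the last minute,
        # skipping successful requests and the custom serviceexception.
        expr: |
          sum by (instance, uri, exception) (
            increase(http_server_requests_seconds_count{exception!="None", exception!="serviceexception"}[1m])
          ) > 0
        for: 15s
        labels:
          severity: page
        annotations:
          summary: "Interface request exceptions within the last minute"
          description: "Instance {{ $labels.instance }}, interface {{ $labels.uri }} raised {{ $labels.exception }}"
```

Aggregating with `sum by (instance, uri, exception)` keeps exactly the labels that the description template refers to.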

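    description: 'Instance: } request duration of interface } exceeded the configured threshold of 5s; current value: }s'
    summary: Monitors interface request duration within a given period

uri!~".excel." excludes certain interfaces, namely those whose URI contains excel.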
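
As a sketch of a rule that applies such an exclusion, the example below alerts when the average request duration exceeds 5s. It is an illustration, not the original rule: the metrics http_server_requests_seconds_sum and http_server_requests_seconds_count are assumed to come from Micrometer's Spring Boot HTTP metrics, and the alert name and 1m window are hypothetical. Prometheus regex matchers are fully anchored, so the sketch assumes the intended pattern is `.*excel.*`, which matches any URI containing excel.

```yaml
groups:
  - name: example.latency.rules         # hypothetical group name
    rules:
      - alert: InterfaceSlow            # hypothetical alert name
        # Average duration per request over the last minute:
        # total seconds spent divided by number of requests.
        expr: |
          sum by (instance, uri) (rate(http_server_requests_seconds_sum{uri!~".*excel.*"}[1m]))
            /
          sum by (instance, uri) (rate(http_server_requests_seconds_count{uri!~".*excel.*"}[1m]))
            > 5
        for: 15s
        labels:
          severity: page
        annotations:
          summary: "Interface request duration within the last minute"
          description: "Instance {{ $labels.instance }}, interface {{ $labels.uri }} averages {{ $value }}s per request (threshold: 5s)"
```

The `{{ $value }}` placeholder expands to the value of the expression at the time the alert was evaluated.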

- alert: cputooheight
  expr: process_cpu_usage > 0.3
  for: 15s
  labels:
    severity: page
  annotations:
    description: 'Instance: } CPU usage exceeded the configured threshold of 30%; current value: }'
    summary: Monitors the percentage of CPU in use

- alert: tomcat_thread_height
  expr: tomcat_threads_busy_threads / tomcat_threads_config_max_threads > 0.5
  for: 15s
  labels:
    severity: page
  annotations:
    description: 'Instance: } ratio of busy Tomcat threads to total threads exceeded the configured threshold of 50%; current value: }'
    summary: Monitors the ratio of busy Tomcat threads to total threads

Start Prometheus with the configuration file:

./prometheus --config.file=prometheus.yml

Then open http://ip:9090/rules to check that the rules have been loaded.
