
How to Write Prometheus Monitoring with Python and Flask

Views: 101 Date: 2022-07-04 13:48:55

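Introduction

Prometheus works by periodically scraping the state of monitored components over HTTP.

Any component can be monitored by Prometheus as long as it exposes an HTTP endpoint that returns data in the format Prometheus defines.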
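To make "the format Prometheus defines" concrete, the following is a minimal sketch that renders one metric into the text exposition format and parses it back, roughly as a scraper would. The metric name demo_jobs and the private registry are assumptions made for illustration; they are not part of the original article.

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest
from prometheus_client.parser import text_string_to_metric_families

# A private registry keeps this sketch isolated from the global default one.
registry = CollectorRegistry()

# "demo_jobs" is a hypothetical metric name used only for this illustration.
jobs_done = Counter("demo_jobs", "Jobs processed (hypothetical)", registry=registry)
jobs_done.inc(3)

# This is the payload an HTTP endpoint would return to Prometheus.
payload = generate_latest(registry).decode("utf-8")

# Parse the text back into samples, the way a consumer of the format would.
parsed = {}
for family in text_string_to_metric_families(payload):
    for sample in family.samples:
        parsed[sample.name] = sample.value

if __name__ == "__main__":
    print(payload)
    print(parsed)
```

Printing payload shows the same "# HELP", "# TYPE", and sample lines that appear in the outputs later in this article.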

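Prometheus Server scrapes metrics from its targets on a schedule and saves them to local storage. It collects data with a pull model, which keeps clients simple (a client only has to collect data and does not need to know anything about the server) and also makes the server easier to scale horizontally.

When monitoring data reaches an alert threshold, Prometheus Server sends the alert over HTTP to the alerting component, Alertmanager, which applies inhibition rules and then triggers an email or a webhook. Prometheus offers a multi-dimensional data model and flexible queries through PromQL: because each metric can carry multiple tags (labels), monitoring data can be combined and aggregated along any dimension.

In Python, we implement the server side and expose an HTTP endpoint. We then configure that URL in Prometheus, and Prometheus periodically sends requests to it to fetch the data you want to return.

Prometheus provides four metric types: Counter, Gauge, Summary, and Histogram.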

Preparation

pip install flask
pip install prometheus_client
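With both packages installed, the overall shape of a monitored Flask service can be sketched as follows. This is only an illustration under assumptions: the /metrics path, the demo_page_views metric, and the dedicated registry are choices made for this sketch, whereas the test examples in this article expose one metric per URL.

```python
from flask import Flask, Response
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    CollectorRegistry,
    Counter,
    generate_latest,
)

app = Flask(__name__)

# A dedicated registry (an assumption of this sketch) holds every metric
# that the endpoint below should expose.
registry = CollectorRegistry()
page_views = Counter(
    "demo_page_views", "Page views (hypothetical metric)", registry=registry
)


@app.route("/")
def index():
    # Business endpoint: record one view per request.
    page_views.inc()
    return "ok"


@app.route("/metrics")
def metrics():
    # Endpoint that Prometheus would be configured to scrape.
    return Response(generate_latest(registry), content_type=CONTENT_TYPE_LATEST)


if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8081)
```

Serving the whole registry from one URL means new metrics show up without adding new routes.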

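Counter

A Counter can only go up, and it is reset to 0 when the program restarts. It is typically used for values that never decrease, such as request counts, number of tasks, total processing time, or number of errors.

Defining one takes two arguments: the first is the metric name and the second is the metric description:

c = Counter('c1', 'A counter')

Because a counter can only increase, it has just one method: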

def inc(self, amount=1):
    '''Increment counter by the given amount.'''
    if amount < 0:
        raise ValueError('Counters can only be incremented by non-negative amounts.')
    self._value.inc(amount)
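The behaviour of inc can be checked directly, without a web server. The sketch below assumes a hypothetical metric name demo_errors and a private registry, and reads the value back with the registry's get_sample_value helper.

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()

# "demo_errors" is a hypothetical metric name used only for this sketch.
errors = Counter("demo_errors", "Errors seen (hypothetical)", registry=registry)

errors.inc()      # increment by the default amount of 1
errors.inc(2.5)   # increments do not have to be whole numbers

# A negative amount is rejected with ValueError, as in the method shown above.
try:
    errors.inc(-1)
    rejected = False
except ValueError:
    rejected = True

# The exported sample carries the _total suffix.
current = registry.get_sample_value("demo_errors_total")

if __name__ == "__main__":
    print(rejected, current)
```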

Test example:

import prometheus_client
from prometheus_client import Counter
from flask import Response, Flask

app = Flask(__name__)

requests_total = Counter('c1', 'A counter')

@app.route('/api/metrics/count/')
def requests_count():
    # Count this request, then return the counter in Prometheus text format.
    requests_total.inc(1)
    # requests_total.inc(2)
    return Response(prometheus_client.generate_latest(requests_total), mimetype='text/plain')

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8081)

Visit http://127.0.0.1:8081/api/metrics/count/:

# HELP c1_total A counter
# TYPE c1_total counter
c1_total 1.0
# HELP c1_created A counter
# TYPE c1_created gauge
c1_created 1.6053265493727107e+09

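HELP is the description of c1, taken from the text given when the Counter was defined.

TYPE states the metric type of c1.

c1_total is the output for the metric we defined. Notice the added _total suffix: it is there for compatibility between OpenMetrics and the Prometheus text format, because OpenMetrics requires the _total suffix on counters.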
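The suffix handling can be observed with a short sketch. The two metric names below are hypothetical; the point is that a counter declared without _total and one declared with it are both exported with exactly one _total suffix.

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()

# Hypothetical names: one declared without the suffix, one declared with it.
plain = Counter("demo_hits", "Declared without _total", registry=registry)
suffixed = Counter("demo_visits_total", "Declared with _total", registry=registry)

plain.inc()
suffixed.inc()

# Collect the names of every exported sample in the registry.
sample_names = {
    sample.name
    for metric in registry.collect()
    for sample in metric.samples
}

if __name__ == "__main__":
    print(sorted(sample_names))
```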

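Gauge

A gauge can go up or down and can be set to any value.

For example, it can hold the current CPU temperature, memory usage, or disk and network traffic.

It is defined in much the same way as a counter:

from prometheus_client import Gauge

g = Gauge('my_inprogress_requests', 'Description of gauge')
g.inc()      # Increment by 1
g.dec(10)    # Decrement by given value
g.set(4.2)   # Set to a given value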

Methods:

def inc(self, amount=1):
    '''Increment gauge by the given amount.'''
    self._value.inc(amount)

def dec(self, amount=1):
    '''Decrement gauge by the given amount.'''
    self._value.inc(-amount)

def set(self, value):
    '''Set gauge to the given value.'''
    self._value.set(float(value))
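Beyond inc, dec, and set, the client's Gauge also offers a few convenience helpers. The sketch below uses two of them, track_inprogress and set_to_current_time; the metric names and the run_job function are hypothetical and exist only for illustration.

```python
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()

# Hypothetical metric names used only in this sketch.
in_progress = Gauge(
    "demo_inprogress", "Jobs currently running (hypothetical)", registry=registry
)
last_run = Gauge(
    "demo_last_run_seconds", "Unix time of the last run (hypothetical)", registry=registry
)


def run_job():
    # The gauge is incremented on entry and decremented on exit.
    with in_progress.track_inprogress():
        value_during_job = registry.get_sample_value("demo_inprogress")
    # Record when the job finished, as a Unix timestamp.
    last_run.set_to_current_time()
    return value_during_job


during = run_job()
after = registry.get_sample_value("demo_inprogress")
finished_at = registry.get_sample_value("demo_last_run_seconds")

if __name__ == "__main__":
    print(during, after, finished_at)
```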

Test example:

import random
import prometheus_client
from prometheus_client import Gauge
from prometheus_client.core import CollectorRegistry
from flask import Response, Flask

app = Flask(__name__)

random_value = Gauge('g1', 'A gauge')

@app.route('/api/metrics/gauge/')
def r_value():
    random_value.set(random.randint(0, 10))
    return Response(prometheus_client.generate_latest(random_value), mimetype='text/plain')

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8081)

Visit http://127.0.0.1:8081/api/metrics/gauge/:

# HELP g1 A gauge
# TYPE g1 gauge
g1 5.0

Using labels

Labels distinguish the characteristics of a metric. A metric can have a single label or several.

from prometheus_client import Counter

c = Counter('requests_total', 'HTTP requests total', ['method', 'clientip'])
c.labels('get', '127.0.0.1').inc()
c.labels('post', '192.168.0.1').inc(3)
c.labels(method='get', clientip='192.168.0.1').inc()

import random
import prometheus_client
from prometheus_client import Gauge
from flask import Response, Flask

app = Flask(__name__)

# Note: this metric is declared as a Gauge, so the output reports TYPE c1 gauge.
c = Gauge('c1', 'A counter', ['method', 'clientip'])

@app.route('/api/metrics/counter/')
def r_value():
    # Increment the series for a randomly chosen client IP.
    c.labels(method='get', clientip='192.168.0.%d' % random.randint(1, 10)).inc()
    return Response(prometheus_client.generate_latest(c), mimetype='text/plain')

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8081)

After visiting http://127.0.0.1:8081/api/metrics/counter/ nine times in a row:

# HELP c1 A counter
# TYPE c1 gauge
c1{clientip="192.168.0.7",method="get"} 2.0
c1{clientip="192.168.0.1",method="get"} 1.0
c1{clientip="192.168.0.8",method="get"} 1.0
c1{clientip="192.168.0.5",method="get"} 2.0
c1{clientip="192.168.0.4",method="get"} 1.0
c1{clientip="192.168.0.10",method="get"} 1.0
c1{clientip="192.168.0.2",method="get"} 1.0
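As the output shows, every distinct combination of label values becomes its own series. The following sketch illustrates the same idea without a web server; the metric name demo_requests and the method/status labels are assumptions chosen for this example.

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()

# Hypothetical metric with two labels, used only for this sketch.
requests_by_route = Counter(
    "demo_requests",
    "HTTP requests (hypothetical)",
    ["method", "status"],
    registry=registry,
)

# Positional and keyword label values address the same child series.
requests_by_route.labels("get", "200").inc()
requests_by_route.labels(method="get", status="200").inc()
requests_by_route.labels(method="post", status="500").inc(3)

get_ok = registry.get_sample_value(
    "demo_requests_total", {"method": "get", "status": "200"}
)
post_err = registry.get_sample_value(
    "demo_requests_total", {"method": "post", "status": "500"}
)

# Count how many separate series exist for this metric.
series_count = sum(
    1
    for metric in registry.collect()
    for sample in metric.samples
    if sample.name == "demo_requests_total"
)

if __name__ == "__main__":
    print(get_ok, post_err, series_count)
```

Since the number of series grows with the number of distinct label values, labels with many possible values (such as a client IP) multiply the amount of data that is stored.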

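Histogram

This type is mainly used to compute percentiles, which are also called quantiles.

Suppose you have the response times of 100 requests. Sort them in ascending order; if the 90th value is 200 ms, we can say that 90% of requests took no more than 200 ms, which is also expressed as "the 90th percentile is 200 ms". This reflects the basic quality of the service. Of course, the 91st value might be 2000 ms, and the 90th percentile says nothing about that.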
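The sort-and-pick procedure described above can be written out exactly. The data below is made up to match the example (89 fast requests, one at 200 ms, and 10 slow ones) and is not taken from any real service.

```python
import random

random.seed(0)  # fixed seed so the made-up data is reproducible

# Hypothetical latencies in milliseconds: 89 fast, one at 200 ms, 10 slow.
fast = [random.randint(10, 199) for _ in range(89)]
slow = [random.randint(2000, 3000) for _ in range(10)]
samples = fast + [200] + slow
random.shuffle(samples)

# Exact percentile: sort everything, then take the 90th value (index 89).
ordered = sorted(samples)
p90 = ordered[90 - 1]

# Share of requests that took no more than the 90th-percentile value.
share_at_or_below = sum(1 for value in samples if value <= p90) / len(samples)

# The very next value can be far larger, which the percentile does not reveal.
next_value = ordered[90]

if __name__ == "__main__":
    print(p90, share_at_or_below, next_value)
```

This exact method needs every sample in memory, which is the cost that the bucket-based estimate avoids.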

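In practice, a service may receive hundreds of millions of requests per day, so it is not feasible to store every data point and sort them to find the 90th percentile. Problems like this are handled with estimation algorithms that do not require storing all the data. The mathematics behind them is fairly advanced, so we will go straight to how Prometheus is used.

First, define a histogram:

h = Histogram('hh', 'A histogram', buckets=(-5, 0, 5))

The first argument is the metric name, the second is the description, and the third is the bucket configuration. Buckets deserve a closer look.

Here (-5, 0, 5) divides the value range into four intervals: (-∞, -5], (-5, 0], (0, 5], and (5, +∞).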

If we feed it the value 8:

h.observe(8)

the metrics output looks like this:

# HELP hh A histogram
# TYPE hh histogram
hh_bucket{le="-5.0"} 0.0
hh_bucket{le="0.0"} 0.0
hh_bucket{le="5.0"} 0.0
hh_bucket{le="+Inf"} 1.0
hh_count 1.0
hh_sum 8.0

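hh_sum records the sum of all observed values, hh_count records the number of observations, and the hh_bucket lines are the buckets. le means "less than or equal to" the given value, so each bucket count is cumulative.

As you can see, 8 is greater than 5 and only satisfies 8 <= +Inf, so only the last bucket was incremented, once. (Note that buckets only count: they record how many samples fell within each range, not the sample values themselves.)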

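Bucket boundaries have to be chosen by hand, using your best judgment of how the data is distributed. Well-chosen buckets let PromQL estimate percentiles more accurately. When using a histogram, all we need to do is define the buckets up front and keep recording observations; the final percentile calculation can then be done from the histogram's raw data.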
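To give a feel for how a percentile can be estimated from bucket counts alone, here is a simplified sketch that interpolates linearly inside the bucket containing the target rank. It is an illustration in the spirit of PromQL's histogram_quantile, not a reimplementation of it: the function name estimate_quantile is hypothetical, the bucket data is made up, and the sketch assumes non-negative observations so that the first bucket starts at 0.

```python
import math


def estimate_quantile(q, cumulative_buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative bucket counts.

    cumulative_buckets is a list of (upper_bound, cumulative_count) pairs
    sorted by upper bound, with the last bound being +Inf.
    """
    total = cumulative_buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # assumption: observations are >= 0
    for upper, count in cumulative_buckets:
        if count >= rank:
            if math.isinf(upper):
                # No finite upper bound to interpolate towards.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume values are spread evenly inside the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound


# Made-up latency histogram in seconds: (upper bound, cumulative count).
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]

p50 = estimate_quantile(0.5, buckets)
p70 = estimate_quantile(0.7, buckets)
p90 = estimate_quantile(0.9, buckets)

if __name__ == "__main__":
    print(p50, p70, p90)
```

The estimate can never be more precise than the bucket it lands in, which is why bucket boundaries should be dense around the latencies you care about.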

Test example:

import random
import prometheus_client
from prometheus_client import Histogram
from flask import Response, Flask

app = Flask(__name__)

h = Histogram('h1', 'A Histogram', buckets=(-5, 0, 5))

@app.route('/api/metrics/histogram/')
def r_value():
    h.observe(random.randint(-5, 5))
    return Response(prometheus_client.generate_latest(h), mimetype='text/plain')

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8081)

After visiting http://127.0.0.1:8081/api/metrics/histogram/ several times in a row:

# HELP h1 A Histogram
# TYPE h1 histogram
h1_bucket{le="-5.0"} 0.0
h1_bucket{le="0.0"} 5.0
h1_bucket{le="5.0"} 10.0
h1_bucket{le="+Inf"} 10.0
h1_count 10.0
# HELP h1_created A Histogram
# TYPE h1_created gauge
h1_created 1.6053319432993534e+09

Summary

The Python client does not fully implement the Summary algorithm (it does not compute quantiles), so it is not covered here.
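For reference only, the sketch below shows what a Summary in the Python client does expose: a count and a sum of the observed values, with no quantile series. The metric name demo_latency_seconds is hypothetical.

```python
from prometheus_client import CollectorRegistry, Summary

registry = CollectorRegistry()

# Hypothetical metric name used only for this sketch.
latency = Summary(
    "demo_latency_seconds", "Request latency (hypothetical)", registry=registry
)

for seconds in (0.1, 0.2, 0.3):
    latency.observe(seconds)

count = registry.get_sample_value("demo_latency_seconds_count")
total = registry.get_sample_value("demo_latency_seconds_sum")

# Check whether any exported sample carries a "quantile" label.
has_quantiles = any(
    "quantile" in sample.labels
    for metric in registry.collect()
    for sample in metric.samples
)

if __name__ == "__main__":
    print(count, total, has_quantiles)
```

Dividing the sum by the count gives the mean of the observations, which is the main statistic available from these two values.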

