Reference: docker-prometheus/docker-compose.yml at master · Kev1nChan/docker-prometheus · GitHub
When using a Raspberry Pi 3/4, note that 32-bit systems can run into memory problems, so 64-bit is recommended. Check with `uname -m`: anything other than aarch64 is 32-bit.
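For example, checking from a shell:

```shell
# Quick check: on Raspberry Pi OS, `uname -m` reports aarch64 on a
# 64-bit kernel, and armv7l / armv6l on a 32-bit one.
arch=$(uname -m)
if [ "$arch" = "aarch64" ]; then
  echo "64-bit"
else
  echo "32-bit ($arch)"
fi
```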
Setting up Prometheus

Parts adjusted for the Raspberry Pi:

```yaml
version: '3.3'
```
Next, add Prometheus as a data source in Grafana, and that's it. For the related alertmanager tuning, see 15分鐘建製 Promethus 環境.
Node Exporter
I originally thought Node Exporter had to be run via systemd to scrape the local machine, but the examples use Docker, so systemd apparently isn't required. Since all my machines already have Docker installed, I didn't pick the systemd route, but here's a note on the systemd approach for reference:
```shell
sudo tee /etc/systemd/system/node_exporter.service <<"EOF"
```
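The unit file body is truncated above; a minimal sketch of what such a unit usually looks like (the binary path and user are assumptions — adjust to wherever you installed node_exporter):

```ini
# /etc/systemd/system/node_exporter.service -- sketch, not the exact file used here
[Unit]
Description=Prometheus Node Exporter
After=network-online.target

[Service]
User=node_exporter
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Then `sudo systemctl daemon-reload && sudo systemctl enable --now node_exporter`.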
So how do you run it with Docker? It's actually shown above, though the official approach differs slightly from mine:

```shell
docker run -d \
```
I don't quite understand why network_mode needs to be host mode here. In the end, I chose to adjust the other machines to use Docker as well.
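For reference, the upstream node_exporter README runs the container with host networking so it can see the host's real interfaces and filesystem stats; a docker-compose rendering of that example (image tag and mounts follow the README — double-check against the current version):

```yaml
services:
  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    command:
      - '--path.rootfs=/host'   # read host metrics from the bind mount below
    network_mode: host          # metrics served on the host's :9100 directly
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'
```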
Bring it up with `docker-compose up`.
Modify prometheus/config.yml:

```yaml
- job_name: 'node-exporter'
  # Override the global default and scrape targets from this job every 5 seconds.
  # metrics_path defaults to '/metrics'
  # scheme defaults to 'http'.
  scrape_interval: 5s
  static_configs:
    - targets:
      - 'node_exporter:9100'
      - '192.168.1.xx:9100'
      - '192.168.1.xx:9100'
```
I ran into:

```
ERROR: Version in "./docker-compose.yml" is unsupported. You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
```

Update your docker-compose version to fix this.
armv6 (Pi 1) can't run it — every build I found online failed. Thinking it over, the Pi 1's performance really is very low, so I just left it out.
- Node Exporter for Prometheus Dashboard CN v20201010 dashboard for Grafana | Grafana Labs
- Node Exporter Full dashboard for Grafana | Grafana Labs
Transmission
- GitHub - sandrotosi/simple-transmission-exporter: A simple Prometheus exporter for Transmission
- GitHub - metalmatze/transmission-exporter: Prometheus exporter for Transmission metrics, written in Go.
Looking carefully at the dashboards, the two differ quite a bit. Since I mainly want to watch network transfer speed, I picked the Python version.
```shell
git clone https://github.com/sandrotosi/simple-transmission-exporter.git
```

Dockerfile base image:

```dockerfile
FROM arm32v7/python:3.9
```

```shell
docker build -t transmission_exporter .
```
After adding this to the Prometheus config, restart the service with `docker-compose restart prometheus`:

```yaml
- job_name: 'transmission-exporter'
  # Override the global default and scrape targets from this job every 5 seconds.
  # metrics_path defaults to '/metrics'
  # scheme defaults to 'http'.
  static_configs:
    - targets:
      - '192.168.1.203:29091'
```
Converting to docker-compose — after the switch it stopped working… yet plain docker commands still ran fine… Really strange.

```yaml
version: '2'
```
2021-04-08

I later found that the container IP under docker-compose up seems to be different, which is why it couldn't connect. Transmission has a whitelist setting:

```
"rpc-whitelist": "127.0.0.1,192.168.1.*,172.17.0.*",
```

172.17.0.* passes by default, but the network docker-compose creates is not in that range. See 透過 Docker Compose 設定 network | Titangene Blog. I'm not going to dig into the network options for now — the quickest fix is to make the IPs fall under 172.*.*.*.
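If you do want to control it from the compose file instead, the project network can be pinned to a fixed subnet — a sketch (the subnet value is my assumption; pick one that matches your whitelist and doesn't clash with Docker's default 172.17.0.0/16 bridge):

```yaml
# Give the compose project a fixed subnet so container IPs land in a
# range Transmission's rpc-whitelist can cover.
networks:
  default:
    driver: bridge
    ipam:
      config:
        - subnet: 172.18.0.0/24
```

Alternatively, widening rpc-whitelist to `172.*.*.*` covers any Docker-assigned range, which is the quick fix described above.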
Traefik

```yaml
command:
```

Traefik supports Prometheus natively: add `--metrics.prometheus=true` and it just works.
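In a compose file the flag goes into the Traefik service's command list; a minimal sketch (the flags other than the metrics one are illustrative):

```yaml
services:
  traefik:
    image: traefik:v2.4
    command:
      - '--providers.docker=true'
      - '--metrics.prometheus=true'   # expose /metrics for Prometheus to scrape
    ports:
      - '80:80'
```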
Adding the Traefik dashboard for Grafana | Grafana Labs in Grafana, you'll see:

```
Panel plugin not found: grafana-piechart-panel
```
docker-traefik-prometheus/config.monitoring at master · vegasbrianc/docker-traefik-prometheus · GitHub

Here I saw that you can set this before starting the container, which lets you skip the plugin installation below:

```
[root@prometheus prometheus]# grafana-cli plugins install grafana-piechart-panel
```

Reference: 解决Panel plugin not found: grafana-piechart-panel - 知我知行
But since I'm running it in Docker:

```shell
docker-compose exec grafana bash
grafana-cli plugins install grafana-piechart-panel
```
I ended up using this dashboard instead: Traefik 2.2 dashboard for Grafana | Grafana Labs

The environment variable mentioned above:

```
GF_INSTALL_PLUGINS=grafana-piechart-panel
```
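Set via docker-compose, that variable would sit in the Grafana service's environment — a sketch:

```yaml
services:
  grafana:
    image: grafana/grafana
    environment:
      # install the piechart panel at container start,
      # so no manual grafana-cli step is needed
      - GF_INSTALL_PLUGINS=grafana-piechart-panel
```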
Process Exporter

Not using this at the moment. It checks whether a given process is running, but the programs I care about all have their own exporters, so I skipped it.
Prometheus — Process-exporter进程监控 - huandada - 博客园
Blackbox Exporter
Honestly, it looked hard to get into at first; only after implementing it did I roughly understand what it does. Simply put, all the control points live in Prometheus — blackbox just performs the checks it's asked to. Without further ado, let's build it.

First, git clone https://github.com/prometheus/blackbox_exporter.git, then:

```shell
docker run --rm -d -p 9115:9115 --name blackbox_exporter -v `pwd`:/config prom/blackbox-exporter:master --config.file=/config/blackbox.yml
```
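The module names used in the scrape config below (`http_2xx`, `icmp`) come from blackbox.yml; the example config shipped in the repo defines them roughly like this:

```yaml
modules:
  http_2xx:        # probe an HTTP(S) endpoint and expect a 2xx response
    prober: http
  icmp:            # plain ping probe
    prober: icmp
```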
Prometheus config:

```yaml
- job_name: 'blackbox'
  metrics_path: /probe
  params:
    module: [http_2xx] # Look for a HTTP 200 response.
  static_configs:
    - targets:
      - https://xxx.web # Target to probe with https.
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 192.168.1.203:9115 # The blackbox exporter's real hostname:port.

- job_name: blackbox-ping
  metrics_path: /probe
  params:
    module: [icmp]
  static_configs:
    - targets:
      - 192.168.1.xx # <== Put here your targets
      - 192.168.1.xx # <== Put here your targets
  relabel_configs: # <== This comes from the blackbox exporter README
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 192.168.1.203:9115 # Blackbox exporter.
```
Then restart with `docker-compose restart prometheus` and you're done.

Adjusted to docker-compose:

```yaml
version: '3.3'
```
References:

- How to ping targets using blackbox_exporter with prometheus - Stack Overflow
- Blackbox Exporter 小記 - Potioneer's Essays
- 网络探测:Blackbox Exporter - prometheus-book
- grafana 7 监控https证书过期时间 - 海口-熟练工 - 博客园
Official exporter list: Exporters and integrations | Prometheus
Installing alertmanager-discord

First git clone it, switch the image to arm32v7, adjust the golang build parameters, and build to check everything works:

```shell
git clone https://github.com/benjojo/alertmanager-discord.git
```
The Dockerfile build adjustments are as follows (I hit a few snags along the way):

```dockerfile
# Built following https://medium.com/@chemidy/create-the-smallest-and-secured-golang-docker-image-based-on-scratch-4752223b7324
```

```shell
docker build -t alertmanager-discord .
```
Done. Note that the curl test I previously wrote for LINE fails against this one, but configuring it directly in alertmanager works fine. Just run it with docker:

```shell
docker run --rm -d -e DISCORD_WEBHOOK=https://discord.com/api/webhooks/xxxxxxx -p 9094:9094 alertmanager-discord
```
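Folded into the same docker-compose stack, it might look like this (the service name is mine; the image is the one built above and the webhook stays a placeholder):

```yaml
services:
  alertmanager-discord:
    image: alertmanager-discord   # locally built image from the step above
    restart: always
    environment:
      - DISCORD_WEBHOOK=https://discord.com/api/webhooks/xxxxxxx
    ports:
      - '9094:9094'
```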
alertmanager/config.yml:

```yaml
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: discord_webhook
receivers:
  - name: 'live-monitoring'
    # mailbox that receives the alert mail
    email_configs:
      - to: 'your-email@163.com'
  - name: 'discord_webhook'
    webhook_configs:
      - url: 'http://192.168.1.203:9094'
```
Stop one of the exporters to see whether any notification comes through. One thing to watch out for: with alertmanager-discord I found that job_name apparently must not contain `_` — it sometimes made message delivery fail!! Keep that in mind.
Next on the learning list: how to configure alertmanager. I've realized PromQL is still worth learning properly, since configuring alertmanager requires it, along with the finer points of dynamic label settings… a big pit I'm marking out for later.
- Prometheus監控神器-Rules篇 - ⎝⎛CodingNote.cc ⎞⎠
- Alertmanager 紀錄 - 系統工程師 Beck Yeh 的技術分享
- alertmanager/template at master · prometheus/alertmanager · GitHub
- https://www.itread01.com/content/1545670504.html
The alertmanager rules I'm using for now, referencing some rules from node_exporter 配置 - 簡書:

```yaml
groups:
```
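Since the rule file itself is truncated above, here's an illustrative rule of the shape those references describe (alert name and threshold are placeholders, not the actual rules used here):

```yaml
groups:
  - name: node-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0          # target failed its last scrape
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```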
Finally, I found this site with a whole pile of rules to reference: Awesome Prometheus alerts | Collection of alerting rules
2021-04-14: a problem occurred

Grafana showed no Prometheus data. Checking the logs, I found:

```
prometheus | level=info ts=2021-04-14T12:32:25.883Z caller=head.go:768 component=tsdb msg="WAL segment loaded" segment=185 maxSegment=503
prometheus | panic: runtime error: invalid memory address or nil pointer dereference
prometheus | [signal SIGSEGV: segmentation violation code=0x1 addr=0xc pc=0x17127f8]
prometheus |
prometheus | goroutine 590 [running]:
prometheus | bufio.(*Writer).Available(...)
prometheus | /usr/local/go/src/bufio/bufio.go:624
prometheus | github.com/prometheus/prometheus/tsdb/chunks.(*ChunkDiskMapper).WriteChunk(0x3d39b90, 0x145c, 0x0, 0xcced3c4b, 0x178, 0xcd0878f3, 0x178, 0x26914c4, 0x4712680, 0x0, ...)
prometheus | /app/tsdb/chunks/head_chunks.go:291 +0x54c
prometheus | github.com/prometheus/prometheus/tsdb.(*memSeries).mmapCurrentHeadChunk(0x49bf340, 0x3d39b90)
prometheus | /app/tsdb/head.go:2230 +0x6c
prometheus | github.com/prometheus/prometheus/tsdb.(*memSeries).cutNewHeadChunk(0x49bf340, 0xcd08b38b, 0x178, 0x3d39b90, 0x0)
prometheus | /app/tsdb/head.go:2204 +0x24
prometheus | github.com/prometheus/prometheus/tsdb.(*memSeries).append(0x49bf340, 0xcd08b38b, 0x178, 0xc1422185, 0x3fe4f164, 0x0, 0x0, 0x3d39b90, 0x10001)
prometheus | /app/tsdb/head.go:2360 +0x3a8
prometheus | github.com/prometheus/prometheus/tsdb.(*Head).processWALSamples(0x3d22120, 0xccd1ba00, 0x178, 0x68a2180, 0x68a2140, 0x0, 0x0)
prometheus | /app/tsdb/head.go:425 +0x270
prometheus | github.com/prometheus/prometheus/tsdb.(*Head).loadWAL.func5(0x3d22120, 0x416dde0, 0x416ddf0, 0x68a2180, 0x68a2140)
prometheus | /app/tsdb/head.go:519 +0x40
prometheus | created by github.com/prometheus/prometheus/tsdb.(*Head).loadWAL
prometheus | /app/tsdb/head.go:518 +0x268
prometheus | level=info ts=2021-04-14T12:32:41.305Z caller=main.go:380 msg="No time or size retention was set so using the default time retention" duration=15d
```
I wasn't running a healthcheck (and I haven't found a useful healthcheck since), and the web UI looked normal… docker-compose restart seemed to keep failing, so I ran docker-compose down -v to wipe everything and restart, and then it was fine… Swap is set to 2GB everywhere and RAM usage looked normal… My later guess is that this is a Prometheus problem — clearing the volumes fixed it.

Fixing the Raspberry Pi memory problem
2021-12-15

Rereading this post recently, I noticed I never added the settings I ended up with. Not sure whether newer versions fix this, but here's a record. The key addition is this line: `- '--storage.tsdb.retention.size=500MB'`.

Keeping my prometheus docker-compose for the record:
```yaml
services:
  prometheus:
    image: prom/prometheus-linux-armv7
    container_name: prometheus
    restart: always
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - $PWD/prometheus/:/etc/prometheus/
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
      #- '--storage.tsdb.retention=90d'
      #- '--storage.tsdb.min-block-duration=2h'
      #- '--storage.tsdb.max-block-duration=2h'
      - '--storage.tsdb.retention.size=500MB'
    networks:
      - monitoring
    links:
      - alertmanager
      - cadvisor
    expose:
      - '9090'
    ports:
      - 9090:9090
    depends_on:
      - cadvisor
```
Below is a record of the failures.

Failure log

rpi4 docker panic: mmap, size 134217728: cannot allocate memory · Issue #8661 · prometheus/prometheus · GitHub

I later found this issue, which matches my problem. My guess at the time was that docker-compose up was running an old version; later I ran the latest version, but perhaps it still failed because I hadn't cleared the volumes? (I forget.) The issue also looks like an upgrade problem, but in my case it only hit on the fifth day. The annoying part is that alertmanager sends no notification for this, so it's very easy to miss.
Confirmed: after another 5 days the problem happened again. In hindsight, I shouldn't have run docker-compose down -v here — the right sequence is docker-compose stop prometheus followed by docker volume rm (volume_name).
2021-04-25

Ran into this problem again recently. From what I looked up, it may be related to the 32-bit Raspberry Pi OS; supposedly 64-bit fixes it?! (unconfirmed)
References:
- Compaction running out of memory · Issue #7483 · prometheus/prometheus · GitHub
- 3G memory free yet still panic: mmap: cannot allocate memory · Issue #7450 · prometheus/prometheus · GitHub
- mmap: cannot allocate memory · Issue #4392 · prometheus/prometheus · GitHub
- any way to disable sync prometheus configuration? · Issue #3725 · prometheus-operator/prometheus-operator · GitHub
> Had the same problem. Fixed it by adding a size limit for the storage with `--storage.tsdb.retention.size=500MB`.
> Maybe on 32bit systems there could be a default/maximum value for storage.tsdb.retention.size, along with the warning.

https://github.com/prometheus/prometheus/issues/7483#issuecomment-670512677
I haven't tested this yet.

2021-05-03: testing in progress
> Raspberry PI 3 B+
> Raspbian Stretch Lite October 2018
> prometheus, version 2.4.3+ds (branch: debian/sid, revision: 2.4.3+ds-2) installed from armhf deb package (https://packages.debian.org/sid/net/prometheus) (so I guess no 64bit on 32bit)
> `--storage.tsdb.retention=15y --storage.tsdb.min-block-duration=2h --storage.tsdb.max-block-duration=2h` but problem also appears on default settings of min/max block durations.

https://github.com/prometheus/prometheus/issues/4392#issuecomment-433717839
Currently using this approach and observing.

2021-05-03: confirmed it eventually breaks after running for a while

```yaml
prometheus:
```