Setting up Prometheus + Grafana

Published: 2025-09-08

Deploying Prometheus

Installation

# 1. Download
wget https://github.com/prometheus/prometheus/releases/download/v3.5.0/prometheus-3.5.0.linux-amd64.tar.gz

# 2. Extract and install
tar  -zxvf  prometheus-3.5.0.linux-amd64.tar.gz  -C   /opt/
cd   /opt/
mv  ./prometheus-3.5.0.linux-amd64   prometheus

# 3. Verify
[root@prometheus prometheus]#  cd   /opt/prometheus
[root@prometheus prometheus]# ./prometheus  --version
prometheus, version 3.5.0 (branch: HEAD, revision: 8be3a9560fbdd18a94dedec4b747c35178177202)
  build user:       root@4451b64cb451
  build date:       20250714-16:15:23
  go version:       go1.24.5
  platform:         linux/amd64
  tags:             netgo,builtinassets

# 4. Create a dedicated user
groupadd  prometheus
useradd  -g  prometheus -s /sbin/nologin prometheus
chown -R  prometheus:prometheus /opt/prometheus/

# 5. Create the Prometheus data directory
mkdir  -p  /opt/prometheus/data
chown -R prometheus:prometheus /opt/prometheus/data
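Step 1 only works if the release tag in the path (v3.5.0) matches the version in the file name. As a sketch, a small helper can derive both from one version string. The function name release_url is hypothetical, and it assumes official Prometheus project releases keep the v<version>/<name>-<version>.<arch>.tar.gz layout used by the download URLs in this guide.

```shell
#!/usr/bin/env bash
# release_url NAME VERSION [ARCH]   (hypothetical helper)
# Prints the GitHub release URL for an official Prometheus project
# (prometheus, node_exporter, ...). The tag (v$VERSION) and the file
# name are built from the same VERSION, so they cannot drift apart.
release_url() {
  local name="$1" version="$2" arch="${3:-linux-amd64}"
  if [ -z "$name" ] || [ -z "$version" ]; then
    echo "usage: release_url <name> <version> [arch]" >&2
    return 1
  fi
  printf 'https://github.com/prometheus/%s/releases/download/v%s/%s-%s.%s.tar.gz\n' \
    "$name" "$version" "$name" "$version" "$arch"
}
```

Usage: wget "$(release_url prometheus 3.5.0)" or wget "$(release_url node_exporter 1.6.1)".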

Configuration file

[root@prometheus prometheus]# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Scrape targets every 15 seconds (the built-in default is 1 minute).
  evaluation_interval: 15s # Evaluate rules every 15 seconds (the built-in default is 1 minute).
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs: # default scrape job
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]
       # The label name is added as a label `label_name=<label_value>` to any timeseries scraped from this config.
        labels:
          app: "prometheus"

Create a systemd unit to manage the service

vim  /usr/lib/systemd/system/prometheus.service

[Unit]
Description=Prometheus
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/opt/prometheus/prometheus \
--config.file=/opt/prometheus/prometheus.yml \
--storage.tsdb.path=/opt/prometheus/data \
--storage.tsdb.retention.time=15d \
--web.console.templates=/opt/prometheus/consoles \
--web.console.libraries=/opt/prometheus/console_libraries \
--web.max-connections=512 \
--web.external-url "http://<server-IP>:9090" \
--web.listen-address "0.0.0.0:9090" \
--web.enable-admin-api \
--web.enable-lifecycle
Restart=on-failure

[Install]
WantedBy=multi-user.target
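Rather than editing the IP placeholder in --web.external-url by hand, the unit can be generated with the address filled in. This is a minimal sketch: render_prometheus_unit and the @SERVER_IP@ marker are hypothetical names, and the unit text is copied from the file above.

```shell
#!/usr/bin/env bash
# render_prometheus_unit SERVER_IP   (hypothetical helper)
# Prints the prometheus.service unit with the external URL filled in.
render_prometheus_unit() {
  local ip="$1"
  if [ -z "$ip" ]; then
    echo "usage: render_prometheus_unit <server-ip>" >&2
    return 1
  fi
  local tmpl
  # Quoted heredoc ('EOF'): nothing is expanded, so the trailing
  # backslashes and the double quotes are kept literally.
  tmpl=$(cat <<'EOF'
[Unit]
Description=Prometheus
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/opt/prometheus/prometheus \
--config.file=/opt/prometheus/prometheus.yml \
--storage.tsdb.path=/opt/prometheus/data \
--storage.tsdb.retention.time=15d \
--web.console.templates=/opt/prometheus/consoles \
--web.console.libraries=/opt/prometheus/console_libraries \
--web.max-connections=512 \
--web.external-url "http://@SERVER_IP@:9090" \
--web.listen-address "0.0.0.0:9090" \
--web.enable-admin-api \
--web.enable-lifecycle
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
)
  # Replace every @SERVER_IP@ marker with the given address.
  printf '%s\n' "${tmpl//@SERVER_IP@/$ip}"
}
```

Usage: render_prometheus_unit 192.0.2.10 > /usr/lib/systemd/system/prometheus.service, then systemctl daemon-reload. The heredoc is quoted on purpose: in an unquoted heredoc, a backslash at the end of a line is treated as a line continuation and the ExecStart lines would be joined.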

Start and verify

systemctl daemon-reload
systemctl enable prometheus
systemctl start  prometheus
systemctl status prometheus
# Check that the service is listening on port 9090
ss -tunlp | grep 9090

Open http://<server-IP>:9090 in a browser.

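On the targets page, click the URL in the target's Endpoint column to see the raw data the exporter exposes. Copy any metric name from that page, for example go_gc_pauses_seconds_count, and query it; it should return data.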
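The same check can be done from the shell. The sketch below (metric_present is a hypothetical helper) reads the text served at /metrics and succeeds if the named metric has at least one sample. It assumes curl is installed on the server.

```shell
#!/usr/bin/env bash
# metric_present NAME   (hypothetical helper)
# Reads Prometheus text exposition format on stdin (the page behind a
# target's /metrics endpoint) and succeeds if NAME has at least one sample.
metric_present() {
  local name="$1"
  # A sample line starts with the metric name followed by '{' (labels)
  # or a space. "# HELP" / "# TYPE" lines start with '#' and never match.
  grep -Eq "^${name}(\{| )"
}
```

Usage: curl -s http://localhost:9090/metrics | metric_present go_gc_pauses_seconds_count && echo ok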

Deploying node_exporter

node_exporter collects the host's system metrics; this guide uses the official exporter provided by the Prometheus project.

Installation

# Install node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar -zxvf node_exporter-1.6.1.linux-amd64.tar.gz -C /opt/
cd /opt/
mv node_exporter-1.6.1.linux-amd64/  node_exporter

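# Add the user (if Prometheus is already installed on this host, the group and user exist and only the chown is needed)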
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
chown -R prometheus:prometheus /opt/node_exporter

# Create a systemd unit so it starts at boot
vim  /lib/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node_exporter
After=network.target prometheus.service

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/opt/node_exporter/node_exporter --web.listen-address=0.0.0.0:9101
Restart=on-failure

[Install]
WantedBy=multi-user.target

Enable and start the service

systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter

Add node_exporter to the Prometheus configuration

cat >> /opt/prometheus/prometheus.yml <<EOF
  - job_name: 'node'
    static_configs:
    - targets: ['<server-IP>:9101']
EOF

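Note: the job must be appended to /opt/prometheus/prometheus.yml, the file Prometheus is started with (--config.file). Appending it anywhere else means Prometheus will have no node target and the Grafana dashboards will show no data. Appending with >> also only works because scrape_configs: is the last section of that file.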
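The heredoc above appends unconditionally, so running it twice adds a second node job. A sketch of an idempotent version follows; append_node_job is a hypothetical helper, and like the heredoc it assumes scrape_configs: is the last top-level section of the file.

```shell
#!/usr/bin/env bash
# append_node_job FILE TARGET   (hypothetical helper)
# Appends a 'node' scrape job to FILE unless a job named node already exists.
append_node_job() {
  local file="$1" target="$2"
  if [ ! -f "$file" ]; then
    echo "config not found: $file" >&2
    return 1
  fi
  # Matches job_name: node, job_name: 'node' and job_name: "node".
  if grep -Eq "job_name:[[:space:]]*['\"]?node['\"]?[[:space:]]*$" "$file"; then
    echo "job 'node' already present in $file"
    return 0
  fi
  # Make sure the file ends with a newline so the job starts on its own line.
  if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
    echo >> "$file"
  fi
  cat >> "$file" <<EOF
  - job_name: 'node'
    static_configs:
    - targets: ['${target}']
EOF
}
```

Usage: append_node_job /opt/prometheus/prometheus.yml 192.0.2.10:9101, then restart Prometheus.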

Restart Prometheus

systemctl restart prometheus.service

Verification

View the metrics at http://<server-IP>:9101/metrics

As in the Prometheus verification step, pick any metric from that page and check that it returns data.

Check the targets page: a node target should now be listed.
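The same information is available from the Prometheus HTTP API at /api/v1/targets. The sketch below (target_health_summary is a hypothetical helper) counts targets by health without needing jq. It assumes the compact JSON Prometheus returns, where each active target carries a "health" field of up, down or unknown.

```shell
#!/usr/bin/env bash
# target_health_summary   (hypothetical helper)
# Reads /api/v1/targets JSON on stdin and prints "up=N down=M unknown=K".
target_health_summary() {
  # grep -o puts each "health":"<state>" match on its own line; splitting
  # on double quotes makes the state the 4th field.
  grep -o '"health":"[a-z]*"' |
    awk -F'"' '{ c[$4]++ }
      END { printf "up=%d down=%d unknown=%d\n", c["up"], c["down"], c["unknown"] }'
}
```

Usage: curl -s http://localhost:9090/api/v1/targets | target_health_summary. With both jobs healthy it should report up=2.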

Deploying Grafana

Installation and configuration

# Install
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-10.2.0-1.x86_64.rpm
yum -y install grafana-enterprise-10.2.0-1.x86_64.rpm

# Enable at boot and start now
systemctl enable grafana-server
systemctl start grafana-server

Logging in

Open http://<server-IP>:3000. The default account/password is admin/admin; you are asked to change the admin password on first login.

Add a data source

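For the URL, either http://localhost:9090/ or http://<server-IP>:9090/ works; localhost only works when Grafana runs on the same server as Prometheus.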

Then click Save & test.
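As an alternative to the UI steps, Grafana can load data sources from provisioning files when it starts. This sketch assumes the RPM's default provisioning directory, /etc/grafana/provisioning/datasources; the helper name write_prom_datasource and the file name prometheus.yaml are hypothetical. Use it instead of the manual steps rather than in addition to them.

```shell
#!/usr/bin/env bash
# write_prom_datasource DIR URL   (hypothetical helper)
# Writes a Grafana provisioning file that registers Prometheus at URL
# as the default data source.
write_prom_datasource() {
  local dir="$1" url="$2"
  if [ -z "$dir" ] || [ -z "$url" ]; then
    echo "usage: write_prom_datasource <provisioning-dir> <prometheus-url>" >&2
    return 1
  fi
  mkdir -p "$dir" || return 1
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}
```

Usage: write_prom_datasource /etc/grafana/provisioning/datasources http://localhost:9090, then systemctl restart grafana-server so the file is read.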

Import a dashboard

Dashboards -> New -> Import

In field 1, enter the ID of a dashboard published on grafana.com; either 11074 or 16098 works.

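Give the dashboard any name, leave field 2 at its default, and click field 3, then select the Prometheus data source that appears. Finally click Import.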

Possible problems

Xshell cannot transfer files to the remote host

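Cause: the remote server does not have the lrzsz package installed, which provides the rz/sz commands Xshell uses to transfer files.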

Fix: install it on the server:

yum install lrzsz

Download timeouts

Cause: the server's connection to GitHub is slow or blocked.

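Fix: change the server's network settings, or download the files on another machine and upload them to the server with a remote tool such as Xshell.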
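When GitHub is slow or unreachable, the download step can be made to reuse a tarball that was uploaded manually. A minimal sketch: fetch_tarball is a hypothetical helper, and the 30-second timeout and 3 retries are arbitrary choices.

```shell
#!/usr/bin/env bash
# fetch_tarball URL   (hypothetical helper)
# Reuses a non-empty copy already in the current directory (e.g. uploaded
# with Xshell/rz); otherwise downloads it with a timeout and retries.
fetch_tarball() {
  local url="$1"
  local file="${url##*/}"   # file name = last path component of the URL
  if [ -s "$file" ]; then
    echo "using local $file"
    return 0
  fi
  # On failure, remove the empty/partial file so the next run retries.
  wget --timeout=30 --tries=3 -O "$file" "$url" || { rm -f "$file"; return 1; }
}
```

Usage: cd /opt && fetch_tarball https://github.com/prometheus/prometheus/releases/download/v3.5.0/prometheus-3.5.0.linux-amd64.tar.gz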

Grafana dashboards show no data

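Go back to the node_exporter verification step and check whether a node target appears. If it does not, check that prometheus.yml was edited correctly when adding node_exporter. Under scrape_configs:, the jobs should look like this, with both - job_name entries indented the same: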

scrape_configs:
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["<server-IP>:9090"]
        # The label name is added as a label `label_name=<label_value>` to any timeseries scraped from this config.
        labels:
          app: "prometheus"

  - job_name: 'node'
    static_configs:
    - targets: ['<server-IP>:9101']

Also check that the Prometheus data source URL configured in Grafana is correct.
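The authoritative check is promtool check config /opt/prometheus/prometheus.yml (promtool ships in the Prometheus tarball). A common way to break this file is indenting one - job_name entry differently from the others; the sketch below (check_job_indent is a hypothetical helper) flags exactly that case and nothing else.

```shell
#!/usr/bin/env bash
# check_job_indent FILE   (hypothetical helper)
# Fails if the "- job_name:" entries in FILE use more than one indentation width.
check_job_indent() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "config not found: $file" >&2
    return 2
  fi
  if ! awk '
    /^[ \t]*-[ \t]+job_name:/ {
      match($0, /^[ \t]*/)   # RLENGTH = width of the leading whitespace
      widths[RLENGTH] = 1
    }
    END {
      n = 0
      for (w in widths) n++
      exit (n > 1)           # non-zero exit if more than one width was seen
    }' "$file"; then
    echo "inconsistent job_name indentation in $file" >&2
    return 1
  fi
}
```

Usage: check_job_indent /opt/prometheus/prometheus.yml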

Cannot open the Prometheus or Grafana web pages

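If you are running this on an Alibaba Cloud server, check that the security group allows inbound traffic on the ports in use, such as 9090, 9101, and 3000 (Grafana).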
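Security groups are configured in the Alibaba Cloud console. If the server itself also runs firewalld (an assumption; it is the default firewall on CentOS/RHEL), the ports have to be opened there as well. A sketch with a dry-run switch; open_ports and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# open_ports PORT...   (hypothetical helper)
# Opens each TCP port permanently in firewalld, then reloads the rules.
# With DRY_RUN=1 it only prints the firewall-cmd commands it would run.
open_ports() {
  local run=""
  [ "${DRY_RUN:-0}" = "1" ] && run="echo"
  local port
  for port in "$@"; do
    $run firewall-cmd --permanent --add-port="${port}/tcp" || return 1
  done
  $run firewall-cmd --reload
}
```

Usage: DRY_RUN=1 open_ports 9090 9101 3000 to review the commands, then run it again without DRY_RUN as root.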
