Kubernetes Monitoring in Practice (Part 1): Deploying Prometheus and Node Exporter

Published: 2025-05-09


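As containerization and microservice architectures spread, systems grow more complex, and traditional monitoring tools struggle to provide full observability across many services and nodes. Prometheus, one of the core projects of the Cloud Native Computing Foundation (CNCF), has become the mainstream monitoring solution for container environments thanks to its flexible data model, its powerful query language (PromQL), and its native support for Kubernetes.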

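This series covers deploying and operating Prometheus on Kubernetes. This first article focuses on the basic Prometheus deployment and integrates Node Exporter to monitor node resources (CPU, memory, disk, network, and so on) in real time, laying the groundwork for the service metrics collection and alerting covered in later articles.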

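I. Introduction to Prometheus

Prometheus uses a pull-based model: it periodically scrapes the metrics that monitored targets expose and stores all data as time series in a local time series database that supports efficient compression and fast queries. Its core components are:

  • Prometheus Server: the main service, responsible for service discovery, scraping, storage, and PromQL query processing
  • Exporter: a collector that exposes metrics, such as Node Exporter (host metrics), Kube-State-Metrics (Kubernetes object state), and Blackbox Exporter (probing)
  • Alertmanager: receives and manages alerts from Prometheus and sends notifications via email, Slack, webhooks, and other channels
  • Pushgateway (optional): lets short-lived jobs, such as scheduled scripts, push their metrics
  • Web UI and HTTP API: a basic query and visualization interface, plus an integration point for external systems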

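On Kubernetes, Prometheus uses its built-in service discovery to find Pods, Services, Nodes, and other resources dynamically, so scrape targets are registered and scraped automatically. This makes it well suited to monitoring containerized, microservice-based systems.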
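
To make the discovery mechanism concrete, here is a minimal sketch of a scrape job that discovers Pods through the Kubernetes API. It is not part of the configuration used later in this article: the job name and the choice of the prometheus.io/scrape annotation as an opt-in switch are assumptions made for illustration.

```yaml
# Fragment for the scrape_configs section of prometheus.yml (illustrative sketch).
scrape_configs:
- job_name: 'kubernetes-pods'        # hypothetical job name
  kubernetes_sd_configs:
  - role: pod                        # discover Pods through the Kubernetes API
  relabel_configs:
  # Keep only Pods annotated with prometheus.io/scrape: "true".
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  # Copy the namespace and Pod name into regular labels.
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
```

New Pods that carry the annotation become targets without any change to the Prometheus configuration, which is the main advantage over static target lists.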


II. Deploying Prometheus and Node Exporter

1. Create the Namespace (prometheus-namespace.yaml)

Create a namespace named prometheus to isolate the monitoring resources.

apiVersion: v1
kind: Namespace
metadata:
  name: prometheus

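2. Create the Node Exporter DaemonSet (node-exporter.yaml)

Deploy Node Exporter as a DaemonSet so that one instance runs on every node and collects basic node metrics such as CPU, memory, disk, and network.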

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: prometheus
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true  # use the host network so Prometheus can scrape each node on port 9100
      containers:
      - name: node-exporter
        image: harbor.local/k8s/node-exporter:v1.7.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9100
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        # exclude pseudo and system mount points from the filesystem collector
        - --collector.filesystem.mount-points-exclude
        - '^/(sys|proc|dev|host|etc)($|/)'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      tolerations:
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /  # host root filesystem

3. Create the ServiceAccount (prometheus-serviceaccount.yaml)

Create a dedicated ServiceAccount for Prometheus; permissions are bound to it in the next step.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: prometheus

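4. Create the ClusterRoleBinding (prometheus-clusterrolebinding.yaml)

Bind the cluster-admin ClusterRole to the Prometheus ServiceAccount so that it can query the Kubernetes API to discover scrape targets. Note that cluster-admin grants far more than the read access Prometheus needs, so a narrower role is preferable outside test environments.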

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-clusterrolebinding
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: prometheus
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
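
As an alternative to cluster-admin, the following is a minimal sketch of a read-only ClusterRole and a matching binding. The names prometheus-read and prometheus-read-binding are hypothetical, and the resource list is an assumption based on what the scrape jobs in this article query (node, endpoints, and the cAdvisor proxy path). Verify it against your own jobs before relying on it.

```yaml
# Read-only ClusterRole for Prometheus (illustrative; names are hypothetical).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-read
rules:
# Objects that kubernetes_sd_configs lists and watches; nodes/proxy is
# needed for the cAdvisor job that goes through the API server proxy.
- apiGroups: [""]
  resources: [nodes, nodes/proxy, nodes/metrics, services, endpoints, pods]
  verbs: [get, list, watch]
# Allows scraping the API server's own /metrics endpoint.
- nonResourceURLs: ["/metrics"]
  verbs: [get]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-read-binding
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: prometheus
roleRef:
  kind: ClusterRole
  name: prometheus-read
  apiGroup: rbac.authorization.k8s.io
```

If a target later shows 403 errors on the Targets page, the role is missing a resource or verb that the corresponding job needs.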

5. Create the ConfigMap (prometheus-config.yaml)

Define the main Prometheus configuration file, prometheus.yml, which sets the scrape rules, service discovery, static targets, and blackbox probes.

---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: prometheus
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ['alertmanager.prometheus.svc.cluster.local:9093']
    rule_files:
      - "rules/*.yml"
    scrape_configs:
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    - job_name: kubernetes-etcd
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: kube-system;etcd
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name

    - job_name: 'k8s-cluster'
      scrape_interval: 15s
      static_configs:
        - targets: ['192.168.100.101:9100']
          labels:
            instance: 'master-100.101'
        - targets: ['192.168.100.102:9100']
          labels:
            instance: 'node1-100.102'
        - targets: ['192.168.100.103:9100']
          labels:
            instance: 'node2-100.103'

    - job_name: 'alertmanager'
      scrape_interval: 15s
      static_configs:
      - targets: ['alertmanager.prometheus.svc.cluster.local:9093']
 
    - job_name: "blackbox_http"
      metrics_path: /probe
      params:
        module: [http_2xx]
      static_configs:
        - targets:
          - https://www.baidu.com
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 192.168.100.104:9115

    - job_name: "blackbox_tcp"
      metrics_path: /probe
      params:
        module: [tcp_connect]
      static_configs:
        - targets: 
          - 192.168.100.101:3306
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 192.168.100.104:9115
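
The kubernetes-service-endpoints job above only scrapes Services that opt in through annotations. As a sketch of what such a Service could look like, the manifest below uses a hypothetical application named my-app listening on port 8080; only the three prometheus.io annotations correspond to the relabel rules in the ConfigMap.

```yaml
# Hypothetical application Service; only the annotations matter here.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
  annotations:
    prometheus.io/scrape: "true"    # matched by the keep rule
    prometheus.io/port: "8080"      # rewrites the port in __address__
    prometheus.io/path: "/metrics"  # copied into __metrics_path__
spec:
  selector:
    app: my-app
  ports:
  - name: http
    port: 8080
    targetPort: 8080
```

Annotation values must be strings, hence the quotes around true and 8080. Services without the scrape annotation are dropped by the keep rule and never become targets.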

6. Create the Service (prometheus-svc.yaml)

Expose the Prometheus Web UI port (9090) through a NodePort Service so that it can be reached from outside the cluster.

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: prometheus
  labels:
    app: prometheus
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30001
      protocol: TCP
  selector:
    app: prometheus
    component: server

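7. Create the Deployment (prometheus-deploy.yaml)

Deploy the Prometheus Server: specify the configuration file and the volume mounts, and run it under the ServiceAccount created in step 3. This example runs a single instance. Making it highly available takes more than raising replicas, because the Pod is pinned to one node and stores its data on a hostPath volume.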

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: prometheus
  labels:
    app: prometheus
spec:
  replicas: 1  # single instance; extend for high availability if needed
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: node2    # pin the Pod to node2, where the hostPath directories below exist
      serviceAccountName: prometheus
      containers:
      - name: prometheus
        image: harbor.local/k8s/prometheus:v2.37.6
        imagePullPolicy: IfNotPresent
        command:
          - prometheus
          - --config.file=/etc/prometheus/prometheus.yml
          - --storage.tsdb.path=/prometheus  # local data storage path
          - --storage.tsdb.retention.time=30d  # keep 30 days of data
          - --web.enable-lifecycle    # allow configuration reloads via HTTP POST to /-/reload
          - --web.enable-admin-api
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus/
          name: prometheus-config
        - mountPath: /prometheus/
          name: prometheus-storage-volume
        - mountPath: /etc/prometheus/rules
          name: prometheus-rules-volume
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
        - name: prometheus-storage-volume
          hostPath:
            # Create both directories on node2 beforehand and make them writable, e.g. with chmod -R 777
            path: /kubernetes/prometheus/data/
            type: Directory
        - name: prometheus-rules-volume
          hostPath:
            path: /kubernetes/prometheus/rules
            type: Directory
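
The rules directory mounted above is where the rule_files entry "rules/*.yml" in prometheus.yml looks for rule files, since that path is resolved relative to /etc/prometheus. As a minimal sketch, a file such as the following could be placed in /kubernetes/prometheus/rules on node2. The file name, group name, alert name, and threshold are all assumptions chosen for illustration.

```yaml
# /kubernetes/prometheus/rules/node-alerts.yml on node2 (hypothetical file name)
groups:
- name: node-availability
  rules:
  - alert: NodeExporterDown
    # up is 0 when the most recent scrape of a target failed
    expr: up{job="kubernetes-node"} == 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Node Exporter on {{ $labels.instance }} is unreachable"
```

Because the Deployment sets --web.enable-lifecycle, an HTTP POST to the /-/reload endpoint should make Prometheus pick up new rule files without a restart.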

8. Deploy all resources

kubectl apply -f prometheus-namespace.yaml
kubectl apply -f node-exporter.yaml
kubectl apply -f prometheus-serviceaccount.yaml
kubectl apply -f prometheus-clusterrolebinding.yaml
kubectl apply -f prometheus-config.yaml
kubectl apply -f prometheus-svc.yaml
kubectl apply -f prometheus-deploy.yaml
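
As an optional alternative to applying the seven files one by one, the manifests can be grouped with Kustomize, which is built into kubectl. The sketch below assumes a file named kustomization.yaml sits in the same directory as the manifests; it is a convenience and not part of the original setup.

```yaml
# kustomization.yaml, placed next to the seven manifests (optional)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- prometheus-namespace.yaml
- node-exporter.yaml
- prometheus-serviceaccount.yaml
- prometheus-clusterrolebinding.yaml
- prometheus-config.yaml
- prometheus-svc.yaml
- prometheus-deploy.yaml
```

With this file in place, running kubectl apply -k . from that directory applies everything in one step.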

III. Verifying the Deployment

1. Check Pod status

kubectl get pod -n prometheus


2. Access the Prometheus Web UI

Open http://ip:30001 in a browser to reach the Prometheus Web UI, replacing ip with the address of any cluster node.



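Summary

🚀 This article showed how to deploy Prometheus in a Kubernetes cluster and pair it with Node Exporter for basic node resource monitoring. With the ServiceAccount, RBAC permissions, the Prometheus configuration file, and the Node Exporter DaemonSet in place, the basic monitoring pipeline from data collection to visualization is complete.
✅ The next article continues building out the monitoring stack by integrating Alertmanager with Prometheus for automated, metric-based alerting. It covers alert rule configuration, notification channels (such as email and webhooks), and how Alertmanager and Prometheus work together, giving operations teams real-time warnings.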

