Prometheus Federation and Long-Term Storage: Large-Scale Monitoring, from a Single Cluster to a Global View

1. Scaling Limits of a Single Prometheus: Data Retention and Query Performance

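Prometheus's local storage (TSDB) is optimized for fast queries over recent data and keeps 15 days of data by default. Once monitoring grows to thousands of instances and millions of time series, a single instance runs into two hard limits. The first is storage capacity: disk usage grows with the number of active series, the scrape frequency, and the retention period. At roughly 1-2 bytes per sample, about 10 million active series scraped every 15 seconds already fill on the order of 1 TB within the 15-day window, so keeping months of data on local disk becomes very expensive. The second is query performance: wide-range queries (such as a 30-day trend, which also requires raising retention above the default) can take tens of seconds on the local TSDB.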
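As a rough sanity check on local disk requirements, the Prometheus storage documentation gives a sizing rule of retention time multiplied by ingested samples per second and by bytes per sample (typically 1-2 bytes after compression). The inputs below (10 million active series, a 15-second scrape interval, 1.5 bytes per sample, 15 days of retention) are assumptions chosen for illustration, not measurements:

```latex
\begin{align*}
\text{disk} &\approx \text{retention (s)} \times \text{samples/s} \times \text{bytes/sample} \\
\text{samples/s} &= \frac{10^{7}\ \text{series}}{15\ \text{s}} \approx 6.7 \times 10^{5}\ \text{s}^{-1} \\
\text{disk} &\approx (15 \times 86400\ \text{s}) \times 6.7 \times 10^{5}\ \text{s}^{-1} \times 1.5\ \text{B}
  \approx 1.3 \times 10^{12}\ \text{B} \approx 1.3\ \text{TB}
\end{align*}
```

Extending retention to 90 days under the same assumptions multiplies the result by six, which is why long retention is usually moved off local disk.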

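Prometheus federation and remote storage (Remote Write) are two complementary answers to the scaling problem: federation aggregates a global view across clusters, while remote storage handles long-term archiving and complex queries.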

2. Architecture and Data Flow for Large-Scale Monitoring

flowchart TB
    A[Cluster A: Prometheus] --> B[Remote Write]
    C[Cluster B: Prometheus] --> D[Remote Write]
    E[Cluster C: Prometheus] --> F[Remote Write]
    B --> G[Thanos Receive / Cortex]
    D --> G
    F --> G
    G --> H[Object storage: S3/MinIO]
    G --> I[Long-term queries: Thanos Query]
    A --> J[Federation scrape]
    C --> J
    E --> J
    J --> K[Global Prometheus]
    K --> L[Grafana global view]
    subgraph short_term["Short-term queries"]
        A
        C
        E
    end
    subgraph long_term["Long-term storage"]
        G
        H
        I
    end

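Federation suits aggregated global metrics (for example, total QPS across all clusters); remote storage suits long-term trend analysis (for example, 90 days of capacity-planning data).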
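The federation path can be sketched as follows, assuming each cluster-level Prometheus pre-aggregates with a recording rule and the global Prometheus scrapes only those aggregates from the `/federate` endpoint. The metric `http_requests_total`, the rule name `cluster:http_requests:rate5m`, and the target hostnames are hypothetical; the `cluster` label is expected to come from each server's `external_labels`.

```yaml
# rules.yml on each cluster-level Prometheus (hypothetical metric and rule names)
groups:
  - name: cluster-aggregates
    rules:
      # Pre-aggregate per-cluster QPS so the global server pulls one series
      # per cluster instead of one per pod
      - record: cluster:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m]))
---
# prometheus.yml on the global Prometheus
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 30s
    honor_labels: true            # keep the labels set by the source servers
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~"cluster:.*"}'   # pull only pre-aggregated series
    static_configs:
      - targets:                  # hypothetical addresses
          - 'prometheus.cluster-a.example:9090'
          - 'prometheus.cluster-b.example:9090'
          - 'prometheus.cluster-c.example:9090'
```

With this in place, total QPS across clusters would be `sum(cluster:http_requests:rate5m)` on the global Prometheus. Restricting `match[]` to aggregates keeps the federation scrape small; federating raw series tends to reproduce the single-instance bottleneck one level up.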

3. Production-Grade Configuration: Long-Term Storage with Thanos

# prometheus-with-remote-write.yaml: Prometheus remote-write configuration
# Intent: stream metrics to Thanos Receive in real time for long-term storage
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      # External labels are attached to every sample sent via remote write
      # and identify where the data came from
      external_labels:
        cluster: 'cluster-a'
        replica: 'prometheus-0'
    # Remote write: stream samples to Thanos Receive in real time
    remote_write:
      - url: http://thanos-receive:19291/api/v1/receive
        queue_config:
          max_samples_per_send: 500
          max_shards: 10
          capacity: 10000
        # Filter series before they are sent
        write_relabel_configs:
          - source_labels: [__name__]
            regex: 'go_.*'
            action: drop  # drop Go runtime metrics to reduce stored volume
    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true

# thanos-receive.yaml: Thanos Receive deployment
# Intent: accept Prometheus remote-write data and upload it to object storage
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-receive
spec:
  replicas: 3
  selector:
    matchLabels:
      app: thanos-receive
  template:
    metadata:
      labels:
        app: thanos-receive
    spec:
      containers:
        - name: thanos-receive
          image: thanosio/thanos:v0.34.0
          args:
            - receive
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
            - --remote-write.address=0.0.0.0:19291
            - --tsdb.path=/data
            - --tsdb.retention=48h  # keep 48 hours locally
            - --label=receive_replica="$(POD_NAME)"
            - --objstore.config-file=/etc/thanos/objstore.yml
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          ports:
            - containerPort: 10901
            - containerPort: 10902
            - containerPort: 19291
          volumeMounts:
            - name: data
              mountPath: /data
            - name: objstore-config
              mountPath: /etc/thanos
      volumes:
        - name: data
          # emptyDir loses blocks not yet uploaded when the pod is rescheduled;
          # for production consider a StatefulSet with persistent volumes
          emptyDir: {}
        - name: objstore-config
          configMap:
            name: thanos-objstore-config

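# thanos-objstore-config.yaml: object storage configuration
# Intent: Thanos uploads historical data to S3-compatible storage
apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-objstore-config
data:
  objstore.yml: |
    type: s3
    config:
      bucket: thanos-storage
      endpoint: minio.minio.svc:9000
      # Placeholders: substitute real values at deploy time (for example with
      # envsubst). Thanos is not expected to expand ${...} in this file itself.
      # Credentials are better kept in a Secret than in a ConfigMap.
      access_key: ${MINIO_ACCESS_KEY}
      secret_key: ${MINIO_SECRET_KEY}
      insecure: true  # no TLS for in-cluster traffic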

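# thanos-query.yaml: Thanos Query, the global query layer
# Intent: a single query entry point that fans out to recent data in the
# local TSDBs and to historical data in object storage
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
        - name: thanos-query
          image: thanosio/thanos:v0.34.0
          args:
            - query
            - --http-address=0.0.0.0:19192
            - --grpc-address=0.0.0.0:10901
            # --endpoint replaces the deprecated --store flag
            # Prometheus instances: only reachable if a Thanos Sidecar exposes the
            # StoreAPI next to each one; Prometheus itself does not serve gRPC
            - --endpoint=dnssrv+_grpc._tcp.prometheus.monitoring.svc.cluster.local
            # Thanos Receive
            - --endpoint=dnssrv+_grpc._tcp.thanos-receive.monitoring.svc.cluster.local
            # Object storage (via Store Gateway)
            - --endpoint=dnssrv+_grpc._tcp.thanos-store-gateway.monitoring.svc.cluster.local
            # Deduplicate series that differ only in these replica labels
            - --query.replica-label=replica
            - --query.replica-label=receive_replica
            # Auto-downsampling: wide-range queries read the downsampled blocks
            # produced by the Compactor
            - --query.auto-downsampling
          ports:
            - containerPort: 19192
            - containerPort: 10901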

4. Trade-offs: Architectural Choices for Long-Term Storage

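Storage cost versus query performance. Object storage is far cheaper than local SSD (often cited as roughly one tenth of the cost per GB), but query latency is higher because blocks have to be fetched from object storage. Thanos downsampling mitigates this: the Compactor produces 5-minute and 1-hour resolution copies of older blocks, and with auto-downsampling enabled Thanos Query answers wide-range queries from them, trading precision for speed. The downsampled copies are stored alongside the raw data, so they speed up queries rather than reduce storage.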

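Reliability of Remote Write. Remote Write reads samples from the Prometheus write-ahead log (WAL) and buffers them in per-shard in-memory queues. If Thanos Receive is unavailable, Prometheus retries and the backlog grows; samples are lost once the outage outlasts the WAL (roughly two hours with default settings) or Prometheus restarts without having caught up. Configure more than one Remote Write endpoint for redundancy, and monitor the backlog through metrics such as prometheus_remote_storage_samples_pending and the lag between prometheus_remote_storage_highest_timestamp_in_seconds and prometheus_remote_storage_queue_highest_sent_timestamp_seconds.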
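A minimal alerting sketch for Remote Write backlog, loosely modeled on the alerts shipped with the upstream Prometheus mixin. The alert names, thresholds (120 seconds of lag, 15 minutes of `for`), and severity labels are assumptions to tune per environment:

```yaml
# remote-write-alerts.yml (hypothetical file name)
groups:
  - name: remote-write
    rules:
      # Newest sample in the local TSDB vs. newest sample acknowledged by the
      # remote endpoint; a growing gap means the queue is falling behind
      - alert: RemoteWriteFallingBehind
        expr: |
          (
            max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds[5m])
            - ignoring(remote_name, url) group_right
            max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds[5m])
          ) > 120
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: 'Remote write to {{ $labels.url }} is more than 2 minutes behind'
      # Samples rejected with a non-recoverable error are dropped, not retried
      - alert: RemoteWriteSamplesFailing
        expr: rate(prometheus_remote_storage_samples_failed_total[5m]) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: 'Remote write to {{ $labels.url }} is dropping samples'
```

Alerting on lag rather than on queue length alone gives earlier warning, since the lag keeps growing while the in-memory queue is already full.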

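Data consistency. Federation and Remote Write can produce duplicate data when the same metric reaches the query layer through more than one path. Thanos Query deduplicates at query time: series that are identical except for a designated replica label are merged into one. This only works if every source sets consistent external_labels and the replica label is configured correctly on the Query side.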
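To illustrate the replica-label requirement, here is a sketch assuming a hypothetical high-availability pair of Prometheus servers in one cluster. Their `external_labels` are identical except for `replica`, and Thanos Query is told to treat that label as the deduplication key:

```yaml
# prometheus-0 (hypothetical HA replica 1)
global:
  external_labels:
    cluster: cluster-a
    replica: prometheus-0
---
# prometheus-1 (hypothetical HA replica 2): identical except for `replica`
global:
  external_labels:
    cluster: cluster-a
    replica: prometheus-1
---
# Thanos Query container args: series that differ only in `replica`
# are merged into one at query time
args:
  - query
  - --query.replica-label=replica
```

If the two servers also differed in `cluster`, or if `replica` were missing on one of them, Query would treat their series as distinct and dashboards would show every value twice.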

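Operational complexity. Thanos consists of many components (Receive, Query, Store Gateway, Compactor, Ruler), so deployment and day-to-day operations are costly. For small and medium deployments (< 1000 instances), consider an all-in-one alternative such as Cortex or Mimir, both of which can run as a single process (monolithic mode), to reduce the operational burden.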

5. Summary

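Prometheus federation and long-term storage are the key scaling capabilities of a large monitoring system. A rollout path: first, configure Remote Write on each Prometheus to stream data into centralized storage; second, deploy Thanos Query as the global query entry point; third, configure object storage and auto-downsampling to support long-term trend queries; fourth, set up data lifecycle management so that expired data is cleaned up automatically. The core principle: short-term queries hit the local TSDB (fast), long-term queries hit object storage (cheap), and Thanos Query provides a single entry point for both.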
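The downsampling and lifecycle steps are both handled by the Thanos Compactor, which the configuration in section 3 does not include. A minimal sketch, assuming the same image and the `thanos-objstore-config` ConfigMap used above; the resource name and the retention periods (30 days raw, 180 days at 5-minute resolution, 1 year at 1-hour resolution) are assumptions to adapt:

```yaml
# thanos-compactor.yaml (hypothetical): compaction, downsampling, retention
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-compactor
spec:
  replicas: 1                 # run only one Compactor against a given bucket
  serviceName: thanos-compactor
  selector:
    matchLabels:
      app: thanos-compactor
  template:
    metadata:
      labels:
        app: thanos-compactor
    spec:
      containers:
        - name: thanos-compactor
          image: thanosio/thanos:v0.34.0
          args:
            - compact
            - --wait                          # keep running instead of exiting after one pass
            - --data-dir=/data
            - --objstore.config-file=/etc/thanos/objstore.yml
            # Blocks older than these limits are deleted from the bucket
            - --retention.resolution-raw=30d
            - --retention.resolution-5m=180d
            - --retention.resolution-1h=1y
          volumeMounts:
            - name: data
              mountPath: /data
            - name: objstore-config
              mountPath: /etc/thanos
      volumes:
        - name: data
          emptyDir: {}        # scratch space only; size it for the largest compaction
        - name: objstore-config
          configMap:
            name: thanos-objstore-config
```

Keeping raw data for a shorter period than the downsampled resolutions bounds storage growth while preserving long-range trends, at the cost of losing fine-grained detail for older time ranges.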
