Scraping custom metrics

Let groundcover automatically scrape your custom metrics

groundcover can scrape your custom metrics by deploying a metrics scraper (vmagent, by VictoriaMetrics) that automatically scrapes Prometheus targets.

vmagent is fully compatible with the Prometheus scrape job syntax; more details can be found in the VictoriaMetrics vmagent documentation.

Enabling custom metrics scraping

The following Helm override enables custom metrics scraping:

custom-metrics:
  enabled: true

Using CLI

Scrape your custom metrics using the groundcover CLI (this deploys the default scrape jobs):

groundcover deploy --custom-metrics
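
Once the deployment completes, you can verify that the metrics scraper is running (the exact workload name may vary between chart versions):

# list groundcover pods and look for a Running vmagent / custom-metrics pod
kubectl get pods -n groundcover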

Using Helm

When upgrading an existing installation, either create a new custom-values.yaml or edit your existing groundcover values.yaml:

custom-metrics:
  enabled: true

Then apply the override:

helm upgrade groundcover groundcover/groundcover -n groundcover -f <overrides.yaml> --reuse-values

Autodiscovery

Ensure that the Kubernetes resources that contain your Prometheus exporters have been deployed with the following annotations to enable scraping:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "<port>"
  # optional, if the path is not /metrics
  prometheus.io/path: "<metrics-path>"
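
For example, a minimal Service exposing a Prometheus exporter with these annotations (name, namespace, port, and path are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: my-app                      # illustrative name
  namespace: my-namespace           # illustrative namespace
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    # optional, since the path here is not /metrics
    prometheus.io/path: "/custom/metrics"
spec:
  selector:
    app: my-app
  ports:
    - name: metrics
      port: 9090
      targetPort: 9090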

By default, the following scrape jobs are deployed:

custom-metrics:
  enabled: true
  config:
    scrape_configs:
      ## COPY from Prometheus helm chart https://github.com/helm/charts/blob/master/stable/prometheus/values.yaml

      # Scrape config for API servers.
      #
      # Kubernetes exposes API servers as endpoints to the default/kubernetes
      # service so this uses `endpoints` role and uses relabelling to only keep
      # the endpoints associated with the default/kubernetes service using the
      # default named port `https`. This works for single API server deployments as
      # well as HA API server deployments.
      - job_name: "kubernetes-apiservers"
        honor_labels: true  
        kubernetes_sd_configs:
          - role: endpointslices
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          # If your node certificates are self-signed or use a different CA to the
          # master CA, certificate verification must be disabled, as done below.
          # Note that certificate verification is an integral part of a secure
          # infrastructure, so it should only be disabled in a controlled
          # environment. Remove the line below to re-enable verification.
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        # Keep only the default/kubernetes service endpoints for the https port. This
        # adds a target for each API server for which Kubernetes adds an endpoint to
        # the default/kubernetes service.
        relabel_configs:
          - action: drop
            source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_part_of
            regex: groundcover
          - source_labels:
              [
                __meta_kubernetes_namespace,
                __meta_kubernetes_service_name,
                __meta_kubernetes_endpoint_port_name,
              ]
            action: keep
            regex: default;kubernetes;https
      - job_name: "kubernetes-nodes"
        honor_labels: true
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          # If your node certificates are self-signed or use a different CA to the
          # master CA, certificate verification must be disabled, as done below.
          # Note that certificate verification is an integral part of a secure
          # infrastructure, so it should only be disabled in a controlled
          # environment. Remove the line below to re-enable verification.
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: drop
            source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_part_of
            regex: groundcover
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/$1/proxy/metrics
      # Scrape config for service endpoints.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
      # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
      # to set this to `https` & most likely set the `tls_config` of the scrape config.
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: If the metrics are exposed on a different port to the
      # service then set this appropriately.
      - job_name: "kubernetes-service-endpoints"
        honor_labels: true
        kubernetes_sd_configs:
          - role: endpointslices
        relabel_configs:
          - action: drop
            source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_part_of
            regex: groundcover
          - action: drop
            source_labels: [__meta_kubernetes_pod_container_init]
            regex: true
          - action: keep_if_equal
            source_labels:
              [
                __meta_kubernetes_service_annotation_prometheus_io_port,
                __meta_kubernetes_pod_container_port_number,
              ]
          - source_labels:
              [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels:
              [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels:
              [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels:
              [
                __address__,
                __meta_kubernetes_service_annotation_prometheus_io_port,
              ]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_container_name]
            target_label: container
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: service
          - source_labels: [__meta_kubernetes_service_name]
            target_label: job
            replacement: ${1}
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node
      # Scrape config for slow service endpoints; same as above, but with a larger
      # timeout and a larger interval
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/scrape-slow`: Only scrape services that have a value of `true`
      # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
      # to set this to `https` & most likely set the `tls_config` of the scrape config.
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: If the metrics are exposed on a different port to the
      # service then set this appropriately.
      - job_name: "kubernetes-service-endpoints-slow"
        honor_labels: true
        scrape_interval: 5m
        scrape_timeout: 30s
        kubernetes_sd_configs:
          - role: endpointslices
        relabel_configs:
          - action: drop
            source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_part_of
            regex: groundcover
          - action: drop
            source_labels: [__meta_kubernetes_pod_container_init]
            regex: true
          - action: keep_if_equal
            source_labels:
              [
                __meta_kubernetes_service_annotation_prometheus_io_port,
                __meta_kubernetes_pod_container_port_number,
              ]
          - source_labels:
              [__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
            action: keep
            regex: true
          - source_labels:
              [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels:
              [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels:
              [
                __address__,
                __meta_kubernetes_service_annotation_prometheus_io_port,
              ]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_container_name]
            target_label: container
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: service
          - source_labels: [__meta_kubernetes_service_name]
            target_label: job
            replacement: ${1}
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node
      # Example scrape config for probing services via the Blackbox Exporter.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/probe`: Only probe services that have a value of `true`
      - job_name: "kubernetes-services"
        honor_labels: true
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          - action: drop
            source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_part_of
            regex: groundcover
          - source_labels:
              [__meta_kubernetes_service_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__address__]
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: service
      # Example scrape config for pods
      #
      # The relabeling allows the actual pod scrape endpoint to be configured via the
      # following annotations:
      #
      # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the default of `9102`.
      - job_name: "kubernetes-pods"
        honor_labels: true
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - action: drop
            source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_part_of
            regex: groundcover
          - action: drop
            source_labels: [__meta_kubernetes_pod_container_init]
            regex: true
          - action: keep_if_equal
            source_labels:
              [
                __meta_kubernetes_pod_annotation_prometheus_io_port,
                __meta_kubernetes_pod_container_port_number,
              ]
          - source_labels:
              [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels:
              [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_container_name]
            target_label: container
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: service
          - source_labels: [__meta_kubernetes_service_name]
            target_label: job
            replacement: ${1}
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node

Disable autodiscovery scrape jobs

To disable the autodiscovery scrape jobs, provide the following override:

custom-metrics:
  enabled: true
  config:
    scrape_configs: []

Using custom scrape jobs

To deploy custom scrape jobs, create or add the following override:

custom-metrics:
  enabled: true
  extraScrapeConfigs:
    - job_name: "custom-scrape-job"
      ...
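
For instance, a complete static scrape job might look like the following sketch (job name and target are illustrative):

custom-metrics:
  enabled: true
  extraScrapeConfigs:
    - job_name: "my-app"            # illustrative job name
      scrape_interval: 30s
      metrics_path: /metrics
      static_configs:
        - targets: ["my-app.my-namespace.svc:9090"]   # illustrative target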

Cardinality limits

To safeguard groundcover's performance, default limits on metrics ingestion are in place:

remoteWrite.maxDailySeries: "1000000" # limits the number of unique series per day (daily churn rate)
remoteWrite.maxHourlySeries: "100000" # limits the number of active time series per hour

Increasing metrics cardinality

To increase these cardinality limits, apply the following overrides.

Increasing the cardinality parameters will increase memory/CPU consumption and might cause OOMKills or CPU throttling.

Please use with caution, and increase the custom metrics agent / metrics server resources accordingly.

custom-metrics:
  enabled: true
  extraArgs:
    remoteWrite.maxDailySeries: "<custom value>"
    remoteWrite.maxHourlySeries: "<custom value>"
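
For example, doubling the default limits (illustrative values; tune them to your workload):

custom-metrics:
  enabled: true
  extraArgs:
    remoteWrite.maxDailySeries: "2000000"   # illustrative value; default is 1000000
    remoteWrite.maxHourlySeries: "200000"   # illustrative value; default is 100000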

If you wish to increase the metrics server / custom metrics agent resources, use the following overrides:

custom-metrics:
  resources:
    limits:
      cpu: <>
      memory: <>
    requests:
      cpu: <>
      memory: <>

victoria-metrics-single:
  server:
    resources:
      limits:
        cpu: <>
        memory: <>
      requests:
        cpu: <>
        memory: <>
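
For example, a sketch with illustrative sizes (the right values depend on your metrics volume):

custom-metrics:
  resources:
    limits:
      cpu: "1"            # illustrative values throughout
      memory: 1Gi
    requests:
      cpu: 500m
      memory: 512Mi

victoria-metrics-single:
  server:
    resources:
      limits:
        cpu: "2"
        memory: 4Gi
      requests:
        cpu: "1"
        memory: 2Gi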

Accessing the custom metrics

Once the custom metrics module is up and running, you can access your metrics using groundcover's dashboard panels:

  • Create a new dashboard

  • Create a new panel

  • Select the cluster with custom-metrics enabled

  • Custom metrics will be available
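
For example, a panel could chart the request rate of a hypothetical custom metric using PromQL:

# per-second rate over the last 5 minutes (my_app_requests_total is a hypothetical metric)
rate(my_app_requests_total[5m])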
