Dashboards reference

This document contains a complete reference on Sourcegraph’s available dashboards, as well as details on how to interpret the panels and metrics.

To learn more about Sourcegraph’s metrics and how to view these dashboards, see our metrics guide.

Frontend

Serves all end-user browser and API requests.

Frontend: Search at a glance

frontend: 99th_percentile_search_request_duration

This panel indicates 99th percentile successful search request duration over 5m.

Managed by the Sourcegraph Search team.


frontend: 90th_percentile_search_request_duration

This panel indicates 90th percentile successful search request duration over 5m.

Managed by the Sourcegraph Search team.


frontend: hard_timeout_search_responses

This panel indicates hard timeout search responses every 5m.

Managed by the Sourcegraph Search team.


frontend: hard_error_search_responses

This panel indicates hard error search responses every 5m.

Managed by the Sourcegraph Search team.


frontend: partial_timeout_search_responses

This panel indicates partial timeout search responses every 5m.

Managed by the Sourcegraph Search team.


frontend: search_alert_user_suggestions

This panel indicates search alert user suggestions shown every 5m.

Managed by the Sourcegraph Search team.


frontend: page_load_latency

This panel indicates 90th percentile page load latency over all routes over 10m.

Managed by the Sourcegraph Core application team.


frontend: blob_load_latency

This panel indicates 90th percentile blob load latency over 10m.

Managed by the Sourcegraph Core application team.


Frontend: Search-based code intelligence at a glance

frontend: 99th_percentile_search_codeintel_request_duration

This panel indicates 99th percentile code-intel successful search request duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: 90th_percentile_search_codeintel_request_duration

This panel indicates 90th percentile code-intel successful search request duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: hard_timeout_search_codeintel_responses

This panel indicates hard timeout search code-intel responses every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: hard_error_search_codeintel_responses

This panel indicates hard error search code-intel responses every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: partial_timeout_search_codeintel_responses

This panel indicates partial timeout search code-intel responses every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: search_codeintel_alert_user_suggestions

This panel indicates search code-intel alert user suggestions shown every 5m.

Managed by the Sourcegraph Code-intelligence team.


Frontend: Search API usage at a glance

frontend: 99th_percentile_search_api_request_duration

This panel indicates 99th percentile successful search API request duration over 5m.

Managed by the Sourcegraph Search team.


frontend: 90th_percentile_search_api_request_duration

This panel indicates 90th percentile successful search API request duration over 5m.

Managed by the Sourcegraph Search team.


frontend: hard_timeout_search_api_responses

This panel indicates hard timeout search API responses every 5m.

Managed by the Sourcegraph Search team.


frontend: hard_error_search_api_responses

This panel indicates hard error search API responses every 5m.

Managed by the Sourcegraph Search team.


frontend: partial_timeout_search_api_responses

This panel indicates partial timeout search API responses every 5m.

Managed by the Sourcegraph Search team.


frontend: search_api_alert_user_suggestions

This panel indicates search API alert user suggestions shown every 5m.

Managed by the Sourcegraph Search team.


Frontend: Codeintel: Precise code intelligence usage at a glance

frontend: codeintel_resolvers_total

This panel indicates aggregate graphql operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_resolvers_99th_percentile_duration

This panel indicates 99th percentile successful aggregate graphql operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_resolvers_errors_total

This panel indicates aggregate graphql operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_resolvers_error_rate

This panel indicates aggregate graphql operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_resolvers_total

This panel indicates graphql operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_resolvers_99th_percentile_duration

This panel indicates 99th percentile successful graphql operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_resolvers_errors_total

This panel indicates graphql operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_resolvers_error_rate

This panel indicates graphql operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Frontend: Codeintel: Auto-index enqueuer

frontend: codeintel_autoindex_enqueuer_total

This panel indicates aggregate enqueuer operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_autoindex_enqueuer_99th_percentile_duration

This panel indicates 99th percentile successful aggregate enqueuer operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_autoindex_enqueuer_errors_total

This panel indicates aggregate enqueuer operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_autoindex_enqueuer_error_rate

This panel indicates aggregate enqueuer operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_autoindex_enqueuer_total

This panel indicates enqueuer operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_autoindex_enqueuer_99th_percentile_duration

This panel indicates 99th percentile successful enqueuer operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_autoindex_enqueuer_errors_total

This panel indicates enqueuer operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_autoindex_enqueuer_error_rate

This panel indicates enqueuer operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Frontend: Codeintel: dbstore stats

frontend: codeintel_dbstore_total

This panel indicates aggregate store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_dbstore_99th_percentile_duration

This panel indicates 99th percentile successful aggregate store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_dbstore_errors_total

This panel indicates aggregate store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_dbstore_error_rate

This panel indicates aggregate store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_dbstore_total

This panel indicates store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_dbstore_99th_percentile_duration

This panel indicates 99th percentile successful store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_dbstore_errors_total

This panel indicates store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_dbstore_error_rate

This panel indicates store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Frontend: Workerutil: lsif_indexes dbworker/store stats

frontend: workerutil_dbworker_store_codeintel_index_total

This panel indicates store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: workerutil_dbworker_store_codeintel_index_99th_percentile_duration

This panel indicates 99th percentile successful store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: workerutil_dbworker_store_codeintel_index_errors_total

This panel indicates store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: workerutil_dbworker_store_codeintel_index_error_rate

This panel indicates store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Frontend: Codeintel: lsifstore stats

frontend: codeintel_lsifstore_total

This panel indicates aggregate store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_lsifstore_99th_percentile_duration

This panel indicates 99th percentile successful aggregate store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_lsifstore_errors_total

This panel indicates aggregate store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_lsifstore_error_rate

This panel indicates aggregate store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_lsifstore_total

This panel indicates store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_lsifstore_99th_percentile_duration

This panel indicates 99th percentile successful store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_lsifstore_errors_total

This panel indicates store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_lsifstore_error_rate

This panel indicates store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Frontend: Codeintel: gitserver client

frontend: codeintel_gitserver_total

This panel indicates aggregate client operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_gitserver_99th_percentile_duration

This panel indicates 99th percentile successful aggregate client operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_gitserver_errors_total

This panel indicates aggregate client operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_gitserver_error_rate

This panel indicates aggregate client operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_gitserver_total

This panel indicates client operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_gitserver_99th_percentile_duration

This panel indicates 99th percentile successful client operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_gitserver_errors_total

This panel indicates client operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_gitserver_error_rate

This panel indicates client operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Frontend: Codeintel: uploadstore stats

frontend: codeintel_uploadstore_total

This panel indicates aggregate store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_uploadstore_99th_percentile_duration

This panel indicates 99th percentile successful aggregate store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_uploadstore_errors_total

This panel indicates aggregate store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_uploadstore_error_rate

This panel indicates aggregate store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_uploadstore_total

This panel indicates store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_uploadstore_99th_percentile_duration

This panel indicates 99th percentile successful store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_uploadstore_errors_total

This panel indicates store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_uploadstore_error_rate

This panel indicates store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Frontend: Batches: dbstore stats

frontend: batches_dbstore_total

This panel indicates aggregate store operations every 5m.

Managed by the Sourcegraph Batches team.


frontend: batches_dbstore_99th_percentile_duration

This panel indicates 99th percentile successful aggregate store operation duration over 5m.

Managed by the Sourcegraph Batches team.


frontend: batches_dbstore_errors_total

This panel indicates aggregate store operation errors every 5m.

Managed by the Sourcegraph Batches team.


frontend: batches_dbstore_error_rate

This panel indicates aggregate store operation error rate over 5m.

Managed by the Sourcegraph Batches team.


frontend: batches_dbstore_total

This panel indicates store operations every 5m.

Managed by the Sourcegraph Batches team.


frontend: batches_dbstore_99th_percentile_duration

This panel indicates 99th percentile successful store operation duration over 5m.

Managed by the Sourcegraph Batches team.


frontend: batches_dbstore_errors_total

This panel indicates store operation errors every 5m.

Managed by the Sourcegraph Batches team.


frontend: batches_dbstore_error_rate

This panel indicates store operation error rate over 5m.

Managed by the Sourcegraph Batches team.


Frontend: Out-of-band migrations: up migration invocation (one batch processed)

frontend: oobmigration_total

This panel indicates migration handler operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: oobmigration_99th_percentile_duration

This panel indicates 99th percentile successful migration handler operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: oobmigration_errors_total

This panel indicates migration handler operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: oobmigration_error_rate

This panel indicates migration handler operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Frontend: Out-of-band migrations: down migration invocation (one batch processed)

frontend: oobmigration_total

This panel indicates migration handler operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: oobmigration_99th_percentile_duration

This panel indicates 99th percentile successful migration handler operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: oobmigration_errors_total

This panel indicates migration handler operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: oobmigration_error_rate

This panel indicates migration handler operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Frontend: Internal service requests

frontend: internal_indexed_search_error_responses

This panel indicates internal indexed search error responses every 5m.

Managed by the Sourcegraph Search team.


frontend: internal_unindexed_search_error_responses

This panel indicates internal unindexed search error responses every 5m.

Managed by the Sourcegraph Search team.


frontend: internal_api_error_responses

This panel indicates internal API error responses every 5m by route.

Managed by the Sourcegraph Core application team.


frontend: 99th_percentile_gitserver_duration

This panel indicates 99th percentile successful gitserver query duration over 5m.

Managed by the Sourcegraph Core application team.


frontend: gitserver_error_responses

This panel indicates gitserver error responses every 5m.

Managed by the Sourcegraph Core application team.


frontend: observability_test_alert_warning

This panel indicates warning test alert metric.

Managed by the Sourcegraph Distribution team.


frontend: observability_test_alert_critical

This panel indicates critical test alert metric.

Managed by the Sourcegraph Distribution team.


Frontend: Database connections

frontend: max_open_conns

This panel indicates maximum open connections.

Managed by the Sourcegraph Core application team.


frontend: open_conns

This panel indicates established connections.

Managed by the Sourcegraph Core application team.


frontend: in_use

This panel indicates connections in use.

Managed by the Sourcegraph Core application team.


frontend: idle

This panel indicates idle connections.

Managed by the Sourcegraph Core application team.


frontend: mean_blocked_seconds_per_conn_request

This panel indicates mean blocked seconds per conn request.

Managed by the Sourcegraph Core application team.


frontend: closed_max_idle

This panel indicates closed by SetMaxIdleConns.

Managed by the Sourcegraph Core application team.


frontend: closed_max_lifetime

This panel indicates closed by SetConnMaxLifetime.

Managed by the Sourcegraph Core application team.


frontend: closed_max_idle_time

This panel indicates closed by SetConnMaxIdleTime.

Managed by the Sourcegraph Core application team.
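
These connection-pool panels correspond to Go's database/sql settings: the closed_max_idle, closed_max_lifetime, and closed_max_idle_time panels count connections closed by SetMaxIdleConns, SetConnMaxLifetime, and SetConnMaxIdleTime respectively. A minimal sketch of how such a pool is configured and inspected, where the driver, DSN, and limits are illustrative assumptions rather than Sourcegraph's actual values:

```go
package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/lib/pq" // Postgres driver; an illustrative choice
)

func main() {
    // Hypothetical DSN for illustration only.
    db, err := sql.Open("postgres", "postgres://sourcegraph@pgsql:5432/sourcegraph?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Pool limits; closures triggered by each limit are what the panels above count.
    db.SetMaxOpenConns(30)                  // upper bound reported by max_open_conns
    db.SetMaxIdleConns(10)                  // closures counted by closed_max_idle
    db.SetConnMaxLifetime(30 * time.Minute) // closures counted by closed_max_lifetime
    db.SetConnMaxIdleTime(5 * time.Minute)  // closures counted by closed_max_idle_time

    // db.Stats() exposes equivalent counters: OpenConnections, InUse, Idle,
    // WaitDuration, MaxIdleClosed, MaxLifetimeClosed, MaxIdleTimeClosed.
    s := db.Stats()
    log.Printf("open=%d in_use=%d idle=%d", s.OpenConnections, s.InUse, s.Idle)
}
```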


Frontend: Container monitoring (not available on server)

frontend: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod (frontend|sourcegraph-frontend) (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p (frontend|sourcegraph-frontend).
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' (frontend|sourcegraph-frontend) (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the (frontend|sourcegraph-frontend) container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs (frontend|sourcegraph-frontend) (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Core application team.


frontend: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Core application team.


frontend: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Core application team.


frontend: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with issues for this service's containers.

Managed by the Sourcegraph Core application team.


Frontend: Provisioning indicators (not available on server)

frontend: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Core application team.


frontend: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Core application team.


frontend: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Core application team.


frontend: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Core application team.


Frontend: Golang runtime monitoring

frontend: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Core application team.


frontend: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Core application team.


Frontend: Kubernetes monitoring (only available on Kubernetes)

frontend: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Core application team.


Frontend: Sentinel queries (only on sourcegraph.com)

frontend: mean_successful_sentinel_duration_5m

This panel indicates mean successful sentinel search duration over 5m.

Managed by the Sourcegraph Search team.


frontend: mean_sentinel_stream_latency_5m

This panel indicates mean sentinel stream latency over 5m.

Managed by the Sourcegraph Search team.


frontend: 90th_percentile_successful_sentinel_duration_5m

This panel indicates 90th percentile successful sentinel search duration over 5m.

Managed by the Sourcegraph Search team.


frontend: 90th_percentile_sentinel_stream_latency_5m

This panel indicates 90th percentile sentinel stream latency over 5m.

Managed by the Sourcegraph Search team.


frontend: mean_successful_sentinel_duration_by_query_5m

This panel indicates mean successful sentinel search duration by query over 5m.

  • The mean search duration for sentinel queries, broken down by query. Useful for debugging whether a slowdown is limited to a specific type of query.

Managed by the Sourcegraph Search team.


frontend: mean_sentinel_stream_latency_by_query_5m

This panel indicates mean sentinel stream latency by query over 5m.

  • The mean streaming search latency for sentinel queries, broken down by query. Useful for debugging whether a slowdown is limited to a specific type of query.

Managed by the Sourcegraph Search team.


frontend: unsuccessful_status_rate_5m

This panel indicates unsuccessful status rate per 5m.

  • The rate of unsuccessful sentinel queries, broken down by failure type.

Managed by the Sourcegraph Search team.


Git Server

Stores, manages, and operates Git repositories.

gitserver: memory_working_set

This panel indicates memory working set.

Managed by the Sourcegraph Core application team.


gitserver: go_routines

This panel indicates go routines.

Managed by the Sourcegraph Core application team.


gitserver: cpu_throttling_time

This panel indicates container CPU throttling time %.

Managed by the Sourcegraph Core application team.


gitserver: cpu_usage_seconds

This panel indicates cpu usage seconds.

Managed by the Sourcegraph Core application team.


gitserver: disk_space_remaining

This panel indicates disk space remaining by instance.

Managed by the Sourcegraph Core application team.


gitserver: io_reads_total

This panel indicates i/o reads total.

Managed by the Sourcegraph Core application team.


gitserver: io_writes_total

This panel indicates i/o writes total.

Managed by the Sourcegraph Core application team.


gitserver: io_reads

This panel indicates i/o reads.

Managed by the Sourcegraph Core application team.


gitserver: io_writes

This panel indicates i/o writes.

Managed by the Sourcegraph Core application team.


gitserver: io_read_througput

This panel indicates i/o read throughput.

Managed by the Sourcegraph Core application team.


gitserver: io_write_throughput

This panel indicates i/o write throughput.

Managed by the Sourcegraph Core application team.


gitserver: running_git_commands

This panel indicates git commands running on each gitserver instance.

A high value signals load.

Managed by the Sourcegraph Core application team.


gitserver: git_commands_received

This panel indicates rate of git commands received across all instances.

Per-second rate per command across all instances.

Managed by the Sourcegraph Core application team.


gitserver: repository_clone_queue_size

This panel indicates repository clone queue size.

Managed by the Sourcegraph Core application team.


gitserver: repository_existence_check_queue_size

This panel indicates repository existence check queue size.

Managed by the Sourcegraph Core application team.


gitserver: echo_command_duration_test

This panel indicates echo test command duration.

A high value here likely indicates a problem, especially if consistently high. You can query for individual commands using sum by (cmd)(src_gitserver_exec_running) in Grafana (/-/debug/grafana) to see if a specific Git Server command might be spiking in frequency.
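
The same breakdown is also available outside Grafana through the Prometheus HTTP API. A minimal sketch in Go, where the Prometheus address is an assumption about your deployment and the PromQL string is the one quoted above:

```go
package main

import (
    "fmt"
    "io"
    "log"
    "net/http"
    "net/url"
)

func main() {
    // PromQL quoted above: running git commands, broken down by command.
    query := "sum by (cmd)(src_gitserver_exec_running)"

    // Assumed Prometheus address for this sketch; adjust for your deployment.
    endpoint := "http://prometheus:9090/api/v1/query?query=" + url.QueryEscape(query)

    resp, err := http.Get(endpoint)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }
    // The JSON result contains one sample per cmd label value.
    fmt.Println(string(body))
}
```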

If this value is consistently high, consider the following:

  • Single container deployments: Upgrade to a Docker Compose deployment, which offers better scalability and resource isolation.
  • Kubernetes and Docker Compose: Check that you are running a similar number of git server replicas and that their CPU/memory limits are allocated according to what is shown in the Sourcegraph resource estimator.

Managed by the Sourcegraph Core application team.


gitserver: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Core application team.


Git Server: Gitserver cleanup jobs

gitserver: janitor_running

This panel indicates if the janitor process is running.

1, if the janitor process is currently running

Managed by the Sourcegraph Core application team.


gitserver: janitor_job_duration

This panel indicates 95th percentile job run duration.

95th percentile job run duration

Managed by the Sourcegraph Core application team.


gitserver: repos_removed

This panel indicates repositories removed due to disk pressure.

Repositories removed due to disk pressure

Managed by the Sourcegraph Core application team.


Git Server: Codeintel: Coursier invocation stats

gitserver: codeintel_coursier_total

This panel indicates aggregate invocation operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


gitserver: codeintel_coursier_99th_percentile_duration

This panel indicates 99th percentile successful aggregate invocation operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


gitserver: codeintel_coursier_errors_total

This panel indicates aggregate invocation operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


gitserver: codeintel_coursier_error_rate

This panel indicates aggregate invocation operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


gitserver: codeintel_coursier_total

This panel indicates invocation operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


gitserver: codeintel_coursier_99th_percentile_duration

This panel indicates 99th percentile successful invocation operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


gitserver: codeintel_coursier_errors_total

This panel indicates invocation operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


gitserver: codeintel_coursier_error_rate

This panel indicates invocation operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Git Server: Database connections

gitserver: max_open_conns

This panel indicates maximum open connections.

Managed by the Sourcegraph Core application team.


gitserver: open_conns

This panel indicates established connections.

Managed by the Sourcegraph Core application team.


gitserver: in_use

This panel indicates connections in use.

Managed by the Sourcegraph Core application team.


gitserver: idle

This panel indicates idle connections.

Managed by the Sourcegraph Core application team.


gitserver: mean_blocked_seconds_per_conn_request

This panel indicates mean blocked seconds per conn request.

Managed by the Sourcegraph Core application team.


gitserver: closed_max_idle

This panel indicates closed by SetMaxIdleConns.

Managed by the Sourcegraph Core application team.


gitserver: closed_max_lifetime

This panel indicates closed by SetConnMaxLifetime.

Managed by the Sourcegraph Core application team.


gitserver: closed_max_idle_time

This panel indicates closed by SetConnMaxIdleTime.

Managed by the Sourcegraph Core application team.


Git Server: Container monitoring (not available on server)

gitserver: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod gitserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p gitserver.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' gitserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the gitserver container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs gitserver (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Core application team.


gitserver: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Core application team.


gitserver: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Core application team.


gitserver: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with issues for this service's containers.

Managed by the Sourcegraph Core application team.


Git Server: Provisioning indicators (not available on server)

gitserver: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Core application team.


gitserver: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Git Server is expected to use up all the memory it is provided.

Managed by the Sourcegraph Core application team.


gitserver: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Core application team.


gitserver: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Git Server is expected to use up all the memory it is provided.

Managed by the Sourcegraph Core application team.


Git Server: Golang runtime monitoring

gitserver: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Core application team.


gitserver: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Core application team.


Git Server: Kubernetes monitoring (only available on Kubernetes)

gitserver: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Core application team.


GitHub Proxy

Proxies all requests to github.com, keeping track of and managing rate limits.

GitHub Proxy: GitHub API monitoring

github-proxy: github_proxy_waiting_requests

This panel indicates number of requests waiting on the global mutex.

Managed by the Sourcegraph Core application team.


GitHub Proxy: Container monitoring (not available on server)

github-proxy: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod github-proxy (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p github-proxy.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' github-proxy (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the github-proxy container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs github-proxy (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Core application team.


github-proxy: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Core application team.


github-proxy: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Core application team.


github-proxy: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with issues for this service's containers.

Managed by the Sourcegraph Core application team.


GitHub Proxy: Provisioning indicators (not available on server)

github-proxy: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Core application team.


github-proxy: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Core application team.


github-proxy: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Core application team.


github-proxy: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Core application team.


GitHub Proxy: Golang runtime monitoring

github-proxy: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Core application team.


github-proxy: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Core application team.


GitHub Proxy: Kubernetes monitoring (only available on Kubernetes)

github-proxy: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Core application team.


Postgres

Postgres metrics, exported from postgres_exporter (only available on Kubernetes).

postgres: connections

This panel indicates active connections.

Managed by the Sourcegraph Core application team.


postgres: transaction_durations

This panel indicates maximum transaction durations.

Managed by the Sourcegraph Core application team.


Postgres: Database and collector status

postgres: postgres_up

This panel indicates database availability.

A non-zero value indicates the database is online.

Managed by the Sourcegraph Core application team.


postgres: invalid_indexes

This panel indicates invalid indexes (unusable by the query planner).

A non-zero value indicates that Postgres failed to build an index. Expect degraded performance until the index is manually rebuilt.
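
To identify which indexes are affected, the pg_index catalog can be queried directly: indisvalid is false for indexes the query planner cannot use. A minimal sketch using Go's database/sql, where the driver and DSN are illustrative assumptions:

```go
package main

import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/lib/pq" // Postgres driver; an illustrative choice
)

func main() {
    // Hypothetical DSN for illustration only.
    db, err := sql.Open("postgres", "postgres://sourcegraph@pgsql:5432/sourcegraph?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // pg_index.indisvalid is false for indexes that failed to build.
    rows, err := db.Query(`
        SELECT c.relname
        FROM pg_class c
        JOIN pg_index i ON i.indexrelid = c.oid
        WHERE NOT i.indisvalid`)
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()

    for rows.Next() {
        var name string
        if err := rows.Scan(&name); err != nil {
            log.Fatal(err)
        }
        fmt.Println("invalid index:", name) // candidate for a manual rebuild (REINDEX)
    }
    if err := rows.Err(); err != nil {
        log.Fatal(err)
    }
}
```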

Managed by the Sourcegraph Core application team.


postgres: pg_exporter_err

This panel indicates errors scraping postgres exporter.

This value indicates issues retrieving metrics from postgres_exporter.

Managed by the Sourcegraph Core application team.


postgres: migration_in_progress

This panel indicates active schema migration.

A value of 0 indicates that no migration is in progress.

Managed by the Sourcegraph Core application team.


Postgres: Object size and bloat

postgres: pg_table_size

This panel indicates table size.

Total size of this table

Managed by the Sourcegraph Core application team.


postgres: pg_table_bloat_ratio

This panel indicates table bloat ratio.

Estimated bloat ratio of this table (high bloat = high overhead)

Managed by the Sourcegraph Core application team.


postgres: pg_index_size

This panel indicates index size.

Total size of this index

Managed by the Sourcegraph Core application team.


postgres: pg_index_bloat_ratio

This panel indicates index bloat ratio.

Estimated bloat ratio of this index (high bloat = high overhead)

Managed by the Sourcegraph Core application team.


Postgres: Provisioning indicators (not available on server)

postgres: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Core application team.


postgres: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Core application team.


postgres: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Core application team.


postgres: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Core application team.


Postgres: Kubernetes monitoring (only available on Kubernetes)

postgres: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Core application team.


Precise Code Intel Worker

Handles conversion of uploaded precise code intelligence bundles.

Precise Code Intel Worker: Codeintel: LSIF uploads

precise-code-intel-worker: codeintel_upload_queue_size

This panel indicates unprocessed upload record queue size.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_upload_queue_growth_rate

This panel indicates unprocessed upload record queue growth rate over 30m.

This value compares the rate of enqueues against the rate of finished jobs.

- A value < 1 indicates that process rate > enqueue rate
- A value = 1 indicates that process rate = enqueue rate
- A value > 1 indicates that process rate < enqueue rate

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Codeintel: LSIF uploads

precise-code-intel-worker: codeintel_upload_handlers

This panel indicates active handlers.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_upload_processor_total

This panel indicates handler operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_upload_processor_99th_percentile_duration

This panel indicates 99th percentile successful handler operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_upload_processor_errors_total

This panel indicates handler operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_upload_processor_error_rate

This panel indicates handler operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Codeintel: dbstore stats

precise-code-intel-worker: codeintel_dbstore_total

This panel indicates aggregate store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_dbstore_99th_percentile_duration

This panel indicates 99th percentile successful aggregate store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_dbstore_errors_total

This panel indicates aggregate store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_dbstore_error_rate

This panel indicates aggregate store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_dbstore_total

This panel indicates store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_dbstore_99th_percentile_duration

This panel indicates 99th percentile successful store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_dbstore_errors_total

This panel indicates store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_dbstore_error_rate

This panel indicates store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Codeintel: lsifstore stats

precise-code-intel-worker: codeintel_lsifstore_total

This panel indicates aggregate store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_lsifstore_99th_percentile_duration

This panel indicates 99th percentile successful aggregate store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_lsifstore_errors_total

This panel indicates aggregate store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_lsifstore_error_rate

This panel indicates aggregate store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_lsifstore_total

This panel indicates store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_lsifstore_99th_percentile_duration

This panel indicates 99th percentile successful store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_lsifstore_errors_total

This panel indicates store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_lsifstore_error_rate

This panel indicates store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Workerutil: lsif_uploads dbworker/store stats

precise-code-intel-worker: workerutil_dbworker_store_codeintel_upload_total

This panel indicates store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: workerutil_dbworker_store_codeintel_upload_99th_percentile_duration

This panel indicates 99th percentile successful store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: workerutil_dbworker_store_codeintel_upload_errors_total

This panel indicates store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: workerutil_dbworker_store_codeintel_upload_error_rate

This panel indicates store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Codeintel: gitserver client

precise-code-intel-worker: codeintel_gitserver_total

This panel indicates aggregate client operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_gitserver_99th_percentile_duration

This panel indicates 99th percentile successful aggregate client operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_gitserver_errors_total

This panel indicates aggregate client operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_gitserver_error_rate

This panel indicates aggregate client operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_gitserver_total

This panel indicates client operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_gitserver_99th_percentile_duration

This panel indicates 99th percentile successful client operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_gitserver_errors_total

This panel indicates client operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_gitserver_error_rate

This panel indicates client operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Codeintel: uploadstore stats

precise-code-intel-worker: codeintel_uploadstore_total

This panel indicates aggregate store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_uploadstore_99th_percentile_duration

This panel indicates 99th percentile successful aggregate store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_uploadstore_errors_total

This panel indicates aggregate store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_uploadstore_error_rate

This panel indicates aggregate store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_uploadstore_total

This panel indicates store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_uploadstore_99th_percentile_duration

This panel indicates 99th percentile successful store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_uploadstore_errors_total

This panel indicates store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_uploadstore_error_rate

This panel indicates store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Internal service requests

precise-code-intel-worker: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Database connections

precise-code-intel-worker: max_open_conns

This panel indicates maximum open connections.

Managed by the Sourcegraph Core application team.


precise-code-intel-worker: open_conns

This panel indicates established connections.

Managed by the Sourcegraph Core application team.


precise-code-intel-worker: in_use

This panel indicates connections in use.

Managed by the Sourcegraph Core application team.


precise-code-intel-worker: idle

This panel indicates idle connections.

Managed by the Sourcegraph Core application team.


precise-code-intel-worker: mean_blocked_seconds_per_conn_request

This panel indicates mean blocked seconds per conn request.

Managed by the Sourcegraph Core application team.


precise-code-intel-worker: closed_max_idle

This panel indicates closed by SetMaxIdleConns.

Managed by the Sourcegraph Core application team.


precise-code-intel-worker: closed_max_lifetime

This panel indicates closed by SetConnMaxLifetime.

Managed by the Sourcegraph Core application team.


precise-code-intel-worker: closed_max_idle_time

This panel indicates closed by SetConnMaxIdleTime.

Managed by the Sourcegraph Core application team.


Precise Code Intel Worker: Container monitoring (not available on server)

precise-code-intel-worker: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod precise-code-intel-worker (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p precise-code-intel-worker.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' precise-code-intel-worker (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the precise-code-intel-worker container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs precise-code-intel-worker (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with issues for this service's containers.

Managed by the Sourcegraph Core application team.


Precise Code Intel Worker: Provisioning indicators (not available on server)

precise-code-intel-worker: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Golang runtime monitoring

precise-code-intel-worker: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Kubernetes monitoring (only available on Kubernetes)

precise-code-intel-worker: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Code-intelligence team.


Query Runner

Periodically runs saved searches and instructs the frontend to send out notifications.

Query Runner: Internal service requests

query-runner: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Search team.


Query Runner: Container monitoring (not available on server)

query-runner: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod query-runner (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p query-runner.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' query-runner (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the query-runner container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs query-runner (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Search team.


query-runner: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Search team.


query-runner: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Search team.


query-runner: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with issues for this service's containers.

Managed by the Sourcegraph Core application team.


Query Runner: Provisioning indicators (not available on server)

query-runner: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Search team.


query-runner: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Search team.


query-runner: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Search team.


query-runner: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Search team.


Query Runner: Golang runtime monitoring

query-runner: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Search team.


query-runner: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Search team.


Query Runner: Kubernetes monitoring (only available on Kubernetes)

query-runner: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Search team.


Worker

Manages background processes.

Worker: Active jobs

worker: worker_job_count

This panel indicates number of worker instances running each job.

The number of worker instances running each job type. It is necessary for each job type to be managed by at least one worker instance.


worker: worker_job_codeintel-janitor_count

This panel indicates number of worker instances running the codeintel-janitor job.

Managed by the Sourcegraph Code-intelligence team.


worker: worker_job_codeintel-commitgraph_count

This panel indicates number of worker instances running the codeintel-commitgraph job.

Managed by the Sourcegraph Code-intelligence team.


worker: worker_job_codeintel-auto-indexing_count

This panel indicates number of worker instances running the codeintel-auto-indexing job.

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeintel: Repository with stale commit graph

worker: codeintel_commit_graph_queue_size

This panel indicates repository queue size.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_commit_graph_queue_growth_rate

This panel indicates repository queue growth rate over 30m.

This value compares the rate of enqueues against the rate of finished jobs.

- A value < 1 indicates that process rate > enqueue rate
- A value = 1 indicates that process rate = enqueue rate
- A value > 1 indicates that process rate < enqueue rate

For example, 60 jobs enqueued against 120 jobs finished over the window gives a value of 0.5, meaning the queue is shrinking.

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeintel: Repository commit graph updates

worker: codeintel_commit_graph_processor_total

This panel indicates update operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_commit_graph_processor_99th_percentile_duration

This panel indicates 99th percentile successful update operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_commit_graph_processor_errors_total

This panel indicates update operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_commit_graph_processor_error_rate

This panel indicates update operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeintel: Dependency index job

worker: codeintel_dependency_index_queue_size

This panel indicates dependency index job queue size.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dependency_index_queue_growth_rate

This panel indicates dependency index job queue growth rate over 30m.

This value compares the rate of enqueues against the rate of finished jobs.

- A value < 1 indicates that process rate > enqueue rate
- A value = 1 indicates that process rate = enqueue rate
- A value > 1 indicates that process rate < enqueue rate

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeintel: Dependency index jobs

worker: codeintel_dependency_index_handlers

This panel indicates the number of active handlers.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dependency_index_processor_total

This panel indicates handler operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dependency_index_processor_99th_percentile_duration

This panel indicates 99th percentile successful handler operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dependency_index_processor_errors_total

This panel indicates handler operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dependency_index_processor_error_rate

This panel indicates handler operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeintel: Janitor stats

worker: codeintel_background_upload_records_removed_total

This panel indicates lsif_upload records deleted every 5m.

Number of LSIF upload records deleted due to expiration or unreachability every 5m

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_background_index_records_removed_total

This panel indicates lsif_index records deleted every 5m.

Number of LSIF index records deleted due to expiration or unreachability every 5m

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_background_uploads_purged_total

This panel indicates lsif_upload data bundles deleted every 5m.

Number of LSIF upload data bundles purged from the codeintel-db database every 5m

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_background_errors_total

This panel indicates janitor operation errors every 5m.

Number of code intelligence janitor errors every 5m

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeintel: Auto-index scheduler

worker: codeintel_index_scheduler_total

This panel indicates aggregate scheduler operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_index_scheduler_99th_percentile_duration

This panel indicates 99th percentile successful aggregate scheduler operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_index_scheduler_errors_total

This panel indicates aggregate scheduler operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_index_scheduler_error_rate

This panel indicates aggregate scheduler operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_index_scheduler_total

This panel indicates scheduler operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_index_scheduler_99th_percentile_duration

This panel indicates 99th percentile successful scheduler operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_index_scheduler_errors_total

This panel indicates scheduler operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_index_scheduler_error_rate

This panel indicates scheduler operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeintel: Auto-index enqueuer

worker: codeintel_autoindex_enqueuer_total

This panel indicates aggregate enqueuer operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_autoindex_enqueuer_99th_percentile_duration

This panel indicates 99th percentile successful aggregate enqueuer operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_autoindex_enqueuer_errors_total

This panel indicates aggregate enqueuer operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_autoindex_enqueuer_error_rate

This panel indicates aggregate enqueuer operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_autoindex_enqueuer_total

This panel indicates enqueuer operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_autoindex_enqueuer_99th_percentile_duration

This panel indicates 99th percentile successful enqueuer operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_autoindex_enqueuer_errors_total

This panel indicates enqueuer operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_autoindex_enqueuer_error_rate

This panel indicates enqueuer operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeintel: dbstore stats

worker: codeintel_dbstore_total

This panel indicates aggregate store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dbstore_99th_percentile_duration

This panel indicates 99th percentile successful aggregate store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dbstore_errors_total

This panel indicates aggregate store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dbstore_error_rate

This panel indicates aggregate store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dbstore_total

This panel indicates store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dbstore_99th_percentile_duration

This panel indicates 99th percentile successful store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dbstore_errors_total

This panel indicates store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dbstore_error_rate

This panel indicates store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeintel: lsifstore stats

worker: codeintel_lsifstore_total

This panel indicates aggregate store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_lsifstore_99th_percentile_duration

This panel indicates 99th percentile successful aggregate store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_lsifstore_errors_total

This panel indicates aggregate store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_lsifstore_error_rate

This panel indicates aggregate store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_lsifstore_total

This panel indicates store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_lsifstore_99th_percentile_duration

This panel indicates 99th percentile successful store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_lsifstore_errors_total

This panel indicates store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_lsifstore_error_rate

This panel indicates store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Workerutil: lsif_dependency_indexes dbworker/store stats

worker: workerutil_dbworker_store_codeintel_dependency_index_total

This panel indicates store operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: workerutil_dbworker_store_codeintel_dependency_index_99th_percentile_duration

This panel indicates 99th percentile successful store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: workerutil_dbworker_store_codeintel_dependency_index_errors_total

This panel indicates store operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: workerutil_dbworker_store_codeintel_dependency_index_error_rate

This panel indicates store operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeintel: gitserver client

worker: codeintel_gitserver_total

This panel indicates aggregate client operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_gitserver_99th_percentile_duration

This panel indicates 99th percentile successful aggregate client operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_gitserver_errors_total

This panel indicates aggregate client operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_gitserver_error_rate

This panel indicates aggregate client operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_gitserver_total

This panel indicates client operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_gitserver_99th_percentile_duration

This panel indicates 99th percentile successful client operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_gitserver_errors_total

This panel indicates client operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_gitserver_error_rate

This panel indicates client operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeintel: Dependency repository insert

worker: codeintel_dependency_repos_total

This panel indicates aggregate insert operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dependency_repos_99th_percentile_duration

This panel indicates 99th percentile successful aggregate insert operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dependency_repos_errors_total

This panel indicates aggregate insert operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dependency_repos_error_rate

This panel indicates aggregate insert operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dependency_repos_total

This panel indicates insert operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dependency_repos_99th_percentile_duration

This panel indicates 99th percentile successful insert operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dependency_repos_errors_total

This panel indicates insert operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_dependency_repos_error_rate

This panel indicates insert operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeintel: lsif_upload record resetter

worker: codeintel_background_upload_record_resets_total

This panel indicates lsif_upload records reset to queued state every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_background_upload_record_reset_failures_total

This panel indicates lsif_upload records reset to errored state every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_background_upload_record_reset_errors_total

This panel indicates lsif_upload operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeintel: lsif_index record resetter

worker: codeintel_background_index_record_resets_total

This panel indicates lsif_index records reset to queued state every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_background_index_record_reset_failures_total

This panel indicates lsif_index records reset to errored state every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_background_index_record_reset_errors_total

This panel indicates lsif_index operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeintel: lsif_dependency_index record resetter

worker: codeintel_background_dependency_index_record_resets_total

This panel indicates lsif_dependency_index records reset to queued state every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_background_dependency_index_record_reset_failures_total

This panel indicates lsif_dependency_index records reset to errored state every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_background_dependency_index_record_reset_errors_total

This panel indicates lsif_dependency_index operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Codeinsights: Query Runner Queue

worker: insights_search_queue_queue_size

This panel indicates code insights search queue size.

Managed by the Sourcegraph Code-insights team.


worker: insights_search_queue_queue_growth_rate

This panel indicates code insights search queue growth rate over 30m.

This value compares the rate of enqueues against the rate of finished jobs.

- A value < 1 indicates that process rate > enqueue rate
- A value = 1 indicates that process rate = enqueue rate
- A value > 1 indicates that process rate < enqueue rate

Managed by the Sourcegraph Code-insights team.


Worker: Codeinsights: insights queue processor

worker: insights_search_queue_handlers

This panel indicates the number of active handlers.

Managed by the Sourcegraph Code-insights team.


worker: insights_search_queue_processor_total

This panel indicates handler operations every 5m.

Managed by the Sourcegraph Code-insights team.


worker: insights_search_queue_processor_99th_percentile_duration

This panel indicates 99th percentile successful handler operation duration over 5m.

Managed by the Sourcegraph Code-insights team.


worker: insights_search_queue_processor_errors_total

This panel indicates handler operation errors every 5m.

Managed by the Sourcegraph Code-insights team.


worker: insights_search_queue_processor_error_rate

This panel indicates handler operation error rate over 5m.

Managed by the Sourcegraph Code-insights team.


Worker: Codeinsights: code insights search queue record resetter

worker: insights_search_queue_record_resets_total

This panel indicates insights_search_queue records reset to queued state every 5m.

Managed by the Sourcegraph Code-insights team.


worker: insights_search_queue_record_reset_failures_total

This panel indicates insights_search_queue records reset to errored state every 5m.

Managed by the Sourcegraph Code-insights team.


worker: insights_search_queue_record_reset_errors_total

This panel indicates insights_search_queue operation errors every 5m.

Managed by the Sourcegraph Code-insights team.


Worker: Codeinsights: dbstore stats

worker: workerutil_dbworker_store_insights_query_runner_jobs_store_total

This panel indicates aggregate store operations every 5m.

Managed by the Sourcegraph Code-insights team.


worker: workerutil_dbworker_store_insights_query_runner_jobs_store_99th_percentile_duration

This panel indicates 99th percentile successful aggregate store operation duration over 5m.

Managed by the Sourcegraph Code-insights team.


worker: workerutil_dbworker_store_insights_query_runner_jobs_store_errors_total

This panel indicates aggregate store operation errors every 5m.

Managed by the Sourcegraph Code-insights team.


worker: workerutil_dbworker_store_insights_query_runner_jobs_store_error_rate

This panel indicates aggregate store operation error rate over 5m.

Managed by the Sourcegraph Code-insights team.


worker: workerutil_dbworker_store_insights_query_runner_jobs_store_total

This panel indicates store operations every 5m.

Managed by the Sourcegraph Code-insights team.


worker: workerutil_dbworker_store_insights_query_runner_jobs_store_99th_percentile_duration

This panel indicates 99th percentile successful store operation duration over 5m.

Managed by the Sourcegraph Code-insights team.


worker: workerutil_dbworker_store_insights_query_runner_jobs_store_errors_total

This panel indicates store operation errors every 5m.

Managed by the Sourcegraph Code-insights team.


worker: workerutil_dbworker_store_insights_query_runner_jobs_store_error_rate

This panel indicates store operation error rate over 5m.

Managed by the Sourcegraph Code-insights team.


Worker: Internal service requests

worker: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Code-intelligence team.


Worker: Database connections

worker: max_open_conns

This panel indicates maximum open connections.

Managed by the Sourcegraph Core application team.


worker: open_conns

This panel indicates established connections.

Managed by the Sourcegraph Core application team.


worker: in_use

This panel indicates connections in use.

Managed by the Sourcegraph Core application team.


worker: idle

This panel indicates idle connections.

Managed by the Sourcegraph Core application team.


worker: mean_blocked_seconds_per_conn_request

This panel indicates mean blocked seconds per conn request.

Managed by the Sourcegraph Core application team.


worker: closed_max_idle

This panel indicates closed by SetMaxIdleConns.

Managed by the Sourcegraph Core application team.


worker: closed_max_lifetime

This panel indicates closed by SetConnMaxLifetime.

Managed by the Sourcegraph Core application team.


worker: closed_max_idle_time

This panel indicates closed by SetConnMaxIdleTime.

Managed by the Sourcegraph Core application team.


Worker: Container monitoring (not available on server)

worker: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value changing independently of deployment events (such as an upgrade), it could indicate that pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod worker (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p worker.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' worker (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the worker container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs worker (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Code-intelligence team.


worker: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


worker: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Code-intelligence team.


worker: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with worker issues.

Managed by the Sourcegraph Core application team.


Worker: Provisioning indicators (not available on server)

worker: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


worker: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


worker: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


worker: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.
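
The provisioning panels above compare recent usage against the container's configured limits. To spot-check live usage from a shell, a minimal sketch (kubectl top requires metrics-server; container and label names may differ in your deployment):

    # Kubernetes: current CPU and memory per worker pod
    kubectl top pod -l app=worker
    # Docker Compose: live CPU and memory for the worker container
    docker stats worker --no-stream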


Worker: Golang runtime monitoring

worker: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Code-intelligence team.


worker: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Code-intelligence team.
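
Both panels are built on the standard Go runtime metrics exposed by the service, so they can also be queried ad hoc through the bundled Prometheus HTTP API; a minimal sketch, assuming Prometheus is reachable at prometheus:9090 from where you run it and that worker targets carry a matching job label (both vary by deployment):

    # Current goroutine count reported by worker instances
    curl -s 'http://prometheus:9090/api/v1/query' --data-urlencode 'query=go_goroutines{job=~".*worker.*"}'
    # Garbage collection pause durations (seconds) by quantile
    curl -s 'http://prometheus:9090/api/v1/query' --data-urlencode 'query=go_gc_duration_seconds{job=~".*worker.*"}'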


Worker: Kubernetes monitoring (only available on Kubernetes)

worker: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Code-intelligence team.


Repo Updater

Manages interaction with code hosts and instructs Gitserver to update repositories.

Repo Updater: Repositories

repo-updater: syncer_sync_last_time

This panel indicates time since last sync.

A high value here indicates issues synchronizing repo metadata. If the value is persistently high, make sure all external services have valid tokens.

Managed by the Sourcegraph Core application team.


repo-updater: src_repoupdater_max_sync_backoff

This panel indicates time since oldest sync.

Managed by the Sourcegraph Core application team.


repo-updater: src_repoupdater_syncer_sync_errors_total

This panel indicates site level external service sync error rate.

Managed by the Sourcegraph Core application team.


repo-updater: syncer_sync_start

This panel indicates repo metadata syncs started.

Managed by the Sourcegraph Core application team.


repo-updater: syncer_sync_duration

This panel indicates 95th percentile repositories sync duration.

Managed by the Sourcegraph Core application team.


repo-updater: source_duration

This panel indicates 95th percentile repositories source duration.

Managed by the Sourcegraph Core application team.


repo-updater: syncer_synced_repos

This panel indicates repositories synced.

Managed by the Sourcegraph Core application team.


repo-updater: sourced_repos

This panel indicates repositories sourced.

Managed by the Sourcegraph Core application team.


repo-updater: user_added_repos

This panel indicates total number of user added repos.

Managed by the Sourcegraph Core application team.


repo-updater: purge_failed

This panel indicates failed repository purges.

Managed by the Sourcegraph Core application team.


repo-updater: sched_auto_fetch

This panel indicates repositories scheduled due to hitting a deadline.

Managed by the Sourcegraph Core application team.


repo-updater: sched_manual_fetch

This panel indicates repositories scheduled due to user traffic.

Check repo-updater logs if this value is persistently high. This metric is not meaningful if there are no user-added code hosts.

Managed by the Sourcegraph Core application team.


repo-updater: sched_known_repos

This panel indicates repositories managed by the scheduler.

Managed by the Sourcegraph Core application team.


repo-updater: sched_update_queue_length

This panel indicates rate of growth of update queue length over 5 minutes.

Managed by the Sourcegraph Core application team.


repo-updater: sched_loops

This panel indicates scheduler loops.

Managed by the Sourcegraph Core application team.


repo-updater: sched_error

This panel indicates the repository scheduling error rate.

Managed by the Sourcegraph Core application team.


Repo Updater: Permissions

repo-updater: perms_syncer_perms

This panel indicates time gap between least and most up to date permissions.

Managed by the Sourcegraph Core application team.


repo-updater: perms_syncer_stale_perms

This panel indicates number of entities with stale permissions.

Managed by the Sourcegraph Core application team.


repo-updater: perms_syncer_no_perms

This panel indicates number of entities with no permissions.

Managed by the Sourcegraph Core application team.


repo-updater: perms_syncer_sync_duration

This panel indicates 95th percentile permissions sync duration.

Managed by the Sourcegraph Core application team.


repo-updater: perms_syncer_queue_size

This panel indicates permissions sync queued items.

Managed by the Sourcegraph Core application team.


repo-updater: perms_syncer_sync_errors

This panel indicates permissions sync error rate.

Managed by the Sourcegraph Core application team.


Repo Updater: External services

repo-updater: src_repoupdater_external_services_total

This panel indicates the total number of external services.

Managed by the Sourcegraph Core application team.


repo-updater: src_repoupdater_user_external_services_total

This panel indicates the total number of user added external services.

Managed by the Sourcegraph Core application team.


repo-updater: repoupdater_queued_sync_jobs_total

This panel indicates the total number of queued sync jobs.

Managed by the Sourcegraph Core application team.


repo-updater: repoupdater_completed_sync_jobs_total

This panel indicates the total number of completed sync jobs.

Managed by the Sourcegraph Core application team.


repo-updater: repoupdater_errored_sync_jobs_percentage

This panel indicates the percentage of external services that have failed their most recent sync.

Managed by the Sourcegraph Core application team.


repo-updater: github_graphql_rate_limit_remaining

This panel indicates remaining calls to GitHub graphql API before hitting the rate limit.

Managed by the Sourcegraph Core application team.


repo-updater: github_rest_rate_limit_remaining

This panel indicates remaining calls to GitHub rest API before hitting the rate limit.

Managed by the Sourcegraph Core application team.


repo-updater: github_search_rate_limit_remaining

This panel indicates remaining calls to GitHub search API before hitting the rate limit.

Managed by the Sourcegraph Core application team.


repo-updater: github_graphql_rate_limit_wait_duration

This panel indicates time spent waiting for the GitHub graphql API rate limiter.

Indicates how long we're waiting on the rate limit once it has been exceeded.

Managed by the Sourcegraph Core application team.


repo-updater: github_rest_rate_limit_wait_duration

This panel indicates time spent waiting for the GitHub rest API rate limiter.

Indicates how long we're waiting on the rate limit once it has been exceeded.

Managed by the Sourcegraph Core application team.


repo-updater: github_search_rate_limit_wait_duration

This panel indicates time spent waiting for the GitHub search API rate limiter.

Indicates how long we're waiting on the rate limit once it has been exceeded.

Managed by the Sourcegraph Core application team.
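
When these panels trend toward zero, it can help to compare them against the quota GitHub itself reports for the token Sourcegraph is using; a minimal sketch, assuming a GitHub.com token in $GITHUB_TOKEN (GitHub Enterprise instances use a different base URL):

    # Remaining quota for the core (REST), search, and GraphQL rate limits
    curl -s -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit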


repo-updater: gitlab_rest_rate_limit_remaining

This panel indicates remaining calls to GitLab rest API before hitting the rate limit.

Managed by the Sourcegraph Core application team.


repo-updater: gitlab_rest_rate_limit_wait_duration

This panel indicates time spent waiting for the GitLab rest API rate limiter.

Indicates how long we're waiting on the rate limit once it has been exceeded.

Managed by the Sourcegraph Core application team.


Repo Updater: Batches: dbstore stats

repo-updater: batches_dbstore_total

This panel indicates aggregate store operations every 5m.

Managed by the Sourcegraph Batches team.


repo-updater: batches_dbstore_99th_percentile_duration

This panel indicates 99th percentile successful aggregate store operation duration over 5m.

Managed by the Sourcegraph Batches team.


repo-updater: batches_dbstore_errors_total

This panel indicates aggregate store operation errors every 5m.

Managed by the Sourcegraph Batches team.


repo-updater: batches_dbstore_error_rate

This panel indicates aggregate store operation error rate over 5m.

Managed by the Sourcegraph Batches team.


repo-updater: batches_dbstore_total

This panel indicates store operations every 5m.

Managed by the Sourcegraph Batches team.


repo-updater: batches_dbstore_99th_percentile_duration

This panel indicates 99th percentile successful store operation duration over 5m.

Managed by the Sourcegraph Batches team.


repo-updater: batches_dbstore_errors_total

This panel indicates store operation errors every 5m.

Managed by the Sourcegraph Batches team.


repo-updater: batches_dbstore_error_rate

This panel indicates store operation error rate over 5m.

Managed by the Sourcegraph Batches team.


Repo Updater: Codeintel: Coursier invocation stats

repo-updater: codeintel_coursier_total

This panel indicates aggregate invocations operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


repo-updater: codeintel_coursier_99th_percentile_duration

This panel indicates 99th percentile successful aggregate invocations operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


repo-updater: codeintel_coursier_errors_total

This panel indicates aggregate invocations operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


repo-updater: codeintel_coursier_error_rate

This panel indicates aggregate invocations operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


repo-updater: codeintel_coursier_total

This panel indicates invocations operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


repo-updater: codeintel_coursier_99th_percentile_duration

This panel indicates 99th percentile successful invocations operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


repo-updater: codeintel_coursier_errors_total

This panel indicates invocations operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


repo-updater: codeintel_coursier_error_rate

This panel indicates invocations operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Repo Updater: Internal service requests

repo-updater: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Core application team.


Repo Updater: Database connections

repo-updater: max_open_conns

This panel indicates maximum open connections.

Managed by the Sourcegraph Core application team.


repo-updater: open_conns

This panel indicates established connections.

Managed by the Sourcegraph Core application team.


repo-updater: in_use

This panel indicates connections in use.

Managed by the Sourcegraph Core application team.


repo-updater: idle

This panel indicates idle connections.

Managed by the Sourcegraph Core application team.


repo-updater: mean_blocked_seconds_per_conn_request

This panel indicates mean blocked seconds per conn request.

Managed by the Sourcegraph Core application team.


repo-updater: closed_max_idle

This panel indicates closed by SetMaxIdleConns.

Managed by the Sourcegraph Core application team.


repo-updater: closed_max_lifetime

This panel indicates closed by SetConnMaxLifetime.

Managed by the Sourcegraph Core application team.


repo-updater: closed_max_idle_time

This panel indicates closed by SetConnMaxIdleTime.

Managed by the Sourcegraph Core application team.


Repo Updater: Container monitoring (not available on server)

repo-updater: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value changing independently of deployment events (such as an upgrade), it could indicate that pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod repo-updater (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p repo-updater.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' repo-updater (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the repo-updater container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs repo-updater (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Core application team.


repo-updater: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Core application team.


repo-updater: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Core application team.


repo-updater: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with repo-updater issues.

Managed by the Sourcegraph Core application team.


Repo Updater: Provisioning indicators (not available on server)

repo-updater: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Core application team.


repo-updater: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Core application team.


repo-updater: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Core application team.


repo-updater: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Core application team.


Repo Updater: Golang runtime monitoring

repo-updater: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Core application team.


repo-updater: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Core application team.


Repo Updater: Kubernetes monitoring (only available on Kubernetes)

repo-updater: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Core application team.


Searcher

Performs unindexed searches (diff and commit search, text search for unindexed branches).

searcher: unindexed_search_request_errors

This panel indicates unindexed search request errors every 5m by code.

Managed by the Sourcegraph Search team.


searcher: replica_traffic

This panel indicates requests per second over 10m.

Managed by the Sourcegraph Search team.


Searcher: Internal service requests

searcher: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Search team.


Searcher: Container monitoring (not available on server)

searcher: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value changing independently of deployment events (such as an upgrade), it could indicate that pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod searcher (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p searcher.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' searcher (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the searcher container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs searcher (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Search team.


searcher: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Search team.


searcher: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Search team.


searcher: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with searcher issues.

Managed by the Sourcegraph Core application team.


Searcher: Provisioning indicators (not available on server)

searcher: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Search team.


searcher: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Search team.


searcher: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Search team.


searcher: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Search team.


Searcher: Golang runtime monitoring

searcher: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Search team.


searcher: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Search team.


Searcher: Kubernetes monitoring (only available on Kubernetes)

searcher: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Search team.


Symbols

Handles symbol searches for unindexed branches.

symbols: store_fetch_failures

This panel indicates store fetch failures every 5m.

Managed by the Sourcegraph Code-intelligence team.


symbols: current_fetch_queue_size

This panel indicates current fetch queue size.

Managed by the Sourcegraph Code-intelligence team.


Symbols: Internal service requests

symbols: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Code-intelligence team.


Symbols: Container monitoring (not available on server)

symbols: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value changing independently of deployment events (such as an upgrade), it could indicate that pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod symbols (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p symbols.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' symbols (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the symbols container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs symbols (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Code-intelligence team.


symbols: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


symbols: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Code-intelligence team.


symbols: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with symbols issues.

Managed by the Sourcegraph Core application team.


Symbols: Provisioning indicators (not available on server)

symbols: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


symbols: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


symbols: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


symbols: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


Symbols: Golang runtime monitoring

symbols: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Code-intelligence team.


symbols: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Code-intelligence team.


Symbols: Kubernetes monitoring (only available on Kubernetes)

symbols: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Code-intelligence team.


Syntect Server

Handles syntax highlighting for code files.

syntect-server: syntax_highlighting_errors

This panel indicates syntax highlighting errors every 5m.

Managed by the Sourcegraph Core application team.


syntect-server: syntax_highlighting_timeouts

This panel indicates syntax highlighting timeouts every 5m.

Managed by the Sourcegraph Core application team.


syntect-server: syntax_highlighting_panics

This panel indicates syntax highlighting panics every 5m.

Managed by the Sourcegraph Core application team.


syntect-server: syntax_highlighting_worker_deaths

This panel indicates syntax highlighter worker deaths every 5m.

Managed by the Sourcegraph Core application team.


Syntect Server: Container monitoring (not available on server)

syntect-server: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value changing independently of deployment events (such as an upgrade), it could indicate that pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod syntect-server (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p syntect-server.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' syntect-server (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the syntect-server container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs syntect-server (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Core application team.


syntect-server: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Core application team.


syntect-server: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Core application team.


syntect-server: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with syntect-server issues.

Managed by the Sourcegraph Core application team.


Syntect Server: Provisioning indicators (not available on server)

syntect-server: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Core application team.


syntect-server: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Core application team.


syntect-server: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Core application team.


syntect-server: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Core application team.


Syntect Server: Kubernetes monitoring (only available on Kubernetes)

syntect-server: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Core application team.


Zoekt Index Server

Indexes repositories and populates the search index.

zoekt-indexserver: repos_assigned

This panel indicates total number of repos.

Sudden changes in this value are typically caused by indexing configuration changes.

Managed by the Sourcegraph Search team.


zoekt-indexserver: repo_index_state

This panel indicates indexing results over 5m (noop=no changes, empty=no branches to index).

A persistently failing state indicates that some repositories cannot be indexed, perhaps due to their size or to timeouts.

Managed by the Sourcegraph Search team.


zoekt-indexserver: repo_index_success_speed

This panel indicates successful indexing durations.

Latency increases can indicate bottlenecks in the indexserver.

Managed by the Sourcegraph Search team.


zoekt-indexserver: repo_index_fail_speed

This panel indicates failed indexing durations.

Failures that occur after a long duration indicate timeouts.

Managed by the Sourcegraph Search team.


zoekt-indexserver: average_resolve_revision_duration

This panel indicates average resolve revision duration over 5m.

Managed by the Sourcegraph Search team.


Zoekt Index Server: Container monitoring (not available on server)

zoekt-indexserver: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value changing independently of deployment events (such as an upgrade), it could indicate that pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod zoekt-indexserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p zoekt-indexserver.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' zoekt-indexserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the zoekt-indexserver container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs zoekt-indexserver (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Search team.


zoekt-indexserver: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Search team.


zoekt-indexserver: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Search team.


zoekt-indexserver: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with zoekt-indexserver issues.

Managed by the Sourcegraph Core application team.


Zoekt Index Server: Provisioning indicators (not available on server)

zoekt-indexserver: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Search team.


zoekt-indexserver: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Search team.


zoekt-indexserver: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Search team.


zoekt-indexserver: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Search team.


Zoekt Index Server: Kubernetes monitoring (only available on Kubernetes)

zoekt-indexserver: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Search team.


Zoekt Web Server

Serves indexed search requests using the search index.

zoekt-webserver: indexed_search_request_errors

This panel indicates indexed search request errors every 5m by code.

Managed by the Sourcegraph Search team.


Zoekt Web Server: Container monitoring (not available on server)

zoekt-webserver: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value changing independently of deployment events (such as an upgrade), it could indicate that pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod zoekt-webserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p zoekt-webserver.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' zoekt-webserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the zoekt-webserver container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs zoekt-webserver (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Search team.


zoekt-webserver: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Search team.


zoekt-webserver: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Search team.


zoekt-webserver: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with zoekt-webserver issues.

Managed by the Sourcegraph Core application team.


Zoekt Web Server: Provisioning indicators (not available on server)

zoekt-webserver: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Search team.


zoekt-webserver: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Search team.


zoekt-webserver: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Search team.


zoekt-webserver: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Search team.


Prometheus

Sourcegraph's all-in-one Prometheus and Alertmanager service.

Prometheus: Metrics

prometheus: prometheus_rule_eval_duration

This panel indicates average prometheus rule group evaluation duration over 10m by rule group.

A high value here indicates Prometheus rule evaluation is taking longer than expected. It might indicate that certain rule groups are taking too long to evaluate, or Prometheus is underprovisioned.

Rules that Sourcegraph ships with are grouped under /sg_config_prometheus. Custom rules are grouped under /sg_prometheus_addons.

Managed by the Sourcegraph Distribution team.


prometheus: prometheus_rule_eval_failures

This panel indicates failed prometheus rule evaluations over 5m by rule group.

Rules that Sourcegraph ships with are grouped under /sg_config_prometheus. Custom rules are grouped under /sg_prometheus_addons.

Managed by the Sourcegraph Distribution team.


Prometheus: Alerts

prometheus: alertmanager_notification_latency

This panel indicates alertmanager notification latency over 1m by integration.

Managed by the Sourcegraph Distribution team.


prometheus: alertmanager_notification_failures

This panel indicates failed alertmanager notifications over 1m by integration.

Managed by the Sourcegraph Distribution team.


Prometheus: Internals

prometheus: prometheus_config_status

This panel indicates prometheus configuration reload status.

A 1 indicates Prometheus reloaded its configuration successfully.
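
As a minimal sketch, the same signal can be checked directly against the Prometheus HTTP query API. This assumes the standard prometheus_config_last_reload_successful metric and a Prometheus reachable at localhost:9090; the exact expression behind this panel may differ.

    // Query Prometheus for its own config-reload status and print the raw JSON
    // response; a sample value of 1 means the last reload succeeded, 0 means it
    // is still running on the previously loaded configuration.
    package main

    import (
        "fmt"
        "io"
        "net/http"
        "net/url"
    )

    func main() {
        query := url.QueryEscape("prometheus_config_last_reload_successful")
        resp, err := http.Get("http://localhost:9090/api/v1/query?query=" + query)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, err := io.ReadAll(resp.Body)
        if err != nil {
            panic(err)
        }
        fmt.Println(string(body))
    }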

Managed by the Sourcegraph Distribution team.


prometheus: alertmanager_config_status

This panel indicates alertmanager configuration reload status.

A 1 indicates Alertmanager reloaded its configuration successfully.

Managed by the Sourcegraph Distribution team.


prometheus: prometheus_tsdb_op_failure

This panel indicates prometheus tsdb failures over 1m by operation.

Managed by the Sourcegraph Distribution team.


prometheus: prometheus_target_sample_exceeded

This panel indicates prometheus scrapes that exceed the sample limit over 10m.

Managed by the Sourcegraph Distribution team.


prometheus: prometheus_target_sample_duplicate

This panel indicates prometheus scrapes rejected due to duplicate timestamps over 10m.

Managed by the Sourcegraph Distribution team.


Prometheus: Container monitoring (not available on server)

prometheus: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod prometheus (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p prometheus.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' prometheus (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the prometheus container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs prometheus (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Distribution team.


prometheus: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Distribution team.


prometheus: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Distribution team.


prometheus: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with prometheus issues.

Managed by the Sourcegraph Core application team.


Prometheus: Provisioning indicators (not available on server)

prometheus: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Distribution team.


prometheus: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Distribution team.


prometheus: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Distribution team.


prometheus: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Distribution team.


Prometheus: Kubernetes monitoring (only available on Kubernetes)

prometheus: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Distribution team.


Executor

Executes jobs in an isolated environment.

Executor: Executor: Executor queue

executor: executor_queue_size

This panel indicates unprocessed executor job queue size.

Managed by the Sourcegraph Code-intelligence team.


executor: executor_queue_growth_rate

This panel indicates unprocessed executor job queue growth rate over 30m.

This value compares the rate of enqueues against the rate of finished jobs for the selected queue; a worked sketch follows this list.

- A value less than 1 indicates that the processing rate exceeds the enqueue rate (the queue is draining)
- A value equal to 1 indicates that the processing rate matches the enqueue rate (the queue size is stable)
- A value greater than 1 indicates that the processing rate is below the enqueue rate (the queue is growing)
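
For intuition, here is a small Go sketch of the arithmetic behind this ratio; the function name and the sample rates are illustrative, not the actual Prometheus series behind the panel.

    package main

    import "fmt"

    // queueGrowthRate compares how fast jobs are enqueued with how fast they
    // finish over the same window, mirroring the interpretation listed above.
    func queueGrowthRate(enqueuedPerMin, finishedPerMin float64) float64 {
        return enqueuedPerMin / finishedPerMin
    }

    func main() {
        fmt.Println(queueGrowthRate(10, 20)) // 0.5: jobs finish faster than they arrive, so the queue drains
        fmt.Println(queueGrowthRate(20, 20)) // 1.0: processing keeps pace, so the queue size holds steady
        fmt.Println(queueGrowthRate(30, 20)) // 1.5: jobs arrive faster than they finish, so the queue grows
    }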

Managed by the Sourcegraph Code-intelligence team.


Executor: Executor: Executor jobs

executor: executor_handlers

This panel indicates the number of active handlers.

Managed by the Sourcegraph Code-intelligence team.


executor: executor_processor_total

This panel indicates handler operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: executor_processor_99th_percentile_duration

This panel indicates 99th percentile successful handler operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: executor_processor_errors_total

This panel indicates handler operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: executor_processor_error_rate

This panel indicates handler operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Executor: Executor: Queue API client

executor: apiworker_apiclient_total

This panel indicates aggregate client operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_apiclient_99th_percentile_duration

This panel indicates 99th percentile successful aggregate client operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_apiclient_errors_total

This panel indicates aggregate client operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_apiclient_error_rate

This panel indicates aggregate client operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_apiclient_total

This panel indicates client operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_apiclient_99th_percentile_duration

This panel indicates 99th percentile successful client operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_apiclient_errors_total

This panel indicates client operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_apiclient_error_rate

This panel indicates client operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Executor: Executor: Job setup

executor: apiworker_command_total

This panel indicates aggregate command operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_99th_percentile_duration

This panel indicates 99th percentile successful aggregate command operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_errors_total

This panel indicates aggregate command operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_error_rate

This panel indicates aggregate command operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_total

This panel indicates command operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_99th_percentile_duration

This panel indicates 99th percentile successful command operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_errors_total

This panel indicates command operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_error_rate

This panel indicates command operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Executor: Executor: Job execution

executor: apiworker_command_total

This panel indicates aggregate command operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_99th_percentile_duration

This panel indicates 99th percentile successful aggregate command operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_errors_total

This panel indicates aggregate command operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_error_rate

This panel indicates aggregate command operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_total

This panel indicates command operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_99th_percentile_duration

This panel indicates 99th percentile successful command operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_errors_total

This panel indicates command operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_error_rate

This panel indicates command operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Executor: Executor: Job teardown

executor: apiworker_command_total

This panel indicates aggregate command operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_99th_percentile_duration

This panel indicates 99th percentile successful aggregate command operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_errors_total

This panel indicates aggregate command operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_error_rate

This panel indicates aggregate command operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_total

This panel indicates command operations every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_99th_percentile_duration

This panel indicates 99th percentile successful command operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_errors_total

This panel indicates command operation errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor: apiworker_command_error_rate

This panel indicates command operation error rate over 5m.

Managed by the Sourcegraph Code-intelligence team.


Executor: Container monitoring (not available on server)

executor: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod (executor|sourcegraph-code-intel-indexers|executor-batches) (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p (executor|sourcegraph-code-intel-indexers|executor-batches).
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' (executor|sourcegraph-code-intel-indexers|executor-batches) (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the (executor|sourcegraph-code-intel-indexers|executor-batches) container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs (executor|sourcegraph-code-intel-indexers|executor-batches) (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Code-intelligence team.


executor: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


executor: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Code-intelligence team.


executor: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with executor issues.

Managed by the Sourcegraph Core application team.


Executor: Provisioning indicators (not available on server)

executor: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


executor: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


executor: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


executor: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


Executor: Golang runtime monitoring

executor: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.
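
For context, here is a generic, self-contained Go illustration of the kind of leak this panel can surface: a goroutine blocked forever on a channel nobody reads. It is not code from any Sourcegraph service.

    package main

    import (
        "fmt"
        "runtime"
        "time"
    )

    // leak starts a goroutine that sends on an unbuffered channel that no one
    // ever reads from, so the goroutine blocks and can never exit.
    func leak() {
        ch := make(chan int)
        go func() {
            ch <- 1 // blocks forever
        }()
    }

    func main() {
        for i := 0; i < 1000; i++ {
            leak()
        }
        time.Sleep(100 * time.Millisecond)
        // runtime.NumGoroutine reports the same figure the go_goroutines metric
        // tracks; a count that only ever climbs, as here, points to a leak.
        fmt.Println("goroutines:", runtime.NumGoroutine())
    }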

Managed by the Sourcegraph Code-intelligence team.


executor: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Code-intelligence team.


Executor: Kubernetes monitoring (only available on Kubernetes)

executor: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Code-intelligence team.