Dashboards reference
This document is a complete reference for Sourcegraph’s available dashboards, with details on how to interpret the panels and metrics.
To learn more about Sourcegraph’s metrics and how to view these dashboards, see our metrics guide.
Frontend
Serves all end-user browser and API requests.
Frontend: Search at a glance
frontend: 99th_percentile_search_request_duration
This panel indicates 99th percentile successful search request duration over 5m.
Managed by the Sourcegraph Search team.
frontend: 90th_percentile_search_request_duration
This panel indicates 90th percentile successful search request duration over 5m.
Managed by the Sourcegraph Search team.
frontend: hard_timeout_search_responses
This panel indicates hard timeout search responses every 5m.
Managed by the Sourcegraph Search team.
frontend: hard_error_search_responses
This panel indicates hard error search responses every 5m.
Managed by the Sourcegraph Search team.
frontend: partial_timeout_search_responses
This panel indicates partial timeout search responses every 5m.
Managed by the Sourcegraph Search team.
frontend: search_alert_user_suggestions
This panel indicates search alert user suggestions shown every 5m.
Managed by the Sourcegraph Search team.
frontend: page_load_latency
This panel indicates 90th percentile page load latency over all routes over 10m.
Managed by the Sourcegraph Core application team.
frontend: blob_load_latency
This panel indicates 90th percentile blob load latency over 10m.
Managed by the Sourcegraph Core application team.
Frontend: Search-based code intelligence at a glance
frontend: 99th_percentile_search_codeintel_request_duration
This panel indicates 99th percentile code-intel successful search request duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: 90th_percentile_search_codeintel_request_duration
This panel indicates 90th percentile code-intel successful search request duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: hard_timeout_search_codeintel_responses
This panel indicates hard timeout search code-intel responses every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: hard_error_search_codeintel_responses
This panel indicates hard error search code-intel responses every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: partial_timeout_search_codeintel_responses
This panel indicates partial timeout search code-intel responses every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: search_codeintel_alert_user_suggestions
This panel indicates search code-intel alert user suggestions shown every 5m.
Managed by the Sourcegraph Code-intelligence team.
Frontend: Search API usage at a glance
frontend: 99th_percentile_search_api_request_duration
This panel indicates 99th percentile successful search API request duration over 5m.
Managed by the Sourcegraph Search team.
frontend: 90th_percentile_search_api_request_duration
This panel indicates 90th percentile successful search API request duration over 5m.
Managed by the Sourcegraph Search team.
frontend: hard_timeout_search_api_responses
This panel indicates hard timeout search API responses every 5m.
Managed by the Sourcegraph Search team.
frontend: hard_error_search_api_responses
This panel indicates hard error search API responses every 5m.
Managed by the Sourcegraph Search team.
frontend: partial_timeout_search_api_responses
This panel indicates partial timeout search API responses every 5m.
Managed by the Sourcegraph Search team.
frontend: search_api_alert_user_suggestions
This panel indicates search API alert user suggestions shown every 5m.
Managed by the Sourcegraph Search team.
Frontend: Codeintel: Precise code intelligence usage at a glance
frontend: codeintel_resolvers_total
This panel indicates aggregate graphql operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_resolvers_99th_percentile_duration
This panel indicates 99th percentile successful aggregate graphql operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_resolvers_errors_total
This panel indicates aggregate graphql operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_resolvers_error_rate
This panel indicates aggregate graphql operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_resolvers_total
This panel indicates graphql operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_resolvers_99th_percentile_duration
This panel indicates 99th percentile successful graphql operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_resolvers_errors_total
This panel indicates graphql operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_resolvers_error_rate
This panel indicates graphql operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Frontend: Codeintel: Auto-index enqueuer
frontend: codeintel_autoindex_enqueuer_total
This panel indicates aggregate enqueuer operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_autoindex_enqueuer_99th_percentile_duration
This panel indicates 99th percentile successful aggregate enqueuer operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_autoindex_enqueuer_errors_total
This panel indicates aggregate enqueuer operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_autoindex_enqueuer_error_rate
This panel indicates aggregate enqueuer operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_autoindex_enqueuer_total
This panel indicates enqueuer operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_autoindex_enqueuer_99th_percentile_duration
This panel indicates 99th percentile successful enqueuer operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_autoindex_enqueuer_errors_total
This panel indicates enqueuer operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_autoindex_enqueuer_error_rate
This panel indicates enqueuer operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Frontend: Codeintel: dbstore stats
frontend: codeintel_dbstore_total
This panel indicates aggregate store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_dbstore_99th_percentile_duration
This panel indicates 99th percentile successful aggregate store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_dbstore_errors_total
This panel indicates aggregate store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_dbstore_error_rate
This panel indicates aggregate store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_dbstore_total
This panel indicates store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_dbstore_99th_percentile_duration
This panel indicates 99th percentile successful store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_dbstore_errors_total
This panel indicates store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_dbstore_error_rate
This panel indicates store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Frontend: Workerutil: lsif_indexes dbworker/store stats
frontend: workerutil_dbworker_store_codeintel_index_total
This panel indicates store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: workerutil_dbworker_store_codeintel_index_99th_percentile_duration
This panel indicates 99th percentile successful store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: workerutil_dbworker_store_codeintel_index_errors_total
This panel indicates store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: workerutil_dbworker_store_codeintel_index_error_rate
This panel indicates store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Frontend: Codeintel: lsifstore stats
frontend: codeintel_lsifstore_total
This panel indicates aggregate store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_lsifstore_99th_percentile_duration
This panel indicates 99th percentile successful aggregate store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_lsifstore_errors_total
This panel indicates aggregate store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_lsifstore_error_rate
This panel indicates aggregate store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_lsifstore_total
This panel indicates store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_lsifstore_99th_percentile_duration
This panel indicates 99th percentile successful store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_lsifstore_errors_total
This panel indicates store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_lsifstore_error_rate
This panel indicates store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Frontend: Codeintel: gitserver client
frontend: codeintel_gitserver_total
This panel indicates aggregate client operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_gitserver_99th_percentile_duration
This panel indicates 99th percentile successful aggregate client operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_gitserver_errors_total
This panel indicates aggregate client operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_gitserver_error_rate
This panel indicates aggregate client operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_gitserver_total
This panel indicates client operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_gitserver_99th_percentile_duration
This panel indicates 99th percentile successful client operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_gitserver_errors_total
This panel indicates client operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_gitserver_error_rate
This panel indicates client operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Frontend: Codeintel: uploadstore stats
frontend: codeintel_uploadstore_total
This panel indicates aggregate store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_uploadstore_99th_percentile_duration
This panel indicates 99th percentile successful aggregate store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_uploadstore_errors_total
This panel indicates aggregate store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_uploadstore_error_rate
This panel indicates aggregate store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_uploadstore_total
This panel indicates store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_uploadstore_99th_percentile_duration
This panel indicates 99th percentile successful store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_uploadstore_errors_total
This panel indicates store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: codeintel_uploadstore_error_rate
This panel indicates store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Frontend: Batches: dbstore stats
frontend: batches_dbstore_total
This panel indicates aggregate store operations every 5m.
Managed by the Sourcegraph Batches team.
frontend: batches_dbstore_99th_percentile_duration
This panel indicates 99th percentile successful aggregate store operation duration over 5m.
Managed by the Sourcegraph Batches team.
frontend: batches_dbstore_errors_total
This panel indicates aggregate store operation errors every 5m.
Managed by the Sourcegraph Batches team.
frontend: batches_dbstore_error_rate
This panel indicates aggregate store operation error rate over 5m.
Managed by the Sourcegraph Batches team.
frontend: batches_dbstore_total
This panel indicates store operations every 5m.
Managed by the Sourcegraph Batches team.
frontend: batches_dbstore_99th_percentile_duration
This panel indicates 99th percentile successful store operation duration over 5m.
Managed by the Sourcegraph Batches team.
frontend: batches_dbstore_errors_total
This panel indicates store operation errors every 5m.
Managed by the Sourcegraph Batches team.
frontend: batches_dbstore_error_rate
This panel indicates store operation error rate over 5m.
Managed by the Sourcegraph Batches team.
Frontend: Out-of-band migrations: up migration invocation (one batch processed)
frontend: oobmigration_total
This panel indicates migration handler operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: oobmigration_99th_percentile_duration
This panel indicates 99th percentile successful migration handler operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: oobmigration_errors_total
This panel indicates migration handler operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: oobmigration_error_rate
This panel indicates migration handler operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Frontend: Out-of-band migrations: down migration invocation (one batch processed)
frontend: oobmigration_total
This panel indicates migration handler operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: oobmigration_99th_percentile_duration
This panel indicates 99th percentile successful migration handler operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: oobmigration_errors_total
This panel indicates migration handler operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
frontend: oobmigration_error_rate
This panel indicates migration handler operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Frontend: Internal service requests
frontend: internal_indexed_search_error_responses
This panel indicates internal indexed search error responses every 5m.
Managed by the Sourcegraph Search team.
frontend: internal_unindexed_search_error_responses
This panel indicates internal unindexed search error responses every 5m.
Managed by the Sourcegraph Search team.
frontend: internal_api_error_responses
This panel indicates internal API error responses every 5m by route.
Managed by the Sourcegraph Core application team.
frontend: 99th_percentile_gitserver_duration
This panel indicates 99th percentile successful gitserver query duration over 5m.
Managed by the Sourcegraph Core application team.
frontend: gitserver_error_responses
This panel indicates gitserver error responses every 5m.
Managed by the Sourcegraph Core application team.
frontend: observability_test_alert_warning
This panel indicates warning test alert metric.
Managed by the Sourcegraph Distribution team.
frontend: observability_test_alert_critical
This panel indicates critical test alert metric.
Managed by the Sourcegraph Distribution team.
Frontend: Database connections
frontend: max_open_conns
This panel indicates the maximum number of open connections.
Managed by the Sourcegraph Core application team.
frontend: open_conns
This panel indicates the number of established connections.
Managed by the Sourcegraph Core application team.
frontend: in_use
This panel indicates the number of connections currently in use.
Managed by the Sourcegraph Core application team.
frontend: idle
This panel indicates the number of idle connections.
Managed by the Sourcegraph Core application team.
frontend: mean_blocked_seconds_per_conn_request
This panel indicates the mean number of seconds a connection request spends blocked waiting for an available connection.
Managed by the Sourcegraph Core application team.
frontend: closed_max_idle
This panel indicates the number of connections closed by SetMaxIdleConns.
Managed by the Sourcegraph Core application team.
frontend: closed_max_lifetime
This panel indicates the number of connections closed by SetConnMaxLifetime.
Managed by the Sourcegraph Core application team.
frontend: closed_max_idle_time
This panel indicates the number of connections closed by SetConnMaxIdleTime.
Managed by the Sourcegraph Core application team.
Frontend: Container monitoring (not available on server)
frontend: container_missing
This panel indicates container missing.
This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate that pods are being OOM killed or terminated for some other reason.
- Kubernetes:
  - Determine if the pod was OOM killed using kubectl describe pod (frontend|sourcegraph-frontend) (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
  - Check the logs before the container restarted to see if there are panic: messages or similar, using kubectl logs -p (frontend|sourcegraph-frontend).
- Docker Compose:
  - Determine if the container was OOM killed using docker inspect -f '{{json .State}}' (frontend|sourcegraph-frontend) (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the (frontend|sourcegraph-frontend) container in docker-compose.yml.
  - Check the logs before the container restarted to see if there are panic: messages or similar, using docker logs (frontend|sourcegraph-frontend) (note this will include logs from the previous and currently running container).
Managed by the Sourcegraph Core application team.
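The checks in the list above can be collapsed into a short shell session. This is a minimal sketch rather than a prescribed procedure: the pod and container names are placeholders that differ per deployment, and the same pattern applies to the other services in this reference by substituting their names.

```sh
# Minimal sketch: check whether a container was OOM killed and inspect its last logs.
# The pod and container names below are placeholders; find yours with
# `kubectl get pods` or `docker ps`.
POD=sourcegraph-frontend-xxxxxxxxxx-yyyyy

# Kubernetes: non-empty output from the first command means the pod was OOM killed.
kubectl describe pod "$POD" | grep -B3 -A3 'OOMKilled'
kubectl logs -p "$POD" | grep -i 'panic:'                  # logs from the previous (crashed) container

# Docker Compose: look for "OOMKilled":true in the container state.
docker inspect -f '{{json .State}}' sourcegraph-frontend
docker logs sourcegraph-frontend 2>&1 | grep -i 'panic:'   # includes previous and current runs
```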
frontend: container_cpu_usage
This panel indicates container cpu usage total (1m average) across all cores by instance.
Managed by the Sourcegraph Core application team.
frontend: container_memory_usage
This panel indicates container memory usage by instance.
Managed by the Sourcegraph Core application team.
frontend: fs_io_operations
This panel indicates filesystem reads and writes rate by instance over 1h.
This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with frontend issues.
Managed by the Sourcegraph Core application team.
Frontend: Provisioning indicators (not available on server)
frontend: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Core application team.
frontend: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Managed by the Sourcegraph Core application team.
frontend: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Core application team.
frontend: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Managed by the Sourcegraph Core application team.
Frontend: Golang runtime monitoring
frontend: go_goroutines
This panel indicates maximum active goroutines.
A high value here indicates a possible goroutine leak.
Managed by the Sourcegraph Core application team.
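If this value climbs steadily, a goroutine dump usually reveals the leaking code path. The sketch below assumes the container exposes Go's standard net/http/pprof handlers on an internal debug port; the port (6060) and pod name are assumptions, not values taken from this document.

```sh
# Forward the assumed debug port of the frontend pod (pod name and port are placeholders).
POD=sourcegraph-frontend-xxxxxxxxxx-yyyyy
kubectl port-forward "$POD" 6060:6060

# In another terminal: dump a goroutine summary and look for a stack trace that is
# repeated an unusually large number of times.
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=1' | head -n 60
```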
frontend: go_gc_duration_seconds
This panel indicates maximum go garbage collection duration.
Managed by the Sourcegraph Core application team.
Frontend: Kubernetes monitoring (only available on Kubernetes)
frontend: pods_available_percentage
This panel indicates percentage pods available.
Managed by the Sourcegraph Core application team.
Frontend: Sentinel queries (only on sourcegraph.com)
frontend: mean_successful_sentinel_duration_5m
This panel indicates mean successful sentinel search duration over 5m.
Managed by the Sourcegraph Search team.
frontend: mean_sentinel_stream_latency_5m
This panel indicates mean sentinel stream latency over 5m.
Managed by the Sourcegraph Search team.
frontend: 90th_percentile_successful_sentinel_duration_5m
This panel indicates 90th percentile successful sentinel search duration over 5m.
Managed by the Sourcegraph Search team.
frontend: 90th_percentile_sentinel_stream_latency_5m
This panel indicates 90th percentile sentinel stream latency over 5m.
Managed by the Sourcegraph Search team.
frontend: mean_successful_sentinel_duration_by_query_5m
This panel indicates mean successful sentinel search duration by query over 5m.
- The mean search duration for sentinel queries, broken down by query. Useful for debugging whether a slowdown is limited to a specific type of query.
Managed by the Sourcegraph Search team.
frontend: mean_sentinel_stream_latency_by_query_5m
This panel indicates mean sentinel stream latency by query over 5m.
- The mean streaming search latency for sentinel queries, broken down by query. Useful for debugging whether a slowdown is limited to a specific type of query.
Managed by the Sourcegraph Search team.
frontend: unsuccessful_status_rate_5m
This panel indicates unsuccessful status rate per 5m.
- The rate of unsuccessful sentinel queries, broken down by failure type.
Managed by the Sourcegraph Search team.
Git Server
Stores, manages, and operates Git repositories.
gitserver: memory_working_set
This panel indicates memory working set.
Managed by the Sourcegraph Core application team.
gitserver: go_routines
This panel indicates the number of goroutines.
Managed by the Sourcegraph Core application team.
gitserver: cpu_throttling_time
This panel indicates container CPU throttling time %.
Managed by the Sourcegraph Core application team.
gitserver: cpu_usage_seconds
This panel indicates cpu usage seconds.
Managed by the Sourcegraph Core application team.
gitserver: disk_space_remaining
This panel indicates disk space remaining by instance.
Managed by the Sourcegraph Core application team.
gitserver: io_reads_total
This panel indicates i/o reads total.
Managed by the Sourcegraph Core application team.
gitserver: io_writes_total
This panel indicates i/o writes total.
Managed by the Sourcegraph Core application team.
gitserver: io_reads
This panel indicates i/o reads.
Managed by the Sourcegraph Core application team.
gitserver: io_writes
This panel indicates i/o writes.
Managed by the Sourcegraph Core application team.
gitserver: io_read_througput
This panel indicates i/o read throughput.
Managed by the Sourcegraph Core application team.
gitserver: io_write_throughput
This panel indicates i/o write throughput.
Managed by the Sourcegraph Core application team.
gitserver: running_git_commands
This panel indicates git commands running on each gitserver instance.
A high value signals load.
Managed by the Sourcegraph Core application team.
gitserver: git_commands_received
This panel indicates rate of git commands received across all instances.
The per-second rate of each git command, across all instances.
Managed by the Sourcegraph Core application team.
gitserver: repository_clone_queue_size
This panel indicates repository clone queue size.
Managed by the Sourcegraph Core application team.
gitserver: repository_existence_check_queue_size
This panel indicates repository existence check queue size.
Managed by the Sourcegraph Core application team.
gitserver: echo_command_duration_test
This panel indicates echo test command duration.
A high value here likely indicates a problem, especially if consistently high.
You can query for individual commands using sum by (cmd)(src_gitserver_exec_running) in Grafana (/-/debug/grafana) to see if a specific Git Server command might be spiking in frequency (a query sketch follows the list below).
If this value is consistently high, consider the following:
- Single container deployments: Upgrade to a Docker Compose deployment which offers better scalability and resource isolation.
- Kubernetes and Docker Compose: Check that you are running a similar number of git server replicas and that their CPU/memory limits are allocated according to what is shown in the Sourcegraph resource estimator.
Managed by the Sourcegraph Core application team.
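The query above can also be run outside Grafana, against the bundled Prometheus, if you prefer the raw numbers. This is a sketch only: the Prometheus service name and port are assumptions and vary by deployment, so check your manifests before copying it.

```sh
# Forward the bundled Prometheus (service name and port are assumptions for this sketch).
kubectl port-forward svc/prometheus 9090:9090

# In another terminal: the standard Prometheus HTTP API returns the current number of
# running git commands, grouped by command.
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum by (cmd)(src_gitserver_exec_running)'
```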
gitserver: frontend_internal_api_error_responses
This panel indicates frontend-internal API error responses every 5m by route.
Managed by the Sourcegraph Core application team.
Git Server: Gitserver cleanup jobs
gitserver: janitor_running
This panel indicates if the janitor process is running.
The value is 1 if the janitor process is currently running.
Managed by the Sourcegraph Core application team.
gitserver: janitor_job_duration
This panel indicates 95th percentile job run duration.
95th percentile job run duration
Managed by the Sourcegraph Core application team.
gitserver: repos_removed
This panel indicates repositories removed due to disk pressure.
Repositories removed due to disk pressure
Managed by the Sourcegraph Core application team.
Git Server: Codeintel: Coursier invocation stats
gitserver: codeintel_coursier_total
This panel indicates aggregate invocations operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
gitserver: codeintel_coursier_99th_percentile_duration
This panel indicates 99th percentile successful aggregate invocations operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
gitserver: codeintel_coursier_errors_total
This panel indicates aggregate invocations operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
gitserver: codeintel_coursier_error_rate
This panel indicates aggregate invocations operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
gitserver: codeintel_coursier_total
This panel indicates invocations operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
gitserver: codeintel_coursier_99th_percentile_duration
This panel indicates 99th percentile successful invocations operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
gitserver: codeintel_coursier_errors_total
This panel indicates invocations operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
gitserver: codeintel_coursier_error_rate
This panel indicates invocations operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Git Server: Database connections
gitserver: max_open_conns
This panel indicates the maximum number of open connections.
Managed by the Sourcegraph Core application team.
gitserver: open_conns
This panel indicates the number of established connections.
Managed by the Sourcegraph Core application team.
gitserver: in_use
This panel indicates the number of connections currently in use.
Managed by the Sourcegraph Core application team.
gitserver: idle
This panel indicates the number of idle connections.
Managed by the Sourcegraph Core application team.
gitserver: mean_blocked_seconds_per_conn_request
This panel indicates the mean number of seconds a connection request spends blocked waiting for an available connection.
Managed by the Sourcegraph Core application team.
gitserver: closed_max_idle
This panel indicates the number of connections closed by SetMaxIdleConns.
Managed by the Sourcegraph Core application team.
gitserver: closed_max_lifetime
This panel indicates the number of connections closed by SetConnMaxLifetime.
Managed by the Sourcegraph Core application team.
gitserver: closed_max_idle_time
This panel indicates the number of connections closed by SetConnMaxIdleTime.
Managed by the Sourcegraph Core application team.
Git Server: Container monitoring (not available on server)
gitserver: container_missing
This panel indicates container missing.
This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate that pods are being OOM killed or terminated for some other reason.
- Kubernetes:
  - Determine if the pod was OOM killed using kubectl describe pod gitserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
  - Check the logs before the container restarted to see if there are panic: messages or similar, using kubectl logs -p gitserver.
- Docker Compose:
  - Determine if the container was OOM killed using docker inspect -f '{{json .State}}' gitserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the gitserver container in docker-compose.yml.
  - Check the logs before the container restarted to see if there are panic: messages or similar, using docker logs gitserver (note this will include logs from the previous and currently running container).
Managed by the Sourcegraph Core application team.
gitserver: container_cpu_usage
This panel indicates container cpu usage total (1m average) across all cores by instance.
Managed by the Sourcegraph Core application team.
gitserver: container_memory_usage
This panel indicates container memory usage by instance.
Managed by the Sourcegraph Core application team.
gitserver: fs_io_operations
This panel indicates filesystem reads and writes rate by instance over 1h.
This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with gitserver issues.
Managed by the Sourcegraph Core application team.
Git Server: Provisioning indicators (not available on server)
gitserver: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Core application team.
gitserver: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Git Server is expected to use up all the memory it is provided.
Managed by the Sourcegraph Core application team.
gitserver: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Core application team.
gitserver: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Git Server is expected to use up all the memory it is provided.
Managed by the Sourcegraph Core application team.
Git Server: Golang runtime monitoring
gitserver: go_goroutines
This panel indicates maximum active goroutines.
A high value here indicates a possible goroutine leak.
Managed by the Sourcegraph Core application team.
gitserver: go_gc_duration_seconds
This panel indicates maximum go garbage collection duration.
Managed by the Sourcegraph Core application team.
Git Server: Kubernetes monitoring (only available on Kubernetes)
gitserver: pods_available_percentage
This panel indicates percentage pods available.
Managed by the Sourcegraph Core application team.
GitHub Proxy
Proxies all requests to github.com, keeping track of and managing rate limits.
GitHub Proxy: GitHub API monitoring
github-proxy: github_proxy_waiting_requests
This panel indicates number of requests waiting on the global mutex.
Managed by the Sourcegraph Core application team.
GitHub Proxy: Container monitoring (not available on server)
github-proxy: container_missing
This panel indicates container missing.
This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate that pods are being OOM killed or terminated for some other reason.
- Kubernetes:
  - Determine if the pod was OOM killed using kubectl describe pod github-proxy (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
  - Check the logs before the container restarted to see if there are panic: messages or similar, using kubectl logs -p github-proxy.
- Docker Compose:
  - Determine if the container was OOM killed using docker inspect -f '{{json .State}}' github-proxy (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the github-proxy container in docker-compose.yml.
  - Check the logs before the container restarted to see if there are panic: messages or similar, using docker logs github-proxy (note this will include logs from the previous and currently running container).
Managed by the Sourcegraph Core application team.
github-proxy: container_cpu_usage
This panel indicates container cpu usage total (1m average) across all cores by instance.
Managed by the Sourcegraph Core application team.
github-proxy: container_memory_usage
This panel indicates container memory usage by instance.
Managed by the Sourcegraph Core application team.
github-proxy: fs_io_operations
This panel indicates filesystem reads and writes rate by instance over 1h.
This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with github-proxy issues.
Managed by the Sourcegraph Core application team.
GitHub Proxy: Provisioning indicators (not available on server)
github-proxy: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Core application team.
github-proxy: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Managed by the Sourcegraph Core application team.
github-proxy: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Core application team.
github-proxy: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Managed by the Sourcegraph Core application team.
GitHub Proxy: Golang runtime monitoring
github-proxy: go_goroutines
This panel indicates maximum active goroutines.
A high value here indicates a possible goroutine leak.
Managed by the Sourcegraph Core application team.
github-proxy: go_gc_duration_seconds
This panel indicates maximum go garbage collection duration.
Managed by the Sourcegraph Core application team.
GitHub Proxy: Kubernetes monitoring (only available on Kubernetes)
github-proxy: pods_available_percentage
This panel indicates percentage pods available.
Managed by the Sourcegraph Core application team.
Postgres
Postgres metrics, exported from postgres_exporter (only available on Kubernetes).
postgres: connections
This panel indicates active connections.
Managed by the Sourcegraph Core application team.
postgres: transaction_durations
This panel indicates maximum transaction durations.
Managed by the Sourcegraph Core application team.
Postgres: Database and collector status
postgres: postgres_up
This panel indicates database availability.
A non-zero value indicates the database is online.
Managed by the Sourcegraph Core application team.
postgres: invalid_indexes
This panel indicates invalid indexes (unusable by the query planner).
A non-zero value indicates that Postgres failed to build an index. Expect degraded performance until the index is manually rebuilt.
Managed by the Sourcegraph Core application team.
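A hedged sketch of how an administrator might locate and rebuild such an index. The psql connection parameters and the index name are placeholders, not values from this document, and REINDEX ... CONCURRENTLY requires PostgreSQL 12 or later.

```sh
# List invalid indexes (indisvalid = false in pg_index). Connection flags are placeholders.
psql -h localhost -U sg -d sg -c "
  SELECT n.nspname AS schema_name, c.relname AS index_name
    FROM pg_index i
    JOIN pg_class c ON c.oid = i.indexrelid
    JOIN pg_namespace n ON n.oid = c.relnamespace
   WHERE NOT i.indisvalid;"

# Rebuild one of the reported indexes without blocking writes (PostgreSQL 12+).
# <schema>.<index_name> is a placeholder for a row returned above.
psql -h localhost -U sg -d sg -c "REINDEX INDEX CONCURRENTLY <schema>.<index_name>;"
```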
postgres: pg_exporter_err
This panel indicates errors scraping postgres exporter.
This value indicates issues retrieving metrics from postgres_exporter.
Managed by the Sourcegraph Core application team.
postgres: migration_in_progress
This panel indicates active schema migration.
A 0 value indicates that no migration is in progress.
Managed by the Sourcegraph Core application team.
Postgres: Object size and bloat
postgres: pg_table_size
This panel indicates table size.
Total size of this table
Managed by the Sourcegraph Core application team.
postgres: pg_table_bloat_ratio
This panel indicates table bloat ratio.
Estimated bloat ratio of this table (high bloat = high overhead)
Managed by the Sourcegraph Core application team.
postgres: pg_index_size
This panel indicates index size.
Total size of this index
Managed by the Sourcegraph Core application team.
postgres: pg_index_bloat_ratio
This panel indicates index bloat ratio.
Estimated bloat ratio of this index (high bloat = high overhead)
Managed by the Sourcegraph Core application team.
Postgres: Provisioning indicators (not available on server)
postgres: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Core application team.
postgres: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Managed by the Sourcegraph Core application team.
postgres: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Core application team.
postgres: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Managed by the Sourcegraph Core application team.
Postgres: Kubernetes monitoring (only available on Kubernetes)
postgres: pods_available_percentage
This panel indicates percentage pods available.
Managed by the Sourcegraph Core application team.
Precise Code Intel Worker
Handles conversion of uploaded precise code intelligence bundles.
Precise Code Intel Worker: Codeintel: LSIF uploads
precise-code-intel-worker: codeintel_upload_queue_size
This panel indicates unprocessed upload record queue size.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_upload_queue_growth_rate
This panel indicates unprocessed upload record queue growth rate over 30m.
This value compares the rate of enqueues against the rate of finished jobs.
- A value < 1 indicates that the process rate > the enqueue rate (the queue is shrinking)
- A value = 1 indicates that the process rate = the enqueue rate (the queue size is stable)
- A value > 1 indicates that the process rate < the enqueue rate (the queue is growing)
Managed by the Sourcegraph Code-intelligence team.
Precise Code Intel Worker: Codeintel: LSIF uploads
precise-code-intel-worker: codeintel_upload_handlers
This panel indicates handler active handlers.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_upload_processor_total
This panel indicates handler operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_upload_processor_99th_percentile_duration
This panel indicates 99th percentile successful handler operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_upload_processor_errors_total
This panel indicates handler operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_upload_processor_error_rate
This panel indicates handler operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Precise Code Intel Worker: Codeintel: dbstore stats
precise-code-intel-worker: codeintel_dbstore_total
This panel indicates aggregate store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_dbstore_99th_percentile_duration
This panel indicates 99th percentile successful aggregate store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_dbstore_errors_total
This panel indicates aggregate store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_dbstore_error_rate
This panel indicates aggregate store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_dbstore_total
This panel indicates store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_dbstore_99th_percentile_duration
This panel indicates 99th percentile successful store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_dbstore_errors_total
This panel indicates store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_dbstore_error_rate
This panel indicates store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Precise Code Intel Worker: Codeintel: lsifstore stats
precise-code-intel-worker: codeintel_lsifstore_total
This panel indicates aggregate store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_lsifstore_99th_percentile_duration
This panel indicates 99th percentile successful aggregate store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_lsifstore_errors_total
This panel indicates aggregate store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_lsifstore_error_rate
This panel indicates aggregate store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_lsifstore_total
This panel indicates store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_lsifstore_99th_percentile_duration
This panel indicates 99th percentile successful store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_lsifstore_errors_total
This panel indicates store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_lsifstore_error_rate
This panel indicates store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Precise Code Intel Worker: Workerutil: lsif_uploads dbworker/store stats
precise-code-intel-worker: workerutil_dbworker_store_codeintel_upload_total
This panel indicates store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: workerutil_dbworker_store_codeintel_upload_99th_percentile_duration
This panel indicates 99th percentile successful store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: workerutil_dbworker_store_codeintel_upload_errors_total
This panel indicates store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: workerutil_dbworker_store_codeintel_upload_error_rate
This panel indicates store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Precise Code Intel Worker: Codeintel: gitserver client
precise-code-intel-worker: codeintel_gitserver_total
This panel indicates aggregate client operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_gitserver_99th_percentile_duration
This panel indicates 99th percentile successful aggregate client operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_gitserver_errors_total
This panel indicates aggregate client operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_gitserver_error_rate
This panel indicates aggregate client operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_gitserver_total
This panel indicates client operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_gitserver_99th_percentile_duration
This panel indicates 99th percentile successful client operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_gitserver_errors_total
This panel indicates client operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_gitserver_error_rate
This panel indicates client operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Precise Code Intel Worker: Codeintel: uploadstore stats
precise-code-intel-worker: codeintel_uploadstore_total
This panel indicates aggregate store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_uploadstore_99th_percentile_duration
This panel indicates 99th percentile successful aggregate store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_uploadstore_errors_total
This panel indicates aggregate store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_uploadstore_error_rate
This panel indicates aggregate store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_uploadstore_total
This panel indicates store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_uploadstore_99th_percentile_duration
This panel indicates 99th percentile successful store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_uploadstore_errors_total
This panel indicates store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: codeintel_uploadstore_error_rate
This panel indicates store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Precise Code Intel Worker: Internal service requests
precise-code-intel-worker: frontend_internal_api_error_responses
This panel indicates frontend-internal API error responses every 5m by route.
Managed by the Sourcegraph Code-intelligence team.
Precise Code Intel Worker: Database connections
precise-code-intel-worker: max_open_conns
This panel indicates the maximum number of open connections.
Managed by the Sourcegraph Core application team.
precise-code-intel-worker: open_conns
This panel indicates the number of established connections.
Managed by the Sourcegraph Core application team.
precise-code-intel-worker: in_use
This panel indicates the number of connections currently in use.
Managed by the Sourcegraph Core application team.
precise-code-intel-worker: idle
This panel indicates the number of idle connections.
Managed by the Sourcegraph Core application team.
precise-code-intel-worker: mean_blocked_seconds_per_conn_request
This panel indicates the mean number of seconds a connection request spends blocked waiting for an available connection.
Managed by the Sourcegraph Core application team.
precise-code-intel-worker: closed_max_idle
This panel indicates the number of connections closed by SetMaxIdleConns.
Managed by the Sourcegraph Core application team.
precise-code-intel-worker: closed_max_lifetime
This panel indicates the number of connections closed by SetConnMaxLifetime.
Managed by the Sourcegraph Core application team.
precise-code-intel-worker: closed_max_idle_time
This panel indicates the number of connections closed by SetConnMaxIdleTime.
Managed by the Sourcegraph Core application team.
Precise Code Intel Worker: Container monitoring (not available on server)
precise-code-intel-worker: container_missing
This panel indicates container missing.
This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate that pods are being OOM killed or terminated for some other reason.
- Kubernetes:
  - Determine if the pod was OOM killed using kubectl describe pod precise-code-intel-worker (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
  - Check the logs before the container restarted to see if there are panic: messages or similar, using kubectl logs -p precise-code-intel-worker.
- Docker Compose:
  - Determine if the container was OOM killed using docker inspect -f '{{json .State}}' precise-code-intel-worker (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the precise-code-intel-worker container in docker-compose.yml.
  - Check the logs before the container restarted to see if there are panic: messages or similar, using docker logs precise-code-intel-worker (note this will include logs from the previous and currently running container).
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: container_cpu_usage
This panel indicates container cpu usage total (1m average) across all cores by instance.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: container_memory_usage
This panel indicates container memory usage by instance.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: fs_io_operations
This panel indicates filesystem reads and writes rate by instance over 1h.
This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with precise-code-intel-worker issues.
Managed by the Sourcegraph Core application team.
Precise Code Intel Worker: Provisioning indicators (not available on server)
precise-code-intel-worker: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Managed by the Sourcegraph Code-intelligence team.
Precise Code Intel Worker: Golang runtime monitoring
precise-code-intel-worker: go_goroutines
This panel indicates maximum active goroutines.
A high value here indicates a possible goroutine leak.
Managed by the Sourcegraph Code-intelligence team.
precise-code-intel-worker: go_gc_duration_seconds
This panel indicates maximum go garbage collection duration.
Managed by the Sourcegraph Code-intelligence team.
Precise Code Intel Worker: Kubernetes monitoring (only available on Kubernetes)
precise-code-intel-worker: pods_available_percentage
This panel indicates percentage pods available.
Managed by the Sourcegraph Code-intelligence team.
Query Runner
Periodically runs saved searches and instructs the frontend to send out notifications.
Query Runner: Internal service requests
query-runner: frontend_internal_api_error_responses
This panel indicates frontend-internal API error responses every 5m by route.
Managed by the Sourcegraph Search team.
Query Runner: Container monitoring (not available on server)
query-runner: container_missing
This panel indicates container missing.
This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate that pods are being OOM killed or terminated for some other reason.
- Kubernetes:
  - Determine if the pod was OOM killed using kubectl describe pod query-runner (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
  - Check the logs before the container restarted to see if there are panic: messages or similar, using kubectl logs -p query-runner.
- Docker Compose:
  - Determine if the container was OOM killed using docker inspect -f '{{json .State}}' query-runner (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the query-runner container in docker-compose.yml.
  - Check the logs before the container restarted to see if there are panic: messages or similar, using docker logs query-runner (note this will include logs from the previous and currently running container).
Managed by the Sourcegraph Search team.
query-runner: container_cpu_usage
This panel indicates container cpu usage total (1m average) across all cores by instance.
Managed by the Sourcegraph Search team.
query-runner: container_memory_usage
This panel indicates container memory usage by instance.
Managed by the Sourcegraph Search team.
query-runner: fs_io_operations
This panel indicates filesystem reads and writes rate by instance over 1h.
This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with query-runner issues.
Managed by the Sourcegraph Core application team.
Query Runner: Provisioning indicators (not available on server)
query-runner: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Search team.
query-runner: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Managed by the Sourcegraph Search team.
query-runner: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Search team.
query-runner: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Managed by the Sourcegraph Search team.
Query Runner: Golang runtime monitoring
query-runner: go_goroutines
This panel indicates maximum active goroutines.
A high value here indicates a possible goroutine leak.
Managed by the Sourcegraph Search team.
query-runner: go_gc_duration_seconds
This panel indicates maximum go garbage collection duration.
Managed by the Sourcegraph Search team.
Query Runner: Kubernetes monitoring (only available on Kubernetes)
query-runner: pods_available_percentage
This panel indicates percentage pods available.
Managed by the Sourcegraph Search team.
Worker
Manages background processes.
Worker: Active jobs
worker: worker_job_count
This panel indicates number of worker instances running each job.
The number of worker instances running each job type. It is necessary for each job type to be managed by at least one worker instance.
worker: worker_job_codeintel-janitor_count
This panel indicates number of worker instances running the codeintel-janitor job.
Managed by the Sourcegraph Code-intelligence team.
worker: worker_job_codeintel-commitgraph_count
This panel indicates number of worker instances running the codeintel-commitgraph job.
Managed by the Sourcegraph Code-intelligence team.
worker: worker_job_codeintel-auto-indexing_count
This panel indicates number of worker instances running the codeintel-auto-indexing job.
Managed by the Sourcegraph Code-intelligence team.
Worker: Codeintel: Repository with stale commit graph
worker: codeintel_commit_graph_queue_size
This panel indicates repository queue size.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_commit_graph_queue_growth_rate
This panel indicates repository queue growth rate over 30m.
This value compares the rate of enqueues against the rate of finished jobs.
- A value < 1 indicates that process rate > enqueue rate
- A value = 1 indicates that process rate = enqueue rate
- A value > 1 indicates that process rate < enqueue rate
Managed by the Sourcegraph Code-intelligence team.
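As a concrete illustration of how to read this ratio (with made-up numbers, not taken from any real instance):

```go
package main

import "fmt"

func main() {
	// Hypothetical counts over the 30m window.
	enqueued := 150.0 // repositories queued for a commit graph update
	finished := 100.0 // commit graph updates that completed

	ratio := enqueued / finished
	switch {
	case ratio > 1:
		fmt.Printf("ratio %.2f: the queue is growing (process rate < enqueue rate)\n", ratio)
	case ratio < 1:
		fmt.Printf("ratio %.2f: the queue is draining (process rate > enqueue rate)\n", ratio)
	default:
		fmt.Println("ratio 1.00: the queue size is holding steady")
	}
}
```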
Worker: Codeintel: Repository commit graph updates
worker: codeintel_commit_graph_processor_total
This panel indicates update operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_commit_graph_processor_99th_percentile_duration
This panel indicates 99th percentile successful update operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_commit_graph_processor_errors_total
This panel indicates update operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_commit_graph_processor_error_rate
This panel indicates update operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Worker: Codeintel: Dependency index job
worker: codeintel_dependency_index_queue_size
This panel indicates dependency index job queue size.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dependency_index_queue_growth_rate
This panel indicates dependency index job queue growth rate over 30m.
This value compares the rate of enqueues against the rate of finished jobs.
- A value < 1 indicates that process rate > enqueue rate
- A value = 1 indicates that process rate = enqueue rate
- A value > 1 indicates that process rate < enqueue rate
Managed by the Sourcegraph Code-intelligence team.
Worker: Codeintel: Dependency index jobs
worker: codeintel_dependency_index_handlers
This panel indicates the number of active handlers.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dependency_index_processor_total
This panel indicates handler operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dependency_index_processor_99th_percentile_duration
This panel indicates 99th percentile successful handler operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dependency_index_processor_errors_total
This panel indicates handler operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dependency_index_processor_error_rate
This panel indicates handler operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Worker: [codeintel] Janitor stats
worker: codeintel_background_upload_records_removed_total
This panel indicates lsif_upload records deleted every 5m.
Number of LSIF upload records deleted due to expiration or unreachability every 5m
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_background_index_records_removed_total
This panel indicates lsif_index records deleted every 5m.
Number of LSIF index records deleted due to expiration or unreachability every 5m
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_background_uploads_purged_total
This panel indicates lsif_upload data bundles deleted every 5m.
Number of LSIF upload data bundles purged from the codeintel-db database every 5m
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_background_errors_total
This panel indicates janitor operation errors every 5m.
Number of code intelligence janitor errors every 5m
Managed by the Sourcegraph Code-intelligence team.
Worker: Codeintel: Auto-index scheduler
worker: codeintel_index_scheduler_total
This panel indicates aggregate scheduler operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_index_scheduler_99th_percentile_duration
This panel indicates 99th percentile successful aggregate scheduler operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_index_scheduler_errors_total
This panel indicates aggregate scheduler operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_index_scheduler_error_rate
This panel indicates aggregate scheduler operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_index_scheduler_total
This panel indicates scheduler operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_index_scheduler_99th_percentile_duration
This panel indicates 99th percentile successful scheduler operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_index_scheduler_errors_total
This panel indicates scheduler operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_index_scheduler_error_rate
This panel indicates scheduler operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Worker: Codeintel: Auto-index enqueuer
worker: codeintel_autoindex_enqueuer_total
This panel indicates aggregate enqueuer operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_autoindex_enqueuer_99th_percentile_duration
This panel indicates 99th percentile successful aggregate enqueuer operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_autoindex_enqueuer_errors_total
This panel indicates aggregate enqueuer operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_autoindex_enqueuer_error_rate
This panel indicates aggregate enqueuer operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_autoindex_enqueuer_total
This panel indicates enqueuer operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_autoindex_enqueuer_99th_percentile_duration
This panel indicates 99th percentile successful enqueuer operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_autoindex_enqueuer_errors_total
This panel indicates enqueuer operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_autoindex_enqueuer_error_rate
This panel indicates enqueuer operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Worker: Codeintel: dbstore stats
worker: codeintel_dbstore_total
This panel indicates aggregate store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dbstore_99th_percentile_duration
This panel indicates 99th percentile successful aggregate store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dbstore_errors_total
This panel indicates aggregate store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dbstore_error_rate
This panel indicates aggregate store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dbstore_total
This panel indicates store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dbstore_99th_percentile_duration
This panel indicates 99th percentile successful store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dbstore_errors_total
This panel indicates store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dbstore_error_rate
This panel indicates store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Worker: Codeintel: lsifstore stats
worker: codeintel_lsifstore_total
This panel indicates aggregate store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_lsifstore_99th_percentile_duration
This panel indicates 99th percentile successful aggregate store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_lsifstore_errors_total
This panel indicates aggregate store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_lsifstore_error_rate
This panel indicates aggregate store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_lsifstore_total
This panel indicates store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_lsifstore_99th_percentile_duration
This panel indicates 99th percentile successful store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_lsifstore_errors_total
This panel indicates store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_lsifstore_error_rate
This panel indicates store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Worker: Workerutil: lsif_dependency_indexes dbworker/store stats
worker: workerutil_dbworker_store_codeintel_dependency_index_total
This panel indicates store operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: workerutil_dbworker_store_codeintel_dependency_index_99th_percentile_duration
This panel indicates 99th percentile successful store operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: workerutil_dbworker_store_codeintel_dependency_index_errors_total
This panel indicates store operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: workerutil_dbworker_store_codeintel_dependency_index_error_rate
This panel indicates store operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Worker: Codeintel: gitserver client
worker: codeintel_gitserver_total
This panel indicates aggregate client operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_gitserver_99th_percentile_duration
This panel indicates 99th percentile successful aggregate client operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_gitserver_errors_total
This panel indicates aggregate client operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_gitserver_error_rate
This panel indicates aggregate client operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_gitserver_total
This panel indicates client operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_gitserver_99th_percentile_duration
This panel indicates 99th percentile successful client operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_gitserver_errors_total
This panel indicates client operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_gitserver_error_rate
This panel indicates client operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Worker: Codeintel: Dependency repository insert
worker: codeintel_dependency_repos_total
This panel indicates aggregate insert operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dependency_repos_99th_percentile_duration
This panel indicates 99th percentile successful aggregate insert operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dependency_repos_errors_total
This panel indicates aggregate insert operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dependency_repos_error_rate
This panel indicates aggregate insert operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dependency_repos_total
This panel indicates insert operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dependency_repos_99th_percentile_duration
This panel indicates 99th percentile successful insert operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dependency_repos_errors_total
This panel indicates insert operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_dependency_repos_error_rate
This panel indicates insert operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Worker: Codeintel: lsif_upload record resetter
worker: codeintel_background_upload_record_resets_total
This panel indicates lsif_upload records reset to queued state every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_background_upload_record_reset_failures_total
This panel indicates lsif_upload records reset to errored state every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_background_upload_record_reset_errors_total
This panel indicates lsif_upload operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
Worker: Codeintel: lsif_index record resetter
worker: codeintel_background_index_record_resets_total
This panel indicates lsif_index records reset to queued state every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_background_index_record_reset_failures_total
This panel indicates lsif_index records reset to errored state every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_background_index_record_reset_errors_total
This panel indicates lsif_index operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
Worker: Codeintel: lsif_dependency_index record resetter
worker: codeintel_background_dependency_index_record_resets_total
This panel indicates lsif_dependency_index records reset to queued state every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_background_dependency_index_record_reset_failures_total
This panel indicates lsif_dependency_index records reset to errored state every 5m.
Managed by the Sourcegraph Code-intelligence team.
worker: codeintel_background_dependency_index_record_reset_errors_total
This panel indicates lsif_dependency_index operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
Worker: Codeinsights: Query Runner Queue
worker: insights_search_queue_queue_size
This panel indicates code insights search queue queue size.
Managed by the Sourcegraph Code-insights team.
worker: insights_search_queue_queue_growth_rate
This panel indicates code insights search queue queue growth rate over 30m.
This value compares the rate of enqueues against the rate of finished jobs.
- A value < 1 indicates that process rate > enqueue rate
- A value = 1 indicates that process rate = enqueue rate
- A value > 1 indicates that process rate < enqueue rate
Managed by the Sourcegraph Code-insights team.
Worker: Codeinsights: insights queue processor
worker: insights_search_queue_handlers
This panel indicates the number of active handlers.
Managed by the Sourcegraph Code-insights team.
worker: insights_search_queue_processor_total
This panel indicates handler operations every 5m.
Managed by the Sourcegraph Code-insights team.
worker: insights_search_queue_processor_99th_percentile_duration
This panel indicates 99th percentile successful handler operation duration over 5m.
Managed by the Sourcegraph Code-insights team.
worker: insights_search_queue_processor_errors_total
This panel indicates handler operation errors every 5m.
Managed by the Sourcegraph Code-insights team.
worker: insights_search_queue_processor_error_rate
This panel indicates handler operation error rate over 5m.
Managed by the Sourcegraph Code-insights team.
Worker: Codeinsights: code insights search queue record resetter
worker: insights_search_queue_record_resets_total
This panel indicates insights_search_queue records reset to queued state every 5m.
Managed by the Sourcegraph Code-insights team.
worker: insights_search_queue_record_reset_failures_total
This panel indicates insights_search_queue records reset to errored state every 5m.
Managed by the Sourcegraph Code-insights team.
worker: insights_search_queue_record_reset_errors_total
This panel indicates insights_search_queue operation errors every 5m.
Managed by the Sourcegraph Code-insights team.
Worker: Codeinsights: dbstore stats
worker: workerutil_dbworker_store_insights_query_runner_jobs_store_total
This panel indicates aggregate store operations every 5m.
Managed by the Sourcegraph Code-insights team.
worker: workerutil_dbworker_store_insights_query_runner_jobs_store_99th_percentile_duration
This panel indicates 99th percentile successful aggregate store operation duration over 5m.
Managed by the Sourcegraph Code-insights team.
worker: workerutil_dbworker_store_insights_query_runner_jobs_store_errors_total
This panel indicates aggregate store operation errors every 5m.
Managed by the Sourcegraph Code-insights team.
worker: workerutil_dbworker_store_insights_query_runner_jobs_store_error_rate
This panel indicates aggregate store operation error rate over 5m.
Managed by the Sourcegraph Code-insights team.
worker: workerutil_dbworker_store_insights_query_runner_jobs_store_total
This panel indicates store operations every 5m.
Managed by the Sourcegraph Code-insights team.
worker: workerutil_dbworker_store_insights_query_runner_jobs_store_99th_percentile_duration
This panel indicates 99th percentile successful store operation duration over 5m.
Managed by the Sourcegraph Code-insights team.
worker: workerutil_dbworker_store_insights_query_runner_jobs_store_errors_total
This panel indicates store operation errors every 5m.
Managed by the Sourcegraph Code-insights team.
worker: workerutil_dbworker_store_insights_query_runner_jobs_store_error_rate
This panel indicates store operation error rate over 5m.
Managed by the Sourcegraph Code-insights team.
Worker: Internal service requests
worker: frontend_internal_api_error_responses
This panel indicates frontend-internal API error responses every 5m by route.
Managed by the Sourcegraph Code-intelligence team.
Worker: Database connections
worker: max_open_conns
This panel indicates maximum open connections.
Managed by the Sourcegraph Core application team.
worker: open_conns
This panel indicates established connections.
Managed by the Sourcegraph Core application team.
worker: in_use
This panel indicates connections in use.
Managed by the Sourcegraph Core application team.
worker: idle
This panel indicates idle connections.
Managed by the Sourcegraph Core application team.
worker: mean_blocked_seconds_per_conn_request
This panel indicates mean blocked seconds per conn request.
Managed by the Sourcegraph Core application team.
worker: closed_max_idle
This panel indicates closed by SetMaxIdleConns.
Managed by the Sourcegraph Core application team.
worker: closed_max_lifetime
This panel indicates closed by SetConnMaxLifetime.
Managed by the Sourcegraph Core application team.
worker: closed_max_idle_time
This panel indicates closed by SetConnMaxIdleTime.
Managed by the Sourcegraph Core application team.
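These panels map onto the connection pool statistics reported by Go's database/sql package. A minimal, self-contained sketch of reading them (illustrative only; the connection string and driver choice are assumptions, not Sourcegraph's instrumentation code):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/lib/pq" // hypothetical choice of Postgres driver
)

func main() {
	// Hypothetical DSN for illustration only.
	db, err := sql.Open("postgres", "postgres://localhost:5432/example?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Pool limits whose effects surface in the panels above.
	db.SetMaxOpenConns(30)
	db.SetMaxIdleConns(30)
	db.SetConnMaxLifetime(time.Hour)
	db.SetConnMaxIdleTime(time.Minute)

	s := db.Stats()
	fmt.Println("maximum open:", s.MaxOpenConnections)                // max_open_conns
	fmt.Println("established:", s.OpenConnections)                    // open_conns
	fmt.Println("in use:", s.InUse)                                   // in_use
	fmt.Println("idle:", s.Idle)                                      // idle
	fmt.Println("total time blocked:", s.WaitDuration)                // feeds mean_blocked_seconds_per_conn_request
	fmt.Println("closed by SetMaxIdleConns:", s.MaxIdleClosed)        // closed_max_idle
	fmt.Println("closed by SetConnMaxLifetime:", s.MaxLifetimeClosed) // closed_max_lifetime
	fmt.Println("closed by SetConnMaxIdleTime:", s.MaxIdleTimeClosed) // closed_max_idle_time
}
```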
Worker: Container monitoring (not available on server)
worker: container_missing
This panel indicates container missing.
This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.
- Kubernetes:
  - Determine if the pod was OOM killed using kubectl describe pod worker (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p worker.
- Docker Compose:
  - Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' worker (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the worker container in docker-compose.yml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using docker logs worker (note this will include logs from the previous and currently running container).
Managed by the Sourcegraph Code-intelligence team.
worker: container_cpu_usage
This panel indicates container cpu usage total (1m average) across all cores by instance.
Managed by the Sourcegraph Code-intelligence team.
worker: container_memory_usage
This panel indicates container memory usage by instance.
Managed by the Sourcegraph Code-intelligence team.
worker: fs_io_operations
This panel indicates filesystem reads and writes rate by instance over 1h.
This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with worker issues.
Managed by the Sourcegraph Core application team.
Worker: Provisioning indicators (not available on server)
worker: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Code-intelligence team.
worker: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Managed by the Sourcegraph Code-intelligence team.
worker: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Code-intelligence team.
worker: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Managed by the Sourcegraph Code-intelligence team.
Worker: Golang runtime monitoring
worker: go_goroutines
This panel indicates maximum active goroutines.
A high value here indicates a possible goroutine leak.
Managed by the Sourcegraph Code-intelligence team.
worker: go_gc_duration_seconds
This panel indicates maximum go garbage collection duration.
Managed by the Sourcegraph Code-intelligence team.
Worker: Kubernetes monitoring (only available on Kubernetes)
worker: pods_available_percentage
This panel indicates percentage pods available.
Managed by the Sourcegraph Code-intelligence team.
Repo Updater
Manages interaction with code hosts, instructs Gitserver to update repositories.
Repo Updater: Repositories
repo-updater: syncer_sync_last_time
This panel indicates time since last sync.
A high value here indicates issues synchronizing repo metadata. If the value is persistently high, make sure all external services have valid tokens.
Managed by the Sourcegraph Core application team.
repo-updater: src_repoupdater_max_sync_backoff
This panel indicates time since oldest sync.
Managed by the Sourcegraph Core application team.
repo-updater: src_repoupdater_syncer_sync_errors_total
This panel indicates site level external service sync error rate.
Managed by the Sourcegraph Core application team.
repo-updater: syncer_sync_start
This panel indicates repo metadata sync was started.
Managed by the Sourcegraph Core application team.
repo-updater: syncer_sync_duration
This panel indicates 95th percentile repositories sync duration.
Managed by the Sourcegraph Core application team.
repo-updater: source_duration
This panel indicates 95th percentile repositories source duration.
Managed by the Sourcegraph Core application team.
repo-updater: syncer_synced_repos
This panel indicates repositories synced.
Managed by the Sourcegraph Core application team.
repo-updater: sourced_repos
This panel indicates repositories sourced.
Managed by the Sourcegraph Core application team.
repo-updater: user_added_repos
This panel indicates total number of user added repos.
Managed by the Sourcegraph Core application team.
repo-updater: purge_failed
This panel indicates repositories purge failed.
Managed by the Sourcegraph Core application team.
repo-updater: sched_auto_fetch
This panel indicates repositories scheduled due to hitting a deadline.
Managed by the Sourcegraph Core application team.
repo-updater: sched_manual_fetch
This panel indicates repositories scheduled due to user traffic.
Check repo-updater logs if this value is persistently high. This does not indicate anything if there are no user added code hosts.
Managed by the Sourcegraph Core application team.
repo-updater: sched_known_repos
This panel indicates repositories managed by the scheduler.
Managed by the Sourcegraph Core application team.
repo-updater: sched_update_queue_length
This panel indicates rate of growth of update queue length over 5 minutes.
Managed by the Sourcegraph Core application team.
repo-updater: sched_loops
This panel indicates scheduler loops.
Managed by the Sourcegraph Core application team.
repo-updater: sched_error
This panel indicates repositories schedule error rate.
Managed by the Sourcegraph Core application team.
Repo Updater: Permissions
repo-updater: perms_syncer_perms
This panel indicates time gap between least and most up to date permissions.
Managed by the Sourcegraph Core application team.
repo-updater: perms_syncer_stale_perms
This panel indicates number of entities with stale permissions.
Managed by the Sourcegraph Core application team.
repo-updater: perms_syncer_no_perms
This panel indicates number of entities with no permissions.
Managed by the Sourcegraph Core application team.
repo-updater: perms_syncer_sync_duration
This panel indicates 95th percentile permissions sync duration.
Managed by the Sourcegraph Core application team.
repo-updater: perms_syncer_queue_size
This panel indicates permissions sync queued items.
Managed by the Sourcegraph Core application team.
repo-updater: perms_syncer_sync_errors
This panel indicates permissions sync error rate.
Managed by the Sourcegraph Core application team.
Repo Updater: External services
repo-updater: src_repoupdater_external_services_total
This panel indicates the total number of external services.
Managed by the Sourcegraph Core application team.
repo-updater: src_repoupdater_user_external_services_total
This panel indicates the total number of user added external services.
Managed by the Sourcegraph Core application team.
repo-updater: repoupdater_queued_sync_jobs_total
This panel indicates the total number of queued sync jobs.
Managed by the Sourcegraph Core application team.
repo-updater: repoupdater_completed_sync_jobs_total
This panel indicates the total number of completed sync jobs.
Managed by the Sourcegraph Core application team.
repo-updater: repoupdater_errored_sync_jobs_percentage
This panel indicates the percentage of external services that have failed their most recent sync.
Managed by the Sourcegraph Core application team.
repo-updater: github_graphql_rate_limit_remaining
This panel indicates remaining calls to GitHub graphql API before hitting the rate limit.
Managed by the Sourcegraph Core application team.
repo-updater: github_rest_rate_limit_remaining
This panel indicates remaining calls to GitHub rest API before hitting the rate limit.
Managed by the Sourcegraph Core application team.
repo-updater: github_search_rate_limit_remaining
This panel indicates remaining calls to GitHub search API before hitting the rate limit.
Managed by the Sourcegraph Core application team.
repo-updater: github_graphql_rate_limit_wait_duration
This panel indicates time spent waiting for the GitHub graphql API rate limiter.
Indicates how long we're waiting on the rate limit once it has been exceeded.
Managed by the Sourcegraph Core application team.
repo-updater: github_rest_rate_limit_wait_duration
This panel indicates time spent waiting for the GitHub rest API rate limiter.
Indicates how long we're waiting on the rate limit once it has been exceeded.
Managed by the Sourcegraph Core application team.
repo-updater: github_search_rate_limit_wait_duration
This panel indicates time spent waiting for the GitHub search API rate limiter.
Indicates how long we're waiting on the rate limit once it has been exceeded.
Managed by the Sourcegraph Core application team.
repo-updater: gitlab_rest_rate_limit_remaining
This panel indicates remaining calls to GitLab rest API before hitting the rate limit.
Managed by the Sourcegraph Core application team.
repo-updater: gitlab_rest_rate_limit_wait_duration
This panel indicates time spent waiting for the GitLab rest API rate limiter.
Indicates how long we're waiting on the rate limit once it has been exceeded.
Managed by the Sourcegraph Core application team.
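The rate-limit wait-duration panels above measure how long requests are held back by the client-side rate limiter once a code host's budget is exhausted. A rough, hypothetical sketch of what is being timed (using golang.org/x/time/rate; not necessarily the limiter used internally):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	ctx := context.Background()

	// Hypothetical budget: roughly 5000 API calls per hour.
	limiter := rate.NewLimiter(rate.Limit(5000.0/3600.0), 1)

	for i := 0; i < 3; i++ {
		start := time.Now()
		// Wait blocks until the limiter permits another request.
		if err := limiter.Wait(ctx); err != nil {
			fmt.Println("rate limit wait aborted:", err)
			return
		}
		// The time spent blocked here is what a wait-duration metric records.
		fmt.Printf("request %d waited %v for the rate limiter\n", i, time.Since(start))
	}
}
```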
Repo Updater: Batches: dbstore stats
repo-updater: batches_dbstore_total
This panel indicates aggregate store operations every 5m.
Managed by the Sourcegraph Batches team.
repo-updater: batches_dbstore_99th_percentile_duration
This panel indicates 99th percentile successful aggregate store operation duration over 5m.
Managed by the Sourcegraph Batches team.
repo-updater: batches_dbstore_errors_total
This panel indicates aggregate store operation errors every 5m.
Managed by the Sourcegraph Batches team.
repo-updater: batches_dbstore_error_rate
This panel indicates aggregate store operation error rate over 5m.
Managed by the Sourcegraph Batches team.
repo-updater: batches_dbstore_total
This panel indicates store operations every 5m.
Managed by the Sourcegraph Batches team.
repo-updater: batches_dbstore_99th_percentile_duration
This panel indicates 99th percentile successful store operation duration over 5m.
Managed by the Sourcegraph Batches team.
repo-updater: batches_dbstore_errors_total
This panel indicates store operation errors every 5m.
Managed by the Sourcegraph Batches team.
repo-updater: batches_dbstore_error_rate
This panel indicates store operation error rate over 5m.
Managed by the Sourcegraph Batches team.
Repo Updater: Codeintel: Coursier invocation stats
repo-updater: codeintel_coursier_total
This panel indicates aggregate invocations operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
repo-updater: codeintel_coursier_99th_percentile_duration
This panel indicates 99th percentile successful aggregate invocations operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
repo-updater: codeintel_coursier_errors_total
This panel indicates aggregate invocations operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
repo-updater: codeintel_coursier_error_rate
This panel indicates aggregate invocations operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
repo-updater: codeintel_coursier_total
This panel indicates invocations operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
repo-updater: codeintel_coursier_99th_percentile_duration
This panel indicates 99th percentile successful invocations operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
repo-updater: codeintel_coursier_errors_total
This panel indicates invocations operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
repo-updater: codeintel_coursier_error_rate
This panel indicates invocations operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Repo Updater: Internal service requests
repo-updater: frontend_internal_api_error_responses
This panel indicates frontend-internal API error responses every 5m by route.
Managed by the Sourcegraph Core application team.
Repo Updater: Database connections
repo-updater: max_open_conns
This panel indicates maximum open connections.
Managed by the Sourcegraph Core application team.
repo-updater: open_conns
This panel indicates established connections.
Managed by the Sourcegraph Core application team.
repo-updater: in_use
This panel indicates connections in use.
Managed by the Sourcegraph Core application team.
repo-updater: idle
This panel indicates idle connections.
Managed by the Sourcegraph Core application team.
repo-updater: mean_blocked_seconds_per_conn_request
This panel indicates mean blocked seconds per conn request.
Managed by the Sourcegraph Core application team.
repo-updater: closed_max_idle
This panel indicates closed by SetMaxIdleConns.
Managed by the Sourcegraph Core application team.
repo-updater: closed_max_lifetime
This panel indicates closed by SetConnMaxLifetime.
Managed by the Sourcegraph Core application team.
repo-updater: closed_max_idle_time
This panel indicates closed by SetConnMaxIdleTime.
Managed by the Sourcegraph Core application team.
Repo Updater: Container monitoring (not available on server)
repo-updater: container_missing
This panel indicates container missing.
This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.
- Kubernetes:
  - Determine if the pod was OOM killed using kubectl describe pod repo-updater (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p repo-updater.
- Docker Compose:
  - Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' repo-updater (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the repo-updater container in docker-compose.yml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using docker logs repo-updater (note this will include logs from the previous and currently running container).
Managed by the Sourcegraph Core application team.
repo-updater: container_cpu_usage
This panel indicates container cpu usage total (1m average) across all cores by instance.
Managed by the Sourcegraph Core application team.
repo-updater: container_memory_usage
This panel indicates container memory usage by instance.
Managed by the Sourcegraph Core application team.
repo-updater: fs_io_operations
This panel indicates filesystem reads and writes rate by instance over 1h.
This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with repo-updater issues.
Managed by the Sourcegraph Core application team.
Repo Updater: Provisioning indicators (not available on server)
repo-updater: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Core application team.
repo-updater: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Managed by the Sourcegraph Core application team.
repo-updater: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Core application team.
repo-updater: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Managed by the Sourcegraph Core application team.
Repo Updater: Golang runtime monitoring
repo-updater: go_goroutines
This panel indicates maximum active goroutines.
A high value here indicates a possible goroutine leak.
Managed by the Sourcegraph Core application team.
repo-updater: go_gc_duration_seconds
This panel indicates maximum go garbage collection duration.
Managed by the Sourcegraph Core application team.
Repo Updater: Kubernetes monitoring (only available on Kubernetes)
repo-updater: pods_available_percentage
This panel indicates percentage pods available.
Managed by the Sourcegraph Core application team.
Searcher
Performs unindexed searches (diff and commit search, text search for unindexed branches).
searcher: unindexed_search_request_errors
This panel indicates unindexed search request errors every 5m by code.
Managed by the Sourcegraph Search team.
searcher: replica_traffic
This panel indicates requests per second over 10m.
Managed by the Sourcegraph Search team.
Searcher: Internal service requests
searcher: frontend_internal_api_error_responses
This panel indicates frontend-internal API error responses every 5m by route.
Managed by the Sourcegraph Search team.
Searcher: Container monitoring (not available on server)
searcher: container_missing
This panel indicates container missing.
This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.
- Kubernetes:
  - Determine if the pod was OOM killed using kubectl describe pod searcher (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p searcher.
- Docker Compose:
  - Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' searcher (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the searcher container in docker-compose.yml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using docker logs searcher (note this will include logs from the previous and currently running container).
Managed by the Sourcegraph Search team.
searcher: container_cpu_usage
This panel indicates container cpu usage total (1m average) across all cores by instance.
Managed by the Sourcegraph Search team.
searcher: container_memory_usage
This panel indicates container memory usage by instance.
Managed by the Sourcegraph Search team.
searcher: fs_io_operations
This panel indicates filesystem reads and writes rate by instance over 1h.
This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with searcher issues.
Managed by the Sourcegraph Core application team.
Searcher: Provisioning indicators (not available on server)
searcher: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Search team.
searcher: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Managed by the Sourcegraph Search team.
searcher: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Search team.
searcher: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Managed by the Sourcegraph Search team.
Searcher: Golang runtime monitoring
searcher: go_goroutines
This panel indicates maximum active goroutines.
A high value here indicates a possible goroutine leak.
Managed by the Sourcegraph Search team.
searcher: go_gc_duration_seconds
This panel indicates maximum go garbage collection duration.
Managed by the Sourcegraph Search team.
Searcher: Kubernetes monitoring (only available on Kubernetes)
searcher: pods_available_percentage
This panel indicates percentage pods available.
Managed by the Sourcegraph Search team.
Symbols
Handles symbol searches for unindexed branches.
symbols: store_fetch_failures
This panel indicates store fetch failures every 5m.
Managed by the Sourcegraph Code-intelligence team.
symbols: current_fetch_queue_size
This panel indicates current fetch queue size.
Managed by the Sourcegraph Code-intelligence team.
Symbols: Internal service requests
symbols: frontend_internal_api_error_responses
This panel indicates frontend-internal API error responses every 5m by route.
Managed by the Sourcegraph Code-intelligence team.
Symbols: Container monitoring (not available on server)
symbols: container_missing
This panel indicates container missing.
This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.
- Kubernetes:
  - Determine if the pod was OOM killed using kubectl describe pod symbols (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p symbols.
- Docker Compose:
  - Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' symbols (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the symbols container in docker-compose.yml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using docker logs symbols (note this will include logs from the previous and currently running container).
Managed by the Sourcegraph Code-intelligence team.
symbols: container_cpu_usage
This panel indicates container cpu usage total (1m average) across all cores by instance.
Managed by the Sourcegraph Code-intelligence team.
symbols: container_memory_usage
This panel indicates container memory usage by instance.
Managed by the Sourcegraph Code-intelligence team.
symbols: fs_io_operations
This panel indicates filesystem reads and writes rate by instance over 1h.
This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with symbols issues.
Managed by the Sourcegraph Core application team.
Symbols: Provisioning indicators (not available on server)
symbols: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Code-intelligence team.
symbols: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Managed by the Sourcegraph Code-intelligence team.
symbols: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Code-intelligence team.
symbols: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Managed by the Sourcegraph Code-intelligence team.
Symbols: Golang runtime monitoring
symbols: go_goroutines
This panel indicates maximum active goroutines.
A high value here indicates a possible goroutine leak.
Managed by the Sourcegraph Code-intelligence team.
symbols: go_gc_duration_seconds
This panel indicates maximum go garbage collection duration.
Managed by the Sourcegraph Code-intelligence team.
Symbols: Kubernetes monitoring (only available on Kubernetes)
symbols: pods_available_percentage
This panel indicates percentage pods available.
Managed by the Sourcegraph Code-intelligence team.
Syntect Server
Handles syntax highlighting for code files.
syntect-server: syntax_highlighting_errors
This panel indicates syntax highlighting errors every 5m.
Managed by the Sourcegraph Core application team.
syntect-server: syntax_highlighting_timeouts
This panel indicates syntax highlighting timeouts every 5m.
Managed by the Sourcegraph Core application team.
syntect-server: syntax_highlighting_panics
This panel indicates syntax highlighting panics every 5m.
Managed by the Sourcegraph Core application team.
syntect-server: syntax_highlighting_worker_deaths
This panel indicates syntax highlighter worker deaths every 5m.
Managed by the Sourcegraph Core application team.
Syntect Server: Container monitoring (not available on server)
syntect-server: container_missing
This panel indicates container missing.
This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.
- Kubernetes:
  - Determine if the pod was OOM killed using kubectl describe pod syntect-server (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p syntect-server.
- Docker Compose:
  - Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' syntect-server (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the syntect-server container in docker-compose.yml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using docker logs syntect-server (note this will include logs from the previous and currently running container).
Managed by the Sourcegraph Core application team.
syntect-server: container_cpu_usage
This panel indicates container cpu usage total (1m average) across all cores by instance.
Managed by the Sourcegraph Core application team.
syntect-server: container_memory_usage
This panel indicates container memory usage by instance.
Managed by the Sourcegraph Core application team.
syntect-server: fs_io_operations
This panel indicates filesystem reads and writes rate by instance over 1h.
This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with syntect-server issues.
Managed by the Sourcegraph Core application team.
Syntect Server: Provisioning indicators (not available on server)
syntect-server: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Core application team.
syntect-server: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Managed by the Sourcegraph Core application team.
syntect-server: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Core application team.
syntect-server: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Managed by the Sourcegraph Core application team.
Syntect Server: Kubernetes monitoring (only available on Kubernetes)
syntect-server: pods_available_percentage
This panel indicates percentage pods available.
Managed by the Sourcegraph Core application team.
Zoekt Index Server
Indexes repositories and populates the search index.
zoekt-indexserver: repos_assigned
This panel indicates total number of repos.
Sudden changes should be caused by indexing configuration changes.
Managed by the Sourcegraph Search team.
zoekt-indexserver: repo_index_state
This panel indicates indexing results over 5m (noop=no changes, empty=no branches to index).
A persistent failing state indicates some repositories cannot be indexed, perhaps due to size and timeouts.
Managed by the Sourcegraph Search team.
zoekt-indexserver: repo_index_success_speed
This panel indicates successful indexing durations.
Latency increases can indicate bottlenecks in the indexserver.
Managed by the Sourcegraph Search team.
zoekt-indexserver: repo_index_fail_speed
This panel indicates failed indexing durations.
Failures happening after a long time indicate timeouts.
Managed by the Sourcegraph Search team.
zoekt-indexserver: average_resolve_revision_duration
This panel indicates average resolve revision duration over 5m.
Managed by the Sourcegraph Search team.
Zoekt Index Server: Container monitoring (not available on server)
zoekt-indexserver: container_missing
This panel indicates container missing.
This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.
- Kubernetes:
  - Determine if the pod was OOM killed using kubectl describe pod zoekt-indexserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p zoekt-indexserver.
- Docker Compose:
  - Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' zoekt-indexserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the zoekt-indexserver container in docker-compose.yml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using docker logs zoekt-indexserver (note this will include logs from the previous and currently running container).
Managed by the Sourcegraph Search team.
zoekt-indexserver: container_cpu_usage
This panel indicates container cpu usage total (1m average) across all cores by instance.
Managed by the Sourcegraph Search team.
zoekt-indexserver: container_memory_usage
This panel indicates container memory usage by instance.
Managed by the Sourcegraph Search team.
zoekt-indexserver: fs_io_operations
This panel indicates filesystem reads and writes rate by instance over 1h.
This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with zoekt-indexserver issues.
Managed by the Sourcegraph Core application team.
Zoekt Index Server: Provisioning indicators (not available on server)
zoekt-indexserver: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Search team.
zoekt-indexserver: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Managed by the Sourcegraph Search team.
zoekt-indexserver: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Search team.
zoekt-indexserver: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Managed by the Sourcegraph Search team.
Zoekt Index Server: Kubernetes monitoring (only available on Kubernetes)
zoekt-indexserver: pods_available_percentage
This panel indicates percentage pods available.
Managed by the Sourcegraph Search team.
Zoekt Web Server
Serves indexed search requests using the search index.
zoekt-webserver: indexed_search_request_errors
This panel indicates indexed search request errors every 5m by code.
Managed by the Sourcegraph Search team.
Zoekt Web Server: Container monitoring (not available on server)
zoekt-webserver: container_missing
This panel indicates container missing.
This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.
- Kubernetes:
  - Determine if the pod was OOM killed using kubectl describe pod zoekt-webserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p zoekt-webserver.
- Docker Compose:
  - Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' zoekt-webserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the zoekt-webserver container in docker-compose.yml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using docker logs zoekt-webserver (note this will include logs from the previous and currently running container).
Managed by the Sourcegraph Search team.
zoekt-webserver: container_cpu_usage
This panel indicates container cpu usage total (1m average) across all cores by instance.
Managed by the Sourcegraph Search team.
zoekt-webserver: container_memory_usage
This panel indicates container memory usage by instance.
Managed by the Sourcegraph Search team.
zoekt-webserver: fs_io_operations
This panel indicates filesystem reads and writes rate by instance over 1h.
This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with zoekt-webserver issues.
Managed by the Sourcegraph Core application team.
Zoekt Web Server: Provisioning indicators (not available on server)
zoekt-webserver: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Search team.
zoekt-webserver: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Managed by the Sourcegraph Search team.
zoekt-webserver: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Search team.
zoekt-webserver: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Managed by the Sourcegraph Search team.
Prometheus
Sourcegraph's all-in-one Prometheus and Alertmanager service.
Prometheus: Metrics
prometheus: prometheus_rule_eval_duration
This panel indicates average prometheus rule group evaluation duration over 10m by rule group.
A high value here indicates Prometheus rule evaluation is taking longer than expected. It might indicate that certain rule groups are taking too long to evaluate, or Prometheus is underprovisioned.
Rules that Sourcegraph ships with are grouped under /sg_config_prometheus. Custom rules are grouped under /sg_prometheus_addons.
Managed by the Sourcegraph Distribution team.
prometheus: prometheus_rule_eval_failures
This panel indicates failed prometheus rule evaluations over 5m by rule group.
Rules that Sourcegraph ships with are grouped under /sg_config_prometheus. Custom rules are grouped under /sg_prometheus_addons.
Managed by the Sourcegraph Distribution team.
Prometheus: Alerts
prometheus: alertmanager_notification_latency
This panel indicates alertmanager notification latency over 1m by integration.
Managed by the Sourcegraph Distribution team.
prometheus: alertmanager_notification_failures
This panel indicates failed alertmanager notifications over 1m by integration.
Managed by the Sourcegraph Distribution team.
Prometheus: Internals
prometheus: prometheus_config_status
This panel indicates prometheus configuration reload status.
A value of 1 indicates Prometheus reloaded its configuration successfully.
Managed by the Sourcegraph Distribution team.
prometheus: alertmanager_config_status
This panel indicates alertmanager configuration reload status.
A value of 1 indicates Alertmanager reloaded its configuration successfully.
Managed by the Sourcegraph Distribution team.
prometheus: prometheus_tsdb_op_failure
This panel indicates prometheus tsdb failures by operation over 1m by operation.
Managed by the Sourcegraph Distribution team.
prometheus: prometheus_target_sample_exceeded
This panel indicates prometheus scrapes that exceed the sample limit over 10m.
Managed by the Sourcegraph Distribution team.
prometheus: prometheus_target_sample_duplicate
This panel indicates prometheus scrapes rejected due to duplicate timestamps over 10m.
Managed by the Sourcegraph Distribution team.
Prometheus: Container monitoring (not available on server)
prometheus: container_missing
This panel indicates container missing.
This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.
- Kubernetes:
  - Determine if the pod was OOM killed using kubectl describe pod prometheus (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p prometheus.
- Docker Compose:
  - Determine if the container was OOM killed using docker inspect -f '{{json .State}}' prometheus (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the prometheus container in docker-compose.yml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using docker logs prometheus (note this will include logs from the previous and currently running container).
Managed by the Sourcegraph Distribution team.
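For Docker Compose deployments, the OOM check above can be scripted. The sketch below shells out to the same docker inspect command the runbook suggests and reports the container state; it assumes the docker CLI is available on the host and defaults to the prometheus container.

```go
// Sketch: run `docker inspect -f '{{json .State}}' <container>` and report
// whether the container was OOM killed. Container name defaults to "prometheus"
// and can be overridden with the first argument.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
	"os/exec"
)

func main() {
	container := "prometheus"
	if len(os.Args) > 1 {
		container = os.Args[1]
	}

	out, err := exec.Command("docker", "inspect", "-f", "{{json .State}}", container).Output()
	if err != nil {
		log.Fatalf("docker inspect failed: %v", err)
	}

	var state struct {
		Status    string `json:"Status"`
		OOMKilled bool   `json:"OOMKilled"`
		ExitCode  int    `json:"ExitCode"`
	}
	if err := json.Unmarshal(out, &state); err != nil {
		log.Fatal(err)
	}

	fmt.Printf("status=%s exitCode=%d oomKilled=%v\n", state.Status, state.ExitCode, state.OOMKilled)
	if state.OOMKilled {
		fmt.Println("container was OOM killed: consider raising its memory limit in docker-compose.yml")
	}
}
```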
prometheus: container_cpu_usage
This panel indicates container cpu usage total (1m average) across all cores by instance.
Managed by the Sourcegraph Distribution team.
prometheus: container_memory_usage
This panel indicates container memory usage by instance.
Managed by the Sourcegraph Distribution team.
prometheus: fs_io_operations
This panel indicates filesystem reads and writes rate by instance over 1h.
This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with prometheus issues.
Managed by the Sourcegraph Core application team.
Prometheus: Provisioning indicators (not available on server)
prometheus: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Distribution team.
prometheus: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Managed by the Sourcegraph Distribution team.
prometheus: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Distribution team.
prometheus: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Managed by the Sourcegraph Distribution team.
Prometheus: Kubernetes monitoring (only available on Kubernetes)
prometheus: pods_available_percentage
This panel indicates percentage pods available.
Managed by the Sourcegraph Distribution team.
Executor
Executes jobs in an isolated environment.
Executor: Executor jobs
executor: executor_queue_size
This panel indicates unprocessed executor job queue size.
Managed by the Sourcegraph Code-intelligence team.
executor: executor_queue_growth_rate
This panel indicates unprocessed executor job queue growth rate over 30m.
This value compares the rate of enqueues against the rate of finished jobs for the selected queue.
- A value < 1 indicates that the process rate > the enqueue rate
- A value = 1 indicates that the process rate = the enqueue rate
- A value > 1 indicates that the process rate < the enqueue rate
(A worked interpretation of this ratio is sketched below.)
Managed by the Sourcegraph Code-intelligence team.
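As a concrete reading of the ratio, the sketch below assumes the panel divides the enqueue rate by the rate of finished jobs (the exact expression is not shown here) and classifies the result the same way as the list above; the numbers are purely illustrative.

```go
// Sketch of how to read the growth-rate panel, assuming
// value = enqueue rate / finish rate over the 30m window.
package main

import "fmt"

// classify interprets an enqueue-rate / finish-rate ratio as described in the panel text.
func classify(ratio float64) string {
	switch {
	case ratio < 1:
		return "queue shrinking (jobs finish faster than they are enqueued)"
	case ratio > 1:
		return "queue growing (jobs are enqueued faster than they finish)"
	default:
		return "queue steady (enqueue rate equals finish rate)"
	}
}

func main() {
	enqueuePerMin, finishPerMin := 12.0, 8.0 // hypothetical rates
	ratio := enqueuePerMin / finishPerMin
	fmt.Printf("ratio=%.2f: %s\n", ratio, classify(ratio))
}
```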
Executor: Executor jobs
executor: executor_handlers
This panel indicates handler active handlers.
Managed by the Sourcegraph Code-intelligence team.
executor: executor_processor_total
This panel indicates handler operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: executor_processor_99th_percentile_duration
This panel indicates 99th percentile successful handler operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: executor_processor_errors_total
This panel indicates handler operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: executor_processor_error_rate
This panel indicates handler operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Executor: Queue API client
executor: apiworker_apiclient_total
This panel indicates aggregate client operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_apiclient_99th_percentile_duration
This panel indicates 99th percentile successful aggregate client operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_apiclient_errors_total
This panel indicates aggregate client operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_apiclient_error_rate
This panel indicates aggregate client operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_apiclient_total
This panel indicates client operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_apiclient_99th_percentile_duration
This panel indicates 99th percentile successful client operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_apiclient_errors_total
This panel indicates client operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_apiclient_error_rate
This panel indicates client operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Executor: Job setup
executor: apiworker_command_total
This panel indicates aggregate command operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_99th_percentile_duration
This panel indicates 99th percentile successful aggregate command operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_errors_total
This panel indicates aggregate command operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_error_rate
This panel indicates aggregate command operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_total
This panel indicates command operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_99th_percentile_duration
This panel indicates 99th percentile successful command operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_errors_total
This panel indicates command operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_error_rate
This panel indicates command operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Executor: Job execution
executor: apiworker_command_total
This panel indicates aggregate command operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_99th_percentile_duration
This panel indicates 99th percentile successful aggregate command operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_errors_total
This panel indicates aggregate command operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_error_rate
This panel indicates aggregate command operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_total
This panel indicates command operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_99th_percentile_duration
This panel indicates 99th percentile successful command operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_errors_total
This panel indicates command operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_error_rate
This panel indicates command operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Executor: Job teardown
executor: apiworker_command_total
This panel indicates aggregate command operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_99th_percentile_duration
This panel indicates 99th percentile successful aggregate command operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_errors_total
This panel indicates aggregate command operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_error_rate
This panel indicates aggregate command operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_total
This panel indicates command operations every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_99th_percentile_duration
This panel indicates 99th percentile successful command operation duration over 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_errors_total
This panel indicates command operation errors every 5m.
Managed by the Sourcegraph Code-intelligence team.
executor: apiworker_command_error_rate
This panel indicates command operation error rate over 5m.
Managed by the Sourcegraph Code-intelligence team.
Executor: Container monitoring (not available on server)
executor: container_missing
This panel indicates container missing.
This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.
- Kubernetes:
  - Determine if the pod was OOM killed using kubectl describe pod (executor|sourcegraph-code-intel-indexers|executor-batches) (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p (executor|sourcegraph-code-intel-indexers|executor-batches).
- Docker Compose:
  - Determine if the container was OOM killed using docker inspect -f '{{json .State}}' (executor|sourcegraph-code-intel-indexers|executor-batches) (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the (executor|sourcegraph-code-intel-indexers|executor-batches) container in docker-compose.yml.
  - Check the logs before the container restarted to see if there are panic: messages or similar using docker logs (executor|sourcegraph-code-intel-indexers|executor-batches) (note this will include logs from the previous and currently running container).
Managed by the Sourcegraph Code-intelligence team.
executor: container_cpu_usage
This panel indicates container cpu usage total (1m average) across all cores by instance.
Managed by the Sourcegraph Code-intelligence team.
executor: container_memory_usage
This panel indicates container memory usage by instance.
Managed by the Sourcegraph Code-intelligence team.
executor: fs_io_operations
This panel indicates filesystem reads and writes rate by instance over 1h.
This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with executor issues.
Managed by the Sourcegraph Core application team.
Executor: Provisioning indicators (not available on server)
executor: provisioning_container_cpu_usage_long_term
This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.
Managed by the Sourcegraph Code-intelligence team.
executor: provisioning_container_memory_usage_long_term
This panel indicates container memory usage (1d maximum) by instance.
Managed by the Sourcegraph Code-intelligence team.
executor: provisioning_container_cpu_usage_short_term
This panel indicates container cpu usage total (5m maximum) across all cores by instance.
Managed by the Sourcegraph Code-intelligence team.
executor: provisioning_container_memory_usage_short_term
This panel indicates container memory usage (5m maximum) by instance.
Managed by the Sourcegraph Code-intelligence team.
Executor: Golang runtime monitoring
executor: go_goroutines
This panel indicates maximum active goroutines.
A high value here indicates a possible goroutine leak.
Managed by the Sourcegraph Code-intelligence team.
executor: go_gc_duration_seconds
This panel indicates maximum go garbage collection duration.
Managed by the Sourcegraph Code-intelligence team.
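These two panels surface the executor's go_goroutines and go_gc_duration_seconds metrics, which the Go Prometheus client derives from the runtime. The standalone sketch below prints the same underlying runtime values; the deliberately leaked goroutines are synthetic and only illustrate what a growing goroutine count looks like.

```go
// Sketch: show the runtime values behind go_goroutines (goroutine count) and
// go_gc_duration_seconds (GC pause durations) in a throwaway program.
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

func main() {
	// Simulate a goroutine leak: goroutines that block forever.
	for i := 0; i < 500; i++ {
		go func() { select {} }()
	}

	time.Sleep(100 * time.Millisecond)
	fmt.Println("active goroutines:", runtime.NumGoroutine()) // what go_goroutines reports

	runtime.GC() // force a collection so there is at least one pause to report
	var stats debug.GCStats
	debug.ReadGCStats(&stats)
	fmt.Println("completed GCs:", stats.NumGC)
	if len(stats.Pause) > 0 {
		fmt.Println("most recent GC pause:", stats.Pause[0]) // pauses feed go_gc_duration_seconds
	}
}
```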
Executor: Kubernetes monitoring (only available on Kubernetes)
executor: pods_available_percentage
This panel indicates percentage pods available.
Managed by the Sourcegraph Code-intelligence team.