Commit e6cdc2d3 authored by Tobias Schmidt's avatar Tobias Schmidt

Import querying documentation from prometheus/docs

parent 299802df
title: Configuration
sort_rank: 3
# Configuration
title: Getting started
sort_rank: 10
sort_rank: 1
# Getting started
......@@ -14,3 +14,4 @@ The documentation is available alongside all the project documentation at
- [Installing](
- [Getting started](
- [Configuration](
- [Querying](querying/
title: Installing
title: Installation
sort_rank: 2
# Installing
# Installation
## Using pre-compiled binaries
This diff is collapsed.
title: Querying basics
nav_title: Basics
sort_rank: 1
# Querying Prometheus
Prometheus provides a functional expression language that lets the user select
and aggregate time series data in real time. The result of an expression can
either be shown as a graph, viewed as tabular data in Prometheus's expression
browser, or consumed by external systems via the [HTTP API](
## Examples
This document is meant as a reference. For learning, it might be easier to
start with a couple of [examples](
## Expression language data types
In Prometheus's expression language, an expression or sub-expression can
evaluate to one of four types:
* **Instant vector** - a set of time series containing a single sample for each time series, all sharing the same timestamp
* **Range vector** - a set of time series containing a range of data points over time for each time series
* **Scalar** - a simple numeric floating point value
* **String** - a simple string value; currently unused
Depending on the use-case (e.g. when graphing vs. displaying the output of an
expression), only some of these types are legal as the result from a
user-specified expression. For example, an expression that returns an instant
vector is the only type that can be directly graphed.
## Literals
### String literals
Strings may be specified as literals in single quotes, double quotes or
PromQL follows the same [escaping rules as
Go]( In single or double quotes a
backslash begins an escape sequence, which may be followed by `a`, `b`, `f`,
`n`, `r`, `t`, `v` or `\`. Specific characters can be provided using octal
(`\nnn`) or hexadecimal (`\xnn`, `\unnnn` and `\Unnnnnnnn`).
No escaping is processed inside backticks. Unlike Go, Prometheus does not discard newlines inside backticks.
"this is a string"
'these are unescaped: \n \\ \t'
`these are not unescaped: \n ' " \t`
### Float literals
Scalar float values can be literally written as numbers of the form
## Time series Selectors
### Instant vector selectors
Instant vector selectors allow the selection of a set of time series and a
single sample value for each at a given timestamp (instant): in the simplest
form, only a metric name is specified. This results in an instant vector
containing elements for all time series that have this metric name.
This example selects all time series that have the `http_requests_total` metric
It is possible to filter these time series further by appending a set of labels
to match in curly braces (`{}`).
This example selects only those time series with the `http_requests_total`
metric name that also have the `job` label set to `prometheus` and their
`group` label set to `canary`:
It is also possible to negatively match a label value, or to match label values
against regular expressions. The following label matching operators exist:
* `=`: Select labels that are exactly equal to the provided string.
* `!=`: Select labels that are not equal to the provided string.
* `=~`: Select labels that regex-match the provided string (or substring).
* `!~`: Select labels that do not regex-match the provided string (or substring).
For example, this selects all `http_requests_total` time series for `staging`,
`testing`, and `development` environments and HTTP methods other than `GET`.
Label matchers that match empty label values also select all time series that do
not have the specific label set at all. Regex-matches are fully anchored.
Vector selectors must either specify a name or at least one label matcher
that does not match the empty string. The following expression is illegal:
{job=~".*"} # Bad!
In contrast, these expressions are valid as they both have a selector that does not
match empty label values.
{job=~".+"} # Good!
{job=~".*",method="get"} # Good!
Label matchers can also be applied to metric names by matching against the internal
`__name__` label. For example, the expression `http_requests_total` is equivalent to
`{__name__="http_requests_total"}`. Matchers other than `=` (`!=`, `=~`, `!~`) may also be used.
The following expression selects all metrics that have a name starting with `job:`:
### Range Vector Selectors
Range vector literals work like instant vector literals, except that they
select a range of samples back from the current instant. Syntactically, a range
duration is appended in square brackets (`[]`) at the end of a vector selector
to specify how far back in time values should be fetched for each resulting
range vector element.
Time durations are specified as a number, followed immediately by one of the
following units:
* `s` - seconds
* `m` - minutes
* `h` - hours
* `d` - days
* `w` - weeks
* `y` - years
In this example, we select all the values we have recorded within the last 5
minutes for all time series that have the metric name `http_requests_total` and
a `job` label set to `prometheus`:
### Offset modifier
The `offset` modifier allows changing the time offset for individual
instant and range vectors in a query.
For example, the following expression returns the value of
`http_requests_total` 5 minutes in the past relative to the current
query evaluation time:
http_requests_total offset 5m
Note that the `offset` modifier always needs to follow the selector
immediately, i.e. the following would be correct:
sum(http_requests_total{method="GET"} offset 5m) // GOOD.
While the following would be *incorrect*:
sum(http_requests_total{method="GET"}) offset 5m // INVALID.
The same works for range vectors. This returns the 5-minutes rate that
`http_requests_total` had a week ago:
rate(http_requests_total[5m] offset 1w)
## Operators
Prometheus supports many binary and aggregation operators. These are described
in detail in the [expression language operators]( page.
## Functions
Prometheus supports several functions to operate on data. These are described
in detail in the [expression language functions]( page.
## Gotchas
### Interpolation and staleness
When queries are run, timestamps at which to sample data are selected
independently of the actual present time series data. This is mainly to support
cases like aggregation (`sum`, `avg`, and so on), where multiple aggregated
time series do not exactly align in time. Because of their independence,
Prometheus needs to assign a value at those timestamps for each relevant time
series. It does so by simply taking the newest sample before this timestamp.
If no stored sample is found (by default) 5 minutes before a sampling timestamp,
no value is assigned for this time series at this point in time. This
effectively means that time series "disappear" from graphs at times where their
latest collected sample is older than 5 minutes.
NOTE: <b>NOTE:</b> Staleness and interpolation handling might change. See and
### Avoiding slow queries and overloads
If a query needs to operate on a very large amount of data, graphing it might
time out or overload the server or browser. Thus, when constructing queries
over unknown data, always start building the query in the tabular view of
Prometheus's expression browser until the result set seems reasonable
(hundreds, not thousands, of time series at most). Only when you have filtered
or aggregated your data sufficiently, switch to graph mode. If the expression
still takes too long to graph ad-hoc, pre-record it via a [recording
This is especially relevant for Prometheus's query language, where a bare
metric name selector like `api_http_requests_total` could expand to thousands
of time series with different labels. Also keep in mind that expressions which
aggregate over many time series will generate load on the server even if the
output is only a small number of time series. This is similar to how it would
be slow to sum all values of a column in a relational database, even if the
output value is only a single number.
title: Querying examples
nav_title: Examples
sort_rank: 4
# Query examples
## Simple time series selection
Return all time series with the metric `http_requests_total`:
Return all time series with the metric `http_requests_total` and the given
`job` and `handler` labels:
http_requests_total{job="apiserver", handler="/api/comments"}
Return a whole range of time (in this case 5 minutes) for the same vector,
making it a range vector:
http_requests_total{job="apiserver", handler="/api/comments"}[5m]
Note that an expression resulting in a range vector cannot be graphed directly,
but viewed in the tabular ("Console") view of the expression browser.
Using regular expressions, you could select time series only for jobs whose
name match a certain pattern, in this case, all jobs that end with `server`.
Note that this does a substring match, not a full string match:
To select all HTTP status codes except 4xx ones, you could run:
## Using functions, operators, etc.
Return the per-second rate for all time series with the `http_requests_total`
metric name, as measured over the last 5 minutes:
Assuming that the `http_requests_total` time series all have the labels `job`
(fanout by job name) and `instance` (fanout by instance of the job), we might
want to sum over the rate of all instances, so we get fewer output time series,
but still preserve the `job` dimension:
sum(rate(http_requests_total[5m])) by (job)
If we have two different metrics with the same dimensional labels, we can apply
binary operators to them and elements on both sides with the same label set
will get matched and propagated to the output. For example, this expression
returns the unused memory in MiB for every instance (on a fictional cluster
scheduler exposing these metrics about the instances it runs):
(instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024
The same expression, but summed by application, could be written like this:
instance_memory_limit_bytes - instance_memory_usage_bytes
) by (app, proc) / 1024 / 1024
If the same fictional cluster scheduler exposed CPU usage metrics like the
following for every instance:
instance_cpu_time_ns{app="lion", proc="web", rev="34d0f99", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="elephant", proc="worker", rev="34d0f99", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="turtle", proc="api", rev="4d3a513", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="fox", proc="widget", rev="4d3a513", env="prod", job="cluster-manager"}
...we could get the top 3 CPU users grouped by application (`app`) and process
type (`proc`) like this:
topk(3, sum(rate(instance_cpu_time_ns[5m])) by (app, proc))
Assuming this metric contains one time series per running instance, you could
count the number of running instances per application like this:
count(instance_cpu_time_ns) by (app)
This diff is collapsed.
title: Querying
sort_rank: 4
This diff is collapsed.
title: Recording rules
sort_rank: 6
# Defining recording rules
## Configuring rules
Prometheus supports two types of rules which may be configured and then
evaluated at regular intervals: recording rules and [alerting
rules]( To include rules in
Prometheus, create a file containing the necessary rule statements and have
Prometheus load the file via the `rule_files` field in the [Prometheus
The rule files can be reloaded at runtime by sending `SIGHUP` to the Prometheus
process. The changes are only applied if all rule files are well-formatted.
## Syntax-checking rules
To quickly check whether a rule file is syntactically correct without starting
a Prometheus server, install and run Prometheus's `promtool` command-line
utility tool:
go get
promtool check-rules /path/to/example.rules
When the file is syntactically valid, the checker prints a textual
representation of the parsed rules to standard output and then exits with
a `0` return status.
If there are any syntax errors, it prints an error message to standard error
and exits with a `1` return status. On invalid input arguments the exit status
is `2`.
## Recording rules
Recording rules allow you to precompute frequently needed or computationally
expensive expressions and save their result as a new set of time series.
Querying the precomputed result will then often be much faster than executing
the original expression every time it is needed. This is especially useful for
dashboards, which need to query the same expression repeatedly every time they
To add a new recording rule, add a line of the following syntax to your rule
<new time series name>[{<label overrides>}] = <expression to record>
Some examples:
# Saving the per-job HTTP in-progress request count as a new set of time series:
job:http_inprogress_requests:sum = sum(http_inprogress_requests) by (job)
# Drop or rewrite labels in the result time series:
new_time_series{label_to_change="new_value",label_to_drop=""} = old_time_series
Recording rules are evaluated at the interval specified by the
`evaluation_interval` field in the Prometheus configuration. During each
evaluation cycle, the right-hand-side expression of the rule statement is
evaluated at the current instant in time and the resulting sample vector is
stored as a new set of time series with the current timestamp and a new metric
name (and perhaps an overridden set of labels).
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment