Postgres lag() window function
Use lag() to access values from previous rows in a result set
The lag()
function in Postgres is a window function that allows you to access values from previous rows in a result set without the need for a self-join. It's useful for comparing values between the current row and a previous row, for example, when calculating running differences, plotting trends, or doing time series analysis.
Try it on Neon!
Neon is Serverless Postgres built for the cloud. Explore Postgres features and functions in our user-friendly SQL editor. Sign up for a free account to get started.
Function signature
The lag()
function has the following forms:
lag(value any [, offset integer [, default any ]]) over (...)
value
: The value to return from the previous row. This can be a column, expression, or subquery.offset
(optional): The number of rows back from the current row to retrieve the value from. If omitted, it defaults to 1. Must be a non-negative integer.default
(optional): The value to return when the offset goes beyond the scope of the window. If omitted, it defaults to null.over (...)
: TheOVER
clause defines the window frame for the function. It can be an emptyOVER ()
, or it can include aPARTITION BY
and/orORDER BY
clause.
Example usage
Consider a table sales
that contains daily sales data for a company. We can use lag()
to compare each day's sales to the previous day's sales.
WITH sales AS (
SELECT date '2023-01-01' AS sale_date, 1000 AS amount
UNION ALL
SELECT date '2023-01-02' AS sale_date, 1500 AS amount
UNION ALL
SELECT date '2023-01-03' AS sale_date, 1200 AS amount
UNION ALL
SELECT date '2023-01-04' AS sale_date, 1800 AS amount
)
SELECT
sale_date,
amount,
lag(amount) OVER (ORDER BY sale_date) AS prev_amount,
amount - lag(amount) OVER (ORDER BY sale_date) AS diff
FROM sales;
This query calculates the previous day's sales amount (prev_amount
) and the difference between the current day's sales and the previous day's sales (diff
). The OVER
clause specifies that the window frame should be ordered by sale_date
.
sale_date | amount | prev_amount | diff
------------+--------+-------------+-------
2023-01-01 | 1000 | |
2023-01-02 | 1500 | 1000 | 500
2023-01-03 | 1200 | 1500 | -300
2023-01-04 | 1800 | 1200 | 600
(4 rows)
You can also use lag()
to access values from rows further back by specifying an offset. For example, to compare each day's sales to the sales from the same day of the previous week:
WITH sales AS (
SELECT
sale_date,
floor(random() * 1000 + 1)::int AS amount
FROM generate_series(date '2023-01-01', date '2023-01-31', interval '1 day') AS sale_date
)
SELECT
sale_date,
amount,
lag(amount, 7) OVER (ORDER BY sale_date) AS prev_week_amount,
amount - lag(amount, 7) OVER (ORDER BY sale_date) AS diff
FROM sales
ORDER BY sale_date DESC
LIMIT 5;
This query generates random sales data for each day in January 2023 and compares each day's sales to the sales from the same day of the previous week. The lag()
function with an offset of 7 retrieves the sales amount from 7 days ago.
sale_date | amount | prev_week_amount | diff
------------------------+--------+------------------+------
2023-01-31 00:00:00+00 | 245 | 64 | 181
2023-01-30 00:00:00+00 | 736 | 789 | -53
2023-01-29 00:00:00+00 | 208 | 763 | -555
2023-01-28 00:00:00+00 | 710 | 899 | -189
2023-01-27 00:00:00+00 | 1 | 229 | -228
(5 rows)
Advanced examples
lag()
with a default value
Using When the offset in lag()
goes beyond the start of the window frame, it returns null by default. You can specify a default value to use instead, so the resulting column does not contain nulls.
WITH inventory AS (
SELECT date '2023-01-01' AS snapshot_date, 100 AS quantity
UNION ALL
SELECT date '2023-01-02' AS snapshot_date, 80 AS quantity
UNION ALL
SELECT date '2023-01-03' AS snapshot_date, 120 AS quantity
UNION ALL
SELECT date '2023-01-04' AS snapshot_date, 90 AS quantity
)
SELECT
snapshot_date,
quantity,
lag(quantity, 1, quantity) OVER (ORDER BY snapshot_date) AS prev_quantity,
quantity - lag(quantity, 1, quantity) OVER (ORDER BY snapshot_date) AS change
FROM inventory;
This query calculates the change in inventory quantity compared to the previous day. For the first row, where there is no previous quantity, it uses the current quantity as the default value, resulting in a change of 0.
snapshot_date | quantity | prev_quantity | change
---------------+----------+---------------+--------
2023-01-01 | 100 | 100 | 0
2023-01-02 | 80 | 100 | -20
2023-01-03 | 120 | 80 | 40
2023-01-04 | 90 | 120 | -30
(4 rows)
lag()
with partitioning
Using You can use lag()
with partitioning to perform calculations within groups of rows.
WITH orders AS (
SELECT 1 AS order_id, date '2023-01-01' AS order_date, 100 AS amount, 1 AS customer_id
UNION ALL
SELECT 2 AS order_id, date '2023-01-02' AS order_date, 150 AS amount, 1 AS customer_id
UNION ALL
SELECT 3 AS order_id, date '2023-01-03' AS order_date, 200 AS amount, 2 AS customer_id
UNION ALL
SELECT 4 AS order_id, date '2023-01-04' AS order_date, 120 AS amount, 1 AS customer_id
UNION ALL
SELECT 5 AS order_id, date '2023-01-05' AS order_date, 180 AS amount, 2 AS customer_id
)
SELECT
order_id,
order_date,
amount,
customer_id,
lag(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_order_date,
order_date - lag(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS days_since_last_order
FROM orders;
This query calculates the number of days since each customer's previous order. The OVER
clause partitions the data by customer_id
and orders it by order_date
within each partition.
order_id | order_date | amount | customer_id | prev_order_date | days_since_last_order
----------+------------+--------+-------------+-----------------+-----------------------
1 | 2023-01-01 | 100 | 1 | |
2 | 2023-01-02 | 150 | 1 | 2023-01-01 | 1
4 | 2023-01-04 | 120 | 1 | 2023-01-02 | 2
3 | 2023-01-03 | 200 | 2 | |
5 | 2023-01-05 | 180 | 2 | 2023-01-03 | 2
(5 rows)
Additional considerations
Correctness
The lag()
function relates each row in the result set to a previous row in the same window frame. If the window frame is not explicitly defined, the default frame is the entire result set. Make sure to specify the correct ORDER BY
and PARTITION BY
clauses to ensure the desired behavior.
Performance implications
Window functions like lag()
perform calculations across a set of rows defined by the OVER
clause. This can be computationally expensive for large datasets or complex window definitions.
To optimize performance, make sure to:
- Include an
ORDER BY
clause in theOVER
clause to avoid sorting the entire dataset. - Use partitioning (
PARTITION BY
) to divide the data into smaller chunks when possible. - Create appropriate indexes on the columns used in the
OVER
clause.
Alternative functions
- lead - Access values from subsequent rows in a result set. Similar to
lag()
but looks ahead in the partition instead of behind. first_value()
- Get the first value within a window frame.last_value()
- Get the last value within a window frame.