Photo by Tobin Rogers on Unsplash
How to Use Crosstab and Conditional Aggregation for Data Transformation in PostgreSQL
Pivot Your PostgreSQL Data: Converting Result Sets into Multiple Rows Through Crosstab and Conditional Aggregation
Pivoting a result set in PostgreSQL typically involves transforming data from a long format to a wide format, where values of a particular column become column headers with corresponding data laid out beneath them. PostgreSQL doesn't have a built-in PIVOT
function like SQL Server, but similar results can be achieved using the crosstab
function from the tablefunc
extension or through conditional aggregation.
Method 1: Using the crosstab
Function
To use the crosstab
function, you first need to enable the tablefunc
extension in your PostgreSQL database:
CREATE EXTENSION IF NOT EXISTS tablefunc;
The crosstab
function takes a SQL query as input and transforms the row data into columns. Here's a basic example:
Assume you have a sales table:
CREATE TABLE sales ( year INT, product TEXT, amount NUMERIC ); INSERT INTO sales VALUES (2021, 'Laptop', 1000), (2021, 'Tablet', 500), (2022, 'Laptop', 1500), (2022, 'Tablet', 300);
Write a query to pivot this data:
SELECT * FROM crosstab( $$ SELECT year, product, amount FROM sales ORDER BY 1,2 $$, $$ SELECT DISTINCT product FROM sales ORDER BY 1 $$ ) AS ct(year INT, laptop NUMERIC, tablet NUMERIC);
This crosstab
query transforms the sales data, turning product types into column headers and listing amounts by year under each product.
Method 2: Using Conditional Aggregation
If installing extensions is not possible, or you prefer SQL standard methods, you can achieve pivoting using conditional aggregation:
SELECT
year,
SUM(CASE WHEN product = 'Laptop' THEN amount ELSE 0 END) AS Laptop,
SUM(CASE WHEN product = 'Tablet' THEN amount ELSE 0 END) AS Tablet
FROM
sales
GROUP BY
year
ORDER BY
year;
This method uses the CASE
statement inside an aggregate function (SUM
in this case) to conditionally sum the sales amounts based on product type, grouped by year.
Method 3: Dynamic SQL for Variable Categories
If the categories (e.g., product names) are not known beforehand and can vary, you might need to construct a dynamic SQL statement:
DO $$
DECLARE
sql TEXT;
products TEXT;
BEGIN
SELECT STRING_AGG(DISTINCT FORMAT('SUM(CASE WHEN product = %L THEN amount ELSE 0 END) AS "%s"', product, product), ', ')
INTO products
FROM sales;
sql := FORMAT('SELECT year, %s FROM sales GROUP BY year ORDER BY year', products);
EXECUTE sql;
END $$;
This block of code dynamically constructs a SQL query based on unique product names in the sales
table and executes it. It's a flexible approach but should be used with caution to avoid SQL injection risks in environments where input might be controlled by users.
Each method has its strengths and applicable scenarios. The crosstab
function is powerful for known, fixed categories and is concise, but requires an extension. Conditional aggregation is more verbose but doesn't rely on additional extensions and is fully under your control for customization.