How to Use Crosstab and Conditional Aggregation for Data Transformation in PostgreSQL

Pivot Your PostgreSQL Data: Converting Result Sets into Multiple Rows Through Crosstab and Conditional Aggregation

·

3 min read

Pivoting a result set in PostgreSQL typically involves transforming data from a long format to a wide format, where values of a particular column become column headers with corresponding data laid out beneath them. PostgreSQL doesn't have a built-in PIVOT function like SQL Server, but similar results can be achieved using the crosstab function from the tablefunc extension or through conditional aggregation.

Method 1: Using the crosstab Function

To use the crosstab function, you first need to enable the tablefunc extension in your PostgreSQL database:

CREATE EXTENSION IF NOT EXISTS tablefunc;

The crosstab function takes a SQL query as input and transforms the row data into columns. Here's a basic example:

  1. Assume you have a sales table:

     CREATE TABLE sales (
       year INT,
       product TEXT,
       amount NUMERIC
     );
    
     INSERT INTO sales VALUES
     (2021, 'Laptop', 1000),
     (2021, 'Tablet', 500),
     (2022, 'Laptop', 1500),
     (2022, 'Tablet', 300);
    
  2. Write a query to pivot this data:

     SELECT * FROM crosstab(
       $$ SELECT year, product, amount FROM sales ORDER BY 1,2 $$,
       $$ SELECT DISTINCT product FROM sales ORDER BY 1 $$
     ) AS ct(year INT, laptop NUMERIC, tablet NUMERIC);
    

This crosstab query transforms the sales data, turning product types into column headers and listing amounts by year under each product.

Method 2: Using Conditional Aggregation

If installing extensions is not possible, or you prefer SQL standard methods, you can achieve pivoting using conditional aggregation:

SELECT
  year,
  SUM(CASE WHEN product = 'Laptop' THEN amount ELSE 0 END) AS Laptop,
  SUM(CASE WHEN product = 'Tablet' THEN amount ELSE 0 END) AS Tablet
FROM
  sales
GROUP BY
  year
ORDER BY
  year;

This method uses the CASE statement inside an aggregate function (SUM in this case) to conditionally sum the sales amounts based on product type, grouped by year.

Method 3: Dynamic SQL for Variable Categories

If the categories (e.g., product names) are not known beforehand and can vary, you might need to construct a dynamic SQL statement:

DO $$
DECLARE
  sql TEXT;
  products TEXT;
BEGIN
  SELECT STRING_AGG(DISTINCT FORMAT('SUM(CASE WHEN product = %L THEN amount ELSE 0 END) AS "%s"', product, product), ', ')
  INTO products
  FROM sales;

  sql := FORMAT('SELECT year, %s FROM sales GROUP BY year ORDER BY year', products);

  EXECUTE sql;
END $$;

This block of code dynamically constructs a SQL query based on unique product names in the sales table and executes it. It's a flexible approach but should be used with caution to avoid SQL injection risks in environments where input might be controlled by users.

Each method has its strengths and applicable scenarios. The crosstab function is powerful for known, fixed categories and is concise, but requires an extension. Conditional aggregation is more verbose but doesn't rely on additional extensions and is fully under your control for customization.