This article is part of our Academy Course titled PostgreSQL Database Tutorial.
In this course, we provide a compilation of PostgreSQL tutorials that will help you set up and run your own database management system. We cover a wide range of topics, from installation and configuration, to custom commands and datatypes. With our straightforward tutorials, you will be able to get your own projects up and running in minimum time. Check it out here!
In our previous article we explained how to use JOINs to create more advanced SELECT queries. However, there are instances when using this technique to retrieve data from two or more tables does not satisfy our requirements or makes the query difficult to read – for example, if we need several JOINs or a subquery to return the desired information.
To solve this, standard SQL (note that this is not something exclusive to PostgreSQL), introduced the concept of Common Table Expressions (best known as CTE for short) in order to simplify this type of queries. In this article we will explain what CTEs are and how to use them.
1.Definition of Common Table Expressions (CTE)
Formally speaking, a CTE is a temporary result set that is created through the use of a WITH clause and is valid only during the execution of a given query. Another distinguishing feature of a CTE is that it can either reference itself (recursive CTE) or not (non-recursive CTE), providing the flexibility that common queries do not provide. A recursive CTE is often used when a calculation needs to be reported as part of the final result set, whereas a non-recursive one is usually utilized for a regular query. Additionally, its definition -meaning the fields it returns- is not stored as a separate database object.
All of these new concepts will better sink in as we illustrate them through examples, so let’s begin.
2.Non-recursive Common Table Expressions
As usual, we will use the World_db database we installed in the first article of this series. To begin, let’s consider the following query:
SELECT A.name "City", A.district "District", B.name "Country", C.language "Language" FROM city A JOIN country B ON A.countrycode=B.code JOIN countrylanguage C ON A.countrycode=C.countrycode WHERE A.name='Rosario' AND C.isofficial='TRUE';
As you can probably guess by now, it will return the city name, the district, the country, and the official language where the city name is Rosario. If you look carefully, this query uses 2 JOINs – not a bad thing in itself, but the readability certainly could use some improvements.
Our first example of a Common Table Expression will be rather basic but does the job of introducing the concept:
WITH t AS ( SELECT A.name City, A.district District, A.countrycode CountryCode, B.name Country FROM city A JOIN country B ON A.countrycode=B.code) SELECT t.City, t.District, t.Country, C.language FROM t JOIN countrylanguage C on t.CountryCode = C.countrycode WHERE t.City='Rosario' AND C.isofficial='TRUE';
Before we go into PostgreSQL and run the above query, let’s split it into two parts to explain what is happening.
Step 1 – Define the CTE using the WITH clause. For simplicity, we will name the CTE as t, but you can use other name if you want.
WITH t AS ( SELECT A.name City, A.district District, A.countrycode CountryCode, B.name Country FROM city A JOIN country B ON A.countrycode=B.code)
If we were to do a
SELECT * FROM t; at this point, we would get all the cities with their corresponding district and country. You may well be saying to yourself, “Then I don’t see what’s the point in using CTEs” – but wait, Step 2 will shed some light on the why.
Step 2 – Select the fields from the CTE and perform a JOIN with another table. As the CTE can be considered a temporary result set, we can perform JOINs on other tables. However, in this case we can use the more descriptive names given by the CTE instead of the original table names (are you seeing the readability improvements already?). Since both the city and country tables contain a field called name, the CTE allows us to refer to the city and country names as City and Country instead.
SELECT t.City, t.District, t.Country, C.language FROM t JOIN countrylanguage C on t.CountryCode = C.countrycode WHERE t.City='Rosario' AND C.isofficial='TRUE';
As you can see in Fig. 1, the result is identical to the original query:
Recursive Common Table Expressions
A recursive CTE references itself usually via a WITH clause referring to its own output. To better illustrate through an example, we are going to create a new database and table named College and CollegeClasses, respectively, and populate the former with dummy data as follows:
CREATE TABLE CollegeClasses ( ClassID serial PRIMARY KEY, ClassDescription VARCHAR NOT NULL, ClassParentID INT ); INSERT INTO CollegeClasses ( ClassDescription, ClassParentID ) VALUES ('Calculus 1', NULL), ('Algebra 1', 1), ('Analytic Geometry', 1), ('Physics 1', 1), ('Statistics', 1), ('Algebra 2', 2), ('Discrete Math', 2), ('Programming 1', 2), ('Programming 2', 2), ('Advanced Geometry', 3), ('Control systems', 3), ('English as a Second Language 1', 3), ('Literature', 3), ('Physics 2', 4), ('Calculus 2', 4), ('Graphs and Math', 7), ('English as a Second Language 2', 7), ('Basic algorithms', 8), ('Advanced algorithms', 8), ('Programming with C', 8);
In this case we’re interested in retrieving a list of classes and their children down to a given level. For example, we will start with Algebra 1 (ClassID=2) and descend down to the last class that depends on it:
WITH RECURSIVE classes AS ( SELECT ClassID, ClassParentID, ClassDescription FROM CollegeClasses WHERE ClassID = 2 UNION SELECT e.ClassID, e.ClassParentID, e.ClassDescription FROM CollegeClasses e INNER JOIN classes s ON s.ClassID = e.ClassParentID ) SELECT * FROM classes;
This query, as in the previous section, deserves a detailed explanation. Let’s begin by saying a recursive CTE consists of 4 components:
#1 – A non-recursive query. In this case, it is a query to retrieve the CollegeClass information where ClassID=2:
SELECT ClassID, ClassParentID, ClassDescription FROM CollegeClasses WHERE ClassID = 2
#2 – The UNION or UNION ALL operator. Any of these operators allows us to combine one or more result sets into a single one. The choice of one above the other will depend on whether you want to avoid duplicates (if any) or return them, respectively.
#3 – The recursive term. Note that the classes temporary table references itself in this part of the CTE:
SELECT e.ClassID, e.ClassParentID, e.ClassDescription FROM CollegeClasses e INNER JOIN classes s ON s.ClassID = e.ClassParentID
#4 – The final statement, which is executed once the iterations in Part 3 have finished. In this case,
SELECT * FROM classes;
That said, let’s take a look at the result of the query (see Fig. 2) and examine it to see if it meets our expectations:
As we can see, the result set begins with ClassID=2, and shows all its children (ClassID=6, 7, 8, and 9). Then it shows all the children of this last set as well.
In this article we have explained how to create recursive and non-recursive Common Table Expressions in PostgreSQL. As you pursue the study of this topic, keep in mind that using CTEs is not a matter of improving performance, but readability and maintainability.
Hope it helps!