There is a probability that you would need to fetch only unique rows from the table. That is where the DISTINCT clause comes into action. In this tutorial, we will learn how can we use the PostgreSQL DISTINCT clause in multiple ways to fetch unique records. So, let’s get started!
Also Read: How to Create a Database in PostgreSQL
Introduction to SELECT DISTINCT Clause in PostgreSQL
You all know the SELECT clause. It is used to fetch the records from the rows. But there might be chances that the records are identical or some columns contain duplicate values, such as city, state, product category etc.
In this case, if you want to list only unique records out of all these duplicate records, we use the DISTINCT clause.
The DISTINCT clause can be applied to a single column as well as multiple columns. We will see how it works on a group of columns in this tutorial.
Before going any further make sure you have a basic knowledge of the SELECT clause, we have a separate tutorial on PostgreSQL SELECT statements if you’d like to learn about that.
Now let’s see syntax and examples of the DISTINCT clause.
Syntax of PostgreSQL DISTINCT Clause
As said, the DISTINCT clause can be used on a single column as well as multiple columns. You can also specify a DISTINCT clause only for a single column even if you mention multiple columns in the SELECT statement. Let’s see the syntax for each of these.
To apply a DISTINCT clause on a single column, use the following syntax:
SELECT DISTINCT columnName
FROM tableName;
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Here, the above query will return only unique values from the specified column.
If you want to return the unique records based on multiple columns use the following syntax:
SELECT DISTINCT col1, col2
FROM tableName;
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Here, the query will make the combination of col1 and col2, and accordingly return the unique records.
col1 | col2 |
---|---|
Red | Blue |
Red | Orange |
In the above table, col1 contains duplicate values. However, when we make the combination of col1 and col2, the pairs make unique records. In this case, the pairs are red-blue and red-orange, which are unique. Therefore, we will get both records.
Now let’s say you have to mention multiple column names but you want to return the unique records based on a particular column. In this case, you can use the DISTINCT ON clause as shown below:
SELECT DISTINCT ON (col1) colAlias, col2
FROM tableName;
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Here, Let’s assume the table contains the above col1 and col2 records.
Now, we have applied the DISTINCT clause on the col1, therefore the output will contain only a single record whichever comes first by the order. In this case, the output will be red-blue. The output may change based on the order of the records. If you sort the table in descending order based on col1 and col2, the output will be red-orange.
However, if you use the DISTINCT ON clause on col2, you will get both records in the output. This is because, col2 contains only unique values, and the duplication of col1 doesn’t matter here.
Note that, when specifying the ORDER BY clause with the DISTINCT ON clause, make sure the DISTINCT ON expression matches the initial order of the ORDER BY expression, else it will throw an error.
Examples of PostgreSQL DISTINCT Clause
Let’s create a table and insert some data into it so that we can perform operations with the DISTINCT clause on the table.
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
city VARCHAR(50) NOT NULL,
country VARCHAR(50) NOT NULL
);
INSERT INTO customers (name, city, country)
VALUES ('John', 'New York', 'USA'),
('Jane', 'London', 'UK'),
('Bob', 'Paris', 'France'),
('Alice', 'New York', 'USA'),
('Tom', 'Paris', 'France'),
('Emma', 'London', 'UK'),
('Peter', 'Sydney', 'Australia'),
('Alex', 'Sydney', 'Australia');
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Here, we have created a table ‘customers’. The table contains the customer’s names, their city and country. Note that, the city and country column contains duplicate values.
1. PostgreSQL DISTINCT Clause On Single Column
Now let’s return only unique city names from the customers table.
SELECT DISTINCT city FROM customers;
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Here, we only return a single column value, therefore the order clause is not needed as long as you want the output in any order.
2. PostgreSQL DISTINCT Clause On Multiple Columns
Now let’s apply the DISTINCT clause on multiple columns. Here, we will return the customers names and their cities which are unique.
SELECT DISTINCT name, city FROM customers ;
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Here, even if the city names are identical in the table, the combination of name and city makes that record unique. Therefore, the output will contain the duplicate city name but the combination of columns creates a unique record. Check the output:
As you can see, all the records are unique by the combination of the columns.
PostgreSQL DISTINCT ON Clause
The DISTINCT ON clause can be used on a single column when multiple columns are specified in the SELECT clause. Note that, the ORDER BY clause can affect the output of the DISTINCT ON clause. We will understand this by taking an example.
There are identical city names as well as country names. However, customers names for these duplicate records are unique.
Let’s execute the below query to generate an output by the unique cities.
SELECT DISTINCT ON (city) city, country, name
FROM customers ORDER BY city;
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
Here, we are using the ORDER BY clause to sort the records in ascending order by the city names.
Notice the highlighted area and compare it with the next example’s output.
SELECT DISTINCT ON (city) city, country, name
FROM customers ORDER BY city DESC;
Code language: PostgreSQL SQL dialect and PL/pgSQL (pgsql)
The previous query returns the result in the ascending order of the city names. Therefore, customer name Peter comes first. However, when we sort the result in descending order, the first record which comes is the record of Alex.
I hope you got the point of using the ORDER BY clause with the DISTINCT ON clause. It is a good practice to use the ORDER BY clause with it, so that you can always predict the result and bring the desired output.
Conclusion
In this tutorial, we learned the DISTINCT clause with multiple examples. There are so many things that you might have learned in this tutorial including the DISTINCT ON clause. I hope you found this tutorial interesting. If you did, don’t forget to share it with your friends!
Reference
SELECT clause official documentation