Using window functions correctly is a fantastic addition to SQL and may significantly simplify your life. In this article, we’ll talk about the variations between a GROUP BY and a PARTITION BY. Starting from the very beginning, we’ll gradually advance you to the point where you can conduct further independent research.
While writing a complex report in SQL, the PARTITION BY and GROUP BY clauses are frequently utilised. Returning the data itself is often helpful (and even necessary), but more intricate calculations are frequently needed. GROUP BY and PARTITION BY are useful in this situation. They are extremely similar in that, they both do grouping, but there are also significant distinctions. In this article, we’ll examine these variations.
Also Read: Difference Between GROUP BY and ORDER BY
GROUP BY Clause in MySQL
SQL queries employ the
GROUP BY clause to define groups based on specified criteria. These standards typically appear as categories in reports.
Examples of grouping criteria include:
- group all workers according to their annual salary.
- group all trains according to their starting station.
- group incomes and expenses by month
- group students according to the class they are registered in.
GROUP BY clause is used, data is transformed into a new result set in which the original records are divided into various groups based on the criteria we offer.
On these groups, we can carry out certain additional operations or computations, most of which are closely related to aggregate operations. In a nutshell, aggregate functions are used to aggregate our data, so we lose the original specifics in the query result during the process. Although there are numerous aggregate functions, COUNT, SUM, AVG, MIN, and MAX are the most often used ones.
Let’s consider the following example. Consider the below table named TableMark with the following values:
The syntax for
GROUP BY clause is:
SELECT expression1, expression2, ... expression_n,
GROUP BY expression1, expression2, ... expression_n;Code language: SQL (Structured Query Language) (sql)
Here, expressions can be column names.
Let’s run the following Query,
SELECT SUM(Marks) AS marktotal, firstname from TableMark
GROUP BY ID, firstName;Code language: SQL (Structured Query Language) (sql)
The output will be:
In our main table, we have 7 rows, when we apply
GROUP BY ID, the server groups the result based on ID. As we can see, in simple terms, here GROUP BY usually reduces the number of rows returned by rolling them up and calculating
SUM for each row.
PARTITION BY Clause in MySQL
PARTITION BY clause is a subclause of the
OVER clause. The result set of a query is partitioned using the
PARTITION BY clause. To arrange table rows into different categories, we use this clause. Each partition is handled separately when using the window function, and each partition’s results are recalculated. To calculate aggregated values, the
PARTITION BY clause is used in combination with the
OVER() and windows methods. By using a PARTITION BY, the row-level data are kept and are not collapsed.
The syntax for
PARTITION BY clause is:
window_function ( expression ) OVER (
PARTITION BY expression1, expression2, ...
Code language: SQL (Structured Query Language) (sql)
Here, order_clause and frame_clause can be optional. And the expression1, expression2, etc., can only refer to the columns derived by the
FROM clause. They cannot refer to expressions or aliases in the select list.
PARTITION BY in our example table by using the below query,
SELECT SUM(Marks) OVER (PARTITION BY ID) AS marktotal, firstName
FROM TableMark;Code language: SQL (Structured Query Language) (sql)
Here, we can see it divided the result set into partitions and original row-level details are preserved unlike
Another important thing to remember is that if you omit the
PARTITION BY clause, the entire result set is handled as a single partition.
Although we typically apply a
GROUP BY, there are many instances where a
PARTITION BY might be a superior option. In some cases, you could use a
GROUP BY to mimic a
PARTITION BY by employing subqueries, but these can result in highly complex searches. Let’s summarize everything with the most significant commonalities and differences:
- Similarities: Both are employed to produce aggregate values.
- Differences: Using a
GROUP BYclause collapses original rows, whereas the
PARTITION BYclause allows us to produce aggregated values while preserving the original values. Also, the
PARTITON BYis combined with
OVER()and windows functions to add some more functionalities.