Using window functions correctly is a fantastic addition to SQL and may significantly simplify your life. In this article, we’ll talk about the variations between a GROUP BY and a PARTITION BY. Starting from the very beginning, we’ll gradually advance you to the point where you can conduct further independent research.
While writing a complex report in SQL, the PARTITION BY and GROUP BY clauses are frequently utilised. Returning the data itself is often helpful (and even necessary), but more intricate calculations are frequently needed. GROUP BY and PARTITION BY are useful in this situation. They are extremely similar in that, they both do grouping, but there are also significant distinctions. In this article, we’ll examine these variations.
Also Read: Difference Between GROUP BY and ORDER BY
GROUP BY Clause in MySQL
SQL queries employ the GROUP BY
clause to define groups based on specified criteria. These standards typically appear as categories in reports.
Examples of grouping criteria include:
- group all workers according to their annual salary.
- group all trains according to their starting station.
- group incomes and expenses by month
- group students according to the class they are registered in.
When the GROUP BY
clause is used, data is transformed into a new result set in which the original records are divided into various groups based on the criteria we offer.
On these groups, we can carry out certain additional operations or computations, most of which are closely related to aggregate operations. In a nutshell, aggregate functions are used to aggregate our data, so we lose the original specifics in the query result during the process. Although there are numerous aggregate functions, COUNT, SUM, AVG, MIN, and MAX are the most often used ones.
Let’s consider the following example. Consider the below table named TableMark with the following values:
ID | firstName | lastName | Marks |
001 | Jack | Ray | 48 |
023 | Alex | Snow | 37 |
025 | Ann | Kim | 49 |
001 | Jack | Ray | 50 |
023 | Alex | Snow | 47 |
077 | Lina | Kim | 41 |
001 | Jack | Ray | 49 |
The syntax for GROUP BY
clause is:
SELECT expression1, expression2, ... expression_n,
aggregate_function (aggregate_expression)
FROM tables
WHERE conditions
GROUP BY expression1, expression2, ... expression_n;
Code language: SQL (Structured Query Language) (sql)
Here, expressions can be column names.
Let’s run the following Query,
SELECT SUM(Marks) AS marktotal, firstname from TableMark
GROUP BY ID, firstName;
Code language: SQL (Structured Query Language) (sql)
The output will be:
marktotal | firstName |
147 | Jack |
84 | Alex |
49 | Ann |
41 | Lina |
In our main table, we have 7 rows, when we apply GROUP BY
ID, the server groups the result based on ID. As we can see, in simple terms, here GROUP BY usually reduces the number of rows returned by rolling them up and calculating SUM
for each row.
PARTITION BY Clause in MySQL
The PARTITION BY
clause is a subclause of the OVER
clause. The result set of a query is partitioned using the PARTITION BY
clause. To arrange table rows into different categories, we use this clause. Each partition is handled separately when using the window function, and each partition’s results are recalculated. To calculate aggregated values, the PARTITION BY
clause is used in combination with the OVER()
and windows methods. By using a PARTITION BY, the row-level data are kept and are not collapsed.
The syntax for PARTITION BY
clause is:
window_function ( expression ) OVER (
PARTITION BY expression1, expression2, ...
order_clause
frame_clause
)
Code language: SQL (Structured Query Language) (sql)
Here, order_clause and frame_clause can be optional. And the expression1, expression2, etc., can only refer to the columns derived by the FROM
clause. They cannot refer to expressions or aliases in the select list.
Let’s apply PARTITION BY
in our example table by using the below query,
SELECT SUM(Marks) OVER (PARTITION BY ID) AS marktotal, firstName
FROM TableMark;
Code language: SQL (Structured Query Language) (sql)
marktotal | firstName |
147 | Jack |
147 | Jack |
147 | Jack |
84 | Alex |
84 | Alex |
49 | Ann |
41 | Lina |
Here, we can see it divided the result set into partitions and original row-level details are preserved unlike GROUP BY
.
Another important thing to remember is that if you omit the PARTITION BY
clause, the entire result set is handled as a single partition.
Conclusion
Although we typically apply a GROUP BY
, there are many instances where a PARTITION BY
might be a superior option. In some cases, you could use a GROUP BY
to mimic a PARTITION BY
by employing subqueries, but these can result in highly complex searches. Let’s summarize everything with the most significant commonalities and differences:
- Similarities: Both are employed to produce aggregate values.
- Differences: Using a
GROUP BY
clause collapses original rows, whereas thePARTITION BY
clause allows us to produce aggregated values while preserving the original values. Also, thePARTITON BY
is combined withOVER()
and windows functions to add some more functionalities.
Reference
https://stackoverflow.com/questions/2404565/sql-difference-between-partition-by-and-group-by