MySQL GROUP BY vs PARTITION BY

GroupBy Vs PartionBY

Using window functions correctly is a fantastic addition to SQL and may significantly simplify your life. In this article, we’ll talk about the variations between a GROUP BY and a PARTITION BY. Starting from the very beginning, we’ll gradually advance you to the point where you can conduct further independent research.

While writing a complex report in SQL, the PARTITION BY and GROUP BY clauses are frequently utilised. Returning the data itself is often helpful (and even necessary), but more intricate calculations are frequently needed. GROUP BY and PARTITION BY are useful in this situation. They are extremely similar in that, they both do grouping, but there are also significant distinctions. In this article, we’ll examine these variations.

Also Read: Difference Between GROUP BY and ORDER BY

GROUP BY Clause in MySQL

SQL queries employ the GROUP BY clause to define groups based on specified criteria. These standards typically appear as categories in reports.

Examples of grouping criteria include:

  • group all workers according to their annual salary.
  • group all trains according to their starting station.
  • group incomes and expenses by month
  • group students according to the class they are registered in.

When the GROUP BY clause is used, data is transformed into a new result set in which the original records are divided into various groups based on the criteria we offer.

On these groups, we can carry out certain additional operations or computations, most of which are closely related to aggregate operations. In a nutshell, aggregate functions are used to aggregate our data, so we lose the original specifics in the query result during the process. Although there are numerous aggregate functions, COUNT, SUM, AVG, MIN, and MAX are the most often used ones.

Let’s consider the following example. Consider the below table named TableMark with the following values:

IDfirstNamelastNameMarks
001JackRay48
023AlexSnow37
025AnnKim49
001JackRay50
023AlexSnow47
077LinaKim41
001JackRay49
Table: TableMark

The syntax for GROUP BY clause is:

SELECT expression1, expression2, ... expression_n, 
       aggregate_function (aggregate_expression)
FROM tables
WHERE conditions
GROUP BY expression1, expression2, ... expression_n;Code language: SQL (Structured Query Language) (sql)

Here, expressions can be column names.

Let’s run the following Query,

SELECT SUM(Marks) AS marktotal, firstname from TableMark
GROUP BY ID, firstName;Code language: SQL (Structured Query Language) (sql)

The output will be:

marktotalfirstName
147Jack
84Alex
49Ann
41Lina
Table: Output – GROUP BY

In our main table, we have 7 rows, when we apply GROUP BY ID, the server groups the result based on ID. As we can see, in simple terms, here GROUP BY usually reduces the number of rows returned by rolling them up and calculating SUM for each row.

PARTITION BY Clause in MySQL

The PARTITION BY clause is a subclause of the OVER clause. The result set of a query is partitioned using the PARTITION BY clause. To arrange table rows into different categories, we use this clause. Each partition is handled separately when using the window function, and each partition’s results are recalculated. To calculate aggregated values, the PARTITION BY clause is used in combination with the OVER() and windows methods. By using a PARTITION BY, the row-level data are kept and are not collapsed.

The syntax for PARTITION BY clause is:

window_function ( expression ) OVER (
    PARTITION BY expression1, expression2, ...
    order_clause
    frame_clause
)
Code language: SQL (Structured Query Language) (sql)

Here, order_clause and frame_clause can be optional. And the expression1, expression2, etc., can only refer to the columns derived by the FROM clause. They cannot refer to expressions or aliases in the select list.

Let’s apply PARTITION BY in our example table by using the below query,

SELECT SUM(Marks) OVER (PARTITION BY ID) AS marktotal, firstName 
FROM TableMark;Code language: SQL (Structured Query Language) (sql)
marktotalfirstName
147Jack
147Jack
147Jack
84Alex
84Alex
49Ann
41Lina
Table: Output – PARTITION BY

Here, we can see it divided the result set into partitions and original row-level details are preserved unlike GROUP BY.

Another important thing to remember is that if you omit the PARTITION BY clause, the entire result set is handled as a single partition.

Conclusion

Although we typically apply a GROUP BY, there are many instances where a PARTITION BY might be a superior option. In some cases, you could use a GROUP BY to mimic a PARTITION BY by employing subqueries, but these can result in highly complex searches. Let’s summarize everything with the most significant commonalities and differences:

  • Similarities: Both are employed to produce aggregate values.
  • Differences: Using a GROUP BY clause collapses original rows, whereas the PARTITION BY clause allows us to produce aggregated values while preserving the original values. Also, the PARTITON BY is combined with OVER() and windows functions to add some more functionalities.

Reference

https://stackoverflow.com/questions/2404565/sql-difference-between-partition-by-and-group-by