Hierarchical Data and How to Query It in SQL

Add A Heading (19)

Hierarchical data in SQL is data represented in a tree-like structure. The hierarchical data is a parent-child structure where each child has one parent. Each node can have one, more, or zero child nodes. The hierarchical type of data is very common among file systems, product categories, etc. This article is based on the methods to handle the hierarchical data in SQL. Let’s see all the methods in detail. 

Introduction to Hierarchical Data in SQL

As we discussed in the introduction, the hierarchical data is nothing but a tree-like structure. This tree-like structure involves different types of nodes like parent/ root node, child node, etc. Let’s visualize the structure of data for better understanding. 

Hierarchical Data Representation
Hierarchical Data Representation

In the above structure, A is the root node, and B, and C are the child nodes. D, E, and F are the child nodes of B and, G and H are the child nodes of C. I have no child node in this example. Let’s see some methods to represent hierarchical data using SQL query. 

Also Read: MySQL Adjacency List Model For Managing Hierarchical Data

Methods to Query Hierarchical Data in SQL

Method 1: Using Recursive Common Table Expression [CTE]

Recursive CTEs use things that are hierarchical in structure, starting from the root nodes (s) asking them the same questions and then answering every child node. Ultimately, this leads to a multi-dimensional structure being flattened to a one-dimensional table with every node numbered as level 1 Node. This approach leads you to seamlessly query and drill down through the data hierarchies, which in return, makes it easier to reveal associated relationships and perform operations in the hierarchy. Let’s see the query first.

Two main parts of this query are the Anchor member and the Recursive member. 

  • Anchor Member: The initial query that specifies the root node(s) of the hierarchy. This part specifies the base case for recursion.
  • Recursive Member: The part that joins the CTE with itself, recursively traversing the hierarchy. It defines how to move from one level of the hierarchy to the next.
CREATE TABLE Employee_table (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    manager_id INT,  -- Reference to the parent (manager) record
    FOREIGN KEY (manager_id) REFERENCES Employee_table(id)
);

INSERT INTO Employee_table (id, name, manager_id) VALUES
(1, 'A', NULL),          
(2, 'B', 1),            
(3, 'C', 1),               
(4, 'D', 2),           
(5, 'E', 3); 

WITH RECURSIVE OrgChart AS (
    SELECT id, name, manager_id, 0 AS level
    FROM Employee_table
    WHERE manager_id IS NULL  -- Starting with the CEO
    UNION ALL
    SELECT e.id, e.name, e.manager_id, oc.level + 1
    FROM Employee_table e
    INNER JOIN OrgChart oc ON e.manager_id = oc.id
)
SELECT id, name, manager_id, level
FROM OrgChart;
Code language: SQL (Structured Query Language) (sql)

Query Explanation:

  • In the first stage of the CTE ROOT nodes (manager_id IS NULL), which are employees that do not supervise anyone, are being selected.
  • The recursive part of CTE is its ability to merge the Employees table with the OrgChart CTE, representing the manager_id foreign key relationship around the org. chart head.
  • Section the columns from the OrgChart CTE containing the ID, name, manager ID, and node level of each node of the organizational hierarchy.
  • The recursive member is the pop of the Employee table and the CTE org chart based on the manager_id foreign key.
  • It repeatedly visits the hierarchy by progressing from each level one by one. 
  • In every round, it chooses up to a k number of current nodes, which are also parents of the child nodes (employees) previously selected in the round.
  • It is incremented by 1 in each recursion so that the level defines the view of the tree to the user of as many levels as possible.
Hierarchical Data Representation Using Recursive Common Table Expression
Hierarchical Data Representation Using Recursive Common Table Expression

Method 2: Using Nested Set Model

The Nested Set Model is a technique for representing hierarchical data in SQL databases. It employs a technique similar to nested parentheses to encode the hierarchical relationships between nodes in a tree-like structure. The Nested Set Model represents each node as an interval in a linear ordering of the tree.

CREATE TABLE Employee_table (
    id INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE EmployeeHierarchy (
    ancestor_id INT,
    descendant_id INT,
    PRIMARY KEY (ancestor_id, descendant_id),
    FOREIGN KEY (ancestor_id) REFERENCES Employee_table(id),
    FOREIGN KEY (descendant_id) REFERENCES Employee_table(id)
);
INSERT INTO Employee_table (id, name) VALUES
(1, 'A'),
(2, 'B'),
(3, 'C'),
(4, 'D'),
(5, 'E');
INSERT INTO EmployeeHierarchy (ancestor_id, descendant_id) VALUES
(1, 1),  
(2, 2),  
(3, 3),  
(4, 4),  
(5, 5),  
(1, 2),  
(1, 3),  
(2, 4),  
(3, 5); 

SELECT * FROM EmployeeHierarchy;

Code language: SQL (Structured Query Language) (sql)
Hierarchical Data Representation Using Nested Set Model
Hierarchical Data Representation Using Nested Set Model

Summary

In this article, we have seen the methods to operate hierarchical data in SQL. Hierarchical data is widely used in different domains where data is represented as a tree-like structure. The basics of hierarchical data and diagrammatic representation are also provided for better understanding. The Recursive Common Table Expression method and Nested Set Model are employed to represent the hierarchical data in SQL. Hope you will understand the concept and enjoy the article. 

Reference

https://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database