How to Use SQL Server HierarchyID Through Easy Examples

{coding}Sight
13 min readSep 24, 2020

--

Do you still hold on to the parent/child design, or would like to try something new, like SQL Server hierarchyID? Well, it is really new because hierarchyID has been a part of SQL Server since 2008. Of course, the novelty itself is not a persuasive argument. But note that Microsoft added this feature to represent one-to-many relationships with multiple levels in a better way.

You may wonder what difference it makes and which benefits you get from using hierarchyID instead of the usual parent/child relationships. If you never explored this option, it might be surprising for you.

The truth is, I didn’t explore this option since it was released. However, when I finally did it, I found it a great innovation. It is a better-looking code, but it has much more in it. In this article, we are going to find out about all those excellent opportunities.

However, before we dive into the peculiarities of using SQL Server hierarchyID, let’s clarify its meaning and scope.

What is SQL Server HierarchyID?

SQL Server hierarchyID is a built-in data type designed to represent trees, which are the most common type of hierarchical data. Each item in a tree is called a node. In a table format, it is a row with a column of hierarchyID data type.

Usually, we demonstrate hierarchies using a table design. An ID column represents a node, and another column stands for the parent. With the SQL Server HierarchyID, we only need one column with a data type of hierarchyID.

When you query a table with a hierarchyID column, you see hexadecimal values. It is one of the visual images of a node. Another way is a string:

‘/’ stands for the root node;

‘/1/’, ‘/2/’, ‘/3/’ or ‘/n/’ stand for the children — direct descendants 1 to n;

‘/1/1/’ or ‘/1/2/’ are the “children of children — “grandchildren.” The string like ‘/1/2/’ means that the first child from the root has two children, which are, in their turn, two grandchildren of the root.

Here’s a sample of what it looks like:

Unlike other data types, hierarchyID columns can take advantage of built-in methods. For example, if you have a hierarchyID column named RankNode, you can have the following syntax:

RankNode.<methodname>.

SQL Server HierarchyID Methods

One of the available methods is IsDescendantOf. It returns 1 if the current node is a descendant of a hierarchyID value.

You can write code with this method similar to the one below:

SELECT
r.RankNode
,r.Rank
FROM dbo.Ranks r
WHERE r.RankNode.IsDescendantOf(0x58) = 1

Other methods used with hierarchyID are the following:

  • GetRoot — the static method that returns the root of the tree.
  • GetDescendant — returns a child node of a parent.
  • GetAncestor — returns a hierarchyID representing the nth ancestor of a given node.
  • GetLevel — returns an integer that represents the depth of the node.
  • ToString — returns the string with the logical representation of a node. ToString is called implicitly when the conversion from hierarchyID to the string type occurs.
  • GetReparentedValue — moves a node from the old parent to the new parent.
  • Parse — acts as the opposite of ToString. It converts the string view of a hierarchyID value to hexadecimal.

SQL Server HierarchyID Indexing Strategies

To ensure that queries for tables using hierarchyID run as fast as possible, you need to index the column. There are two indexing strategies:

DEPTH-FIRST

In a depth-first index, the subtree rows are closer to each other. It suits queries like finding a department, its subunits, and employees. Another example is a manager and its employees are stored closer together.

In a table, you can implement a depth-first index by creating a clustered index for the nodes. Further, we perform one of our examples, just like that.

BREADTH-FIRST

In a breadth-first index, the same level’s rows are closer together. It suits queries like finding all the manager’s directly reporting employees. If most of the queries are similar to this, create a clustered index based on (1) level and (2) node.

It depends on your requirements if you need a depth-first index, a breadth-first, or both. You need to balance between the importance of the queries type and the DML statements you execute on the table.

SQL Server HierarchyID Limitations

Unfortunately, using hierarchyID can’t resolve all issues:

  • SQL Server can’t guess what the child of a parent is. You have to define the tree in the table.
  • If you don’t use a unique constraint, the generated hierarchyID value won’t be unique. Handling this problem is the developer’s responsibility.
  • Relationships of a parent and child nodes are not enforced like a foreign key relationship. Hence, before deleting a node, query for any descendants existing.

Visualizing Hierarchies

Before we proceed, consider one more question. Looking at the result set with node strings, do you find the hierarchy visualizing hard for your eyes?

For me, it’s a big yes because I am not getting younger.

For this reason, we are going to use Power BI and Hierarchy Chart from Akvelon along with our database tables. They will help to display the hierarchy in an organizational chart. I hope it will make the job easier.

Now, let’s get down to business.

Uses of SQL Server HierarchyID

You can use HierarchyID with the following business scenarios:

  • Organizational structure
  • Folders, subfolders, and files
  • Tasks and subtasks in a project
  • Pages and subpages of a website
  • Geographical data with countries, regions, and cities

Even if your business scenario is similar to the above, and you rarely query across the hierarchy sections, you don’t need hierarchyID.

For example, your organization processes payrolls for employees. Do you need to access the subtree to process someone’s payroll? Not at all. However, if you process commissions of people in a multi-level marketing system, it can be different.

In this post, we use the portion of the organizational structure and the chain of command on a cruise ship. The structure was adapted from the organizational chart from here. Take a look at it in Figure 4 below:

Now you can visualize the hierarchy in question. We use the below tables throughout this post:

  • Vessels — is the table standing for the cruise ships’ list.
  • Ranks — is the table of crew ranks. There we establish hierarchies using the hierarchyID.
  • Crew — is the list of the crew of each vessel and their ranks.

The table structure of each case is as follows:

CREATE TABLE [dbo].[Vessel](
[VesselId] [int] IDENTITY(1,1) NOT NULL,
[VesselName] [varchar](20) NOT NULL,
CONSTRAINT [PK_Vessel] PRIMARY KEY CLUSTERED
(
[VesselId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Ranks](
[RankId] [int] IDENTITY(1,1) NOT NULL,
[Rank] [varchar](50) NOT NULL,
[RankNode] [hierarchyid] NOT NULL,
[RankLevel] [smallint] NOT NULL,
[ParentRankId] [int] -- this is redundant but we will use this to compare
-- with parent/child
) ON [PRIMARY]
GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_RankId] ON [dbo].[Ranks]
(
[RankId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [IX_RankNode] ON [dbo].[Ranks]
(
[RankNode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Crew](
[CrewId] [int] IDENTITY(1,1) NOT NULL,
[CrewName] [varchar](50) NOT NULL,
[DateHired] [date] NOT NULL,
[RankId] [int] NOT NULL,
[VesselId] [int] NOT NULL,
CONSTRAINT [PK_Crew] PRIMARY KEY CLUSTERED
(
[CrewId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Crew] WITH CHECK ADD CONSTRAINT [FK_Crew_Ranks] FOREIGN KEY([RankId])
REFERENCES [dbo].[Ranks] ([RankId])
GO
ALTER TABLE [dbo].[Crew] CHECK CONSTRAINT [FK_Crew_Ranks]
GO
ALTER TABLE [dbo].[Crew] WITH CHECK ADD CONSTRAINT [FK_Crew_Vessel] FOREIGN KEY([VesselId])
REFERENCES [dbo].[Vessel] ([VesselId])
GO
ALTER TABLE [dbo].[Crew] CHECK CONSTRAINT [FK_Crew_Vessel]
GO

Inserting Table Data with SQL Server HierarchyID

The first task in using hierarchyID thoroughly is to add records into the table with a hierarchyID column. There are two ways to do it.

Using Strings

The quickest way to insert data with hierarchyID is to use strings. To see this in action, let’s add some records to the Ranks table.

INSERT INTO dbo.Ranks
([Rank], RankNode, RankLevel)
VALUES
('Captain', '/',0)
,('First Officer','/1/',1)
,('Chief Engineer','/2/',1)
,('Hotel Director','/3/',1)
,('Second Officer','/1/1/',2)
,('Second Engineer','/2/1/',2)
,('F&B Manager','/3/1/',2)
,('Chief Housekeeping','/3/2/',2)
,('Chief Purser','/3/3/',2)
,('Casino Manager','/3/4/',2)
,('Cruise Director','/3/5/',2)
,('Third Officer','/1/1/1/',3)
,('Third Engineer','/2/1/1/',3)
,('Asst. F&B Manager','/3/1/1/',3)
,('Asst. Chief Housekeeping','/3/2/1/',3)
,('First Purser','/3/3/1/',3)
,('Asst. Casino Manager','/3/4/1/',3)
,('Music Director','/3/5/1/',3)
,('Asst. Cruise Director','/3/5/2/',3)
,('Youth Staff Director','/3/5/3/',3)

The above code adds 20 records to the Ranks table.

As you can see, the tree structure has been defined in the INSERT statement above. It is discernible easily when we use strings. Besides, SQL Server converts it to the corresponding hexadecimal values.

Using Max(), GetAncestor(), and GetDescendant()

Using strings suits the task of populating the initial data. In the long run, you need the code to handle insertion without providing strings.

To do this task, get the last node used by a parent or ancestor. We accomplish it by using the functions MAX() and GetAncestor(). See the sample code below:

-- add a bartender rank reporting to the Asst. F&B Manager
DECLARE @MaxNode HIERARCHYID
DECLARE @ImmediateSuperior HIERARCHYID = 0x7AD6
SELECT @MaxNode = MAX(RankNode) FROM dbo.Ranks r
WHERE r.RankNode.GetAncestor(1) = @ImmediateSuperior
INSERT INTO dbo.Ranks
([Rank], RankNode, RankLevel)
VALUES
('Bartender', @ImmediateSuperior.GetDescendant(@MaxNode,NULL),
@ImmediateSuperior.GetDescendant(@MaxNode, NULL).GetLevel())

Below are the points taken from the above code:

  • First, you need a variable for the last node and the immediate superior.
  • The last node can be acquired using MAX() against RankNode for the specified parent or immediate superior. In our case, it’s the Assistant F&B Manager with a node value of 0x7AD6.
  • Next, to ensure no duplicate child appears, use @ImmediateSuperior.GetDescendant(@MaxNode, NULL). The value in @MaxNode is the last child. If it’s not NULL, GetDescendant() returns the next possible node value.
  • Last, GetLevel() returns the level of the new node created.

Querying Data

After adding records to our table, it’s time to query it. 2 ways to query data are available:

The query for Direct Descendants

When we look for the employees directly reporting to the manager, we need to know two things:

  • The node value of the manager or parent
  • The level of the employee under the manager

For this task, we can use the code below. The output is the list of the crew under the Hotel Director.

-- Get the list of crew directly reporting to the Hotel Director
DECLARE @Node HIERARCHYID = 0x78 -- the Hotel Director's node/hierarchyid
DECLARE @Level SMALLINT = @Node.GetLevel()
SELECT
a.CrewName
,a.DateHired
,b.Rank
,b.RankLevel
,c.VesselName
,(SELECT Rank FROM dbo.Ranks WHERE RankNode = b.RankNode.GetAncestor(1)) AS ReportsTo
FROM dbo.Crew a
INNER JOIN dbo.Ranks b ON a.RankId = b.RankId
INNER JOIN dbo.Vessel c ON a.VesselId = c.VesselId
WHERE b.RankNode.IsDescendantOf(@Node)=1
AND b.RankLevel = @Level + 1 -- add 1 for the level of the crew under the
-- Hotel Director

The result of the above code is as follows in Figure 5:

Query for Subtrees

Sometimes, you also need to list the children and the children’s children down to the bottom. To do this, you need to have the hierarchyID of the parent.

The query will be similar to the previous code but without the need to get the level. See the code example:

-- Get the list of the crew under the Hotel Director down to the lowest level
DECLARE @Node HIERARCHYID = 0x78 -- the Hotel Director's node/hierarchyid
SELECT
a.CrewName
,a.DateHired
,b.Rank
,b.RankLevel
,c.VesselName
,(SELECT Rank FROM dbo.Ranks WHERE RankNode = b.RankNode.GetAncestor(1)) AS ReportsTo
FROM dbo.Crew a
INNER JOIN dbo.Ranks b ON a.RankId = b.RankId
INNER JOIN dbo.Vessel c ON a.VesselId = c.VesselId
WHERE b.RankNode.IsDescendantOf(@Node)=1

The result of the above code:

Moving Nodes with SQL Server HierarchyID

Another standard operation with hierarchical data is moving a child or an entire subtree to another parent. However, before we proceed, please note one potential problem:

Potential Problem

  • First, moving nodes involve I/O. How frequent you move nodes can be the deciding factor if you use hierarchyID or the usual parent/child.
  • Second, moving a node in a parent/child design updates one row. At the same time, when you move a node with hierarchyID, it updates one or more rows. The number of rows affected depends on the hierarchy level depth. It can turn into a significant performance problem.

Solution

You may handle this problem with your database design.

Let’s consider the design we used here.

Instead of defining the hierarchy on the Crew table, we defined it in the Ranks table. This approach differs from the Employee table in the AdventureWorks sample database, and it offers the following advantages:

  • The crew members move more often than the ranks in a vessel. This design will reduce the movements of nodes in the hierarchy. As a result, it minimizes the problem defined above.
  • Defining more than one hierarchy in the Crew table is more complicated, as two vessels need two captains. The result is two root nodes.
  • If you need to display all ranks with the corresponding crew member, you can use a LEFT JOIN. If no one is on board for that rank, it shows an empty slot for the position.

Now, let’s move on to the objective of this section. Add child nodes under the wrong parents.

To visualize what we are about to do, imagine a hierarchy like the one below. Note the yellow nodes.

Move a Node with No Children

Moving a child node requires the following:

  • Define the hierarchyID of the child node to move.
  • Define the old parent’s hierarchyID.
  • Define the new parent’s hierarchyID.
  • Use UPDATE with GetReparentedValue() to move the node physically.

Start by moving a node with no children. In the example below, we move the Cruise Staff from under the Cruise Director to under the Asst. Cruise Director.

-- Moving a node with no child nodeDECLARE @NodeToMove HIERARCHYID
DECLARE @OldParent HIERARCHYID
DECLARE @NewParent HIERARCHYID
SELECT @NodeToMove = r.RankNode
FROM dbo.Ranks r
WHERE r.RankId = 24 -- the cruise staff
SELECT @OldParent = @NodeToMove.GetAncestor(1)SELECT @NewParent = r.RankNode
FROM dbo.Ranks r
WHERE r.RankId = 19 -- the assistant cruise director
UPDATE dbo.Ranks
SET RankNode = @NodeToMove.GetReparentedValue(@OldParent,@NewParent)
WHERE RankNode = @NodeToMove

Once the node is updated, a new hex value will be used for the node. Refreshing my Power BI connection to SQL Server — it will change the hierarchy chart as shown below:

In Figure 8, the Cruise staff no longer reports to the Cruise Director — it is changed to report to the Assistant Cruise Director. Compare it with Figure 7 above.

Now, let’s proceed to the next stage and move the Head Waiter to the Assistant F&B Manager.

Move a Node with Children

There is a challenge in this part.

The thing is, the previous code won’t work with a node with even one child. We remember that moving a node requires to update one or more children nodes.

Further, it doesn’t end there. If the new parent has an existing child, we might bump into duplicate node values.

In this example, we have to face that problem: the Asst. F&B Manager has a Bartender child node.

Ready? Here’s the code:

-- Move a node with at least one child
DECLARE @NodeToMove HIERARCHYID
DECLARE @OldParent HIERARCHYID
DECLARE @NewParent HIERARCHYID
SELECT @NodeToMove = r.RankNode
FROM dbo.Ranks r
WHERE r.RankId = 22 -- the head waiter
SELECT @OldParent = @NodeToMove.GetAncestor(1) -- head waiter's old parent
--> asst chief housekeeping
SELECT @NewParent = r.RankNode
FROM dbo.Ranks r
WHERE r.RankId = 14 -- the assistant f&b manager
DECLARE children_cursor CURSOR FOR
SELECT RankNode FROM dbo.Ranks r
WHERE RankNode.GetAncestor(1) = @OldParent;
DECLARE @ChildId hierarchyid;
OPEN children_cursor
FETCH NEXT FROM children_cursor INTO @ChildId;
WHILE @@FETCH_STATUS = 0
BEGIN
START:
DECLARE @NewId hierarchyid;
SELECT @NewId = @NewParent.GetDescendant(MAX(RankNode), NULL)
FROM dbo.Ranks r WHERE RankNode.GetAncestor(1) = @NewParent; -- ensure
--to get a new id in case there's a
--sibling
UPDATE dbo.Ranks
SET RankNode = RankNode.GetReparentedValue(@ChildId, @NewId)
WHERE RankNode.IsDescendantOf(@ChildId) = 1;

IF @@error <> 0 GOTO START -- On error, retry
FETCH NEXT FROM children_cursor INTO @ChildId;
END
CLOSE children_cursor;
DEALLOCATE children_cursor;

In the above code example, the iteration starts as the need to transfer the node down to the child at the last level.

After you run it, the Ranks table will be updated. And again, if you want to see the changes visually, refresh the Power BI report. You will see the changes similar to the one below:

Benefits of Using SQL Server HierarchyID vs. Parent/Child

To convince anyone to use a feature, we need to know the benefits.

Thus, in this section, we will compare statements using the same tables as those from the beginning. One will use hierarchyID, and the other one will use the parent/child approach. The result set will be the same for both approaches. We expect it for this exercise as that one from Figure 6 above.

Now that the requirements are precise let’s examine the benefits thoroughly.

Simpler to Code

See the code below:

-- List down all the crew under the Hotel Director using hierarchyID
SELECT
a.CrewName
,a.DateHired
,b.Rank
,b.RankLevel
,c.VesselName
,d.RANK AS ReportsTo
FROM dbo.Crew a
INNER JOIN dbo.Vessel c ON a.VesselId = c.VesselId
INNER JOIN dbo.Ranks b ON a.RankId = b.RankId
INNER JOIN dbo.Ranks d ON d.RankNode = b.RankNode.GetAncestor(1)
WHERE a.VesselId = 1
AND b.RankNode.IsDescendantOf(0x78)=1

This sample needs hierarchyID value only. You can change the value at will without changing the query.

Now, compare the statement for the parent/child approach producing the same result set:

-- List down all the crew under the Hotel Director using parent/child
SELECT
a.CrewName
,a.DateHired
,b.Rank
,b.RankLevel
,c.VesselName
,d.Rank AS ReportsTo
FROM dbo.Crew a
INNER JOIN dbo.Vessel c ON a.VesselId = c.VesselId
INNER JOIN dbo.Ranks b ON a.RankId = b.RankId
INNER JOIN dbo.Ranks d ON b.RankParentId = d.RankId
WHERE a.VesselId = 1
AND (b.RankID = 4) OR (b.RankParentID = 4 OR b.RankParentId >= 7)

What do you think? The code samples are almost the same except one point.

The WHERE clause in the second query will not be flexible to adapt if a different subtree is required.

Make the second query generic enough, and the code will be longer. Yikes!

Faster Execution

According to Microsoft, “subtree queries are significantly faster with hierarchyID” compared to parent/child. Check if it’s true — https://codingsight.com/how-to-use-sql-server-hierarchyid-through-easy-examples/.

Subscribe to our digest to get SQL Server industry insides!

--

--

{coding}Sight

Awesome blog focused on databases and Microsoft, .NET and cloud technologies. http://codingsight.com/