Overuse of Common Table Expressions

Facebooktwittergoogle_plusredditpinterestlinkedinmail


Let’s explore common table expressions.  Like we have shown with scalar-valued functions, the CTE may be useful, but can quickly hinder performance if not used responsibly.

What is a CTE?

Microsoft’s definition of a CTE: Specifies a temporary named result set, known as a common table expression (CTE). This is derived from a simple query and defined within the execution scope of a single SELECT, INSERT, UPDATE, or DELETE statement. This clause can also be used in a CREATE VIEW statement as part of its defining SELECT statement. A common table expression can include references to itself. This is referred to as a recursive common table expression.

What does that mean?

A CTE is essentially a defined derived table.  Derived tables are the tables which are created on the fly with the help of the Select statement.  When we defined the derived table as a CTE, it makes the final query certainly more readable.

Defining this as a CTE has one strong advantage.  The CTE has the ability to reference itself, also known as recursion.  This is certainly a powerful tool when attempting to find the results of a parent-child relationship.

For the purposes of this post, I will recognize that recursion is an advantage but the performance issues that we will address the common practice of using a CTE without the need for recursion.

Comparing CTE to Derived Table

Using the AdventureWorks2012 database, let’s take a look at the execution plan for a simple CTE querying the SalesOrderHeaderEnlarged table.

simple_cte_execution_plan

Now, let’s take a look at the execution plan for the same query written as a derived table.

 

simple_dt_execution_plan

Notice how both queries give you the same execution plan.  A CTE really is a definable derived table that may be used multiple times throughout the query.  This is the double edged sword.  Although the query may be more readable, and you can reference the CTE repeatedly, the end result is referencing the same derived table multiple times.

CTE Example

Above I have shown a simple CTE.  The SalesOrderHeaderEnlarged table is called once.  Defined like that, with smaller recordsets, a CTE is harmless.  In a situation like this, if necessary, the CTE can be used to query itself.  What happens if the CTE is referenced multiple times?  Let’s take a look at a more complex CTE.  In this query, we are looking for the net sales, order quantity, sales quota, quota differential and average quantity per sale.  We are using three CTEs.  The second and third reference the previous two.  The final query looks like a relatively simply join with four objects in the from clause.

See the statistics below.  To get 17 records in results, the duration was nearly 4 seconds with 72194 logical reads.

Text Data Duration CPU Logical Reads
CTE 3956.8800000 8880 72194

Let’s take a look at the statistics IO breakdown.  Notice how the majority of the 72194 reads come from the SalesOrderDetailEnlarged and SalesOrderHeaderEnlarged tables.

 

Below we are showing a cross section of the execution plan, along with other offenders, I wanted to show the biggest.  Notice how SalesOrderDetailEnlarged and SalesOrderHeaderEnlarged are being queried more than once in the same manor for the same data.  This is showing you the cause for the high reads.  Our goal as database programmers is to get the data with the database doing as little work as possible.  In this case, that goal is not being met.

CTE_CTE_Execution_Plan_001

 

Derived Table in place of the CTE

In this case, I have altered the CTE query to use the derived table directly rather than the CTE.   Let’s take a look at the code.  Certainly more lines of code.  We have gone from 43 to 52 lines of code.  I’m aware I use a lot of spacing, but it is consistent.  Even given that, are we really doing more?

The answer is no.  Let’s look at the basic statistics.  The duration is roughly the same, the CPU is roughly the same.  The logical reads are roughly the same and, as you see below, the execution plan is exactly the same.

Text Data Duration CPU Logical Reads
CTE 3956.8800000 8880 72194
Derived Table 3953.0510000 8421 72198

Now, let’s look at the statistics IO output.  Again, the majority of the 72198 reads come from the SalesOrderDetailEnlarged and SalesOrderHeaderEnlarged tables.

 

Let’s take a look at the execution plan.  No, I did not simply copy and reuse the same image.  They are the same execution plan.  Again, we are seeing the worst offenders executed multiple times.  Notice the parallelism as well.  SQL Server has determined that it needs to use more than one core to complete this task.  For a well written query, parallelism is a welcomed advantage.  CTEs do not always translate to a well written query.

CTE_DT_Execution_Plan_001


By rewriting the code as derived tables, we can observe through the source code, the statistics and the execution plan that CTEs are simply derived tables put into a more readable code.

CTE Alternatives

Now that we see what CTE are doing behind the scene, what alternatives do we have?  Each database developer may have different answers, but the most common would be table variables or temporary tables. Of course, this is debatable. We will explore that debate in the future.

Table Variable CTE Alternative

Let’s take a look at the code re-written to use table variables.

 

I’m all about pretty and readable code.  We’re defining the table variables, the columns and data types explicitly.  This allows the next person to see this code have a better understanding of what is going on.

Are there other benefits?  Yes there are.  Let’s take a look at the overall statistics.  Our duration isn’t much less at 3.5 milliseconds, but both our CPU and Logical Reads are less than half of what the CTE/Derived table are.

Text Data Duration CPU Logical Reads
CTE 3956.8800000 8880 72194
Derived Table 3953.0510000 8421 72198
Table Variable 3579.8140000 3573 35972

We can see that as well from the statistics IO output.  You’ll notice two stark differences.  The first is the amount of logical reads on SalesOrderDetailEnlarged and SalesOrderHeaderEnlarged.  Both are roughly 50% of what they were in the CTE/Derived Table version of the query.

If you look closely, there is one more difference.  The Scan Count.  For our main tables, the scan count is one.  For the CTE/Derived Table, the Scan Count is EIGHT.  This scan count shows that parallel processing was determined to be necessary for SQL Server to complete its task.  Table Variables to not allow parallel processing.

Now, let’s see the improvements from the execution plan below in several screenshots.

The most costly is the insert into @totalSales, replacing the CTETotalSales.  The rest also use the table variables to replace the CTE.   The end result is at least a 50% improvement in performance.

CTE_TV_Execution_Plan_001_Into_@totalSales

CTE_TV_Execution_Plan_001_Into_@targetDiff

CTE_TV_Execution_Plan_001_Into_@perUnit

CTE_TV_Execution_Plan_001_Into_FinalSelect

 

 

Temporary Table Alternative

Temporary tables are my go to choice.   The biggest advantages are the ability to create non-unique clustered/non-clustered indexes and the ability to use statistics in the execution plan.  The biggest difference that you will see in the query and execution plan is that table variables to not use parallelism.  I’ve mentioned that parallel execution and query quality are not mutually exclusive.  A well written query can use parallel execution thus taking advantage of the extra power.

Let’s take a look at the query rewritten using temporary tables.  The biggest difference, at least in the code, is the ability to add the non-unique clustered indexes to the table.

You will see that with the temporary table, the CPU and Logical Reads are little higher than Table Variables, but the duration is 50% of the CTE/Derived Table and 57% of the Table Variable.  The reason is in the parallel execution.

Text Data Duration CPU Logical Reads
CTE 3956.8800000 8880 72194
Derived Table 3953.0510000 8421 72198
Table Variable 3579.8140000 3573 35972
Temp Table 2042.6220000 4993 37112

Examining the statistics IO below, you will see the scan counts are higher for the temp tables than the table variables, but still less than the CTE/Derived Table.  The reads for our main tables are roughly the same, but slightly high.  The additional reads come from populating our temporary tables with the indexes.

 

Review the execution plan below.  The glaring difference is the use of parallelism couple with the ability to use the clustered indexes within the temporary tables.

CTE_TT_Execution_Plan_001_Into_#totalSales

CTE_TT_Execution_Plan_001_Into_#targetDiff

CTE_TT_Execution_Plan_001_Into_#perUnit

CTE_TT_Execution_Plan_001_Into_FinalSelect

Conclusion

As I have demonstrated through this series, reuse of code may make life a little easier during development, but may have adverse affects on the performance of the queries.  In this post, we’ve seen that CTE are nothing more than a derived table(with the exception of the recursive ability).  CTE may make coding a bit easier for that final join and the developer does not have to take the extra step to define the tables, but the ending result is not optimal.

Play around with your code to see what will work best for your situation.  The end goal is the make the database do the least amount of work for the desired results.  In my experience, temporary tables provide the best benefit over table variables and certainly over CTE.

More coming soon!

Facebooktwittergoogle_plusredditpinterestlinkedinmail

Leave a Reply

Your email address will not be published. Required fields are marked *