Power and Deception of CTEs

This is a reprint of an article of mine that appeared on SQLServerCentral.com in December 2008

Introduction

A customer of mine had a performance problem with a stored procedure that uses Common Table Expressions (CTEs). The performance had gone from sub second to 10-12 secs, and the approach that was being taken to diagnose and solve the problem was seemingly going awry.


Background

OK I’ve simplified it a bit here, but essentially there are 2 tables that we are interested in. The first is the offers table

 offerid (PK)                int
 offerdescription          varchar
 offercategory              int
 offervalue                  money
 retailerid (FK to retailer table)  int

non-clustered index idx_offer_1 (offervalue)
non-clustered index idx_offer_2 (offercategory)
non-clustered index idx_offer_3 (retailerid)

and the retailers table which has about 500 rows as shown with this structure:

retailerid (PK)            int
retailername              varchar
retailerpopularity     int  1-10 for top ten 0 for others, where 10 is the most popular

non-clustered index idx_retailer_1 (retailerpopularity)

Problem

As mentioned earlier the customer was using a CTE within a stored procedure to perform paging based on retailerpopularity descending, offervalue ascending as shown below

...
 @offercategory int pass in as a parameter
 ...
 declare @startrow int
 declare @numrows int
 with cte (
    offerid,
    offerdescription,
    offercategory,
    offervalue,
    retailerid,
    retailername,
    retailerpopularity,
    row)
 as
 select
     offer.offerid,
     offer.offerdescription,
     offer.offervalue,
     retailer.retailerid,
     retailername,
     retailerpopularity,
     row_number() over
          (order by  retailer.retailerpopularity desc, offervalue asc) as row
 from
     offer join retailer on offer.retailerid = retailer.retailerid
 where
     (@offercategory is null or offercategory = @offercategory)

select * from cte
 where row between @startrow and @startrow+@numrows-1
 ...

Now this isn’t a discussion about the best way to do paging, or even the use of the row_number() function, lets just accept the code for what it is. Customer is more bothered about his website grinding to a halt, than listening to me banging on about best practices, blah, blah…

So it all seems quite straightforward; the CTE generates ordered, rownumbered rows from a join between offer and retailer, filtered by the supplied offercategory, or if none supplied, all the rows; then you select from CTE the rows you want

The customer thought everything was set up ok – PKs on the tables, indexes on the FK, indexes on the columns used to order the CTE, then he noticed that a similar query he had been running in dev/test/qa/whatever was running sub second, rather than the 10-12 secs that this one was doing. It was a *slightly* different query, but neverless was comparable, he thought… The QA query is shown below

 ...
 @offercategory int pass in as a parameter
 ...
 declare @startrow int
 declare @numrows int
 with cte (
offerid,
     offerdescription,
     offercategory,
     offervalue,
     retailerid,
     retailername,
     retailerpopularity,
     row)
 as
 select
     offer.offerid,
     offer.offerdescription,
     offer.offervalue,
     retailer.retailerid,
     retailername,
     retailerpopularity,
     row_number() over (order by  retailer.retailerpopularity desc, offervalue asc) as row
 from
     offer join retailer on offer.retailerid = retailer.retailerid
 where
     (@offercategory is null or offercategory = @offercategory)
     and retailer.retailerpopularity= 0

select * from cte
     where row between @startrow and @startrow+@numrows-1
 ...

the one difference was the extra where clause ‘and retailer.retailerpopularity= 0‘, which the customer thought would only possibly add to the processing time as this was a filter.  The live query had no filter, so should take less time?!!?

The majority of the retailers had a retailerpopularity value of ‘0’, so the number of rows in each of the different queries would be about the same. To reassure himself, he ran the 2 different underlying queries, substituting null for the parameter @offercategory….

On the live server:

 select
offer.offerid,
     offer.offerdescription,
     offer.offervalue,
     retailer.retailerid,
     retailername,
     retailerpopularity,
     row_number() over (order by  retailer.retailerpopularity desc, offervalue asc)as row
 from
     offer join retailer on offer.retailerid = retailer.retailerid
 where
     (null is null or offercategory = null)

returned 1,000,000 rows, (as expected), with a duration of 9 secs

On the QA server the query was:

 select
     offer.offerid,
     offer.offerdescription,
     offer.offervalue,
     retailer.retailerid,
     retailername,
     retailerpopularity,
     row_number() over (order by  retailer.retailerpopularity desc, offervalue asc) as row
 from
     offer join retailer on offer.retailerid = retailer.retailerid
 where
     (null is null or offercategory = null)
     and retailer.retailerpopularity= 0

returned 999,980 rows (see I told you the majority had a ‘0’ popularity), with a duration of 9 secs and now he was confused….

Both queries ran in the same time, and returned (essentially) the same number of rows – so why did the original QA proc run in less than 1 second?


Here is the deception!

I must admit, most of the developers I have spoken to about CTEs (and when I say developer I mean web developer, C#, .Net – not database developers), view them as a sort of temporary structure, a bit like declaring a temp table, or a table variable, then populating them and selecting from them. So in that paradigm, the code reads like that, i.e. create a temp structure, use it.  In this case it is assumed that the declaration of the CTE is the same as invoking it, and explains why so many times I have been asked ‘why can you only reference the CTE once? Why can’t I select from it multiple times?’. Typical procedural mindset again!

The reality is that the CTE is only invoked when it is referenced, so in our example, when we execute the code:

select * from cte
 where row between @startrow and @startrow+@numrows-1

And when you realise that, it becomes clear why you can’t reference it again. Selecting from the CTE 1 second later, 5 seconds later, 3 hours later, MAY result in a different result set, and that would cause problems for consistency and integrity!  If it was a true temporary structure(#table or @table variable) and you were still in the procedure call 5 hours later (apart from that being one hell of a long running query!!), then you could keep selecting over and over again, safe in the knowledge that the data would remain consistent.

But anyway back to the story… how does this explain the query times? Well, in the live procedure, the CTE (when invoked) does need to fully build all of the data, order it by retailerpopularity descending and offervalue ascending to determine the rownumbers, which it can only do once the tables have been joined and all matching rows found.  So in a sense it does behave in the way the customer thought (build once, query later), hence the high duration.

But in the QA proc, the optimizer has been clever!  Because of the added where clause ‘and retailer.retailerpopularity= 0‘, the CTE can effectively ignore this in the order by clause as all rows will have the same value, so the data needs only be ordered by offervalue, which has, coincidentally, an index (idx_offer_1), so the data can be retrieved using the index!  Which means the query can use this in its plan, and essentially sort the data as it ‘builds’ the CTE, so if you want 10 rows from position 450, it can use the index to jump straight to these.


Here is the power!

Because the CTE is invoked when it is referenced, it can utilise indexes at that point.

From BOL:

A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query.

Perhaps a very good candidate for paging? But as this example shows, only if indexes from one table are used.

In reality, they act pretty much like a derived table (or in-line view), but have the further ability to be self referencing, and hence are good for handling recursive and/or hierarchical problems, it’s just (I guess) that it can be read wrong by the ‘create once – read many’ procedural types!

Leave a Reply

Your email address will not be published. Required fields are marked *