Trusting Constraints - Riley Waterhouse Limited

In my last post, I showed how a check constraint can be used by the optimizer to completely rule out any data access, however there are some rules to abide by.

Lets recap

We created a check constraint on Person.Person table in AdventureWorks2014 so that when we queried for any rows with a ModifiedDate earlier than 1 January 2006, the Query Optimizer knew that no rows could be returned, and gave us an empty result set much quicker.

A problem can arise though when we temporarily disable a check constraint. Why would we want to do that, how do we do that, and what is the problem it causes? Read on!

So lets check that we have the constraint on the table and it works as expected

if object_id('Person.ModifiedDateConstraint','C') is not null  
    alter table Person.Person drop constraint ModifiedDateConstraint;
go
alter table Person.Person with check
add constraint ModifiedDateConstraint check (ModifiedDate > '1 jan 2006');


--turn actual execution plan on
select
   BusinessEntityID ,
   Title ,
   FirstName ,
   LastName
from Person.Person
where ModifiedDate < '1 jan 2006'

so all good – the optimizer is just checking against the constant.

Now imagine we need to load this table with a lot of data, quickly. One step might be to disable any check constraints so that the data loads faster.

--turn off constraint
alter table Person.Person nocheck constraint ModifiedDateConstraint

then we load some data

insert into Person.Person
        ( BusinessEntityID ,
          PersonType ,
          NameStyle ,
          Title ,
          FirstName ,
          MiddleName ,
          LastName ,
          Suffix ,
          EmailPromotion ,
          AdditionalContactInfo ,
          Demographics ,
          rowguid ,
          ModifiedDate
        )
values  (1,'EM',0,'Mr','Kevan','','Riley',0,null,null,newid(), getdate())

and re-enable the constraint

--turn constraint back on
alter table Person.Person check constraint ModifiedDateConstraint

Now in this simple example, we can see that I used getdate() to populate the ModifiedDate field, and that is still in line with the rule we set in the check constraint. So what happens when we run the original query

--rerun query
set statistics io on;
select
   BusinessEntityID ,
   Title ,
   FirstName ,
   LastName
from Person.Person
where ModifiedDate < '1 jan 2006'

BusinessEntityID Title FirstName LastName
------------------------------------------------
(0 row(s) affected) 

Table 'Person'. Scan count 1, logical reads 3819, physical reads 0, 
     read-ahead reads 0, lob logical reads 0, lob physical reads 0, 
     lob read-ahead reads 0.

The optimizer has reverted back to the clustered index scan, and we have table access as is shown in the set statistics io output. What?! Why?

Whilst the constraint was disabled, any data could have been put into the table, including a ModifiedDate that was earlier than 1 Jan 2006. When we re-enabled the constraint, all we are doing is putting the rule back on so that any further updates to the table will have to adhere to the constraint. SQL Server did not go back over all of the data that was already in the table to make sure that I hadn’t broken any rules. So essentially, SQL no longer trusts that the constraint has been applied for every row, and when the optimizer is asked to generate an execution plan for that query – it HAS to read every row – just like we had before we applied the constraint in the first place.

What can we do?

We need to validate all the data when we reapply the constraint, and that is done by the adding of with check into the SQL command

--turn constraint back on, with check
alter table Person.Person with check check constraint ModifiedDateConstraint

--rerun query
set statistics io on;
select
BusinessEntityID ,
Title ,
FirstName ,
LastName
from Person.Person
where ModifiedDate &amp;lt; '1 jan 2006'

Back to good times!

Luckily for us, SQL tells us when it no longer trusts a constraint, with the wonderfully-named is_not_trusted column on sys.check_constraints

select [name], is_not_trusted from sys.check_constraints where name = 'ModifiedDateConstraint' 

name                         is_not_trusted 
---------------------------- -------------- 
ModifiedDateConstraint       0 

(1 row(s) affected)

0 = false => not (is not trusted) = is trusted!

And this gives us a way of checking all the check constraints for the database

select
*
from sys.check_constraints
where is_not_trusted = 1

So try it out for yourself, run the queries from above and check is_not_trusted after you disable the constraint. You’ll find it immediately gets set to 1, without even running the additional query to add any data.

--turn off constraint
alter table Person.Person nocheck constraint ModifiedDateConstraint


select
[name], is_not_trusted
from sys.check_constraints
where name = 'ModifiedDateConstraint'


name                         is_not_trusted
---------------------------- --------------
ModifiedDateConstraint       1

(1 row(s) affected)

Checking for trusted constraints is one of the checks I run whenever I perform a database server health check, and if they have been turned off inadvertently by a process like above, then you can realise some performance gains simply by turning them back on.

Lets recap

What can we do?

Leave a Reply Cancel reply