Closed answer technical tests and when ‘it depends’ isn’t an option

Many months ago when I was working as a SQL Server contractor, I was approached by a recruiter who was looking to fill a contract role, and as part of the initial screening, the client had asked all potential candidates to answer a simple 4-question, true-or-false SQL quiz.

When I read the question, and the instructions for answering, it made me smile. Then grimace. I couldn’t believe that such an approach was being used to screen candidates.

Lets have a look at the question …

The question

You have a SQL Server table Customer with a million rows and no indexes.
The columns are Id and Name
You run SELECT Id,Name FROM CUSTOMER


True or False Questions
1. Another user will be able to UPDATE every row in the table whilst your query is running
2. Another user will be able to SELECT every row from the table whilst your query is running
3. The query execution plan will contain a Bookmark lookup
4. Creating an index on Id would help the query run quicker

I find it very difficult to answer true or false for all of those questions, as I’ll go through in a moment, but I did so assuming what I thought the questioner was asking for. I also added a few caveats.

The recruiter replied with the answers that he had been supplied with, which confirmed what I thought the questioner was implying

1. FALSE
2. TRUE
3. FALSE
4. FALSE

Analyse this

So let’s break these down.

1. Another user will be able to UPDATE every row in the table whilst your query is running – expected answer FALSE
What is being asked for here, is if you understand that an UPDATE query requires an exclusive lock, and that that lock is not compatible with the shared lock that’s being used by the SELECT. So whilst ‘your query is running’, the UPDATE cannot change the data. But that assumes one BIG assumption. That assumes we are running in an isolation level where that behaviour is true. Now we all know that the default isolation level, READ COMMITTED, uses locks exactly like this and yes, I would agree, that answer is correct IF we assume the default isolation level. But nowhere in the question does it say that this is a standard default install, heck it doesn’t state what version!
So I can envisage a scenario where the UPDATE and SELECT would happily run together – READ COMMITTED SNAPSHOT ISOLATION. That’s the default isolation level in Azure SQL.
So whats the answer? “It depends”

2. Another user will be able to SELECT every row from the table whilst your query is running – expected answer TRUE
Again this is a lock question. Do you understand that a SELECT query will take a shared lock, which is compatible with other shared locks? That’s what drives this one. I agree with the answer here.

3. The query execution plan will contain a Bookmark lookup – expected answer FALSE
This is a test to see if you understand, or in this case, can second guess how the optimizer is going to resolve this query. Interestingly the operator that used to be called ‘Bookmark lookup’ was renamed in SQL 2005 SP1 into the 2 types: Key Lookup and RID Lookup – so you could argue that in any version since 2005, the execution plan will never contain a Bookmark lookup, regardless of the query! But presumably this is a technical question and not one of version-based trivia. A lookup (of any description) is only going to happen if the data is accessed via a non-clustered index, and the query requires more columns than are covered in the non-clustered index. Then the execution plan would have a lookup to the base table (key lookup for a clustered index, RID lookup for a heap). We’ve been told there are no indexes on the table. The only possible access is a table scan. So no, there cannot be a bookmark lookup.

4. Creating an index on Id would help the query run quicker – expected answer FALSE
Another execution plan related question. The premise here is that because we are asking for all the columns, and all the rows, then the optimizer is likely to still choose a table scan than any other method. However I can think of a number of scenarios where adding an index *might* make this quicker. (Not even getting into the discussion of what they mean by ‘quicker’)

First scenario, lets assume the table has had a lot of data modifications, resulting in the heap becoming massively fragmented. On my PC this table currently uses 4465 pages to store the data. If after heavy DELETEs, INSERTs and UPDATEs, lets assume the data is so spread out that each row is on one page – making a total of 1,000,000 pages. Now lets create a clustered-index on Customers(ID) – this will reduce the fragmentation and the data will likely be stored on approx 5000 pages – the amount of IO required to read this index will now be significantly less than when the table was a fragmented heap and will certainly be ‘quicker’ than the original query.

Second scenario – same as above but create a covering non-clustered index. Ok the question states “Creating an index on Id”, but I can get round this by creating an index just on the key field ID, and then including the name column. Technically this only creates an index on Id, as the included columns are included at the leaf level. You can’t use the index against a predicate on name. Bending the rules? Maybe, but without clarification it does fit the question. The table remains a heap, but the optimizer is more likely to choose the smaller, covering index instead. Again the reduced IO will make the query quicker.

Third scenario – regardless of the fragmentation of the table, we could create a covering non-clustered index (using same rule bending excuse as 2nd scenario) on a different filegroup, that resides on a different part of the IO subsystem. Lets assume the table is currently on magnetic, 7200 rpm spinning disks and that we create the index on fast SSDs. Now, granted, the optimizer doesn’t care less about the physical implementation of storage and as such ‘costs’ the IO the same, but with a index hint on the query, forcing the engine to choose the SSD based index, we can make the query quicker.

Fourth scenario – same as above but create a clustered index on the SSDs (essentially moving the table to a faster disk system) – in this case no index hint is needed.

Mind reading

What I find strange is that someone can ask 2 questions regarding locks, which are the method by which SQL Server manages concurrency and is inextricably linked to isolation levels, but without mentioning what isolation level should be considered. Anyone who answers the questions correctly (without guessing) has to have considered the isolation level. The other 2 questions concern data access operators in execution plans.

Boiling down the ‘test’ to 4 true/false answers gives you a 1 in 8 chance of getting 100% by purely guessing. It’s got to be much better to ask for an answer and a reasoning. After all, with so many questions about SQL Server, the answer is ‘it depends’.

And no, I never pursued the contract……

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.