Building a string of values in T-SQL

It is often far too easy to experience some behaviour in SQL Server, and then extrapolate your belief from that. One example that is seen frequently on forums is the simplistic case of concatenating values into a scalar variable, however the logic that is commonly assumed is false.

Take the example

USE tempdb
GO

IF OBJECT_ID('CharsTable','U') IS NOT NULL DROP TABLE CharsTable

CREATE TABLE CharsTable (letter CHAR(1))

INSERT INTO CharsTable SELECT 'a'
INSERT INTO CharsTable SELECT 'b'
INSERT INTO CharsTable SELECT 'c'
INSERT INTO CharsTable SELECT 'd'
INSERT INTO CharsTable SELECT 'e'
INSERT INTO CharsTable SELECT 'f'

DECLARE @string2 AS VARCHAR(1000)
--query 1
SET @string2 = ''
SELECT @string2 = @string2 + letter FROM CharsTable
SELECT @string2

this returns

-------------
abcdef

(1 row(s) affected)

and from that you might believe that concatenation of multiple values is that simple. We defined a scalar variable, and SQL looped over the values returned by the select and ‘built’ the string up.

You can even change the order in which the string gets ‘built’.

--query 2
SET @string2 = ''
SELECT @string2 = @string2 + letter FROM CharsTable ORDER BY letter DESC
SELECT
@string2

returns

------------
fedcba

(1 row(s) affected)

All of which goes towards strengthening the belief that this is a good way of concatenating strings. But unfortunately it is not.

Try this :

--query 3
SET @string2 = ''
SELECT @string2 = @string2 + LTRIM(letter) FROM CharsTable ORDER BY LTRIM(letter)
SELECT @string2

almost identical query but with an additional check to make sure that we are stripping out any leading spaces. This now gives us

--------------
f

(1 row(s) affected)

What has happened? Why is this now just returning a single value? Where has the concatenation gone?

Execution plans to the rescue!

To understand the behaviour we can take a look at the execution plans for each of these queries.

The first query is a scan on the table and a Compute Scalar.

The expression in the Compute Scalar is

CONVERT_IMPLICIT(varchar(1000),[@string2]+[tempdb].[dbo].[CharsTable].[letter],0)

and this is the ‘concatenation’ happening.

The second query, with the ORDER BY has a slightly different execution plan with a Sort operator as you would expect

and the Compute Scalar has the same expression value as previously.

CONVERT_IMPLICIT(varchar(1000),[@string2]+[tempdb].[dbo].[CharsTable].[letter],0)

Now for the third query, where everything went seemingly awry.

We have the same operators as are being used in the second query, but importantly being executed in a different order. The Compute Scalar is now being executed before the Sort. Also the Compute Scalar now has 2 expressions.

ltrim([tempdb].[dbo].[CharsTable].[letter])

CONVERT_IMPLICIT(varchar(1000),[@string2]+ltrim([tempdb].[dbo].[CharsTable].[letter]),0)

The first being the function ltrim() on the values from CharsTable, the second being the string concatenation.

This explains (in this case) why we only get a single value as the result. In fact this is the result we should have been expecting as we know that SQL does not loop over the rows, it processes in a set based fashion. Think of the steps as being:

  1. Initialize the @String variable to ”
  2. select all the rows from CharsTable and apply the ltrim() function to the values
  3. string concatenate the 6 rows to the variable

It is this third step that we need to think about – how do you add a string to a set of rows? You may expect SQL to throw an error, but it does not, it handles it gracefully and simply sets the string to be the last value in the row set, which in the case where we have an order by, is ‘f’.

The behaviour we see in queries 1 and 2 is simply a trick. There is no guarantee that the output will always be the ordered, concatenated string. Given enough rows in the base table and you will likely see all kinds of differing output. By adding the function to the query, we changed the way SQL built the execution plan, and actually forced it down the route that we would expect.

This behaviour is by design, the comment in this Connect item sums this up perfectly:

“.. we do not guarantee that @var = @var + will produce the concatenated value for any statement that affects multiple rows. The right-hand side of the expression can be evaluated either once or multiple times during query execution …”

Doing it the right way

So how do we get the values concatenated in query 3? If order is not needed then the safest way is to use

SET @string2 = ''
SELECT @string2 = COALESCE(@string2, LTRIM(letter)) + letter FROM CharsTable
SELECT @string2

and if you need the order preserved, then use the well known XML concatenation ‘trick’

SET @string2 = ''
SELECT @string2 = (SELECT '' + LTRIM(letter) FROMCharsTable
ORDER BY LTRIM(letter) ASC FOR XML path ( '' ) )
SELECT @string2

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.