Friday 7 January 2011

Working With SPARSE Columns

From SQL Server 2008, we have the concept of SPARSE columns. SPARSE is a column property that you can set on any nullable column within a table, either when you CREATE or ALTER the table. If you do this, then a NULL value that is stored in a SPARSE column, will take up no space.

This has both advantages and disadvantages. The biggest advantage, is that it allows you to store more columns in less space. This has the side effect of increasing the maximum number of columns that can be stored in a table. This number increases from 1024 to 30,000, of which 1024 can be non-SPARSE. However, you are still limited to 8060-bytes per page, so not all columns for a single row could store data, and in fact, due to extra vector overhead, the page byte limit is effectively reduced to 8019 bytes. You can still store SPARSE columns that hold variable length data off-row, but remember that this occurs a 24-byte overhead in a page.

The main disadvantage, is that tables that contain SPARSE columns cannot be compressed, but you also get extra overhead for columns that actually hold a non-NULL value. The official recommendation is that you only use the SPARSE column property, for columns that have 90% NULLs or higher. This is not enforced in any way however, so you can actually use it on any nullable column.

So lets have a look at how to create a table with SPARSE columns…

CREATE TABLE dbo.SparseColumns
       (
       ID int NOT NULL IDENTITY,
       NonSparse1 int NULL,
       Sparse1 int SPARSE  NULL,
       Sparse2 int SPARSE  NULL,
       SparseColumns xml COLUMN_SET FOR ALL_SPARSE_COLUMNS  NULL
       )

…You may notice that I have created an XML column with the COLUMN_SET property configured. This creates a column (similar to a computed column) that will allow me to UPDATE or SELECT SPARSE columns. This makes life easier, but does change the behaviour of a SELECT * statement, which will return all of the SPARSE columns in the table as an XML document.

So, we can still manipulate the table in the traditional way, for example…

INSERT INTO SparseColumns (NonSparse1,Sparse1,Sparse2)
VALUES (1,1,1),
(2,NULL,2),
(3,3,NULL)

…But we can also manipulate the SPARSE columns indirectly, through the COLUMN_SET, as in the example below…

INSERT INTO SparseColumns(SparseColumns)
VALUES('<Sparse1>4</Sparse1><Sparse2>4</Sparse2>')

…Now, check out the results if we run a SELECT * against the table…


SELECT * FROM SparseColumns

ID            NonSparse1          SparseColumns
1              1                               <Sparse1>1</Sparse1><Sparse2>1</Sparse2>
2              2                              <Sparse2>2</Sparse2>
3              3                              <Sparse1>3</Sparse1>
4              NULL                       <Sparse1>4</Sparse1><Sparse2>4</Sparse2>

…However, this does not stop us selecting the individual columns in the normal way…

SELECT nonSparse1, Sparse1, Sparse2
FROM SparseColumns

nonSparse1        Sparse1                Sparse2
1                            1                             1
2                            NULL                      2
3                            3                             NULL
NULL                     4                              4

No comments:

Post a Comment