Saturday 2 April 2011

What Are Allocation Units?

I found out this week that a couple of my colleagues were unfamiliar with Allocation Units, so I thought I would knock-up a quick post explaining what they were, and how you can use them to optimize performance. So first off - what are they?

Well, traditionally, SQL Server has had a fixed limit of 8060-Bytes for a row and 8000-Bytes for a column, as a single row can not span more than one page. So, in more recent versions of SQL Server, it has been possible to push some of this data off-row, in order to let developers overcome this fixed limitation.

So now, every index, including the clustered index, or heap can have it's data split across multiple allocation units. There are 3 types of allocation unit. The first is for your standard data, and this is called the IN_ROW_DATA Allocation Unit. This is the allocation unit that will hold the majority of your data, in most cases. The second Allocation Unit is know as the LOB_DATA allocation unit. This is used to store the older type LOBs (Large Object Blocks) such as Text, NText and Image. The final Allocation Unit, which is the most recent addition, is the ROW_OVERFLOW_DATA Allocation Unit. Also know as the SLOB (Small-LOB) Allocation Unit, this is used to store newer Large Objects, such as Varchar, NVarchar, SQLVariant (although this should be avoided!) and the .NET based data-types, such as Geospatial data-types.

So, are there three allocation units per table? Oh no! There is a lot more than that! Every index can be partitioned, with a maximum of 1000 partitions, and in SQL Server 2008, you can have up to 1000, indexes, and the Allocation Units propagate down to the partition level, so it is actually possible to have 3 Million Allocation Units. In SQL Server 2005, the limit is 750,000 Allocation Units, because you are limited to 250 indexes per-table.

By default, LOBs (Text, NText and Image) are stored out of row. You can move them in row, by changing the value of 'Text In Row' by using sp_tableoption. Obviously, this will only bring in LOBs that will physically fit on the page. (i.e. They still can not be more than 8000-Bytes each, and 8060-Bytes in total), and you can also specify the maximum size for values to bring in-row. If you do not set this limit, the default is 256-Bytes.

By Default, SLOBs are stored in-row (providing they will fit in the row, using the same calculations as above) but you can push them off-row by setting the 'Large Value Types Out Of Row' option using sp_tableoption. Unlike the 'Text In Row' option, it is not possible to specify a custom size limit. Either all rows that will fit are stored in-row, or all values are stored out of row. The size limit is 8000-Bytes, and this can not be changed.

So the next question, is when should you push data off-row? Well, before we tackle that question, we need to think about how much overhead is caused by pushing data off-row, and this calculation is fairly simple. If LOBs are stored off row, a 16-Byte pointer is stored in the row. If SLOBs are pushed off-row, then a 24-Byte pointer is stored in-row. This can get slightly confusing however, because if you specify the (MAX) operator for a Varchar or NVarchar, then SQL treats it as a LOB, rather than a SLOB, and only requires a 16-Byte pointer.

So the first, and most obvious considerations are (from an SLOB perspective) are 1) Are the majority of SLOBs less than 24-Bytes? If so, there would be no benefit in pushing them off-row. In fact, it would be detrimental. 2) Are the SLOBs more than 8000-Bytes? If so, you have no choice, but to store them off row. 3) Do your SLOBs add-up to more than 8060-Bytes? If so, again, you have no choice, they must be off-row.

But what about the cases in the middle? What about if your SLOBs are all between 24 and 8000-Bytes, and together, they all fit within the 8060-Byte limit? How should you store them? Well, it depends on how you are using your data. If these SLOBs are going to be frequently queried, then you should store them in-row. If you do not, then you are increasing IO, and also forcing that IO to be random, as opposed to sequential, and random IO is not your friend!

If on the other hand, the SLOBs are essentially just a data store, and you are only querying them infrequently, but the other columns in the table are being queried, then you should store the SLOBs off-row. This is because you will be able to fit more rows on a page, and this give you two big advantages. Firstly, queries that need to read this table will have to read less pages (because there will be less pages to read). And the second reason is simply that you will use less memory. If you read less pages, then less pages will be read into the Buffer Cache, and memory pressure will be reduced.

No comments:

Post a Comment