Sunday 24 April 2011

Including Current Date In a Flat File Name - SSIS

There was a thread on the MSDN Forums which, paraphrased, asked: "How can I include the current Year and Month in a file name when I create a Flat File in SSIS?"

Although this is a relatively straightforward task, SSIS is so visual that it can be difficult to explain how to do things in forum threads, so I decided to produce this short walk-through video. Hope you enjoy it, and find it useful...
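As a rough sketch of the kind of expression involved, you can put a property expression on the Flat File Connection Manager's ConnectionString property along these lines (the folder and file name prefix are just placeholders):

"C:\\Exports\\SalesExtract_"
    + (DT_WSTR, 4) YEAR(GETDATE())
    + RIGHT("0" + (DT_WSTR, 2) MONTH(GETDATE()), 2)
    + ".txt"

This evaluates to something like C:\Exports\SalesExtract_201104.txt each time the package runs.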





Find a User's Effective Permissions

There was a recent post on the MSDN forums that, paraphrased, asked: how do I find out what permissions a user has on all objects, at all levels, including those granted through their role memberships?

Now, I am assuming that this is a fairly common security auditing task, so I scanned Google to dig out a script that would do it, but to my amazement, I could not find one, so I decided to write one. It is a pretty simple script, using the fn_my_permissions function and a couple of well-known system catalogue views.

In order to run it, you will need to be able to impersonate the login that you are running it for, and of course, you will also need the appropriate permissions on the system catalogue views.
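If you are not sure whether you can impersonate the login in question, a quick check along these lines will tell you (the login name is a placeholder, just as it is in the script below):

SELECT HAS_PERMS_BY_NAME('INSERT LOGIN NAME', 'LOGIN', 'IMPERSONATE') AS can_impersonate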

Feel free to modify the script, but please give due credit. This will work with 2005 and 2008, but I actually wrote it in Denali, so it is definitely forward-compatible. Enjoy...

EXECUTE AS LOGIN = 'INSERT LOGIN NAME'

SELECT t.name AS securable_name, a.subentity_name, a.permission_name
FROM sys.objects t
CROSS APPLY sys.fn_my_permissions(QUOTENAME(SCHEMA_NAME(t.schema_id)) + '.' + QUOTENAME(t.name), 'OBJECT') a
UNION ALL
SELECT d.name, a.subentity_name, a.permission_name
FROM sys.databases d
CROSS APPLY sys.fn_my_permissions(QUOTENAME(d.name), 'DATABASE') a
UNION ALL
SELECT @@SERVERNAME, a.subentity_name, a.permission_name
FROM sys.fn_my_permissions(NULL, 'SERVER') a
ORDER BY securable_name

REVERT



Saturday 23 April 2011

Head In The Clouds

In Paul Randal's recent SQLskills Insider e-mail, he talked briefly about SQL Azure, and how he had not seen his clients express a large appetite for it "yet". That inspired me to write this post (which I have been meaning to write for a couple of weeks), to share my thoughts on Azure.

So firstly, I just want to state my position on Cloud Computing in general, which for the most part is very positive. The opportunity to access your applications from any part of the world, with consistent performance and a massively reduced Total Cost of Ownership, has to be exciting, and there are many applications, such as Office and even Exchange, that I feel are a perfect fit for the Cloud.

SQL Server, on the other hand, is a different matter. Does it really fit in the Cloud, and if it does, what place does it hold? Will it ever be able to host Enterprise-class systems, or will it fall into the same category as Virtualization, which is great for the most part, but does not fit well for SQL Server, especially for Enterprise solutions?

Well, because of these concerns, and because of the current immaturity of Azure, until recently I have shied away from playing around with the platform and kept my focus on traditional SQL Server, until...

I recently spent several months working on a project to plan the up-scaling of an established 2TB local system to a global 32TB Enterprise-class system. The obvious choice for this system was a FastTrack Data Warehouse appliance, and I had a fantastic opportunity to do some POC work at Microsoft HQ (which I will blog about soon). The POC was successful, and I thought it was a done deal, but then an instruction came from a very senior level that we should be investigating putting the application in the Cloud.

Now, my gut reaction here was wow! An opportunity to build an Enterprise-class database in the Cloud. But within seconds (and luckily before speaking), my professional head suppressed my geek head and I articulated the answer that it would not be practical because of the 50GB database size limit in Azure. This led to the suggestion that we could shard the data across multiple instances, which is true, but seemed like trying to fit a square peg into a round hole. I knew at that point that the idea was unlikely to go further, and that we would probably stay with the FastTrack solution, but I felt it prudent to subscribe to Azure and have a play with it.

So when playing with Azure, the thing that surprised me was how limited the product currently is. A lot of the limitations surround administration and make perfect sense; for example, when your database is in the Cloud, why would you need to worry about High Availability? There is already a four-nines SLA with Microsoft, just by signing up. Some of the other limitations, however, are just a bit strange. For example, you cannot create heap tables, only tables with clustered indexes. Don't get me wrong, I'm all in favour of having a clustered index on all tables, but it just felt like a strange limitation. I also found that quite a few useful tools, such as Service Broker, are not supported, and even my favourite %%physloc%% query would not compile!

According to a contact of mine, there are currently only about 20 people in the Azure product team, and as I already mentioned, the offering is still fairly immature, so I fully expect that a lot of functionality will appear over the next few months and years. From a technical perspective, then, I don't feel that we are quite ready to put many systems in the Cloud "yet", but we may be soon. I also suspect that the size restrictions will be relaxed as bandwidth improves.

My main question mark around Azure (thinking medium-term) is more political. If you are storing client data, especially data from a sensitive sector, such as public services, I cannot imagine those clients, when asking the regular question "Where exactly are you storing my data?", liking the answer "Oh, it's in the Cloud..."

Now, I know that this stance is silly, because I would be willing to bet large amounts of money that data is much more secure in Microsoft's Cloud than it is in the vast majority of data centres. What I'm not sure about is how long it will take, or how even to bring about the cultural change that would be required, especially as the people who will need to be convinced are likely to be non-technical.

My other question mark is a rather self-centred one, and revolves around (if I'm honest with myself) humans not liking change, because from a corporate point of view, it is actually a positive...

According to my contact, Microsoft's medium-term SQL Server policy will be to release features first to the Cloud, because they can easily be rolled back if problematic (a concern in itself, because the Cloud will always be the most up-to-date, but possibly also the most unstable, platform); then to release them to appliances (FastTrack is already on v3, PDW is out, and OLTP and BI appliances are on the way); and then finally to release them to the traditional market.

Now, this leads me to assume that, within the medium term, large SQL Server projects will either be hosted in the Cloud or on appliances, and both of these environments are largely pre-configured and pre-optimized. So where does that leave skilled SQL Server professionals, who make a living out of performing those complex optimizations that will no longer be required?

Again, this is probably a silly stance, because there are bound to be a whole different set of issues that will require highly skilled professionals. For example, just because Windows 7 is a damn sight easier to work with and configure than DOS 6 does not mean that there is no market for Desktop Support Technicians or Network Engineers, but it does mean that more people are able to support it, which means that the market rates are dramatically lower than they were 15 years ago.

Maybe I'm an eternal pessimist, but it is certainly food for thought. If you have any thoughts on the subject, please feel free to leave comments...



Wednesday 13 April 2011

SQL Server Southampton User Group 13/04/2011

I was speaking at the Southampton SQL Server User Group tonight, and as promised, here are links to the code samples that I used, plus some more in-depth articles on the subjects I spoke about. Thanks to everybody for a good night...!

Shrinking Databases

Why not to shrink a database here
The world's worst maintenance plan? here

Using %%physloc%%

How to find what page your data is stored on here
Logical Order NOT Physical Order here
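As a quick reminder of the pattern those articles use: the %%physloc%% virtual column can be decoded with sys.fn_PhysLocFormatter to show the file:page:slot that each row lives on. A minimal sketch (dbo.MyTable is just a placeholder table name):

SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS row_location, *
FROM dbo.MyTable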

Allocation Units

What Are Allocation Units here
Querying Allocation Units here



Wednesday 6 April 2011

The Curious Case Of The Phantom Last Log Backup!

I was extremely "busy" at work the other day. So much so, I was sat reading Paul Randal's blog when I came across an interesting article on how to tell if your database is really in Full Recovery; see here.

The general premise is that when you switch from the Simple to the Full recovery model, the behaviour does not change until the log backup chain has been established by taking a backup. This is called pseudo-simple mode, and it can be recognised by a database that is in the Full recovery model showing a NULL value for last_log_backup_lsn in sys.database_recovery_status.
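A database sitting in pseudo-simple mode can therefore be spotted by looking for that NULL alongside a FULL recovery model. A minimal sketch of such a check, using sys.databases alongside the sys.database_recovery_status view mentioned above:

SELECT d.name, d.recovery_model_desc, drs.last_log_backup_lsn
FROM sys.databases d
INNER JOIN sys.database_recovery_status drs
    ON d.database_id = drs.database_id
WHERE d.recovery_model_desc = 'FULL'
AND drs.last_log_backup_lsn IS NULL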

I was a little bored, so I thought that I would have a look and make sure that our OLTP databases really were in the Full Recovery model, and they were; but much to my surprise, I found that some of our VLDBs that were in the Simple Recovery model were showing a non-NULL value in the last_log_backup_lsn column of the sys.database_recovery_status catalogue view.

This confused me, and I asked Paul Randal if there was a pseudo-full mode, but there is not, so why was there a last_log_backup_lsn? Intrigued, I kept an eye on the transaction logs of the databases involved by using SELECT * FROM sys.fn_dblog(NULL, NULL), which shows all log records in the active VLFs.
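Counting the rows that sys.fn_dblog returns gives a quick measure of how much active log there is; a sketch of the check (not the exact query I ran):

SELECT COUNT(*) AS active_log_records
FROM sys.fn_dblog(NULL, NULL)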

Day 1 - 7436 entries, day 2 - 762 entries.

This was good news! It meant that the log had been truncated, so the databases were behaving as if they were in the Simple recovery model. This was a relief, as my initial fear was that they were behaving as though they were in Full, and of course, we were not taking transaction log backups, as we thought they were in Simple!

So the next test: was this record of a last LSN only in the sys.database_recovery_status catalogue view, or was it embedded more internally? To check this, I ran DBCC PAGE on the boot page of the database. The database ID happened to be 8, so the syntax was:

DBCC TRACEON(3604)
DBCC PAGE(8,1,9,3)

Here, I turned on trace flag 3604, so that the DBCC output would go to the messages window, rather than the error log. The parameters for DBCC PAGE represent the database ID, the file number, the page number and the level of detail. The database boot page is always page 9 of file 1; see here for more info on system pages.

The results showed that the boot page also contained the LSN and timestamp of the last log backup, so the mystery continued. Where did it come from? Unfortunately, it was a production system that I was looking at, so I was not able to play around with it too much, but after a few hours of trying different things that I knew had occurred on the production system, I managed to reproduce the symptom with the following script:


CREATE DATABASE [lsn_test]
GO
USE [master]
GO
ALTER DATABASE [lsn_test] SET RECOVERY FULL WITH NO_WAIT
GO
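
Throughout the rest of the post, the three columns shown in each result set come from the same sys.databases / sys.database_recovery_status join shown above, filtered to the test database (a sketch):

SELECT d.name, d.recovery_model_desc, drs.last_log_backup_lsn
FROM sys.databases d
INNER JOIN sys.database_recovery_status drs
    ON d.database_id = drs.database_id
WHERE d.name = 'lsn_test'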


The results were as expected...

name        recovery_model_desc    last_log_backup_lsn
lsn_test    FULL                   NULL

So now...

BACKUP DATABASE [lsn_test]
TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Backup\lsn_test.bak'
WITH NOFORMAT, NOINIT,  NAME = N'lsn_test-Full Database Backup',
SKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO


Again, results as expected...

name        recovery_model_desc    last_log_backup_lsn
lsn_test    FULL                   25000000006200064

Next was to switch the database to Simple...

USE [master]
GO
ALTER DATABASE [lsn_test] SET RECOVERY SIMPLE WITH NO_WAIT
GO


Once more, predictable results...

name        recovery_model_desc    last_log_backup_lsn
lsn_test    SIMPLE                 NULL

So now let's back up the database again...

BACKUP DATABASE [lsn_test] TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Backup\lsn_test.bak'
WITH NOFORMAT, NOINIT,
NAME = N'lsn_test-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO


Once more, the correct results...

name        recovery_model_desc    last_log_backup_lsn
lsn_test    SIMPLE                 NULL

So finally, what happens if we restore the database...?

RESTORE DATABASE [lsn_test] FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Backup\lsn_test.bak'
WITH  FILE = 2,  NOUNLOAD,  STATS = 10
GO


Ah, this doesn't look so straightforward...!

name        recovery_model_desc    last_log_backup_lsn
lsn_test    SIMPLE                 25000000011900001

So basically, SQL Server has picked up the last log backup LSN from the restore, but it is meaningless, and essentially a bug. So in the end, it turned out not to be a major issue, but it was certainly interesting tracking it down, and a big thanks to Paul Randal for helping me get my head around it!



Saturday 2 April 2011

How To Query Allocation Units

Following my post explaining Allocation Units (see here), I thought that I would do a quick follow-up post demonstrating how you can interrogate these structures. There are two ways of doing this.

The first (and documented) way is to use the catalogue view called sys.allocation_units. This view displays one row per allocation unit, for every table and index. It can be joined back to sys.tables, via sys.partitions, in order to retrieve the table name, as demonstrated below...

SELECT t.name, au.* FROM sys.allocation_units au
INNER JOIN sys.partitions p
     ON au.container_id = p.partition_id
     INNER JOIN sys.tables t
          ON p.object_id = t.object_id
ORDER BY t.name


...A portion of the output (run against AdventureWorks2008) is detailed below...


...This query is interesting, and shows you how large each of the allocation units is, in terms of pages. What is even more interesting, however, is the undocumented way to interrogate allocation units, using sys.system_internals_allocation_units. Let's re-run the last query, using this catalogue view...

SELECT t.name, au.* FROM sys.system_internals_allocation_units au
INNER JOIN sys.partitions p
     ON au.container_id = p.partition_id
     INNER JOIN sys.tables t
          ON p.object_id = t.object_id
ORDER BY t.name


...Let's have a look at the results of this query...



...The results of this query are even more interesting. Here, alongside the page counts, we can also see the location of the first IAM page in the IAM chain, the index's root page, and the first page of the index. We can decode these locations by using the sys.fn_PhysLocFormatter() function, as below...

SELECT
     t.name,
     sys.fn_PhysLocFormatter(au.first_page) AS 'First Page',
     sys.fn_PhysLocFormatter(au.root_page) AS 'Root Page',
     sys.fn_PhysLocFormatter(au.first_iam_page) AS 'First IAM Page'
FROM sys.system_internals_allocation_units au
INNER JOIN sys.partitions p
     ON au.container_id = p.partition_id
     INNER JOIN sys.tables t
          ON p.object_id = t.object_id
ORDER BY t.name


...The results are shown below...



...Pretty cool, hey? Be warned, however, that the sys.system_internals_allocation_units catalogue view is both undocumented and unsupported! This means that I cannot vouch that the page locations it gives you will always be accurate. It is, however, a useful and interesting starting point.


What Are Allocation Units?

I found out this week that a couple of my colleagues were unfamiliar with Allocation Units, so I thought I would knock up a quick post explaining what they are, and how you can use them to optimize performance. So first off - what are they?

Well, traditionally, SQL Server has had a fixed limit of 8060-Bytes for a row and 8000-Bytes for a column, as a single row cannot span more than one page. In more recent versions of SQL Server, it has become possible to push some of this data off-row, in order to let developers overcome this fixed limitation.

So now, every index, including the clustered index or heap, can have its data split across multiple Allocation Units. There are three types of Allocation Unit. The first is for your standard data, and is called the IN_ROW_DATA Allocation Unit. This is the Allocation Unit that will hold the majority of your data, in most cases. The second Allocation Unit is known as the LOB_DATA Allocation Unit. This is used to store the older-style LOBs (Large Objects), such as Text, NText and Image. The final Allocation Unit, which is the most recent addition, is the ROW_OVERFLOW_DATA Allocation Unit. Also known as the SLOB (Small-LOB) Allocation Unit, this is used to store the newer large value types, such as Varchar, NVarchar, SQLVariant (although this should be avoided!) and the .NET-based data types, such as the geospatial data types.

So, are there three Allocation Units per table? Oh no! There are a lot more than that! Every index can be partitioned, with a maximum of 1,000 partitions, and in SQL Server 2008, you can have up to 1,000 indexes per table, and the Allocation Units propagate down to the partition level, so it is actually possible to have 3 million Allocation Units per table. In SQL Server 2005, the limit is 750,000 Allocation Units, because you are limited to 250 indexes per table.

By default, LOBs (Text, NText and Image) are stored off-row. You can move them in-row by changing the value of the 'text in row' option, using sp_tableoption. Obviously, this will only bring in LOBs that will physically fit on the page (the in-row limit for a single value is 7000-Bytes, and the row as a whole still cannot exceed 8060-Bytes), and you can also specify the maximum size for values to bring in-row. If you do not set this limit, the default is 256-Bytes.
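A sketch of the commands involved (dbo.MyTable is just a placeholder table name):

-- Enable 'text in row' with the default 256-Byte limit
EXEC sp_tableoption 'dbo.MyTable', 'text in row', 'ON'

-- Or specify a custom limit, anywhere between 24 and 7000 Bytes
EXEC sp_tableoption 'dbo.MyTable', 'text in row', '1000'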

By default, SLOBs are stored in-row (providing they will fit in the row, using the same calculations as above), but you can push them off-row by setting the 'large value types out of row' option, using sp_tableoption. Unlike the 'text in row' option, it is not possible to specify a custom size limit: either all values that will fit are stored in-row, or all values are stored off-row. The size limit is 8000-Bytes, and this cannot be changed.
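And the equivalent sketch, using the same placeholder table name (this option applies to Varchar(MAX), NVarchar(MAX), VarBinary(MAX) and XML columns):

-- Push large value types off-row
EXEC sp_tableoption 'dbo.MyTable', 'large value types out of row', 'ON'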

So the next question is: when should you push data off-row? Well, before we tackle that question, we need to think about how much overhead is caused by pushing data off-row, and this calculation is fairly simple. If LOBs are stored off-row, a 16-Byte pointer is stored in the row. If SLOBs are pushed off-row, then a 24-Byte pointer is stored in-row. This can get slightly confusing, however, because if you specify the (MAX) qualifier for a Varchar or NVarchar, then SQL Server treats it as a LOB, rather than a SLOB, and only requires a 16-Byte pointer.

So the first, and most obvious, considerations (from an SLOB perspective) are: 1) Are the majority of SLOBs less than 24-Bytes? If so, there would be no benefit in pushing them off-row. In fact, it would be detrimental. 2) Are the SLOBs more than 8000-Bytes? If so, you have no choice but to store them off-row. 3) Do your SLOBs add up to more than 8060-Bytes? If so, again, you have no choice; they must be off-row.

But what about the cases in the middle? What about if your SLOBs are all between 24 and 8000-Bytes, and together, they all fit within the 8060-Byte limit? How should you store them? Well, it depends on how you are using your data. If these SLOBs are going to be queried frequently, then you should store them in-row. If you do not, then you are increasing IO, and also forcing that IO to be random, as opposed to sequential, and random IO is not your friend!

If, on the other hand, the SLOBs are essentially just a data store, and you are only querying them infrequently, but the other columns in the table are being queried, then you should store the SLOBs off-row. This is because you will be able to fit more rows on a page, and this gives you two big advantages. Firstly, queries that need to read this table will have to read fewer pages (because there will be fewer pages to read). Secondly, you will simply use less memory: if you read fewer pages, then fewer pages will be read into the buffer cache, and memory pressure will be reduced.