Sunday 19 December 2010

Shredding XML with OPENXML & Nodes Part III

In Parts I & II, I looked at the different ways of shredding XML into relational data. The only question that remains now, is when should you use each one?
Well, firstly, there is a scenario in which the nodes method is the only way forward; and that is if you want to shred data from multiple rows within a table. For example, imagine you have created a table using the following code...


CREATE TABLE Products(
ProductID   int IDENTITY(1,1),
ProductDescription      nvarchar(100),
ProductReviews    XML)
ON [PRIMARY]

…If this table was holding the following data…

clip_image001

…How would you go about extracting a relational list of comments? Well you could not use the OPENXML function, because you would need to call it once for each row, so the trick is, to use the Nodes method, with the T-SQL CROSS APPLY operator.
If you ran the following code…


SELECT ProductDescription,
       TempCol.value('text()[1]', 'NVARCHAR(500)') Review
FROM Products
CROSS APPLY ProductReviews.nodes('/Review/Comments') AS
TempTable(TempCol)

…You would see this output…

clip_image002

…Here, we are using the nodes and value methods in the same way as we were in Part II. The only differences are that we are now implementing CROSS APPLY, which is calling the nodes method against every row in the result set, and passing the results into our temporary table structure.
Secondly, that because we are reading data from an element, as opposed to an attribute, we are using the text() function, instead of using @attribute.
But what about performance? Are there any performance considerations to take into account, if we are shredding a single variable, and hence have the choice of either Nodes or OPENXML?
Well, if we run the OPENXML code from Part I again, but this time prefixing it with the following statement…


SET STATISTICS TIME ON
GO

We will see the following statistics in the Messages Tab of the Results Pane…


…Giving us a total execution time of 540ms, with a total CPU time of 20ms.
If we ran the equivalent code using the Nodes method however, (see Part II) we would get the following statistics…   

clip_image004

…Here, you will notice a total execution time of just 113ms, but at the expense of 90ms CPU time. (Over 4 times quicker, but at a cost of more than 4 times the pressure on the processor sub-system.)
If you scale these statistics out to a larger amount of data, and consider realistic pressure on the processor from other processes, the result is that the if you are only returning a small number of columns, from an XML document, nodes will usually perform better. However, the more columns you are returning, the more Value method statements you will be performing, and the more pressure you will put on the processor, so as the number of columns returned increases, you will see OPENXML start to catch up, and eventually overtake the Nodes method on execution time. The more existing pressure there is on the processor, the faster this will happen.


Find my book, Pro SQL Server Administration on Amazon -

America

United Kingdom

No comments:

Post a Comment