Efficient Paging in SQL Server via LINQ

by kevin 7/6/2008 12:00:00 PM

UPDATE: I've included a videocast with this blog post. Let me know what you think. 

A few days ago, my buddy Justin Etheredge wrote a blog post about Efficient Paging in SQL Server. I was thinking about how transparent Language Integrated Query (LINQ) makes paging and I thought I'd blog about it. Two of the more interesting extension methods offered by LINQ are Skip() and Take(). You can use these extension methods to skip rows at the beginning of the query result and take only those you want to return. Sounds like paging to me. I wonder if Skip() and Take() used in combination with LINQ to SQL behave as efficiently as Justin's example? Let's take a look. Consider the following LINQ query:

var db = new AdventureWorksDataContext();
var query = from p in db.SalesOrderHeaders
  where p.SalesTerritory.Name.Equals( "Northeast" )
  select new {
    p.Contact.FirstName,
    p.Contact.LastName,
    TotalSales = p.SalesOrderDetails.Sum(
      o => o.OrderQty * o.UnitPrice )
};

This small example uses the AdventureWorks SalesOrderHeaders as the input sequence and shapes the output sequence to include the associated Contact's name parts and the total value of each order. The total value is computed as the OrderQty times the UnitPrice for each associated item in the SalesOrderDetails table. There is a filter placed on the query to restrict the results to orders placed in the 'Northeast' territory. This simple query shows how easy it is to filter, perform arithmetic and use the relationships in a LINQ to SQL data context to traverse table relationships. What does this query look like when it's compiled for execution on SQL Server?

SELECT
  [t2].[FirstName],
  [t2].[LastName],
  (
    SELECT SUM([t4].[value])
    FROM
    (
        SELECT
          (CONVERT(Decimal(29,4),[t3].[OrderQty])) * [t3].[UnitPrice] AS [value],
          [t3].[SalesOrderID]
        FROM [Sales].[SalesOrderDetail] AS [t3]
    ) AS [t4]
    WHERE [t4].[SalesOrderID] = [t0].[SalesOrderID]
  ) AS [TotalSales]
FROM [Sales].[SalesOrderHeader] AS [t0]
LEFT OUTER JOIN [Sales].[SalesTerritory] AS [t1]
  ON [t1].[TerritoryID] = [t0].[TerritoryID]
INNER JOIN [Person].[Contact] AS [t2]
  ON [t2].[ContactID] = [t0].[ContactID]
WHERE [t1].[Name] = @p0

You can see the territory filter applied as a WHERE clause. Note that even when a string literal is used in the C# code, LINQ to SQL still passes filtering variables as parameters. In this case, the territory name 'Northeast' is passed as a variable named @p0. This is always a good practice because it helps to thwart the injection of potentially malicious T-SQL into your query. We can see another interesting feature of LINQ to SQL in the T-SQL that is created called projection. Because the C# code shown above shapes the output sequence to only a few required columns, the LINQ to SQL engine is smart enough to T-SQL shape the query to return only what's needed. Projection often improves query performance and always improves transportation speed on the wire.

Finally, notice that the third column projected into the output sequence, i.e. the sum of each order's value, is instatiated as a two-part, nested sub-SELECT operation in the T-SQL statement. The inner SELECT does the math on the order quantity and price. The containing SELECT aggregates the line item totals and them filters them to the rows selected by the outer query. Nicely done, LINQ! Now, we see that this is a long list, returning thousands of rows. If this query is meant for human consumption, we should break it into smaller chunks to make it easier to handle. How do we do that in LINQ? Add this to the C# code shown before.

var _pageNum = 3;
var _pageSize = 20;
query = query.Skip((_pageNum - 1) * _pageSize).Take(_pageSize);

This modification uses the Skip() and Take() extension methods to skip 40 rows and take the next 20 rows. In other words, at 20 results per page, this query now returns the 3rd page. Though the magic of deferred execution, we can add the Skip() and Take() extentions at any time before we begin iterating over the result set. This comes in handy when you want to enable paging for human consumption but to disable it for B2B or ETL scenarios. Is the paged T-SQL query shown here efficient though? You tell me. Here the T-SQL that is produced:

SELECT
  [t6].[FirstName],
  [t6].[LastName],
  [t6].[value] AS [TotalSales]
FROM
(
  SELECT
    ROW_NUMBER() OVER
    (
      ORDER BY
        [t5].[FirstName],
        [t5].[LastName],
        [t5].[value]
    ) AS [ROW_NUMBER],
    [t5].[FirstName],
    [t5].[LastName],
    [t5].[value]
  FROM
  (
    SELECT
      [t2].[FirstName],
      [t2].[LastName],
      (
        SELECT
          SUM([t4].[value])
        FROM
        (
          SELECT
            (CONVERT(Decimal(29,4),[t3].[OrderQty])) * [t3].[UnitPrice] AS [value],
            [t3].[SalesOrderID]
          FROM [Sales].[SalesOrderDetail] AS [t3]
        ) AS [t4]
        WHERE [t4].[SalesOrderID] = [t0].[SalesOrderID]
      ) AS [value], [t1].[Name]
      FROM [Sales].[SalesOrderHeader] AS [t0]
      LEFT OUTER JOIN [Sales].[SalesTerritory] AS [t1]
        ON [t1].[TerritoryID] = [t0].[TerritoryID]
      INNER JOIN [Person].[Contact] AS [t2]
        ON [t2].[ContactID] = [t0].[ContactID]
    ) AS [t5]
    WHERE [t5].[Name] = @p0
  ) AS [t6]
WHERE [t6].[ROW_NUMBER] BETWEEN @p1 + 1 AND @p1 + @p2
ORDER BY [t6].[ROW_NUMBER]

If you read the query from the inside out, you'll see that on the inside it's essentially the same query that we saw before we added the paging feature. It has all the original SELECTs named t1 through t4 and is wrapped as a new result called t5. The SQL Server ROW_NUMBER() function is used to inject a row number into t5 ordered by all 3 projected columns. That looks a lot like the query Justin showed us in his blog post. Very efficient! The new result containing the row numbers is named t6.

Finally, the t6 result is filtered by a starting row number and ending row number using two new variables @p1 and @p2. For page 3 paged in 20 row chunks as shown above, these variables would have the values 40 and 59, respectively. LINQ to SQL injects these starting and ending row number parameters whenever you use Skip() and Take() together. Well, it almost always does that. If you happen to specify Skip(0), it reverts to the behavior that Take() uses without Skip() which is to use SQL Server's TOP() function instead. LINQ to SQL sure knows how to sweet talk SQL Server, don't you think?

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: , , , ,

C# | LInQ | ORM | Software Development | SQL Server | SQL Server 2008

Powered by BlogEngine.NET 1.3.1.0
Theme by Mads Kristensen


Kevin's on Twitter / FriendFeed

W. Kevin Hazzard Welcome to Kevin Hazzard's Blog. Kevin is a Software Architect, Professor and Microsoft MVP specializing in C#, WCF, Silverlight and IronPython.

View Kevin Hazzard's profile on LinkedIn
Microsoft MVP Award When a problem comes along, you must flip it!

Calendar

<<  July 2008  >>
MoTuWeThFrSaSu
30123456
78910111213
14151617181920
21222324252627
28293031123
45678910

View posts in large calendar

Recent comments

Authors

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2008

Sign in