got net?

Kevin Hazzard's Brain Spigot

About the author

Welcome to Kevin Hazzard's blog.
E-mail me Send mail

Recent posts

Recent comments

Authors

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2010

Efficient Paging in SQL Server via LINQ

UPDATE: I've included a videocast with this blog post. Let me know what you think. 

A few days ago, my buddy Justin Etheredge wrote a blog post about Efficient Paging in SQL Server. I was thinking about how transparent Language Integrated Query (LINQ) makes paging and I thought I'd blog about it. Two of the more interesting extension methods offered by LINQ are Skip() and Take(). You can use these extension methods to skip rows at the beginning of the query result and take only those you want to return. Sounds like paging to me. I wonder if Skip() and Take() used in combination with LINQ to SQL behave as efficiently as Justin's example? Let's take a look. Consider the following LINQ query:

var db = new AdventureWorksDataContext();
var query = from p in db.SalesOrderHeaders
  where p.SalesTerritory.Name.Equals( "Northeast" )
  select new {
    p.Contact.FirstName,
    p.Contact.LastName,
    TotalSales = p.SalesOrderDetails.Sum(
      o => o.OrderQty * o.UnitPrice )
};

This small example uses the AdventureWorks SalesOrderHeaders as the input sequence and shapes the output sequence to include the associated Contact's name parts and the total value of each order. The total value is computed as the OrderQty times the UnitPrice for each associated item in the SalesOrderDetails table. There is a filter placed on the query to restrict the results to orders placed in the 'Northeast' territory. This simple query shows how easy it is to filter, perform arithmetic and use the relationships in a LINQ to SQL data context to traverse table relationships. What does this query look like when it's compiled for execution on SQL Server?

SELECT
  [t2].[FirstName],
  [t2].[LastName],
  (
    SELECT SUM([t4].[value])
    FROM
    (
        SELECT
          (CONVERT(Decimal(29,4),[t3].[OrderQty])) * [t3].[UnitPrice] AS [value],
          [t3].[SalesOrderID]
        FROM [Sales].[SalesOrderDetail] AS [t3]
    ) AS [t4]
    WHERE [t4].[SalesOrderID] = [t0].[SalesOrderID]
  ) AS [TotalSales]
FROM [Sales].[SalesOrderHeader] AS [t0]
LEFT OUTER JOIN [Sales].[SalesTerritory] AS [t1]
  ON [t1].[TerritoryID] = [t0].[TerritoryID]
INNER JOIN [Person].[Contact] AS [t2]
  ON [t2].[ContactID] = [t0].[ContactID]
WHERE [t1].[Name] = @p0

You can see the territory filter applied as a WHERE clause. Note that even when a string literal is used in the C# code, LINQ to SQL still passes filtering variables as parameters. In this case, the territory name 'Northeast' is passed as a variable named @p0. This is always a good practice because it helps to thwart the injection of potentially malicious T-SQL into your query. We can see another interesting feature of LINQ to SQL in the T-SQL that is created called projection. Because the C# code shown above shapes the output sequence to only a few required columns, the LINQ to SQL engine is smart enough to T-SQL shape the query to return only what's needed. Projection often improves query performance and always improves transportation speed on the wire.

Finally, notice that the third column projected into the output sequence, i.e. the sum of each order's value, is instatiated as a two-part, nested sub-SELECT operation in the T-SQL statement. The inner SELECT does the math on the order quantity and price. The containing SELECT aggregates the line item totals and them filters them to the rows selected by the outer query. Nicely done, LINQ! Now, we see that this is a long list, returning thousands of rows. If this query is meant for human consumption, we should break it into smaller chunks to make it easier to handle. How do we do that in LINQ? Add this to the C# code shown before.

var _pageNum = 3;
var _pageSize = 20;
query = query.Skip((_pageNum - 1) * _pageSize).Take(_pageSize);

This modification uses the Skip() and Take() extension methods to skip 40 rows and take the next 20 rows. In other words, at 20 results per page, this query now returns the 3rd page. Though the magic of deferred execution, we can add the Skip() and Take() extentions at any time before we begin iterating over the result set. This comes in handy when you want to enable paging for human consumption but to disable it for B2B or ETL scenarios. Is the paged T-SQL query shown here efficient though? You tell me. Here the T-SQL that is produced:

SELECT
  [t6].[FirstName],
  [t6].[LastName],
  [t6].[value] AS [TotalSales]
FROM
(
  SELECT
    ROW_NUMBER() OVER
    (
      ORDER BY
        [t5].[FirstName],
        [t5].[LastName],
        [t5].[value]
    ) AS [ROW_NUMBER],
    [t5].[FirstName],
    [t5].[LastName],
    [t5].[value]
  FROM
  (
    SELECT
      [t2].[FirstName],
      [t2].[LastName],
      (
        SELECT
          SUM([t4].[value])
        FROM
        (
          SELECT
            (CONVERT(Decimal(29,4),[t3].[OrderQty])) * [t3].[UnitPrice] AS [value],
            [t3].[SalesOrderID]
          FROM [Sales].[SalesOrderDetail] AS [t3]
        ) AS [t4]
        WHERE [t4].[SalesOrderID] = [t0].[SalesOrderID]
      ) AS [value], [t1].[Name]
      FROM [Sales].[SalesOrderHeader] AS [t0]
      LEFT OUTER JOIN [Sales].[SalesTerritory] AS [t1]
        ON [t1].[TerritoryID] = [t0].[TerritoryID]
      INNER JOIN [Person].[Contact] AS [t2]
        ON [t2].[ContactID] = [t0].[ContactID]
    ) AS [t5]
    WHERE [t5].[Name] = @p0
  ) AS [t6]
WHERE [t6].[ROW_NUMBER] BETWEEN @p1 + 1 AND @p1 + @p2
ORDER BY [t6].[ROW_NUMBER]

If you read the query from the inside out, you'll see that on the inside it's essentially the same query that we saw before we added the paging feature. It has all the original SELECTs named t1 through t4 and is wrapped as a new result called t5. The SQL Server ROW_NUMBER() function is used to inject a row number into t5 ordered by all 3 projected columns. That looks a lot like the query Justin showed us in his blog post. Very efficient! The new result containing the row numbers is named t6.

Finally, the t6 result is filtered by a starting row number and ending row number using two new variables @p1 and @p2. For page 3 paged in 20 row chunks as shown above, these variables would have the values 40 and 59, respectively. LINQ to SQL injects these starting and ending row number parameters whenever you use Skip() and Take() together. Well, it almost always does that. If you happen to specify Skip(0), it reverts to the behavior that Take() uses without Skip() which is to use SQL Server's TOP() function instead. LINQ to SQL sure knows how to sweet talk SQL Server, don't you think?


Categories: C# | LINQ | ORM | Software Dev | SQL
Posted by kevin on Sunday, July 06, 2008 12:00 PM
Permalink | Comments (12) | Post RSSRSS comment feed

Comments

Muhammad Mosa Egypt

Monday, July 07, 2008 8:57 AM

Muhammad Mosa

Hey Kevin, great post, it is nice to find another way of paging, few weeks I posted about the same subject
mosesofegypt.net/post.aspx
I think that your way is much efficient!

W. Kevin Hazzard United States

Monday, July 07, 2008 3:59 PM

W. Kevin Hazzard

@Moses, Thanks. I checked out your post, too. Good work. I'll have to add your blog to my regulars.

Kevin

Mahammad Mosa Egypt

Tuesday, July 08, 2008 11:17 AM

Mahammad Mosa

Hey Kevin, I just posted another sample applying your method! Just I wrapped your code in an extension method and modified it to support sorting.
Check it out mosesofegypt.net/post.aspx
and here is the demo mosesofegypt.net/.../default.aspx

Great blog, Cheers

W. Kevin Hazzard United States

Tuesday, July 08, 2008 11:29 PM

W. Kevin Hazzard

Hey Moses, I just checked out the post on your blog. It's been a busy day. Very nice encapsulation of the paging and sorting requirements. I like the way you passed the orderby selector in. Very clean.

Kevin

logicalmind United States

Monday, July 21, 2008 2:22 PM

logicalmind

This is interesting, but there is a problem. The only way to get predictable results is by first ordering them. Such as:
var query = from p in db.SalesOrderHeaders
  where p.SalesTerritory.Name.Equals( "Northeast" )
  orderby p.Whatever ascending
  select p;

Now my next question is, how do you dynamically insert the orderby property and ascending/descending into the comprehension based on some type of input? FWIW, if you try to build the query manually the ordering (and paging) is reverted to linq to objects and is not done in sql...

logicalmind United States

Monday, July 21, 2008 2:24 PM

logicalmind

Quick followup, dynamic orderby is covered in this blog post:
blogs.msdn.com/.../...ry-with-dynamic-orderby.aspx

Kevin Hazzard, MVP United States

Monday, July 21, 2008 11:36 PM

Kevin Hazzard, MVP

Indeed @logicalmind, you must order the query to get predictable results. However, I'm not sure about your concern. If I add orderby p.SalesPersonId to the query for example, the ROW_NUMBER generation goes from this:

SELECT ROW_NUMBER() OVER (ORDER BY [t5].[FirstName], [t5].[LastName], [t5].[value]) AS [ROW_NUMBER]

to this:

SELECT ROW_NUMBER() OVER (ORDER BY [t5].[SalesPersonID]) AS [ROW_NUMBER]

In the case where I don't specify an orderby clause, all of the projected columns are assumed to be the ordering sequence. When I do specify an orderby clause, the ordering sequence is included in the internal query (t5 in this case) and then used to order the ROW_NUMBER generation as shown here.

And in the purely dynamic case, adding the orderby clause late, such as this:

query = query.OrderBy(x => x.LastName).Skip((_pageNum - 1) * _pageSize).Take(_pageSize);

I still get good T-SQL code. The ROW_NUMBER is now generated over the ordered LastName values instead like this:

SELECT ROW_NUMBER() OVER (ORDER BY [t5].[LastName]) AS [ROW_NUMBER]

So, I'm not sure I understand what you mean when you say "FWIW, if you try to build the query manually the ordering (and paging) is reverted to linq to objects and is not done in sql." If you clarify that statement, I'd be happy to address it.

The link you provided is a good one and speaks to the issue of building more complex dynamic queries, especially those that have multiple where clauses that need to be ORed instead of ANDed. This is an obvious shortcoming (and a reasonable compromise) in the LINQ query comprehension syntax.

logicalmind United States

Tuesday, July 22, 2008 10:30 AM

logicalmind

Let me clarify what I meant. Let's say this is an n-tier app and I want to wrap my db/linq calls in an API. Something like:

public IList<SalesOrderHeaders> GetSalesOrderHeaders()
{
  using (var db = new AdventureWorksDataContext())
  {
    var query = from p in db.SalesOrderHeaders select p;
    return query.ToList();
  }
}

Simple enough, but now what if we allow the user of the api to request a specific sort of the data.

public IList<SalesOrderHeaders> GetSalesOrderHeaders(SortBy sort);

There is no way that I know that you can continue to use the comprehension syntax so you must revert to building the query. Something like this:

public IList<SalesOrderHeaders> GetSalesOrderHeaders(SortBy sort);
{
  using (var db = new AdventureWorksDataContext())
  {
    var query = db.SalesOrderHeaders.OrderBy(sort.???);
    return query.ToList();
  }
}

So my question/comment was, how do you define SortBy such that whatever you stick into the OrderBy does not revert back to linq to objects. Keep in mind that SalesOrder has columns of varying types and we want to allow ordering on all of them. So something like Func<SalesOrderheaders, int> is not sufficient. It is also desirable that the user not have to build expressions to use this API(as in Moses implementation). They don't know if we're using linq or not.

Kevin Hazzard, MVP MCSD.NET United States

Wednesday, July 23, 2008 9:39 AM

Kevin Hazzard, MVP MCSD.NET

@logicalmind So, at the heart of the matter, you want to avoid a two-stage enumeration. I don't have time right now to cook up an example, but after work, I'll see if I can cook up an example, OK?

logicalmind United States

Wednesday, July 23, 2008 10:30 AM

logicalmind

The solution in the link I provided works fine. The SortBy class would simply be a bool for asc/desc and the name of the property to sort by as a string. Based on these two pieces of information the Expression is built dynamically as outlined in the link. My example doesn't show it, but the next step of the API would be:

public IList<SalesOrderHeaders> GetSalesOrderHeaders(SortBy sort, int pageStart, int pageSize);

As we know, sorted data is required for predictable paging. It was the combination of dynamic sorting and paging I was after. Without exposing linq expressions to the user of the API. In reality I already have this API and the internals are in ADO/SQL. I want to change the implementation to linq to sql without changing the API. So the bottom line question was, how do I change my existing SortBy object to return an Expression(becomes part of the dynamically generated query) rather than a lambda/delegate(reverts to linq to object).

I just wanted to clarify what I mean in my original post. No example is necessary. But if you have any ideas on how to do it as an alternative to the way in the linq and avoid linq to objects I'd love to hear it.

sbs Taiwan

Sunday, February 08, 2009 4:27 PM

sbs

what a impressive article. last days I didn' t read post like that. I am now your blog' s follower thanks for this useful blog. you are now in my bookmarks.

kevgriffin.com

Friday, October 16, 2009 11:12 AM

pingback

Pingback from kevgriffin.com

My Attempt at LINQ Pagination »  Kevin Griffin

Comments are closed