Tuesday, March 20, 2012

Paging with Temporary Tables

I am searching for information on paging large datasets, and have foundsome that involve creating temporary tables in the database. Before I head off and implement something, I have a number of issuesI'd like to bounce around here.
1. An example I found on MSDN involves creating a temporary table,copying relevant columns to the row in the temp table. Why dothis, rather add the source tables primary keys into the temp table,and do a join? Example; browsing Products Catalog which iscategorised into hierarchies. The MSDN version would have a temptable created with a incrementing field which is used for the paging,and then a number of fields are also copied from the products table tothe temp table - my question is why not simply copy the product primarykey into the temp table, and then join?
2. In real life, do people allow each user to create their owntemporary tables? If I have 1000 concurrent users, all wishing toperform a page-based browse, I would be creating 1000 new temporarytables. Do people consider default temp tables, that is, creatinga default temporary table for browsing each category in the productstable, for example?
3. Do you have any advice/tips for this type of problem?
Thanks!
JR.
1) I think I know what you are talking about. However, I would like to point you to another techniquethat I developed.
2) It depends what type of temporary table. The one that I use is memory resident and since it is only one column as such in memory, it's a very small trade off in resources. If they are creating the "physical" version of the temporary table, then yes, I would consider an idea that you are toying with. Essentially (just bear with me), you would just create a cross reference table to state that page X has these records... just like my inner join technique. I hate circles but at least you can see that I highlighted the most normalized way to put the pagings rather than create a whole new "physical" table to contain that information.
3) I can't think of anything other than what I just mentioned for the time being.
Regards,
Justin|||

You could try this. It uses a table variable. It takes 2 pareameters which should be self explanitory. I have tested it with a 50,000 row zipcode database and it seems to work quite well.
<code>
-- FOR MSSQL 2000

CREATE PROCEDURE dbo.Paging_sp

@.Page int,
@.RecsPerPage int

AS

SET NOCOUNT ON

DECLARE @.TableVar table (
ZipID int identity(1,1) PRIMARY KEY,
ZipCode varchar(5) NOT NULL,
City varchar(28) NOT NULL,
State varchar(50) NOT NULL )

Insert Into @.TableVar ( ZipCode, City, State )
Select ZIPCODE, CITY, STATE
From ZipCodes
ORDER BY STATE

DECLARE @.FirstRec int, @.LastRec int
SELECT @.FirstRec = (@.Page - 1) * @.RecsPerPage
SELECT @.LastRec = (@.Page * @.RecsPerPage + 1)

SELECT ZipID, ZipCode, City, State,
MoreRecords = (
SELECT COUNT(*)
FROM @.TableVar TI
WHERE TI.ZipID >= @.LastRec),
TotalPages = ( SELECT COUNT(*)/@.RecsPerPage + 1 FROM @.TableVar )
FROM @.TableVar
WHERE ZipID > @.FirstRec AND ZipID < @.LastRec

SET NOCOUNT OFF

GO
</code>

Of course there is always this:http://weblogs.sqlteam.com/jeffs/archive/2004/03/22/1085.aspx

|||Justin,
I thank you for your response. I am trying to consider andimplement your example. I looked around at many examples, and allseem to be confusingly different in implementation. My mainconcern is how well these implementations work with many concurrentusers, with a single web server and database server - theimplementation will not sit on a data center, so it needs to beperformant. From what I am seeing in the posts, all pagingsolutions come with their own caveats. I will get back to youJustin, regarding your own version. I am assuming I need to putit into a stored proc, but my db skills are not like they used tobe.
Thanks again,
Jr.
|||Hi thanks for this mail. I am working with Justins as a test run, but will keep this in mind.
Thank you.
Jr.
|||This is a silly question I know, but I have a Products table, aCategoryProducts table, and a Category table. To page, I willneed to do a join across these tables:
select a.ProductId, a.Name
from
soundleaf_products a,
soundleaf_categoryProducts b
where
b.categoryid = @.CategoryID and
b.productid = a.productid
This is really simple SQL, but from a performance place, I dont know ifthis is performant or not. I dont know how to test performance ofSQL, so any opinions on this would be very helpful.
Jr.
|||

I found you can speed up Justins solution by avoiding using a cursor like such. Especially on the lower page numbers
<code>
--## FIRST Create the Very Large Table

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[VeryLargeTable]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[VeryLargeTable]
GO

CREATE TABLE [dbo].[VeryLargeTable] (
[VLTID] [int] IDENTITY (1, 1) NOT NULL ,
[FirstName] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[LastName] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[OtherData] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
) ON [PRIMARY]
GO

--## SECOND Load some data
--## As we are adding 1 million records to the database..
--## please be patient as it could take some time.
--## If you already have a very large table to work from
--## you can use use that instead.

DECLARE @.x int
SET @.x = 1

WHILE @.x < 1000000
BEGIN
INSERT VeryLargeTable ( FirstName, LastName, OtherData )
VALUES ( 'John' + ' ' + CAST ( @.x AS varchar(100) ), 'Doe' + ' ' + CAST ( @.x AS varchar(100) ), 'YAK Stuff' + ' ' + CAST ( @.x AS varchar(100) ) )
SET @.x = @.x + 1
END

GO
--## FIFTH Create our Fast NON Cursor Procedure
CREATE PROCEDURE dbo.TestVeryLargePaging_Fast_New

@.Page int,
@.RecsPerPage int,
@.TotalPages int OUTPUT

as

set nocount on

SELECT @.TotalPages = ( SELECT COUNT(*)/@.RecsPerPage + 1 FROM VeryLargeTable )

create table #tempTable(pk int Primary Key)

Declare @.execStr varchar(2000)

Set @.execStr = 'Select top ' + convert(varchar,@.RecsPerPage) +
' VLTID from (Select top ' + convert(varchar,@.RecsPerPage*(@.page)) +
' VLTID from veryLargeTable Order By VLTID) A order by VLTID desc'

Insert into #tempTable
exec(@.execStr)

Select *
From veryLargeTable A
Join #tempTable B
On A.VLTID = B.pk

drop table #tempTable

set nocount off


GO
--## Execute the procedure
DECLARE @.Total int

EXEC dbo.TestVeryLargePaging_Fast_New 4, 500, @.Total OUTPUT

PRINT @.Total

GO
</code>
This procedure uses basically the same approach as Justins but without using a cursor.
JB

|||

A little help fromhttp://www.seventhnight.com/ and now it is screamin fast!
<code>
ALTER PROCEDURE dbo.TestVeryLargePaging_Fast_New

@.Page int,
@.RecsPerPage int,
@.TotalPages int OUTPUT

as

SELECT @.TotalPages = ( SELECT COUNT(*)/@.RecsPerPage + 1 FROM VeryLargeTable )

Declare @.execStr varchar(2000)

Set @.execStr = 'Select Z.* From veryLargeTable Z Inner Join (Select top ' + convert(varchar,@.RecsPerPage) +
' VLTID from (Select top ' + convert(varchar,@.RecsPerPage*(@.page)) +
' VLTID from veryLargeTable Order By VLTID) A order by VLTID desc) Y On Z.VLTID = Y.VLTID order By Z.VLTID'
exec(@.execStr)
go

</code>

|||JB, thanks for writing the above - I intend to play with what you'vewritten. Before I do I want to modify the original justinproduced, to take in a PageIndex rather than a start row. I amnot so good at SQL, and tried the following:
CREATE PROCEDURE [dbo].[PagingQueryProductsByCategory]
(
@.CategoryId int = 0,
@.PageIndex int = 1,
@.RowCount int = 10,
@.TotalRowCount int OUTPUT
)
AS
Declare @.TotalRows int, @.StartPosition int
Declare @.PK int
DECLARE @.tmpTable TABLE (
PK int NOT NULL PRIMARY KEY
)
Set RowCount @.RowCount
SELECT @.StartPosition = (((@.PageIndex - 1) * @.RowCount) + 1)
DECLARE PagingCursor CURSOR DYNAMIC READ_ONLY FOR
select a.ProductId
from
products a,
categoryProducts b
where
b.categoryid = @.CategoryID and
b.productid = a.productid
Order By a.productid
Open PagingCursor
Fetch Relative @.StartPosition From PagingCursor Into @.PK
while (@.RowCount <> 0) And (@.@.Fetch_Status = 0)
begin
Insert Into @.tmpTable (PK)
Values (@.PK)
Fetch Next From PagingCursor Into @.PK
Set @.RowCount = @.RowCount - 1
end
Close PagingCursor
Deallocate PagingCursor
Select Products.* From Products
Join @.tmpTable temp ON Products.ProductID = temp.PK
Order By Products.ProductID
Set RowCount 0
GO
If I execute it with PageIndex = 1, I get back say 10 rows. If Iexecute it with PageIndex > 1, I get nothing returned. I thinkit could be to do with the way the curso works with the Relativekeyword - I dont know how cursors work. Any advice?
Thanks.
JR.
|||

<code>
--## Make sure this query returns a LARGE resultset of ProductID's
select a.ProductId
from
products a,
categoryProducts b
where
b.categoryid = @.CategoryID and
b.productid = a.productid
Order By a.productid

-## If it doesn't tweek it until it does.


--## NEXT Ceate THIS procedure
CREATE PROCEDURE dbo.TestVeryLargePaging_Fast

@.Page int,
@.RecsPerPage int,
@.TotalPages int OUTPUT

AS

DECLARE @.FirstRec int
SELECT @.FirstRec = (@.Page -1) * @.RecsPerPage + 1

DECLARE @.RowCount int
SELECT @.RowCount = @.RecsPerPage

SELECT @.TotalPages = ( SELECT COUNT(*)/@.RecsPerPage + 1 FROM VeryLargeTable )

DECLARE @.PK int
DECLARE @.tmpTable TABLE (
PK int NOT NULL PRIMARY KEY
)

DECLARE PagingCursor CURSOR DYNAMIC READ_ONLY FOR

--## REPLACE THIS WITH YOUR RESLUT SET FROM ABOVE IF NEEDED
select a.ProductId
from
products a,
categoryProducts b
where
b.categoryid = @.CategoryID and
b.productid = a.productid
Order By a.productid


OPEN PagingCursor
FETCH RELATIVE @.FirstRec FROM PagingCursor INTO @.PK

WHILE (@.RowCount <> 0) And (@.@.Fetch_Status = 0)
BEGIN
INSERT INTO @.tmpTable (PK)
VALUES (@.PK)

FETCH NEXT FROM PagingCursor INTO @.PK
SET @.RowCount = @.RowCount - 1
END

CLOSE PagingCursor
DEALLOCATE PagingCursor


Select Products.* From Products p
Join @.tmpTable temp ON p.ProductID = temp.PK
Order By Products.ProductID

GO


--## The above procedure take 3 parameters.
--## @.Page -- This is the page number -- 1st, 2nd, third etc...set it to 1, or 2, or 3 etc...
--## @.RecsPerPage -- This is the number of results perpage that you want returned. Set it to 25 or 50 or 10 etc...
--## @.TotalPages -- This is an output parameter that tell you how many pages there are bases on the settings of @.Page and PRecsPerPage.


--## TO EXECUTE THE PROCEDURE RUN THIS AND PUT YOUR PARAMETERS IN ORDER
DECLARE @.Total int
EXEC dbo.TestVeryLargePaging_Fast 2, 50, @.Total OUTPUT
PRINT 'Total Pages: ' + CAST(@.Total as varchar(50))

GO

</code>

|||

JRieggle wrote:


I think it could be to do with the way the curso works with the Relative keyword - I dont know how cursors work. Any advice?


Cursors forces a database to keep data in one order instead of the Rows and Columns of the relational model, a Cursor with more than three FETCH can confuse SQL Server query processor, as Peter Gulutzan who tested and watched said you have asked me to do something I am not equipped to do so you think I need five loops but I am going to take twenty loops. But paging is not easy to create. Hope this helps.

|||

JBelthoff wrote:

A little help fromhttp://www.seventhnight.com/ and now it is screamin fast!
<code>
ALTER PROCEDURE dbo.TestVeryLargePaging_Fast_New

@.Page int,
@.RecsPerPage int,
@.TotalPages int OUTPUT

as

SELECT @.TotalPages = ( SELECT COUNT(*)/@.RecsPerPage + 1 FROM VeryLargeTable )

Declare @.execStr varchar(2000)

Set @.execStr = 'Select Z.* From veryLargeTable Z Inner Join (Select top ' + convert(varchar,@.RecsPerPage) +
' VLTID from (Select top ' + convert(varchar,@.RecsPerPage*(@.page)) +
'VLTID from veryLargeTable Order By VLTID) A order by VLTID desc) Y OnZ.VLTID = Y.VLTID order By Z.VLTID'
exec(@.execStr)
go

</code>


Hmm... alright! I must just test it out with some unit tests and that tonight.
The only thing that I'm a bit doubtful about "standard paging" withSeventhnight's solution is with sorting. It is generally an expensiveoperation but the real question is whose is the most expensive of themall?Smile [:)] Other than talking in terms of raw speed, there is onefeature you sacrifice if you go with his solution is that you can'thave dynamic sorting (ie: you would like to be able to sort on itemsfrom the datagrid columns).
|||I have a question regarding sorting and reallife applications. Lets say I want to browse a certain product category, and I want tooffer a number of different sorting orders (A-Z, Z-A, Price High-Low,Price Low-High, etc...like Amazon). Based on the scripts in thismessage thread, where would the ordering come in? I am trying to workout if building the SQL in the business layer will be any better thanbuilding it in a stored procedure? If building in a stored procedure,how would it be possible to support all of these different routines?
Thanks for any ideas
jr.
|||

<code>
CREATE PROCEDURE dbo.TestVeryLargePaging_Fast_New

@.Page int,
@.RecsPerPage int,
@.OrderCoumn varchar(100),
@.TotalPages int OUTPUT

as

SELECT @.TotalPages = ( SELECT COUNT(*)/@.RecsPerPage + 1 FROM VeryLargeTable )

Declare @.execStr varchar(2000)

Set @.execStr = 'Select Z.* From veryLargeTable Z Inner Join (Select top ' + convert(varchar,@.RecsPerPage) +
' VLTID from (Select top ' + convert(varchar,@.RecsPerPage*(@.page)) +
' VLTID from veryLargeTable Order By VLTID) A order by VLTID desc) Y On Z.VLTID = Y.VLTID order By ' + @.OrderCoumn
exec(@.execStr)
go


</code>

|||JB,
Just writing to ask some questions about your SQL. I rewrote it to make it easier to read:
Select Z.* From Products Z Inner Join
(Select top 20 A.ProductID
from (
Select top 20 B1.ProductID
from Products B1, categoryProducts B2
where B2.CategoryID = 1 and B1.ProductID = B2.ProductID
Order By B1.Name
) A
order by A.Name desc
) Y On Z.ProductID = Y.ProductID
order by Z.Name
Lets say the blue is the core search. In this search the Order Byuses the Name, rather than the Id. The reason I am doing thishere, is because if I use the query as it was in your previous post,indexing to a particular page, I noticed only the results in that pageare ordered, which isnt what I want. I want to order the completeset, then page. This is why I moved the Order into thecore. This then produces a table of n results. The top 20 would have been top 10 * PageIndex which is set to 2, but Icouldnt get it to work for this example, and substituted it for thepage index. This inner query will create a resultset PageIndex *RowCount long. If the result count is only 20, and the pageindexis 10, then the number of results created in the core query will be200. The outer query in green confuses me, because I dont knowwhy the order by clause is there. It seems to me that all thatneeds to be done is take the top 20 off of the inner query resultlist. However, I see the reason for the order by is to invert thelist, since the page we want is at the end, and this ordering invertsthe complete resultset. Originally this query was orderingcompletely on the ProductID until the veryu outer query (Z.ProductID isnow changed to Z.Name).
One concern I have with this query is that I need to order the itemsagain on the very outer query - is this because the indexer will matchthe IDs by going down the ProductID key field, which will be in order?If I dont have the order by Z.Name, the items come out in ProductIDorder. My last point was to say the reason I posted this is tokeep along with this thread, a version of the query that uses adifferent column to sort by.
Thanks JB,
jr.

No comments:

Post a Comment