I am getting incorrect results from my paging query, where the same results are being returned multiple times.
Here are two queries I have found to bring the same results:
select top 20 * from lookupdocuments_dbv where catname_cst='MyCategoryName' and (docid_cin not in (select top 620 docid_cin from lookupdocuments_dbv where catname_cst='MyCategoryName'))
select top 20 * from lookupdocuments_dbv where catname_cst='MyCategoryName' and (docid_cin not in (select top 640 docid_cin from lookupdocuments_dbv where catname_cst='MyCategoryName'))
When I remove the catname_cst where clause it brings back results properly (i.e. records 622-642 and 643-663).
What is wrong with my where clause that is causing identical data to be returned?
I'll try to be a bit more generic with my question.
How would you properly add a where clause in the T-SQL paging query technique I am using?
The template for the paging query I am using shows up in a few tutorials, it's a fairly known technique:
SELECT TOP rows_to_return * FROM table WHERE unique_id NOT IN (SELECT TOP row_to_start_at unique_id FROM table)
Basically it ignores the rows from Row 0 through row_to_start_at and from that starting point it selects rows until rows_to_return is reached.
So how should a where clause (i.e. where catname_cst='mycategoryname') be properly added to this paging query?
|||Do a quick google search for "custom paging + stored procedure" and you will find some sample code on how to write custom paging which is much more efficient than what you are doing.
|||You may be suggesting a tutorial (4guysfromrolla.com?) that uses temp tables and variables for current/last records. If you are, is creating a temp table (with potentially 40,000+ rows to be inserted) really more efficient than selecting the top X rows in my technique? I realize that my technique will become progressively slower as the Y (rows to be ignored) value increases, but I don't believe it should ever exceed the time required in creating a temp table that houses every single row.
I will test this out tomorrow and see how it works.
|||There is no ORDER BY to go with your TOP so you will get just 20 rows in no particular order. You might as well use SET ROWCOUNT 20 which is faster than TOP 20. Also look into using EXISTS instead of IN. IN gets internally converted into OR's and may not result in efficient query plans.
|||I will use SET ROWCOUNT, but I cannot determine how NOT EXISTS will help with filtering rows. The only result from google on the subject shares my confusion.
I'll test out the temp tables technique, but even if it works perfectly, I'll still want to know why my technique isn't working 100% of the time. Efficiency aside, there just doesn't seem to be anything wrong with the query, maybe it's the data.. the first dozen or so pages will look fine, then suddenly every so often an identical page is found, then they start coming in more frequently, and soon enough you're seeing more identical pages than non. This isn't just something I'm seeing on a site grid, I'm getting the same identical results within SQL Server. I think that based on the response and lack of responses it is not something with the query (as that kind of thing would have been pointed out, considering it should just be some simple logic error)..
The strange part is that if I perform a query such as getting the top 40 and the top 20 rows starting after results 620 and 640 respectively, 1-20 of each query will be the same, and 21-40 of the first query will contain the proper data that should be showing up in the second query..
To sum that up.. because I really want an answer:
Query 1 should get 40 results starting after result 620, so it should return 621-660.
Query 2 should get 20 results starting after result 640, so it should return 641-660.
Query 1 returns results 621-660.
Query 2 returns results 621-640.
Could I get some insight into how I would write a paging query with not exists?
|||The reason you are not getting the right results, as I mentioned above, is using TOP without ORDER BY. Its like saying "give me the top 20 records that match this criteria". There is every chance the same record may show up in the next set or an expected record may not show up at all. TOP X is incomplete by itself, although syntactically correct.
|||Thanks, I implemented a version of my technique with Order By as you suggest and it works.
No comments:
Post a Comment