I've seen this happen quite a few times and each time it makes me want to bang my head on a table. I saw it again today and it's just so frustrating yet it seems so basic. The scenario is simple:
- you have a database that has some records in it
- you want to display some records on a web page
- you want to paginate through the records
The developer throws a SQL query together like:
select * from log;
Someone throws together a pagination method that looks for the number of results in that record set and then splits it up. You throw together a nice UI to display it. To test it, you set the pagination at 20 records and put 45 records in the database, beautiful and it's all done.
Fast forward a few months... site has come to a crawl. Your log table now has 10 million records in it. To display 20 records on the screen, how many records is the database getting back for you, 20 or 10 million? Again and again, I've seen it where people return 10 million records and then throw all but 20 of them away.
The overhead of producing all those records, sending them over to your application server and then doing whatever needs to be done is of course completely pointless. It's not hard to stop this:
- paginate your SQL correctly, for example Postgresql has LIMIT and OFFSET
- use an ORM that evaluates lazily ensuring that it only queries once you've told it you want 20, for example in Django
- ZSQL Methods in Zope (I don't think many people use them anymore) have a limit of 1000 records to prevent this, if you are upping that number you know something is wrong.
- ZCatalog in Plone is generally ok since it's quite lazy, but there is an overhead. There's an options called
sort_limit("An optimization hint to tell the catalog how many results you are really interested in"), your mileage may vary there.
And finally if you have a data source that you can't paginate, you have to ask yourself if you can apply the pagination style interface to it. One example was I was asked to create a browse interface to an old LDAP installation. This version of openLDAP did not have an offset meaning that there was no easy way to do it. Ugh.
So if you hear someone say "just list all X here" or "can you just add in pagination" think about the scalability down the line.