REST collection, paging and filtering

Topics: Web Api
Jan 25, 2012 at 12:08 PM

I'm curious to see what solutions you are using to represent collections that can be paged and filtered.
Specifically, how do you handle caching of each page, and then updates when items are added or deleted? In small collections this isn't an issue: you just invalidate and reload the whole collection (pages can share ETags, for example), but in very large collections you don't want to reload the whole collection every time something changes.

You can accept that pages may be of different sizes, but client page sizes will never stay in sync with the chunked version on the server unless you do something like a separate ETag for each page. At that point you stop thinking of the collection as a single resource and start treating it as connected sub-collections, and then how do you govern how much each page can grow or shrink before you have to redo the whole thing? It quickly becomes a mess :)

Personally I think treating the whole collection as a single resource is the best approach. Each page shares a common ETag, and the client cache is updated "on demand" as each page is used and validated by a conditional GET. However, this can get tricky if the client app has its own collection and page sizes, which clients usually do. Then you have to create and track relationships from the HTTP cache to that other collection, and so on...
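To make the "single resource, shared ETag" idea concrete, here is a minimal sketch of a client-side page cache along those lines. It is illustrative only: `CollectionPageCache` and `fetch_page` are hypothetical names, and `fetch_page` stands in for whatever transport layer actually issues the conditional GET.

```python
# Sketch: client-side page cache where every page shares the collection's ETag.
# A conditional GET that returns a new ETag invalidates ALL cached pages at
# once, since the collection is treated as one resource.

class CollectionPageCache:
    def __init__(self):
        self.etag = None   # one ETag for the whole collection
        self.pages = {}    # page number -> cached page body

    def get(self, page, fetch_page):
        # fetch_page(page, etag) -> (status, etag, body);
        # status 304 means "not modified", anything else is a fresh 200.
        status, etag, body = fetch_page(page, self.etag)
        if status == 304:
            return self.pages[page]    # cached copy is still valid
        if etag != self.etag:
            self.pages.clear()         # collection changed: drop every page
            self.etag = etag
        self.pages[page] = body
        return body
```

The trade-off is visible right in the sketch: one write anywhere in the collection throws away every cached page, which is exactly the cost the original question is worried about for very large collections.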

There must be some good pattern to do this :)

Jan 25, 2012 at 12:38 PM
Edited Jan 25, 2012 at 12:38 PM

Interesting questions and issues.  I give thanks that (for now) my service is read-only, and I don't have to worry about the problem of adding/deleting items in collections.

Maybe also try the rest-discuss mailing list on Yahoo?

Jan 25, 2012 at 4:42 PM

I don't see how it's any different than when not using REST.

If you allow filters, you can't exactly cache each page unless you have infinite memory or a severely limited set of filters.

This is more related to having good paging/querying speeds.

If you are thinking of advanced filtering, you should probably use something built for that, such as Lucene or Solr.

If you want to cache per filter per page, then I normally just make a cache entry for each result set. If the data changes, I destroy the cache for that resource, and the entries get built up again as they are requested. It isn't such a big deal as long as your data doesn't constantly change.
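A bare-bones sketch of that per-filter, per-page approach, with the names (`ResultCache`, `query_backend`) invented here for illustration; `query_backend` stands in for the real data store:

```python
# Sketch of "one cache entry per filter per page": result sets are cached
# lazily as they are requested, and a data change drops every entry for the
# resource so pages rebuild on demand.

class ResultCache:
    def __init__(self, query_backend):
        self.query_backend = query_backend
        self.entries = {}   # (filter expression, page) -> result set

    def get(self, filter_expr, page):
        key = (filter_expr, page)
        if key not in self.entries:
            self.entries[key] = self.query_backend(filter_expr, page)
        return self.entries[key]

    def invalidate(self):
        # Called when the underlying data changes; entries rebuild lazily.
        self.entries.clear()
```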

You can also re-cache only every so often: serve reads from the cache and refresh it only every 5-10 minutes, say. You can have a job that does it, or build it into the query against the cache (making the cache invalidate itself, or a service that invalidates it).
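The "refresh only every so often" variant can be sketched as a simple TTL cache, where the expiry check is built into the read itself. `TtlCache` and `query_backend` are illustrative names, not anything from a real library:

```python
import time

# Sketch of time-based invalidation: reads come from the cache, and an entry
# is rebuilt only once its age exceeds the TTL (5-10 minutes in the post).

class TtlCache:
    def __init__(self, query_backend, ttl_seconds=300):
        self.query_backend = query_backend
        self.ttl = ttl_seconds
        self.entries = {}   # key -> (cached_at, value)

    def get(self, key):
        now = time.time()
        hit = self.entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]                  # still fresh: no backend call
        value = self.query_backend(key)    # missing or expired: rebuild
        self.entries[key] = (now, value)
        return value
```

This is the "invalidate itself" flavor; the alternative mentioned in the post, a background job, would refresh `entries` on a schedule instead of checking the age on each read.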

Jan 25, 2012 at 5:04 PM

The main reason this is tricky is the client cache: you have multiple clients caching your data at once. For server-side caching, yes, you can destroy cache dependencies.

You are supposed to communicate caching instructions to the client via HTTP caching headers: everything from how long to cache a response to how to validate that a cached copy is still valid (ETags or If-Modified-Since).
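The server side of that exchange can be sketched in a few lines: send an ETag and a max-age, and answer a conditional GET with 304 when the client's If-None-Match still matches. `handle_get`, `render_page`, and `current_etag` are hypothetical stand-ins for whatever your framework provides:

```python
# Sketch: emit caching headers and honor conditional GETs. On a 304 the body
# is empty, telling the client its cached copy is still valid.

def handle_get(request_headers, current_etag, render_page, max_age=300):
    headers = {
        "ETag": current_etag,
        "Cache-Control": "max-age=%d" % max_age,
    }
    if request_headers.get("If-None-Match") == current_etag:
        return 304, headers, b""        # client copy is still valid
    return 200, headers, render_page()  # fresh representation
```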

Jan 25, 2012 at 5:21 PM
Edited Jan 25, 2012 at 5:22 PM

Dynamic data is not supposed to be client-cached, so you are using the protocol for something it is not meant for.

Not to mention, the browser won't cache it if it contains a querystring.

Jan 26, 2012 at 11:14 AM

The data isn't very dynamic, but is large and changes slowly.  So caching it for some time is perfectly fine.

Jan 26, 2012 at 4:13 PM
Edited Jan 26, 2012 at 4:14 PM

Actually it is dynamic. I don't mean that the data changes, I mean that the RESOURCE can be filtered. The plain collection and page URIs are cachable; a URI carrying a filter query string such as $filter=foo eq 5 is not cachable, since that is dynamic data.