Caching: database vs. memcached?
Ben and I have been discussing how to do the caching on Miro Guide. To summarize Matt’s analysis from earlier this month, the most expensive queries we have are doing COUNT() queries over 500k records. Even with the correct indicies, that’s a little expensive when it’s happening periodically (a couple seconds with no load on the DB) and really expensive when it happens a lot (often over 20 seconds). What Matt put together when he was doing that research was a small script which does that calculation and populates a database table with it. Then the guide looks up the value in that table instead of performing the calculation.
What I’ve done is brought that idea into the world of memcached. The popularity values are calculated if they’re not in the cache, and then they’re put there and retrieved. The keys are time-sensitive; they’re only good for 5 minutes for last 24 hour popularity, and an hour for the last month’s popularity. It’s working well, and it’s making the Guide faster. When subscriptions are added, the cache is also incremented, so the cached values stay very close to the actual database values without much effort.
What we are wondering about is what people think about the different approaches, either calculating the values and putting them in a new database table and recalculating them, or putting them into the memory cache. What do you all think?