Wednesday, August 17, 2011

How to keep the Database in sync with your cache?

There are a few different ways to achieve this. First, you can put the onus on the cache to refetch the data periodically or when it determines that the data is stale. Second, you can put the onus on the underlying database to "push" updates periodically or whenever the data changes.

Read-heavy use cases

Cache -> DB

1.     The most straightforward way is to set a Time to Live (TTL) and Time to Idle (TTI) on the cache so that entries expire periodically. The next request results in a cache miss, and your application pulls the current value from the underlying database and puts it into the cache.
A few things to note here:
a.     There is a window during which data in the cache is out of sync with the underlying database
b.     Each cache miss incurs a performance hit, since it requires a round trip to the database

This is called read-through caching.
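The read-through pattern above can be sketched as a small class. This is a minimal illustration, not the Ehcache API: the `TtlCache` name and its `loader` function (standing in for the database query) are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal read-through cache sketch: entries expire after ttlMillis, and a
// miss (or an expired entry) falls through to the loader (the "database").
public class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // stands in for the database query
    private final long ttlMillis;

    public TtlCache(Function<K, V> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    public V get(K key) {
        Entry<V> e = store.get(key);
        if (e == null || System.currentTimeMillis() - e.loadedAt > ttlMillis) {
            // Cache miss or stale entry: read through to the backing store.
            V v = loader.apply(key);
            store.put(key, new Entry<>(v, System.currentTimeMillis()));
            return v;
        }
        return e.value;
    }
}
```

Note that the staleness window and the miss penalty described above are both visible here: between loads the cached value may lag the database, and every expiry costs one `loader` call.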

2.     An alternate approach is to perform cache updates or invalidation periodically - use a batch process (which could be scheduled using the open source Quartz scheduler) running at periodic intervals to either invalidate or update the cache. You could do this using Ehcache's SelfPopulatingCache.
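A sketch of such a periodic refresh job, under stated assumptions: a plain `ScheduledExecutorService` stands in for the Quartz job, and the `PeriodicRefresher` class and its `reloadAll` method are illustrative names, not a Quartz or Ehcache API.

```java
import java.util.concurrent.*;
import java.util.function.Function;

// Sketch of a periodic cache-refresh batch job.
public class PeriodicRefresher {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    public void put(String key, String value) { cache.put(key, value); }
    public String get(String key) { return cache.get(key); }

    // Reload every cached key from the "database" (simulated by a lookup
    // function). Using cache.remove(key) here instead would invalidate
    // entries rather than update them.
    public void reloadAll(Function<String, String> db) {
        for (String key : cache.keySet()) {
            cache.put(key, db.apply(key));
        }
    }

    // Run reloadAll at a fixed interval, like a cron-style batch process.
    public ScheduledExecutorService start(Function<String, String> db, long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> reloadAll(db), periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```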


3.     You could also transfer the syncing onus to the underlying database itself. For example, Oracle AQ provides a way to register a callback that fires when database updates happen. This can be leveraged to either invalidate or update the cache store.

4.     Alternatively, you could use middleware technologies like GoldenGate or JMS to capture DB changes as they occur and "push" notifications into the memory store.
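The push-based approaches in the last two points share one shape: the database side delivers change events on a channel, and a listener evicts the affected entries. The sketch below uses a `BlockingQueue` purely as a stand-in for the real channel (an AQ callback or a JMS topic); the `ChangeListener` class and its method names are assumptions made for illustration.

```java
import java.util.concurrent.*;

// Sketch of push-based invalidation: a messaging channel delivers change
// notifications, and the listener evicts the stale cache entries so the
// next read falls through to the database.
public class ChangeListener {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private final BlockingQueue<String> changedKeys = new LinkedBlockingQueue<>();

    // Called by the messaging layer when the database reports a change.
    public void onDatabaseChange(String key) { changedKeys.add(key); }

    // Drain pending notifications and evict the affected entries.
    public void drain() {
        String key;
        while ((key = changedKeys.poll()) != null) {
            cache.remove(key);   // next read repopulates from the database
        }
    }

    public ConcurrentMap<String, String> cache() { return cache; }
}
```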


Write-heavy use cases

1.     Some scenarios require frequent updates to stored data, where every update to the cached data must also update the database. This is the write-through feature provided by Ehcache. However, updates to the database are almost always slower, so write-through caps the effective update rate of the cache and hurts performance in general. When many write requests arrive at once, the database can easily become a bottleneck or, even worse, be overwhelmed by a burst of heavy writes.

The write-behind feature provided by Ehcache allows quick cache writes while still keeping cache data and database consistent. When data is written into the cache, instead of writing it to the database at the same time, the write-behind cache places the change on a queue and lets a backend thread do the database write later. The cache write can therefore proceed without waiting for the database write and finishes much faster. Every change is eventually persisted to the database, and in the meantime any read from the cache still sees the latest data.

With Terracotta, the Terracotta Server Array maintains the write-behind queue; a thread on each JVM checks the shared queue and saves each pending change to the database.
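The write-behind idea can be sketched in a few lines. This is not the Ehcache/Terracotta implementation: `WriteBehindCache`, its queue, and the `dbWriter` callback are illustrative names, and a real deployment would drain the queue on a backend thread rather than via an explicit `flush` call.

```java
import java.util.Map;
import java.util.concurrent.*;
import java.util.function.BiConsumer;

// Write-behind sketch: put() updates the cache and enqueues the change;
// the database write happens later, off the caller's critical path.
public class WriteBehindCache {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private final BlockingQueue<Map.Entry<String, String>> queue = new LinkedBlockingQueue<>();

    // Fast path: no database round trip here.
    public void put(String key, String value) {
        cache.put(key, value);               // reads see the latest value immediately
        queue.add(Map.entry(key, value));    // persisted eventually
    }

    public String get(String key) { return cache.get(key); }

    // Drain pending writes to the database; dbWriter stands in for the real
    // persistence call. Returns how many changes were written.
    public int flush(BiConsumer<String, String> dbWriter) {
        int written = 0;
        Map.Entry<String, String> e;
        while ((e = queue.poll()) != null) {
            dbWriter.accept(e.getKey(), e.getValue());
            written++;
        }
        return written;
    }
}
```

Note how this captures the trade-off described above: the cache write returns immediately, while the database lags behind until the queue is drained.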

2.     Finally, you could make your application update the cache and DB simultaneously. It is advisable to use transactions and perform this in the following manner:
a.     Start a transaction
b.     Update the database
c.      Update the cache
d.     Commit the transaction
Keep in mind that your update code is now directly aware of the cache, and there is a performance impact: your update latency reflects both the DB and cache update times.
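The a-d sequence above can be sketched as follows. The toy `Db` class stands in for a JDBC connection with auto-commit disabled; its names, and `updateBoth`, are assumptions made for illustration, not a real driver API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the transactional dual update: write the database inside an open
// transaction, update the cache, then commit; on failure, roll back and drop
// the possibly inconsistent cache entry.
public class TxDualWrite {
    static class Db {
        final Map<String, String> committed = new HashMap<>();
        private final Map<String, String> pending = new HashMap<>();
        void update(String k, String v) { pending.put(k, v); }            // inside the transaction
        void commit() { committed.putAll(pending); pending.clear(); }
        void rollback() { pending.clear(); }
    }

    public static void updateBoth(Db db, Map<String, String> cache, String key, String value) {
        try {
            db.update(key, value);    // b. update the database
            cache.put(key, value);    // c. update the cache (latency now includes both stores)
            db.commit();              // d. commit only after both writes succeed
        } catch (RuntimeException e) {
            db.rollback();
            cache.remove(key);        // best effort: evict the suspect entry
            throw e;
        }
    }
}
```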