We use Redis in our Node.js application for sessions and as a short-lived data cache.
Like any component in the system, there’s a potential risk of failure, and graceful failover to a “slave” instance is a way to mitigate the impact. We use Redis Sentinel to help manage this failover process.
As the docs describe:

Redis Sentinel is a distributed system; this means that usually you want to run multiple Sentinel processes across your infrastructure, and these processes will use agreement protocols in order to understand if a master is down and to perform the failover.
Essentially, each node server runs its own sentinel for each redis cluster [master and slave(s)] it connects to. We have one redis cluster, so for N node servers, there are N sentinels. (This isn’t the only way to do it - there could be just one sentinel, or any other configuration really, but the 1:1 ratio seems to be the simplest.) Each sentinel connects to the master and slaves to monitor their availability, as well as to the other sentinels. If the master goes down, the sentinels establish a “quorum” and agree on which slave to promote to master. They communicate all of this through their own pub/sub channels.
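For illustration, a minimal sentinel.conf for a topology like this might look something like the sketch below. The master name `mymaster`, the addresses, and the timeouts are placeholder assumptions, not our actual config; the trailing `2` on the `monitor` line is the quorum, i.e. how many sentinels must agree the master is down before a failover begins.

```
# Watch the master at 127.0.0.1:6379; require 2 sentinels to agree it's down
sentinel monitor mymaster 127.0.0.1 6379 2
# Consider the master down after 5s of unreachability
sentinel down-after-milliseconds mymaster 5000
# Abort a failover attempt that takes longer than 60s
sentinel failover-timeout mymaster 60000
# Re-sync slaves to the new master one at a time
sentinel parallel-syncs mymaster 1
```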
The sentinel is not a proxy - the connection to the sentinel doesn’t replace the connection to the master - it’s a separate instance whose sole purpose is managing master/slave availability. So the app connects to the sentinel in parallel with the master connection, and listens to the chatter on the sentinel channels to know when a failover has occurred. It then has to manage the reconnection to the new master on its own.
We’re using the standard node_redis library, which is robust, easy to use, and works “out of the box” for things like sessions. But a year ago, when Sentinel started to gain adoption, the best approach for adding Sentinel awareness to node_redis clients wasn’t clear, so a thread was started on GitHub to figure it out.
One simple approach was for the application to hold two connections, one to the sentinel and one to the master, and when the sentinel reports a failover, to reconnect to the new master. But the way node_redis works, any data in transit during the failover is lost. Also with this approach, the code listening to the Sentinel’s pub/sub chatter lived in the application, and wasn’t as encapsulated as we thought it should be.
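The two-connection approach might look roughly like the sketch below. Sentinel publishes on the `+switch-master` channel once a failover completes, with a payload of the form `<master-name> <old-ip> <old-port> <new-ip> <new-port>`. The `parseSwitchMaster` and `watchSentinel` helpers are ours for illustration, not part of node_redis, and the host/port values are placeholders.

```javascript
// Parse a `+switch-master` payload, e.g.
//   "mymaster 127.0.0.1 6379 127.0.0.1 6380"
// into the address of the newly promoted master.
function parseSwitchMaster(message) {
  var parts = message.split(' ');
  return {
    name: parts[0],          // master name, e.g. "mymaster"
    host: parts[3],          // new master IP
    port: parseInt(parts[4], 10) // new master port
  };
}

// Wiring sketch (defined but not invoked here, so the file stands alone):
// subscribe to the sentinel and hand each failover to a callback, which
// would then tear down and recreate the app's master connection.
function watchSentinel(sentinelPort, sentinelHost, onFailover) {
  var redis = require('redis'); // assumes node_redis is installed
  var sentinel = redis.createClient(sentinelPort, sentinelHost);
  sentinel.subscribe('+switch-master');
  sentinel.on('message', function (channel, message) {
    if (channel === '+switch-master') {
      onFailover(parseSwitchMaster(message));
    }
  });
}
```

Note that any commands the app sent to the old master between the failure and the reconnect are simply gone - which is exactly the data-loss problem described above.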
So we decided to create a middle tier, a redis sentinel client, that would handle all this automatically.