Saturday, April 8, 2017

Distributed Lock using MongoDB

Distributed systems often need synchronization. If the processes to be synchronized run on the same machine or in the same container, there is no problem: we have plenty of choices, such as semaphores provided by the operating system, file locks, etc. Synchronizing within a single process (across multiple threads, green threads, etc.) is also fine: we have thread-level locks, atomic integers, semaphores, etc. Of course, we need to be very careful with synchronization; deadlocks may occur occasionally, and uncovering their root cause can take hours of debugging.

When we need to synchronize the activity of processes running on different machines, we need a distributed lock, and there are systems suitable for that: ZooKeeper, etcd, Hazelcast, Redis, etc. may be used for distributed locking.

In this post I will explain how we may use MongoDB for distributed locking. The locks described here are exclusive locks (i.e. they are not read locks; implementing read locks using MongoDB will need some more thought :))


Now let us discuss how we can implement exclusive locks using MongoDB. We will use the MongoDB document below to describe a lock:
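{
    _id: "lock_id",
    acquirer: "acquirer_unique_id",
    updated: "timestamp"
}

The updated field must hold a BSON Date (e.g. new Date() in the mongo shell), because TTL indexes only expire documents based on date fields.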

There are two operations: lock and release lock (unlock). Also, the lock should be released automatically if the locking process is no longer alive, or is unable to connect to MongoDB for some interval.

Lock operation:
Lock("a_new_lock") -> insert a record with _id = "a_new_lock", acquirer = "unique id for the acquirer" and updated = current timestamp. The insert fails if a record with that _id already exists, which means someone else holds the lock.

The process will run a thread that periodically sets the updated field to the current timestamp, matching on both _id and acquirer so that it never refreshes a lock it no longer owns.

UnLock operation:
UnLock("a_lock") -> delete the record with _id = "a_lock", but only if the record's acquirer is the same as the caller's acquirer id. Matching the acquirer is important; without that check, the process may end up releasing a lock acquired by some other process.
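A minimal sketch of these two rules in plain JavaScript (Node.js, no MongoDB required) follows. It is only an in-memory model, assuming a Map can stand in for the locks collection: the has() check plays the role of MongoDB's unique _id index, and InMemoryLockStore, lock and unlock are hypothetical names, not part of MongoDB or of the author's prototype.

```javascript
// Hypothetical in-memory model of the Lock/UnLock rules.
// A Map stands in for the "locks" collection; this is NOT a MongoDB client.
class InMemoryLockStore {
  constructor() {
    this.docs = new Map(); // _id -> { _id, acquirer, updated }
  }

  // Lock: succeeds only if no document with this _id exists yet.
  // In MongoDB the unique _id index enforces this (duplicate key error E11000).
  lock(lockId, acquirer) {
    if (this.docs.has(lockId)) return false;
    this.docs.set(lockId, { _id: lockId, acquirer, updated: new Date() });
    return true;
  }

  // UnLock: delete only when both _id and acquirer match,
  // mirroring remove({_id: lockId, acquirer: acquirer}).
  unlock(lockId, acquirer) {
    const doc = this.docs.get(lockId);
    if (!doc || doc.acquirer !== acquirer) return false;
    this.docs.delete(lockId);
    return true;
  }
}

const store = new InMemoryLockStore();
console.log(store.lock("a_new_lock", "process-A"));   // true: lock acquired
console.log(store.lock("a_new_lock", "process-B"));   // false: already held
console.log(store.unlock("a_new_lock", "process-B")); // false: B is not the owner
console.log(store.unlock("a_new_lock", "process-A")); // true: owner releases
console.log(store.lock("a_new_lock", "process-B"));   // true: lock is free again
```

The failed unlock by process-B is exactly the case the acquirer check exists for: without it, B would silently delete A's lock.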

What we need for the MongoDB lock operations:
A MongoDB collection; let us name it "locks", in the database "distlock".
A TTL index on the collection, which ensures that locks eventually get released if the owner process goes down without releasing its lock.

Let us create a TTL index on the collection locks:
db.locks.createIndex( { "updated": 1 }, { expireAfterSeconds: 600 } )


Let us create a lock named "newlock":
db.locks.insert({_id : "newlock", acquirer : "myid", updated : new Date()})


Now if another process tries to get "newlock", it will fail:

db.locks.insert({_id : "newlock", acquirer : "otherid", updated : new Date()})

WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 11000,
        "errmsg" : "E11000 duplicate key error collection: distlock.locks index: _id_ dup key: { : \"newlock\" }"
    }
})

Releasing the lock is a little tricky. We shouldn't delete the lock by its id alone; we should delete it only if both the lock id and the acquirer id match. Otherwise, a process may end up releasing a lock acquired by another process.
Let's release the lock "newlock":

db.locks.remove({_id : "newlock", acquirer : "myid"})

WriteResult({ "nRemoved" : 1 })


What happens if the process dies without releasing the lock? The lock will be released automatically by MongoDB after about 10 minutes, as the locks collection has a TTL index with expireAfterSeconds set to 600 seconds. (MongoDB's background TTL task runs every 60 seconds, so the actual removal can lag behind the expiry time.) While the lock acquirer is running, it should update the "updated" field every 10 seconds or so; otherwise a long-running operation that continues for more than 10 minutes after acquiring the lock would lose the lock to TTL expiry.
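The interplay between the TTL index and the heartbeat can be modeled in memory as well. The sketch below is a hypothetical model in plain JavaScript (TtlLockModel and its methods are illustrative names only). It uses an explicit clock in seconds so the behavior is deterministic, and it assumes expiry happens exactly at updated + expireAfterSeconds, whereas MongoDB's TTL task only runs periodically. It also shows why both the heartbeat and the release must match the acquirer: once a lock has expired and been taken by someone else, the stale owner can no longer touch it.

```javascript
// Hypothetical in-memory model of TTL expiry plus heartbeats (not a MongoDB client).
// Times are plain numbers of seconds; expiry is exact here, while real MongoDB
// removes expired documents in a background task that runs about every 60 s.
class TtlLockModel {
  constructor(expireAfterSeconds) {
    this.ttl = expireAfterSeconds;
    this.docs = new Map(); // _id -> { acquirer, updated }
  }

  // Stand-in for the TTL index: drop locks whose "updated" is ttl seconds old.
  expire(now) {
    for (const [id, doc] of this.docs) {
      if (now - doc.updated >= this.ttl) this.docs.delete(id);
    }
  }

  lock(id, acquirer, now) {
    this.expire(now);
    if (this.docs.has(id)) return false;
    this.docs.set(id, { acquirer, updated: now });
    return true;
  }

  // Heartbeat: only refreshes "updated" if the caller still owns the lock.
  refresh(id, acquirer, now) {
    this.expire(now);
    const doc = this.docs.get(id);
    if (!doc || doc.acquirer !== acquirer) return false;
    doc.updated = now;
    return true;
  }

  // Release: only the current owner may delete the lock.
  unlock(id, acquirer, now) {
    this.expire(now);
    const doc = this.docs.get(id);
    if (!doc || doc.acquirer !== acquirer) return false;
    this.docs.delete(id);
    return true;
  }
}

const model = new TtlLockModel(600);
model.lock("newlock", "A", 0);
// A heartbeats every 10 s for 15 minutes, so the lock outlives the 600 s TTL.
for (let sec = 10; sec <= 900; sec += 10) model.refresh("newlock", "A", sec);
console.log(model.lock("newlock", "B", 905));    // false: A still holds it
// A dies after its last heartbeat at t = 900, so the lock expires at t = 1500.
console.log(model.lock("newlock", "B", 1499));   // false
console.log(model.lock("newlock", "B", 1500));   // true: expired, B acquires
// The stale owner A can neither refresh nor release B's lock.
console.log(model.refresh("newlock", "A", 1510)); // false
console.log(model.unlock("newlock", "A", 1510));  // false
```

In a real client the refresh would be an update filtered on both _id and acquirer; a heartbeat that matches no document is one reasonable signal that the lock has been lost and the protected work should stop.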

The simple operations above demonstrate how we may implement a distributed exclusive lock using MongoDB. Hope it was helpful :)

The prototype implementation is available here:
https://gist.github.com/nipuntalukdar/e9c1db9a78b45266a4ccfcff0f8f24a4



1 comment:

  1. Good idea to have distributed lock. it TTL expiring we need to ensure that container that started lock completed the assigned task or if it failed to start, next container can take up that task.
