
Become a MongoDB DBA: Recovering your Data


In previous posts of our MongoDB DBA series, we have covered Deployment, Configuration, Monitoring (part 1), Monitoring (part 2) and backup. Now it is time to recover MongoDB using a backup we made in the previous blog post.

As we learned in the previous blog post, there are two types of backups: logical and physical backups. Physical backups for MongoDB are basically file system backups, and they are relatively easy to restore, so we will leave them out of scope today. The new Percona MongoDB Consistent Backup tool also allows you to make consistent snapshots of your MongoDB sharded clusters. This versatile tool is relatively new (beta) and still quite complex to use and restore from, but it is supported in ClusterControl 1.3.2. We will cover it in a future blog post.

In this blog post we will cover the use cases for restoring your logical backups with MongoDB. These use cases range from restoring a single node, to restoring a node in an existing replicaSet, to seeding a new node in a replicaSet.

MongoDB backup format

We have to explain the MongoDB backup format first. When performing a mongodump, all collections within the designated databases will be dumped as BSON output. If no database is specified, MongoDB will dump all databases except for the admin, test and local databases as they are reserved for internal use.
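For illustration, a minimal mongodump invocation (the database name and output path here are just examples) that dumps a single database into a directory of BSON files would be:

[root@node ~] mongodump --db mydatabase --out /backups/dump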

When you are running the MMAP storage engine, the dumps should be equal to the data you have on disk, as both MMAP and the dumps store data as BSON files. If you run WiredTiger, those files will naturally differ, as the storage engine stores them in a totally different format. Since the data is exported via a dump, its content is the same regardless of the storage engine.

Archive file

By default mongodump will create a directory called dump, with a directory for each database containing a BSON file per collection in that database. Alternatively you can tell mongodump to store the backup within one single archive file. The archive parameter will concatenate the output from all databases and collections into one single stream of binary data. Additionally the gzip parameter can naturally compress this archive, using gzip. In ClusterControl we stream all our backups, so we enable both the archive and gzip parameters.
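As a quick sketch (the archive path is just an example), a backup streamed into a single gzipped archive file looks like this:

[root@node ~] mongodump --archive=/backups/backup_10.10.34.12_2016-08-02_180808.gz --gzip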

Include the oplog

Similar to mysqldump with MySQL, if you create a backup in MongoDB, it will freeze the collections while dumping the contents to the backup file. As MongoDB does not support transactions you can’t make a 100% fully consistent backup, unless you create the backup with the oplog parameter. Enabling this on the backup includes the transactions from the oplog that were executing while making the backup. This behaviour is similar to Percona Xtrabackup that uses the MySQL binary log to capture all transactions during the backup process.
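A hedged example of a dump that includes the oplog (this only works when connected to a replicaSet member, and the output directory is illustrative):

[root@node ~] mongodump --oplog --out /backups/dump_with_oplog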

Restoring MongoDB from a backup

There are basically two ways you can use a BSON format dump:

  1. Run mongod directly from the backup directory
  2. Run mongorestore and restore the backup

Run mongod directly from a backup

A prerequisite for running mongod directly from the backup is that the backup target is a standard dump, and is not gzipped.

This command will start the MongoDB daemon with the data directory set to the backup location:

[root@node ~] mongod --dbpath /backups/backup_10.10.34.12_2016-08-02_180808/

The MongoDB daemon will then check the integrity of the data directory, add the admin database, journals, collection and index catalogues and some other files necessary to run MongoDB. Obviously if you ran WiredTiger as the storage engine before, it will now run the existing collections as MMAP. But for simple data dumps or integrity checks this works fine.

Run mongorestore

A better way to restore would obviously be by restoring the node using mongorestore:

[root@node ~] mongorestore /backups/backup_10.10.34.12_2016-08-02_180808/

In case of a backup made by ClusterControl (or one that is in archive and gzipped format) the command is slightly different:

[root@node ~] mongorestore --gzip --archive=/backups/backup_10.10.34.12_2016-08-02_180808.gz

This will restore the backup into the default server settings (localhost, port 27017) and overwrite any databases in the backup that reside on this server. Now there are tons of parameters to manipulate the restore process, and we will cover some of the important ones.
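For instance, to restore to a specific host and port, and drop existing collections before restoring them (host and port are illustrative), you could run something like:

[root@node ~] mongorestore --host 10.10.34.12 --port 27017 --drop --gzip --archive=/backups/backup_10.10.34.12_2016-08-02_180808.gz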

Object validation

As the backup contains BSON data, you would expect the contents of the backup to be correct. However it could have been the case that the document that got dumped was malformed to begin with. Mongodump does not check the integrity of the data it dumps. So the objcheck parameter will enable object checking before it gets sent to the server. This flag is enabled by default since MongoDB version 2.4. If you are running an older version, better to enable this flag.
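A sketch of a restore with object checking explicitly enabled:

[root@node ~] mongorestore --objcheck /backups/backup_10.10.34.12_2016-08-02_180808/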

Oplog replay

As we described earlier, adding the oplog to your backup will enable you to perform a consistent backup and do a point-in-time-recovery. Enable the oplogReplay parameter to apply the oplog during restore process. To control how far to replay the oplog, you can define a timestamp in the oplogLimit parameter. Only transactions up until the timestamp will then be applied.
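Assuming the backup was made with the oplog included, a point-in-time restore up to an example timestamp (the epoch value here is hypothetical) could look like this:

[root@node ~] mongorestore --oplogReplay --oplogLimit "1470161288:1" /backups/dump_with_oplog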

Restoring a full replicaSet from a backup

Restoring a replicaSet is not much different from restoring a single node. Either you set up the replicaSet first and restore directly into it, or you restore a single node first and then use this restored node to build a replicaSet.

Restore node first, then create replicaSet

We will start mongod on the first node running in the foreground:

[root@firstnode ~] mongod --dbpath /var/lib/mongodb/

Restore the node in a second terminal:

[root@firstnode ~] mongorestore --gzip --archive=/backups/backup_10.10.34.12_2016-08-02_180808.gz

Now stop mongod in the first terminal, and start it again with the replicaSet enabled:

[root@firstnode ~] mongod --dbpath /var/lib/mongodb/ --replSet <set_we_restore>

Then initiate the replicaSet in the second terminal:

[root@firstnode ~] mongo --eval "rs.initiate()"

After this you can add the other nodes to the replicaSet:

[root@firstnode ~] mongo
test:PRIMARY> rs.add("secondnode:27017")
{ "ok" : 1 }
test:PRIMARY> rs.add("thirdnode:27017")
{ "ok" : 1 }

Now the second and third node will sync their data from the first node. After the sync has finished our replicaSet has been restored.

Create a replicaSet first, then restore

In contrast to the previous process, you can create the replicaSet first. Configure all three hosts with the replicaSet enabled, start up all three daemons and initiate the replicaSet on the first node:

[root@firstnode ~] mongo
test:PRIMARY> rs.initiate()
{
    "info2" : "no configuration specified. Using a default configuration for the set",
    "me" : "firstnode:27013",
    "ok" : 1
}
test:PRIMARY> rs.add("secondnode:27017")
{ "ok" : 1 }
test:PRIMARY> rs.add("thirdnode:27017")
{ "ok" : 1 }

Now that we have created the replicaSet, we can directly restore our backup into it:

mongorestore --host <replicaset>/<host1>:27017,<host2>:27017,<host3>:27017 --gzip --archive=/backups/backup_10.10.34.12_2016-08-02_180808.gz

In our opinion restoring a replicaSet this way is much more elegant. It is closer to the way you would normally set up a new replicaSet from scratch, and then fill it with (production) data.

Seeding a new node in a replicaSet

When scaling out a cluster by adding a new node in MongoDB, an initial sync of the dataset must happen. With MySQL replication and Galera, we are accustomed to using a backup to seed the initial sync. With MongoDB this is possible, but only by making a binary copy of the data directory. If you don’t have the means to make a file system snapshot, you will have to face downtime on one of the existing nodes. The process, with downtime, is described below.

If you only have one node, you have to shut down the primary mongod:

[root@primary ~] service mongod stop

Start up the receiving side of the new node:

[root@newnode ~] cd /var/lib/mongodb
[root@newnode ~] nc -l 7000 | tar -xpf -

Make a copy of the data directory and copy this over to the new node:

[root@primary ~] tar -cf - /var/lib/mongodb | nc newnode 7000

Once the copy process has finished, start mongod on both nodes:

[root@primary ~] service mongod start
[root@newnode ~] service mongod start

And now you can add the new node to the replicaSet.

[root@primary ~] mongo
test:PRIMARY> rs.add("newnode:27017")
{ "ok" : 1 }

In the MongoDB log file, the joining process of the new node will look similar to this:

I REPL     [ReplicationExecutor] This node is newnode:27017 in the config
I REPL     [ReplicationExecutor] transition to STARTUP2
I REPL     [ReplicationExecutor] Member primary:27017 is now in state PRIMARY
I REPL     [ReplicationExecutor] syncing from: primary:27017
I REPL     [ReplicationExecutor] transition to RECOVERING
I REPL     [ReplicationExecutor] transition to SECONDARY

That’s it, you have just saved yourself a long syncing process. Obviously this caused downtime, which is normally unacceptable for a primary. You can also copy your data from a secondary instead, but downtime on a secondary could cause the primary to lose its majority in the cluster. If you are running MongoDB in a WAN environment, this is a good reason to enable filesystem snapshots in your environment.
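If your data directory does live on an LVM volume, a rough sketch of a snapshot-based copy (volume group, volume and mount point names are hypothetical) that avoids stopping mongod could look like this:

[root@secondary ~] mongo --eval 'db.fsyncLock()'
[root@secondary ~] lvcreate --size 10G --snapshot --name mongo-snap /dev/vg0/mongodb
[root@secondary ~] mongo --eval 'db.fsyncUnlock()'
[root@secondary ~] mkdir -p /mnt/mongo-snap
[root@secondary ~] mount /dev/vg0/mongo-snap /mnt/mongo-snap
[root@secondary ~] tar -cf - -C /mnt/mongo-snap . | nc newnode 7000

The fsyncLock/fsyncUnlock calls flush and briefly block writes on that node while the snapshot is taken, which is far shorter than a full file copy.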

Seeding with a backup

So what would happen if you restore the new node from a mongodump backup instead, and then have it join a replicaSet? Restoring from a backup should in theory give the same dataset. However, as this new node has been restored from a backup, it will lack the replicaSetId and MongoDB will notice. Since MongoDB doesn’t see this node as part of the replicaSet, the rs.add() command will then always trigger the MongoDB initial sync. The initial sync will always trigger deletion of any existing data on the MongoDB node.

If you would do such a thing, let the log file speak for itself:

I REPL     [ReplicationExecutor] This node is newnode:27017 in the config
I REPL     [ReplicationExecutor] transition to STARTUP2
I REPL     [ReplicationExecutor] Member primary:27017 is now in state PRIMARY
I REPL     [rsSync] ******
I REPL     [rsSync] creating replication oplog of size: 1514MB...
I STORAGE  [rsSync] Starting WiredTigerRecordStoreThread local.oplog.rs
I STORAGE  [rsSync] The size storer reports that the oplog contains 0 records totaling to 0 bytes
I STORAGE  [rsSync] Scanning the oplog to determine where to place markers for truncation
I REPL     [rsSync] ******
I REPL     [rsSync] initial sync pending
I REPL     [ReplicationExecutor] syncing from: primary:27017
I REPL     [rsSync] initial sync drop all databases
I STORAGE  [rsSync] dropAllDatabasesExceptLocal 2
I REPL     [rsSync] initial sync clone all databases

The replicaSetId is generated when initiating a replicaSet, and unfortunately can’t be set manually. That’s a shame as recovering from a backup (including replaying the oplog) would theoretically give us a 100% identical data set. It would be nice if the initial sync was optional in MongoDB to satisfy this use case.

ReplicaSet node recovery

It’s a different story for nodes that are already part of a replicaSet. If you have a broken node in your replicaSet, you can have it fixed by issuing a resync of the node. However if you are replicating over a WAN and your database / collection is in the terabyte range, it could take a long time to restore. This is when seeding would come in handy as well.

Conclusion

We have explained in this blog post various ways to restore a MongoDB node or replicaSet, and what the caveats are. MongoDB backups aren’t very difficult to restore, and in the worst case you can even run the database directly from the BSON files in the backup.

In the next blog post, we will start scaling out read requests with MongoDB, and discuss the things you should look out for!


Become a MongoDB DBA: How to scale reads


In previous posts of our “Become a MongoDB DBA” series, we covered Deployment, Configuration, Monitoring (part 1), Monitoring (part 2), backup and restore. From this blog post onwards, we shift our focus to the scaling aspects of MongoDB.

One of the cornerstones of MongoDB is that it is built with high availability and scaling in mind. Scaling can be done either vertically (bigger hardware) or horizontally (more nodes). Horizontal scaling is what MongoDB is good at, and it is not much more than spreading the workload to multiple machines. In effect, we’re making use of multiple low-cost commodity hardware boxes, rather than upgrading to a more expensive high performance server.

MongoDB offers both read- and write scaling, and we will uncover the differences of these two strategies for you. Whether to choose read- or write scaling all depends on the workload of your application: if your application tends to read more often than it writes data you will probably want to make use of the read scaling capabilities of MongoDB. Today we will cover MongoDB read scaling.

Read scaling considerations

With read scaling, we will scale out our read capacity. If you have used MongoDB before, or have followed this blog series, you may be aware that by default all reads end up on the primary. Even if your replicaSet contains nine nodes, your read requests still go to the primary. Why was this designed this way?

In principle, there are a few considerations to make before you start reading from a secondary node directly. First of all: replication is asynchronous, so not all secondaries will give the same results if you read the same data at the same point in time. Secondly: if you distribute read requests to all secondaries and use up too much of their capacity, and one of them becomes unavailable, the remaining secondaries may not be able to cope with the extra workload. Thirdly: on sharded clusters you should never bypass the shard router, as data may be out-of-date or may have been moved to another shard. Even if you do use the shard router and set the read preference correctly, it may still return incorrect data due to incomplete or terminated chunk migrations.

As you can see, these are serious considerations to make before scaling out your read queries on MongoDB. In general, unless your primary is not able to cope with the read workload it is receiving, we would advise against reading from secondaries. The price you pay for inconsistency is relatively high, compared to the benefits of offloading work from the primary.

The main issue here is the eventual consistency of MongoDB, so your application needs to be able to work around that. On the other hand, if your application is not bothered by stale data, analytics for instance, you could benefit greatly from using the secondaries.

Reading from a secondary

There are two things that are necessary to make reading from a secondary possible: tell the MongoDB client driver that you actually wish to read from a secondary (if possible) and tell the MongoDB secondary server that it is okay to read from this node.

Setting read preference

For the driver, all you have to do is set the read preference. When reading data you simply set the read preference to read from a secondary. Let’s go over each and every read preference and explain what it does:

  • primary: Always read from the primary (default)
  • primaryPreferred: Always read from the primary, read from a secondary if the primary is unavailable
  • secondary: Always read from a secondary
  • secondaryPreferred: Always read from a secondary, read from the primary if no secondary is available
  • nearest: Always read from the node with the lowest network latency

It is clear the default mode is the least preferred if you wish to scale out reads. PrimaryPreferred is not much better, as it will pick the primary 99.999% of the time. Still, if the primary becomes unavailable you will have a fallback for read requests.

Secondary should work fine for scaling reads, but as you leave out the primary, reads will have no fallback if no secondary is available. SecondaryPreferred is slightly better, but the reads will hit the secondaries almost all of the time, which still causes an uneven spread of reads. Also, if no secondaries are available, in most cases there will no longer be a cluster and the primary will demote itself to a secondary. Only when an arbiter is part of the cluster does the secondaryPreferred mode make sense.

Nearest will always pick the node with the lowest network latency. Even though this sounds great from an application perspective, it will not guarantee an even spread of read operations. But it works very well in multi-region setups where latency is high and delays are noticeable. In such cases, reading from the nearest node means your application will be able to serve out data with minimum latency.
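As a small illustration from the mongo shell (the collection name is hypothetical; a driver would normally get this via its options or connection string, e.g. the readPreference URI option), setting a read preference could look like this:

db.getMongo().setReadPref('secondaryPreferred')
// applies to this connection; individual queries can override it:
db.users.find().readPref('nearest')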

Filtering nodes with tags

In MongoDB you can tag nodes in a replicaSet. This allows you to make groupings of nodes and use them for many purposes, including filtering them when reading from secondary nodes.

An example of a replicaSet with tagging can be:

{
    "_id" : "myrs",
    "version" : 2,
    "members" : [
             {
                     "_id" : 0,
                     "host" : "host1:27017",
                     "tags" : {
                             "dc": "1",
                             "rack": "e3"
                     }
             }, {
                     "_id" : 1,
                     "host" : "host2:27017",
                     "tags" : {
                             "dc": "1",
                             "rack": "b2"
                     }
             }, {
                     "_id" : 0,
                     "host" : "host3:27017",
                     "tags" : {
                             "dc": "2",
                             "rack": "q1"
                     }
             }
    ]
}

This tagging allows us to limit our secondary to exist, for instance, in our first datacenter:

db.getMongo().setReadPref('secondaryPreferred', [ { "dc": "1" } ] )

Naturally the tags can be used with all read preference modes, except Primary.
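Tags are just part of the replicaSet configuration, so a minimal sketch of adding them to an existing member from the mongo shell (the member index and tag values are illustrative) would be:

cfg = rs.conf()
// add tags to the second member (index 1) of the configuration
cfg.members[1].tags = { "dc": "1", "rack": "b2" }
rs.reconfig(cfg)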

Enabling secondary queries

Apart from setting the read preference in the client driver, there is another limitation. By default MongoDB disables reads from a secondary server side, unless you specifically tell the host to allow read operations. Changing this is relatively easy, all you have to do is connect to the secondary and run this command:

myset:SECONDARY> rs.slaveOk()

This will enable reads on this secondary for all incoming connections. You can also run this command in your application, but that would then imply your application is aware it could encounter a server that did not enable secondary reads.

Reading from a secondary in a shard

It is also possible to read from a secondary node in MongoDB sharded clusters. The MongoDB shard router (mongos) will obey the read preference set in the request and forward the request to a secondary in the shard(s). This also means you will have to enable reads from a secondary on all hosts in the sharded environment.

And as said earlier: an issue that may arise with reading from secondaries on a sharded environment, is that it might be possible to receive incorrect data from a secondary. Due to the migration of data between shards, data may be in transit from one shard to another. Reading from a secondary may then return incomplete data.

Adding more secondary nodes

Adding more secondary nodes to a MongoDB replicaSet implies more replication overhead. However, unlike MySQL, syncing the oplog to secondaries is not limited to the primary node: MongoDB can also sync the oplog from other secondaries, as long as they are up to date with the primary. This means oplog servicing is also possible from other secondaries, which then effectively act as "intermediate" sources (chained replication). In theory, then, adding more secondaries has only a limited performance impact. Keep in mind that this also means data trickles down to the other nodes a bit more slowly, as the oplog entries now have to pass through at least two nodes.
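This chaining behaviour is controlled by the chainingAllowed replicaSet setting; a hedged sketch of disabling it from the mongo shell (forcing all secondaries to sync directly from the primary) looks like this:

cfg = rs.conf()
// force all secondaries to sync directly from the primary
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)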

Conclusion

We have described the impact of reading from secondaries in MongoDB, and the caveats to be aware of. If you don’t necessarily need to scale your reads, it is better not to perform this pre-emptive optimization. However, if you think your setup would benefit from offloading the primary, make sure you work around the issues described in this post.

ClusterControl Developer Studio: Custom database alerts by combining metrics


In the previous blog posts, we gave a brief introduction to the ClusterControl Developer Studio and the ClusterControl Domain Specific Language. We covered some useful examples, e.g., how to extract information from the Performance Schema, how to automatically have advisors scale your database clusters and how to create an advisor that keeps an eye on the MongoDB replication lag. ClusterControl Developer Studio is free to use, and included in the community version of ClusterControl. It allows you to write your own scriptlets, advisors and alerts. With just a few lines of code, you can already automate your clusters. All advisors are open source on Github, so anyone can contribute back to the community!

In this blog post, we will make things a little bit more complex than in our previous posts. Today we will be using our MongoDB replication window advisor that has been recently added to the Advisors Github repository. Our advisor will not only check on the length of the replication window, but also calculate the lag of its secondaries and warn us if the node would be in any risk of danger. For extra complexity, we will make this advisor compatible with a MongoDB sharded environment, and take a couple of edge cases into account to prevent false positives.

MongoDB Replication window

The MongoDB replication window advisor complements the MongoDB lag advisor. The lag advisor informs us of the number of seconds a secondary node is behind the primary/master. As the oplog is limited in size, having slave lag imposes the following risks:

  1. If a node lags too far behind, it may not be able to catch up anymore as the transactions necessary are no longer in the oplog of the primary.
  2. A lagging secondary node is less favoured in a MongoDB election for a new primary. If all secondaries are lagging behind, you have a problem, and the one with the least lag will be elected primary.
  3. Secondaries lagging behind are less favoured by the MongoDB driver when scaling out reads, and this also puts a higher workload on the remaining secondaries.

If we would have a secondary node lagging behind a few minutes (or hours), it would be useful to have an advisor that informs us how much time we have left before our next transaction will be dropped from the oplog. The time difference between the first and last entry in the oplog is called the Replication Window. This metric can be created by fetching the first and last items from the oplog, and calculating the difference of their timestamps.

Calculating the MongoDB Replication Window

In the MongoDB shell, there is already a function available that calculates the replication window for you. However this function is built into the command line shell, so any outside connection not using the command line shell will not have this built-in function:

mongo_replica_2:PRIMARY> db.getReplicationInfo()
{
           "logSizeMB" : 1894.7306632995605,
           "usedMB" : 501.17,
           "timeDiff" : 91223,
           "timeDiffHours" : 25.34,
           "tFirst" : "Wed Oct 12 2016 22:48:59 GMT+0000 (UTC)",
           "tLast" : "Fri Oct 14 2016 00:09:22 GMT+0000 (UTC)",
           "now" : "Fri Oct 14 2016 12:32:51 GMT+0000 (UTC)"
}

As you can see, this function has a rich output of useful information: the size of the oplog and how much of that has been used already. It also displays the time difference of the first and last items in the oplog.

It is easy to replicate this function by retrieving the first and last items from the oplog. Making use of the MongoDB aggregate function in one single query is tempting, however the oplog does not have any indexes on its fields. Running an aggregate function on a collection without indexes would require a full collection scan, which becomes very slow on an oplog that has a couple of million entries.

Instead we are going to send two individual queries: fetch the first record of the oplog in forward order and in reverse order. As the oplog is already a naturally ordered (capped) collection, sorting it in reverse natural order is cheap.

mongo_replica_2:PRIMARY> use local
switched to db local
mongo_replica_2:PRIMARY> db.oplog.rs.find().limit(1);
{ "ts" : Timestamp(1476312539, 1), "h" : NumberLong("-3302015507277893447"), "v" : 2, "op" : "n", "ns" : "", "o" : { "msg" : "initiating set" } }
mongo_replica_2:PRIMARY> db.oplog.rs.find().sort({$natural: -1}).limit(1);
{ "ts" : Timestamp(1476403762, 1), "h" : NumberLong("3526317830277016106"), "v" : 2, "op" : "n",  "ns" : "ycsb.usertable", "o" : { "_id" : "user5864876345352853020",
…
}

The overhead of both queries is very low and will not interfere with the functioning of the oplog.

In the example above, the replication window would be 91223 seconds (the difference of 1476403762 and 1476312539).

Intuitively you may think it only makes sense to do this calculation on the primary node, as this is the source for all write operations. However, MongoDB is a bit smarter than just serving out the oplog from the primary to all secondaries. Even though the secondary nodes will copy oplog entries from the primary, joining members can fetch the delta of transactions via other secondaries where possible. Secondary nodes may also prefer to fetch oplog entries from other secondaries with low latency, rather than from a primary with high latency. So it is better to perform this calculation on all nodes in the cluster.

As the replication window will be calculated per node, and we like to keep our advisor as readable as possible, we will abstract the calculation into a function:

function getReplicationWindow(host) {
  var replwindow = {};
  // Fetch the first and last record from the Oplog and take its timestamp
  var res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: 1}, limit: 1}');
  replwindow['first'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];
  res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: -1}, limit: 1}');
  replwindow['last'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];
  replwindow['replwindow'] = replwindow['last'] - replwindow['first'];
  return replwindow;
}

The function returns the timestamp of the first and last items in the oplog and the replication window as well. We will actually need all three in a later phase.

Calculating the MongoDB Replication lag

We covered the calculation of MongoDB replication lag in the previous Developer Studio blog post. You can calculate the lag by simply subtracting the secondary optimeDate (or optime timestamp) from the primary optimeDate. This will give you the replication lag in seconds.

The information necessary for the replication lag will be retrieved from only the primary node, using the replSetGetStatus command. Similarly to the replication window, we will abstract this into a function:

function getReplicationStatus(host, primary_id) {
    var node_status = {};
    var res = host.executeMongoQuery("admin", "{ replSetGetStatus: 1 }");
    // Fetch the optime and uptime per host
    for(i = 0; i < res["result"]["members"].size(); i++)
    {
        tmp = res["result"]["members"][i];
        host_id = tmp["name"];
        node_status[host_id] = {};
        node_status[host_id]["name"] = host_id;
        node_status[host_id]["primary"] = primary_id;
        node_status[host_id]["setname"] = res["result"]["set"];
        node_status[host_id]["uptime"] = tmp["uptime"];
        node_status[host_id]["optime"] = tmp["optime"]["ts"]["$timestamp"]["t"];
    }
    return node_status;
}

We keep a little bit more information than necessary here, but you will see why in the next paragraphs.

Calculating the time left per node

Now that we have calculated both the replication window and the replication lag per node, we can calculate the time left per node in which it can theoretically still catch up with the primary. Here we subtract the timestamp of the first entry in the oplog (of the primary) from the timestamp of the last executed transaction (optime) of the node.

// Calculate the replication window of the primary against the node's last transaction
replwindow_node = replstatus[host]['optime'] - replwindow_primary['first'];

But we are not done yet!

Adding another layer of complexity: sharding

In ClusterControl we see everything as a cluster. Whether you would run a single MongoDB server, a replicaSet or a sharded environment with config servers and shard replicaSets: they are all identified and administered as a single cluster. This means our advisor has to cope with both replicaSet and shard logic.

In the example code above, where we calculated the time left per node, our cluster had a single primary. In the case of a sharded cluster, we have to take into account that we have one replicaSet for the config servers and one for each shard. This means we have to store, per node, its primary node and use that one in the calculation.

The corrected line of code would be:

host_id = host.hostName() + ":" + host.port();
primary_id = replstatus_per_node[host_id]['primary'];
...
// Calculate the replication window of the primary against the node's last transaction
replwindow_node = replstatus['optime'] - replwindow_per_node[primary_id]['first'];

For readability we also need to include the replicaSet name in the messages we output via our advisor. If we would not do this, it would become quite hard to distinguish hosts, replicaSets and shards on large sharded environments.

msg = "The replication window for node " + host_id + " (" + replstatus["setname"] + ") is long enough.";

Take the uptime into account

Another possible impediment with our advisor is that it will start warning us immediately on a freshly deployed cluster:

  1. The first and last entry in the oplog will be within seconds of each other after initiating the replicaSet.
  2. Also on a newly created replicaSet, the probability of it being immediately used would be very low, so there is not much use in alerting on a (too) short replication window in this case either.
  3. At the same time a newly created replicaSet may also receive a huge write workload when a logical backup is restored, shortening all entries in the oplog to a very short timeframe. This especially becomes an issue if after this burst of writes, no more writes happen for a long time, as now the replication window becomes very short but also outdated.

There are three possible solutions for these problems to make our advisor more reliable:

  1. We parse the first item in the oplog. If it contains the replicaset initiating document we will ignore a too short replication window. This can easily be done alongside parsing the first record in the oplog.
  2. We also take the uptime into consideration. If the host’s uptime is shorter than our warning threshold, we will ignore a too short replication window.
  3. If the replication window is too short, but the newest item in the oplog is older than our warning threshold, this should be considered a false positive.

To parse the first item in the oplog we have to add these lines, where we identify it as a new set:

var res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: 1}, limit: 1}');
replwindow['first'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];

if (res["result"]["cursor"]["firstBatch"][0]["o"]["msg"] == "initiating set") {
    replwindow['newset'] = true;
}

Then later in the check for our replication window we can verify all three exceptions together:

if(replwindow['newset'] == true) {
    msg = "Host " + host_id + " (" + replstatus["setname"] + ") is a new replicaSet. Not enough entries in the oplog to determine the replication window.";
    advice.setSeverity(Ok);
    advice.setJustification("");
} else if (replstatus["uptime"] < WARNING_REPL_WINDOW) {
    msg = "Host " + host_id + " (" + replstatus["setname"] + ") only has an uptime of " + replstatus["uptime"] + " seconds. Too early to determine the replication window.";
    advice.setSeverity(Ok);
    advice.setJustification("");
}
else if (replwindow['last'] < (CmonDateTime::currentDateTime().toString("%s") - WARNING_REPL_WINDOW)) {
    msg = "Latest entry in the oplog for host " + host_id + " (" + replstatus["setname"] + ") is older than " + WARNING_REPL_WINDOW + " seconds. Determining the replication window would be unreliable.";
    advice.setSeverity(Ok);
    advice.setJustification("");
}

After this we can finally warn / advise on the replication window of the node.

Conclusion

That’s it! We have created an advisor that gathers information from various hosts / replicaSets and combines this to get an exact view of the replication status of the entire cluster. We have also shown how to make things more readable by abstracting code into functions. And the most important aspect is that we have shown how to also think bigger than just the standard replicaSet.

For completeness, here is the full advisor:

#include "common/helpers.js"
#include "cmon/io.h"
#include "cmon/alarms.h"

// It is advised to have a replication window of at least 24 hours, critical is 1 hour
var WARNING_REPL_WINDOW = 24*60*60;
var CRITICAL_REPL_WINDOW = 60*60;
var TITLE="Replication window";
var ADVICE_WARNING="Replication window too short. ";
var ADVICE_CRITICAL="Replication window too short for one hour of downtime / maintenance. ";
var ADVICE_OK="The replication window is long enough.";
var JUSTIFICATION_WARNING="It is advised to have a MongoDB replication window of at least 24 hours. You could try to increase the oplog size. See also: https://docs.mongodb.com/manual/tutorial/change-oplog-size/";
var JUSTIFICATION_CRITICAL=JUSTIFICATION_WARNING;


function main(hostAndPort) {

    if (hostAndPort == #N/A)
        hostAndPort = "*";

    var hosts   = cluster::mongoNodes();
    var advisorMap = {};
    var result= [];
    var k = 0;
    var advice = new CmonAdvice();
    var msg = "";
    var replwindow_per_node = {};
    var replstatus_per_node = {};
    var replwindow = {};
    var replstatus = {};
    var replwindow_node = 0;
    var host_id = "";
    for (i = 0; i < hosts.size(); i++)
    {
        // Find the primary and execute the queries there
        host = hosts[i];
        host_id = host.hostName() + ":" + host.port();

        if (host.role() == "shardsvr" || host.role() == "configsvr") {
            // Get the replication window of each nodes in the cluster, and store it for later use
            replwindow_per_node[host_id] = getReplicationWindow(host);

            // Only retrieve the replication status from the master
            res = host.executeMongoQuery("admin", "{isMaster: 1}");
            if (res["result"]["ismaster"] == true) {
                //Store the result temporary and then merge with the replication status per node
                var tmp = getReplicationStatus(host, host_id);
                for(o=0; o < tmp.size(); o++) {
                    replstatus_per_node[tmp[o]['name']] = tmp[o];
                }

                //replstatus_per_node = 
            }
        }
    }

    for (i = 0; i < hosts.size(); i++)
    {
        host = hosts[i];
        if (host.role() == "shardsvr" || host.role() == "configsvr") {
            msg = ADVICE_OK;

            host_id = host.hostName() + ":" + host.port();
            primary_id = replstatus_per_node[host_id]['primary'];
            replwindow = replwindow_per_node[host_id];
            replstatus = replstatus_per_node[host_id];
    
            // Calculate the replication window of the primary against the node's last transaction
            replwindow_node = replstatus['optime'] - replwindow_per_node[primary_id]['first'];
            // First check uptime. If the node is up less than our replication window it is probably no use warning
            if(replwindow['newset'] == true) {
              msg = "Host " + host_id + " (" + replstatus["setname"] + ") is a new replicaSet. Not enough entries in the oplog to determine the replication window.";
              advice.setSeverity(Ok);
              advice.setJustification("");
            } else if (replstatus["uptime"] < WARNING_REPL_WINDOW) {
                msg = "Host " + host_id + " (" + replstatus["setname"] + ") only has an uptime of " + replstatus["uptime"] + " seconds. Too early to determine the replication window.";
                advice.setSeverity(Ok);
                advice.setJustification("");
            }
            else if (replwindow['last'] < (CmonDateTime::currentDateTime().toString("%s") - WARNING_REPL_WINDOW)) {
              msg = "Latest entry in the oplog for host " + host_id + " (" + replstatus["setname"] + ") is older than " + WARNING_REPL_WINDOW + " seconds. Determining the replication window would be unreliable.";
              advice.setSeverity(Ok);
              advice.setJustification("");
            }
            else {
                // Check if any of the hosts is within the oplog window
                if(replwindow_node < CRITICAL_REPL_WINDOW) {
                    advice.setSeverity(Critical);
                    msg = ADVICE_CRITICAL + "Host " + host_id + " (" + replstatus["setname"] + ") has a replication window of " + replwindow_node + " seconds.";
                    advice.setJustification(JUSTIFICATION_CRITICAL);
                } else {
                    if(replwindow_node < WARNING_REPL_WINDOW)
                    {
                        advice.setSeverity(Warning);
                        msg = ADVICE_WARNING + "Host " + host_id + " (" + replstatus["setname"] + ") has a replication window of " + replwindow_node + " seconds.";
                        advice.setJustification(JUSTIFICATION_WARNING);
                    } else {
                        msg = "The replication window for node " + host_id + " (" + replstatus["setname"] + ") is long enough.";
                        advice.setSeverity(Ok);
                        advice.setJustification("");
                    }
                }
            }
    
            advice.setHost(host);
            advice.setTitle(TITLE);
            advice.setAdvice(msg);
            advisorMap[i]= advice;
        }
    }
    return advisorMap;
}

function getReplicationStatus(host, primary_id) {
    var node_status = {};
    var res = host.executeMongoQuery("admin", "{ replSetGetStatus: 1 }");
    // Fetch the optime and uptime per host
    for(i = 0; i < res["result"]["members"].size(); i++)
    {
        tmp = res["result"]["members"][i];
        node_status[i] = {};
        node_status[i]["name"] = tmp["name"];;
        node_status[i]["primary"] = primary_id;
        node_status[i]["setname"] = res["result"]["set"];
        node_status[i]["uptime"] = tmp["uptime"];
        node_status[i]["optime"] = tmp["optime"]["ts"]["$timestamp"]["t"];
    }
    return node_status;
}

function getReplicationWindow(host) {
  var replwindow = {};
  replwindow['newset'] = false;
  // Fetch the first and last record from the Oplog and take its timestamp
  var res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: 1}, limit: 1}');
  replwindow['first'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];
  if (res["result"]["cursor"]["firstBatch"][0]["o"]["msg"] == "initiating set") {
      replwindow['newset'] = true;
  }
  res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: -1}, limit: 1}');
  replwindow['last'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];
  replwindow['replwindow'] = replwindow['last'] - replwindow['first'];
  return replwindow;
}

Webinar: Become a MongoDB DBA - Scaling and Sharding


Join us for our third ‘How to become a MongoDB DBA’ webinar on Tuesday, November 15th! In this webinar we will uncover the secrets and caveats of MongoDB scaling and sharding.

Become a MongoDB DBA - Scaling and Sharding

MongoDB offers read and write scaling out of the box: adding secondary nodes will increase your potential read capacity, while adding shards will increase your potential write capacity. However, adding a new shard doesn’t necessarily mean it will be used. Choosing the wrong shard key may also cause uneven data distribution.

There is more to scaling than just simply adding nodes and shards. Factors to take into account include indexing, shard re-balancing, replication lag, capacity planning and consistency.

Learn with this webinar how to plan your scaling strategy up front and how to prevent ending up with unusable secondary nodes and shards. Finally, we’ll show you how to leverage ClusterControl’s MongoDB scaling capabilities and have ClusterControl manage your shards.

Date, Time & Registration

Europe/MEA/APAC

Tuesday, November 15th at 09:00 GMT / 10:00 CET (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, November 15th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)

Register Now

Agenda

  • What are the differences in read and write scaling with MongoDB
  • Read scaling considerations with MongoDB
  • MongoDB read preference explained
  • How sharding works in MongoDB
  • Adding new shards and balancing data
  • How to scale and shard MongoDB using ClusterControl
  • Live Demo

Speaker

Art van Scheppingen is a Senior Support Engineer at Severalnines. He’s a pragmatic database expert with over 16 years experience in web development. He previously worked at Spil Games as Head of Database Engineering, where he kept a broad vision upon the whole database environment: from MySQL to MongoDB, Vertica to Hadoop and from Sphinx Search to SOLR. He regularly presents his work and projects at various conferences (Percona Live, MongoDB Open House, FOSDEM) and related meetups.

We look forward to “seeing” you there!

This session is based upon the experience we have using MongoDB and implementing it for our database infrastructure management solution, ClusterControl. For more details, read through our ‘Become a MongoDB DBA’ blog series.

Become a MongoDB DBA: Sharding ins- and outs - part 1


So far in the “Become a MongoDB DBA” series, we covered Deployment, Configuration, Monitoring (part 1), Monitoring (part 2), backup, restore and read scaling.

In our latest post we showed how to offload read requests from the primary to secondaries. But what if your workload consists mostly of write requests? Vertical scaling would only be a temporary solution, and with growth outpacing Moore’s law, an alternative has to be found.

Read scaling is basically offloading requests to many secondary nodes in the same replicaSet, so write scaling should naturally be offloading requests to many primary nodes. MongoDB does not support multiple primary nodes in one replicaSet. However, if we shard our data, we can spread it among multiple replicaSets. Each of these replicaSets then handles a slice of the workload.

MongoDB supports sharding out of the box and it is relatively easy to set up, though there are a couple of considerations you need to take into account before sharding your data.

With that in mind, we’re starting a three part miniseries about MongoDB and sharding.

In this first part, we will cover the basics of sharding with MongoDB by setting up a sharded environment. If you wish to know more about sharding in general, please read our whitepaper on sharding with MySQL Fabric.

MongoDB sharding primer

The MongoDB sharding solution is similar to existing sharding frameworks for other major database solutions. It makes use of a typical lookup solution, where the sharding is defined in a shard-key and the ranges are stored inside a configuration database. MongoDB works with three components to find the correct shard for your data.

A typical sharded MongoDB environment looks like this:

Sharding tier

The first component used is the shard router called mongos. All read and write operations must be sent to the shard router, making all shards act as a single database for the client application. The shard router will route the queries to the appropriate shards by consulting the Configserver.

The Configserver is a special replicaSet that keeps the configuration of all shards in the cluster. The Configserver contains information about shards, databases, collections, shard keys and the distribution of chunks of data. Data gets partitioned by slicing the total dataset into smaller chunks of data, where these chunks are defined by the shard key. The shard key can be either a range or hash defined. These chunks are then distributed evenly over the total number of shards.

The router will know on which shard to place the data by finding the correct chunk in the Configserver. If the router thinks the chunk is becoming too large, it will automatically create a new chunk in the Configserver. The sharding metadata is stored in the config database, and this database is accessible via the shard router as well.

Prior to MongoDB 3.2 the Configserver used to be a total of three individual MongoDB nodes that were used to write the sharding metadata. In this setup the metadata was written to and read from all three nodes, and differences in data between the nodes meant an inconsistent write had happened, requiring manual intervention. If this happened, the balancer would no longer perform shard migrations and the shard router would no longer be able to create new chunks.

Shard tier

Each replicaSet in a MongoDB sharded cluster is treated as an individual shard. Adding a shard will increase the write capacity, but also increase the sharding complexity. Each shard is an individual component in the cluster and there is no direct communication between them. Shards don’t know anything about other shards in the cluster.

MongoDB distributes its data evenly by balancing the total number of chunks on each shard. If the number of chunks is not spread evenly, a balancing process can be run to migrate chunks from one shard to another.

This balancing process typically gets started from a shard router (mongos) that thinks the data is unbalanced. The shard router will acquire and set a lock on the balancing process in the config database on the Configserver.
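You can inspect and control the balancer from the shard router with the standard shell helpers, for example (stopping the balancer is handy during maintenance windows):

mongo --eval 'sh.getBalancerState()'
mongo --eval 'sh.isBalancerRunning()'
mongo --eval 'sh.stopBalancer()'
mongo --eval 'sh.startBalancer()'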

Setting up a simple sharded cluster

We will give an example of how to setup a simple sharded cluster, on a single machine. In our test example we will only use the bare minimum of a single node per replicaSet and a single node for the configserver. In production you should always use at least two nodes per replicaSet (and an arbiter) and three nodes for the Configserver.

First we create the paths for each instance to run:

mkdir /var/lib/mongodb/cfg /var/lib/mongodb/sh1 /var/lib/mongodb/sh2 /var/lib/mongodb/sh3

The first component to start is Configserver:

mongod --fork --dbpath /var/lib/mongodb/cfg --logpath cfg.log --replSet cfg --configsvr

Once it has started up properly, we need to initialize the Configserver, as it is in reality a replicaSet:

mongo --eval 'rs.initiate({"_id":"cfg", "members":[{"_id":1, "host":"127.0.0.1:27019"}]});' --port 27019

Now we can start the shard router (mongos) and connect to the Configserver:

mongos --fork --configdb cfg/127.0.0.1:27019 --logpath mongos.log

As you may notice in the lines above, we are using a different port number for connecting to the Configserver. In a sharded cluster, by default, the Configserver binds to port 27019 and the shard router to 27017. It is also considered bad practice to configure the shard nodes to bind to the default 27017 MongoDB port. This is done deliberately to prevent confusion about which port to connect to: the default port then only applies to the shard router.

Next we start up three individual replicaSets that will become shards later on:

mongod --fork --dbpath /var/lib/mongodb/sh1 --logpath sh1.log --port 27001 --replSet sh1
mongod --fork --dbpath /var/lib/mongodb/sh2 --logpath sh2.log --port 27002 --replSet sh2
mongod --fork --dbpath /var/lib/mongodb/sh3 --logpath sh3.log --port 27003 --replSet sh3

And, just like the configserver, we need to initialize these replicaSets as well:

mongo --eval 'rs.initiate({"_id":"sh1", "members":[{"_id":1, "host":"127.0.0.1:27001"}]});' --port 27001
mongo --eval 'rs.initiate({"_id":"sh2", "members":[{"_id":1, "host":"127.0.0.1:27002"}]});' --port 27002
mongo --eval 'rs.initiate({"_id":"sh3", "members":[{"_id":1, "host":"127.0.0.1:27003"}]});' --port 27003

Now we can add these replicaSets as shards via mongos:

mongo --eval 'sh.addShard("sh1/127.0.0.1:27001");'
mongo --eval 'sh.addShard("sh2/127.0.0.1:27002");'
mongo --eval 'sh.addShard("sh3/127.0.0.1:27003");'

In the addShard command you have to specify the full MongoDB connect string to add the shard. In our case we only have one host per replicaSet, but had we had a real production shard, it could have looked something like this:

mongo --eval 'sh.addShard("sh1/node1.domain.com:27018,node2.domain.com:27018,node3.domain.com:27018");'

Even though we now have a fully functioning sharded cluster, writing any data via the shard routers will only store the data on the shard with the least amount of data. This is because we have not enabled the database for sharding and defined a shard key for the collection we like to shard.

Setting up a sharded cluster using ClusterControl

We have just demonstrated how to set up a very simple test cluster, and it would be quite complex to show how to set up a full production cluster. So instead, and for greater ease, we will show how to set up a MongoDB sharded cluster using ClusterControl with its four step wizard.

The first step is to allow ClusterControl to ssh to the hosts that we are going to deploy upon.

The second step is the definition of shard routers and configservers:

Third step is defining the shards:

And in the last step we define the database specific settings, such as which version of MongoDB we will deploy:

The total time taken to fill in this wizard should be around 1 minute, and after pressing the Deploy button, ClusterControl will automatically create a newly sharded cluster.

This is a quick and easy way to get started with a MongoDB sharded cluster.

Shard key

As described in the MongoDB sharding primer paragraph, the shard key is one of the most important factors in the sharding of MongoDB. By defining the shard key, you also influence the effectiveness of the shards. The shard key is either an indexed field or part of an indexed compound field that is present in every document in your collection.

Based upon this field, the shard key defines a range of shard key values that get associated with a chunk. As we described earlier, the chunks are distributed evenly over the shards in the cluster, so this directly influences the effectiveness of the sharding.

An example of a shard key and its distribution can be seen below:

mongos> sh.status()
--- Sharding Status ---
… 
databases:
{  "_id" : "shardtest",  "primary" : "sh1",  "partitioned" : true }
    shardtest.collection
        shard key: { "_id" : 1 }
        unique: false
        balancing: true
        chunks:
            sh1    1
            sh2    2
            sh3    1
        { "_id" : { "$minKey" : 1 } } -->> { "_id" : 2 } on : sh3 Timestamp(6, 1)
        { "_id" : 2 } -->> { "_id" : 24 } on : sh1 Timestamp(5, 1)
        { "_id" : 24 } -->> { "_id" : 1516 } on : sh2 Timestamp(4, 1)
        { "_id" : 1516 } -->> { "_id" : { "$maxKey" : 1 } } on : sh2 Timestamp(6, 4)

As you can see, we defined a shard key on the identifier of the document and distributed this over three shards in total. The overview of the chunks per shard seems to be quite balanced and the ranges of the identifiers show on which shard the data resides.

Limitations of the shard key

Keep in mind that once you start sharding a collection, you can no longer change the shard key and update the values of the shard key fields.

Also, unique indexes are limited within sharding: only the _id index and the index (or compound index) on the shard key can be unique. This means you can’t shard a collection that doesn’t meet these requirements, nor place unique indexes on other fields after sharding the collection.

Influence of the shard key

As we mentioned earlier, the shard key influences the performance of the sharding. This means you can optimize your shard keys for read and/or writing data. The most determining factors for this are the cardinality, frequency and rate of change of your shard key. We will illustrate this with some examples.

Sequential writes

Assume you have an application that writes a lot of sequential data only once, for instance a click-tracking application, with the following document:

{ 
    "_id" : ObjectId("5813bbaae9e36caf3e4eab5a"), 
    "uuid" : "098f253e-6310-45a0-a6c3-940be6ed8eb4", 
    "clicks" : 4, 
    "pageurl" : "/blog/category/how-to", 
    "ts" : Timestamp(1477688235, 1) 
}

You can distribute your data in many ways with this document. Using the timestamp of the document would mean the data gets sharded on time-intervals. So your data gets routed sequentially to one shard until the chunk is full, then the router will point to the next shard until that chunk is full. As you can already conclude: this will not scale your writes. Only one single shard is performing all write operations, while the other shards are doing next to nothing in terms of write operations. This is illustrated in the picture below.

If scaling your write capacity is your concern, you could choose the UUID field as your shard key. This means data for each unique person will be sharded, so ranges of UUIDs will be created for each chunk of data. Data will naturally be written in a more distributed manner among shards than with the timestamp shard-key. There could still be unevenness if the UUIDs are not generated in a random way.

As the shard key on the UUID field scales your write operations a lot better, it may not be the desired shard key for analyzing your full dataset within a specific time range. As the shard router is unaware of the secondary index on the timestamp, each and every shard will be consulted to retrieve the required data. Every shard returns the data it has for this query, and the shard router combines this into a single record set that can be returned to the client. The UUID field may also suffer from a large number of documents for some of the UUIDs, so data can still be distributed unevenly.

An alternative would be using the MongoDB hashed sharding strategy. A hash function will be applied on the values of a shard key, where the hash function will make the distribution among shards pseudo-random. This means two near values of the shard key are unlikely to end up in the same chunk.

Consulting one single shard, and returning a sequential part of the chunk is always more desirable than receiving data back from many shards. Also combining many result sets on the shard router into a single result set is always slower than returning the result set directly from a single shard.
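To illustrate the hashed strategy described above, a hashed shard key on the uuid field of the click-tracking example (the collection name is made up) would be declared like this:

mongo --eval 'sh.shardCollection("shardtest.clicks", { "uuid": "hashed" });'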

Random writes

If your application writes data in a more uniform way, for instance a user on a social network, this would mean you would get both inserts and updates on your data. Suppose we have the following document:

{
    "_id" : ObjectId("5813c5a9e9e36caf3e4eab5b"),
    "uuid" : "098f253e-6310-45a0-a6c3-940be6ed8eb4",
    "firstname" : "John",
    "lastname" : "Doe",
    "email" : "john.doe@gmail.com",
    "registration_time" : Timestamp(1477690793, 1)
}

Even though the write operations of new users (inserts) are happening sequentially, users who freshen up their profile with new details will also cause writes. This means the writing of data can be a bit more evenly distributed than the sequential writes example we gave earlier. We can anticipate this and choose our shard key accordingly. Choosing the timestamp field would not make sense in this case, as updates on a user document would require a write operation to all shards to find the correct record. A better candidate would be the UUID generated for the user. This would distribute the users evenly over the shards. If your UUID is generated randomly, the inserts of new users will also be distributed evenly.

As long as you use the UUID to access the data, reading back user data is also very efficient. But similar to the click-tracking example, selecting a range of users that registered in a certain timeframe would require consulting each and every shard again. This can’t be overcome easily.

The shard key paradigm

As you can see, the shard key influences the way you read and write your data. Choosing it wrongly could make data retrieval and updating data very slow. That’s why it is so important to choose the right shard key up front.

To ease the pain, you could also store certain data multiple times in different collections, and shard your data in various ways. For instance: the ranges of users that registered in a certain timeframe could be stored in a secondary collection. This collection only contains the references to the documents in the other collection, including the timestamp, and is sharded on the timestamp field. The downside is that you now have to keep the same data administered in two places.
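
A minimal sketch of this approach (the database and collection names are assumptions for illustration): the lookup collection holds only the reference plus the timestamp, and is sharded on the timestamp:

mongos> sh.enableSharding("socialnet")
mongos> sh.shardCollection("socialnet.registrations_by_time", { "registration_time": 1 })
mongos> use socialnet
mongos> db.registrations_by_time.insert({ "user_id": ObjectId("5813c5a9e9e36caf3e4eab5b"), "registration_time": Timestamp(1477690793, 1) })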

Enabling sharding on our cluster

Now that we have covered the theory behind the shard keys, we can enable sharding on the database in our example cluster:

mongo --eval 'sh.enableSharding("shardtest");'

Without enabling sharding on this database, we will not be able to shard the collections inside it. Once sharding is enabled, any collection that has not (yet) been sharded will reside on the primary shard. The primary shard is the shard with the least amount of data at the moment the database is enabled for sharding. You can change the primary shard assignment afterwards with the movePrimary command.
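
For example, moving the primary shard of the shardtest database to sh2 would look like this (a sketch; run it through a mongos):

mongos> db.adminCommand({ movePrimary: "shardtest", to: "sh2" })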

We need to configure the shard key separately by sharding the collection:

mongo --eval 'sh.shardCollection("shardtest.collection", {"_id": 1});'

Our shard key is set on the identifier field of the collection. If we now insert many large documents, this will give us a nicely sharded collection:

mongo shardtest --eval 'data="a";for (i=1; i <= 10000; i++) { data=data+"a"; db.collection.insert({"_id": i, "data": data}); }; '

And this is what the shard distribution looks like after inserting all 10000 documents:

mongos> sh.status()
--- Sharding Status ---
...
  databases:
    {  "_id" : "shardtest",  "primary" : "sh1",  "partitioned" : true }
        shardtest.collection
            shard key: { "_id" : 1 }
            unique: false
            balancing: true
            chunks:
                sh1    4
                sh2    4
                sh3    3
            { "_id" : { "$minKey" : 1 } } -->> { "_id" : 2 } on : sh1 Timestamp(5, 1)
            { "_id" : 2 } -->> { "_id" : 3 } on : sh1 Timestamp(1, 2)
            { "_id" : 3 } -->> { "_id" : 839 } on : sh2 Timestamp(6, 1)
            { "_id" : 839 } -->> { "_id" : 1816 } on : sh2 Timestamp(2, 3)
            { "_id" : 1816 } -->> { "_id" : 2652 } on : sh3 Timestamp(4, 1)
            { "_id" : 2652 } -->> { "_id" : 3629 } on : sh3 Timestamp(3, 3)
            { "_id" : 3629 } -->> { "_id" : 4465 } on : sh1 Timestamp(4, 2)
            { "_id" : 4465 } -->> { "_id" : 5442 } on : sh1 Timestamp(4, 3)
            { "_id" : 5442 } -->> { "_id" : 6278 } on : sh2 Timestamp(5, 2)
            { "_id" : 6278 } -->> { "_id" : 7255 } on : sh2 Timestamp(5, 3)
            { "_id" : 7255 } -->> { "_id" : { "$maxKey" : 1 } } on : sh3 Timestamp(6, 0)

You can see our collection consists of several ascending ranges, each residing on a single shard. This is due to our linear insertion of documents: at any point in time, all inserts went to the single shard holding the chunk at the top of the range.

As we described in the theory earlier, we can prevent this by enabling a hashing algorithm in the shard key definition. Note that the shard key of an existing collection can't be changed, so the collection has to be dropped and recreated first:

mongo --eval 'sh.shardCollection("shardtest.collection", {"_id":"hashed"});'

If we now re-insert the same data, and look at the sharding distribution, we see it created very different ranges of values for the shard key:

mongos> sh.status()
--- Sharding Status ---
… 
  databases:
    {  "_id" : "shardtest",  "primary" : "sh1",  "partitioned" : true }
        shardtest.collection
            shard key: { "_id" : "hashed" }
            unique: false
            balancing: true
            chunks:
                sh1    3
                sh2    4
                sh3    3
            { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-6148914691236517204") } on : sh3 Timestamp(4, 0)
            { "_id" : NumberLong("-6148914691236517204") } -->> { "_id" : NumberLong("-4643409133248828314") } on : sh1 Timestamp(4, 1)
            { "_id" : NumberLong("-4643409133248828314") } -->> { "_id" : NumberLong("-3078933977000923388") } on : sh1 Timestamp(3, 12)
            { "_id" : NumberLong("-3078933977000923388") } -->> { "_id" : NumberLong("-3074457345618258602") } on : sh1 Timestamp(3, 13)
            { "_id" : NumberLong("-3074457345618258602") } -->> { "_id" : NumberLong(0) } on : sh2 Timestamp(3, 4)
            { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("1545352804953253256") } on : sh2 Timestamp(3, 8)
            { "_id" : NumberLong("1545352804953253256") } -->> { "_id" : NumberLong("3067091117957092580") } on : sh2 Timestamp(3, 9)
            { "_id" : NumberLong("3067091117957092580") } -->> { "_id" : NumberLong("3074457345618258602") } on : sh2 Timestamp(3, 10)
            { "_id" : NumberLong("3074457345618258602") } -->> { "_id" : NumberLong("6148914691236517204") } on : sh3 Timestamp(3, 6)
            { "_id" : NumberLong("6148914691236517204") } -->> { "_id" : { "$maxKey" : 1 } } on : sh3 Timestamp(3, 7)

As shown in the status above, the chunks are spread almost evenly over the shards. As the hashing algorithm ensures the hashed values are nonlinear, our linear inserts have now been spread across all shards. In our test case the inserts were performed by a single thread, but in a production environment with high concurrency the difference in write performance would be significant.

Conclusion

In this first part we have shown how MongoDB sharding works, and how to set up your own sharded environment. Making the choice for a shard key should not be taken lightly: it determines the effectiveness of your sharded cluster and can’t be altered afterwards.

In the next part of the sharding series we will focus on what you need to know about monitoring and maintaining shards.

Join our live webinar on how to scale and shard MongoDB


We’re live next Tuesday, November 15th, with our webinar ‘Become a MongoDB DBA - Scaling and Sharding’!

Join us and learn about the three components necessary for MongoDB sharding. We’ll also share a read scaling considerations checklist as well as tips & tricks for finding the right shard key for MongoDB.

Overall, we’ll discuss how to plan your MongoDB scaling strategy up front and how to prevent ending up with unusable secondary nodes and shards. And we’ll look at how to leverage ClusterControl’s MongoDB scaling and shards management capabilities.

Sign up below!

Date, Time & Registration

Europe/MEA/APAC

Tuesday, November 15th at 09:00 GMT / 10:00 CET (Germany, France, Sweden)
Register Now

North America/LatAm

Tuesday, November 15th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)
Register Now

Agenda

  • What are the differences in read and write scaling with MongoDB
  • Read scaling considerations with MongoDB
  • MongoDB read preference explained
  • How sharding works in MongoDB
  • Adding new shards and balance data
  • How to scale and shard MongoDB using ClusterControl
  • Live Demo

Speaker

Art van Scheppingen is a Senior Support Engineer at Severalnines. He’s a pragmatic database expert with over 16 years experience in web development. He previously worked at Spil Games as Head of Database Engineering, where he kept a broad vision upon the whole database environment: from MySQL to MongoDB, Vertica to Hadoop and from Sphinx Search to SOLR. He regularly presents his work and projects at various conferences (Percona Live, MongoDB Open House, FOSDEM) and related meetups.

We look forward to “seeing” you there!

This session is based upon the experience we have using MongoDB and implementing it for our database infrastructure management solution, ClusterControl. For more details, read through our ‘Become a MongoDB DBA’ blog series.

Become a MongoDB DBA: Sharding ins- and outs - part 2


In previous posts of our “Become a MongoDB DBA” series, we covered Deployment, Configuration, Monitoring (part 1), Monitoring (part 2), backup, restore, read scaling and sharding (part 1).

In the previous post we did a primer on sharding with MongoDB. We covered not only how to enable sharding on a database, and define the shard key on a collection, but also explained the theory behind it.

Once enabled on a database and collection, the data stored will keep growing and more and more chunks will be in use. Just like any database requires management, also shards need to be looked after. Some of the monitoring and management aspects of sharding, like backups, are different than with ordinary MongoDB replicaSets. Also operations may lead to scaling or rebalancing the cluster. In this second part we will focus on the monitoring and management aspects of sharding.

Monitoring shards

The most important aspect of sharding, is monitoring its performance. As the write throughput of a sharded cluster is much higher than before, you might encounter other scaling issues. So it is key to find your next bottleneck.

Connections

The most obvious one is the number of connections going to each primary in the shards. Any range query that does not use the shard key will be sent to every shard. If these queries are not covered by any (usable) index, you might see a large increase in connections going from the shard router to the primary of each shard. Luckily a connection pool is used between the shard router and the primaries, so unused connections are reused.

You can keep an eye on the connection pool via the connPoolStats command:

mongos> db.runCommand( { "connPoolStats" : 1 } )
{
    "numClientConnections" : 10,
    "numAScopedConnections" : 0,
    "totalInUse" : 4,
    "totalAvailable" : 8,
    "totalCreated" : 23,
    "hosts" : {
        "10.10.34.11:27019" : {
            "inUse" : 1,
            "available" : 1,
            "created" : 1
        },
        "10.10.34.12:27018" : {
            "inUse" : 3,
            "available" : 1,
            "created" : 2
        },
        "10.10.34.15:27018" : {
            "inUse" : 0,
            "available" : 1,
            "created" : 1
        }
    },
    "replicaSets" : {
        "sh1" : {
            "hosts" : [
                {
                    "addr" : "10.10.34.12:27002",
                    "ok" : true,
                    "ismaster" : true,
                    "hidden" : false,
                    "secondary" : false,
                    "pingTimeMillis" : 0
                }
            ]
        },
...


    "ok" : 1
}

The output of this command gives you combined stats, per-host stats and, per replicaSet in the cluster, which host is the primary. Unfortunately you will still have to figure out yourself which host is part of which component.
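
To save some of that manual work, a small sketch like the one below can be run on the shard router to print the primary of each replicaSet straight from the connPoolStats output:

// Print the primary of each replicaSet known to this mongos
var stats = db.runCommand({ "connPoolStats": 1 });
for (var rsName in stats.replicaSets) {
    stats.replicaSets[rsName].hosts.forEach(function(host) {
        // ismaster is true for the current primary of the replicaSet
        if (host.ismaster) {
            print(rsName + " primary: " + host.addr);
        }
    });
}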

Capacity planning

Another important set of metrics to watch is the total number of chunks, the chunks per node and the available disk space on your shards. Combined, these should give a fairly good indication of how soon it will be time to scale out with another shard.

You can fetch the chunks per shard from the shard status command:

mongos> sh.status()
--- Sharding Status ---
… 
databases:
{  "_id" : "shardtest",  "primary" : "sh1",  "partitioned" : true }
    shardtest.collection
        shard key: { "_id" : 1 }
        unique: false
        balancing: true
        chunks:
            sh1    1
            sh2    2
            sh3    1

Caution has to be taken here: the shard status command does not produce valid JSON output; it is rather inconsistently formatted, human-readable text. Alternatively you can fetch the very same information from the config database via the shard router (it actually resides on the Configserver replicaSet):

mongos> use config
switched to db config
mongos> db.chunks.aggregate([{$group: {"_id": {"ns": "$ns", "shard": "$shard"}, "total_chunks": {$sum: 1}}}])
{ "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_1" }, "total_chunks" : 330 }
{ "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_0" }, "total_chunks" : 328 }
{ "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_2" }, "total_chunks" : 335 }

The chunks collection is relatively small and indexed by default, so running this aggregation occasionally does not pose a big risk.
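
To combine the chunk counts with disk usage, a quick sketch on the shard router is to list all databases with their total size on disk (listDatabases via a mongos sums the sizes over all shards):

// Print the size on disk per database, in megabytes
var dbs = db.getSiblingDB("admin").runCommand({ "listDatabases": 1 }).databases;
dbs.forEach(function(d) {
    print(d.name + ": " + Math.round(d.sizeOnDisk / 1024 / 1024) + " MB");
});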

Non-sharded databases and collections

As we described in the previous post, non-sharded databases and collections will be assigned to a default primary shard. This means the database or collection is limited to the size of this primary shard, and if written to in large volumes, could use up all remaining disk space of a shard. Once this happens the shard will obviously no longer function. Therefore it is important to watch over all existing databases and collections, and scan the config database to validate that they have been enabled for sharding.

This short script will show you the non-sharded collections on the MongoDB command line client:

use config;
// Build a lookup of all sharded collections (their _id is "<db>.<collection>")
var shard_collections = db.collections.find();
var sharded_names = {};
while (shard_collections.hasNext()) {
    shard = shard_collections.next();
    sharded_names[shard._id] = 1;
}

// Loop over all databases and collections, and print the ones that are not sharded
var admin_db = db.getSiblingDB("admin");
dbs = admin_db.runCommand({ "listDatabases": 1 }).databases;
dbs.forEach(function(database) {
    if (database.name != "config") {
        db = db.getSiblingDB(database.name);
        cols = db.getCollectionNames();
        cols.forEach(function(col) {
            if (col != "system.indexes") {
                if (sharded_names[database.name + "." + col] != 1) {
                    print(database.name + "." + col);
                }
            }
        });
    }
});

It first retrieves a list of all sharded collections and saves this for later usage. Then it loops over all databases and collections, and checks if they have been sharded or not.

Maintaining shards

Once you have a sharded environment, you also need to maintain it. Basic operations, like adding shards, removing shards, rebalancing shards and making backups, ensure you keep your cluster healthy and prepared for disaster. Once a shard is full, you can no longer perform write operations on it, so it is essential to add new shards before that happens.

Adding a shard

Adding a shard is really simple: create a new replicaSet, and once it is up and running just simply add it with the following command on one of the shard routers:

mongos> sh.addShard("<replicaset_name>/<host>:<port>")

It suffices to add one host of the replicaSet, as this will seed the shard router with a host it can connect to and detect the remaining hosts.

After this it will add the shard to the cluster, and immediately make it available for all sharded collections. This also means that after adding a shard, the MongoDB shard balancer will start balancing all chunks over all shards. Since the capacity has increased and an empty shard has appeared, this means the balancer will cause an additional read and write load on all shards in the cluster. You may want to disable the balancer, if you are adding the extra shard during peak hours. Read more in the MongoDB Balancer section further down on why this happens and how to disable the balancer in these cases.

Removing a shard

Removing a shard will not be done often, as most people scale out their clusters. But just in case you ever need it, this section will describe how to remove a shard.

It is a bit harder to remove a shard than to add a shard, as this involves removing the data as well. To remove a shard, you need to find the name of the shard first.

mongos> db.adminCommand( { listShards: 1 } )
{
    "shards" : [
{ "_id" : "sh1", "host" : "sh1/10.10.34.12:27018" },
{ "_id" : "sh2", "host" : "sh2/10.10.34.15:27018" }
    ],
    "ok" : 1
}

Now we can request MongoDB to remove it, using the adminCommand:

mongos> db.adminCommand( { removeShard: "sh2" } )
{
    "msg" : "draining started successfully",
    "state" : "started",
    "shard" : "sh2",
    "note" : "you need to drop or movePrimary these databases",
    "dbsToMove" : [ ],
    "ok" : 1
}

This will start a balancing process that will migrate all data from this shard to the remaining shards. Depending on your dataset, this could take somewhere between minutes to hours to finish. Also keep in mind that you must have enough disk space available on the remaining shards, to be able to migrate the data. If not, the balancer will stop after one of the shards is full.

To watch the progress you can run the removeShard command once more:

mongos> db.adminCommand( { removeShard: "sh2" } )
{
    "msg" : "draining ongoing",
    "state" : "ongoing",
    "remaining" : {
        "chunks" : NumberLong(2),
        "dbs" : NumberLong(0)
    },
    "note" : "you need to drop or movePrimary these databases",
    "dbsToMove" : [ notsharded ],
    "ok" : 1
}

In the output from this command you can see the attribute “dbsToMove” is an array containing the database notsharded. If the array contains databases, this means this shard is the primary shard for these databases. Before removing the shard successfully, we need to drop or move the databases first. Moving is performed with the movePrimary command:

mongos> db.adminCommand( { movePrimary : "notsharded", to : "sh1" } )

Once there are no more primary databases on the shard and the balancer is done with migrating data, it will wait for you to run the removeShard command once more. It will then output the state completed and finally remove the shard:

mongos> db.adminCommand( { removeShard: "sh2" } )
{
    "msg" : "removeshard completed successfully",
    "state" : "completed",
    "shard" : "sh2",
    "ok" : 1
}

MongoDB Balancer

We have mentioned the MongoDB balancer a couple of times before. The balancer is a very basic process, that has no other task than to keep the number of chunks per collection equal on every shard. So in reality it does nothing else than move around chunks of data between shards, until it is satisfied with the balance of chunks. This means it can also work against you in some cases.

The most obvious case where it can go wrong is if you add a new shard with a larger storage capacity than the other shards. Having a shard with more capacity than the others means the shard router will most likely add all new chunks on the shard with the largest capacity available. This means once a new chunk has been created on the new shard, another chunk will be moved to another shard to keep the number of chunks in balance. Therefore it is advisable to give equal storage space to all shards.

Another case where it can go wrong, is if your shard router splits chunks when you insert your data randomly. If these splits happen more often on one shard than the others, this means some of the existing chunks may be moved to other shards, and range queries on these chunks work less effectively as more shards need to be touched.

There are a couple of ways to influence the balancer. The balancer could create an additional IO workload on your data nodes, so you may wish to schedule the balancer to run only in off hours:

mongos> use config
switched to db config
mongos> db.settings.update(
   { _id: "balancer" },
   { $set: { activeWindow : { start : "22:00", stop : "08:00" } } },
   { upsert: true }
)

If necessary you can also disable the balancer for a single collection. This helps against the unnecessary moving of chunks we described earlier, and you may also want to do this for collections containing archived documents. To disable the balancer on a single collection, just run the following command:

mongos> sh.disableBalancing("mydata.hugearchive")
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Also during backups you need to disable the balancer to make a reliable backup, as otherwise chunks of data will be moved between shards. To disable the balancer, just run the following command:

mongos> sh.setBalancerState(false);
WriteResult({ "nMatched" : 0, "nUpserted" : 1, "nModified" : 0, "_id" : "balancer" })

Don’t forget to enable the balancer again after the backup has finished. It may also be better to use a backup tool that takes a consistent backup across all shards, instead of using mongodump. More in the next section.
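
To re-enable the balancer after the backup and verify its state, you can use the following shell helpers:

mongos> sh.setBalancerState(true);
mongos> sh.getBalancerState();
true
mongos> sh.isBalancerRunning();
false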

Backups

We have covered MongoDB backups in one of the previous posts. Everything in that blog post applies to replicaSets. Using the ordinary backup tools will allow you to make a backup of each replicaSet in the sharded cluster. With a bit of orchestration you could have them start at the same time, however this does not give you a consistent backup of the entire sharded cluster.

The problem lies in the size of each shard and the Configserver. The bigger the shard, the longer it takes to back it up. This means that if you make a backup, the backup of the Configserver will probably finish first, and the largest shard a very long time after it. The Configserver backup will then be missing the entries written to the shards in the meantime, and the same inconsistency exists between the shards themselves. So using conventional tools to make a consistent backup is almost impossible, unless you can orchestrate every backup to finish at exactly the same time.

That’s exactly what the Percona MongoDB consistent backup tool solves. It will orchestrate that every backup starts at the exact same time, and once it finishes backing up one of the replicaSets, it will continue to stream the oplog of that replicaSet until the last shard has finished. Restoring such a backup requires the additional oplog entries to be replayed against the replicaSets.

Managing and monitoring MongoDB shards with ClusterControl

Adding shards

Within ClusterControl you can easily add new shards with a two step wizard, opened from the actions drop down:

Here you can define the topology of the new shard.

Once the new shard has been added to the cluster, the MongoDB shard router will use it to assign new chunks to, and the balancer will automatically balance all chunks over all the shards.

Removing shards

In case you need to remove shards, you can simply remove them via the actions drop down:

This will allow you to select the shard that you wish to remove, and the shard you wish to migrate any primary databases to:

The job that removes the shard will then perform similar actions as described earlier: it will move any primary databases to the designated shard, enable the balancer and then wait for it to move all data from the shard.

Once all the data has been removed, it will remove the shard from the UI.

Consistent backups

In ClusterControl we have enabled the support for the Percona MongoDB Consistent backup tool:

To allow ClusterControl to backup with this tool, you need to install it on the ClusterControl node. ClusterControl will then use it to orchestrate a consistent backup of the MongoDB sharded cluster. This will create a backup for each replicaSet in the cluster, including the Configserver.

Conclusion

In this second blog on MongoDB sharding, we have shown how to monitor your shards, how to add or remove shards, how to rebalance them, and how to ensure you are prepared for disaster. We also have demonstrated how these operations can be performed or scheduled from ClusterControl.

In our next blog, we will dive into a couple of good sharding practices, recommended topologies and how to choose an effective sharding strategy.

Secure MongoDB and Protect Yourself from the Ransom Hack


In this blogpost we look at the recent concerns around MongoDB ransomware and security issues, and how to mitigate this threat to your own MongoDB instance.

Recently, various security blogs raised concern that a hacker is hijacking MongoDB instances and asking ransom for the data stored. It is not the first time unprotected MongoDB instances have been found vulnerable, and this stirred up the discussion around MongoDB security again.

What is the news about?

About two years ago, Saarland University in Germany alerted that they had discovered around 40,000 MongoDB servers that were easily accessible on the internet. This meant anyone could open a connection to a MongoDB server via the internet. How did this happen?

Default binding

In the past, the MongoDB daemon bound itself to any interface. This means anyone who has access to any of the interfaces on the host where MongoDB is installed, will be able to connect to MongoDB. If the server is directly connected to a public ip address on one of these interfaces, it may be vulnerable.

Default ports

By default, MongoDB will bind to standard ports: 27017 for MongoDB replicaSets or Shard Routers, 27018 for shards and 27019 for Configservers. By scanning a network for these ports it becomes predictable if a host is running MongoDB.

Authentication

By default, MongoDB configures itself without any form of authentication enabled. This means MongoDB will not prompt for a username and password, and anyone connecting to MongoDB will be able to read and write data. Authentication has been part of the product since MongoDB 2.0, but it has never been part of the default configuration.

Authorization

Part of enabling authorization is the ability to define roles. Without authentication enabled, there will also be no authorization. This means anyone connecting to a MongoDB server without authentication enabled will have administrative privileges too. Administrative privileges stretch from defining users to configuring the MongoDB runtime.

Why is all this an issue now?

In December 2016 a hacker exploited these vulnerabilities for personal enrichment. The hacker steals and removes your data, and leaves the following message in the WARNING collection:

{
     "_id" : ObjectId("5859a0370b8e49f123fcc7da"),
     "mail" : "harak1r1@sigaint.org",
     "note" : "SEND 0.2 BTC TO THIS ADDRESS 13zaxGVjj9MNc2jyvDRhLyYpkCh323MsMq AND CONTACT THIS EMAIL WITH YOUR IP OF YOUR SERVER TO RECOVER YOUR DATABASE !"
}

Demanding 0.2 bitcoins (around $200 at the moment of writing) may not sound like a lot if you really want your data back. However, in the meantime your website/application is not able to function normally and may be defaced, and this could potentially cost way more than the 0.2 bitcoins.

A MongoDB server is vulnerable when it has a combination of the following:

  • Bound to a public interface
  • Bound to a default port
  • No (or weak) authentication enabled
  • No firewall rules or security groups in place

The default port could be debatable. Any port scanner would also be able to identify MongoDB if it was placed under an obscured port number.

The combination of all four factors means any attacker may be able to connect to the host. Without authentication (and authorization) the attacker can do anything with the MongoDB instance. And even if authentication has been enabled on the MongoDB host, it could still be vulnerable.

Using a network port scanner (e.g. nmap) would reveal the MongoDB build info to the attacker. This means he/she is able to find potential (zero-day) exploits for your specific version, and still manage to compromise your setup. Also weak passwords (e.g. admin/admin) could pose a threat, as the attacker would have an easy point of entry.

How can you protect yourself against this threat?

There are various precautions you can take:

  • Put firewall rules or security groups in place
  • Bind MongoDB only to necessary interfaces and ports
  • Enable authentication, users and roles
  • Backup often
  • Security audits

For new deployments performed from ClusterControl, we enable authentication by default, create a separate administrator user and allow MongoDB to listen on a different port than the default. The only part ClusterControl can't set up is whether the MongoDB instance is available from outside your network.


Securing MongoDB

The first step to secure your MongoDB server, would be to place firewall rules or security groups in place. These will ensure only the client hosts/applications necessary will be able to connect to MongoDB. Also make sure MongoDB only binds to the interfaces that are really necessary in the mongod.conf:

# network interfaces
net:
      port: 27017
      bindIp: 127.0.0.1,172.16.1.154

Enabling authentication and setting up users and roles would be the second step. MongoDB has an easy to follow tutorial for enabling authentication and setting up your admin user. Keep in mind that users and passwords are still the weakest link in the chain, and ensure to make those secure!
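
As a minimal sketch (the user name and password are placeholders, pick your own strong credentials), creating the administrative user on the admin database looks like this:

use admin
db.createUser({
    user: "admin",
    pwd: "pick-a-strong-password",
    roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
});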

After securing, you should ensure to always have a backup of your data. Even if the hacker manages to hijack your data, with a backup and big enough oplog you would be able to perform a point-in-time restore. Scheduling (shard consistent) backups can easily be setup in our database clustering, management and automation software called ClusterControl.

Perform security audits often: scan for any open ports from outside your hosting environment. Verify that authentication has been enabled for MongoDB, and ensure the users don’t have weak passwords and/or excessive roles. For ClusterControl we have developed two advisors that will verify all this. ClusterControl advisors are open source, and the advisors can be run for free using ClusterControl community edition.

Will this be enough to protect myself against any threat?

With all these precautions in place, you will be protected against direct threats from the internet. However, keep in mind that any compromised machine in your hosting environment may still become a stepping stone to your now protected MongoDB servers. Be sure to upgrade MongoDB to the latest (patch) releases to stay protected against known vulnerabilities.


How to secure MongoDB from ransomware - ten tips


Following the flurry of blogs, articles and social postings that have been published in recent weeks in response to the attacks on MongoDB systems and related ransomware, we thought we’d clear through the fog and provide you with 10 straightforward, tested tips on how to best secure MongoDB (from attacks and ransomware).

What is ransomware?

According to the definition, ransomware is malware that secretly installs itself on your computer, encrypts your files and then demands a ransom to unlock your files or not publish them publicly. The ransomware that hit MongoDB users in various forms over the past weeks mostly fits this definition. However, it is not malware that hits the MongoDB instances, but a scripted attack from the outside.

Once the attackers have taken control over your MongoDB instance, most of them hijack your data by copying it onto their own storage. After making a copy they will erase the data on your server, and leave a database with a single collection demanding ransom for your data. In addition, some also threaten to erase the backup that they hold hostage if you don’t pay within 3 days. Some victims, who allegedly paid, in the end never received their backup.

Why is this happening

The attackers are currently scanning for MongoDB instances that are publicly available, meaning anyone can connect to these servers with a simple MongoDB connection. If the server instance does not have authentication enabled, anyone can connect without providing a username and password. Also, the lack of authentication implies there is no authorization in place, and anyone connecting will be treated as an administrator of the system. This is very dangerous as now anyone can perform administrative tasks like changing data, creating users or dropping databases.

By default MongoDB doesn’t enable authentication in its configuration; it is assumed that limiting the MongoDB server to only listen on localhost is sufficient protection. This may be true in some use cases, but anyone using MongoDB in a multi-tenant software stack will immediately configure the MongoDB server to also listen on other network interfaces. If these hosts can also be accessed from the internet, this poses a big threat. This is a big shortcoming of the default MongoDB installation process.

MongoDB itself can’t be fully blamed for this, as they encourage users to enable authentication and provide a tutorial on how to enable authentication and authorization. It is the user who doesn’t apply this on their publicly accessible MongoDB servers. But what if the user never knew the server was publicly available?

Ten tips on how to secure your MongoDB deployments

Tip 1: Enable authentication

This is the easiest solution to keep unwanted people outside: simply enable authentication in MongoDB. To explicitly do this, you will need to put the following lines in the mongod.conf:

security:
    authorization: enabled

If you have set the replication keyfile in the mongod.conf, you will also implicitly enable authentication.

As most current attackers are after easy-to-hijack instances, they will not attempt to break into a password protected MongoDB instance.

Tip 2: Don’t use weak passwords

Enabling authentication will not give you 100% protection against any attack. Trying weak passwords may be the next weapon of choice for the attackers. Currently MongoDB does not feature a (host) lockout for too many wrong user/password attempts. Automated attacks may be able to crack weak passwords in minutes.

Setup passwords according to good and proven password guidelines, or make use of a strong password generator.

Tip 3: Authorize users by roles

Even if you have enabled authentication, don’t just give every user an administrative role. This would be very convenient from the user perspective, as they can literally perform every task thinkable and don’t have to wait for a DBA to execute it for them. But for any attacker this is just as convenient: as soon as they have entry to one single account, they also immediately have the administrative role.

MongoDB has a very strong diversification of roles, and for any type of task an applicable role is present. Ensure that the user carrying the administrative role is a user that isn’t part of the application stack. This should reduce the chance that a breach of a single account results in disaster.
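
A minimal sketch (database, user name and password are placeholders): the application account only gets readWrite on its own database, while the administrative roles live on a separate account:

use myapplication
db.createUser({
    user: "app_user",
    pwd: "pick-a-strong-password",
    roles: [ { role: "readWrite", db: "myapplication" } ]
});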

When provisioning MongoDB from ClusterControl, we deploy new MongoDB replicaSets and sharded clusters with a separate admin and backup user.

Tip 4: Add a replication keyfile

As mentioned before in Tip #1, enabling the replication keyfile will implicitly enable authentication in MongoDB. But there is a much more important reason to add a replication keyfile: once added, only hosts with the file installed are able to join the replicaSet.

Why is this important? Adding new secondaries to a replicaSet normally requires the clusterManager role in MongoDB. Without authentication, any user can add a new host to the cluster and replicate your data over the internet. This way the attacker can silently and continuously tap into your data.

With the keyfile in place, internal authentication is required between the members of the replicaSet. This ensures nobody can spoof the IP of an existing host and pretend to be a secondary that isn’t supposed to be part of the cluster. In ClusterControl, we deploy all MongoDB replicaSets and sharded clusters with a replication keyfile.
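
As a sketch (the path is an assumption), the keyfile is a file with the same random content on every member, for instance generated with openssl rand -base64 756, and it is referenced from the security section of the mongod.conf:

security:
    keyFile: /etc/mongodb-keyfile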

Tip 5: Make backups regularly

Schedule backups, to ensure you always have a recent copy of your data. Even if the attacker is able to remove your databases, they don’t have access to your oplog. So if your oplog is large enough to contain all transactions since the last backup, you can still make a point in time recovery with a backup and replay the oplog till the moment the attacker started to remove data.

ClusterControl has a very easy interface to schedule (shard consistent) backups, and restoring your data is only one click away.

Tip 6: Run MongoDB on a non-standard port

As most attackers only scan for the standard MongoDB ports, you could reconfigure MongoDB to run on a different port. This would not stop attackers who perform a full port scan; they will still discover an open MongoDB instance.

Standard ports are:

  • 27017 (mongod / mongos)
  • 27018 (mongod in sharded environments)
  • 27019 (configservers)
  • 2700x (some MongoDB tutorials)

This requires the following line to be added/changed in the mongod.conf and a restart is required:

net:
   port: 17027

In ClusterControl, you can deploy new MongoDB replicaSets and sharded clusters with custom port numbers.

Tip 7: Does your application require public access?

If you have enabled MongoDB to be bound on all interfaces, you may want to review whether your application actually needs external access to the datastore. If your application is a single-host solution and resides on the same host as the MongoDB server, it may suffice to bind MongoDB to localhost.

This requires the following line to be added/changed in the mongod.conf and a restart is required:

net:
   bindIp: 127.0.0.1

In many hosting and cloud environments with multi-tenant architectures, applications are put on different hosts than where the datastore resides. The application then connects to the datastore via the private (internal) network. If this applies to your environment, you need to ensure to bind MongoDB only to the private network.

This requires the following line to be added/changed in the mongod.conf and a restart is required:

net:
   bindIp: 127.0.0.1,172.16.1.234

Tip 8: Enable firewall rules or security groups

It is a good practice to enable firewall rules on your hosts, or security groups with cloud hosting. Simply blocking the MongoDB port range from the outside will keep most attackers out.

There would still be another way to get in: from the inside. If an attacker gains access to another host in your private (internal) network, they could still access your datastore. A good example would be proxying TCP/IP requests via a compromised HTTP server. Add firewall rules to the MongoDB instance and deny any host except the hosts that you know for sure need access. This should, at least, limit the number of hosts that could potentially be used to get to your data. And as indicated in Tip 1, with authentication enabled, even someone who proxies into your private network can’t simply steal your data.

Also, if your application does require MongoDB to be available on the public interface, you can limit the hosts accessing the database by simply adding similar firewall rules.
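
A minimal iptables sketch (the application host 172.16.1.10 and the default port are assumptions, adapt them to your environment): allow the application host and drop everything else on the MongoDB port:

[root@mongodb ~] iptables -A INPUT -p tcp --dport 27017 -s 172.16.1.10 -j ACCEPT
[root@mongodb ~] iptables -A INPUT -p tcp --dport 27017 -j DROP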


Tip 9: Go Hack Yourself! Check for external connectivity

To guarantee that you have followed all previous tips, simply test whether anything is exposed externally. If you don’t have a host outside your hosting environment, a cloud box at any hosting provider will suffice for this check.

From the external host, check if you can connect to your host via telnet on the command line.

In case you did change the port number of MongoDB, use the appropriate port here.

telnet your.host.com 27017

If this command returns something similar to this, the port is open:

Trying your.host.com...
Connected to your.host.com.
Escape character is '^]'.

Another method of testing would be installing nmap on the host and testing it against your host:

[you@host ~]$ sudo yum install nmap
[you@host ~]$ nmap -p 27017 --script mongodb-databases your.host.com

If nmap is able to connect, you will see something similar to this:

PORT      STATE SERVICE REASON
27017/tcp open  unknown syn-ack
| mongodb-databases:
|   ok = 1
|   databases
|     1
|       empty = false
|       sizeOnDisk = 83886080
|       name = test
|     0
|       empty = false
|       sizeOnDisk = 83886080
|       name = yourdatabase
|     2
|       empty = true
|       sizeOnDisk = 1
|       name = local
|     3
|       empty = true
|       sizeOnDisk = 1
|       name = admin
|_  totalSize = 167772160

If you have only enabled authentication, nmap is still able to connect to the open port, but it cannot list the databases:

Starting Nmap 6.40 ( http://nmap.org ) at 2017-01-16 14:36 UTC
Nmap scan report for 10.10.22.17
Host is up (0.00031s latency).
PORT      STATE SERVICE
27017/tcp open  mongodb
| mongodb-databases:
|   code = 13
|   ok = 0
|_  errmsg = not authorized on admin to execute command { listDatabases: 1.0 }

And if you managed to secure everything from the outside, the output will look similar to this:

Starting Nmap 6.40 ( http://nmap.org ) at 2017-01-16 14:37 UTC
Nmap scan report for 10.10.22.17
Host is up (0.00013s latency).
PORT      STATE  SERVICE
27017/tcp closed unknown

If nmap is able to connect to MongoDB, with or without authentication enabled, it can identify which MongoDB version you are running with the mongodb-info flag:

[you@host ~]$ nmap -p 27017 --script mongodb-info 10.10.22.17

Starting Nmap 6.40 ( http://nmap.org ) at 2017-01-16 14:37 UTC
Nmap scan report for 10.10.22.17
Host is up (0.00078s latency).
PORT      STATE SERVICE
27017/tcp open  mongodb
| mongodb-info:
|   MongoDB Build info
|     javascriptEngine = mozjs
|     buildEnvironment
|       distmod =
|       target_arch = x86_64
… 
|     openssl
|       running = OpenSSL 1.0.1e-fips 11 Feb 2013
|       compiled = OpenSSL 1.0.1e-fips 11 Feb 2013
|     versionArray
|       1 = 2
|       2 = 11
|       3 = -100
|       0 = 3
|     version = 3.2.10-3.0
…
|   Server status
|     errmsg = not authorized on test to execute command { serverStatus: 1.0 }
|     code = 13
|_    ok = 0

As you can see, the attacker can identify your version, build environment and even the OpenSSL libraries it is compiled against. This enables attackers to even go beyond simple authentication breaches, and exploit vulnerabilities for your specific MongoDB build. This is why it is essential to not expose MongoDB to outside your private network, and also why you need to update/patch your MongoDB servers on a regular basis.

Tip 10: Check for excessive privileges

Even if you have implemented all tips above, it wouldn’t hurt to go through all databases in MongoDB and check if any user has excessive privileges. As MongoDB authenticates your user against the database that you connect to, it could be the case that the user also has been granted additional rights to other databases.

For example:

use mydatastore
db.createUser(
  {
    user: "user",
    pwd: "password",
    roles: [ { role: "readWrite", db: "mysdatastore" },
             { role: "readWrite", db: "admin" } ]
  }
);

In addition to a weak password and the readWrite privileges on the mydatastore database, this user also has readWrite privileges on the admin database. Connecting to mydatastore and then switching to the admin database will not require re-authentication. On the contrary: this user is allowed to read and write to the admin database.

This is a good reason to review the privileges of your users on a regular basis. You can do this by the following command:

my_mongodb_0:PRIMARY> use mydatastore
switched to db mydatastore
my_mongodb_0:PRIMARY> db.getUsers();
[
    {
        "_id" : "mysdatastore.user",
        "user" : "user",
        "db" : "mysdatastore",
        "roles" : [
            {
                "role" : "readWrite",
                "db" : "mysdatastore"
            },
            {
                "role" : "readWrite",
                "db" : "admin"
            }
        ]
    }
]
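
If you do find such a user, a sketch to revoke the excessive role (the names follow the example above) would be:

my_mongodb_0:PRIMARY> db.revokeRolesFromUser("user", [ { role: "readWrite", db: "admin" } ])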

As you need to repeat this process per database, this can be a lengthy and cumbersome exercise. In ClusterControl, we have an advisor that performs this check on a daily basis. ClusterControl advisors are open source, and these advisors are part of the free version of ClusterControl.

That’s all folks! Do not hesitate to get in touch if you have any questions on how to secure your database environment.

Become a MongoDB DBA: Recovering your Data


In previous posts of our MongoDB DBA series, we have covered Deployment, Configuration, Monitoring (part 1), Monitoring (part 2) and backup. Now it is time to recover MongoDB using a backup we made in the previous blog post.

As we learned in the previous blog post, there are two types of backups: logical and physical backups. The physical backups for MongoDB are basically file system backups, and they will be relatively easy to restore. We will leave them out of the scope today. Also the new Percona MongoDB Consistent Backup tool will allow you to make consistent snapshots of your MongoDB sharded clusters. This versatile tool is relatively new (beta) and still quite complex to use and restore, but it is supported in ClusterControl 1.3.2. We will cover this tool in a future blog post.

In this blog post we will cover the use cases for restoring your logical backups with MongoDB. These use cases will vary from restoring a single node, restoring a node in an existing replicaSet and seeding a new node in a replicaSet.

MongoDB backup format

We have to explain the MongoDB backup format first. When performing a mongodump, all collections within the designated databases will be dumped as BSON output. If no database is specified, MongoDB will dump all databases except for the admin, test and local databases as they are reserved for internal use.

When you are running the MMAP storage engine, the dumps should be equal to the data you have on disk, as both MMAP and the dumps are stored as BSON files. If you run WiredTiger, naturally those files would differ, as the storage engine stores them in a totally different format. Given that you export the data via a dump, the content of the data is the same.

Archive file

By default mongodump will create a directory called dump, with a directory for each database containing a BSON file per collection in that database. Alternatively you can tell mongodump to store the backup within one single archive file. The archive parameter will concatenate the output from all databases and collections into one single stream of binary data. Additionally the gzip parameter can naturally compress this archive, using gzip. In ClusterControl we stream all our backups, so we enable both the archive and gzip parameters.

Include the oplog

Similar to mysqldump with MySQL, if you create a backup in MongoDB, it will freeze the collections while dumping the contents to the backup file. As MongoDB does not support transactions you can’t make a 100% fully consistent backup, unless you create the backup with the oplog parameter. Enabling this on the backup includes the transactions from the oplog that were executing while making the backup. This behaviour is similar to Percona Xtrabackup that uses the MySQL binary log to capture all transactions during the backup process.

Restoring MongoDB from a backup

There are basically two ways you can use a BSON format dump:

  1. Run mongod directly from the backup directory
  2. Run mongorestore and restore the backup

Run mongod directly from a backup

A prerequisite for running mongod directly from the backup is that the backup target is a standard dump, and is not gzipped.

This command will start the MongoDB daemon with the data directory set to the backup location:

[root@node ~] mongod --dbpath /backups/backup_10.10.34.12_2016-08-02_180808/

The MongoDB daemon will then check the integrity of the data directory, add the admin database, journals, collection and index catalogues and some other files necessary to run MongoDB. Obviously if you ran WiredTiger as the storage engine before, it will now run the existing collections as MMAP. But for simple data dumps or integrity checks this works fine.

Run mongorestore

A better way to restore would obviously be by restoring the node using mongorestore:

[root@node ~] mongorestore /backups/backup_10.10.34.12_2016-08-02_180808/

In case of a backup made by ClusterControl (or one that is in archive and gzipped format) the command is slightly different:

[root@node ~] mongorestore --gzip --archive=/backups/backup_10.10.34.12_2016-08-02_180808.gz

This will restore the backup into the default server settings (localhost, port 27017) and overwrite any databases in the backup that reside on this server. Now there are tons of parameters to manipulate the restore process, and we will cover some of the important ones.

Object validation

As the backup contains BSON data, you would expect the contents of the backup to be correct. However it could have been the case that the document that got dumped was malformed to begin with. Mongodump does not check the integrity of the data it dumps. So the objcheck parameter will enable object checking before it gets sent to the server. This flag is enabled by default since MongoDB version 2.4. If you are running an older version, better to enable this flag.

Oplog replay

As we described earlier, adding the oplog to your backup will enable you to perform a consistent backup and do a point-in-time-recovery. Enable the oplogReplay parameter to apply the oplog during restore process. To control how far to replay the oplog, you can define a timestamp in the oplogLimit parameter. Only transactions up until the timestamp will then be applied.
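
A sketch of such a point-in-time restore from a directory dump that was made with the oplog included (the timestamp value is only an example):

[root@node ~] mongorestore --oplogReplay --oplogLimit 1470160000:1 /backups/backup_10.10.34.12_2016-08-02_180808/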

Restoring a full replicaSet from a backup

Restoring a replicaSet is not much different from restoring a single node: either you set up the replicaSet first and restore directly into it, or you restore a single node first and then use this restored node to build a replicaSet.

Restore node first, then create replicaSet

We will start mongod on the first node running in the foreground:

[root@firstnode ~] mongod --dbpath /var/lib/mongodb/

Restore the node in a second terminal:

[root@firstnode ~] mongorestore --gzip --archive=/backups/backup_10.10.34.12_2016-08-02_180808.gz

Now stop mongod in the first terminal, and start it again with the replicaSet enabled:

[root@firstnode ~] mongod --dbpath /var/lib/mongodb/ --replSet <set_we_restore>

Then initiate the replicaSet in the second terminal:

[root@firstnode ~] mongo --eval "rs.initiate()"

After this you can add the other nodes to the replicaSet:

[root@firstnode ~] mongo
test:PRIMARY> rs.add("secondnode:27017")
{ "ok" : 1 }
test:PRIMARY> rs.add("thirdnode:27017")
{ "ok" : 1 }

Now the second and third node will sync their data from the first node. After the sync has finished our replicaSet has been restored.

Create a ReplicaSet first, then restore

Different to the previous process, you can create the replicaSet first. First configure all three hosts with the replicaSet enabled, start up all three daemons and initiate the replicaSet on the first node:

[root@firstnode ~] mongo
test:PRIMARY> rs.initiate()
{"info2" : "no configuration specified. Using a default configuration for the set","me" : "firstnode:27013","ok" : 1
}
test:PRIMARY> rs.add("secondnode:27017")
{ "ok" : 1 }
test:PRIMARY> rs.add("thirdnode:27017")
{ "ok" : 1 }

Now that we have created the replicaSet, we can directly restore our backup into it:

mongorestore --host <replicaset>/<host1>:27017,<host2>:27017,<host3>:27017 --gzip --archive=/backups/backup_10.10.34.12_2016-08-02_180808.gz

In our opinion restoring a replicaSet this way is much more elegant. It is closer to the way you would normally set up a new replicaSet from scratch, and then fill it with (production) data.

Seeding a new node in a replicaSet

When scaling out a cluster by adding a new node in MongoDB, an initial sync of the dataset must happen. With MySQL replication and Galera, we are accustomed to using a backup to seed the initial sync. With MongoDB this is possible, but only by making a binary copy of the data directory. If you don’t have the means to make a file system snapshot, you will have to face downtime on one of the existing nodes. The process, with downtime, is described below.

If you only have one node, you have to shut down the primary mongod:

[root@primary ~] service mongod stop

Start up the receiving side of the new node:

[root@newnode ~] cd /var/lib/mongodb
[root@newnode ~] nc -l 7000 | tar -xpf -

Make a copy of the data directory and copy this over to the new node:

[root@primary ~] cd /var/lib/mongodb && tar -cf - . | nc newnode 7000

Once the copy process has finished, start mongod on both nodes:

[root@primary ~] service mongod start
[root@newnode ~] service mongod start

And now you can add the new node to the replicaSet.

[root@primary ~] mongo
test:PRIMARY> rs.add("newnode:27017")
{ "ok" : 1 }

In the MongoDB log file, the joining process of the new node will look similar to this:

I REPL     [ReplicationExecutor] This node is newnode:27017 in the config
I REPL     [ReplicationExecutor] transition to STARTUP2
I REPL     [ReplicationExecutor] Member primary:27017 is now in state PRIMARY
I REPL     [ReplicationExecutor] syncing from: primary:27017
I REPL     [ReplicationExecutor] transition to RECOVERING
I REPL     [ReplicationExecutor] transition to SECONDARY

That’s it, you have just managed to save yourself from a long syncing process. Obviously this caused downtime for us, which is normally unacceptable for a primary. You can also use a secondary to copy your data from, but downtime on a secondary could cause the primary to lose the majority in the cluster. If you are running MongoDB in a WAN environment, this is a good reason to enable filesystem snapshots in your environment.

Seeding with a backup

So what would happen if you restore the new node from a mongodump backup instead, and then have it join a replicaSet? Restoring from a backup should in theory give the same dataset. But as this new node has been restored from a backup, it will lack the replicaSetId and MongoDB will notice. As MongoDB doesn’t see this node as part of the replicaSet, the rs.add() command will then always trigger the MongoDB initial sync. The initial sync always triggers deletion of any existing data on the MongoDB node.

If you would do such a thing, let the log file speak for itself:

I REPL     [ReplicationExecutor] This node is newnode:27017 in the config
I REPL     [ReplicationExecutor] transition to STARTUP2
I REPL     [ReplicationExecutor] Member primary:27017 is now in state PRIMARY
I REPL     [rsSync] ******
I REPL     [rsSync] creating replication oplog of size: 1514MB...
I STORAGE  [rsSync] Starting WiredTigerRecordStoreThread local.oplog.rs
I STORAGE  [rsSync] The size storer reports that the oplog contains 0 records totaling to 0 bytes
I STORAGE  [rsSync] Scanning the oplog to determine where to place markers for truncation
I REPL     [rsSync] ******
I REPL     [rsSync] initial sync pending
I REPL     [ReplicationExecutor] syncing from: primary:27017
I REPL     [rsSync] initial sync drop all databases
I STORAGE  [rsSync] dropAllDatabasesExceptLocal 2
I REPL     [rsSync] initial sync clone all databases

The replicaSetId is generated when initiating a replicaSet, and unfortunately can’t be set manually. That’s a shame as recovering from a backup (including replaying the oplog) would theoretically give us a 100% identical data set. It would be nice if the initial sync was optional in MongoDB to satisfy this use case.

ReplicaSet node recovery

It’s a different story for nodes that are already part of a replicaSet. If you have a broken node in your replicaSet, you can have it fixed by issuing a resync of the node. However, if you are replicating over a WAN and your database / collection is in the terabyte range, it could take a long time to restore. This is when the seeding would come in handy as well.

Conclusion

We have explained in this blog post various ways to restore a MongoDB node or replicaSet, and what the caveats are. MongoDB backups aren’t very difficult to restore, and in the worst case you can even run the database directly from the BSON files in the backup.

In the next blog post, we will start scaling out read requests with MongoDB, and discuss the things you should look out for!

Become a MongoDB DBA: How to scale reads


In previous posts of our “Become a MongoDB DBA” series, we covered Deployment, Configuration, Monitoring (part 1), Monitoring (part 2), backup and restore. From this blog post onwards, we shift our focus to the scaling aspects of MongoDB.

One of the cornerstones of MongoDB is that it is built with high availability and scaling in mind. Scaling can be done either vertically (bigger hardware) or horizontally (more nodes). Horizontal scaling is what MongoDB is good at, and it is not much more than spreading the workload to multiple machines. In effect, we’re making use of multiple low-cost commodity hardware boxes, rather than upgrading to a more expensive high performance server.

MongoDB offers both read and write scaling, and we will uncover the differences between these two strategies for you. Whether to choose read or write scaling depends entirely on the workload of your application: if your application tends to read more often than it writes data, you will probably want to make use of the read scaling capabilities of MongoDB. Today we will cover MongoDB read scaling.

Read scaling considerations

With read scaling, we will scale out our read capacity. If you have used MongoDB before, or have followed this blog series, you may be aware that all reads end up on the primary by default. Even if your replicaSet contains nine nodes, your read requests still go to the primary. Why is this a deliberate design choice?

In principle, there are a few considerations to make before you start reading from a secondary node directly. First of all: replication is asynchronous, so not all secondaries will give the same results if you read the same data at the same point in time. Secondly: if you distribute read requests across all secondaries and use up too much of their capacity, the remaining secondaries may not be able to cope with the extra workload when one of them becomes unavailable. Thirdly: on sharded clusters you should never bypass the shard router, as data may be out-of-date or may have been moved to another shard. Even if you do use the shard router and set the read preference correctly, it may still return incorrect data due to incomplete or terminated chunk migrations.

As you can see, these are serious considerations to make before scaling out your read queries on MongoDB. In general, unless your primary is not able to cope with the read workload it is receiving, we would advise against reading from secondaries. The price you pay for inconsistency is relatively high, compared to the benefits of offloading work from the primary.

The main issue here is MongoDB’s eventual consistency, so your application needs to be able to work around that. However, if your application is not bothered by stale data, analytics for instance, it could benefit greatly from using the secondaries.

Reading from a secondary

There are two things that are necessary to make reading from a secondary possible: tell the MongoDB client driver that you actually wish to read from a secondary (if possible) and tell the MongoDB secondary server that it is okay to read from this node.

Setting read preference

For the driver, all you have to do is set the read preference. When reading data you simply set the read preference to read from a secondary. Let’s go over each and every read preference and explain what it does:

  • primary - Always read from the primary (default)
  • primaryPreferred - Always read from the primary, read from a secondary if the primary is unavailable
  • secondary - Always read from a secondary
  • secondaryPreferred - Always read from a secondary, read from the primary if no secondary is available
  • nearest - Always read from the node with the lowest network latency

It is clear the default mode is the least preferred if you wish to scale out reads. PrimaryPreferred is not much better, as it will pick the primary 99.999% of the time. Still, if the primary becomes unavailable, you will have a fallback for read requests.

Secondary should work fine for scaling reads, but as you leave out the primary, the reads will have no fallback if no secondary is available. SecondaryPreferred is slightly better, but the reads will hit the secondaries almost all of the time, which still causes an uneven spread of reads. Also, if no secondaries are available, in most cases there will no longer be a majority and the primary will demote itself to a secondary. The secondaryPreferred mode therefore only makes sense when an arbiter is part of the cluster.

Nearest should always pick the node with the lowest network latency. Even though this sounds great from an application perspective, it does not guarantee an even spread of read operations. It does work very well in multi-region setups where latency is high and delays are noticeable. In such cases, reading from the nearest node means your application will be able to serve out data with minimum latency.
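
As an illustration, here is a minimal sketch of setting the read preference on a connection in the mongo shell; driver APIs expose the same modes under their own syntax, so check the documentation of the driver you use:

// Route reads on this shell connection to a secondary when one is available
db.getMongo().setReadPref('secondaryPreferred')

// Subsequent queries on this connection will honour the read preference
db.collection.find().limit(1)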

Filtering nodes with tags

In MongoDB you can tag nodes in a replicaSet. This allows you to make groupings of nodes and use them for many purposes, including filtering them when reading from secondary nodes.

An example of a replicaSet with tagging can be:

{"_id" : "myrs","version" : 2,"members" : [
             {"_id" : 0,"host" : "host1:27017","tags" : {"dc": "1","rack": "e3"
                     }
             }, {"_id" : 1,"host" : "host2:27017","tags" : {"dc": "1","rack": "b2"
                     }
             }, {"_id" : 0,"host" : "host3:27017","tags" : {"dc": "2","rack": "q1"
                     }
             }
    ]
}

This tagging allows us to limit reads to secondaries in, for instance, our first datacenter:

db.getMongo().setReadPref('secondaryPreferred', [ { "dc": "1" } ] )

Naturally the tags can be used with all read preference modes, except primary.

Enabling secondary queries

Apart from setting the read preference in the client driver, there is another limitation. By default, MongoDB disables reads from a secondary on the server side, unless you specifically tell the host to allow read operations. Changing this is relatively easy; all you have to do is connect to the secondary and run this command:

myset:SECONDARY> rs.slaveOk()

This will enable reads on this secondary for all incoming connections. You can also run this command in your application, but that would then imply your application is aware it could encounter a server that did not enable secondary reads.

Reading from a secondary in a shard

It is also possible to read from a secondary node in MongoDB sharded clusters. The MongoDB shard router (mongos) will obey the read preference set in the request and forward the request to a secondary in the shard(s). This also means you will have to enable reads from a secondary on all hosts in the sharded environment.

As said earlier, an issue that may arise when reading from secondaries in a sharded environment is that you might receive incorrect data. Due to the migration of data between shards, data may be in transit from one shard to another. Reading from a secondary may then return incomplete data.

Adding more secondary nodes

Adding more secondary nodes to a MongoDB replicaSet implies more replication overhead. However, unlike MySQL, syncing the oplog is not limited to the primary node: secondaries can also sync the oplog from other secondaries, as long as those are up to date with the primary. This means secondaries can serve the oplog to other members, effectively acting as intermediate sync sources. Theoretically, if you add more secondaries, the performance impact on the primary will therefore be limited. Keep in mind that this also means the data trickles down to the other nodes a bit more slowly, as the oplog entries may have to pass through at least two nodes.
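
If you want to see which member a secondary is pulling its oplog from, a hedged sketch in the mongo shell could be the following; the syncingTo field is present in the MongoDB 3.x rs.status() output, while newer versions use a different field name:

// Print the sync source of every replicaSet member; secondaries may sync from other secondaries
rs.status().members.forEach(function(m) {
    print(m.name + " (" + m.stateStr + ") syncing from: " + m.syncingTo);
});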

Conclusion

We have described the impact of reading from secondaries in MongoDB, and the caveats to be aware of. If you don’t necessarily need to scale your reads, it is better not to perform this pre-emptive optimization. However, if you think your setup would benefit from offloading the primary, make sure you work around the issues described in this post.

ClusterControl Developer Studio: Custom database alerts by combining metrics


In the previous blog posts, we gave a brief introduction to the ClusterControl Developer Studio and the ClusterControl Domain Specific Language. We covered some useful examples, e.g., how to extract information from the Performance Schema, how to automatically have advisors scale your database clusters and how to create an advisor that keeps an eye on the MongoDB replication lag. ClusterControl Developer Studio is free to use, and included in the community version of ClusterControl. It allows you to write your own scriptlets, advisors and alerts. With just a few lines of code, you can already automate your clusters. All advisors are open source on Github, so anyone can contribute back to the community!

In this blog post, we will make things a little bit more complex than in our previous posts. Today we will be using our MongoDB replication window advisor that has recently been added to the Advisors Github repository. Our advisor will not only check the length of the replication window, but also calculate the lag of the secondaries and warn us if any node is at risk. For extra complexity, we will make this advisor compatible with a MongoDB sharded environment, and take a couple of edge cases into account to prevent false positives.

MongoDB Replication window

The MongoDB replication window advisor complements the MongoDB lag advisor. The lag advisor informs us of the number of seconds a secondary node is behind the primary. As the oplog is limited in size, replication lag imposes the following risks:

  1. If a node lags too far behind, it may not be able to catch up anymore as the transactions necessary are no longer in the oplog of the primary.
  2. A lagging secondary node is less favoured in a MongoDB election for a new primary. If all secondaries are lagging behind, you will have a problem, and the one with the least lag will be made primary.
  3. Secondaries lagging behind are less favoured by the MongoDB driver when scaling out reads, which also puts a higher workload on the remaining secondaries.

If a secondary node lags behind by a few minutes (or hours), it is useful to have an advisor that informs us how much time we have left before our next transaction will be dropped from the oplog. The time difference between the first and last entry in the oplog is called the replication window. This metric can be created by fetching the first and last items from the oplog, and calculating the difference of their timestamps.

Calculating the MongoDB Replication Window

In the MongoDB shell, there is already a function available that calculates the replication window for you. However, this function is built into the command line shell, so any connection not using the command line shell will not have it available:

mongo_replica_2:PRIMARY> db.getReplicationInfo()
{"logSizeMB" : 1894.7306632995605,"usedMB" : 501.17,"timeDiff" : 91223,"timeDiffHours" : 25.34,"tFirst" : "Wed Oct 12 2016 22:48:59 GMT+0000 (UTC)","tLast" : "Fri Oct 14 2016 00:09:22 GMT+0000 (UTC)","now" : "Fri Oct 14 2016 12:32:51 GMT+0000 (UTC)"
}

As you can see, this function has a rich output of useful information: the size of the oplog and how much of that has been used already. It also displays the time difference of the first and last items in the oplog.

It is easy to replicate this function by retrieving the first and last items from the oplog. Making use of the MongoDB aggregate function in a single query is tempting; however, the oplog does not have indexes on any of its fields. Running an aggregate function on a collection without indexes would require a full collection scan, which would become very slow on an oplog that has a couple of million entries.

Instead we are going to send two individual queries: fetching the first record of the oplog in forward and in reverse order. As the oplog is a naturally ordered (capped) collection, sorting it in reverse order is cheap.

mongo_replica_2:PRIMARY> use local
switched to db local
mongo_replica_2:PRIMARY> db.oplog.rs.find().limit(1);
{ "ts" : Timestamp(1476312539, 1), "h" : NumberLong("-3302015507277893447"), "v" : 2, "op" : "n", "ns" : "", "o" : { "msg" : "initiating set" } }
mongo_replica_2:PRIMARY> db.oplog.rs.find().sort({$natural: -1}).limit(1);
{ "ts" : Timestamp(1476403762, 1), "h" : NumberLong("3526317830277016106"), "v" : 2, "op" : "n",  "ns" : "ycsb.usertable", "o" : { "_id" : "user5864876345352853020",
…
}

The overhead of both queries is very low and will not interfere with the functioning of the oplog.

In the example above, the replication window would be 91223 seconds (the difference of 1476403762 and 1476312539).
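
If you want to do this calculation by hand in the mongo shell, a minimal sketch (assuming you are connected to a replicaSet member and the t field of the Timestamp holds the time in seconds) could look like this:

// Oldest and newest oplog entries, and the window between them
var oplog = db.getSiblingDB("local").oplog.rs;
var first = oplog.find().sort({$natural: 1}).limit(1).next().ts.t;
var last  = oplog.find().sort({$natural: -1}).limit(1).next().ts.t;
print("Replication window: " + (last - first) + " seconds");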

Intuitively you may think it only makes sense to do this calculation on the primary node, as it is the source of all write operations. However, MongoDB is a bit smarter than just serving out the oplog from the primary to all secondaries. A joining member will, if possible, offload the delta of transactions it needs from secondaries, and secondary nodes may prefer to fetch oplog entries from another secondary with low latency rather than from a primary with high latency. So it is better to perform this calculation on all nodes in the cluster.

As the replication window will be calculated per node, and we like to keep our advisor as readable as possible, we will abstract the calculation into a function:

function getReplicationWindow(host) {
  var replwindow = {};
  // Fetch the first and last record from the oplog and take its timestamp
  var res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: 1}, limit: 1}');
  replwindow['first'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];
  res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: -1}, limit: 1}');
  replwindow['last'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];
  replwindow['replwindow'] = replwindow['last'] - replwindow['first'];
  return replwindow;
}

The function returns the timestamp of the first and last items in the oplog and the replication window as well. We will actually need all three in a later phase.

Calculating the MongoDB Replication lag

We covered the calculation of MongoDB replication lag in the previous Developer Studio blog post. You can calculate the lag by simply subtracting the secondary optimeDate (or optime timestamp) from the primary optimeDate. This will give you the replication lag in seconds.
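
For reference, a minimal sketch of that calculation in the mongo shell (run against the primary) could look as follows; the advisor below performs the same logic through the ClusterControl DSL:

// Compute the lag of every member against the primary's optimeDate
var status = rs.status();
var primaryOptime = null;
status.members.forEach(function(m) {
    if (m.stateStr == "PRIMARY") { primaryOptime = m.optimeDate; }
});
status.members.forEach(function(m) {
    print(m.name + " lag: " + ((primaryOptime - m.optimeDate) / 1000) + " seconds");
});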

The information necessary for the replication lag will be retrieved from only the primary node, using the replSetGetStatus command. Similarly to the replication window, we will abstract this into a function:

function getReplicationStatus(host, primary_id) {
    var node_status = {};
    var res = host.executeMongoQuery("admin", "{ replSetGetStatus: 1 }");
    // Fetch the optime and uptime per host
    for(i = 0; i < res["result"]["members"].size(); i++)
    {
        tmp = res["result"]["members"][i];
        host_id = tmp["name"];
        node_status[host_id] = {};
        node_status[host_id]["name"] = host_id;
        node_status[host_id]["primary"] = primary_id;
        node_status[host_id]["setname"] = res["result"]["set"];
        node_status[host_id]["uptime"] = tmp["uptime"];
        node_status[host_id]["optime"] = tmp["optime"]["ts"]["$timestamp"]["t"];
    }
    return node_status;
}

We keep a little bit more information than necessary here, but you will see why in the next paragraphs.

Calculating the time left per node

Now that we have calculated both the replication window and the replication lag per node, we can calculate the time left per node in which it can theoretically still catch up with the primary. Here we subtract the timestamp of the first entry in the oplog (of the primary) from the timestamp of the last executed transaction (optime) of the node.

// Calculate the replication window of the primary against the node's last transaction
replwindow_node = replstatus[host]['optime'] - replwindow_primary['first'];

But we are not done yet!

Adding another layer of complexity: sharding

In ClusterControl we see everything as a cluster. Whether you would run a single MongoDB server, a replicaSet or a sharded environment with config servers and shard replicaSets: they are all identified and administered as a single cluster. This means our advisor has to cope with both replicaSet and shard logic.

In the example code above, where we calculated the time left per node, our cluster had a single primary. In the case of a sharded cluster, we have to take into account that we have one replicaSet for the config server and one for each shard. This means we have to store, per node, which node is its primary and use that one in the calculation.

The corrected line of code would be:

host_id = host.hostName() + ":" + host.port();
primary_id = replstatus_per_node[host_id]['primary'];
...
// Calculate the replication window of the primary against the node's last transaction
replwindow_node = replstatus['optime'] - replwindow_per_node[primary_id]['first'];

For readability we also need to include the replicaSet name in the messages we output via our advisor. If we would not do this, it would become quite hard to distinguish hosts, replicaSets and shards on large sharded environments.

msg = "The replication window for node " + host_id + " (" + replstatus["setname"] + ") is long enough.";

Take the uptime into account

Another possible impediment with our advisor is that it will start warning us immediately on a freshly deployed cluster:

  1. The first and last entry in the oplog will be within seconds of each other after initiating the replicaSet.
  2. Also on a newly created replicaSet, the probability of it being immediately used would be very low, so there is not much use in alerting on a (too) short replication window in this case either.
  3. At the same time a newly created replicaSet may also receive a huge write workload when a logical backup is restored, shortening all entries in the oplog to a very short timeframe. This especially becomes an issue if after this burst of writes, no more writes happen for a long time, as now the replication window becomes very short but also outdated.

There are three possible solutions for these problems to make our advisor more reliable:

  1. We parse the first item in the oplog. If it contains the replicaSet initiating document, we will ignore a too short replication window. This can easily be done alongside parsing the first record in the oplog.
  2. We also take the uptime into consideration. If the host’s uptime is shorter than our warning threshold, we will ignore a too short replication window.
  3. If the replication window is too short, but the newest item in the oplog is older than our warning threshold, this should be considered a false positive.

To parse the first item in the oplog we have to add these lines, where we identify it as a new set:

var res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: 1}, limit: 1}');
replwindow['first'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];

if (res["result"]["cursor"]["firstBatch"][0]["o"]["msg"] == "initiating set") {
    replwindow['newset'] = true;
}

Then later in the check for our replication window we can verify all three exceptions together:

if(replwindow['newset'] == true) {
    msg = "Host " + host_id + " (" + replstatus["setname"] + ") is a new replicaSet. Not enough entries in the oplog to determine the replication window.";
    advice.setSeverity(Ok);
    advice.setJustification("");
} else if (replstatus["uptime"] < WARNING_REPL_WINDOW) {
    msg = "Host " + host_id + " (" + replstatus["setname"] + ") only has an uptime of " + replstatus["uptime"] + " seconds. Too early to determine the replication window.";
    advice.setSeverity(Ok);
    advice.setJustification("");
}
else if (replwindow['last'] < (CmonDateTime::currentDateTime().toString("%s") - WARNING_REPL_WINDOW)) {
    msg = "Latest entry in the oplog for host " + host_id + " (" + replstatus["setname"] + ") is older than " + WARNING_REPL_WINDOW + " seconds. Determining the replication window would be unreliable.";
    advice.setSeverity(Ok);
    advice.setJustification("");
}

After this we can finally warn / advise on the replication window of each node.

Conclusion

That’s it! We have created an advisor that gathers information from various hosts / replicaSets and combines this to get an exact view of the replication status of the entire cluster. We have also shown how to make things more readable by abstracting code into functions. And the most important aspect is that we have shown how to also think bigger than just the standard replicaSet.

For completeness, here is the full advisor:

#include "common/helpers.js"
#include "cmon/io.h"
#include "cmon/alarms.h"

// It is advised to have a replication window of at least 24 hours, critical is 1 hour
var WARNING_REPL_WINDOW = 24*60*60;
var CRITICAL_REPL_WINDOW = 60*60;
var TITLE="Replication window";
var ADVICE_WARNING="Replication window too short. ";
var ADVICE_CRITICAL="Replication window too short for one hour of downtime / maintenance. ";
var ADVICE_OK="The replication window is long enough.";
var JUSTIFICATION_WARNING="It is advised to have a MongoDB replication window of at least 24 hours. You could try to increase the oplog size. See also: https://docs.mongodb.com/manual/tutorial/change-oplog-size/";
var JUSTIFICATION_CRITICAL=JUSTIFICATION_WARNING;


function main(hostAndPort) {

    if (hostAndPort == #N/A)
        hostAndPort = "*";

    var hosts   = cluster::mongoNodes();
    var advisorMap = {};
    var result= [];
    var k = 0;
    var advice = new CmonAdvice();
    var msg = "";
    var replwindow_per_node = {};
    var replstatus_per_node = {};
    var replwindow = {};
    var replstatus = {};
    var replwindow_node = 0;
    var host_id = "";
    for (i = 0; i < hosts.size(); i++)
    {
        // Find the primary and execute the queries there
        host = hosts[i];
        host_id = host.hostName() + ":" + host.port();

        if (host.role() == "shardsvr" || host.role() == "configsvr") {
            // Get the replication window of each nodes in the cluster, and store it for later use
            replwindow_per_node[host_id] = getReplicationWindow(host);

            // Only retrieve the replication status from the master
            res = host.executeMongoQuery("admin", "{isMaster: 1}");
            if (res["result"]["ismaster"] == true) {
                //Store the result temporary and then merge with the replication status per node
                var tmp = getReplicationStatus(host, host_id);
                for(o=0; o < tmp.size(); o++) {
                    replstatus_per_node[tmp[o]['name']] = tmp[o];
                }

                //replstatus_per_node =
            }
        }
    }

    for (i = 0; i < hosts.size(); i++)
    {
        host = hosts[i];
        if (host.role() == "shardsvr" || host.role() == "configsvr") {
            msg = ADVICE_OK;

            host_id = host.hostName() + ":" + host.port();
            primary_id = replstatus_per_node[host_id]['primary'];
            replwindow = replwindow_per_node[host_id];
            replstatus = replstatus_per_node[host_id];

            // Calculate the replication window of the primary against the node's last transaction
            replwindow_node = replstatus['optime'] - replwindow_per_node[primary_id]['first'];
            // First check uptime. If the node is up less than our replication window it is probably no use warning
            if(replwindow['newset'] == true) {
              msg = "Host " + host_id + " (" + replstatus["setname"] + ") is a new replicaSet. Not enough entries in the oplog to determine the replication window.";
              advice.setSeverity(Ok);
              advice.setJustification("");
            } else if (replstatus["uptime"] < WARNING_REPL_WINDOW) {
                msg = "Host " + host_id + " (" + replstatus["setname"] + ") only has an uptime of " + replstatus["uptime"] + " seconds. Too early to determine the replication window.";
                advice.setSeverity(Ok);
                advice.setJustification("");
            }
            else if (replwindow['last'] < (CmonDateTime::currentDateTime().toString("%s") - WARNING_REPL_WINDOW)) {
              msg = "Latest entry in the oplog for host " + host_id + " (" + replstatus["setname"] + ") is older than " + WARNING_REPL_WINDOW + " seconds. Determining the replication window would be unreliable.";
              advice.setSeverity(Ok);
              advice.setJustification("");
            }
            else {
                // Check if any of the hosts is within the oplog window
                if(replwindow_node < CRITICAL_REPL_WINDOW) {
                    advice.setSeverity(Critical);
                    msg = ADVICE_CRITICAL + "Host " + host_id + " (" + replstatus["setname"] + ") has a replication window of " + replwindow_node + " seconds.";
                    advice.setJustification(JUSTIFICATION_CRITICAL);
                } else {
                    if(replwindow_node < WARNING_REPL_WINDOW)
                    {
                        advice.setSeverity(Warning);
                        msg = ADVICE_WARNING + "Host " + host_id + " (" + replstatus["setname"] + ") has a replication window of " + replwindow_node + " seconds.";
                        advice.setJustification(JUSTIFICATION_WARNING);
                    } else {
                        msg = "The replication window for node " + host_id + " (" + replstatus["setname"] + ") is long enough.";
                        advice.setSeverity(Ok);
                        advice.setJustification("");
                    }
                }
            }

            advice.setHost(host);
            advice.setTitle(TITLE);
            advice.setAdvice(msg);
            advisorMap[i]= advice;
        }
    }
    return advisorMap;
}

function getReplicationStatus(host, primary_id) {
    var node_status = {};
    var res = host.executeMongoQuery("admin", "{ replSetGetStatus: 1 }");
    // Fetch the optime and uptime per host
    for(i = 0; i < res["result"]["members"].size(); i++)
    {
        tmp = res["result"]["members"][i];
        node_status[i] = {};
        node_status[i]["name"] = tmp["name"];;
        node_status[i]["primary"] = primary_id;
        node_status[i]["setname"] = res["result"]["set"];
        node_status[i]["uptime"] = tmp["uptime"];
        node_status[i]["optime"] = tmp["optime"]["ts"]["$timestamp"]["t"];
    }
    return node_status;
}

function getReplicationWindow(host) {
  var replwindow = {};
  replwindow['newset'] = false;
  // Fetch the first and last record from the oplog and take its timestamp
  var res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: 1}, limit: 1}');
  replwindow['first'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];
  if (res["result"]["cursor"]["firstBatch"][0]["o"]["msg"] == "initiating set") {
      replwindow['newset'] = true;
  }
  res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: -1}, limit: 1}');
  replwindow['last'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];
  replwindow['replwindow'] = replwindow['last'] - replwindow['first'];
  return replwindow;
}

Webinar: Become a MongoDB DBA - Scaling and Sharding


Join us for our third ‘How to become a MongoDB DBA’ webinar on Tuesday, November 15th! In this webinar we will uncover the secrets and caveats of MongoDB scaling and sharding.

Become a MongoDB DBA - Scaling and Sharding

MongoDB offers read and write scaling out of the box: adding secondary nodes will increase your potential read capacity, while adding shards will increase your potential write capacity. However, adding a new shard doesn’t necessarily mean it will be used. Choosing the wrong shard key may also cause uneven data distribution.

There is more to scaling than just simply adding nodes and shards. Factors to take into account include indexing, shard re-balancing, replication lag, capacity planning and consistency.

Learn with this webinar how to plan your scaling strategy up front and how to prevent ending up with unusable secondary nodes and shards. Finally, we’ll show you how to leverage ClusterControl’s MongoDB scaling capabilities and have ClusterControl manage your shards.

Date, Time & Registration

Europe/MEA/APAC

Tuesday, November 15th at 09:00 GMT / 10:00 CET (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, November 15th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)

Register Now

Agenda

  • What are the differences in read and write scaling with MongoDB
  • Read scaling considerations with MongoDB
  • MongoDB read preference explained
  • How sharding works in MongoDB
  • Adding new shards and balance data
  • How to scale and shard MongoDB using ClusterControl
  • Live Demo

Speaker

Art van Scheppingen is a Senior Support Engineer at Severalnines. He’s a pragmatic database expert with over 16 years of experience in web development. He previously worked at Spil Games as Head of Database Engineering, where he kept a broad view over the whole database environment: from MySQL to MongoDB, Vertica to Hadoop and from Sphinx Search to SOLR. He regularly presents his work and projects at various conferences (Percona Live, MongoDB Open House, FOSDEM) and related meetups.

We look forward to “seeing” you there!

This session is based upon the experience we have using MongoDB and implementing it for our database infrastructure management solution, ClusterControl. For more details, read through our ‘Become a MongoDB DBA’ blog series.

Become a MongoDB DBA: Sharding ins- and outs - part 1


So far in the “Become a MongoDB DBA” series, we covered Deployment, Configuration, Monitoring (part 1), Monitoring (part 2), backup, restore and read scaling.

In our latest post we showed how to offload read requests from the primary to secondaries. But what if your workload consists mostly of write requests? Vertical scaling would only be a temporary solution, and with growth outpacing Moore’s law, an alternative has to be found.

Read scaling is basically offloading requests to many secondary nodes in the same replicaSet, so write scaling should naturally be offloading requests to many primary nodes. MongoDB does not support multiple primary nodes in one replicaSet. However, if we shard our data, we can spread it among multiple replicaSets. Each of these replicaSets then handles a slice of the workload.

MongoDB supports sharding out of the box and it is relatively easy to set up, though there are a couple of considerations you need to take into account before sharding your data.

With that in mind, we’re starting a three part miniseries about MongoDB and sharding.

In this first part, we will cover the basics of sharding with MongoDB by setting up a sharded environment. If you wish to know more about sharding in general, please read our whitepaper on sharding with MySQL Fabric.

MongoDB sharding primer

The MongoDB sharding solution is similar to existing sharding frameworks for other major database solutions. It makes use of a typical lookup solution, where the sharding is defined in a shard-key and the ranges are stored inside a configuration database. MongoDB works with three components to find the correct shard for your data.

A typical sharded MongoDB environment looks like this:

Sharding tier

The first component used is the shard router called mongos. All read and write operations must be sent to the shard router, making all shards act as a single database for the client application. The shard router will route the queries to the appropriate shards by consulting the Configserver.

The Configserver is a special replicaSet that keeps the configuration of all shards in the cluster. The Configserver contains information about shards, databases, collections, shard keys and the distribution of chunks of data. Data gets partitioned by slicing the total dataset into smaller chunks, where these chunks are defined by the shard key. The shard key can be either range based or hash based. These chunks are then distributed evenly over the total number of shards.

The router will know on which shard to place the data by finding the correct chunk in the Configserver. If the router thinks the chunk is becoming too large, it will automatically create a new chunk in the Configserver. The sharding metadata is stored in the config database, and this database is accessible via the shard router as well.

Prior to MongoDB 3.2, the Configserver was a set of three individual MongoDB nodes used to write the sharding metadata. In this setup the metadata was written and read thrice, and differences in data between the nodes meant inconsistent writes could happen, requiring manual intervention. If that happened, the balancer would no longer perform shard migrations and the shard router would no longer be able to create new chunks.

Shard tier

Each replicaSet in a MongoDB sharded cluster is treated as an individual shard. Adding a shard will increase the write capacity, but also increase the sharding complexity. Each shard is an individual component in the cluster and there is no direct communication between them. Shards don’t know anything about other shards in the cluster.

MongoDB distributes its data evenly by balancing the total number of chunks on each shard. If the number of chunks is not spread evenly, a balancing process can be run to migrate chunks from one shard to another.

This balancing process typically gets started from a shard router (mongos) that considers the data to be unbalanced. The shard router will acquire a lock for the balancing process in the config database on the Configserver.
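
To check whether the balancer is enabled and whether a balancing round is currently running, a hedged sketch in the mongo shell (connected to a mongos) could look like this; the layout of the locks collection differs between MongoDB versions:

sh.getBalancerState()      // is the balancer enabled?
sh.isBalancerRunning()     // is a balancing round in progress right now?

// The balancer lock as stored in the config database
use config
db.locks.findOne({ "_id": "balancer" })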

Setting up a simple sharded cluster

We will give an example of how to setup a simple sharded cluster, on a single machine. In our test example we will only use the bare minimum of a single node per replicaSet and a single node for the configserver. In production you should always use at least two nodes per replicaSet (and an arbiter) and three nodes for the Configserver.

First we create the paths for each instance to run:

mkdir /var/lib/mongodb/cfg /var/lib/mongodb/sh1 /var/lib/mongodb/sh2 /var/lib/mongodb/sh3

The first component to start is Configserver:

mongod --fork --dbpath /var/lib/mongodb/cfg --logpath cfg.log --replSet cfg --configsvr

Once it has started up properly, we need to initialize the Configserver, as it is in reality a replicaSet:

mongo --eval 'rs.initiate({"_id":"cfg", "members":[{"_id":1, "host":"127.0.0.1:27019"}]});' --port 27019

Now we can start the shard router (mongos) and connect to the Configserver:

mongos --fork --configdb cfg/127.0.0.1:27019 --logpath mongos.log

As you may notice in the lines above, we are using a different port number for connecting to the Configserver. In a sharded cluster, by default, the Configserver binds to port 27019 and the shard router to 27017. It is also considered bad practice to configure the shard nodes to bind to the default 27017 MongoDB port. This is done deliberately to prevent confusion about which port to connect to: the default port then only applies to the shard router.

Next we start up three individual replicaSets that will become shards later on:

mongod --fork --dbpath /var/lib/mongodb/sh1 --logpath sh1.log --port 27001 --replSet sh1
mongod --fork --dbpath /var/lib/mongodb/sh2 --logpath sh2.log --port 27002 --replSet sh2
mongod --fork --dbpath /var/lib/mongodb/sh3 --logpath sh3.log --port 27003 --replSet sh3

And, just like the configserver, we need to initialize these replicaSets as well:

mongo --eval 'rs.initiate({"_id":"sh1", "members":[{"_id":1, "host":"127.0.0.1:27001"}]});' --port 27001
mongo --eval 'rs.initiate({"_id":"sh2", "members":[{"_id":1, "host":"127.0.0.1:27002"}]});' --port 27002
mongo --eval 'rs.initiate({"_id":"sh3", "members":[{"_id":1, "host":"127.0.0.1:27003"}]});' --port 27003

Now we can add these replicaSets as shards via mongos:

mongo --eval 'sh.addShard("sh1/127.0.0.1:27001");'
mongo --eval 'sh.addShard("sh2/127.0.0.1:27002");'
mongo --eval 'sh.addShard("sh3/127.0.0.1:27003");'

In the addShard command you have to specify the full MongoDB connect string to add the shard. In our case we only have one host per replicaSet, but if we had a real production shard, it could look something like this:

mongo --eval 'sh.addShard("sh1/node1.domain.com:27018,node2.domain.com:27018,node3.domain.com:27018");'

Even though we now have a fully functioning sharded cluster, writing any data via the shard routers will only store the data on the shard with the least amount of data. This is because we have not yet enabled the database for sharding and defined a shard key for the collection we’d like to shard.

Setting up a sharded cluster using ClusterControl

We have just demonstrated how to set up a very simple test cluster, and it would be quite complex to show how to set up a full production cluster. So instead, and for greater ease, we will show how to set up a MongoDB sharded cluster using ClusterControl with its four step wizard.

The first step is to allow ClusterControl to ssh to the hosts that we are going to deploy upon.

The second step is the definition of shard routers and configservers:

Third step is defining the shards:

And in the last step we define the database specific settings, such as which version of MongoDB we will deploy:

The total time taken to fill in this wizard should be around 1 minute, and after pressing the Deploy button, ClusterControl will automatically create a newly sharded cluster.

This is a quick and easy way to get started with a MongoDB sharded cluster.

Shard key

As described in the MongoDB sharding primer paragraph, the shard key is one of the most important factors in the sharding of MongoDB. By defining the shard key, you also influence the effectiveness of the shards. The shard key is either an indexed field or part of an indexed compound field that is present in every document in your collection.

Based upon this field, the shard key defines a range of shard key values that get associated with a chunk. As we described earlier, the chunks are distributed evenly over the shards in the cluster, so this directly influences the effectiveness of the sharding.

An example of a shard key and its distribution can be seen below:

mongos> sh.status()
--- Sharding Status ---
…
databases:
{  "_id" : "shardtest",  "primary" : "sh1",  "partitioned" : true }
    shardtest.collection
        shard key: { "_id" : 1 }
        unique: false
        balancing: true
        chunks:
            sh1    1
            sh2    2
            sh3    1
        { "_id" : { "$minKey" : 1 } } -->> { "_id" : 2 } on : sh3 Timestamp(6, 1)
        { "_id" : 2 } -->> { "_id" : 24 } on : sh1 Timestamp(5, 1)
        { "_id" : 24 } -->> { "_id" : 1516 } on : sh2 Timestamp(4, 1)
        { "_id" : 1516 } -->> { "_id" : { "$maxKey" : 1 } } on : sh2 Timestamp(6, 4)

As you can see, we defined a shard key on the identifier of the document and distributed this over three shards in total. The overview of the chunks per shard seems to be quite balanced and the ranges of the identifiers show on which shard the data resides.

Limitations of the shard key

Keep in mind that once you start sharding a collection, you can no longer change the shard key and update the values of the shard key fields.

Also, unique indexes are limited within sharding: only the _id index and the index (or compound index) on the shard key can be unique. This means you can’t shard a collection that doesn’t meet these requirements, nor place unique indexes on other fields after sharding the collection.

Influence of the shard key

As we mentioned earlier, the shard key influences the performance of the sharding. This means you can optimize your shard keys for read and/or writing data. The most determining factors for this are the cardinality, frequency and rate of change of your shard key. We will illustrate this with some examples.

Sequential writes

Assume you have an application that writes a lot of sequential data only once, for instance a click-tracking application, with the following document:

{"_id" : ObjectId("5813bbaae9e36caf3e4eab5a"),"uuid" : "098f253e-6310-45a0-a6c3-940be6ed8eb4","clicks" : 4,"pageurl" : "/blog/category/how-to","ts" : Timestamp(1477688235, 1)
}

You can distribute your data in many ways with this document. Using the timestamp of the document would mean the data gets sharded on time-intervals. So your data gets routed sequentially to one shard until the chunk is full, then the router will point to the next shard until that chunk is full. As you can already conclude: this will not scale your writes. Only one single shard is performing all write operations, while the other shards are doing next to nothing in terms of write operations. This is illustrated in the picture below.

If scaling your write capacity is your concern, you could choose the UUID field as your shard key. This means data for each unique person will be sharded, so ranges of UUIDs will be created for each chunk of data. Data will naturally be written in a more distributed manner among shards than with the timestamp shard-key. There could still be unevenness if the UUIDs are not generated in a random way.

As the shard key on the UUID field scales your write operations a lot better, it may not be the desired shard key for analyzing your full dataset within a specific time range. As the shard router is unaware of the secondary index on the timestamp, each and every shard will be consulted to retrieve the required data. Every shard returns the data it has for this query, and the shard router combines this into a single record set that can be returned to the client. Also, some UUIDs may have a much larger number of documents associated with them, so data can still end up distributed unevenly.

An alternative would be using the MongoDB hashed sharding strategy. A hash function will be applied on the values of a shard key, where the hash function will make the distribution among shards pseudo-random. This means two near values of the shard key are unlikely to end up in the same chunk.

Consulting one single shard, and returning a sequential part of the chunk is always more desirable than receiving data back from many shards. Also combining many result sets on the shard router into a single result set is always slower than returning the result set directly from a single shard.

Random writes

If your application writes data in a more uniform way, for instance a user on a social network, this would mean you would get both inserts and updates on your data. Suppose we have the following document:

{"_id" : ObjectId("5813c5a9e9e36caf3e4eab5b"),"uuid" : "098f253e-6310-45a0-a6c3-940be6ed8eb4","firstname" : "John","lastname" : "Doe","email" : "john.doe@gmail.com","registration_time" : Timestamp(1477690793, 1)
}

Even though the write operations of new users (inserts) are happening sequentially, users who freshen up their profile with new details will also cause writes. This means the writing of data can be a bit more evenly distributed than the sequential writes example we gave earlier. We can anticipate this and choose our shard key accordingly. Choosing the timestamp field would not make sense in this case, as updates on a user document would require a write operation to all shards to find the correct record. A better candidate would be the UUID generated for the user. This would distribute the users evenly over the shards. If your UUID is generated randomly, the inserts of new users will also be distributed evenly.

As long as you use the UUID to access the data, reading back user data is also very efficient. But similar to the click-tracking example, selecting a range of users that registered in a certain timeframe would require consulting each and every shard again. This can’t be overcome easily.
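
A hedged sketch of sharding such a user collection on the UUID could look like this; the database and collection names are made up for the example:

sh.enableSharding("socialnet")
// Range based shard key on the (randomly generated) UUID; a hashed key could be used if your UUIDs are not random
sh.shardCollection("socialnet.users", { "uuid": 1 })

// Reads and updates by UUID are routed to a single shard
db.getSiblingDB("socialnet").users.find({ "uuid": "098f253e-6310-45a0-a6c3-940be6ed8eb4" })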

The shard key paradigm

As you can see, the shard key influences the way you read and write your data. Choosing it wrongly could make data retrieval and updating data very slow. That’s why it is so important to choose the right shard key up front.

To ease the pain, you could also store certain data multiple times in different collections, and shard it in various ways. For instance, the ranges of users that registered in a certain timeframe could be stored in a secondary collection. This collection only contains references to the documents in the other collection, including the timestamp, and is sharded on the timestamp field. The downside, naturally, is that you now have to maintain this data twice.
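
A hedged sketch of such a secondary lookup collection (again with made-up names) could be:

// Lookup collection sharded on the registration timestamp, holding only a reference to the user document
sh.shardCollection("socialnet.users_by_regtime", { "registration_time": 1 })
db.getSiblingDB("socialnet").users_by_regtime.insert({
    "registration_time": Timestamp(1477690793, 1),
    "user_id": ObjectId("5813c5a9e9e36caf3e4eab5b")
})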

Enabling sharding on our cluster

Now that we have covered the theory behind the shard keys, we can enable sharding on the database in our example cluster:

mongo --eval 'sh.enableSharding("shardtest");'

Without enabling sharding on this database, we will not be able to shard the collections inside it. Once enabled, any non-sharded collection within that database will be stored on the primary shard. The primary shard is the shard with the least amount of data at the moment the database is enabled for sharding. You can change the primary shard assignment afterwards with the movePrimary command.
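
For example, moving the primary shard of our test database to another shard could be done like this (a hedged sketch, to be run via a mongos):

// Move the non-sharded data of the "shardtest" database to shard sh2
db.adminCommand({ movePrimary: "shardtest", to: "sh2" })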

We need to configure the shard key separately by sharding the collection:

mongo --eval 'sh.shardCollection("shardtest.collection", {"_id":});'

Our shard key is set on the identifier field of the collection. Now if we insert many large documents, this gives us a nicely sharded collection:

mongo shardtest --eval 'data="a";for (i=1; i < 10000; i++) { data=data+"a"; db.collection.insert({"_id": i, "data": data}); }; '

And this is what the shard distribution looks like after inserting these documents:

mongos> sh.status()
--- Sharding Status ---
...
  databases:
    {  "_id" : "shardtest",  "primary" : "sh1",  "partitioned" : true }
        shardtest.collection
            shard key: { "_id" : 1 }
            unique: false
            balancing: true
            chunks:
                sh1    4
                sh2    4
                sh3    3
            { "_id" : { "$minKey" : 1 } } -->> { "_id" : 2 } on : sh1 Timestamp(5, 1)
            { "_id" : 2 } -->> { "_id" : 3 } on : sh1 Timestamp(1, 2)
            { "_id" : 3 } -->> { "_id" : 839 } on : sh2 Timestamp(6, 1)
            { "_id" : 839 } -->> { "_id" : 1816 } on : sh2 Timestamp(2, 3)
            { "_id" : 1816 } -->> { "_id" : 2652 } on : sh3 Timestamp(4, 1)
            { "_id" : 2652 } -->> { "_id" : 3629 } on : sh3 Timestamp(3, 3)
            { "_id" : 3629 } -->> { "_id" : 4465 } on : sh1 Timestamp(4, 2)
            { "_id" : 4465 } -->> { "_id" : 5442 } on : sh1 Timestamp(4, 3)
            { "_id" : 5442 } -->> { "_id" : 6278 } on : sh2 Timestamp(5, 2)
            { "_id" : 6278 } -->> { "_id" : 7255 } on : sh2 Timestamp(5, 3)
            { "_id" : 7255 } -->> { "_id" : { "$maxKey" : 1 } } on : sh3 Timestamp(6, 0)

You can see our collection consists of several ascending series, each on a single shard. This is due to our linear iteration of inserting documents: for the duration of each range, the inserts only happened on a single shard.

As we described in the theory earlier, we can enable a hashing algorithm in the shard key definition to prevent this:

mongo --eval 'sh.shardCollection("shardtest.collection", {"_id":"hashed"});'

If we now re-insert the same data, and look at the sharding distribution, we see it created very different ranges of values for the shard key:

mongos> sh.status()
--- Sharding Status ---
…
  databases:
    {  "_id" : "shardtest",  "primary" : "sh1",  "partitioned" : true }
        shardtest.collection
            shard key: { "_id" : "hashed" }
            unique: false
            balancing: true
            chunks:
                sh1    3
                sh2    4
                sh3    3
            { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-6148914691236517204") } on : sh3 Timestamp(4, 0)
            { "_id" : NumberLong("-6148914691236517204") } -->> { "_id" : NumberLong("-4643409133248828314") } on : sh1 Timestamp(4, 1)
            { "_id" : NumberLong("-4643409133248828314") } -->> { "_id" : NumberLong("-3078933977000923388") } on : sh1 Timestamp(3, 12)
            { "_id" : NumberLong("-3078933977000923388") } -->> { "_id" : NumberLong("-3074457345618258602") } on : sh1 Timestamp(3, 13)
            { "_id" : NumberLong("-3074457345618258602") } -->> { "_id" : NumberLong(0) } on : sh2 Timestamp(3, 4)
            { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("1545352804953253256") } on : sh2 Timestamp(3, 8)
            { "_id" : NumberLong("1545352804953253256") } -->> { "_id" : NumberLong("3067091117957092580") } on : sh2 Timestamp(3, 9)
            { "_id" : NumberLong("3067091117957092580") } -->> { "_id" : NumberLong("3074457345618258602") } on : sh2 Timestamp(3, 10)
            { "_id" : NumberLong("3074457345618258602") } -->> { "_id" : NumberLong("6148914691236517204") } on : sh3 Timestamp(3, 6)
            { "_id" : NumberLong("6148914691236517204") } -->> { "_id" : { "$maxKey" : 1 } } on : sh3 Timestamp(3, 7)

As shown in the status above, the shards have the same number of chunks assigned to them. As the hashing algorithm ensures the hashed values are nonlinear, our linear inserts have now been spread across all shards. In our test case the inserts were performed by a single thread, but in a production environment with high concurrency, the improvement in write performance will be much more noticeable.

Conclusion

In this first part we have shown how MongoDB sharding works, and how to set up your own sharded environment. Making the choice for a shard key should not be taken lightly: it determines the effectiveness of your sharded cluster and can’t be altered afterwards.

In the next part of the sharding series we will focus on what you need to know about monitoring and maintaining shards.

Join our live webinar on how to scale and shard MongoDB


We’re live next Tuesday, November 15th, with our webinar ‘Become a MongoDB DBA - Scaling and Sharding’!

Join us and learn about the three components necessary for MongoDB sharding. We’ll also share a read scaling considerations checklist as well as tips & tricks for finding the right shard key for MongoDB.

Overall, we’ll discuss how to plan your MongoDB scaling strategy up front and how to prevent ending up with unusable secondary nodes and shards. And we’ll look at how to leverage ClusterControl’s MongoDB scaling and shards management capabilities.

Sign up below!

Date, Time & Registration

Europe/MEA/APAC

Tuesday, November 15th at 09:00 GMT / 10:00 CET (Germany, France, Sweden)
Register Now

North America/LatAm

Tuesday, November 15th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)
Register Now

Agenda

  • What are the differences in read and write scaling with MongoDB
  • Read scaling considerations with MongoDB
  • MongoDB read preference explained
  • How sharding works in MongoDB
  • Adding new shards and balance data
  • How to scale and shard MongoDB using ClusterControl
  • Live Demo

Speaker

Art van Scheppingen is a Senior Support Engineer at Severalnines. He’s a pragmatic database expert with over 16 years of experience in web development. He previously worked at Spil Games as Head of Database Engineering, where he kept a broad view over the whole database environment: from MySQL to MongoDB, Vertica to Hadoop and from Sphinx Search to SOLR. He regularly presents his work and projects at various conferences (Percona Live, MongoDB Open House, FOSDEM) and related meetups.

We look forward to “seeing” you there!

This session is based upon the experience we have using MongoDB and implementing it for our database infrastructure management solution, ClusterControl. For more details, read through our ‘Become a MongoDB DBA’ blog series.


Become a MongoDB DBA: Sharding ins- and outs - part 2


In previous posts of our “Become a MongoDB DBA” series, we covered Deployment, Configuration, Monitoring (part 1), Monitoring (part 2), backup, restore, read scaling and sharding (part 1).

In the previous post we did a primer on sharding with MongoDB. We covered not only how to enable sharding on a database, and define the shard key on a collection, but also explained the theory behind it.

Once enabled on a database and collection, the data stored will keep growing, and more and more chunks will be in use. Just like any database, shards require management and need to be looked after. Some of the monitoring and management aspects of sharding, like backups, differ from those of ordinary MongoDB replicaSets, and day-to-day operations may lead to scaling or rebalancing the cluster. In this second part we will focus on the monitoring and management aspects of sharding.

Monitoring shards

The most important aspect of sharding is monitoring its performance. As the write throughput of a sharded cluster is much higher than before, you might encounter other scaling issues. So it is key to find your next bottleneck.

Connections

The most obvious one would be the number of connections going to each primary in the shards. Any range query not using the shard key will result in the query being multiplied and sent to every shard. If these queries are not covered by any (usable) index, you might see a large increase in connections going from the shard router to the primary of each shard. Luckily a connection pool is used between the shard router and the primaries, so unused connections will be reused.

You can keep an eye on the connection pool via the connPoolStats command:

mongos> db.runCommand( { "connPoolStats" : 1 } )
{"numClientConnections" : 10,"numAScopedConnections" : 0,"totalInUse" : 4,"totalAvailable" : 8,"totalCreated" : 23,"hosts" : {"10.10.34.11:27019" : {"inUse" : 1,"available" : 1,"created" : 1
        },"10.10.34.12:27018" : {"inUse" : 3,"available" : 1,"created" : 2
        },"10.10.34.15:27018" : {"inUse" : 0,"available" : 1,"created" : 1
        }
    },"replicaSets" : {"sh1" : {"hosts" : [
                {"addr" : "10.10.34.12:27002","ok" : true,"ismaster" : true,"hidden" : false,"secondary" : false,"pingTimeMillis" : 0
                }
            ]
        },
..."ok" : 1
}

The output of this command will give you both a combined stats and a per-host stats, and additionally per replicaSet in the cluster which host is the primary. Unfortunately this means you will have to figure out yourself which host is part of which component.

Capacity planning

Another important set of metrics to watch is the total number of chunks, the chunks per node and the available disk space on your shards. Combined, these should give a fairly good indication of how soon it is time to scale out with another shard.

You can fetch the chunks per shard from the shard status command:

mongos> sh.status()
--- Sharding Status ---
…
databases:
{  "_id" : "shardtest",  "primary" : "sh1",  "partitioned" : true }
    shardtest.collection
        shard key: { "_id" : 1 }
        unique: false
        balancing: true
        chunks:
            sh1    1
            sh2    2
            sh3    1

Caution has to be taken here: the output of the shard status command is not valid JSON, but rather inconsistently formatted, human-readable text. Alternatively, you can fetch the very same information from the config database on the shard router (it actually resides on the Configserver replicaSet):

mongos> use config
switched to db config
mongos> db.config.runCommand({aggregate: "chunks", pipeline: [{$group: {"_id": {"ns": "$ns", "shard": "$shard"}, "total_chunks": {$sum: 1}}}]})
{ "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_1" }, "total_chunks" : 330 }
{ "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_0" }, "total_chunks" : 328 }
{ "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_2" }, "total_chunks" : 335 }

This aggregate query is covered by a default index placed on the chunks collection, so this will not pose a big risk in querying it occasionally.

Non-sharded databases and collections

As we described in the previous post, non-sharded databases and collections will be assigned to a default primary shard. This means the database or collection is limited to the size of this primary shard, and if written to in large volumes, could use up all remaining disk space of a shard. Once this happens the shard will obviously no longer function. Therefore it is important to watch over all existing databases and collections, and scan the config database to validate that they have been enabled for sharding.

This short script will show you the non-sharded collections on the MongoDB command line client:

use config;
var shard_collections = db.collections.find();
var sharded_names = {};
while (shard_collections.hasNext()) {
    shard = shard_collections.next();
    sharded_names[shard._id] = 1;
}


var admin_db = db.getSiblingDB("admin");
dbs = admin_db.runCommand({ "listDatabases": 1 }).databases;
dbs.forEach(function(database) {
    if (database.name != "config") {
        db = db.getSiblingDB(database.name);
        cols = db.getCollectionNames();
        cols.forEach(function(col) {
            if( col != "system.indexes" ) {
                if( sharded_names[database.name + "." + col] != 1) {
                   print (database.name + "." + col);
                }
            }
        });
    }
});

It first retrieves a list of all sharded collections and saves this for later usage. Then it loops over all databases and collections, and checks if they have been sharded or not.

Maintaining shards

Once you have a sharded environment, you also need to maintain it. Basic operations, like adding shards, removing shards, rebalancing shards and making backups, ensure you keep your cluster healthy and prepared for disaster. Once a shard is full, you can no longer perform write operations on it, so it is essential to add new shards before that happens.

Adding a shard

Adding a shard is really simple: create a new replicaSet, and once it is up and running, simply add it with the following command on one of the shard routers:

mongos> sh.addShard("<replicaset_name>/<host>:<port>")

It suffices to add one host of the replicaSet, as this will seed the shard router with a host it can connect to and detect the remaining hosts.

After this it will add the shard to the cluster and immediately make it available for all sharded collections. This also means that after adding a shard, the MongoDB shard balancer will start balancing all chunks over all shards. Since capacity has increased and an empty shard has appeared, the balancer will cause additional read and write load on all shards in the cluster. You may want to disable the balancer if you are adding the extra shard during peak hours. Read more in the MongoDB Balancer section further down on why this happens and how to disable the balancer in these cases.
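To verify that the new shard has been registered, and to keep an eye on the chunk migrations that follow, a couple of quick checks against the shard router will do:

// Confirm the shard shows up in the cluster.
db.adminCommand({ listShards: 1 })

// The balancer logs every chunk move in the config database.
db.getSiblingDB("config").changelog.find(
    { what: "moveChunk.commit" }
).sort({ time: -1 }).limit(5).pretty()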

Removing a shard

Removing a shard will not be done often, as most people scale out their clusters. But just in case you ever need it, this section will describe how to remove a shard.

It is a bit harder to remove a shard than to add one, as this involves draining its data to the remaining shards as well. To remove a shard, you first need to find its name:

mongos> db.adminCommand( { listShards: 1 } )
{"shards" : [
{ "_id" : "sh1", "host" : "sh1/10.10.34.12:27018" },
{ "_id" : "sh2", "host" : "sh2/10.10.34.15:27018" }
    ],"ok" : 1
}

Now we can request MongoDB to remove it, using the adminCommand:

mongos> db.adminCommand( { removeShard: "sh2" } )
{"msg" : "draining started successfully","state" : "started","shard" : "sh2","note" : "you need to drop or movePrimary these databases","dbsToMove" : [ ],"ok" : 1
}

This will start a draining process that migrates all data from this shard to the remaining shards. Depending on your dataset, this could take anywhere from minutes to hours to finish. Also keep in mind that you must have enough disk space available on the remaining shards to be able to migrate the data; if not, the balancer will stop once one of the shards is full.

To watch the progress you can run the removeShard command once more:

mongos> db.adminCommand( { removeShard: "sh2" } )
{"msg" : "draining ongoing","state" : "ongoing","remaining" : {"chunks" : NumberLong(2),"dbs" : NumberLong(0)
    },"note" : "you need to drop or movePrimary these databases","dbsToMove" : [ notsharded ],"ok" : 1
}

In the output of this command you can see that the attribute “dbsToMove” is an array containing the database notsharded. If the array contains databases, this shard is the primary shard for these databases. Before the shard can be removed, we need to drop or move these databases first. Moving is performed with the movePrimary command:

mongos> db.adminCommand( { movePrimary : "notsharded", to : "sh1" } )

Once there are no more primary databases on the shard and the balancer is done with migrating data, it will wait for you to run the removeShard command once more. It will then output the state completed and finally remove the shard:

mongos> db.adminCommand( { removeShard: "sh2" } )
{"msg" : "removeshard completed successfully","state" : "completed","shard" : "sh2","ok" : 1
}

MongoDB Balancer

We have mentioned the MongoDB balancer a couple of times before. The balancer is a very basic process that has no other task than keeping the number of chunks per collection equal on every shard. In reality it does nothing more than move chunks of data between shards until it is satisfied with the balance. This means it can also work against you in some cases.

The most obvious case where it can go wrong is if you add a new shard with a larger storage capacity than the other shards. Having a shard with more capacity than the others means the shard router will most likely add all new chunks on the shard with the largest capacity available. This means that once a new chunk has been created on the new shard, another chunk will be moved to another shard to keep the number of chunks in balance. Therefore it is advisable to give equal storage space to all shards.

Another case where it can go wrong is if your shard router splits chunks when you insert your data randomly. If these splits happen more often on one shard than on the others, some of the existing chunks may be moved to other shards, and range queries on these chunks become less effective as more shards need to be touched.

There are a couple of ways to influence the balancer. The balancer can create an additional IO workload on your data nodes, so you may wish to schedule it to run only during off hours. This is done by setting an active window on the balancer document in the config database:

mongos> use config
mongos> db.settings.update(
   { _id: "balancer" },
   { $set: { activeWindow : { start : "22:00", stop : "08:00" } } },
   { upsert: true }
)

If necessary you can also disable the balancer for a single collection. This can help against the unnecessary moving of chunks we described earlier, and is also worth considering for collections containing archived documents. To disable balancing on a single collection, just run the following command:

mongos> sh.disableBalancing("mydata.hugearchive")
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Also during backups you need to disable the balancer to make a reliable backup, as otherwise chunks of data will be moved between shards. To disable the balancer, just run the following command:

mongos> sh.setBalancerState(false);
WriteResult({ "nMatched" : 0, "nUpserted" : 1, "nModified" : 0, "_id" : "balancer" })

Don’t forget to enable the balancer again after the backup has finished; a quick way to do that is shown below. It may also be better to use a backup tool that takes a consistent backup across all shards, instead of using mongodump. More on that in the next section.
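Re-enabling the balancer and verifying that it is active again only takes a couple of shell helpers:

// Re-enable the balancer after the backup and confirm its state.
sh.setBalancerState(true)
sh.getBalancerState()      // should return true
sh.isBalancerRunning()     // true only while a balancing round is in progress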

Backups

We have covered MongoDB backups in one of the previous posts. Everything in that blog post applies to replicaSets. Using the ordinary backup tools will allow you to make a backup of each replicaSet in the sharded cluster. With a bit of orchestration you could have them start at the same time; however, this does not give you a consistent backup of the entire sharded cluster.

The problem lies in the size of each shard and of the Configserver. The bigger the shard, the longer its backup takes. This means that if you make a backup, the backup of the Configserver will probably finish first and the largest shard a very long time after. The Configserver backup will then be missing all entries written to the shards in the meantime, and the same applies between shards. So making a consistent backup with conventional tools is almost impossible, unless you can orchestrate every backup to finish at the same time.

That’s exactly what the Percona MongoDB consistent backup tool solves. It will orchestrate that every backup starts at the exact same time, and once it finishes backing up one of the replicaSets, it will continue to stream the oplog of that replicaSet until the last shard has finished. Restoring such a backup requires the additional oplog entries to be replayed against the replicaSets.

Managing and monitoring MongoDB shards with ClusterControl

Adding shards

Within ClusterControl you can easily add new shards with a two step wizard, opened from the actions drop down:

Here you can define the topology of the new shard.

Once the new shard has been added to the cluster, the MongoDB shard router will use it to assign new chunks to, and the balancer will automatically balance all chunks over all the shards.

Removing shards

In case you need to remove shards, you can simply remove them via the actions drop down:

This will allow you to select the shard that you wish to remove, and the shard you wish to migrate any primary databases to:

The job that removes the shard will then perform similar actions as described earlier: it will move any primary databases to the designated shard, enable the balancer and then wait for it to move all data from the shard.

Once all the data has been removed, it will remove the shard from the UI.

Consistent backups

In ClusterControl we have enabled the support for the Percona MongoDB Consistent backup tool:

To allow ClusterControl to backup with this tool, you need to install it on the ClusterControl node. ClusterControl will then use it to orchestrate a consistent backup of the MongoDB sharded cluster. This will create a backup for each replicaSet in the cluster, including the Configserver.

Conclusion

In this second blog on MongoDB sharding, we have shown how to monitor your shards, how to add or remove shards, how to rebalance them, and how to ensure you are prepared for disaster. We also have demonstrated how these operations can be performed or scheduled from ClusterControl.

In our next blog, we will dive into a couple of good sharding practices, recommended topologies and how to choose an effective sharding strategy.

Secure MongoDB and Protect Yourself from the Ransom Hack

In this blogpost we look at the recent concerns around MongoDB ransomware and security issues, and how to mitigate this threat to your own MongoDB instance.

Recently, various security blogs raised concern that a hacker is hijacking MongoDB instances and demanding a ransom for the stored data. It is not the first time unprotected MongoDB instances have been found vulnerable, and this has stirred up the discussion around MongoDB security again.

What is the news about?

About two years ago, the University of Saarland in Germany reported that it had discovered around 40,000 MongoDB servers that were openly accessible on the internet. This meant anyone could open a connection to such a MongoDB server via the internet. How did this happen?

Default binding

In the past, the MongoDB daemon bound itself to all interfaces. This means anyone who can reach any of the interfaces on the host where MongoDB is installed is able to connect to MongoDB. If the server has a public IP address on one of these interfaces, it may be vulnerable.

Default ports

By default, MongoDB will bind to standard ports: 27017 for MongoDB replicaSets or Shard Routers, 27018 for shards and 27019 for Configservers. Scanning a network for these ports makes it easy to predict whether a host is running MongoDB.

Authentication

By default, MongoDB configures itself without any form of authentication enabled. This means MongoDB will not prompt for a username and password, and anyone connecting to MongoDB will be able to read and write data. Authentication has been part of the product since MongoDB 2.0, but it has never been part of the default configuration.

Authorization

Part of enabling authorization is the ability to define roles. Without authentication enabled, there will also be no authorization. This means anyone connecting to a MongoDB server without authentication enabled will have administrative privileges too. Administrative privileges stretch from defining users to configuring the MongoDB runtime.

Why is all this an issue now?

In December 2016, a hacker started exploiting these vulnerabilities for personal enrichment. The hacker steals and removes your data, and leaves the following message in a WARNING collection:

{"_id" : ObjectId("5859a0370b8e49f123fcc7da"),"mail" : "harak1r1@sigaint.org","note" : "SEND 0.2 BTC TO THIS ADDRESS 13zaxGVjj9MNc2jyvDRhLyYpkCh323MsMq AND CONTACT THIS EMAIL WITH YOUR IP OF YOUR SERVER TO RECOVER YOUR DATABASE !"
}

Demanding 0.2 bitcoin (around $200 at the time of writing) may not sound like a lot if you really want your data back. However, in the meantime your website/application is unable to function normally and may be defaced, and this could potentially cost far more than the 0.2 bitcoin.

A MongoDB server is vulnerable when it has a combination of the following:

  • Bound to a public interface
  • Bound to a default port
  • No (or weak) authentication enabled
  • No firewall rules or security groups in place

The default port is debatable: any port scanner would also be able to identify MongoDB if it were running on an obscure port number.

The combination of all four factors means any attacker may be able to connect to the host. Without authentication (and authorization) the attacker can do anything with the MongoDB instance. And even if authentication has been enabled on the MongoDB host, it could still be vulnerable.

Using a network port scanner (e.g. nmap) would reveal the MongoDB build info to the attacker. This means he/she is able to find potential (zero-day) exploits for your specific version, and still manage to compromise your setup. Also weak passwords (e.g. admin/admin) could pose a threat, as the attacker would have an easy point of entry.

How can you protect yourself against this threat?

There are various precautions you can take:

  • Put firewall rules or security groups in place
  • Bind MongoDB only to necessary interfaces and ports
  • Enable authentication, users and roles
  • Backup often
  • Security audits

For new deployments performed from ClusterControl, we enable authentication by default, create a separate administrator user and allow MongoDB to listen on a different port than the default. The only part ClusterControl can’t control is whether the MongoDB instance is reachable from outside your network.

Securing MongoDB

The first step to secure your MongoDB server is to put firewall rules or security groups in place. These ensure that only the necessary client hosts/applications are able to connect to MongoDB. Also make sure MongoDB only binds to the interfaces that are really necessary, in the mongod.conf:

# network interfaces
net:
      port: 27017
      bindIp: 127.0.0.1,172.16.1.154

Enabling authentication and setting up users and roles is the second step. MongoDB has an easy to follow tutorial for enabling authentication and setting up your admin user. Keep in mind that users and passwords are still the weakest link in the chain, so make sure to keep those secure!

After securing the server, you should ensure you always have a backup of your data. Even if a hacker manages to hijack your data, with a backup and a big enough oplog you will be able to perform a point-in-time restore. Scheduling (shard consistent) backups can easily be set up in our database clustering, management and automation software, ClusterControl.

Perform security audits often: scan for any open ports from outside your hosting environment. Verify that authentication has been enabled for MongoDB, and ensure the users don’t have weak passwords and/or excessive roles. For ClusterControl we have developed two advisors that will verify all this. ClusterControl advisors are open source, and the advisors can be run for free using ClusterControl community edition.
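Part of such an audit can be done from the mongo shell itself. A minimal sketch, run as an administrative user, that shows which interfaces mongod is bound to and whether authorization has been configured:

// Inspect the running configuration: bound interfaces, port and security settings.
var opts = db.adminCommand({ getCmdLineOpts: 1 });
printjson(opts.parsed.net);        // port and bindIp
printjson(opts.parsed.security);   // undefined means no authorization/keyFile has been configured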

Will this be enough to protect myself against any threat?

With all these precautions in place, you will be protected against any direct threat from the internet. However, keep in mind that any compromised machine in your hosting environment may still become a stepping stone to your now protected MongoDB servers. Be sure to upgrade MongoDB to the latest (patch) releases to stay protected against known vulnerabilities.

How to secure MongoDB from ransomware - ten tips

Following the flurry of blogs, articles and social postings that have been published in recent weeks in response to the attacks on MongoDB systems and related ransomware, we thought we’d clear through the fog and provide you with 10 straightforward, tested tips on how to best secure MongoDB (from attacks and ransomware).

What is ransomware?

By definition, ransomware is malware that secretly installs itself on your computer, encrypts your files and then demands a ransom to unlock your files or to not publish them publicly. The attacks that hit MongoDB users in various forms over the past weeks mostly fit this definition. However, it is not malware that hits the MongoDB instances, but a scripted attack from outside.

Once the attackers have taken control over your MongoDB instance, most of them hijack your data by copying it onto their own storage. After making a copy they will erase the data on your server, and leave a database with a single collection demanding ransom for your data. In addition, some also threaten to erase the backup that they hold hostage if you don’t pay within 3 days. Some victims, who allegedly paid, in the end never received their backup.

Why is this happening

The attackers are currently scanning for MongoDB instances that are publicly available, meaning anyone can connect to these servers with a simple MongoDB connect. If the server instance does not have authentication enabled, anyone can connect without providing a username and password. Also, the lack of authentication implies there is no authorization in place, and anyone connecting will be treated as an administrator of the system. This is very dangerous, as anyone can now perform administrative tasks like changing data, creating users or dropping databases.

By default MongoDB doesn’t enable authentication in its configuration; it is assumed that limiting the MongoDB server to listen only on localhost is sufficient protection. This may be true in some use cases, but anyone using MongoDB in a multi-tenant software stack will quickly make the MongoDB server listen on other network interfaces as well. If these hosts can also be accessed from the internet, this poses a big threat. This is a big shortcoming of the default MongoDB installation process.

MongoDB itself can’t be fully blamed for this, as they encourage users to enable authentication and provide a tutorial on how to enable authentication and authorization. It is the user who doesn’t apply this on their publicly accessible MongoDB servers. But what if the user never knew the server was publicly available?

Ten tips on how to secure your MongoDB deployments

Tip 1: Enable authentication

This is the easiest solution to keep unwanted people out: simply enable authentication in MongoDB. To do this explicitly, you need to put the following lines in the mongod.conf:

security:
    authorization: enabled

If you have set the replication keyfile in the mongod.conf, you will also implicitly enable authentication.

As most current attackers are after easy-to-hijack instances, they will not attempt to break into a password protected MongoDB instance.
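Once authorization is switched on, you need at least one administrative user to manage the rest. A minimal sketch, using the localhost exception right after the restart (the username and password below are placeholders):

// Create the first administrative user; run this from the mongo shell on the server itself.
use admin
db.createUser({
    user: "admin",                       // placeholder, pick your own
    pwd: "use-a-long-random-password",   // placeholder, pick your own
    roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
})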

Tip 2: Don’t use weak passwords

Enabling authentication will not give you 100% protection against any attack. Trying weak passwords may be the next weapon of choice for the attackers. Currently MongoDB does not feature a (host) lockout for too many wrong user/password attempts. Automated attacks may be able to crack weak passwords in minutes.

Set up passwords according to good and proven password guidelines, or make use of a strong password generator.

Tip 3: Authorize users by roles

Even if you have enabled authentication, don’t just give every user an administrative role. This would be very convenient from the user perspective, as they can perform literally every task imaginable and don’t have to wait for a DBA to execute it. But for an attacker this is just as convenient: as soon as they gain entry to a single account, they immediately have the administrative role too.

MongoDB has a fine-grained set of roles, and for any type of task an applicable role is available. Ensure that the user carrying the administrative role is not part of the application stack. This should reduce the chance that an account breach results in disaster.

When provisioning MongoDB from ClusterControl, we deploy new MongoDB replicaSets and sharded clusters with a separate admin and backup user.

Tip 4: Add a replication keyfile

As mentioned before in Tip #1, enabling the replication keyfile will implicitly enable authentication in MongoDB. But there is a much more important reason to add a replication keyfile: once added, only hosts with the file installed are able to join the replicaSet.

Why is this important? Adding new secondaries to a replicaSet normally requires the clusterManager role in MongoDB. Without authentication, any user can add a new host to the cluster and replicate your data over the internet. This way the attacker can silently and continuously tap into your data.

With the keyfile enabled, members must authenticate themselves with the shared key before they can take part in replication. This ensures nobody can spoof the IP of an existing host and pretend to be another secondary that isn’t supposed to be part of the cluster. In ClusterControl, we deploy all MongoDB replicaSets and sharded clusters with a replication keyfile.

Tip 5: Make backups regularly

Schedule backups, to ensure you always have a recent copy of your data. Even if the attacker is able to remove your databases, they don’t have access to your oplog. So if your oplog is large enough to contain all transactions since the last backup, you can still make a point in time recovery with a backup and replay the oplog till the moment the attacker started to remove data.
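To see how much history your oplog actually holds, and thus how old a backup may be while still allowing a point-in-time restore, the shell provides built-in helpers:

// Run on a replicaSet member: shows the configured oplog size and the time
// range (replication window) covered by the first and last oplog entries.
rs.printReplicationInfo()

// The same data as an object, e.g. timeDiffHours, for scripting.
db.getReplicationInfo()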

ClusterControl has a very easy interface to schedule (shard consistent) backups, and restoring your data is only one click away.

Tip 6: Run MongoDB on a non-standard port

As most attackers are only scanning for the standard MongoDB ports, you could reconfigure MongoDB to run on a different port. This will not stop attackers who perform a full port scan; they will still discover an open MongoDB instance.

Standard ports are:

  • 27017 (mongod / mongos)
  • 27018 (mongod in sharded environments)
  • 27019 (configservers)
  • 2700x (some MongoDB tutorials)

This requires the following line to be added or changed in the mongod.conf, followed by a restart:

net:
   port: 17027

In ClusterControl, you can deploy new MongoDB replicaSets and sharded clusters with custom port numbers.

Tip 7: Does your application require public access?

If you have enabled MongoDB to be bound on all interfaces, you may want to review whether your application actually needs external access to the datastore. If your application is a single hosted solution and resides on the same host as the MongoDB server, it can suffice to bind MongoDB to localhost.

This requires the following line to be added or changed in the mongod.conf, followed by a restart:

net:
   bindIp: 127.0.0.1

In many hosting and cloud environments with multi-tenant architectures, applications are put on different hosts than where the datastore resides. The application then connects to the datastore via the private (internal) network. If this applies to your environment, make sure to bind MongoDB only to the private network.

This requires the following line to be added or changed in the mongod.conf, followed by a restart:

net:
   bindIp: 127.0.0.1,172.16.1.234

Tip 8: Enable firewall rules or security groups

It is a good practice to enable firewall rules on your hosts, or security groups with cloud hosting. Simply blocking the MongoDB port ranges from the outside will keep most attackers out.

There is still another way to get in: from the inside. If an attacker gains access to another host in your private (internal) network, they could still access your datastore. A good example would be proxying TCP/IP requests via an HTTP server. Add firewall rules to the MongoDB instance that deny every host except the hosts you know for sure need access. This should at least limit the number of hosts that could potentially be used to get at your data. And as indicated in Tip 1: with authentication enabled, even if someone proxies into your private network they can’t simply steal your data.

Also, if your application does require MongoDB to be available on the public interface, you can limit the hosts accessing the database by simply adding similar firewall rules.

Tip 9: Go Hack Yourself ! Check for external connectivity

To verify that you have followed all previous tips, simply test whether anything is exposed externally. If you don’t have a host outside your hosting environment, a cloud box at any hosting provider will suffice for this check.

From that external host, check whether you can connect to your host via telnet on the command line.

In case you did change the port number of MongoDB, use the appropriate port here.

telnet your.host.com 27017

If this command returns something similar to this, the port is open:

Trying your.host.com...
Connected to your.host.com.
Escape character is '^]'.

Another method of testing would be installing nmap on the host and testing it against your host:

[you@host ~]$ sudo yum install nmap
[you@host ~]$ nmap -p 27017 --script mongodb-databases your.host.com

If nmap is able to connect, you will see something similar to this:

PORT      STATE SERVICE REASON
27017/tcp open  unknown syn-ack
| mongodb-databases:
|   ok = 1
|   databases
|     1
|       empty = false
|       sizeOnDisk = 83886080
|       name = test
|     0
|       empty = false
|       sizeOnDisk = 83886080
|       name = yourdatabase
|     2
|       empty = true
|       sizeOnDisk = 1
|       name = local
|     2
|       empty = true
|       sizeOnDisk = 1
|       name = admin
|_  totalSize = 167772160

If you have enabled authentication, nmap is still able to connect to the open port, but it cannot list the databases:

Starting Nmap 6.40 ( http://nmap.org ) at 2017-01-16 14:36 UTC
Nmap scan report for 10.10.22.17
Host is up (0.00031s latency).
PORT      STATE SERVICE
27017/tcp open  mongodb
| mongodb-databases:
|   code = 13
|   ok = 0
|_  errmsg = not authorized on admin to execute command { listDatabases: 1.0 }

And if you managed to secure everything from external, the output would look similar to this:

Starting Nmap 6.40 ( http://nmap.org ) at 2017-01-16 14:37 UTC
Nmap scan report for 10.10.22.17
Host is up (0.00013s latency).
PORT      STATE  SERVICE
27017/tcp closed unknown

If nmap is able to connect to MongoDB, with or without authentication enabled, it can identify which MongoDB version you are running using the mongodb-info script:

[you@host ~]$ nmap -p 27017 --script mongodb-info 10.10.22.17

Starting Nmap 6.40 ( http://nmap.org ) at 2017-01-16 14:37 UTC
Nmap scan report for 10.10.22.17
Host is up (0.00078s latency).
PORT      STATE SERVICE
27017/tcp open  mongodb
| mongodb-info:
|   MongoDB Build info
|     javascriptEngine = mozjs
|     buildEnvironment
|       distmod =
|       target_arch = x86_64
…
|     openssl
|       running = OpenSSL 1.0.1e-fips 11 Feb 2013
|       compiled = OpenSSL 1.0.1e-fips 11 Feb 2013
|     versionArray
|       1 = 2
|       2 = 11
|       3 = -100
|       0 = 3
|     version = 3.2.10-3.0
…
|   Server status
|     errmsg = not authorized on test to execute command { serverStatus: 1.0 }
|     code = 13
|_    ok = 0

As you can see, the attacker can identify your version, build environment and even the OpenSSL libraries it is compiled against. This enables attackers to go beyond simple authentication breaches and exploit vulnerabilities specific to your MongoDB build. This is why it is essential not to expose MongoDB outside your private network, and also why you need to update/patch your MongoDB servers on a regular basis.

Tip 10: Check for excessive privileges

Even if you have implemented all tips above, it wouldn’t hurt to go through all databases in MongoDB and check if any user has excessive privileges. As MongoDB authenticates your user against the database that you connect to, it could be the case that the user also has been granted additional rights to other databases.

For example:

use mydatastore
db.createUser(
  {
    user: "user",
    pwd: "password",
    roles: [ { role: "readWrite", db: "mydatastore" },
             { role: "readWrite", db: "admin" } ]
  }
);

In addition to a weak password and the readWrite privileges on the mydatastore database, this user also has readWrite privileges on the admin database. Connecting to mydatastore and then switching to the admin database will not require re-authentication. On the contrary: this user is allowed to read and write to the admin database.

This is a good reason to review the privileges of your users on a regular basis. You can do this by the following command:

my_mongodb_0:PRIMARY> use mydatastore
switched to db mydatastore
my_mongodb_0:PRIMARY> db.getUsers();
[
    {
        "_id" : "mydatastore.user",
        "user" : "user",
        "db" : "mydatastore",
        "roles" : [
            {
                "role" : "readWrite",
                "db" : "mydatastore"
            },
            {
                "role" : "readWrite",
                "db" : "admin"
            }
        ]
    }
]

As you need to repeat this process per database, this can be a lengthy and cumbersome exercise. In ClusterControl, we have an advisor that performs this check on a daily basis. ClusterControl advisors are open source, and these advisors are part of the free version of ClusterControl.
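If you prefer to script the check yourself, a minimal sketch that loops over all databases and flags users whose roles reach outside their own database could look like this (run as an administrative user):

// Print every user that has been granted a role on a database other than its own.
db.adminCommand({ listDatabases: 1 }).databases.forEach(function(d) {
    var database = db.getSiblingDB(d.name);
    database.getUsers().forEach(function(user) {
        user.roles.forEach(function(role) {
            if (role.db !== d.name) {
                print(d.name + "." + user.user + " has " + role.role + " on " + role.db);
            }
        });
    });
});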

That’s all folks! Do not hesitate to get in touch if you have any questions on how to secure your database environment.

New MongoDB features in ClusterControl 1.4

Our latest release of ClusterControl turns some of the most troublesome MongoDB tasks into a mere 15 second job. New features have been added to give you more control over your cluster and perform topology changes:

  • Convert a MongoDB replicaSet to a sharded MongoDB Cluster
  • Add and remove shards
  • Add shard routers to a sharded MongoDB cluster
  • Step down or freeze a node
  • New MongoDB advisors

We will describe these added features in depth below.

Convert a MongoDB replicaSet to a sharded MongoDB cluster

As most MongoDB users will start off with a replicaSet to store their database, this is the most frequently used type of cluster. If you run into scaling issues, you can scale this replicaSet by either adding more secondaries or scaling out by sharding. You can convert an existing replicaSet into a sharded cluster; however, this is a long process where you could easily make errors. In ClusterControl we have automated this process: we automatically add the Configservers and shard routers, and enable sharding.

To convert a replicaSet into a sharded cluster, you can simply trigger it via the actions drop down:

This will open up a two step dialogue on how to convert this into a shard. The first step is to define where to deploy the Configserver and shard routers to:

The second step is where to store the data and which config files should be used for the Configserver and shard router.

After the shard migration job has finished, the cluster overview now displays shards instead of replicaSet instances:

After converting to a sharded cluster, new shards can be added.

Add or remove shards from a sharded MongoDB cluster

Adding shards

As a MongoDB shard is technically a replicaSet, adding a new shard involves the deployment of a new replicaSet as well. Within ClusterControl we first deploy a new replicaSet and then add it to the sharded cluster.

From the ClusterControl UI, you can easily add new shards with a two step wizard, opened from the actions drop down:

Here you can define the topology of the new shard.

Once the new shard has been added to the cluster, the MongoDB shard router will start to assign new chunks to it, and the balancer will automatically balance all chunks over all the shards.

Removing shards

Removing a shard is a bit harder than adding one, as this involves moving the data to the other shards before removing the shard itself. For data that has been sharded over all shards, this is a job performed by the MongoDB balancer.

However, any non-sharded database/collection that was assigned this shard as its primary shard needs to be moved to another shard, which then becomes its new primary shard. For this process, MongoDB needs to know where to move these non-sharded databases/collections.

In ClusterControl you can simply remove them via the actions drop down:

This will allow you to select the shard that you wish to remove, and the shard you wish to migrate any primary databases to:

The job that removes the shard will then perform similar actions as described earlier: it will move any primary databases to the designated shard, enable the balancer and then wait for it to move all data from the shard.

Once all the data has been removed, it will remove the shard from the UI.

Adding additional MongoDB shard routers

Once you start to scale out your application using a MongoDB sharded cluster, you may find you are in need of additional shard routers.

Adding additional MongoDB shard routers is a very simple process with ClusterControl, just open the Add Node dialogue from the actions drop down:

This will add a new shard router to the cluster. Don’t forget to set the proper default port (27017) on the router.

Step down server

In case you wish to perform maintenance on the primary node of a replicaSet, it is better to have it “step down” gracefully before taking it offline. Stepping down a primary means the host stops being primary, becomes a secondary, and is not eligible to become primary again for a set number of seconds. The nodes in the MongoDB replicaSet with voting power will then elect a new primary, with the stepped-down primary excluded for that number of seconds.

In ClusterControl we have added the step down functionality as an action on the Nodes page. To step down, simply select this as an action from the drop down:

After setting the number of seconds for stepdown and confirming, the primary will step down and a new primary will be elected.
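The equivalent manual operation is the rs.stepDown() helper, run on the primary; the 60 seconds below is just an example value:

// Step down the current primary and make it ineligible for re-election for 60 seconds.
my_mongodb_0:PRIMARY> rs.stepDown(60)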

Freeze a node

This functionality is similar to the step down command: it makes a certain node ineligible to become primary for a set number of seconds. This means you can prevent one or more secondary nodes from becoming primary when stepping down the primary, and in this way force a specific node to become the new primary.

In ClusterControl we have added the freeze node functionality as an action on the Nodes page. To freeze a node, simply select this as an action from the drop down:

After setting the number of seconds and confirming, the node will not be eligible as primary for the set number of seconds.
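The equivalent manual operation is the rs.freeze() helper, run on the secondary you want to keep out of the election; the 120 seconds is just an example value:

// Prevent this secondary from becoming primary for the next 120 seconds.
// rs.freeze(0) lifts the restriction again.
my_mongodb_0:SECONDARY> rs.freeze(120)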

New MongoDB advisors

Advisors are mini programs that provide advice on specific database issues. We’ve added three new advisors for MongoDB. The first one calculates the replication lag, the second watches over the replication window, and the third checks for un-sharded databases/collections.

MongoDB Replication Lag advisor

Replication lag is very important to keep an eye on if you are scaling out reads by adding more secondaries. MongoDB will only use these secondaries if they don’t lag too far behind. If a secondary lags, you risk serving stale data that has already been overwritten on the primary.

To check the replication lag, it suffices to connect to the primary and retrieve this data using the replSetGetStatus command. In contrast to MySQL, the primary keeps track of the replication status of its secondaries.

We have implemented this check into an advisor in ClusterControl, to ensure your replication lag will always be watched over.
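The calculation itself is straightforward; a minimal sketch based on replSetGetStatus, run against the primary, could look like this:

// Compute replication lag per secondary, based on the primary's view.
var status = db.adminCommand({ replSetGetStatus: 1 });
var primary = null;
status.members.forEach(function(member) {
    if (member.stateStr === "PRIMARY") { primary = member; }
});
status.members.forEach(function(member) {
    if (member.stateStr === "SECONDARY") {
        var lagSeconds = (primary.optimeDate - member.optimeDate) / 1000;
        print(member.name + " lags " + lagSeconds + " seconds behind the primary");
    }
});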

MongoDB Replication Window advisor

Just like the replication lag, the replication window is an equally important metric to look at. The lag advisor already informs us of the number of seconds a secondary node is behind the primary. As the oplog is limited in size, replication lag imposes the following risks:

  1. If a node lags too far behind, it may not be able to catch up anymore as the transactions necessary to catch up are no longer in the oplog of the primary.
  2. A lagging secondary node is less favoured in a MongoDB election for a new primary. If all secondaries are lagging behind in replication, you will have a problem, and the one with the least lag will be made primary.
  3. Secondaries lagging behind are less favoured by the MongoDB driver when scaling out reads; this also puts a higher workload on the remaining secondaries.

If we have a secondary node lagging behind by a few minutes (or hours), it is useful to have an advisor that tells us how much time is left before the next transaction is dropped from the oplog. The time difference between the first and last entry in the oplog is called the replication window. This metric can be calculated by fetching the first and last items from the oplog and taking the difference of their timestamps.

In the MongoDB shell, there is already a function available that calculates the replication window for you. However, this function is built into the command line shell, so any outside connection not using the command line shell will not have it. Therefore we have made an advisor that watches over the replication window and alerts you if you exceed a pre-set threshold.
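The calculation the advisor performs can be reproduced in a few lines: fetch the oldest and the newest entry from the oplog and take the difference of their timestamps. A minimal sketch, run on a replicaSet member:

// Replication window = time between the first and the last entry in the oplog.
var oplog = db.getSiblingDB("local").oplog.rs;
var first = oplog.find().sort({ $natural:  1 }).limit(1).next();
var last  = oplog.find().sort({ $natural: -1 }).limit(1).next();
var windowSeconds = last.ts.t - first.ts.t;   // ts.t holds seconds since epoch
print("Replication window: " + (windowSeconds / 3600).toFixed(1) + " hours");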

MongoDB un-sharded databases and collections advisor

Non-sharded databases and collections will be assigned to a default primary shard by the MongoDB shard router. This means the database or collection is limited to the size of this primary shard, and if written to in large volumes, could use up all remaining disk space of a shard. Once this happens the shard will obviously no longer function. Therefore it is important to watch over all existing databases and collections, and scan the config database to validate that they have been enabled for sharding.

To prevent this from happening, we have created an un-sharded database and collection advisor. This advisor will scan every database and collection, and warn you if it has not been sharded.

ClusterControl improved the MongoDB maintainability

We have made a big step by adding all the improvements to ClusterControl for MongoDB replicaSets and sharded clusters. This improves the usability for MongoDB greatly, and allows DBAs, sysops and devops to maintain their clusters even better!

MongoDB Tutorial - Top MongoDB Resources from Severalnines

As we continue to announce all the great new features we have been developing in ClusterControl for MongoDB, we want to take a moment to look back at some of the top content from the recent past that can help you securely deploy and manage your MongoDB instances.

Top MongoDB Blogs

Over the past year, Severalnines has been developing blog content targeted at helping the MySQL DBA learn about MongoDB and how to integrate it into their database infrastructure. In our blog series “Become a MongoDB DBA (if you’re really a MySQL DBA)” we covered an array of topics to help you maximize your MongoDB knowledge:

  • MongoDB Provisioning and Deployment If you are a MySQL DBA you may ask yourself why you would install MongoDB? That is actually a very good question as MongoDB and MySQL were in a flame-war a couple of years ago. But there are many cases where you simply have to.
  • MongoDB: The Basics of Configuration This blog covers configuration of MongoDB, especially around ReplicaSet, security, authorization, SSL and HTTP / REST api.
  • MongoDB: Monitoring and Trending Part 1 / Part 2 These blogs will give a primer in monitoring MongoDB: how to ship metrics using free open source tools. Part two will deep dive into monitoring MongoDB, what metrics to pay attention to and why.
  • MongoDB: Backing Up Your Data How to make a good backup strategy for MongoDB, what tools are available and what you should watch out for.
  • MongoDB: Recovering Your Data This blog covered how to recover MongoDB using a backup.
  • MongoDB: How to Scale Reads The first in the series on how to scale MongoDB
  • MongoDB: Sharding Ins and Outs Part 1 / Part 2 These blogs cover how to shard your MongoDB databases and the theory behind it. How to monitor the shards to make sure they are performing, and that data is distributed evenly between your shards and how to consistently backup your data across shards.

Top MongoDB Webinars

If you are in the mood for some deep dives into the technical world of MongoDB, check out our top MongoDB webinar replays below, and make sure to sign up for our next webinar, “How to Secure MongoDB”, scheduled for March 14, 2017.

  • Become a MongoDB DBA Webinar So, maybe you’ve been working with MySQL for a while and are now being asked to also properly maintain one or more MongoDB instances. It is not uncommon that MySQL DBAs, developers, network/system administrators or DevOps folks with general backgrounds, find themselves in this situation at some point in time. In fact, with more organisations operating polyglot environments, it’s starting to become commonplace.
  • What to Monitor in MongoDB Webinar To operate MongoDB efficiently, you need to have insight into database performance. And with that in mind, we’ll dive into monitoring in this second webinar in the ‘Become a MongoDB DBA’ series. MongoDB offers many metrics through various status overviews and commands, but which ones really matter to you? How do you trend and alert on them? What is the meaning behind the metrics?
  • Scaling and Sharding with MongoDB Webinar In this third webinar of the ‘Become a MongoDB DBA’ series, we've focused on scaling and sharding your MongoDB setup.
  • Management and Automation of MongoDB Clusters This webinar gives you the tools to more effectively manage your MongoDB cluster, immediately. The presentation includes code samples and a live Q&A session.

Securing MongoDB

The recent ransom hack has exposed a vulnerability in default deployments of MongoDB. While ClusterControl solves these issues by automatically providing the security settings people need to stay protected, here are some other blogs we previously released to help keep you secure.

  • How to Secure MongoDB from Ransomware - 10 Tips Need tips on how to secure your MongoDB setup and protect yourself against ransomware? We have collected ten tips that are easy to follow and execute on your MongoDB databases.
  • Secure MongoDB and Protect Yourself from the Ransom Hack Recently, several attackers were able to break into thousands of MongoDB systems, wipe the databases and leave a ransom note. This could have been prevented if those in charge would have followed some standard security procedures. This blog post describes how to protect yourself from MongoDB ransomware. What is it, why is it a problem and what can you do to protect yourself against it?

Using MongoDB with ClusterControl

The ClusterControl team has spent the last year developing a full array of expanded features for MongoDB to provide developers and DBAs an alternative system for deploying and managing their infrastructures. Here are some useful resources for deploying and managing MongoDB using ClusterControl.

We have a lot more exciting and technical content for MongoDB in the works, so follow us on our ClusterControl LinkedIn Page for even more information.
