
Video: How to Secure MongoDB

Recently the IT world has seen a lot of news about tens of thousands of unsecured MongoDB instances that have been hacked and held for ransom.

In his blog “How to Secure MongoDB from Ransomware - Ten Tips”, Severalnines Support Engineer Art van Scheppingen gave us some great ways to help secure your data and prevent hackers from invading and ransoming your MongoDB instances. In the video below you will find Art’s ten tips, along with more information about how ClusterControl provides automated security to help you stay safe.

How MongoDB Database Automation Improves Security

The growing number of cyberattacks on open source database deployments highlights the industry’s poor administrative and operational practices.

If 2016 taught us anything, it was the importance of sound operational practices and security measures in open source database deployments. For several years, researchers had warned about publicly exposed databases, with estimates ranging in the tens of thousands of servers. If the scale of the problem was not apparent or frightening before, it surely is now.

Recently, ransomware groups deleted over 10,000 MongoDB databases within just a few days. Other open source databases (ElasticSearch, Hadoop, CouchDB) were also hit. Meanwhile, the number of exposed databases has gone up to about 100,000 instances.

What has led to this? Open source databases, and open source software in general, power a significant portion of today’s online services. Thanks to the increased use of agile development lifecycles, the cloud has become home to a variety of applications that are quickly deployed. Many businesses have also moved beyond using the cloud for non-critical functions, and now rely on the cloud for storing valuable data. This means more databases are being deployed in public clouds, in environments that are directly exposed to the Internet.

MongoDB in particular is very popular amongst developers, because of its convenience and expediency. But here’s the problem - quickly spinning up an environment for development is not the same thing as setting up for live production. They demand very different levels of expertise. Thousands of database instances were not secured, and anyone could get read and write access to the databases (including any sensitive data) without any special tools or without having to circumvent any security measures. This is not a lapse of concentration by a few individuals that got us here; we are facing a problem that is more widespread than anyone could imagine. We need to recognise that the middle ground between ease of use, speed of deployment and operational/security readiness is hard to find. So this begs the question: how can we collectively get beyond this type of problem?

If we could turn every single individual who deploys MongoDB into a deployment engineer, it might help. At least there would be some level of protection, so that not just anyone could walk in through an open door.

Operations is not rocket science, but it might not be reasonable to expect all developers, who are the primary users of MongoDB, to turn into full-fledged systems/deployment engineers. The IT industry is moving towards faster, leaner implementations and deployment of services. The middle ground between ease of use, deployment speed and sound operational practices might seem even further away. Automation might just be the thing that helps us find that middle ground.

Database configurations suitable for production tend to be a bit more complex, but once designed, they can be duplicated many times with minimal variation.

Automation can be applied to initial provisioning and configuration, as well as ongoing patching, backups, anomaly detection and other maintenance activities. This is the basis for our own automation platform for MongoDB, ClusterControl. A well deployed and managed system can mitigate operational risk, and would certainly have prevented these thousands of databases from getting hacked.

MongoDB Tutorial - Monitoring & Securing MongoDB with ClusterControl Advisors

Database ops management is 80% about reading and interpreting your monitoring systems. Hundreds of metrics can be interpreted and combined in various ways to give you deep insights into your database systems and how to optimize them. When running multiple database systems, monitoring them can become quite a chore. If the interpretation and combination of metrics takes a lot of time, wouldn’t it be great if this could be automated in some way?

This is why we created database advisors in ClusterControl: small scripts that can interpret and combine metrics for you, and give you advice when applicable. For MySQL we have created an extensive library of the most commonly used MySQL monitoring checks, and for MongoDB we likewise have a broad library of advisors at your disposal. For this blog post, we have picked the nine most important ones and will describe each of them in detail.

The nine MongoDB advisors we will cover in this blog post are:

  • Disk mount options check
  • NUMA check
  • Collection lock percentage (MMAP)
  • Replication lag
  • Replication Window
  • Un-sharded databases and collections (sharded cluster only)
  • Authentication enabled check
  • Authentication/authorization sanity check
  • Error detection (new advisor)

Disk mount options advisor

It is very important to have your disks mounted in the most optimal way. With the ClusterControl disk mount options advisor, we look more closely at your data disk on a daily basis. In this advisor, we investigate the filesystem used, mount options and the io scheduler settings of the operating system.

We check whether your disks have been mounted with noatime and nodiratime. Without these options, disk performance suffers, as the access time has to be written to disk on every access to a file or directory. Since databases access files continuously, setting these options is a good performance tweak and also increases the durability of your SSDs.

For file systems we recommend using modern file systems like xfs, zfs, ext4 or btrfs. These file systems were created with performance in mind. The I/O scheduler is advised to be either noop or deadline. Deadline has been the default for databases for years, but thanks to faster storage like SSDs, the noop scheduler makes more sense nowadays.
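
A quick manual spot-check of both settings, as a sketch (assuming your data directory is /var/lib/mongodb on device sda; adjust the paths to your setup):

$ mount | grep /var/lib/mongodb          # the option list should include noatime,nodiratime
$ cat /sys/block/sda/queue/scheduler     # the active scheduler is shown in brackets, e.g. [noop]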

NUMA check advisor

For MongoDB we enable our NUMA check advisor. This advisor checks whether NUMA (Non-Uniform Memory Access) has been enabled on your system and, if it has, advises you to switch it off.

With NUMA enabled, each CPU in the server can only address its own local memory directly, not the memory attached to the other CPUs in the machine. This means a process can only allocate memory from its own CPU's memory space, and allocating anything in excess will result in swap usage. This architecture benefits multi-process applications that spread across all CPUs, but as MongoDB isn’t such an application, NUMA decreases its performance greatly and can lead to huge swap usage.
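
If NUMA is enabled, the common mitigation (a sketch; check the recommendation for your MongoDB version) is to disable zone reclaim and start mongod with interleaved memory allocation:

$ echo 0 | sudo tee /proc/sys/vm/zone_reclaim_mode
$ numactl --interleave=all mongod --config /etc/mongod.conf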

Collection lock percentage (MMAP)

As MMAP is a file-based storage engine, it doesn’t support the document-level locking found in WiredTiger and RocksDB. Instead, the lowest level of locking for MMAP is the collection lock. This means any write to a collection (insert, update or delete) locks the entire collection. If the lock percentage gets too high, it indicates you have contention problems on the collection. If not addressed properly, this can bring your write throughput to a grinding halt. Therefore having an advisor warn you up front is very helpful.
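
You can inspect the raw counters behind this advisor yourself; in a MongoDB 3.x shell, the ratio of acquireWaitCount to acquireCount for the Collection lock indicates how often writers had to queue (a sketch, assuming an MMAPv1 server):

$ mongo --eval 'printjson(db.serverStatus().locks.Collection)'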

MongoDB Replication Lag advisor

If you are scaling out MongoDB reads via secondaries, replication lag is very important to keep an eye on. The MongoDB client drivers will only use secondaries that don’t lag too far behind; otherwise you risk serving stale data.

Inside MongoDB, the primary keeps track of the replication status of its secondaries. The advisor fetches this replication information and guards the replication lag. If the lag becomes too high, it sends out a warning or critical status message.
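
The same information can be checked by hand in the MongoDB shell; rs.printSlaveReplicationInfo() prints, for each secondary, how many seconds it is behind the primary:

$ mongo --eval 'rs.printSlaveReplicationInfo()'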

MongoDB Replication Window advisor

Next to replication lag, the replication window is an important metric to watch. The MongoDB oplog is a single collection with a preset, limited size. Once the oplog is full and a new transaction needs to be stored, the oldest transaction is evicted to make room for the new one. The replication window reflects the number of seconds between the oldest and newest transaction in the oplog.

This metric is very important, as it tells you how long you can take a secondary out of the replicaSet before it is no longer able to catch up with the primary due to being too far behind in replication. Likewise, if a secondary starts lagging behind, it is good to know how long this can be tolerated before the secondary can no longer catch up.

In the MongoDB shell, a function is available to calculate the replication window. This advisor in ClusterControl uses that function to make the same calculation. The benefit is that you can now be alerted when the replication window becomes too short.
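
The shell helpers here are rs.printReplicationInfo() and db.getReplicationInfo(); the latter returns a timeDiff field holding the window in seconds. A sketch:

$ mongo --eval 'rs.printReplicationInfo()'
$ mongo --eval 'print(db.getReplicationInfo().timeDiff + " seconds of replication window")'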

MongoDB un-sharded databases and collections advisor

In a sharded MongoDB cluster, all un-sharded databases and collections are assigned to a default primary shard by the MongoDB shard router. This primary shard can vary between the databases and collections, but in general this would be the shard with the most disk space available.

Having an un-sharded database or collection doesn’t immediately pose a risk to your cluster. However, if an application or user starts to write large volumes of data to one of these, the primary shard could fill up quickly and cause an outage on that shard. As the database or collection is not sharded, it cannot make use of the other shards.

For this reason we have created an advisor that helps prevent this from happening. The advisor scans all databases and collections, and warns you about any that have not been sharded.
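
You can run a similar check by hand against a mongos. A sketch, assuming a pre-3.6 cluster where the config database keeps a partitioned flag per database (the mongos hostname is hypothetical):

$ mongo --host mongos.example.com --eval 'db.getSiblingDB("config").databases.find({ partitioned: false }).forEach(printjson)'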

Authentication enabled check

Without authentication enabled in MongoDB, any user logging in is treated as an admin. This is a serious risk, as admin tasks like creating users or making backups become available to anyone. Combined with MongoDB servers exposed to the Internet, this resulted in the recent MongoDB ransom hacks, even though simply enabling authentication would have prevented most of these cases.

We have implemented an advisor that verifies whether your MongoDB servers have authentication enabled. This can be done explicitly by setting it in the configuration, or implicitly by enabling the replication keyfile. If this advisor fails to detect that authentication has been enabled, you should act immediately, as your server is vulnerable to being compromised.
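
Enabling authentication explicitly is a one-line change in the mongod configuration file, followed by a restart:

security:
   authorization: enabled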

Authentication/authorization sanity check

Next to the authentication enabled advisor, we also have built an advisor that performs a sanity check for both authentication and authorization in MongoDB.

In MongoDB, authentication and authorization are not placed in a central location, but are performed and stored at the database level. Normally users connect to the database, authenticating against the database they intend to use. However, with the correct grants, it is also possible to authenticate against another (unrelated) database and still make use of the intended one. Normally this is perfectly fine, unless a user has excessive rights (like the admin role) over another database.

In this advisor, we verify whether such excessive roles are present, and whether they could pose a threat. At the same time, we also check for weak and easy-to-guess passwords.
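
A rough manual version of this check is possible too: in MongoDB 3.x all users are stored centrally in admin.system.users, so listing each user's roles shows who holds powerful roles on databases they arguably shouldn't. A sketch (authenticate as an administrative user first; the admin username is hypothetical):

$ mongo admin -u admin -p --eval 'db.system.users.find({}, { user: 1, db: 1, roles: 1 }).forEach(printjson)'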

Error detection (new advisor)

In MongoDB, any error encountered is counted or logged. Within MongoDB there is a wide variety of possible errors: user asserts, regular asserts, warnings and even internal server exceptions. If there are trends in these errors, it is likely that there is either a misconfiguration or an application issue.

This advisor looks at the statistics of MongoDB errors (asserts) and makes sense of them. We interpret the trends found and advise on how to decrease errors in your MongoDB environment!
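
The underlying counters are exposed by serverStatus: the asserts document tracks regular, warning, message and user asserts since the last server restart, and the rollovers field shows how often those counters have wrapped around:

$ mongo --eval 'printjson(db.serverStatus().asserts)'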

MongoDB Webinar - How to Secure MongoDB with ClusterControl

Join us for our new webinar on “How to secure MongoDB with ClusterControl” on Tuesday, March 14th!

In this webinar we will walk you through the essential steps necessary to secure MongoDB and how to verify if your MongoDB instance is safe.

How to secure MongoDB with ClusterControl

The recent MongoDB ransom hijack caused a lot of damage and outages, which could have been prevented with maybe two or three simple configuration changes. MongoDB offers a lot of security features out of the box; however, they are disabled by default.

In this webinar, we will explain which configuration changes are necessary to enable MongoDB’s security features, and how to test whether your setup is secure after enabling them. We will demonstrate how ClusterControl enables security on default installations, and we will discuss how to leverage the ClusterControl advisors and the MongoDB Audit Log to constantly scan your environment and harden your security even more.

Date, Time & Registration

Europe/MEA/APAC

Tuesday, March 14th at 09:00 GMT / 10:00 CET (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, March 14th at 09:00 Pacific Time (US) / 12:00 Eastern Time (US)

Register Now

Agenda

  • What is the MongoDB ransom hack?
  • What other security threats are valid for MongoDB?
  • How to enable authentication / authorisation
  • How to secure MongoDB from ransomware
  • How to scan your system
  • ClusterControl MongoDB security advisors
  • Live Demo

Speaker

Art van Scheppingen is a Senior Support Engineer at Severalnines. He’s a pragmatic database expert with over 16 years of experience in web development. He previously worked at Spil Games as Head of Database Engineering, where he maintained a broad view of the whole database environment: from MySQL to MongoDB, Vertica to Hadoop and from Sphinx Search to SOLR. He regularly presents his work and projects at various conferences (Percona Live, MongoDB Open House, FOSDEM) and related meetups.

We look forward to “seeing” you there!

This session is based upon the experience we have securing MongoDB and implementing it for our database infrastructure management solution, ClusterControl. For more details, read through our ‘Become a MongoDB DBA’ blog series.

Pre-emptive security with audit logging for MongoDB

Database security is a broad subject that stretches from pre-emptive measures to keeping unwanted visitors out. Even if you were able to fully secure your MongoDB servers, you would still want to know whether anyone has attempted to break into your system. And if they manage to breach your security and install the MongoDB ransom hack, you would need an audit trail for post-mortems or for taking new preventive measures. An audit log enables you to keep track of anyone attempting to log in and to see what they did in your system.

The MongoDB Enterprise version includes the ability to enable the audit log, but the community version lacks this functionality. Percona created its own audit logging functionality in its MongoDB-derived Percona Server for MongoDB. The MongoDB and Percona approaches differ from each other, and we will explain how to configure and use both of them.

MongoDB audit logging

The MongoDB audit log is easy to set up: to enable audit logging to a JSON file, simply add the following section to your config file and restart MongoDB:

auditLog:
   destination: file
   format: JSON
   path: /var/lib/mongodb/auditLog.json

MongoDB supports file, console and syslog as destinations. For file formats there are two options: JSON and BSON. In JSON the audit log lines look similar to this:

{ "atype" : "authCheck", "ts" : { "$date" : "2017-02-15T22:20:08.322-0000" }, "local" : { "ip" : "127.0.0.1", "port" : 27017 }, "remote" : { "ip" : "127.0.0.1", "port" : 63357 }, "users" : [], "roles" : [], "param" : { "command" : "update", "ns" : "test.inserttest", "args" : { "update" : "preauth_case", "updates" : [ { "q" : { "createdByUserId" : -2 }, "u" : { "$set" : { "statusCode" : "Update" } }, "multi" : false, "upsert" : false } ], "ordered" : true } }, "result" : 0 }

The configuration above enables the audit log for each and every action by any user of your system. Under high concurrency this would dramatically decrease the performance of your MongoDB cluster. Luckily, there is an option to filter the events that are to be logged.

Filters for the audit logging can be placed on the type of query, the user/role querying, or the collection being queried. The MongoDB documentation on audit logging is very broad and lengthy, with many examples. We will give some of the most important examples below.

Log authentication attempts against a specific database:

    filter: '{ atype: "authenticate", "param.db": "test" }'

Log multiple audit event types:

    filter: '{ atype: { $in: [ "update", "delete" ] }, "param.db": "test" }'

Log all authentication checks for insert/updates/deletes on a specific collection:

    filter: '{ atype: "authCheck", "param.ns": "test.orders", "param.command": { $in: [ "find", "insert", "delete", "update", "findandmodify" ] } }'

As you can see the filters can be quite flexible, and you would be able to filter the messages that you require for your audit trail.

Percona Server for MongoDB audit logging

Audit logging in Percona Server for MongoDB is limited to a JSON file. The majority of users will only log to JSON files anyway, but it is unclear whether Percona will add other logging destinations in the future.

Depending on the version of Percona Server for MongoDB, your configuration might be different. At the moment of writing, all versions have the following syntax:

audit:
   destination: file
   format: JSON
   path: /var/lib/mongodb/auditLog.json

This configuration difference has recently been resolved, but the fix has yet to be released. Once released, it should follow the MongoDB auditLog directive again:

auditLog:
   destination: file
   format: JSON
   path: /var/lib/mongodb/auditLog.json

The format by Percona is slightly different:

{ "atype" : "authenticate", "ts" : { "$date" : { "$numberLong" : "1487206721903" } }, "local" : { "host" : "n3", "port" : 27017 }, "remote" : { "host" : "172.16.140.10", "port" : 50008 }, "users" : [ { "user" : "admin", "db" : "admin" } ], "params" : { "user" : "admin", "db" : "admin", "mechanism" : "SCRAM-SHA-1" }, "result" : 0 }

As opposed to MongoDB, which logs everything, Percona chose to log only the important commands. Judging from the source of the Percona audit plugin, the following operations are supported: authentication, create/update/delete users, add/update/remove roles, create/drop database/index, and most of the important admin commands.

The filtering in the Percona Server for MongoDB audit log also doesn’t seem to follow the same standard as MongoDB's. It is quite unclear what the exact filter syntax and options are, as the Percona documentation is very concise about it.

Enabling the audit log without filtering will give you more than enough entries in your log file. From there you can narrow the filter down, as it follows the JSON syntax of the log entries.

Making use of the audit log

To make life easier for yourself, it may be best to feed the audit log into a log analysis framework. An ELK stack is an excellent environment to do your analysis in, and it enables you to drill down to more detailed levels quite easily. Using a field mapper would even allow you to follow the audit trail inside ELK.

As described in the introduction, we can use the audit log for various security purposes. The most obvious one is when you need it as a reference during a post-mortem. The MongoDB audit log provides a detailed overview of what exactly happened. The Percona audit log contains a little less information, but it should be sufficient for most post-mortems. Using the audit log for post-mortems is great, although we would rather have prevented the issue in the first place.

Another purpose of the audit log is to spot trends, and you could set traps on certain audit log messages. A good example is checking the rate of (failed) authentication attempts: if it exceeds a certain threshold, act upon it. Depending on the situation, the action taken can differ. One action could be to automatically block the IP address the requests are coming from; in another case, you could consult with the user about why the password was forgotten. It really depends on the case and the environment you are working in.
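
As a minimal sketch of such a trap, assuming the JSON file destination configured earlier and the jq utility: audit entries carry a result field that is 0 on success, so counting authenticate events with a non-zero result gives the number of failed login attempts to alert on.

$ jq -c 'select(.atype == "authenticate" and .result != 0)' /var/lib/mongodb/auditLog.json | wc -l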

Another advanced pre-emptive measure would be to use MongoDB as a honeypot and leverage the audit log to catch unwanted users. Just expose a MongoDB server and allow anyone to connect to it. Then use the audit log to detect what users do when they are allowed to act beyond their normal powers, and block them if necessary. Most humans would rather take the easy way than the hard way, so the honeypot could deflect an attack while the hacker moves on to the next target.

Conclusion

Apart from explaining how to set up the audit log for both MongoDB Enterprise and Percona Server for MongoDB, we also explained what you could potentially do with the logged data inside the audit log.

By default ClusterControl will not enable the audit log, but it is relatively easy to enable it cluster wide using our Config Manager. You could also enable it inside the configuration templates, prior to deploying a new cluster.

Happy clustering!

MongoDB tools from the community that complement ClusterControl

Since MongoDB is the favored database for many developers, it comes as no surprise that the community support is excellent. You can quickly find answers to most of your problems on knowledge sites like Stack Overflow, and the community also creates many tools, scripts and frameworks around MongoDB.

ClusterControl is one of those community tools, allowing you to deploy, monitor, manage and scale any MongoDB topology. ClusterControl is designed around the database lifecycle, but naturally it can’t cover every aspect of a development cycle. This blog post covers a selection of community tools that can be used to complement ClusterControl in managing a development cycle.

Schema management

The pain of schema changes in conventional RDBMS was one of the drivers behind the creation of MongoDB: we all suffered from painfully slow or failed schema migrations. Therefore MongoDB has been developed with a schemaless document design. This allows you to change your schema whenever you like, without the database holding you back.

Schema changes are generally made whenever there is application development. Adding new features to existing modules, or creating new modules may involve the creation of another version of your schema. Also schema and performance optimizations may create new versions of your schemas.

Even though many people will say it’s brilliant not to be held back by the database, it also brings a couple of issues: since old data is not migrated to the new schema design, your application must be able to cope with every schema version present in your database. Alternatively, you could update all (old) data to the newer schema right after you have deployed the application.

The tools discussed in this section will all be very helpful in solving these schema issues.

Meteor Collection2

The Meteor Collection2 module ensures that the schema is validated from both the client and the server side. This ensures that all data gets written according to the defined schema. The module is only reactive, so whenever data does not get written according to the schema, a warning is returned.

Mongoose

Mongoose is Node.js middleware for schema modelling and validation. The schema definition is placed inside your Node.js application, which allows Mongoose to act as an ODM (object-document mapper). Mongoose will not migrate existing data into the new schema definition.

MongoDB Schema

So far we have only spoken about schema changes, so it is time to introduce MongoDB Schema: a schema analyzer that takes a (random) sample of your data and outputs the schema for the sampled data. This doesn’t necessarily mean it will be 100% accurate in its schema estimation, though.

With this tool you could regularly check your data against your schema and detect important or unintentional changes in your schema.
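
As a sketch of what such a check might look like, assuming the mongodb-schema command line tool from npm (the exact invocation may differ between versions, and the namespace is hypothetical):

$ npm install -g mongodb-schema
$ mongodb-schema localhost:27017 mydb.users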

Backups

ClusterControl supports two implementations for backing up MongoDB: mongodump and Percona Consistent Backup. Still, some less regularly used functionality, like partial/incremental backups and streaming backups to other clusters, is not available out of the box.

MongoDB Backup

MongoDB Backup is a NodeJS logical backup solution that offers functionality similar to mongodump. In addition, it can stream backups over the network, making it useful for transporting a collection from one MongoDB instance to another.

Another useful aspect is that it is written in NodeJS. This makes it very easy to integrate into a Hubot chatbot and automate collection transfers. Don’t worry if your company isn’t using Hubot: the tool can also function as a webhook or be controlled via the CLI.

Mongob

Mongob is another logical backup solution, but in this case written in Python and only available as a CLI tool. Just like MongoDB Backup, it is able to transfer databases and collections between MongoDB instances, and in addition it can limit the transfer rate.

Another useful feature of Mongob is that it can create incremental backups. This is good if you wish to have more compact backups, but also if you need to perform a point-in-time recovery.

MongoRocks Strata

MongoRocks Strata is the backup tool for the MongoRocks storage engine. Percona Server for MongoDB includes the MongoRocks storage engine, but lacks the Strata backup tool for making file-level backups. In principle mongodump and Percona Consistent Backup are able to make reliable backups, but as they are logical dumps, the recovery time will be long.

MongoRocks is a storage engine built on an LSM tree architecture, which basically means it is append-only storage. To achieve this, it operates with buckets of data: older data is stored in larger (archive) buckets, recent data in smaller (recent) buckets, and all new incoming data is written into a special memory bucket. Every time a compaction is done, data trickles down from the memory bucket to the recent buckets, and from the recent buckets to the archive buckets.

To make a backup of all buckets, Strata instructs MongoDB to flush the memory bucket to disk, and then copies all buckets of data at the file level. This creates a consistent backup of all available data. It is also possible to instruct Strata to copy only the recent buckets, effectively taking an incremental backup.

Another good point of Strata is that it provides the mongoq binary, which allows you to query the backups directly. This means there is no need to restore a backup to a MongoDB instance before being able to query it. You can leverage this functionality to ship your production data offline to your analytics system!

MongoDB GUIs

Within ClusterControl we allow querying the MongoDB databases and collections via advisors, which can be developed in the ClusterControl Developer Studio interface. We don’t feature a direct interface with the databases, so to make changes to your data you will either need to log into the MongoDB shell, or use a tool that allows you to make these changes.

PHPMoAdmin

PHPMoAdmin is the MongoDB equivalent of PHPMyAdmin, offering similar data and admin management functionality. The tool allows you to perform CRUD operations in both JSON and PHP syntax on all databases and collections. On top of that, it also features import/export of your current data selection.

Mongo-Express

If you seek a versatile data browser, Mongo-Express is a tool you definitely need to check out. Not only does it allow operations similar to PHPMoAdmin, it can also display images and videos inline. It even supports fetching large objects from GridFS buckets.

Robomongo

Robomongo goes one step further. As a crowd-funded tool, its feature list is huge. It can perform all the same operations as Mongo-Express, but in addition allows user, role and collection management. For connections, it supports direct MongoDB connections as well as replicaSet topologies and MongoDB Atlas instances.

Conclusion

With this selection of free community tools, we hope we have given you a good overview of how to manage MongoDB data alongside ClusterControl.

Happy clustering!

Video: The Difference Between MongoDB Sharding and a MongoDB ReplicaSet

In this video we will demonstrate the differences between deploying a MongoDB ReplicaSet versus deploying a MongoDB Sharded Cluster in ClusterControl.  

With the new version of ClusterControl 1.4, you can automatically and securely deploy sharded MongoDB clusters or Replica Sets with ClusterControl’s free community version; as well as automatically convert a Replica Set into a sharded cluster.

The video also demonstrates the different overview sections of each type of deployment within ClusterControl as well as how to add nodes and convert the ReplicaSet into a Sharded Cluster.

What is a MongoDB ReplicaSet?

A replica set is a group of MongoDB instances that host the same data set. In a replica set, one node is the primary node that receives all write operations. All other instances, the secondaries, apply operations from the primary so that they have the same data set.
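
For reference, initiating a three-node replica set by hand takes a single shell command; a sketch with hypothetical hostnames (ClusterControl performs this step for you during deployment):

$ mongo --host node1 --eval 'rs.initiate({
   _id: "rs0",
   members: [
      { _id: 0, host: "node1:27017" },
      { _id: 1, host: "node2:27017" },
      { _id: 2, host: "node3:27017" }
   ]
})'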

What is a MongoDB Sharded Cluster?

MongoDB sharding is the process of storing data records across multiple machines; it is MongoDB's approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data or provide acceptable read and write throughput.
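
Note that data only spreads across shards for collections you explicitly shard. A sketch against a mongos, with hypothetical database, collection and shard key names:

$ mongo --host mongos1 --eval 'sh.enableSharding("shop")'
$ mongo --host mongos1 --eval 'sh.shardCollection("shop.orders", { customerId: "hashed" })'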

Become a MongoDB DBA - New Whitepaper

We’re happy to announce that our new whitepaper Become a MongoDB DBA: Bringing MongoDB to Production is now available to download for free!

This guide discusses the different versions of MongoDB that are available to you and how, once selected, you can take it from deployment to production. It also describes things you need to take into consideration like security and management.

Topics included in this whitepaper are…

  • Choosing the right MongoDB version
  • Securing MongoDB
  • Monitoring and Trending
  • Backup and Recovery
  • Scaling MongoDB

While MongoDB has spent nearly a decade achieving maturity, the technology is somewhat of a mystery to those experienced in traditional environments. This whitepaper provides a blueprint of what you need to know to ensure you are maximizing the potential of the MongoDB database.

If your team is considering or in the process of introducing MongoDB into your environment then this new guide on Becoming a MongoDB DBA is for you. If you want to dive even deeper into MongoDB make sure to check out the Become a MongoDB DBA blog series.

ClusterControl for MongoDB

Users of MongoDB often have to use a variety of tools to achieve their requirements; ClusterControl provides an all-inclusive system where you don’t have to cobble together different tools.

ClusterControl offers users a single interface to securely manage their MongoDB infrastructures and mixed open source database environments, while preventing vendor lock-in; whether on premise or in the cloud.  ClusterControl offers an alternative to other companies who employ aggressive pricing increases, helping you to avoid vendor lock-in and control your costs.

ClusterControl provides the following features to deploy and manage your MongoDB stacks...

  • Advisors: ClusterControl’s library of Advisors allows you to extend the features of ClusterControl to add even more MongoDB management functionality.
  • Developer Studio: The ClusterControl Developer Studio lets you customize your own MongoDB deployment to enable you to solve your unique problems.
  • Easy Deployment: You can now automatically and securely deploy sharded MongoDB clusters or Replica Sets with ClusterControl’s free community version; as well as automatically convert a Replica Set into a sharded cluster if that’s required.
  • Single Interface: ClusterControl provides one single interface to automate your mixed MongoDB, MySQL, and PostgreSQL database environments.
  • Advanced Security: ClusterControl removes human error and provides access to a suite of security features automatically protecting your databases from hacks and other threats.
  • Monitoring: ClusterControl provides a unified view of all sharded environments across your data centers and lets you drill down into individual nodes.
  • Scaling: Easily add and remove nodes, resize instances, and clone your production clusters with ClusterControl.
  • Management: ClusterControl provides management features that automatically repair and recover broken nodes, and test and automate upgrades.
  • Consistent Backups of Sharded Clusters: Using the Percona MongoDB Consistent Backup tool, ClusterControl will allow you to make consistent snapshots of your MongoDB sharded clusters.

To learn more about the exciting features we offer for MongoDB click here.


Automating and Managing MongoDB in the Cloud

Database management has traditionally been complex and time-consuming. Deployment, with its challenges of security, complex networking, backup planning and implementation, and monitoring, has been a headache. Scaling out your database cluster has been a major undertaking. And in a world where 24/7 availability and rapid disaster recovery are expected, managing even a single database cluster can be a full-time job.

Severalnines’ ClusterControl is a database deployment and management system that addresses the above, facilitating rapid deployment of redundant, secure database clusters or nodes, including advanced backup and monitoring functionality - whether on premise or in the cloud. With plugins supporting Nagios, PagerDuty, and Zabbix, among others, ClusterControl integrates well with existing infrastructure and tools to help you manage your database servers with confidence.

MongoDB is the leading NoSQL database server in the world today. Using ClusterControl, which can deploy and manage either official MongoDB or Percona Server for MongoDB (Percona's competing offering that incorporates MongoDB Enterprise features), we are going to walk through deploying a MongoDB Replica Set with three data nodes, and look at some of the features of the ClusterControl application.

We’re going to run through some key features of ClusterControl, especially as they pertain to MongoDB, using Amazon Web Services. Amazon Web Services (AWS) is the largest Infrastructure as a Service cloud provider globally, hosting millions of users all over the world. It comprises many services for all use cases, from virtually unlimited object storage with S3 and highly scalable virtual machine infrastructure using EC2, all the way to enterprise data warehousing with Redshift and even machine learning.

Once you’ve read this blog, you may also wish to read our DIY Cloud Database on Amazon Web Services Whitepaper, which discusses configuration and performance considerations for database servers in the AWS Cloud in more detail. In addition, we have Become a MongoDB DBA, a whitepaper with more in depth MongoDB-specific detail.

To begin, you will first need to deploy four AWS instances. For a production platform, the instance type should be carefully chosen based on the guidelines we have previously discussed, but for our purposes, instances with 2 virtual CPUs and 4GB RAM will be sufficient. One of these nodes will host ClusterControl; the others will be used to deploy the three database nodes.

Begin by creating your database nodes’ security group, allowing inbound traffic on port 27017. There is no need to restrict outbound traffic, but should you wish to do so, allow outbound traffic on ports 1024-65535 to facilitate outbound communication from the database servers.

Next, create the security group for your ClusterControl node, allowing inbound traffic on ports 22 and 80. Add this security group ID to your database nodes' security group, and allow unrestricted TCP communication between them. This facilitates communication between the two security groups, without allowing ssh access to the database nodes from external clients.
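
If you prefer the AWS command line tools over the web console, the security groups can be sketched roughly as follows (group names and CIDR ranges are placeholders; in a VPC you may also need to pass --vpc-id):

$ aws ec2 create-security-group --group-name mongodb-nodes --description "MongoDB data nodes"
$ aws ec2 authorize-security-group-ingress --group-name mongodb-nodes --protocol tcp --port 27017 --cidr 10.0.0.0/16
$ aws ec2 create-security-group --group-name clustercontrol --description "ClusterControl node"
$ aws ec2 authorize-security-group-ingress --group-name clustercontrol --protocol tcp --port 22 --cidr 0.0.0.0/0
$ aws ec2 authorize-security-group-ingress --group-name clustercontrol --protocol tcp --port 80 --cidr 0.0.0.0/0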

Launch the instances into their respective security groups, choosing for each instance a KeyPair for which you have the ssh key. For the purposes of this task, use the same KeyPair for all instances. If you have lost the ssh key for your KeyPair, you will have to create a new KeyPair. When launching the instances, do not choose the default Amazon Linux image, instead choose an AMI based on a supported operating system listed here. As I am using AWS region EU-CENTRAL-1, I will use community AMI ami-fa2df395, a CentOS 7.3 image, for this purpose.

If you have the AWS command line tools installed, use the aws ec2 describe-instances command detailed previously to confirm that your instances are running (otherwise view your instances in the AWS web console) and, once confirmed, log in to the ClusterControl instance via ssh.

Copy the private key file (.pem) you downloaded when creating your KeyPair to the ClusterControl instance. You can use the scp command for this purpose. For now, let’s leave it in the default /home/centos directory, the home directory of the centos user. I have called mine s9s.pem. You will also need the wget tool; install it using the following command:

$ sudo yum -y install wget

To install ClusterControl, run the following commands:

$ wget http://www.severalnines.com/downloads/cmon/install-cc
$ chmod +x install-cc
$ ./install-cc # as root or sudo user

The installation will walk you through some initial questions, after which it will take a few minutes to retrieve and install dependencies using your operating system’s package manager.

When installation is complete, point your web browser to http://<address of your ClusterControl instance>. You can find the external facing address of the instance using the describe-instances command, or via the AWS web console.

Once you have successfully logged in, you will see the following screen, and can continue to deploy your MongoDB Replica Set.

Figure 1: Welcome to ClusterControl!

As you can see, ClusterControl can also import existing database clusters, allowing it to manage your existing infrastructure as easily as new deployments.

For our purposes, you are going to click Deploy Database Cluster. On the next screen you will see the selection of database servers and cluster types that ClusterControl supports. Click the tab labelled MongoDB ReplicaSet. Here the values with which you are concerned are SSH User, SSH Key Path, and Cluster Name. The port should already be 22, the default ssh port, and the AMI we are using does not require a Sudo Password.

Figure 2: Deploying a MongoDB Replica Set

The ssh user for the CentOS 7 AMI is centos, and the SSH Key Path is /home/centos/s9s.pem, or the appropriate path depending on your own Key file name. Let’s use MongoDB-RS0 as the Cluster Name. Accepting the default options, we click Continue.

Figure 3: Configuring your deployment

Here we can choose between the official MongoDB build and a Percona build. Select whichever you prefer, and supply an admin user and password with which to configure MongoDB securely. Note that ClusterControl will not let you proceed unless you provide these details. Make a note of the credentials you have supplied; you will need them to log in to the deployed MongoDB database if you wish to use it later. Now choose a Replica Set name, or accept the default. We are going to use the vendor repositories, but be aware that you can configure ClusterControl to use your own repositories, or those of a third party, if you prefer.

Add your database nodes one at a time. You can choose to use the external IP address, but if you provide the hostname, which is generally recommended, ClusterControl will record all network interfaces of the hosts, and you will be able to choose the interface on which you would like to deploy. Once you have added your three database nodes, click Deploy. ClusterControl will now deploy your MongoDB Replica Set. Click Full Job Details to observe as it carries out the configuration of your cluster. When the job is complete, go to the Database Clusters screen to see your cluster.

Figure 4: Auto Recovery

Taking a closer look, you can see that Auto Recovery is enabled at both a cluster and a node level; in the case of failures, ClusterControl will attempt to recover your cluster or the individual node having an issue. The green tick beside each node also displays the cluster’s health status at a glance.

Figure 5: Scheduling Backups

The last feature we will cover here is Backups. ClusterControl provides a backup feature that allows a full cluster consistent backup, or simply a standard mongodump backup if you prefer. It also provides the facility to create scheduled backups to run periodically to a schedule of your choosing. Backup retention is also handled, with the option to retain backups for a limited period, avoiding storage issues.

In this blog I’ve attempted to give you a brief overview of using ClusterControl with MongoDB, but there are many more features supported by ClusterControl. Deployment of Sharded Clusters, with hidden and/or delayed slaves, arbiters and other features are all available. More information is available on our website, where you can also find webinars, whitepapers, tutorials, and training, and try out ClusterControl free.

Percona’s Tyler Duzan Joins Us for “How to Automate and Manage Your MongoDB or Percona Server for MongoDB Database” Webinar

We’re excited to announce that Tyler Duzan, Product Manager at Percona, will join our colleague Ruairí Newman, Senior Support Engineer, on October 24th to present a different perspective on how to automate and manage your MongoDB or Percona Server for MongoDB databases.

Tyler will walk us through some of the key features of the drop-in compatible Percona Server for MongoDB as it compares to MongoDB, and how it is “Community ++”.

We had the chance to catch up with Tyler to ask him about what he’s going to be discussing during this webinar. Watch the interview here.

Ruairí will talk about the tools that are available, both commercial and open source, to aid with the automation of operational MongoDB tasks. There are a small number of specialist domain-specific automation tools available, and Ruairí will compare the MongoDB-relevant functionality of two of these products: MongoDB’s Ops Manager, and ClusterControl from Severalnines.

Sign up below to hear all about Percona Server for MongoDB and the differences between these two tools, and how they help automate and manage MongoDB operations. We’ll be covering key aspects from installation and maintenance, through backing up your deployments, to doing upgrades.

“See” you there!

Date, Time & Registration

Europe/MEA/APAC

Tuesday, October 24th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, October 24th at 09:00 PDT (US) / 12:00 EDT (US)

Register Now

Agenda

  • Introduction to Percona Server for MongoDB
  • How to automate and manage MongoDB
    • Installation and maintenance
    • Complexity of architecture
    • Options for redundancy
    • Comparative functionality
    • Monitoring, Dashboard, Alerting
    • Backing up your deployments
    • Automated deployment of advanced configurations
    • Upgrading existing deployments

Speakers

Ruairí Newman, Senior Support Engineer at Severalnines, is passionate about all things cloud and automation and has worked for MongoDB, VMware and Amazon Web Services among others. He has a background in Operational Support Systems and Professional Services.

Prior to joining Severalnines, Ruairí worked for Huawei Ireland as Senior Cloud Solutions Architect on their Web Services project, where he advised on commodity cloud architecture and Monitoring technologies, and deployed and administered a Research & Development Openstack lab.

Tyler Duzan: Prior to joining Percona as a Product Manager, Tyler spent almost 13 years as an operations and security engineer in a variety of different industries. Deciding to take his analytical mindset and strategic focus into new territory, Tyler is applying his knowledge to solving business problems for Percona customers with inventive solutions combining technology and services.

View the replay: how to manage MongoDB & Percona Server for MongoDB

Many thanks to everyone who participated in this week’s webinar on how to manage MongoDB and Percona Server for MongoDB! The replay is now available for viewing online.

For this webinar we’d teamed up with Percona’s Tyler Duzan, Product Manager, who talked to us about some of the key features and aspects of Percona Server for MongoDB.

And our colleague Ruairi Newman compared the MongoDB-relevant functionality of MongoDB’s Ops Manager and ClusterControl. Participants learned about the differences between these systems, and how they help automate and manage MongoDB operations.

View the replay to find out more

Percona Server for MongoDB is a fully compatible, open source, drop-in replacement for the MongoDB® Community Server that provides MongoDB® Enterprise Edition features at no licensing cost, with additional storage engines and performance improvements.

ClusterControl is the all-inclusive management system for open source databases.

With Percona Server for MongoDB and Severalnines ClusterControl together, users benefit from a cost-efficient solution with additional features and capabilities.

View the replay to learn more

Agenda

  • Introduction to Percona Server for MongoDB
  • How to automate and manage MongoDB
    • Installation and maintenance
    • Complexity of architecture
    • Options for redundancy
    • Comparative functionality
    • Monitoring, Dashboard, Alerting
    • Backing up your deployments
    • Automated deployment of advanced configurations
    • Upgrading existing deployments

Speakers

Ruairí Newman is passionate about all things cloud and automation and has worked for MongoDB, VMware and Amazon Web Services among others. He has a background in Operational Support Systems and Professional Services.

Prior to joining Severalnines, Ruairí worked for Huawei Ireland as Senior Cloud Solutions Architect on their Web Services project, where he advised on commodity cloud architecture and Monitoring technologies, and deployed and administered a Research & Development Openstack lab.

Prior to joining Percona as a Product Manager, Tyler Duzan spent almost 13 years as an operations and security engineer in a variety of different industries. Deciding to take his analytical mindset and strategic focus into new territory, Tyler is applying his knowledge to solving business problems for Percona customers with inventive solutions combining technology and services.

MongoDB Security - Resources to Keep NoSQL DBs Secure

We’ve almost become desensitized to the news. It seems that every other day there is a data breach at a major enterprise resulting in confidential customer information being stolen and sold to the highest bidder.

Data breaches rose by 40% in 2016, and once all the numbers are calculated, 2017 is expected to blow that number out of the water. Yahoo announced the largest breach in history in 2017, and other companies like Xbox, Verizon, Equifax, and more also announced major breaches.

Because of the 2017 MongoDB ransomware hack, MongoDB security is on everyone's mind.

We decided to pull together some of our top resources that you can use to ensure your MongoDB instances remain secure.

Here are our most popular and relevant resources on the topic of MongoDB Security…

ClusterControl & MongoDB Security

Data is the lifeblood of your business. Whether it’s protecting confidential client data or securing your own IP, your business could be doomed should critical data get into the wrong hands. ClusterControl provides many advanced deployment, monitoring and management features to ensure your databases and their data are secure. Learn how!

How to Secure MongoDB with ClusterControl - The Webinar

In March of 2017, at the height of the MongoDB ransomware crisis, we hosted a webinar to talk about how you can keep MongoDB secure using ClusterControl. With authentication disabled by default in MongoDB, learning how to secure MongoDB becomes essential. In this webinar we explain how you can improve your MongoDB security and demonstrate how this is automatically done by ClusterControl.

Using the ClusterControl Developer Studio to Stay Secure

In our blog “MongoDB Tutorial: Monitoring and Securing MongoDB with ClusterControl Advisors” we demonstrated nine of the advisors from our repository for MongoDB that can assist with MongoDB security.

Audit Logging for MongoDB

In our blog “Preemptive Security with Audit Logging for MongoDB” we show that having access to an audit log would have given those affected by the ransom hack the ability to take pre-emptive measures. The audit log is one of the most underrated features of MongoDB Enterprise and Percona Server for MongoDB, and we uncover its secrets in this blog post.

The 2017 MongoDB Ransom Hack

In January of 2017, thousands of MongoDB servers were held for ransom simply because they were deployed without basic authentication in place. In our first blog on the ransom hack, “Secure MongoDB and Protect Yourself from the Ransom Hack”, we explain what happened and give some simple steps to keep your data safe. In the second blog, “How to Secure MongoDB from Ransomware - Ten Tips”, we went further, showing even more things you can do to make sure your MongoDB instances are secure.

The Importance of Automation for MongoDB Security

Severalnines CEO Vinay Joosery shares with us the blog “How MongoDB Database Automation Improves Security” and discusses how the growing number of cyberattacks on open source database deployments highlights the industry’s poor administrative and operational practices. This blog explores how database automation is the key to keeping your MongoDB database secure.

ClusterControl for MongoDB

Users of MongoDB often have to work with a variety of tools to achieve their requirements; ClusterControl provides an all-inclusive system where you don’t have to cobble together different tools.

ClusterControl offers users a single interface to securely manage their MongoDB infrastructures and mixed open source database environments, while preventing vendor lock-in; whether on premise or in the cloud. ClusterControl offers an alternative to other companies who employ aggressive pricing increases, helping you to avoid vendor lock-in and control your costs.

ClusterControl provides the following features to deploy and manage your MongoDB stacks...

  • Easy Deployment: You can now automatically and securely deploy sharded MongoDB clusters or Replica Sets with ClusterControl’s free community version; as well as automatically convert a Replica Set into a sharded cluster if that’s required.
  • Single Interface: ClusterControl provides one single interface to automate your mixed MongoDB, MySQL, and PostgreSQL database environments.
  • Advanced Security: ClusterControl removes human error and provides access to a suite of security features automatically protecting your databases from hacks and other threats.
  • Monitoring: ClusterControl provides a unified view of all sharded environments across your data centers and lets you drill down into individual nodes.
  • Scaling: Easily add and remove nodes, resize instances, and clone your production clusters with ClusterControl.
  • Management: ClusterControl provides management features that automatically repair and recover broken nodes, and test and automate upgrades.
  • Advisors: ClusterControl’s library of Advisors allows you to extend the features of ClusterControl to add even more MongoDB management functionality.
  • Developer Studio: The ClusterControl Developer Studio lets you customize your own MongoDB deployment to enable you to solve your unique problems.

To learn more about the exciting features we offer for MongoDB click here or watch this video.

New Whitepaper: How to Automate and Manage MongoDB with ClusterControl

At the time of writing this blog, MongoDB is the world’s leading NoSQL database server and, per the DB-Engines ranking (the most widely-known ranking in the database industry), the 5th database server overall in terms of popularity.

As you may have seen, we’ve published a ‘Become a MongoDB DBA’ blog series, which covers all the need-to-know information for getting started with MongoDB (for example, when you come from a MySQL DBA background), and we have now taken the next logical step in our work on MongoDB by producing a new white paper: MongoDB Management and Automation with ClusterControl.

This white paper extends the Become a MongoDB DBA series by focusing further on how to manage and automate MongoDB with the help of ClusterControl, our all-inclusive management system for open source databases.

While MongoDB does have great features for developers, some key questions arise:

What of the operational management of a production environment?

How easy is it to deploy a distributed environment, and then manage it?

In this whitepaper, we cover some of the fundamentals of MongoDB, and show you how a clustered environment can be automated with ClusterControl.

Download the white paper

To summarise, in this white paper, we have reviewed the challenges involved in managing MongoDB at scale and have introduced mitigating features of ClusterControl. As a best of breed database management solution, ClusterControl brings consistency and reliability to your database environment, and simplifies your database operations at scale.

The main topics covered include...

Considerations for administering MongoDB

  • Built-in Redundancy
  • Scalability
  • Arbiters
  • Delayed Replica Set Members
  • Backups
  • Monitoring

Automation with ClusterControl

  • Deployment
  • Backup & Restore
  • Monitoring
  • MongoDB Advisors
  • Integrations
  • Command-Line Access

Download the white paper

ClusterControl is the all-inclusive open source database management system for users with mixed environments that removes the need for multiple management tools. It provides advanced deployment, management, monitoring, and scaling functionality to get your MySQL, MongoDB, and PostgreSQL databases up and running using proven methodologies that you can depend on to work. At the core of ClusterControl is its automation functionality that lets you automate many of the database tasks you have to perform regularly, like deploying new databases, adding and scaling new nodes, running backups and upgrades, and more.

Download ClusterControl

Key Things to Monitor in MongoDB

Enhancing system performance requires getting a good overview of how the system performs; this process is generally called monitoring. Monitoring is an essential part of database management, and the detailed performance information about your MongoDB setup will not only help you gauge its functional state, but also give clues about anomalies, which is helpful when doing maintenance. It is essential to identify unusual behaviours and fix them before they escalate into more serious failures.

Some of the types of failures that could arise are...

  • Lag or slowdown
  • Resource inadequacy
  • System hiccup

Monitoring is often centered on analyzing metrics. Some of the key metrics you will want to monitor include...

  • Performance of the database
  • Utilization of resources (CPU usage, available memory and Network usage)
  • Emerging setbacks
  • Saturation and limitation of the resources
  • Throughput operations

In this blog we are going to discuss these metrics in detail and look at the tools available from MongoDB itself (such as utilities and commands). We will also look at other software tools such as Pandora FMS Open Source and Robo 3T. For the sake of simplicity, we are going to use Robo 3T in this article to demonstrate the metrics.

Performance of the Database

The first and foremost thing to check on a database is its general performance, for example whether the server is active or not. If you run db.serverStatus() on a database in Robo 3T, you will be presented with information showing the state of your server.
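As a minimal health-check sketch from the mongo shell (the fields below are standard serverStatus output, though the full document varies by version):

// Pull a few commonly inspected fields out of db.serverStatus()
var status = db.serverStatus()
print("host:        " + status.host)
print("uptime (s):  " + status.uptime)
print("connections: " + status.connections.current + " current, " +
      status.connections.available + " available")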

Replica sets

A replica set is a group of mongod processes that maintain the same data set. If you are using replica sets, especially in production, the operation log (oplog) provides the foundation for the replication process. All write operations are applied on the primary node and recorded in its oplog, a limited-size (capped) collection; the secondary nodes then copy and apply these operations asynchronously. If the primary node fails before an operation has been copied to a secondary’s oplog, that write may not be replicated.

Key metrics to keep an eye on...

Replication Lag

This defines how far the secondary node is behind the primary node. An optimal state requires the gap to be as small as possible; in a healthy system the lag is close to 0. If the gap grows too wide, data integrity will be compromised once the secondary node is promoted to primary. In that case you can set a threshold, for example 1 minute, and raise an alert whenever it is exceeded (a quick way to measure the lag is sketched after the list below). Common causes of wide replication lag include...

  1. Shards that may have an insufficient write capacity which is often associated with resources saturation.
  2. The secondary node is providing data at a slower rate than the primary node.
  3. Nodes may also be hindered in some way from communicating, possibly due to a poor network.
  4. Operations on the primary node could also be slower, thereby blocking replication. If this happens you can run the following commands:
    1. db.getProfilingLevel(): a value of 0 means the profiler is off; 1 means only slow operations are being profiled, which may point to slow queries; 2 means all operations are profiled.
    2. db.getProfilingStatus(): here we check the value of slowms, which is 100ms by default. If operations regularly take longer than this, you might have heavy write operations on the primary or inadequate resources on the secondary. To solve this, you can scale the secondary so it has as many resources as the primary.
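As a quick way to estimate the lag itself, here is a sketch based on standard rs.status() fields (run it on any replica set member while a primary exists):

// Compare the optime of each secondary against the primary's optime
var s = rs.status()
var primary = s.members.filter(function (m) { return m.stateStr === "PRIMARY" })[0]
s.members.forEach(function (m) {
    if (m.stateStr === "SECONDARY") {
        // optimeDate is the wall-clock time of the member's last applied operation
        print(m.name + " lag: " + (primary.optimeDate - m.optimeDate) / 1000 + "s")
    }
})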

Cursors

If you make a read request, for example find, you will be provided with a cursor, which is a pointer to the data set of the result. If you run db.serverStatus() and navigate to the metrics object and then cursor, you will see this…

In this case, the cursor.timedOut counter had been incremented to 9 because 9 connections died without closing their cursors. The consequence is that each such cursor remains open on the server, consuming memory, until it is reaped by MongoDB’s default timeout. You should set up an alert to identify inactive cursors so they can be reaped to save memory. You should also avoid no-timeout cursors, because they hold on to resources and slow down internal system performance; keep the metrics.cursor.open.noTimeout counter at 0 by not opening cursors with the no-timeout option.
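To keep an eye on these counters, you can read them straight out of serverStatus; a minimal sketch (standard metric names):

// Inspect cursor metrics; open.noTimeout should ideally stay at 0
var c = db.serverStatus().metrics.cursor
print("timed out:         " + c.timedOut)
print("open (total):      " + c.open.total)
print("open (no timeout): " + c.open.noTimeout)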

Journaling

With the WiredTiger storage engine, write operations are recorded in an on-disk log (the journal) before they are applied to the data files. This is referred to as journaling. Journaling ensures the availability and durability of data in the event of a failure, from which a recovery can be carried out.

For the purpose of recovery, we often use checkpoints (especially for the WiredTiger storage system) to recover from the last checkpoint. However, if MongoDB shuts down unexpectedly, then we use the journaling technique to recover any data that was processed or provided after the last checkpoint.

Journaling should not be turned off: a new checkpoint is created only every 60 seconds or so, and if a failure occurs, MongoDB can replay the journal to recover the data written since the last checkpoint.

Journaling generally narrows the time interval between when data is applied to memory and when it becomes durable on disk. The storage.journal object has a property that describes the commit frequency, commitIntervalMs, which defaults to 100ms for WiredTiger. Tuning it to a lower value makes writes be recorded more frequently, reducing the window of possible data loss.
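As a sketch, this is how the commit interval could be set in a YAML-format mongod.conf (storage.journal.commitIntervalMs is a standard option; the value 50 here is only an illustration):

# mongod.conf excerpt: flush the journal every 50ms instead of the 100ms default
storage:
  journal:
    enabled: true
    commitIntervalMs: 50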

Locking Performance

Lock contention can be caused by multiple read and write requests from many clients. When this happens, consistency must be kept and write conflicts avoided. To achieve this, MongoDB uses multi-granularity locking, which allows locking operations to occur at different levels, such as the global, database, or collection level.

If you have poor schema design patterns, you will be vulnerable to locks being held for long durations. This is often experienced when two or more different write operations target a single document in the same collection and consequently block each other. For the WiredTiger storage engine, MongoDB uses a ticket system in which read and write requests are admitted from a queue of waiting threads.

By default the concurrent number of read and write operations are defined by the parameters wiredTigerConcurrentWriteTransactions and wiredTigerConcurrentReadTransactions which are both set to a value of 128.

If you scale these values too high, you will end up limited by CPU resources instead. To increase throughput, it is more advisable to scale horizontally by providing more shards.
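You can watch how many of these tickets are actually in use from serverStatus; a minimal sketch (wiredTiger.concurrentTransactions is a standard section):

// If "out" regularly approaches totalTickets, operations are queuing
var t = db.serverStatus().wiredTiger.concurrentTransactions
print("read tickets in use:  " + t.read.out + " of " + t.read.totalTickets)
print("write tickets in use: " + t.write.out + " of " + t.write.totalTickets)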

Severalnines
 
Become a MongoDB DBA - Bringing MongoDB to Production
Learn about what you need to know to deploy, monitor, manage and scale MongoDB

Utilization of Resources

This generally describes the usage of available resources, such as CPU capacity/processing rate and RAM. Performance, especially of the CPU, can change drastically under unusual traffic loads. Things to check include...

  1. Number of connections
  2. Storage
  3. Cache

Number of Connections

If the number of connections is higher than the database system can handle, there will be a lot of queuing. This will overwhelm the performance of the database and make your setup run slowly. A spike in this number can also point to driver issues or complications with your application.

If you monitor the number of connections over some period and notice the value at which it peaks, it is good practice to set an alert for when connections exceed that number.

If the number is getting too high, you can scale up to cater to the rise. To do this you have to know how many connections are available within a given period; otherwise, if the available connections are not enough, requests will not be handled in a timely fashion.

MongoDB supports a very large number of concurrent connections out of the box (the limit is controlled by the net.maxIncomingConnections setting, 65536 by default). With your monitoring, always ensure that current connections never get too close to the limit. You can check the values in the connections object.
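A minimal sketch for watching this from the shell (the connections object is standard serverStatus output):

// current + available equals the configured connection limit
var conn = db.serverStatus().connections
print("current:      " + conn.current)
print("available:    " + conn.available)
print("totalCreated: " + conn.totalCreated)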

Storage

Every row or data record in MongoDB is referred to as a document, stored in BSON format. On a given database, if you run the command db.stats(), you will be presented with this data (a usage sketch follows the list below).

  • storageSize defines the size of all data extents in the database.
  • indexSize outlines the size of all indexes created within that database.
  • dataSize is a measure of the total space taken by the documents in the database.
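For example (a sketch; db.stats() accepts an optional scale factor, here 1024*1024 to report megabytes instead of bytes):

// Returns a document containing dataSize, storageSize, indexSize, etc.
db.stats(1024 * 1024)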

You can sometimes see a sudden change in storage use, especially if a lot of data has been deleted. In this case you should set up an alert, to make sure the change was not due to malicious activity.

Sometimes the overall storage size may shoot up while the database traffic graph stays constant; in this case, you should check your application or database structure for unnecessary duplicates.

Like the general memory of a computer, MongoDB also has caches in which active data is temporarily stored. An operation may request data which is not in this active memory, triggering a request to the main disk storage; this event is referred to as a page fault. Page faults take longer to execute and are detrimental when they occur frequently. To avoid this scenario, ensure the size of your RAM is always enough to cater to the data sets you are working with. You should also ensure you have no schema redundancy or unnecessary indexes.

Cache

A cache is temporary storage for frequently accessed data. In WiredTiger, both the file system cache and the storage engine cache are employed. Always ensure that your working set does not bulge beyond the available cache; otherwise page faults will increase in number, causing performance issues.

Data that has been modified in the cache but not yet flushed to disk is referred to as “dirty” data. Bottlenecks result if the amount of dirty data grows beyond what can be written to disk quickly enough. Adding more shards will help to reduce this number.
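A sketch for watching cache pressure from the shell (these are standard statistic names in the wiredTiger.cache section of serverStatus):

var cache = db.serverStatus().wiredTiger.cache
print("bytes currently in cache: " + cache["bytes currently in the cache"])
print("maximum bytes configured: " + cache["maximum bytes configured"])
print("tracked dirty bytes:      " + cache["tracked dirty bytes in the cache"])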

CPU Utilization

Improper indexing, poor schema structure, and poorly designed queries demand more CPU attention and will obviously increase its utilization.

Throughput operations

To a large extent, getting enough information on these operations enables one to avoid consequential setbacks such as errors, saturation of resources, and functional complications.

You should always take note of the number of read and write operations against the database, i.e. a high-level view of the cluster’s activities. Knowing the number of operations generated by requests lets you calculate the load the database is expected to handle, which you can then address either by scaling up your database or by scaling out, depending on the resources you have. It also lets you gauge the ratio between the rate at which requests accumulate and the rate at which they are processed. Furthermore, you can optimize your queries appropriately to improve performance.

To check the numbers of read and write lock acquisitions, run db.serverStatus() and navigate to the locks.global object: the value of the property r represents read requests, and w write requests.

More often than not, read operations outnumber write operations. Metrics on active clients are reported under globalLock.
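The opcounters section gives a simpler cumulative view of throughput; a minimal sketch (standard serverStatus fields, counted since the last server restart):

var op = db.serverStatus().opcounters
print("inserts: " + op.insert + "  queries: " + op.query +
      "  updates: " + op.update + "  deletes: " + op["delete"])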

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

Saturation and Limitation of Resources

Sometimes the database may fail to keep pace with the rate of reading and writing, as shown by a growing number of queued requests. In this case, you have to scale your database, for instance by providing more shards, to enable MongoDB to address the requests fast enough.

Emerging Setbacks

MongoDB log files always give a general overview of the assert exceptions returned, which gives you a clue to the possible causes of errors. If you run db.serverStatus(), the assert types you will see include the following (a sketch for reading these counters follows the list):

  1. Regular asserts: these result from an operation failure, for example when a string value is provided for an integer field, resulting in a failure to read the BSON document.
  2. Warning asserts: these are alerts about some issue which does not yet have much impact on operation, for example when you upgrade your MongoDB and are alerted about deprecated functions.
  3. Msg asserts: these result from internal server exceptions, such as a slow network or an inactive server.
  4. User asserts: like regular asserts, these errors arise when executing a command, but they are returned to the client, for example duplicate keys, inadequate disk space, or no access to write to the database. You should check your application to fix these errors.
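These counters can be read directly from the asserts section of serverStatus; a minimal sketch (standard field names; the counters reset when the server restarts or when they roll over):

var a = db.serverStatus().asserts
print("regular: " + a.regular + "  warning: " + a.warning +
      "  msg: " + a.msg + "  user: " + a.user + "  rollovers: " + a.rollovers)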

An Overview of Database Indexing for MongoDB


What is Indexing?

Indexing is an important concept in the database world. The main advantage of creating an index on a field is faster access to data: it optimizes the process of searching and accessing the database. Consider this example to understand it.

When a user asks for a specific row from the database, what will the DB system do? It will start from the first row and check whether it is the row the user wants; if yes, it returns that row, otherwise it continues searching row by row until the end.

Generally, when you define an index on a particular field, the DB system will create an ordered list of that field’s values and store it in a different table. Each entry of this table will point to the corresponding value in the original table. So when the user searches for a row, the system will first search for the value in the index table using a binary search algorithm, and then return the corresponding value from the original table. This process takes less time because we are using binary search instead of linear search.

In this article, we will focus on MongoDB indexing and understand how to create and use indexes in MongoDB.

How to Create an Index in MongoDB Collection?

To create an index using the Mongo shell, you can use this syntax:

db.collection.createIndex( <key and index type specification>, <options> )

Example:

To create a descending index on the name field in the myColl collection:

db.myColl.createIndex( { name: -1 } )

Types of MongoDB Indexes

  1. Default _id Index

    This is the default index, created by MongoDB when you create a new collection. If you don’t specify a value for this field, _id will be the primary key by default for your collection, so a user can’t insert two documents with the same _id value. You can’t remove the index on the _id field.

  2. Single Field Index

    You can use this index type when you want to create a new index on any field other than _id field.

    Example:

    db.myColl.createIndex( { name: 1 } )

    This will create a single-key ascending index on the name field in the myColl collection.

  3. Compound Index

    You can also create an index on multiple fields using compound indexes. For this index type, the order in which the fields are defined in the index matters. Consider this example:

    db.myColl.createIndex({ name: 1, score: -1 })

    This index will first sort the collection by name in ascending order and then for each name value, it will sort by score values in descending order.

  4. Multikey Index

    This index type is used to index array data. If a field in a collection has an array as its value, you can use this index, which creates separate index entries for each element of the array. If the indexed field is an array, MongoDB automatically creates a multikey index on it.

    Consider this example:

    {
      "userid": 1,
      "name": "mongo",
      "addr": [
          {zip: 12345, ...},
          {zip: 34567, ...}
      ]
    }

    You can create a Multikey index on addr field by issuing this command in Mongo shell.

    db.myColl.createIndex({ "addr.zip": 1 })
  5. Geospatial Index

    Suppose you have stored coordinates in a MongoDB collection. To create an index on fields of this type (holding geospatial data), you can use a geospatial index. MongoDB supports two types of geospatial indexes.

    • 2d Index: You can use this index for data which is stored as points on 2D plane.

      db.collection.createIndex( { <location field> : "2d" } )
    • 2dsphere Index: Use this index when your data is stored in GeoJSON format or as coordinate pairs (longitude, latitude).

    db.collection.createIndex( { <location field> : "2dsphere" } )
  6. Text Index

    To support queries that search for text in a collection, you can use a text index.

    Example:

    db.myColl.createIndex( { address: "text" } )
  7. Hashed Index

    MongoDB supports hash-based sharding. A hashed index computes the hash of the values of the indexed field. Hashed indexes support sharding with hashed shard keys: hashed sharding uses such an index as the shard key to partition the data across your cluster.

    Example:

    db.myColl.createIndex( { _id: "hashed" } )
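    For context, a hashed index is what enables sharding a collection on a hashed key; a sketch (mydb.myColl is a hypothetical namespace; run against a mongos):

    sh.enableSharding("mydb")
    sh.shardCollection("mydb.myColl", { _id: "hashed" })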
ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

Properties of Index

  1. Unique Index

    This property ensures that there are no duplicate values in the indexed field. If duplicates already exist in the collection, creating the unique index will fail.

  2. Sparse Index

    A sparse index contains entries only for documents that have the indexed field. Documents that lack the indexed field are skipped by the index and, when the sparse index is used, omitted from the result set.

  3. TTL Index

    This index is used to automatically delete documents from a collection after a specified time interval (TTL). It is ideal for removing documents such as event logs or user sessions (see the sketch after this list).
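    As a quick sketch of all three properties (users and sessions are hypothetical collections):

    db.users.createIndex({ email: 1 }, { unique: true })        // reject duplicate emails
    db.users.createIndex({ nickname: 1 }, { sparse: true })     // index only docs that have nickname
    db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })  // expire docs 1h after createdAt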

Performance Analysis

Consider a collection of student scores with exactly 3,000,000 documents in it. We haven’t created any indexes on this collection yet. See the image below to understand the schema.

Sample documents in score collection

Now, consider this query without any indexes:

db.scores.find({ student: 585534 }).explain("executionStats")

This query takes 1155ms to execute. Here is the output; look for the executionTimeMillis field in the result.

Execution time without indexing

Now let’s create an index on the student field. To create the index, run this query:

db.scores.createIndex({ student: 1 })

Now the same query takes 0ms.

Execution time with indexing

You can clearly see the difference in execution time: it’s almost instantaneous. That’s the power of indexing.
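Beyond the raw execution time, a few other fields of the explain output are worth checking; a minimal sketch (standard executionStats fields):

// For a well-indexed query, totalDocsExamined should be close to nReturned
var stats = db.scores.find({ student: 585534 }).explain("executionStats").executionStats
print("time (ms):     " + stats.executionTimeMillis)
print("docs examined: " + stats.totalDocsExamined)
print("keys examined: " + stats.totalKeysExamined)
print("docs returned: " + stats.nReturned)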

Conclusion

One obvious takeaway: create indexes. Based on your queries, you can define different types of indexes on your collections. If you don’t create indexes, each query will scan the full collection, which takes a lot of time, makes your application very slow, and uses a lot of your server’s resources. On the other hand, don’t create too many indexes either, because unnecessary indexes cause extra time overhead for every insert, delete, and update: when you perform any of these operations on an indexed field, the same operation has to be performed on the index tree as well, which takes time. Indexes are kept in RAM, so creating irrelevant indexes can eat up your RAM space and slow down your server.


How to Go Into Production With MongoDB - Top Ten Tips


After successfully developing your application, and before dedicating yourself to running MongoDB in production, consider these quick guidelines to ensure a smooth and efficient flow and to achieve optimal performance.

1) Deployment Options

Selection of Right Hardware

For optimal performance, it’s preferable to use SSDs rather than HDDs. Take care whether your storage is local or remote, and take measures accordingly. It’s better to use RAID for protection against hardware defects and for its recovery scheme, but don’t rely on it completely, as it offers no protection against adverse failures. RAID-10 is a good fit in terms of performance and availability, which is often lacking in other RAID levels. The right hardware is the building block for your application’s optimized performance and helps you avoid any major debacle.

Cloud Hosting

A range of cloud vendors offer pre-installed MongoDB database hosts, and selecting the best one is a founding step for your application to grow and make first impressions on the target market. MongoDB Atlas is one possible choice, offering a complete cloud solution with features like deployment of your nodes and snapshots of your data stored in Amazon S3. ClusterControl is another good option for easy deployment and scaling, with features like easy addition and removal of nodes, resizing of instances, and cloning of your production cluster. You can try ClusterControl here without being charged. Other available options are RackSpace ObjectRocket and MongoStitch.

2) RAM

Frequently accessed items are cached in RAM so that MongoDB can provide optimal response times. The RAM you need usually depends on the amount of data you are going to store, the number of collections, and the indexes. Make sure you have enough RAM to accommodate your indexes, otherwise your application’s performance in production will be drastically affected. More RAM means fewer page faults and better response times.

3) Indexing

For applications with heavy write workloads, indexing plays an imperative role. According to the MongoDB docs:

“If a write operation modifies an indexed field, MongoDB updates all indexes that have the modified field as a key”

So, be careful while choosing indexes as it may affect your DB performance.

Indexing Example: Sample entry in the restaurant database

{
  "address": {
     "building": "701",
     "street": "Harley street",
     "zipcode": "71000"
  },
  "cuisine": "Bakery",
  "grades": [
     { "date": { "$date": 1393804800000 }, "grade": "A", "score": 2 },
     { "date": { "$date": 1378857600000 }, "grade": "A", "score": 6 },
     { "date": { "$date": 1358985600000 }, "grade": "A", "score": 10 },
     { "date": { "$date": 1322006400000 }, "grade": "A", "score": 9 },
     { "date": { "$date": 1299715200000 }, "grade": "B", "score": 14 }
  ],
  "name": "Bombay Bakery",
  "restaurant_id": "187521"
}
  1. Creating Index on Single Field

    > db.restaurants.createIndex( { "cuisine": 1 } );
    {
         "createdCollectionAutomatically" : false,
         "numIndexesBefore" : 1,
         "numIndexesAfter" : 2,
         "ok" : 1
    }

    In the above example, an ascending-order index is created on the cuisine field.

  2. Creating Index on Multiple Fields

    > db.restaurants.createIndex( { "cuisine": 1 , "address.zipcode": -1 } );
    {
            "createdCollectionAutomatically" : false,
            "numIndexesBefore" : 2,
            "numIndexesAfter" : 3,
            "ok" : 1
    }

    Here a compound index is created on the cuisine and zip code fields. The negative value (-1) defines descending order.

4) Be Prepared for Sharding

MongoDB partitions data across machines using a mechanism known as sharding. It is not advised to add sharding at the beginning unless you are expecting hefty datasets. But remember: to keep your application’s performance in line you need a good shard key that matches your data patterns, as it directly affects your response time. Balancing of data across shards is automatic; however, it’s better to be prepared and have a proper plan, so you can shard whenever your application demands it.

5) Best practices for OS Configuration

  • XFS File System
    • It’s a highly scalable, high-performance, 64-bit journaling file system. It revamps I/O performance by permitting fewer and larger I/O operations.
  • Raise the file descriptor limit.
  • Disable Transparent Huge Pages and Non-Uniform Memory Access (NUMA).
  • Change the default TCP keepalive time to 300 seconds (for Linux) and 120 seconds (for Azure).

Try these commands to change the default keepalive time:

For Linux

sudo sysctl -w net.ipv4.tcp_keepalive_time=<value>

For Windows

Type this command in Command Prompt as an Administrator, where <value> is expressed in hexadecimal (e.g. 120000 is 0x1d4c0):

reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ /t REG_DWORD /v KeepAliveTime /d <value>

6) Ensuring High Availability using Replication

Going into production without replication exposes your app to sudden downtime. Replication takes care of the problem if a node fails. Manage read and write operations for your secondary MongoDB instances according to your application’s needs.

Keep these things in mind while Replicating:

  • For high availability, deploy your replica set into a minimum of three data centers.
  • Ensure that MongoDB instances have 0 or 1 votes.
  • Ensure full bi-directional network connectivity between all MongoDB instances.

Example of creating a replica set with 4 local MongoDB instances:

  1. Creating 4 local MongoDB instances

    First, create data directories

    mkdir -p /data/m0
    mkdir -p /data/m1
    mkdir -p /data/m2
    mkdir -p /data/m3
  2. Start 4 local instances

    mongod --replSet cluster1 --port 27017 --dbpath /data/m0
    mongod --replSet cluster1 --port 27018 --dbpath /data/m1
    mongod --replSet cluster1 --port 27019 --dbpath /data/m2
    mongod --replSet cluster1 --port 27020 --dbpath /data/m3
  3. Add the instances to the cluster and initiate

    mongo myhost1:27017
    myConfig = {_id: 'cluster1', members: [
        {_id: 0, host: 'myhost1:27017'},
        {_id: 1, host: 'myhost2:27018'},
        {_id: 2, host: 'myhost3:27019'},
        {_id: 3, host: 'myhost4:27020'}]
    }
    rs.initiate(myConfig);

Security Measures

7) Secure Machines

Open ports on machines hosting MongoDB are vulnerable to various malicious attacks. More than 30 thousand MongoDB databases were compromised in a ransomware attack because of lax security configuration. Before going into production, close the public ports of your MongoDB server. You should, however, keep one port open for SSH access.

Enabling Authentication on MongoDB instance:

  1. Launch mongod.conf file in your favorite editor.

  2. Append these lines at the end of the config file.

    security:
          authorization: enabled
  3. Restart the mongod service to apply the changes.

    service mongod restart
  4. Confirm the status

    service mongod status

Restraining external access

Open the mongod.conf file again to limit the IPs that may access your server.

bind_ip=127.0.0.1

By adding this line, you can only access your server through 127.0.0.1 (which is localhost). You can also add multiple IPs to the bind option.

bind_ip=127.0.0.1,168.21.200.200

It means you can access from localhost and your private network.

8) Password Protection

To add an extra security layer to your machines, enable access control and enforce authentication. Even though you have restrained the MongoDB server from accepting connections from the outside world, there is still a possibility of malicious scripts getting into your server. So don’t be reluctant to set a username/password for your database and assign the required permissions. With access control enabled, users will only be able to perform actions determined by their roles.

Here are the steps to create a user and assign database access with specific roles.

First we will create a user (in this case, it’s admin) for managing all users and databases, and then we will create a specific database owner having only read and write privileges on one MongoDB database instance.

Create an admin user for managing other users of database instances:

  1. Open your Mongo shell and switch to the admin database:

    use admin
  2. Create a user for admin database

    db.createUser({ user: "admin", pwd: "admin_password", roles: [{ role: "userAdminAnyDatabase", db: "admin" }] })
  3. Authenticate newly created user

    db.auth("admin", "admin_password")
  4. Creating specific instance user:

    use database_1
    db.createUser({ user: "user_1", pwd: "your_password", roles: [{ role: "dbOwner", db: "database_1" }] })
  5. Now verify whether the user has been created successfully.

    db.auth("user_1", "your_password")
    show collections

That’s it! You have successfully secured your database instances with proper authentication. You can add as many users as you want following the same procedure.

9) Encryption and Protection of Data

If you are using WiredTiger as a storage engine, you can use its encryption-at-rest configuration (available in MongoDB Enterprise) to encrypt your data. If not, encryption should be performed on the host using file-system, device, or physical encryption.

10) Monitor Your Deployment

Once you have finished deploying MongoDB into production, you must track its performance activity to catch possible problems early. There is a range of strategies you can adopt to monitor your data performance in the production environment.

  • MongoDB includes utilities, which return statistics about instance performance and activity. Utilities are used to pinpoint issues and analyze normal operations.

  • Use mongostat to get a snapshot of the counts of each operation type, which helps with capacity planning.

  • For tracking reports and read-write activities, mongotop is recommended.

mongotop 15

This command will return output after every 15 seconds.

                         ns    total    read    write    2018-04-22T15:32:01-05:00
         admin.system.roles      0ms     0ms      0ms
       admin.system.version      0ms     0ms      0ms
                   local.me      0ms     0ms      0ms
             local.oplog.rs      0ms     0ms      0ms
     local.replset.minvalid      0ms     0ms      0ms
          local.startup_log      0ms     0ms      0ms
       local.system.indexes      0ms     0ms      0ms
    local.system.namespaces      0ms     0ms      0ms
       local.system.replset      0ms     0ms      0ms

                         ns    total    read    write    2018-04-22T15:32:16-05:00
         admin.system.roles      0ms     0ms      0ms
       admin.system.version      0ms     0ms      0ms
                   local.me      0ms     0ms      0ms
             local.oplog.rs      0ms     0ms      0ms
     local.replset.minvalid      0ms     0ms      0ms
          local.startup_log      0ms     0ms      0ms
       local.system.indexes      0ms     0ms      0ms
    local.system.namespaces      0ms     0ms      0ms
       local.system.replset      0ms     0ms      0ms

MongoDB Monitoring Service (MMS) is another available option that monitors your MongoDB cluster and makes it convenient for you to have a sight of production deployment activities.

And of course there is ClusterControl by Severalnines, the automation and management system for open source databases. ClusterControl enables easy deployment of clusters with automated security settings and makes it simple to troubleshoot your database by providing easy-to-use management automation that includes repair and recovery of broken nodes, automatic upgrades, and more. You can get started with its (free forever) Community Edition, with which you can deploy and monitor MongoDB as well as create custom advisors in order to tune your monitoring efforts to those aspects that are specific to your setup. Download it free here.

MongoDB Chain Replication Basics


What is Chain Replication?

When we talk about replication, we are referring to the process of making redundant copies of data in order to meet design criteria on data availability. Chain replication refers to a linear ordering of MongoDB servers that forms a synchronized chain. The chain contains a primary node, succeeded by secondary servers arranged linearly. As the word chain suggests, the server closest to the primary replicates from it, while every succeeding secondary replicates from the preceding secondary MongoDB server. This is the main difference between chained replication and normal replication: chained replication occurs when a secondary node selects its replication target based on ping time, or when the closest node is a secondary. While chain replication reduces the load on the primary node, it may increase replication lag.

Why Use Chain Replication?

System infrastructures sometimes suffer unpredictable failures, leading to the loss of a server and therefore affecting availability. Replication ensures that unpredictable failures do not affect availability; it further allows recovery from hardware failure and service interruption. Both chained and unchained replication serve this purpose. Having established that replication is important, you may ask why use chain replication in particular. There is no performance difference between chained and unchained replication in MongoDB: in both cases, when the primary node fails, the secondary servers vote for a new acting primary, so writing and reading of data are not affected. Chained replication is, however, the default replication type in MongoDB.

How to Setup a Chain Replica

By default, chained replication is enabled in MongoDB, so we will also cover the process of deactivating it. The main reason to disable chain replication is if it is causing lag; in most cases, however, its merits outweigh the lag demerit, and deactivating it is unnecessary. In case chain replication is not active, the following commands will activate it:

cfg = rs.config()
cfg.settings.chainingAllowed = true
rs.reconfig(cfg)

This process is reversible. When forced to deactivate chain replication, follow this process:

cfg = rs.config()
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)
ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

Tips & Tricks for Chain Replication

The most significant limitation of chain replication is replication lag: the delay between the time an operation is performed on the primary and the time the same operation is replicated on the secondary. Although it is naturally impossible, it is always desirable for the replication speed to be very high, so that the replication lag is near zero. To keep replication lag close to zero, it is a prudent design criterion to use primary and secondary hosts of the same specs in terms of CPU, RAM, IO, and network.

While chain replication ensures data availability, it can be combined with journaling, which provides data safety by writing to a log that is regularly flushed to disk. When the two are combined, three servers are written to per write request, unlike in chain replication alone, where only two servers are written to per write request.

Another important tip is using the w parameter with replication. The w parameter controls the number of servers a write should be applied to before returning success. When the w parameter is set, the getLastError mechanism checks the servers’ oplogs and waits until the given number (‘w’) of servers have applied the operation.
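A sketch of a write using w (products is a hypothetical collection; wtimeout bounds the wait in milliseconds):

// Success is returned only after the write has been applied on 2 members
db.products.insert(
    { item: "envelopes", qty: 100 },
    { writeConcern: { w: 2, wtimeout: 5000 } }
)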

Using a monitoring tool like MongoDB Monitoring Service (MMS) or ClusterControl allows you to obtain the status of your replica nodes and visualize changes over time. For instance, in MMS, you can find replica lag graphs of the secondary nodes showing the variation in replication lag time.

Measuring Chain Replication Performance

By now you are aware that the most important performance parameter of chain replication is the replication lag time. We will therefore discuss how to test for replication lag. This test can be done through the MongoDB shell, by comparing the oplog of the last event on the primary node with the oplog of the last event on the secondary node.

To check the sync status of the secondary nodes, we run the following command on the primary node:

db.printSlaveReplicationInfo()

The above command reports, for each secondary, the time of the last operation it has synced from the primary. The results should appear as below.

rs-ds046297:PRIMARY> db.printSlaveReplicationInfo()
source: ds046297-a1.mongolab.com:46297
    syncedTo: Tue Mar 05 2013 07:48:19 GMT-0800 (PST)
        = 7475 secs ago (2.08hrs)
source: ds046297-a2.mongolab.com:46297
    syncedTo: Tue Mar 05 2013 07:48:19 GMT-0800 (PST)
        = 7475 secs ago (2.08hrs)

Having obtained the sync times of the secondaries, we are now interested in the primary’s oplog. The following command will help us obtain it:

db.printReplicationInfo()

This command will provide an output with details on oplog size, log length, time for oplog first event, time for oplog last event and the current time. The results appear as below.

rs-ds046297:PRIMARY> db.printReplicationInfo()
configured oplog size:   1024MB
log length start to end: 5589 secs (1.55hrs)
oplog first event time:  Tue Mar 05 2013 06:15:19 GMT-0800 (PST)
oplog last event time:   Tue Mar 05 2013 07:48:19 GMT-0800 (PST)
now:                     Tue Mar 05 2013 09:53:07 GMT-0800 (PST)

From the first output, the secondaries last synced at Tue Mar 05 2013 07:48:19 GMT-0800 (PST). From the primary’s oplog, the last operation also occurred at Tue Mar 05 2013 07:48:19 GMT-0800 (PST). The replication lag was therefore zero, and our chain-replicated system is operating correctly. Replication lag may however vary depending on the amount of changes that need to be replicated.

Optimizing Your Linux Environment for MongoDB


MongoDB’s performance depends on how it utilizes the underlying resources. It stores data on disk as well as in memory, uses CPU resources to perform operations, and needs a network to communicate with its clients. There should be adequate resources to support its general liveliness. In this article we are going to discuss various resource requirements for the MongoDB database system and how we can optimize them for maximum performance.

Requirements for MongoDB

Apart from providing large-scale resources such as RAM and CPU to the database, tuning the operating system can also improve performance to some extent. The significant requirements for establishing a MongoDB environment include:

  1. Enough disk space
  2. Adequate memory
  3. An excellent network connection

The most common operating system for MongoDB is Linux, so we’ll look at how to optimize it for the database.

Reboot Condition

There are many tuning techniques that can be applied to Linux. However, since some changes do not take full effect on a running host, it is always good practice to reboot after making changes, to ensure they are applied. In this section, the tuning areas we are going to discuss are:

  1. Network Stack
  2. NTP Daemon
  3. Linux User Limit
  4. File system and Options
  5. Security
  6. Virtual Memory

Network Stack

Like any other software, MongoDB benefits from an excellent network connection, which provides a better exchange interface for requests and responses with the server. However, the Linux kernel’s default network tunings do not favor MongoDB. The network stack is an arrangement of many layers that can be categorized into 3 main ones: the user area, the kernel area, and the device area. The user area and the kernel area are referred to as the host, since their tasks are carried out by the CPU. The device area is responsible for sending and receiving packets through an interface called the Network Interface Card. For better performance in a MongoDB environment, we assume the host has a 1Gbps network interface. In this case, the throughput-related settings we should tune are:

  1. net.core.somaxconn (increase the value)
  2. net.ipv4.tcp_max_syn_backlog (increase the value)
  3. net.ipv4.tcp_fin_timeout (reduce the value)
  4. net.ipv4.tcp_keepalive_intvl (reduce the value)
  5. net.ipv4.tcp_keepalive_time (reduce the value)

To make these changes permanent, create a new file /etc/sysctl.d/mongodb-sysctl.conf if it does not exist and add these lines to it.

net.core.somaxconn = 4096
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_max_syn_backlog = 4096

Then, as the root user, run /sbin/sysctl -p /etc/sysctl.d/mongodb-sysctl.conf to apply the changes.

NTP Daemon

Network Time Protocol (NTP) is a technique by which a system’s software clock is synchronized with internet time servers. MongoDB clusters depend on time consistency across nodes, so it is important for the NTP daemon to run permanently on MongoDB hosts; this ensures nodes keep accurate time even after a network disconnection. To install the NTP daemon on a Linux system with a Debian/Ubuntu flavor, just run:

$ sudo apt-get install ntp

You can consult the ntp.conf documentation to see the NTP daemon configuration for different OSes.

Linux User Limit

Sometimes a fault on the user side can end up impacting the entire server and host system. To prevent this, the Linux system imposes some resource limits on processes, on a per-user basis. It would therefore be inappropriate to deploy MongoDB on such default system configurations, since it requires more resources than the default provision. Besides, MongoDB is often the main process utilizing the underlying hardware, so it makes sense to optimize the Linux system for such dedicated usage. The database can then fully exploit the available resources.

However, it is not advisable to disable these limit constraints or set them to an unlimited state. For example, if you run into a shortage of CPU or RAM, a small fault can escalate into a huge problem and cause other services to fail, such as SSH, which is vital in solving the initial problem.

In order to achieve better estimations, you should understand the constraint requirements at the database level, for instance by estimating the number of users that will make requests to the database and their processing time. You can refer to Key Things to Monitor in MongoDB. A preferable limit for max-user-processes and open-files is 64000. To set these values, create a new file under /etc/security/limits.d (for example /etc/security/limits.d/mongodb.conf, if it does not exist) and add these lines:

mongod       soft        nofile       64000
mongod       hard        nofile       64000
mongod       soft        nproc        64000
mongod       hard        nproc        64000

To apply these changes, restart your mongod process, since the limits apply only to new sessions.

File System and Options

MongoDB supports three filesystems for on-disk database data: ext3, ext4, and XFS. For the WiredTiger storage engine, used by MongoDB versions greater than 3, XFS is best: ext4 is considered to create some stability issues, while ext3 is avoided due to its poor pre-allocation performance. MongoDB has no use for the default filesystem behaviour of updating access-time metadata the way other systems do, so you can disable access-time updates to save the small amount of disk IO activity these updates generate.

This can be done by adding the noatime flag to the file system options field in /etc/fstab for the disk serving MongoDB data.

$ grep "/var/lib/mongo" /proc/mounts
/dev/mapper/data-mongodb /var/lib/mongo ext4 rw,seclabel,noatime,data=ordered 0 0

This change only takes effect after you remount the filesystem or reboot the host, and then restart your MongoDB.

Security

Among the several security features of a Linux system, at the kernel level there is Security-Enhanced Linux (SELinux), an implementation of fine-grained Mandatory Access Control. It provides a bridge to the security policy, determining whether an operation should proceed. Unfortunately, many Linux users set this access control module to warn only, or disable it totally, often because of associated setbacks such as unexpected "permission denied" errors. Ignored as it is by many, this module plays a major role in reducing local attacks on the server. With this feature enabled and the corresponding mode set to enforcing, it provides a secure background for your MongoDB. Therefore, you should enable SELinux and apply the Enforcing mode, especially at the beginning of your installation. To change the SELinux mode to Enforcing, run the command:

$ sudo setenforce Enforcing

You can check the running SELinux mode by running

$ sudo getenforce
Severalnines
 
Become a MongoDB DBA - Bringing MongoDB to Production
Learn about what you need to know to deploy, monitor, manage and scale MongoDB

Virtual Memory

Dirty ratio

MongoDB employs caching to enhance quick fetching of data. In this process, dirty pages are created, and memory is required to hold them. The dirty ratio is the percentage of the total system memory that can hold dirty pages; in most cases, the default value is between 25 and 35%. If this value is surpassed, the pages are committed to disk, creating a hard pause. To avoid this, you can set the kernel to keep flushing data to disk in the background, without the hard pause, through another ratio called dirty_background_ratio, whose value ranges between 10% and 15%.

The aim here is to ensure consistent query performance. You can reduce the background ratio if your database system requires a lot of memory. If a hard pause is allowed, you may end up with data duplicates, or some data may fail to be recorded during that time. You can also reduce the cache size, to avoid data being frequently written to disk in small batches, which can saturate disk throughput. To check the currently running values, run this command:

$ sysctl -a | egrep "vm.dirty.*_ratio"

and you will be presented with something like this.

vm.dirty_background_ratio = 10
vm.dirty_ratio = 20

Swappiness

Swappiness is a value ranging from 0 to 100 that influences the behaviour of the virtual memory manager. Setting it to 100 makes the kernel swap aggressively to disk, while setting it to 0 directs the kernel to swap only to avoid out-of-memory problems. The Linux default is 60, which is not appropriate for database systems. In my own tests, setting the value between 0 and 10 is optimal. You can set this value in /etc/sysctl.conf:

vm.swappiness = 5

You can then check this value by running the command

$ sysctl vm.swappiness

To apply these changes, run the command /sbin/sysctl -p, or reboot your system.

Decoding the MongoDB Error Logs


Sometimes decoding MongoDB error logs can be tricky and can consume big chunks of your valuable time. In this article, we will learn how to examine the MongoDB error logs by dissecting each part of the log messages.

Common Format for MongoDB Log Lines

Here is the log line pattern for version 3.0 and above...

<timestamp> <severity> <component> [<context>] <message>

Log line pattern for previous versions of MongoDB only included:

<timestamp> [<context>] <message>

Let’s look at each tag.

Timestamps

The timestamp field of a log message stores the exact time the message was written to the log file. There are 4 timestamp formats supported by MongoDB; the default format is iso8601-local. You can change it using the --timeStampFormat parameter.

Timestamp Format Name    Example
iso8601-local            1969-12-31T19:00:00.000+0500
iso8601-utc              1970-01-01T00:00:00.000Z
ctime                    Wed Dec 31 19:00:00.000
ctime-no-ms              Wed Dec 31 19:00:00

Severity

The following table describes the meaning of all possible severity levels.

Severity Level    Description
F                 Fatal - the error has caused the database to no longer be accessible
E                 Error - database errors which will stop DB execution
W                 Warning - messages which explain potentially harmful behaviour of the DB
I                 Informational - messages for information purposes, like "a new connection accepted"
D                 Debug - mostly useful for debugging DB errors

Component

Since version 3.0, log messages include a “component” to provide a functional categorization of the messages. This allows you to easily narrow down your search by looking at specific components.

Component    Description
Access       Related to access control
Command      Related to database commands
Control      Related to control activities
FTDC         Related to diagnostic data collection activities
Geo          Related to parsing geospatial shapes
Index        Related to indexing operations
Network      Related to network activities
Query        Related to queries
REPL         Related to replica sets
REPL_HB      Related to replica set heartbeats
Rollback     Related to rollback db operations
Sharding     Related to sharding
Storage      Related to storage activities
Journal      Related to journal activities
Write        Related to db write operations

Context

The context part of the error message generally contains the thread or connection id, and is surrounded by square brackets. Log messages for any new connection to MongoDB will have the context value initandlisten; for all other log messages, it will be either a thread id or a connection id. For example:

2018-05-29T19:06:29.731+0000 [initandlisten] connection accepted from 127.0.0.1:27017 #1000 (13 connections now open)
2018-05-29T19:06:35.770+0000 [conn1000] end connection 127.0.0.1:27017 (12 connections now open)

Message

Contains the actual log message.

Log File Location

The default location on the server is: /var/log/mongodb/mongodb.log

If the log file is not present at this location, you can check the MongoDB config file, which can be found at either of these two locations:

/etc/mongod.conf or /yourMongoDBpath/mongod.conf

Once you open the config file, search for the logpath option; it tells MongoDB where to log all the messages.
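In the newer YAML config format, the equivalent settings live under systemLog; a minimal sketch:

# mongod.conf excerpt (YAML format)
systemLog:
  destination: file
  path: /var/log/mongodb/mongodb.log
  logAppend: true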

Analyzing a Simple Log Message

Here is an example of a typical MongoDB error message...

2014-11-03T18:28:32.450-0500 I NETWORK [initandlisten] waiting for connections on port 27017

Timestamp: 2014-11-03T18:28:32.450-0500
Severity: I
Component: NETWORK
Context: [initandlisten]
Message: waiting for connections on port 27017

The most important part of this message is the message portion. In most cases, you can figure out the error by looking at this field. Sometimes, if the message is not clear to you, you can go to the component part; for this message, the component value is NETWORK, which means the log message is related to the network.

If you are still not able to resolve the error, you can check the severity of the log message, which here says the message is for informational purposes. Further, you can also check other parts of the log message, like the timestamp or context, to find the complete root cause.

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

Decoding Common Error Log Messages

  1. Message:

    2018-05-10T21:19:46.942 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.

    Resolution: Create admin user in authentication database

  2. Message:

    2018-05-10T21:19:46.942 E COMMAND  [initandlisten] ** ERROR: getMore command failed. Cursor not found

    Resolution: Remove the timeout from the cursor or increase the cursor batch size.

  3. Message:

    2018-05-10T21:19:46.942 E INDEX  [initandlisten] ** ERROR:E11000 duplicate key error index: test.collection.$a.b_1 dup key: { : null }

    Resolution: Violation of unique key constraint. Try to insert document with different key.

  4. Message:

    2018-05-10T21:19:46.942 E NETWORK  [initandlisten] ** ERROR:Timed out connecting to localhost:27017.

    Resolution: The latency between the driver and the server was too great and the driver gave up. You can change this by adding a connection timeout option (e.g. connectTimeoutMS) to the connection string.

  5. Message:

    2018-05-10T21:19:46.942 E WRITE  [initandlisten] ** ERROR: A write operation resulted in an error. E11000 duplicate key error index: test.people.$_id_ dup key: { : 0 }

    Resolution: Remove duplication of _id field value from conflicting documents.

  6. Message:

    2018-05-10T21:19:46.942 E NETWORK  [initandlisten] ** ERROR: No connection could be made because the target machine actively refused it 127.0.0.1:27017 at System.Net.Sockets.Socket.EndConnect

    Resolution: Either server is not running on port 27017 or try to restart the server with correct host and port.

Log Management Tools

MongoDB 3.0 updated its logging features to provide better insights into all database activities. There are many log management tools available on the market, like MongoDB Ops Manager, Logentries, and mtools.

Conclusion

Logging is as important as replication or sharding for good and proper database management. For better database management, one should be able to decode logs easily to rectify exceptions and errors quickly. I hope that after reading this tutorial, you will feel more comfortable analyzing complex MongoDB logs.

How to Optimize Performance of MongoDB


Excellent database performance is important when you are developing applications with MongoDB. Sometimes the overall data serving process may become degraded due to a number of reasons, some of which include:

  • Inappropriate schema design patterns
  • Improper use of or no use of indexing strategies
  • Inadequate hardware
  • Replication lag
  • Poorly performing querying techniques

Some of these setbacks might force you to increase hardware resources, while others may not. For instance, a poor query structure may result in the query taking a long time to be processed, causing replica lag and maybe even some data loss. In such a case, one may think that the storage memory is not enough and probably needs scaling up. This article discusses the most appropriate procedures you can employ to boost the performance of your MongoDB database.

Schema Design

Basically the two most commonly employed schema relationships are...

  • One-to-Few
  • One-to-Many

While the most efficient schema design is the One-to-Many relationship, each has its own merits and limitations.

One-to-Few

In this case, for a given field, there are embedded documents, but they are not indexed with object identity (they carry no ObjectIds of their own).

Here is a simple example:

{
      userName: "Brian Henry",
      Email: "example@gmail.com",
      grades: [
             {subject: "Mathematics", grade: "A"},
             {subject: "English", grade: "B"},
      ]
}

One advantage of using this relationship is that you can get the embedded documents with just a single query. However, from a querying standpoint, you cannot access a single embedded document. So if you are not going to reference embedded documents separately, it will be optimal to use this schema design.

One-to-Many

In this relationship, data in one collection is related to data in a different collection. For example, you can have a collection for users and another for posts; when a user makes a post, it is recorded with the user’s id.

Users schema

{
    Full_name: "John Doh",
    User_id: 1518787459607.0
}

Posts schema

{
    "_id" : ObjectId("5aa136f0789cf124388c1955"),
    "postTime" : "16:13",
    "postDate" : "8/3/2018",
    "postOwnerNames" : "John Doh",
    "postOwner" : 1518787459607.0,
    "postId" : "1520514800139"
}

The advantage of this schema design is that the documents are standalone (they can be selected separately). Another advantage is that it enables users with different ids to share information from the posts collection (hence the name One-to-Many), sometimes even in an “N-to-N” fashion, without using a table join. The limitation is that you have to do at least two queries to fetch or select data from the second collection.

How you model the data will therefore depend on the application’s access pattern, weighed against the schema designs we have discussed above.

Optimization Techniques for Schema Design

  1. Employ document embedding as much as possible as it reduces the number of queries you need to run for a particular set of data.

  2. Don’t use denormalization for documents that are frequently updated. If a field is going to be frequently updated, there will be the task of finding all the instances that need updating. This results in slow query processing, overwhelming even the merits associated with denormalization.

  3. If there is a need to fetch a document separately, then there is no need to use embedding since complex queries such as aggregate pipelining take more time to execute.

  4. If the array of documents to be embedded is large enough, don't embed them. Array growth should at least have a bounded limit (see the sketch after this list).
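
One way to enforce such a bound is with the $push operator's $each and $slice modifiers, which trim the array on every update. A minimal sketch; the users collection and the 100-element cap are illustrative assumptions:

// Keep only the 100 most recent grades; older entries are trimmed
// away on each push, so the embedded array stays bounded.
db.users.updateOne(
    { userName: "Brian Henry" },
    { $push: { grades: { $each: [ { subject: "Physics", grade: "A" } ],
                         $slice: -100 } } }
)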

Proper Indexing

This is the most critical part of performance tuning, and it requires a comprehensive understanding of the application's queries, the ratio of reads to writes, and how much free memory your system has. If a query uses an index, it will scan the index and not the collection.

An excellent index is one that covers all the fields your query scans. An index on multiple fields is referred to as a compound index.

To create an index on a single field you can use this code:

db.collection.createIndex({"field": 1})

For a compound index, to create the indexing:

db.collection.createIndex({"field1": 1, "field2": 1})

Besides faster querying, indexing gives the additional advantage of speeding up other operations such as sort and limit. For example, if I create a compound index {f: 1, m: 1}, I can perform a sort in addition to find:

db.collection.find( {f: 1} ).sort( {m: 1} )

Reading data from RAM is more efficient than reading the same data from disk. For this reason, it is always advisable to ensure that your indexes fit entirely in RAM. To get the current index size of your collection, run this command:

db.collection.totalIndexSize()

You will get a value like 36864 bytes. This value should not take up a large percentage of the overall RAM size, since you also need to cater for the needs of the entire working set of the server.
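
To check whether your indexes as a whole are likely to fit in memory, you can sum the index sizes across the current database. A minimal mongo shell sketch:

// Sum totalIndexSize() over every collection in the current database
// and print the result in megabytes.
var total = 0;
db.getCollectionNames().forEach(function (name) {
    total += db.getCollection(name).totalIndexSize();
});
print("Total index size: " + (total / 1024 / 1024).toFixed(2) + " MB");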

An efficient query should also enhance selectivity. Selectivity can be defined as the ability of a query to narrow down the result set using the index. To be more precise, your queries should limit the number of possible documents matched via the indexed field. Selectivity problems are mostly associated with a compound index that includes a low-selectivity field alongside another field. For example, if you have this data:

{ _id: ObjectId(), a: 6, b: "no", c: 45 }
{ _id: ObjectId(), a: 7, b: "gh", c: 28 }
{ _id: ObjectId(), a: 7, b: "cd", c: 58 }
{ _id: ObjectId(), a: 8, b: "kt", c: 33 }

The query {a: 7, b: "cd"} will scan through 2 documents to return 1 matching document. However, if the data for the value a is evenly distributed, i.e.:

{ _id: ObjectId(), a: 6, b: "no", c: 45 }
{ _id: ObjectId(), a: 7, b: "gh", c: 28 }
{ _id: ObjectId(), a: 8, b: "cd", c: 58 }
{ _id: ObjectId(), a: 9, b: "kt", c: 33 }

The query {a: 7, b: "cd"} will scan through 1 document and return this document. Hence this will take a shorter time than with the first data distribution.
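
You can verify how selective a query actually is with explain(). A sketch, assuming a compound index on the two fields from the example above; in the executionStats output, the closer totalDocsExamined is to nReturned, the more selective the query:

// Create the compound index used by the examples above.
db.collection.createIndex({ a: 1, b: 1 })

// Compare totalDocsExamined with nReturned in the output.
db.collection.find({ a: 7, b: "cd" }).explain("executionStats")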

Resources Provisioning

Inadequate storage, RAM, and other operating parameters can drastically degrade the performance of MongoDB. For instance, if the number of user connections is very large, it will hinder the server application's ability to handle requests in a timely manner. As discussed in Key things to monitor in MongoDB, you can get an overview of which resources are limited and how you can scale them to suit your specifications. With a large number of concurrent application requests, the database system can be overwhelmed trying to keep up with the demand.
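
As a quick check on one of these parameters, the serverStatus command reports how many connections are currently open against how many the server can still accept:

// "current" is the number of open connections; "available" is how
// many more the server can accept before hitting its limit.
db.serverStatus().connections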

Replication Lag

Sometimes you may notice some data missing from your database, or something you deleted reappearing. As much as you could have a well-designed schema, appropriate indexing, and enough resources, your application will at first run smoothly without any hiccups, and then at some point you notice these problems. MongoDB relies on replication, where data is redundantly copied to meet certain design criteria. The assumption is that this process is instantaneous, but some delay may occur due to network failures or unhandled errors. In a nutshell, there can be a large gap between the time an operation is processed on the primary node and the time it is applied on the secondary node.

Setbacks of Replication Lag

  1. Inconsistent data. This is especially associated with read operations that are distributed across secondaries.

  2. If the lag gap is wide enough, then a lot of unreplicated data may sit on the primary node and will need to be reconciled on the secondary node. At some point, this may become impossible, especially when the primary node cannot be recovered.

  3. Failure to recover the primary node can force one to run a node whose data is not up to date; consequently, the whole database may have to be dropped in order to make the primary recover.

Causes of the Secondary Node Failure

  1. The primary node outmatching the secondary in CPU, disk IOPS, and network I/O specifications.

  2. Complex write operations. For example, a command like

    db.collection.update( { a: 7 }, { $set: { m: 4 } }, { multi: true } )

    The primary node records this operation in the oplog quickly enough. The secondary node, however, has to fetch those ops and read into RAM any index and data pages needed to satisfy criteria specifications such as the id. Since it must do this fast enough to keep pace with the rate at which the primary node applies the operations, a large enough number of ops will produce an expected lag.

  3. Locking of the secondary when making a backup. In this case, we may forget to disable writes on the primary, which therefore continues its operations as normal. By the time the lock is released, the replication lag gap will already be large, especially when dealing with a huge amount of backup data.

  4. Index building. If an index is being built on the secondary node, then all other operations associated with it are blocked. If the index build is long-running, a replication lag hiccup will be encountered.

  5. Disconnected secondary. Sometimes the secondary node fails due to network disconnections, and this results in replication lag when it is reconnected.

How to Minimize the Replication Lag

  • Use unique indexes in addition to your collection's _id field, to keep the replication process from failing completely.

  • Consider other backup types, such as point-in-time and filesystem snapshots, which do not necessarily require locking.

  • Avoid building large indexes, since they cause a blocking background operation.

  • Make the secondary powerful enough. If write operations are lightweight, then using underpowered secondaries will be economical. But for heavy write loads, the secondary node may lag behind the primary. To be more precise, the secondary should have enough bandwidth to read the oplogs fast enough to keep pace with the primary node.
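
To keep an eye on the lag itself, the shell provides a helper that prints how far each secondary is behind the primary (it is called rs.printSlaveReplicationInfo() in older MongoDB releases and rs.printSecondaryReplicationInfo() in newer ones):

// Run against the replica set: prints, per secondary, how many
// seconds it is behind the primary's last oplog entry.
rs.printSecondaryReplicationInfo()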

Efficient Query Techniques

Besides creating indexed queries and improving query selectivity as discussed above, there are other concepts you can employ to speed up your queries and make them more effective.

Optimizing Your Queries

  1. Using a covered query. A covered query is one that is completely satisfied by an index, and hence does not need to examine any documents. For a query to be covered, all of its fields must be part of the index, and the result it returns must contain only those indexed fields.

    Let’s consider this example:

    {_id: 1, product: { price: 50 } }

    If we create an index for this collection as

    {"product.price": 1}

    Considering a find operation, this index will cover the following query:

    db.collection.find( {"product.price": 50}, {"product.price": 1, _id: 0} )

    and return the product.price field and value only.

  2. For embedded documents, use dot notation (.). Dot notation helps in accessing elements of an array and the fields of an embedded document.

    Accessing an array:

    {
       prices: [12, 40, 100, 50, 40]  
    }

    To specify the fourth element for example, you can write this command:

    "prices.3"

    Accessing an object array:

    {
       vehicles: [{name: "toyota", quantity: 50},
                  {name: "bmw", quantity: 100},
                  {name: "subaru", quantity: 300}]
    }

    To specify the name field in the vehicles array you can use this command:

    "vehicles.name"

  3. Check if a query is covered. To do this, use db.collection.explain(). This function also provides information on the execution of other operations, e.g. db.collection.explain().aggregate(). To learn more about the explain function you can check out explain().
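
For example, running explain with executionStats on the covered query from earlier should report totalDocsExamined as 0, since the index alone satisfies it:

// A covered query reads only index keys: expect totalDocsExamined: 0.
db.collection.find(
    { "product.price": 50 },
    { "product.price": 1, _id: 0 }
).explain("executionStats")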

In general, the supreme technique as far as querying is concerned is the use of indexes. Querying an index is much faster than querying documents outside of the index. Indexes can fit in memory and are hence available in RAM rather than on disk, which makes them easy and fast to fetch.
