MongoDB performance analysis and tuning
In this tutorial, you will learn how to:
- Use performance diagnostic tools and monitoring tools.
- Troubleshoot identified issues.
Yandex StoreDoc cluster performance drops most often due to one of the following:
- High CPU and disk I/O utilization.
- Inefficient query execution in MongoDB.
- Locks.
- Insufficient disk space.
Here are some tips for diagnosing and fixing these issues.
Getting started
- Install the `mongostat` and `mongotop` utilities on an external host with network access to your MongoDB host (see Pre-configuring a connection to a Yandex StoreDoc cluster) to receive MongoDB performance data.
- Determine which databases need to be checked for issues.
- Create a MongoDB user with the `mdbMonitor` role for these databases. This role is required to use `mongostat` and `mongotop`.
Diagnosing resource shortages
If CPU or disk I/O usage "hits a plateau", i.e., a graph that had been rising steadily levels off, that resource has most likely reached its limit and become a bottleneck, which reduces performance.
In most cases, high CPU utilization and high disk I/O are due to suboptimal indexes or a heavy load on the hosts.
Start diagnostics by identifying the load pattern and problematic collections. Use the built-in MongoDB monitoring tools. Next, analyze the performance of specific queries using logs or profiler data.
Pay attention to queries:
- Not using indexes (`planSummary: COLLSCAN`). Such queries affect both I/O consumption (more reads from disk) and CPU consumption (data is compressed by default and must be decompressed). If the required index exists but the database does not use it, you can force its usage with `hint` (see the example after this list).
- With large `docsExamined` values (the number of scanned documents). This may mean that the existing indexes are inefficient or additional ones are required.
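To check whether a specific query uses an index, you can run it with `explain()` in `mongosh` and inspect the plan and execution statistics. The `orders` collection, the `status` filter, and the `{ status: 1 }` index below are hypothetical names used only for illustration:

```
// Show the winning plan and execution statistics for the query.
// COLLSCAN in the plan and a large totalDocsExamined value point to a missing or unused index.
db.orders.find({ status: "pending" }).explain("executionStats")

// If a suitable index exists but the planner does not pick it,
// force it with hint() and compare the execution statistics.
db.orders.find({ status: "pending" }).hint({ status: 1 }).explain("executionStats")
```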
As soon as performance drops, you can diagnose the problem in real time using a list of currently running queries:
To run these queries, the user needs the `mdbMonitor` role.
Queries from all users

- Long queries, such as those taking more than one second to execute:

  ```
  db.currentOp({"active": true, "secs_running": {"$gt": 1}})
  ```

- Queries to create indexes:

  ```
  db.currentOp({ $or: [{ op: "command", "query.createIndexes": { $exists: true } }, { op: "none", ns: /\.system\.indexes\b/ }] })
  ```

Queries from the current user

- Long queries, such as those taking more than one second to execute:

  ```
  db.currentOp({"$ownOps": true, "active": true, "secs_running": {"$gt": 1}})
  ```

- Queries to create indexes:

  ```
  db.currentOp({ "$ownOps": true, $or: [{ op: "command", "query.createIndexes": { $exists: true } }, { op: "none", ns: /\.system\.indexes\b/ }] })
  ```
Troubleshooting resource shortage issues
Try optimizing the identified queries. If the load is still high or there is nothing to optimize, the only option is to upgrade the host class.
Diagnosing inefficient query execution
To identify problematic queries in MongoDB:
- Review the logs. Pay special attention to:

  - For read queries, the `responseLength` field (written as `reslen` in the logs).
  - For write queries, the number of affected documents. In the cluster logs, it is shown in the `nModified`, `keysInserted`, and `keysDeleted` fields. On the cluster monitoring page, analyze the Documents affected on primary, Documents affected on secondaries, and Documents affected per host graphs.

- Review the profiler data. It outputs long-running queries; the threshold is adjustable with the `slowOpThreshold` DBMS setting (see the sketch after this list).
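When profiling is enabled, slow operations are also recorded in the `system.profile` collection of each database, so you can query them directly in `mongosh`. The one-second threshold below is an arbitrary value chosen for illustration:

```
// Ten most recent profiled operations slower than one second;
// check reslen, nModified, keysInserted, and keysDeleted in the output.
db.system.profile.find({ millis: { $gt: 1000 } }).sort({ ts: -1 }).limit(10)
```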
Troubleshooting issues with inefficient queries
You can analyze each individual query in terms of its query plan.
Analyze the graphs on the cluster monitoring page:
- Index size on primary, top 5 indexes.
- Scan and order per host.
- Scanned / returned.
To narrow down the search scope more quickly, use indexes.
Warning
Each new index slows down writes, so too many indexes may negatively affect write performance.
To optimize read requests, use a projection. In many cases, you need to return only a few fields rather than the entire document.
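A minimal sketch of both recommendations; the `orders` collection and its field names are hypothetical and used only for illustration:

```
// Index the field used in the filter to avoid full collection scans.
db.orders.createIndex({ status: 1 })

// Return only the fields you actually need instead of the whole document.
db.orders.find({ status: "pending" }, { _id: 0, orderId: 1, total: 1 })
```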
If you can neither optimize the queries you found nor go without them, upgrade the host class.
Diagnosing locks
Poor query performance can be caused by locks.
MongoDB does not provide detailed information on locks. There are only indirect ways to find out what is locking a specific query:
- Pay attention to large or growing `db.serverStatus().metrics.operation.writeConflicts` values: they may indicate high write contention on some documents (see the sketch after this list).
- Examine large or growing values using the Write conflicts per hosts graph on the cluster monitoring page.
- As soon as performance drops, carefully review the list of currently running queries.

  To run these queries, the user needs the `mdbMonitor` role.

  Queries from all users

  - Find queries that hold exclusive locks, such as:

    ```
    db.currentOp({'$or': [{'locks.Global': 'W'}, {'locks.Database': 'W'}, {'locks.Collection': 'W'}]}).inprog
    ```

  - Find queries waiting for locks (the `timeAcquiringMicros` field shows the waiting time):

    ```
    db.currentOp({'waitingForLock': true}).inprog
    db.currentOp({'waitingForLock': true, 'secs_running': { '$gt': 1 }}).inprog
    ```

  Queries from the current user

  - Find queries that hold exclusive locks, such as:

    ```
    db.currentOp({"$ownOps": true, '$or': [{'locks.Global': 'W'}, {'locks.Database': 'W'}, {'locks.Collection': 'W'}]}).inprog
    ```

  - Find queries waiting for locks (the `timeAcquiringMicros` field shows the waiting time):

    ```
    db.currentOp({"$ownOps": true, 'waitingForLock': true}).inprog
    db.currentOp({"$ownOps": true, 'waitingForLock': true, 'secs_running': { '$gt': 1 }}).inprog
    ```
- Pay attention to the following in the logs and profiler:

  - Queries that had waited a long time to get locks will have large `timeAcquiringMicros` values.
  - Queries that had competed for the same documents will have large `writeConflicts` values.
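A short `mongosh` sketch of the checks above, assuming profiling is enabled; the field names match the ones mentioned in this list, and the query shape is illustrative:

```
// Total number of write conflicts since the server started; watch how quickly it grows.
db.serverStatus().metrics.operation.writeConflicts

// Recent profiled operations that ran into write conflicts on the same documents.
db.system.profile.find({ writeConflicts: { $gt: 0 } }).sort({ ts: -1 }).limit(10)
```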
Troubleshooting locking issues
Detected locks indicate unoptimized queries. Try optimizing problematic queries.
Diagnosing insufficient disk space
If a cluster shows poor performance combined with a small amount of free disk space, one or more hosts in the cluster may have switched to the "read-only" mode.
The amount of used disk space is displayed on the Disk space usage per host, top 5 hosts graphs on the cluster monitoring page.
To monitor cluster host storage utilization, configure an alert.
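To see which databases and collections take up the most space from inside the cluster, you can also query storage statistics in `mongosh`. The `mydb` and `orders` names below are placeholders:

```
// Storage statistics for a database (sizes are reported in bytes by default).
db.getSiblingDB("mydb").stats()

// Storage and index sizes for a specific collection.
db.getSiblingDB("mydb").orders.stats()
```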
Troubleshooting disk space issues
For recommendations on troubleshooting these issues, see Maintaining a cluster in operable condition.