MongoDB performance analysis and tuning
In this tutorial, you will learn how to:
- Use performance diagnostic tools and monitoring tools.
- Troubleshoot identified issues.
Managed Service for MongoDB cluster performance drops most often due to one of the following:
- High CPU and disk I/O utilization.
- Inefficient query execution in MongoDB.
- Locks.
- Insufficient disk space.
Here are some tips for diagnosing and fixing these issues.
Getting started
- Install the mongostat and mongotop utilities on an external host with network access to your MongoDB host (see Pre-configuring a connection to a MongoDB cluster) to receive MongoDB performance data.
- Determine which databases need to be checked for issues.
- Create a MongoDB user with the mdbMonitor role for these databases. This role is required to use mongostat and mongotop.
Diagnosing resource shortages
If CPU or disk I/O usage "hits a plateau", that is, a curve that was previously climbing steadily has leveled off, the resource has most likely reached its limit, and performance drops as a result.
In most cases, high CPU utilization and high disk I/O are caused by suboptimal indexes or a heavy load on the hosts.
Start diagnostics by identifying the load pattern and problematic collections. Use the built-in MongoDB monitoring tools. Next, analyze the performance of specific queries using logs or profiler data.
Pay attention to queries:
- Not using indexes (planSummary: COLLSCAN). Such queries can increase both I/O consumption (more reads from disk) and CPU consumption (data is compressed by default and must be decompressed). If the required index exists but the database does not use it, you can force its usage with hint.
- With large docsExamined values (the number of scanned documents). This may mean that the existing indexes are inefficient or that additional ones are required.
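As an illustration of these two patterns, here is a minimal Python sketch that scans profiler-style documents and flags both COLLSCAN plans and queries with a high ratio of scanned to returned documents. The field names (planSummary, docsExamined, nreturned) match MongoDB's profiler output; the sample documents, the flag_suspect_ops helper, and the ratio threshold are hypothetical.

```python
# Sketch: flag the two suspect query patterns in profiler-style documents.
# Field names (planSummary, docsExamined, nreturned) match MongoDB's
# system.profile output; the sample data and threshold are hypothetical.

def flag_suspect_ops(profile_docs, max_scan_ratio=100):
    """Return (namespace, reason) pairs for operations worth investigating."""
    suspects = []
    for doc in profile_docs:
        # Pattern 1: no index used at all.
        if doc.get("planSummary") == "COLLSCAN":
            suspects.append((doc["ns"], "full collection scan (no index used)"))
        # Pattern 2: far more documents examined than returned.
        examined = doc.get("docsExamined", 0)
        returned = doc.get("nreturned", 0)
        if returned and examined / returned > max_scan_ratio:
            suspects.append((doc["ns"], f"scanned {examined} docs to return {returned}"))
    return suspects

sample = [
    {"ns": "shop.orders", "planSummary": "COLLSCAN",
     "docsExamined": 50000, "nreturned": 10},
    {"ns": "shop.users", "planSummary": "IXSCAN { email: 1 }",
     "docsExamined": 1, "nreturned": 1},
]
print(flag_suspect_ops(sample))  # only shop.orders is flagged, twice
```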
As soon as performance drops, you can diagnose the problem in real time using a list of currently running queries:
To run these queries, the user needs the mdbMonitor role.

Queries from all users:

- Long queries, such as those taking more than one second to execute:

  db.currentOp({"active": true, "secs_running": {"$gt": 1}})

- Queries that create indexes:

  db.currentOp({ $or: [{ op: "command", "query.createIndexes": { $exists: true } }, { op: "none", ns: /\.system\.indexes\b/ }] })

Queries from the current user:

- Long queries, such as those taking more than one second to execute:

  db.currentOp({"$ownOps": true, "active": true, "secs_running": {"$gt": 1}})

- Queries that create indexes:

  db.currentOp({ "$ownOps": true, $or: [{ op: "command", "query.createIndexes": { $exists: true } }, { op: "none", ns: /\.system\.indexes\b/ }] })
See also the examples in the MongoDB documentation.
Troubleshooting resource shortage issues
Try optimizing the identified queries. If the load is still high or there is nothing to optimize, the only option is to upgrade the host class.
Diagnosing inefficient query execution
To identify problematic queries in MongoDB:
- Review the logs. Pay special attention to:

  - For read queries: the responseLength field (written as reslen in the logs).
  - For write queries: the number of affected documents. In the cluster logs, they appear in the nModified, keysInserted, and keysDeleted fields. On the cluster monitoring page, analyze the Documents affected on primary, Documents affected on secondaries, and Documents affected per host graphs.

- Review the profiler data. The profiler records long-running queries (the threshold is adjustable with the slowOpThreshold DBMS setting).
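The log fields above can be extracted mechanically. Below is a minimal Python sketch that parses a text-format slow-query log line; the field names (reslen, nModified, keysInserted, keysDeleted) are real, but the sample line and the parse_slow_query_line helper are hypothetical.

```python
import re

# Sketch: extract the fields discussed above from a text-format slow-query
# log line. The field names are real; the sample line is hypothetical.

FIELDS = ("reslen", "nModified", "keysInserted", "keysDeleted")

def parse_slow_query_line(line):
    """Return the numeric metrics found in one slow-query log line."""
    metrics = {}
    for name in FIELDS:
        match = re.search(rf"\b{name}:(\d+)", line)
        if match:
            metrics[name] = int(match.group(1))
    return metrics

line = ("2024-05-01T10:00:00.000+0000 I COMMAND [conn42] update shop.orders "
        "planSummary: IXSCAN { _id: 1 } nModified:3 keysInserted:3 "
        "keysDeleted:3 reslen:60 452ms")
print(parse_slow_query_line(line))
```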
Troubleshooting issues with inefficient queries
Each individual query can be analyzed in terms of its query plan. Learn more about this in the MongoDB documentation.
Analyze the graphs on the cluster monitoring page:
- Index size on primary, top 5 indexes.
- Scan and order per host.
- Scanned / returned.
To narrow down the search scope more quickly, use indexes.
Warning
Each new index slows down writes, so too many indexes may noticeably degrade write performance.
To optimize read requests, use projection.
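To illustrate why projection reduces response size, here is a toy Python model of a MongoDB inclusion projection. The apply_projection helper is hypothetical and only covers the basic {"field": 1} case, not the full projection language; it simply shows how dropping unneeded fields shrinks the returned document (and therefore reslen).

```python
# Sketch: a toy model of a MongoDB inclusion projection ({"field": 1}).
# apply_projection is hypothetical; it only illustrates how dropping
# unneeded fields shrinks the returned document.

def apply_projection(doc, projection):
    """Keep only the projected fields; _id is returned unless excluded."""
    projected = {key: doc[key] for key in projection if key in doc}
    projected.setdefault("_id", doc["_id"])
    return projected

doc = {
    "_id": 1,
    "email": "user@example.com",
    "name": "Ann",
    "history": ["..."] * 1000,  # a large field the client does not need
}
small = apply_projection(doc, {"email": 1})
print(small)  # only email and _id remain
```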
If you can neither optimize the queries you found nor go without them, upgrade the host class.
Diagnosing locks
Poor query performance can be caused by locks.
MongoDB does not provide detailed information on locks. There are only indirect ways to find out what is locking a specific query:
- Pay attention to large or growing db.serverStatus().metrics.operation.writeConflicts values: they may indicate high write contention on certain documents.
- Examine large or growing values using the Write conflicts per hosts graph on the cluster monitoring page.
- As soon as performance drops, carefully review the list of currently running queries. To run these queries, the user needs the mdbMonitor role.

  Queries from all users:

  - Find queries that hold exclusive locks, such as:

    db.currentOp({'$or': [{'locks.Global': 'W'}, {'locks.Database': 'W'}, {'locks.Collection': 'W'} ]}).inprog

  - Find queries waiting for locks (the timeAcquiringMicros field shows the waiting time):

    db.currentOp({'waitingForLock': true}).inprog
    db.currentOp({'waitingForLock': true, 'secs_running' : { '$gt' : 1 }}).inprog

  Queries from the current user:

  - Find queries that hold exclusive locks, such as:

    db.currentOp({"$ownOps": true, '$or': [{'locks.Global': 'W'}, {'locks.Database': 'W'}, {'locks.Collection': 'W'} ]}).inprog

  - Find queries waiting for locks (the timeAcquiringMicros field shows the waiting time):

    db.currentOp({"$ownOps": true, 'waitingForLock': true}).inprog
    db.currentOp({"$ownOps": true, 'waitingForLock': true, 'secs_running' : { '$gt' : 1 }}).inprog
- Pay attention to the following in the logs and profiler:

  - Queries that waited a long time to acquire locks have large timeAcquiringMicros values.
  - Queries that competed for the same documents have large writeConflicts values.
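The lock indicators above can also be checked programmatically. The Python sketch below filters currentOp/profiler-style documents; the field names (waitingForLock, timeAcquiringMicros, writeConflicts) are real, but the sample documents are hypothetical and timeAcquiringMicros is simplified to a flat counter (real output nests it per lock type).

```python
# Sketch: filter currentOp/profiler-style documents using the lock
# indicators above. Field names are real; the sample documents are
# hypothetical, and timeAcquiringMicros is flattened to a single number.

LOCK_WAIT_THRESHOLD_US = 1_000_000  # flag ops that waited > 1 s for a lock

def find_lock_issues(ops):
    """Return (opid, reason) pairs for operations affected by locking."""
    issues = []
    for op in ops:
        if op.get("waitingForLock"):
            issues.append((op["opid"], "currently waiting for a lock"))
        if op.get("timeAcquiringMicros", 0) > LOCK_WAIT_THRESHOLD_US:
            issues.append((op["opid"], "waited a long time for locks"))
        if op.get("writeConflicts", 0) > 0:
            issues.append((op["opid"], "retried due to write conflicts"))
    return issues

sample = [
    {"opid": 101, "waitingForLock": True, "timeAcquiringMicros": 2_500_000},
    {"opid": 102, "waitingForLock": False, "writeConflicts": 40},
]
print(find_lock_issues(sample))
```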
Learn more about which locks are used by standard client operations in the MongoDB documentation.
Troubleshooting locking issues
Detected locks indicate unoptimized queries. Try optimizing problematic queries.
Diagnosing insufficient disk space
If a cluster performs poorly and has little free disk space, one or more of its hosts has probably been switched to read-only mode.
The amount of used disk space is displayed on the Disk space usage per host, top 5 hosts graph on the cluster monitoring page.
To monitor storage usage on cluster hosts, configure an alert.
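The alert logic can be as simple as comparing used space against warning and critical thresholds. In this hypothetical Python sketch, the 85% and 95% thresholds are illustrative assumptions, not product defaults.

```python
# Sketch: a disk-usage check suitable for an alert. The 85% warning and
# 95% critical thresholds are illustrative assumptions, not product defaults.

WARN_AT = 0.85
ALERT_AT = 0.95

def disk_usage_state(used_bytes, total_bytes):
    """Classify disk usage as OK, WARN, or ALARM."""
    ratio = used_bytes / total_bytes
    if ratio >= ALERT_AT:
        return "ALARM"  # host is close to being switched to read-only
    if ratio >= WARN_AT:
        return "WARN"   # time to free up space or expand storage
    return "OK"

print(disk_usage_state(90 * 2**30, 100 * 2**30))  # 90% used, prints WARN
```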
Troubleshooting disk space issues
For recommendations on troubleshooting these issues, see Maintaining a cluster in operable condition.