Performance analysis and tuning of Yandex StoreDoc
In this tutorial, you will learn how to:
- Use performance diagnostic and monitoring tools.
- Troubleshoot identified issues.
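A decline in Yandex StoreDoc cluster performance most often stems from one of the following causes:
- Resource shortages (CPU and disk I/O).
- Inefficient queries.
- Locks.
- Disk space shortages.
The following are tips for diagnosing and resolving these issues.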
Getting started
- Install the mongostat and mongotop utilities on an external host with network access to your MongoDB host (see Pre-configuring a Yandex StoreDoc cluster connection) to receive MongoDB performance data.
- Determine which databases need to be checked for issues.
- To use mongostat and mongotop, create a MongoDB user with the mdbMonitor role for these databases.
Diagnosing resource shortages
If a CPU or disk I/O chart has plateaued, i.e., a previously growing line has leveled off, the resource may have become a bottleneck that degrades performance. This usually happens when resource consumption reaches its limit.
In most cases, high CPU and disk I/O usage is caused by inefficient indexes or excessive host workload.
Start diagnostics by analyzing the workload pattern and identifying problematic collections using the built-in MongoDB monitoring tools. Next, analyze the performance of specific queries using logs or profiler data.
Take note of the following queries:
- Non-indexed queries (planSummary: COLLSCAN). Such queries can increase both I/O consumption due to more disk reads and CPU usage, since data is compressed by default and requires decompression. If the required index exists but the database is not using it, you can force index usage via hint.
- Queries with large docsExamined values, indicating a high volume of scanned documents. This may indicate that the existing indexes are inefficient or that additional ones are required.
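The two patterns above can be checked mechanically. The following is a minimal sketch, assuming each entry is shaped like a profiler document with planSummary, docsExamined, and nreturned fields; the classifyOperation helper, the ratio threshold, and the sample entries are illustrative assumptions rather than part of the service.

```javascript
// Flag an operation that matches the patterns described above.
// maxRatio is an arbitrary illustrative threshold, not a recommended value.
function classifyOperation(entry, maxRatio = 100) {
  const findings = [];
  const plan = entry.planSummary || "";
  if (plan.startsWith("COLLSCAN")) {
    findings.push("collection scan: no index used");
  }
  const examined = entry.docsExamined || 0;
  const returned = entry.nreturned || 0;
  // Guard against division by zero when nothing was returned.
  if (examined / Math.max(returned, 1) > maxRatio) {
    findings.push("examined " + examined + " documents to return " + returned);
  }
  return findings;
}

// Hypothetical sample entries, not real cluster output.
const sampleEntries = [
  { ns: "shop.orders", planSummary: "COLLSCAN", docsExamined: 250000, nreturned: 12 },
  { ns: "shop.users", planSummary: "IXSCAN { email: 1 }", docsExamined: 1, nreturned: 1 },
];
const scanReport = sampleEntries.map((e) => ({ ns: e.ns, findings: classifyOperation(e) }));
console.log(JSON.stringify(scanReport, null, 2));
```

If an index exists but is not chosen, a call such as db.orders.find({ status: "new" }).hint({ status: 1 }) forces it; the collection and field names here are hypothetical.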
When performance drops, you can diagnose the problem in real time using the list of active queries:
All users' queries (to view these queries, you need the mdbMonitor role):
- Long queries, e.g., those running longer than one second:
  db.currentOp({"active": true, "secs_running": {"$gt": 1}})
- Index creation queries:
  db.currentOp({ $or: [{ op: "command", "query.createIndexes": { $exists: true } }, { op: "none", ns: /\.system\.indexes\b/ }] })
Current user's queries:
- Long queries, e.g., those running longer than one second:
  db.currentOp({"$ownOps": true, "active": true, "secs_running": {"$gt": 1}})
- Index creation queries:
  db.currentOp({ "$ownOps": true, $or: [{ op: "command", "query.createIndexes": { $exists: true } }, { op: "none", ns: /\.system\.indexes\b/ }] })
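The long-query filters differ only in the time threshold and the $ownOps flag, so they can be built by a small helper. This is a sketch under stated assumptions: longRunningFilter and summarizeOps are hypothetical helper names, and the sample data only imitates the opid, ns, op, and secs_running fields of a db.currentOp() result.

```javascript
// Build the currentOp filter for long-running operations.
// With ownOnly = true the filter is limited to the current user's operations.
function longRunningFilter(minSeconds, ownOnly) {
  const filter = { active: true, secs_running: { $gt: minSeconds } };
  if (ownOnly) {
    filter.$ownOps = true;
  }
  return filter;
}

// Keep the fields worth reading first, longest-running operation first.
function summarizeOps(inprog) {
  return inprog
    .map((op) => ({ opid: op.opid, ns: op.ns, op: op.op, seconds: op.secs_running || 0 }))
    .sort((a, b) => b.seconds - a.seconds);
}

// In mongosh this could be used as:
//   summarizeOps(db.currentOp(longRunningFilter(1, true)).inprog)
// Hypothetical sample data standing in for the inprog array.
const sampleInprog = [
  { opid: 101, ns: "shop.orders", op: "query", secs_running: 3 },
  { opid: 102, ns: "shop.users", op: "update", secs_running: 42 },
];
const opsSummary = summarizeOps(sampleInprog);
console.log(opsSummary);
```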
Troubleshooting resource shortages
Try optimizing the problematic queries you have identified. If the load remains high after optimization, you have to upgrade the host class.
Diagnosing inefficient queries
To identify problematic queries in MongoDB:
- Review the logs. Pay special attention to the following:
  - For read queries, look at the responseLength (reslen) field.
  - For write queries, look at the number of documents affected. In the cluster logs, check the nModified, keysInserted, and keysDeleted fields. On the cluster monitoring page, review the following charts: Documents affected on primary, Documents affected on secondaries, and Documents affected per host.
- Review the profiler data. Retrieve long-running queries by using the slowOpThreshold DBMS setting.
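Once log or profiler entries are collected, sorting them by duration puts the most expensive operations first. The sketch below assumes entries carry the millis, responseLength, nModified, keysInserted, and keysDeleted fields of a profiler document; heaviestOperations is a hypothetical helper and the sample entries are made up.

```javascript
// Rank operations by execution time and keep the read/write volume fields
// mentioned above next to each other for comparison.
function heaviestOperations(entries, limit = 5) {
  return entries
    .map((e) => ({
      ns: e.ns,
      op: e.op,
      millis: e.millis || 0,
      responseLength: e.responseLength || 0,
      nModified: e.nModified || 0,
      keysInserted: e.keysInserted || 0,
      keysDeleted: e.keysDeleted || 0,
    }))
    .sort((a, b) => b.millis - a.millis)
    .slice(0, limit);
}

// Assumption: with the profiler enabled, entries could be read in mongosh with
//   db.system.profile.find().sort({ ts: -1 }).limit(100).toArray()
// Hypothetical sample entries, not real cluster output.
const profilerSample = [
  { ns: "shop.orders", op: "query", millis: 120, responseLength: 4800000 },
  { ns: "shop.orders", op: "update", millis: 900, nModified: 15000, keysInserted: 15000, keysDeleted: 15000 },
  { ns: "shop.users", op: "query", millis: 35, responseLength: 2100 },
];
const heaviest = heaviestOperations(profilerSample, 2);
console.log(heaviest);
```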
Troubleshooting inefficient queries
You can analyze the execution plan of each individual query.
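As an illustration of reading an execution plan, the sketch below walks the winningPlan tree of an explain("executionStats") result and reports whether it contains a collection scan or an in-memory sort. The summarizeExplain and collectStages helpers and the sample plan are assumptions made for this example; real explain output contains many more fields, and its layout varies between MongoDB versions.

```javascript
// Collect stage names from a winningPlan tree (inputStage / inputStages links).
function collectStages(plan, out = []) {
  if (!plan) return out;
  out.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, out);
  (plan.inputStages || []).forEach((child) => collectStages(child, out));
  return out;
}

function summarizeExplain(explain) {
  const winning = explain.queryPlanner.winningPlan;
  // Assumption: some versions nest the readable plan under queryPlan.
  const stages = collectStages(winning.queryPlan || winning);
  const stats = explain.executionStats || {};
  return {
    stages,
    usesCollectionScan: stages.includes("COLLSCAN"),
    sortsInMemory: stages.includes("SORT"),
    docsExamined: stats.totalDocsExamined,
    keysExamined: stats.totalKeysExamined,
    returned: stats.nReturned,
  };
}

// Hypothetical, heavily trimmed output of
//   db.orders.find({ status: "new" }).sort({ createdAt: -1 }).explain("executionStats")
const sampleExplain = {
  queryPlanner: { winningPlan: { stage: "SORT", inputStage: { stage: "COLLSCAN" } } },
  executionStats: { nReturned: 10, totalKeysExamined: 0, totalDocsExamined: 50000 },
};
const explainSummary = summarizeExplain(sampleExplain);
console.log(explainSummary);
```

A plan that examines far more documents than it returns, or that contains a SORT stage, is a candidate for a new or better index.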
Examine the charts on the cluster monitoring page:
- Index size on primary, top 5 indexes.
- Scan and order per host.
- Scanned / returned.
Use indexes to speed up queries.
Warning
Each index you add slows down write operations. An excessive number of indexes may negatively affect write performance.
Use projection to optimize read queries. In many cases, you do not need to retrieve the entire document; a subset of its fields is enough.
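To give a rough sense of the savings, the sketch below imitates a top-level inclusion projection on the client and compares serialized sizes. It is only an illustration: applyInclusion is a hypothetical helper, the document is made up, and in practice the projection is passed to the server, e.g., db.orders.find({ status: "new" }, { _id: 0, status: 1, total: 1 }).

```javascript
// Client-side imitation of a top-level inclusion projection.
// The server applies the real projection; this only shows the size effect.
function applyInclusion(doc, projection) {
  const result = {};
  // _id is returned unless it is explicitly excluded.
  if (projection._id !== 0 && "_id" in doc) result._id = doc._id;
  for (const [field, flag] of Object.entries(projection)) {
    if (field !== "_id" && flag === 1 && field in doc) result[field] = doc[field];
  }
  return result;
}

// Hypothetical document with a large array and a long text field.
const sampleOrder = {
  _id: 1,
  status: "new",
  total: 42,
  items: new Array(50).fill({ sku: "A-1", qty: 2 }),
  notes: "x".repeat(500),
};
const orderProjection = { _id: 0, status: 1, total: 1 };
const trimmedOrder = applyInclusion(sampleOrder, orderProjection);
const fullSize = JSON.stringify(sampleOrder).length;
const trimmedSize = JSON.stringify(trimmedOrder).length;
console.log({ fullSize, trimmedSize });
```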
If you can neither optimize troublesome queries nor eliminate them, you have to upgrade the host class.
Diagnosing locks
Poor query performance can result from locks.
MongoDB does not provide detailed information about locks. You can only use indirect methods to find out what is locking a specific query:
- Pay attention to large or growing db.serverStatus().metrics.operation.writeConflicts values, as they may indicate high write contention on certain documents.
- Examine large or growing values using the Write conflicts per hosts graph on the cluster monitoring page.
- When performance drops, closely review the list of currently running queries:
  All users' queries (to view these queries, you need the mdbMonitor role):
  - Identify queries that hold exclusive locks, such as:
    db.currentOp({'$or': [{'locks.Global': 'W'}, {'locks.Database': 'W'}, {'locks.Collection': 'W'} ]}).inprog
  - Identify queries waiting for locks; their wait time is shown in the timeAcquiringMicros field:
    db.currentOp({'waitingForLock': true}).inprog
    db.currentOp({'waitingForLock': true, 'secs_running' : { '$gt' : 1 }}).inprog
  Current user's queries:
  - Identify queries that hold exclusive locks, such as:
    db.currentOp({"$ownOps": true, '$or': [{'locks.Global': 'W'}, {'locks.Database': 'W'}, {'locks.Collection': 'W'} ]}).inprog
  - Identify queries waiting for locks; their wait time is shown in the timeAcquiringMicros field:
    db.currentOp({"$ownOps": true, 'waitingForLock': true}).inprog
    db.currentOp({"$ownOps": true, 'waitingForLock': true, 'secs_running' : { '$gt' : 1 }}).inprog
- In the logs and profiler, note the following:
  - Queries with large timeAcquiringMicros values, indicating long wait time to acquire locks.
  - Queries with large writeConflicts values, indicating contention for the same documents.
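Because the writeConflicts value in db.serverStatus() is a cumulative counter, growth is easier to judge as a rate between two readings. The following is a minimal sketch; conflictsPerSecond and the reading shape ({ at, writeConflicts }) are assumptions made for this example.

```javascript
// Turn two readings of the cumulative counter into conflicts per second.
// Each reading is { at: <timestamp in ms>, writeConflicts: <number> }.
function conflictsPerSecond(previous, current) {
  const seconds = (current.at - previous.at) / 1000;
  const delta = current.writeConflicts - previous.writeConflicts;
  // A non-positive interval or a counter reset (e.g., after a restart) yields 0.
  if (seconds <= 0 || delta < 0) return 0;
  return delta / seconds;
}

// In mongosh a reading could be taken from
//   db.serverStatus().metrics.operation.writeConflicts
// (convert it to a plain number if the shell returns a Long).
// Hypothetical readings taken ten seconds apart.
const firstReading = { at: 0, writeConflicts: 100 };
const secondReading = { at: 10000, writeConflicts: 150 };
const conflictRate = conflictsPerSecond(firstReading, secondReading);
console.log(conflictRate);
```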
Troubleshooting locking issues
The detected locks indicate unoptimized queries. Try optimizing the problematic queries.
Diagnosing disk space shortages
If a cluster shows poor performance when its free disk space is limited, one or more cluster hosts may have switched to "read-only" mode.
The amount of used disk space is displayed on the Disk space usage per host, top 5 hosts graphs on the cluster monitoring page.
Configure an alert to track storage use on cluster hosts.
Troubleshooting disk space issues
For troubleshooting recommendations, see Maintaining a cluster in operable condition.