About Monium
Monium is a platform you can use to monitor and analyze Yandex Cloud services or your own infrastructure and applications.
Telemetry types
Monium collects the following telemetry types:
- Metrics: Numerical indicators measured over time (e.g., RPS, CPU load). Used for charts and alerts.
- Logs: Structured records of events in an application or infrastructure (e.g., start messages, error messages). Used for system diagnostics.
- Traces: Linked chain of operations for a specific request, showing the path and execution time of each step. Used to monitor distributed systems.
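To illustrate the trace definition above, here is a minimal sketch in plain Python. The `Span` class, its field names, and the sample request are hypothetical and are not the Monium or OpenTelemetry data model. The sketch only shows how operations linked by a parent ID form a path with a duration for each step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One operation within a request (hypothetical, simplified shape)."""
    span_id: str
    parent_id: Optional[str]  # None marks the root operation of the request
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def slowest_path(spans: list[Span]) -> list[Span]:
    """Walk down from the root span, following the slowest child at each level."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    path = []
    current = children[None][0]  # assumes exactly one root span
    while current is not None:
        path.append(current)
        kids = children.get(current.span_id, [])
        current = max(kids, key=lambda s: s.duration_ms) if kids else None
    return path


# Sample request: checkout calls auth and payment; payment calls the database.
trace = [
    Span("a", None, "GET /checkout", 0, 120),
    Span("b", "a", "auth", 5, 20),
    Span("c", "a", "payment", 25, 115),
    Span("d", "c", "db query", 30, 100),
]
steps = [(s.name, s.duration_ms) for s in slowest_path(trace)]
```

In this sample, the walk goes from the checkout request through the payment step to the database query, which is the kind of path you would inspect when looking for a bottleneck.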
Telemetry transmission
You can transmit telemetry using any of the following:
- OpenTelemetry-compatible agents, e.g., OTel Collector (for all telemetry types) or Fluent Bit (for logs).
- Unified Agent, Yandex's data collection and delivery agent.
- OpenTelemetry SDK, to send data directly from the application.
To collect Prometheus metrics, use the integration with Yandex Managed Service for Prometheus®.
Monium currently accepts data only over the OpenTelemetry Protocol (OTLP).
More observability tools will be added to the platform in the future.
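As a rough illustration of what OTLP data looks like, the sketch below builds an OTLP/JSON metrics request body with a single gauge data point, using only the Python standard library. The service name, metric name, and scope name are hypothetical. The Monium endpoint and authentication are not covered here, so the sketch builds the body without sending it. In practice, an agent or the OpenTelemetry SDK would normally produce this payload for you.

```python
import json
import time


def gauge_payload(service: str, name: str, value: float, unit: str = "1") -> dict:
    """Build an OTLP/JSON metrics request body with one gauge data point."""
    return {
        "resourceMetrics": [{
            "resource": {
                "attributes": [
                    # service.name is the standard OpenTelemetry resource attribute
                    {"key": "service.name", "value": {"stringValue": service}},
                ]
            },
            "scopeMetrics": [{
                "scope": {"name": "example.instrumentation"},  # hypothetical name
                "metrics": [{
                    "name": name,
                    "unit": unit,
                    "gauge": {
                        "dataPoints": [{
                            "asDouble": value,
                            # OTLP/JSON encodes 64-bit integers as strings
                            "timeUnixNano": str(time.time_ns()),
                        }]
                    },
                }],
            }],
        }]
    }


# Hypothetical service and metric names, for illustration only.
body = json.dumps(gauge_payload("checkout", "cpu.utilization", 0.42))
```

A client would send `body` in an HTTP POST with the `Content-Type: application/json` header to an OTLP/HTTP metrics endpoint (the default path in the OTLP specification is `/v1/metrics`).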
Telemetry distribution
Monium logically separates telemetry data using four entities: project, cluster, service, and shard.
- Project: Top-level logical entity. You can use projects to aggregate telemetry from associated applications and microservices and to restrict data access for development teams. Examples include an online store, a billing system, or security services.
- Cluster: Isolates an environment or an independent installation of a service, e.g., production and test clusters or clusters in different regions.
- Service: Standalone client application generating telemetry data. This may be a microservice or its component, e.g., Nginx, Envoy, or a Compute Cloud VM instance.
- Shard: Container for the data of a specific service-cluster pair and its storage settings, e.g., TTL.
The project, cluster, and service objects define the data source, and the shard defines the storage rules.
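The relationship between these entities can be sketched as a small data model. This is an illustration only, not the Monium API: the class names, the registry, the automatic shard creation on first use, and the TTL values are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Source:
    """Identifies where telemetry comes from."""
    project: str
    cluster: str
    service: str


@dataclass
class Shard:
    """Holds storage settings for one service-cluster pair in a project."""
    source: Source
    ttl: timedelta


class ShardRegistry:
    """Hypothetical in-memory lookup from a data source to its shard."""

    def __init__(self, default_ttl: timedelta) -> None:
        self._default_ttl = default_ttl
        self._shards: dict[Source, Shard] = {}

    def shard_for(self, source: Source) -> Shard:
        # One shard per (project, cluster, service); created on first use
        # in this sketch, which is an assumption about behavior.
        if source not in self._shards:
            self._shards[source] = Shard(source, self._default_ttl)
        return self._shards[source]


registry = ShardRegistry(default_ttl=timedelta(days=30))  # hypothetical default
prod = registry.shard_for(Source("online-store", "production", "nginx"))
test = registry.shard_for(Source("online-store", "test", "nginx"))
test.ttl = timedelta(days=7)  # shorter retention for the test environment
```

Here the same Nginx service in two clusters maps to two shards, so each environment can have its own storage settings.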
For descriptions of other Monium objects and terms, see the Basic terms section.
Platform feature overview
Metrics
Metrics are real-time numerical indicators of system performance. Their common use cases include:
- Monitoring CPU, memory, and network usage.
- Analyzing trends and performance.
- Detecting anomalies and bottlenecks.
Logs
Logs are structured records of events and messages that help you:
- Investigate specific incidents in detail.
- Analyze errors and exceptions.
- Audit user and system activity.
Traces
Traces are visual representations of request paths in distributed applications, which enable you to:
- Identify bottlenecks in microservice chains.
- Analyze latency between system components.
- Understand dependencies in complex architectures.
- Examine requests and responses when monitoring LLM agents.
Alerts
Alerts are automated notifications for critical events, for which you can set up the following:
- Event trigger rules, e.g., a sudden change in a metric.
- Notifications via messaging apps, email, phone call, or cloud function.
Alerts help you respond to issues before they affect users or at least minimize that impact.
You can temporarily disable alert notifications by creating a mute.
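A trigger rule such as "a sudden change in a metric", combined with a mute, could be sketched as follows. This is not how Monium evaluates alerts. The comparison against the mean of earlier values, the `max_ratio` threshold, and the `Mute` class are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class Mute:
    """Time window during which notifications are suppressed (hypothetical)."""
    start: float
    end: float

    def active(self, now: float) -> bool:
        return self.start <= now < self.end


def sudden_change(values: list[float], max_ratio: float) -> bool:
    """Return True if the latest value deviates from the mean of the
    earlier values by more than max_ratio (0.5 means 50%)."""
    if len(values) < 2:
        return False
    *history, latest = values
    baseline = sum(history) / len(history)
    if baseline == 0:
        return latest != 0
    return abs(latest - baseline) / abs(baseline) > max_ratio


def should_notify(values: list[float], max_ratio: float,
                  mutes: list[Mute], now: float) -> bool:
    """The rule fires only if the change is sudden and no mute is active."""
    return sudden_change(values, max_ratio) and not any(m.active(now) for m in mutes)


rps = [100.0, 102.0, 98.0, 250.0]  # sample requests-per-second readings
fires = should_notify(rps, max_ratio=0.5, mutes=[], now=1000.0)
muted = should_notify(rps, max_ratio=0.5, mutes=[Mute(900.0, 1100.0)], now=1000.0)
```

With the sample readings, the last value is far above the earlier average, so the rule fires unless a mute covers the current time.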
Dashboards and Metric Explorer
Real-time visualization of system data and key indicators enables you to:
- Gain a unified view of your system's health.
- Collect data from multiple sources.
- Analyze performance and forecast trends.
- Drill down from high-level overviews to granular details to investigate issues and their root causes.
Learn more about dashboards and Metric Explorer.
Monium delivers end-to-end visibility into your systems, reducing troubleshooting time and empowering data-driven decisions.