Performance and System Monitoring
The Testing Automation Engine ships with a pre-installed instance of the third-party tool Prometheus for monitoring and performance analytics. This instance collects system-level and process-level performance metrics, along with the test execution metrics listed below.
Table of Contents
- Performance Metrics Table
- Meaning of Parameters
- Key Features of Prometheus in the Testing Automation Engine
- Data Centralization with Remote Write
- KPI Export and Grafana Integration
- Benefits of the Prometheus-Grafana Setup
Performance Metrics Table
| Metric | Parameters | Description |
|---|---|---|
| device_battery_level | Device, test_node, instance, job, tenant | Monitors the battery level of devices during testing. |
| device_network_operator | Device, test_node, instance, job, tenant | Tracks the network operator used by devices during testing. |
| device_radio_access_type | Device, test_node, instance, job, tenant | Indicates the radio access type used by devices during testing. |
| device_signal_strength | Device, test_node, instance, job, tenant | Measures the signal strength of devices during testing. |
| device_signal_rsrp | Device, test_node, instance, job, tenant | Reference Signal Received Power (RSRP): Measures average LTE reference signal strength. |
| device_signal_rsrq | Device, test_node, instance, job, tenant | Reference Signal Received Quality (RSRQ): Indicates LTE signal quality using RSRP and RSSI. |
| device_signal_rssnr | Device, test_node, instance, job, tenant | Reference Signal Signal-to-Noise Ratio (RSSNR): Reflects LTE signal clarity and resistance to interference; comparable to SINR. |
| device_signal_rssi | Device, test_node, instance, job, tenant | Received Signal Strength Indicator (RSSI): Represents total received signal power including noise. |
| data_speed | Device, Test_Case, instance, job, tenant | Aggregates Download/Upload Speed, Jitter, Latency, Packet Loss, Round-Trip Time (RTT), Data Roaming Success Rate and Data Session Setup Time. |
| device_temperature | Device, test_node, instance, job, tenant | Monitors the temperature of devices during testing for stability analysis. |
| call_duration | Device, Test_Case, instance, job, tenant | Call Duration = Time (BYE) − Time (ACK) from device; calculated for every call (VoLTE and non-VoLTE) and exported to Prometheus. |
| call_setup_time | Device, Test_Case, instance, job, tenant | Call Setup Time = Time (180 Ringing) − Time (SIP INVITE); computed for every VoLTE call regardless of SIP logcat state and exported to Prometheus. |
| test_case_results_total | instance, job, tenant, result, testcase | Tracks the total number of results for each test case in the system. |
| testnode_command_results_total | command, instance, job, tenant, result, testnode | Tracks the total results of commands executed on test nodes. |
| testnode_up_status | instance, job, tenant, name | Tracks the status of test nodes to determine if they are up and running. |
| total_mobiledevices | instance, job, tenant | Tracks the total number of mobile devices involved in testing. |
| total_testcases | instance, job, tenant | Tracks the total number of test cases available in the system. |
| total_testexecutions | instance, job, tenant | Tracks the total number of test executions performed in the system. |
| total_testnodes | instance, job, tenant | Tracks the total number of test nodes available in the system. |
| total_testsuites | instance, job, tenant | Tracks the total number of test suites executed in the system. |
| ts_disabled_rate | Test_Suite, instance, job, tenant | Calculates the disabled rate of test suites. |
| ts_success_rate | Test_Suite, instance, job, tenant | Calculates the success rate of test suites. |
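
The two call timing formulas in the table (call_duration and call_setup_time) can be illustrated with a minimal Python sketch. The event names used as dictionary keys and the sample timestamps are hypothetical, chosen only for illustration; the engine's internal representation of SIP events is not described in this document.

```python
from datetime import datetime
from typing import Mapping


def _ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2024-05-01T10:00:02.500'."""
    return datetime.fromisoformat(value)


def call_setup_time(events: Mapping[str, str]) -> float:
    """Call Setup Time = Time(180 Ringing) - Time(SIP INVITE), in seconds."""
    return (_ts(events["180_RINGING"]) - _ts(events["INVITE"])).total_seconds()


def call_duration(events: Mapping[str, str]) -> float:
    """Call Duration = Time(BYE) - Time(ACK), in seconds."""
    return (_ts(events["BYE"]) - _ts(events["ACK"])).total_seconds()


# Hypothetical SIP event timestamps for a single call (not real engine output).
sample_call = {
    "INVITE": "2024-05-01T10:00:00.000",
    "180_RINGING": "2024-05-01T10:00:02.500",
    "ACK": "2024-05-01T10:00:06.000",
    "BYE": "2024-05-01T10:01:36.000",
}

print(call_setup_time(sample_call))  # 2.5
print(call_duration(sample_call))    # 90.0
```

Seconds are used here as the unit for readability; the unit in which the engine exports these metrics is not stated in this document.
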
Meaning of Parameters
| Parameter | Description | Explanation |
|---|---|---|
| command | Counts the successful execution of commands on test nodes. | This parameter tracks the number of functions or operations successfully executed on a test node, such as SPEED_TEST or DATA_ROAMING commands. |
| Device | Refers to the physical or virtual device being tested. | This could include smartphones, IoT devices, or emulated devices used during testing. |
| instance | Specifies the instance or environment where tests are executed. | This refers to the environment (e.g., production, staging) in which the tests are being run, ensuring the results are contextual. |
| job | Defines a specific job in the test pipeline. | Jobs are tasks or steps (e.g., build, test) managed by CI/CD pipelines during testing. |
| name | Identifier for the test node. | Provides unique naming or labels for nodes being used during execution. |
| result | Outcome of a test case or command (pass/fail). | Tracks success, failure, or exceptions generated during testing. |
| tenant | Indicates the client or organization for whom testing is performed. | Represents multi-tenancy structures where tests align with a particular client’s environment or needs. |
| testcase | Specific test scenario or script. | Defines individual test scripts that validate particular features or use cases. |
| Test_Node | Physical or virtual node where tests are executed. | Nodes are computing resources (e.g., virtual machines, Docker containers) used to run tests. |
| Test_Suite | A group of test cases executed together. | Collections of related tests for a specific functionality or module to validate grouped execution. |
| Test_Case | Label of the executed test case. | Matches a single run in the test matrix; used by data_speed, call_duration and call_setup_time metrics. |
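
Because the parameters above are attached to each metric as labels, they can be used to filter a query. The sketch below builds a PromQL selector and the corresponding URL for Prometheus' standard instant-query endpoint (`/api/v1/query`). The base URL `http://localhost:9090` and the label values `acme` and `smoke` are placeholders, not values taken from the engine.

```python
from urllib.parse import urlencode


def build_selector(metric: str, labels: dict) -> str:
    """Return a PromQL selector such as metric{a="x",b="y"}."""
    if not labels:
        return metric
    matchers = ",".join(
        '{}="{}"'.format(
            name,
            # Escape backslashes and double quotes inside label values.
            str(value).replace("\\", "\\\\").replace('"', '\\"'),
        )
        for name, value in sorted(labels.items())
    )
    return metric + "{" + matchers + "}"


def build_query_url(base_url: str, selector: str) -> str:
    """Return the instant-query URL (/api/v1/query) for the given selector."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": selector})


# Placeholder label values; real tenants and suite names will differ.
selector = build_selector(
    "ts_success_rate", {"tenant": "acme", "Test_Suite": "smoke"}
)
url = build_query_url("http://localhost:9090", selector)

print(selector)  # ts_success_rate{Test_Suite="smoke",tenant="acme"}
print(url)
```

The resulting URL can be requested with any HTTP client, or the selector can be pasted directly into the Prometheus expression browser or a Grafana panel.
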
Key Features of Prometheus in the Testing Automation Engine
- Comprehensive Metrics:
    - CPU utilization
    - Memory allocation
    - Disk usage
    - Input/output (I/O) operations
    - System processes
- Detailed Test Execution Insights: Prometheus delivers metrics that evaluate the performance and health of test execution processes.
- Key Performance Indicators (KPIs): The engine includes built-in KPIs for various test execution environments, enabling granular monitoring.
Data Centralization with Remote Write
Prometheus’ remote write capability allows metrics from multiple nodes in the testing engine cluster to be transmitted to an external Prometheus server. This enables:
- Centralized Data Collection: All metrics from distributed nodes are aggregated into a single Prometheus instance.
- Custom Alerting: Users can define alerts based on custom KPI thresholds, ensuring timely notifications for anomalies.
KPI Export and Grafana Integration
The KPI shipping capability exports metrics collected by Prometheus to an external Prometheus instance. This enables:
- Data Aggregation: Metrics from all test environment virtual machines (VMs) are consolidated.
- Visualization with Grafana:
    - The external Prometheus instance acts as the primary data source for Grafana.
    - Grafana creates dashboards for real-time monitoring and performance analytics.
- Robust Alerting: Alerts can be defined using user-configured thresholds or pre-exported KPIs, supporting efficient issue detection.
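
The idea of a user-configured threshold can be sketched in Python against the JSON shape returned by Prometheus' instant-query API. In practice this check would normally live in a Prometheus alerting rule or a Grafana alert; the code only illustrates the comparison. The sample response, the suite names, and the threshold of 90 are hypothetical, and the values assume ts_success_rate is reported as a percentage, which this document does not state.

```python
def samples_below(response: dict, threshold: float) -> list:
    """Return (labels, value) pairs whose sample value is below the threshold."""
    if response.get("status") != "success":
        raise ValueError("query did not succeed")
    breaches = []
    for series in response["data"]["result"]:
        _timestamp, raw_value = series["value"]
        value = float(raw_value)  # Prometheus encodes sample values as strings
        if value < threshold:
            breaches.append((series["metric"], value))
    return breaches


# Hypothetical instant-vector response for ts_success_rate (illustrative data).
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "ts_success_rate", "Test_Suite": "smoke"},
                "value": [1714557600, "97.5"],
            },
            {
                "metric": {"__name__": "ts_success_rate", "Test_Suite": "regression"},
                "value": [1714557600, "82.0"],
            },
        ],
    },
}

for labels, value in samples_below(sample_response, threshold=90.0):
    print(labels["Test_Suite"], value)  # regression 82.0
```
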
Benefits of the Prometheus-Grafana Setup
- Real-time visualization and monitoring for test execution processes.
- Improved system health and performance analytics.
- Customizable dashboards and alerts tailored to testing requirements.