The Problem
Maintaining visibility into resource usage across a network of nodes is critical for cluster health. The challenge was to create a lightweight agent that could track metrics without overhead on the host VMs.
The Approach
Developed a suite of Bash scripts that leverage Linux utilities to capture CPU, memory, and disk statistics. These metrics are formatted and loaded into a Dockerized PostgreSQL environment. Used crontab to automate the monitoring pulse and built SQL analytical queries to detect performance anomalies.
Technical Stack
Challenges & Constraints
Required a small memory footprint on host CentOS 7 machines. Implemented error detection queries to ensure the monitoring agent remained active and reporting to the central database without gaps.
Outcome & Learnings
Successfully deployed on Google Cloud Platform (GCP), providing an automated, auditable record of resource usage that reduced manual cluster health checks by 80%.