Documentation

Everything you need to know about the DevOps monitoring system

Overview

The DevOps dashboard monitors your VPS servers using a lightweight bash agent installed on each machine. Every 30 seconds the agent collects system metrics and pushes them to the UCA API, which stores them in PostgreSQL and surfaces them in the dashboard.

Architecture

Agent on VPS → pushes metrics every 30s → UCA API → PostgreSQL → Dashboard

Getting Started
  1. Add a server

    Go to the Servers page and add a new server. Provide a name, IP address, and optional tags to organise your fleet.

  2. Copy the install command

    After the server is created, a one-line curl command is shown on the server detail page. Copy it.

  3. Run on your VPS

    SSH into your VPS and paste the curl command. The installer will set up the agent as a systemd service.

  4. Wait 30 seconds

    The server will appear as Online in the dashboard within 30 seconds of the agent starting.

Agent

The agent is a small bash script that runs as a systemd service. It is read-only — it never performs write operations on your VPS.

What it collects every 30 seconds:

  • CPU usage
  • RAM usage
  • Disk usage
  • Network I/O
  • Docker containers (name, status, image)
  • Listening ports
  • Swap usage
  • Disk I/O (read/write bytes)
  • Top processes by CPU
  • Failed systemd services
  • SSL certificate expiry dates
  • Open connection count
  • Docker container restart counts
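
The list above does not say how CPU usage is derived. A common Linux technique, shown here only as an assumption about what a bash agent like this might do, is to read the aggregate cpu line of /proc/stat twice and measure how much of the elapsed time was not idle. A minimal Python sketch of that arithmetic (the function names are hypothetical):

```python
def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user nice system idle iowait irq softirq steal; guest time is already counted in user/nice
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + values[4]  # idle + iowait
    return idle, sum(values)


def cpu_percent(prev_line: str, curr_line: str) -> float:
    """Busy CPU percentage between two /proc/stat samples."""
    prev_idle, prev_total = parse_cpu_line(prev_line)
    curr_idle, curr_total = parse_cpu_line(curr_line)
    total_delta = curr_total - prev_total
    if total_delta <= 0:
        return 0.0  # no time elapsed between samples
    idle_delta = curr_idle - prev_idle
    return 100.0 * (total_delta - idle_delta) / total_delta
```

On a Linux host, read the first line of /proc/stat, wait a second, read it again, and pass both lines to cpu_percent.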

Systemd service

The agent is installed as horizon-agent.service. It auto-restarts on failure and survives reboots.

Managing the Agent

Check status

systemctl status horizon-agent

Stop and disable

systemctl stop horizon-agent && systemctl disable horizon-agent

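Remove completely

Stop and disable the service first, then delete its files (run as root or with sudo):

systemctl stop horizon-agent && systemctl disable horizon-agent
rm /etc/systemd/system/horizon-agent.service
rm -rf /opt/horizon-agent
systemctl daemon-reload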

Reinstall

Go to the server detail page and regenerate the install token. Then run the new curl command on the VPS.

System Tab

The System tab on each server detail page shows extended metrics beyond the standard CPU/RAM/Disk charts. The agent collects them with every 30-second report, and the tab shows the latest values as each report arrives.

  • Swap Usage — current swap used vs total, with a progress bar
  • Disk I/O — cumulative read/write bytes since boot
  • Top Processes — the top processes sorted by CPU usage, showing PID, name, CPU%, and MEM%
  • Failed Services — any systemd services in a failed state, highlighted in red
  • SSL Certificates — detected SSL certs with domain, expiry date, and days remaining (color-coded: red < 14d, amber < 30d, green otherwise)
  • Open Connections — total number of established network connections
  • Docker Restarts — containers with non-zero restart counts, useful for detecting crash loops
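
To make the SSL color thresholds concrete, here is a small Python sketch. The thresholds come from the list above; rounding down to whole days and the function names are assumptions. Under the strict "<" comparisons stated, a certificate with exactly 14 days left shows amber and one with exactly 30 days left shows green.

```python
from datetime import datetime


def days_until(expiry: datetime, now: datetime) -> int:
    """Whole days left before expiry (negative after the certificate has expired)."""
    return (expiry - now).days


def ssl_expiry_color(days_remaining: int) -> str:
    """Color for the SSL Certificates panel: red < 14 days, amber < 30 days, green otherwise."""
    if days_remaining < 14:
        return "red"
    if days_remaining < 30:
        return "amber"
    return "green"
```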

Updating the Agent

When a server is online, the server detail page shows an Update Agent button instead of the install command. Click it to copy the update command to your clipboard, then paste it on your VPS.

What the update does

  1. Stops and removes the current agent and systemd service
  2. Generates a temporary install token (valid for 5 minutes)
  3. Downloads and installs the latest agent script via the install endpoint
  4. Starts the new agent service automatically

The API key is preserved during the update. The server will briefly appear as offline during the process and come back online within 30 seconds.
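
Step 2 above relies on a short-lived token. The sketch below shows one way a 5-minute install token can be issued and checked; the helper names and the token format are hypothetical and not taken from the UCA API.

```python
import secrets
from datetime import datetime, timedelta

TOKEN_TTL = timedelta(minutes=5)  # "valid for 5 minutes" in the update steps above


def issue_install_token(now: datetime) -> tuple[str, datetime]:
    """Hypothetical helper: a random URL-safe token and the moment it expires."""
    return secrets.token_urlsafe(32), now + TOKEN_TTL


def verify_install_token(presented: str, stored: str, expires_at: datetime, now: datetime) -> bool:
    """Accept the token only before expiry and only if it matches (constant-time comparison)."""
    return now < expires_at and secrets.compare_digest(presented, stored)
```

Comparing with secrets.compare_digest avoids leaking how many leading characters of a guessed token were correct through response timing.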

Overview Cards

The server list page shows summary cards for each server. Each card includes:

  • Uptime percentage — calculated from the ratio of online snapshots over the selected period, displayed as a percentage badge
  • CPU sparkline — a mini line chart showing recent CPU usage history at a glance, rendered inline on the card
  • Status badge — online (green), offline (red), or unknown (gray)
  • Key metrics — CPU, memory, and disk usage percentages
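
The uptime badge is described as the ratio of online snapshots over the selected period. Assuming each snapshot records a status and that the denominator is every snapshot in the period (the docs do not specify this), the calculation reduces to the sketch below; the function name is hypothetical.

```python
def uptime_percent(statuses: list[str]) -> float:
    """Share of snapshots in the selected period whose status is "online", as a percentage."""
    if not statuses:
        return 0.0  # no data for the period; a real UI might show "unknown" instead
    online = sum(1 for status in statuses if status == "online")
    return round(100.0 * online / len(statuses), 1)
```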

CSV Export

Export metric data from the server detail page using the CSV export feature. Select a time period and download a CSV file containing all metric snapshots for that range. The export includes:

  • Timestamp, CPU%, memory used/total, disk usage per mount
  • Network in/out bytes, load averages
  • Container status (if Docker is running)

CSV files can be imported into spreadsheets or monitoring tools for offline analysis and reporting.
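
As an example of offline analysis, the sketch below computes the average and peak CPU from an exported file. The column name cpu_percent is an assumption; check the header row of your own export and adjust the column argument to match.

```python
import csv
import io
from statistics import mean


def cpu_summary(csv_text: str, column: str = "cpu_percent") -> dict[str, float]:
    """Average and peak CPU from an exported CSV (the default column name is assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader if row.get(column)]
    if not values:
        return {"avg": 0.0, "max": 0.0}
    return {"avg": round(mean(values), 2), "max": max(values)}
```

For a file on disk, open it with newline="" and pass the result of read() to cpu_summary.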

Alert Rules

Create alert rules on the Alerts page to be notified when a metric crosses a threshold.

  • Metrics: CPU, Memory, or Disk
  • Scope: Global (all servers) or per-server
  • Operator: Greater than or less than
  • Duration: How many seconds the condition must hold before the alert fires

Alerts resolve automatically once the metric no longer meets the rule's condition. You will receive a resolution notification in addition to the firing notification.
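
The interaction between Duration, firing, and auto-resolution can be sketched as a small state machine evaluated once per incoming sample. This is an illustration under stated assumptions, not the dashboard's evaluator: the duration is measured from the first breaching sample, resolution happens on the first non-breaching sample, and the class and function names are hypothetical. One AlertState would be kept per rule and server.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    metric: str       # "cpu", "memory" or "disk"
    operator: str     # "gt" (greater than) or "lt" (less than)
    threshold: float
    duration: int     # seconds the condition must hold before the alert fires


@dataclass
class AlertState:
    breach_started: Optional[float] = None  # timestamp of the first breaching sample
    firing: bool = False


def breached(rule: AlertRule, value: float) -> bool:
    return value > rule.threshold if rule.operator == "gt" else value < rule.threshold


def evaluate(rule: AlertRule, state: AlertState, value: float, ts: float) -> Optional[str]:
    """Update state with one sample; return "fired", "resolved" or None."""
    if breached(rule, value):
        if state.breach_started is None:
            state.breach_started = ts
        if not state.firing and ts - state.breach_started >= rule.duration:
            state.firing = True
            return "fired"
        return None
    state.breach_started = None  # condition broken: restart the duration clock
    if state.firing:
        state.firing = False
        return "resolved"
    return None
```

With a CPU > 90 rule and a 60-second duration, samples of 95 at 0 s, 30 s and 60 s fire the alert on the third sample, and a following sample of 50 resolves it.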

Matrix Notifications

Connect a Matrix bot to receive alert notifications in a Matrix room. Configure the integration in the DevOps Settings page.

  • Homeserver URL (e.g. https://matrix.org)
  • Bot access token
  • Room ID

Both firing and resolution notifications are sent to the configured room. Use the Test button in Settings to verify the connection before relying on it.
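
The three settings map onto the Matrix client-server API, where a bot posts a text message with PUT /_matrix/client/v3/rooms/{roomId}/send/m.room.message/{txnId}. The sketch below builds such a request with the standard library so you can check your homeserver URL, token, and room ID by hand; whether the dashboard sends exactly this request is an assumption, and the helper name is hypothetical. Room IDs such as !abc:matrix.org must be URL-encoded.

```python
import json
import urllib.parse
import urllib.request
import uuid


def build_matrix_message(homeserver: str, access_token: str, room_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a request that posts a plain-text message to a Matrix room."""
    txn_id = uuid.uuid4().hex  # transaction ID lets the homeserver de-duplicate retries
    url = (
        f"{homeserver.rstrip('/')}/_matrix/client/v3/rooms/"
        f"{urllib.parse.quote(room_id, safe='')}/send/m.room.message/{txn_id}"
    )
    body = json.dumps({"msgtype": "m.text", "body": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Authorization": f"Bearer {access_token}", "Content-Type": "application/json"},
    )
```

Passing the returned request to urllib.request.urlopen sends it; a successful call returns the new event ID in the JSON response.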

Batch Import

Import multiple servers at once from the Servers page using a CSV or JSON file.

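CSV format

One server per row: name, IP address, and a comma-separated tag list. Wrap the tag list in quotes, because it contains commas.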

name,ip,"tag1,tag2"
web-01,192.168.1.10,"prod,nginx"
db-01,192.168.1.20,"prod,postgres"

JSON format

[
  { "name": "web-01", "ip": "192.168.1.10", "tags": ["prod", "nginx"] },
  { "name": "db-01",  "ip": "192.168.1.20", "tags": ["prod", "postgres"] }
]
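
Before uploading a large file, you can validate it locally. The sketch below parses both formats into the same records and rejects malformed IP addresses. It is an assumption-based helper, not the importer itself: the docs do not say whether the first CSV line is required, so this version skips a first row whose name field is literally "name" and otherwise treats every row as a server.

```python
import csv
import io
import ipaddress
import json


def _validated(name: str, ip: str, tags: list[str]) -> dict:
    if not name:
        raise ValueError("server name is required")
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    return {"name": name, "ip": ip, "tags": list(tags)}


def parse_servers_csv(text: str) -> list[dict]:
    """Parse rows of name,ip,"tag1,tag2" into server records."""
    servers = []
    for i, row in enumerate(csv.reader(io.StringIO(text))):
        if not row:
            continue  # skip blank lines
        if i == 0 and row[0].strip().lower() == "name":
            continue  # header/template row
        if len(row) < 2:
            raise ValueError(f"row {i + 1}: expected at least name and ip")
        tags = [t.strip() for t in row[2].split(",") if t.strip()] if len(row) > 2 else []
        servers.append(_validated(row[0].strip(), row[1].strip(), tags))
    return servers


def parse_servers_json(text: str) -> list[dict]:
    """Parse a JSON array of {"name", "ip", "tags"} objects into server records."""
    return [_validated(s["name"], s["ip"], s.get("tags", [])) for s in json.loads(text)]
```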

Data Retention

Metric data is retained according to the retention period configured in Settings (default: 30 days). Data is stored at three levels of granularity to balance storage and resolution.

Data age        Granularity
0 – 24 hours    Full (every 30 seconds)
1 – 7 days      5-minute averages
8 – 30 days     Hourly averages
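
The tiers can be expressed as a lookup from a snapshot's age to its storage resolution, plus a roll-up that averages samples into fixed buckets. This is a sketch under assumptions: it treats anything older than 7 days as hourly, treats data at or beyond the retention period as deleted, and averages raw samples per bucket; the function names are hypothetical.

```python
from statistics import mean
from typing import Optional

DAY = 86_400  # seconds


def granularity_seconds(age_seconds: float, retention_days: int = 30) -> Optional[int]:
    """Storage resolution for a snapshot of the given age, or None once it is past retention."""
    if age_seconds >= retention_days * DAY:
        return None   # removed by the daily aggregation job
    if age_seconds < 1 * DAY:
        return 30     # full granularity
    if age_seconds < 7 * DAY:
        return 300    # 5-minute averages
    return 3600       # hourly averages


def rollup(samples: list[tuple[float, float]], bucket: int) -> list[tuple[int, float]]:
    """Average (timestamp, value) samples into buckets starting at multiples of `bucket` seconds."""
    buckets: dict[int, list[float]] = {}
    for ts, value in samples:
        start = int(ts // bucket) * bucket
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]
```

Calling rollup with bucket=300 produces 5-minute averages, and bucket=3600 produces hourly ones.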

Background Jobs

Two background cron jobs keep the system accurate and the database clean.

  • Offline detection — runs every 60 seconds. Marks a server as offline if no metrics have been received in the past 60 seconds.
  • Data aggregation — runs daily. Rolls up raw metrics into 5-minute and hourly averages, then deletes data older than the configured retention period.
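
The offline check can be sketched as a pass over each server's last-received timestamp. The names are hypothetical, and skipping servers that have never reported (so they stay "unknown" rather than "offline") is an assumption based on the status badge described above.

```python
from datetime import datetime, timedelta
from typing import Optional

OFFLINE_AFTER = timedelta(seconds=60)


def servers_to_mark_offline(last_seen: dict[str, Optional[datetime]], now: datetime) -> list[str]:
    """Names of servers whose last metrics arrived more than 60 seconds ago."""
    return sorted(
        name
        for name, seen in last_seen.items()
        if seen is not None and now - seen > OFFLINE_AFTER  # never-reported servers are skipped
    )
```

Because the job itself runs every 60 seconds, a server that stops reporting may take up to about two minutes to be marked offline.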