System Monitoring

This guide covers how to monitor WsprDaemon performance, health, and data quality.

Real-Time Monitoring

System Status Dashboard

Primary Status Command:

# Comprehensive system overview
./wsprdaemon.sh -s

# Continuous monitoring (updates every 30 seconds)
watch -n 30 './wsprdaemon.sh -s'

Key Status Indicators:

  • Process health (running/stopped/failed)

  • Recent decode activity

  • Upload success rates

  • System resource usage

  • Error conditions

Process Monitoring

Active Process Overview:

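# List all WsprDaemon processes (the [w] keeps grep from matching itself)
ps aux | grep '[w]sprdaemon'

# Process tree view (-o picks the oldest match, since pstree accepts a single PID)
pstree -p "$(pgrep -o -f 'wsprdaemon\.sh')"

# Resource usage by process (-d, joins the PIDs with commas; top accepts at most 20 PIDs)
top -p "$(pgrep -d, -f wsprdaemon)"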

Job Status Monitoring:

# Current job status
./wsprdaemon.sh -j s

# Detailed job information
./wsprdaemon.sh -j l

# Job restart history
./wsprdaemon.sh -j h

Performance Metrics

Decoding Performance

Decode Success Rates:

# Count successful decodes across all bands (last hour)
find /tmp/wsprdaemon/decoding.d -name "*.log" -mmin -60 \
  -exec grep -c "spots decoded" {} \; | awk '{sum+=$1} END {print "Total decodes:", sum}'

# Average decode time per decoding.log (the file path identifies the band)
grep "decode completed" /tmp/wsprdaemon/decoding.d/*/*/decoding.log | \
  awk -F'[: ]' '{band=$1; time=$NF; sum[band]+=time; count[band]++} 
  END {for(b in sum) printf "%s: %.2fs avg\n", b, sum[b]/count[b]}'

Signal Quality Metrics:

# SNR distribution analysis: count, mean, and median (lower median for even counts)
grep -h "SNR:" /tmp/wsprdaemon/decoding.d/*/*/decoding.log | \
  awk '{print $NF}' | sort -n | \
  awk '{sum+=$1; values[NR]=$1}
  END {if (NR>0) print "Count:", NR, "Mean:", sum/NR, "Median:", values[int((NR+1)/2)]}'
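Mean and median alone hide the shape of the distribution. As a rough sketch, the helper below groups SNR values into 5 dB buckets. The name `snr_histogram` is hypothetical and not part of WsprDaemon, and the helper assumes one numeric value per line on standard input, as produced by the `awk '{print $NF}'` stage above.

```shell
# snr_histogram: hypothetical helper, not part of WsprDaemon.
# Reads one SNR value (dB) per line on stdin and prints "bucket_start count"
# for each 5 dB bucket, lowest bucket first. Non-numeric lines are ignored.
snr_histogram() {
    awk '
    /^-?[0-9]+(\.[0-9]+)?$/ {
        b = int($1 / 5)                   # int() truncates toward zero
        if ($1 < 0 && b * 5 != $1) b--    # step down so negatives are floored
        hist[b * 5]++
    }
    END { for (k in hist) print k, hist[k] }
    ' | sort -n
}
```

Usage, assuming the same log layout as above: `grep -h "SNR:" /tmp/wsprdaemon/decoding.d/*/*/decoding.log | awk '{print $NF}' | snr_histogram`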

Upload Performance

Upload Success Monitoring:

# Recent upload success rate
find /tmp/wsprdaemon/uploads.d -name "*.log" -mmin -60 \
  -exec grep -hE "(SUCCESS|FAILED)" {} + | \
  awk '/SUCCESS/{s++} /FAILED/{f++} END {if (s+f > 0) printf "Success: %d Failed: %d Rate: %.1f%%\n", s, f, s/(s+f)*100; else print "No upload attempts logged"}'

# Upload queue status
echo "Pending uploads: $(find /tmp/wsprdaemon/uploads.d -name '*.txt' | wc -l)"
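A pending count says little about whether the queue is draining. One possible complement is to report the age of the oldest pending file. The sketch below assumes, like the command above, that pending uploads are `*.txt` files under the uploads directory. `oldest_pending_age` is a hypothetical name, and the helper relies on GNU find for `-printf`.

```shell
# oldest_pending_age: hypothetical helper, not part of WsprDaemon.
# Prints the age in seconds of the oldest *.txt file under the given
# directory, or 0 when no such file exists. Requires GNU find (-printf).
oldest_pending_age() {
    local dir oldest now
    dir=$1
    oldest=$(find "$dir" -name '*.txt' -printf '%T@\n' 2>/dev/null | sort -n | head -n 1)
    if [ -z "$oldest" ]; then
        echo 0
        return
    fi
    now=$(date +%s)
    # %T@ carries a fractional part; strip it before integer arithmetic
    echo $(( now - ${oldest%.*} ))
}
```

For example, `[ "$(oldest_pending_age /tmp/wsprdaemon/uploads.d)" -gt 600 ] && echo "Upload queue may be stalled"` flags files that have waited more than ten minutes. The ten-minute limit is an arbitrary starting point.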

Network Performance:

# Monitor network connections to wsprnet.org
netstat -an | grep :14236

# Upload bandwidth usage (replace eth0 with your interface name; iftop typically requires root)
sudo iftop -i eth0 -f "port 14236"

System Resource Monitoring

CPU and Memory Usage:

# WsprDaemon-specific resource usage (ps reports VSZ and RSS in KiB)
ps aux | grep '[w]sprdaemon' | \
  awk '{cpu+=$3; mem+=$4; vsz+=$5; rss+=$6}
  END {printf "CPU: %.1f%% Memory: %.1f%% VSZ: %dMB RSS: %dMB\n", cpu, mem, vsz/1024, rss/1024}'

# System load average
uptime

# Memory pressure indicators
free -h
cat /proc/pressure/memory 2>/dev/null || echo "Memory pressure info not available"

Disk I/O and Storage:

# Disk usage monitoring
df -h /tmp/wsprdaemon ~/wsprdaemon

# Extended I/O statistics for all block devices (3 samples, 1 second apart)
iostat -x 1 3

# Directory size tracking
du -sh /tmp/wsprdaemon/* ~/wsprdaemon/* 2>/dev/null | sort -hr
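The disk figures above can also feed a scripted threshold check. The following is a minimal sketch: `disk_usage_percent` and `disk_over_threshold` are hypothetical names, and the threshold itself is something you would choose for your installation.

```shell
# disk_usage_percent / disk_over_threshold: hypothetical helpers.
# disk_usage_percent prints the used percentage (integer, no % sign)
# of the filesystem that holds the given path.
disk_usage_percent() {
    # -P forces the POSIX one-line-per-filesystem format, so the
    # capacity percentage is always in column 5
    df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Returns 0 (true) when usage is at or above the threshold percentage.
# Returns non-zero when usage is below it or the path cannot be inspected.
disk_over_threshold() {
    local used
    used=$(disk_usage_percent "$1")
    [ -n "$used" ] && [ "$used" -ge "$2" ]
}
```

For example: `disk_over_threshold /tmp/wsprdaemon 80 && echo "Disk usage is high"`.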

Log Analysis and Alerting

Automated Log Monitoring

Error Detection Script:

#!/bin/bash
# error_monitor.sh - Monitor for critical errors

LOG_DIR="/tmp/wsprdaemon"
ERROR_THRESHOLD=5
ALERT_EMAIL="admin@example.com"

# Count error lines in logs modified during the last 10 minutes
ERROR_COUNT=$(find "$LOG_DIR" -name "*.log" -mmin -10 \
  -exec grep -cE "ERROR|CRITICAL|FATAL" {} \; | \
  awk '{sum+=$1} END {print sum+0}')

if [ "$ERROR_COUNT" -gt "$ERROR_THRESHOLD" ]; then
    echo "High error rate detected: $ERROR_COUNT errors in logs modified in the last 10 minutes" | \
    mail -s "WsprDaemon Alert: High Error Rate" "$ALERT_EMAIL"
fi
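When a script like this runs on a schedule, a persistent fault produces one email per run. One way to limit that is a cooldown recorded in a state file. The sketch below is an assumption-laden illustration: `should_alert` and the state file path are hypothetical names, not part of WsprDaemon.

```shell
# should_alert: hypothetical helper, not part of WsprDaemon.
# Usage: should_alert STATE_FILE COOLDOWN_SECONDS
# Returns 0 and records the current time if no alert was recorded within
# the cooldown period; returns 1 otherwise.
should_alert() {
    local state_file cooldown now last
    state_file=$1
    cooldown=$2
    now=$(date +%s)
    last=0
    [ -r "$state_file" ] && last=$(cat "$state_file")
    case "$last" in ''|*[!0-9]*) last=0 ;; esac   # ignore a corrupt state file
    if [ $(( now - last )) -ge "$cooldown" ]; then
        echo "$now" > "$state_file"
        return 0
    fi
    return 1
}
```

In the error monitor, the condition could then read `if [ "${ERROR_COUNT:-0}" -gt "$ERROR_THRESHOLD" ] && should_alert /tmp/wsprdaemon_error_alert.state 3600; then`, which sends at most one message per hour.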

Performance Degradation Detection:

#!/bin/bash
# performance_monitor.sh - Detect performance issues

# Decode success rate over logs modified in the last 30 minutes.
# A healthy system should stay above 80% during active periods;
# this script alerts when the rate falls below 50%.
DECODE_RATE=$(find /tmp/wsprdaemon/decoding.d -name "*.log" -mmin -30 \
  -exec grep -hE "spots decoded|no spots" {} + | \
  awk '/no spots/ {empty++; next} /spots decoded/ {decoded++}
  END {total=decoded+empty; if (total>0) printf "%.1f", decoded/total*100}')

# An empty DECODE_RATE means no decode cycles were logged, so skip the alert
if [ -n "$DECODE_RATE" ] && awk -v r="$DECODE_RATE" 'BEGIN {exit !(r < 50)}'; then
    echo "Low decode rate: $DECODE_RATE%" | \
    mail -s "WsprDaemon Alert: Low Decode Rate" admin@example.com
fi

Log Rotation and Archival

Automated Log Management:

#!/bin/bash
# log_maintenance.sh - Automated log cleanup and archival

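LOG_DIR="/tmp/wsprdaemon"
ARCHIVE_DIR="$HOME/wsprdaemon/archives"
RETENTION_DAYS=30

mkdir -p "$ARCHIVE_DIR"

# Compress logs not modified for more than a day
find "$LOG_DIR" -name "*.log" -mtime +1 -exec gzip {} \;

# Move compressed logs older than a week, keeping their path relative to
# LOG_DIR so same-named logs from different directories do not collide
find "$LOG_DIR" -name "*.log.gz" -mtime +7 -exec sh -c '
  rel=${1#"$2"/}
  mkdir -p "$3/$(dirname "$rel")" && mv "$1" "$3/$rel"' _ {} "$LOG_DIR" "$ARCHIVE_DIR" \;

# Clean up very old archives
find "$ARCHIVE_DIR" -name "*.log.gz" -mtime +"$RETENTION_DAYS" -delete

# Rotate large current logs: compress a timestamped copy, then truncate in
# place so the writing process keeps its open file handle
find "$LOG_DIR" -name "*.log" -size +10M -exec sh -c '
  gzip -c "$1" > "${1%.log}.$(date +%Y%m%d%H%M%S).log.gz" && : > "$1"' _ {} \;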

Health Monitoring Dashboard

Custom Monitoring Script

#!/bin/bash
# wsprdaemon_dashboard.sh - Comprehensive health dashboard

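# Clear the screen only when attached to a terminal (the script may also run from cron)
[ -t 1 ] && clear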
echo "==============================================="
echo "        WsprDaemon Health Dashboard"
echo "        $(date)"
echo "==============================================="

# System Status
echo
echo "=== SYSTEM STATUS ==="
if pgrep -f wsprdaemon.sh > /dev/null; then
    echo "✓ WsprDaemon: RUNNING"
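    # Exclude this dashboard script, whose own name also matches "wsprdaemon"
    PROCESS_COUNT=$(pgrep -af wsprdaemon | grep -vc 'wsprdaemon_dashboard')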
    echo "  Active processes: $PROCESS_COUNT"
else
    echo "✗ WsprDaemon: STOPPED"
fi

# Resource Usage
echo
echo "=== RESOURCE USAGE ==="
# 1-minute load average; /proc/loadavg is Linux-specific but independent of locale
LOAD=$(cut -d' ' -f1 /proc/loadavg)
echo "System load: $LOAD"

DISK_USAGE=$(df /tmp/wsprdaemon 2>/dev/null | tail -1 | awk '{print $5}' | sed 's/%//')
if [ -n "$DISK_USAGE" ]; then
    if [ $DISK_USAGE -lt 80 ]; then
        echo "✓ Disk usage: ${DISK_USAGE}%"
    else
        echo "⚠ Disk usage: ${DISK_USAGE}% (HIGH)"
    fi
fi

# Recent Activity
echo
echo "=== RECENT ACTIVITY (Last 30 minutes) ==="
RECENT_DECODES=$(find /tmp/wsprdaemon/decoding.d -name "*.log" -mmin -30 \
  -exec grep -c "spots decoded" {} \; 2>/dev/null | awk '{sum+=$1} END {print sum+0}')
echo "Successful decodes: $RECENT_DECODES"

RECENT_UPLOADS=$(find /tmp/wsprdaemon/uploads.d -name "*.log" -mmin -30 \
  -exec grep -c "SUCCESS" {} \; 2>/dev/null | awk '{sum+=$1} END {print sum+0}')
echo "Successful uploads: $RECENT_UPLOADS"

# Error Summary
echo
echo "=== ERROR SUMMARY (Last hour) ==="
ERROR_COUNT=$(find /tmp/wsprdaemon -name "*.log" -mmin -60 \
  -exec grep -c "ERROR" {} \; 2>/dev/null | awk '{sum+=$1} END {print sum+0}')
WARNING_COUNT=$(find /tmp/wsprdaemon -name "*.log" -mmin -60 \
  -exec grep -c "WARNING" {} \; 2>/dev/null | awk '{sum+=$1} END {print sum+0}')

if [ $ERROR_COUNT -eq 0 ]; then
    echo "✓ Errors: $ERROR_COUNT"
else
    echo "⚠ Errors: $ERROR_COUNT"
fi

if [ $WARNING_COUNT -lt 5 ]; then
    echo "✓ Warnings: $WARNING_COUNT"
else
    echo "⚠ Warnings: $WARNING_COUNT"
fi

echo
echo "==============================================="

External Monitoring Integration

Prometheus Metrics Export

Metrics Collection Script:

#!/bin/bash
# prometheus_exporter.sh - Export WsprDaemon metrics

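# Keep the metrics file in its own directory so that serving it over HTTP does not expose all of /tmp
METRICS_DIR="/tmp/wsprdaemon_metrics"
METRICS_FILE="$METRICS_DIR/wsprdaemon_metrics.prom"
mkdir -p "$METRICS_DIR"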

cat > $METRICS_FILE << EOF
# HELP wsprdaemon_processes_total Number of active WsprDaemon processes
# TYPE wsprdaemon_processes_total gauge
wsprdaemon_processes_total $(pgrep -f wsprdaemon | wc -l)

# HELP wsprdaemon_decodes_total Total successful decodes in last hour
# TYPE wsprdaemon_decodes_total gauge
wsprdaemon_decodes_total $(find /tmp/wsprdaemon/decoding.d -name "*.log" -mmin -60 -exec grep -c "spots decoded" {} \; | awk '{sum+=$1} END {print sum+0}')

# HELP wsprdaemon_uploads_total Total successful uploads in last hour
# TYPE wsprdaemon_uploads_total gauge
wsprdaemon_uploads_total $(find /tmp/wsprdaemon/uploads.d -name "*.log" -mmin -60 -exec grep -c "SUCCESS" {} \; | awk '{sum+=$1} END {print sum+0}')

# HELP wsprdaemon_errors_total Total errors in last hour
# TYPE wsprdaemon_errors_total gauge
wsprdaemon_errors_total $(find /tmp/wsprdaemon -name "*.log" -mmin -60 -exec grep -c "ERROR" {} \; | awk '{sum+=$1} END {print sum+0}')
EOF

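# Serve metrics on port 9090. Prometheus itself listens on 9090 by default,
# so choose another port if both run on the same host.
# Start the server only if it is not already running, so repeated runs do not pile up.
if ! pgrep -f "http.server 9090" > /dev/null; then
    python3 -m http.server 9090 --directory "$(dirname "$METRICS_FILE")" &
fi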
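Because the exporter rewrites the metrics file on every run, a scrape that arrives mid-write may read a truncated file. A common remedy is to write to a temporary file in the same directory and rename it into place. The helper below is a sketch of that idea; `write_metrics_atomically` is a hypothetical name, not part of WsprDaemon.

```shell
# write_metrics_atomically: hypothetical helper, not part of WsprDaemon.
# Copies stdin to a temporary file beside the target, then renames it into
# place, so a reader sees either the old file or the complete new one.
write_metrics_atomically() {
    local target tmp
    target=$1
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if cat > "$tmp"; then
        chmod 644 "$tmp"          # mktemp creates the file with mode 600
        mv -f "$tmp" "$target"    # rename is atomic within one filesystem
    else
        rm -f "$tmp"
        return 1
    fi
}
```

In the exporter above, `cat > $METRICS_FILE << EOF` would become `write_metrics_atomically "$METRICS_FILE" << EOF`. If node_exporter is already deployed, its textfile collector (`--collector.textfile.directory`) can read the same `.prom` file and may be a simpler alternative to the Python HTTP server.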

MQTT Status Publishing

MQTT Publisher Script:

#!/bin/bash
# mqtt_publisher.sh - Publish status to MQTT broker

MQTT_BROKER="192.168.1.50"
MQTT_TOPIC="wsprdaemon/status"

# Collect status data
STATUS_JSON=$(cat << EOF
{
  "timestamp": "$(date -Iseconds)",
  "running": $(pgrep -f wsprdaemon.sh > /dev/null && echo "true" || echo "false"),
  "processes": $(pgrep -f wsprdaemon | wc -l),
  "recent_decodes": $(find /tmp/wsprdaemon/decoding.d -name "*.log" -mmin -10 -exec grep -c "spots decoded" {} \; | awk '{sum+=$1} END {print sum+0}'),
  "recent_uploads": $(find /tmp/wsprdaemon/uploads.d -name "*.log" -mmin -10 -exec grep -c "SUCCESS" {} \; | awk '{sum+=$1} END {print sum+0}'),
  "disk_usage": $(df -P /tmp/wsprdaemon 2>/dev/null | awk 'NR>1 {u=$5} END {print u+0}')
}
EOF
)

# Publish to MQTT
mosquitto_pub -h "$MQTT_BROKER" -t "$MQTT_TOPIC" -m "$STATUS_JSON"

Automated Monitoring Setup

Cron Job Configuration

# Add to crontab (crontab -e)

# Health check every 5 minutes
*/5 * * * * /home/wsprdaemon/scripts/health_check.sh

# Performance monitoring every 15 minutes
*/15 * * * * /home/wsprdaemon/scripts/performance_monitor.sh

# Log maintenance daily at 2 AM
0 2 * * * /home/wsprdaemon/scripts/log_maintenance.sh

# Status dashboard update every minute
* * * * * /home/wsprdaemon/scripts/wsprdaemon_dashboard.sh > /tmp/wsprdaemon_status.txt

# MQTT status publishing every 5 minutes
*/5 * * * * /home/wsprdaemon/scripts/mqtt_publisher.sh
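The crontab above schedules `health_check.sh`, which this guide does not define. The following is a hypothetical sketch of what such a script might contain. It assumes a healthy system has a running `wsprdaemon.sh` process and at least one log file under /tmp/wsprdaemon modified in the last 10 minutes. The function names are illustrative only.

```shell
#!/bin/bash
# health_check.sh: hypothetical sketch; this guide does not define the script.

# Returns 0 if any process command line matches the pattern.
process_running() {
    pgrep -f "$1" > /dev/null
}

# Usage: logs_are_fresh DIR MINUTES
# Returns 0 if DIR contains a *.log file modified within MINUTES minutes.
logs_are_fresh() {
    [ -n "$(find "$1" -name '*.log' -mmin "-$2" 2>/dev/null | head -n 1)" ]
}

# Prints one line per failed check and returns the number of failures.
health_check() {
    local failures=0
    process_running 'wsprdaemon\.sh' ||
        { echo "wsprdaemon.sh is not running"; failures=$((failures + 1)); }
    logs_are_fresh "${1:-/tmp/wsprdaemon}" 10 ||
        { echo "no log activity in the last 10 minutes"; failures=$((failures + 1)); }
    return "$failures"
}
```

A real script would end with a call such as `health_check || exit 1`. When cron is configured to deliver mail, any output from the job is mailed to the crontab owner, so printing the failed checks can serve as a simple alert.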

Systemd Service for Monitoring

# /etc/systemd/system/wsprdaemon-monitor.service
[Unit]
Description=WsprDaemon Monitoring Service
After=wsprdaemon.service

[Service]
Type=simple
User=wsprdaemon
ExecStart=/home/wsprdaemon/scripts/continuous_monitor.sh
Restart=always
RestartSec=30

[Install]
WantedBy=multi-user.target
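The unit above runs `continuous_monitor.sh`, which this guide also leaves undefined. A minimal sketch, assuming the script only needs to run a check command at a fixed interval, might look like this. `run_every` is a hypothetical name.

```shell
#!/bin/bash
# continuous_monitor.sh: hypothetical sketch; this guide does not define it.
# Usage: run_every INTERVAL COUNT CMD [ARGS...]
# Runs CMD every INTERVAL seconds. COUNT limits the number of runs;
# 0 means keep running until the process is stopped.
run_every() {
    local interval count runs
    interval=$1
    count=$2
    shift 2
    runs=0
    while [ "$count" -eq 0 ] || [ "$runs" -lt "$count" ]; do
        # Report a failed check on stderr but keep the loop alive
        "$@" || echo "$(date -Iseconds) check failed: $*" >&2
        runs=$((runs + 1))
        sleep "$interval"
    done
}
```

The script could end with `run_every 60 0 /home/wsprdaemon/scripts/health_check.sh`. Under systemd, stderr from the service goes to the journal by default, so failures can be reviewed with `journalctl -u wsprdaemon-monitor`.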

Together, these tools provide real-time visibility into WsprDaemon operation, automated alerting, and integration with external monitoring systems.