System Monitoring
This guide covers how to monitor WsprDaemon performance, health, and data quality so that problems are caught before they affect decoding or uploads.
Real-Time Monitoring
System Status Dashboard
Primary Status Command:
# Comprehensive system overview
./wsprdaemon.sh -s
# Continuous monitoring (updates every 30 seconds)
watch -n 30 './wsprdaemon.sh -s'
Key Status Indicators:
Process health (running/stopped/failed)
Recent decode activity
Upload success rates
System resource usage
Error conditions
Process Monitoring
Active Process Overview:
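# List all WsprDaemon processes (the [w] keeps grep from matching itself)
ps aux | grep '[w]sprdaemon'
# Process tree view, rooted at the oldest wsprdaemon.sh process (pstree accepts a single PID)
pstree -p "$(pgrep -of 'wsprdaemon\.sh')"
# Resource usage by process (top accepts at most 20 PIDs with -p)
top -p "$(pgrep -d, -f wsprdaemon)"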
Job Status Monitoring:
# Current job status
./wsprdaemon.sh -j s
# Detailed job information
./wsprdaemon.sh -j l
# Job restart history
./wsprdaemon.sh -j h
Performance Metrics
Decoding Performance
Decode Success Rates:
# Total successful decodes across all bands (logs modified in the last hour)
find /tmp/wsprdaemon/decoding.d -name "*.log" -mmin -60 \
    -exec grep -c "spots decoded" {} \; | awk '{sum+=$1} END {print "Total decodes:", sum+0}'
# Average decode time per log file (the key printed is the path of each decoding.log)
grep "decode completed" /tmp/wsprdaemon/decoding.d/*/*/decoding.log | \
    awk -F'[: ]' '{band=$1; time=$NF; sum[band]+=time; count[band]++}
        END {for(b in sum) printf "%s: %.2fs avg\n", b, sum[b]/count[b]}'
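The commands above report either one overall total or a figure keyed by full log path. To break counts down by band, a sketch along the following lines sums matches per parent directory. It assumes the band is the last directory component in a layout like decoding.d/RECEIVER/BAND/decoding.log, which is inferred from the glob used above and not confirmed here. The function name count_matches_by_dir is hypothetical and not part of WsprDaemon.

```bash
# count_matches_by_dir ROOT PATTERN
# For every *.log under ROOT, count lines matching PATTERN and sum the counts
# per parent directory name (assumed here to be the band).
count_matches_by_dir() {
    # /dev/null forces grep to print "file:count" even when find passes a single file.
    find "$1" -name '*.log' -exec grep -c -- "$2" /dev/null {} + 2>/dev/null |
        awk -F: '$1 != "/dev/null" {
                     n = split($1, parts, "/")
                     total[parts[n - 1]] += $NF
                 }
                 END { for (dir in total) print dir, total[dir] }' |
        sort
}
```

A call such as count_matches_by_dir /tmp/wsprdaemon/decoding.d "spots decoded" prints one "directory count" pair per line.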
Signal Quality Metrics:
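# SNR distribution analysis (-h drops the filename prefix grep adds for multiple files)
grep -h "SNR:" /tmp/wsprdaemon/decoding.d/*/*/decoding.log | \
    awk '{print $NF}' | sort -n | \
    awk '{sum+=$1; values[NR]=$1}
        END {if (NR == 0) {print "No SNR data"; exit}
             # values[] is 1-indexed; for even counts this picks the lower middle value
             print "Count:", NR, "Mean:", sum/NR, "Median:", values[int((NR+1)/2)]}'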
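The median is easy to get wrong in a one-liner, particularly for even-sized samples. The following is a minimal sketch of a reusable helper, assuming one numeric SNR value per input line. The name snr_stats is hypothetical and not part of WsprDaemon.

```bash
# snr_stats: read one numeric value per line on stdin; print count, mean and median.
# Hypothetical helper for illustration only.
snr_stats() {
    sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) { print "count=0"; exit }
            # Odd count: the middle value. Even count: mean of the two middle values.
            if (NR % 2) median = v[(NR + 1) / 2]
            else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "count=%d mean=%.2f median=%.2f\n", NR, sum / NR, median
        }'
}
```

It can be placed at the end of a pipeline that extracts the SNR column, for example after awk '{print $NF}'.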
Upload Performance
Upload Success Monitoring:
# Recent upload success rate (guards against division by zero when nothing was logged)
find /tmp/wsprdaemon/uploads.d -name "*.log" -mmin -60 \
    -exec grep -E "(SUCCESS|FAILED)" {} \; | \
    awk '/SUCCESS/{s++} /FAILED/{f++} END {if (s+f > 0) printf "Success: %d Failed: %d Rate: %.1f%%\n", s, f, s/(s+f)*100; else print "No upload attempts logged"}'
# Upload queue status
echo "Pending uploads: $(find /tmp/wsprdaemon/uploads.d -name '*.txt' | wc -l)"
Network Performance:
# Monitor network connections to wsprnet.org
netstat -an | grep :14236
# Upload bandwidth usage
iftop -i eth0 -f "port 14236"
System Resource Monitoring
CPU and Memory Usage:
# WsprDaemon-specific resource usage (the [w] keeps grep from counting itself; VSZ and RSS are reported by ps in KiB)
ps aux | grep '[w]sprdaemon' | \
    awk '{cpu+=$3; mem+=$4; vsz+=$5; rss+=$6}
        END {printf "CPU: %.1f%% Memory: %.1f%% VSZ: %dMB RSS: %dMB\n", cpu, mem, vsz/1024, rss/1024}'
# System load average
uptime
# Memory pressure indicators
free -h
cat /proc/pressure/memory 2>/dev/null || echo "Memory pressure info not available"
Disk I/O and Storage:
# Disk usage monitoring
df -h /tmp/wsprdaemon ~/wsprdaemon
# Extended I/O statistics per block device (3 samples, 1 second apart); iostat reports devices, not directories
iostat -x 1 3
# Directory size tracking
du -sh /tmp/wsprdaemon/* ~/wsprdaemon/* 2>/dev/null | sort -hr
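Because /tmp/wsprdaemon can fill up, a script often needs the usage figure as a plain number rather than a table. The sketch below wraps df for that purpose. The function names disk_usage_pct and disk_ok are hypothetical, and the 80% default is taken from the dashboard threshold used later in this guide.

```bash
# disk_usage_pct PATH: print the used percentage (integer, no % sign) of the
# filesystem holding PATH. Prints nothing and returns 1 if PATH cannot be read.
disk_usage_pct() {
    # -P selects the POSIX one-line-per-filesystem format, so long device names cannot wrap.
    df -P "$1" 2>/dev/null |
        awk 'NR == 2 { sub(/%/, "", $5); print $5; found = 1 } END { exit !found }'
}

# disk_ok PATH [LIMIT]: return 0 if usage is below LIMIT percent (default 80),
# 1 if it is at or above the limit, 2 if usage could not be determined.
disk_ok() {
    local pct
    pct=$(disk_usage_pct "$1") || return 2
    [ "$pct" -lt "${2:-80}" ]
}
```

A cron script could then run something like: disk_ok /tmp/wsprdaemon 90 || echo "disk nearly full".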
Log Analysis and Alerting
Automated Log Monitoring
Error Detection Script:
#!/bin/bash
# error_monitor.sh - Monitor for critical errors
LOG_DIR="/tmp/wsprdaemon"
ERROR_THRESHOLD=5
ALERT_EMAIL="admin@example.com"
# Count errors in log files modified in the last 10 minutes
# (every matching line in those files is counted, not only the recent ones)
ERROR_COUNT=$(find "$LOG_DIR" -name "*.log" -mmin -10 \
    -exec grep -c "ERROR\|CRITICAL\|FATAL" {} \; | \
    awk '{sum+=$1} END {print sum+0}')
if [ "$ERROR_COUNT" -gt "$ERROR_THRESHOLD" ]; then
    echo "High error rate detected: $ERROR_COUNT errors in logs updated in the last 10 minutes" | \
        mail -s "WsprDaemon Alert: High Error Rate" "$ALERT_EMAIL"
fi
Performance Degradation Detection:
#!/bin/bash
# performance_monitor.sh - Detect performance issues
# Decode success rate: share of logged decode cycles that produced spots, taken from
# logs modified in the last 30 minutes. Should be >80% during active periods; alert below 50%.
DECODE_RATE=$(find /tmp/wsprdaemon/decoding.d -name "*.log" -mmin -30 \
    -exec grep -h "spots decoded\|no spots" {} + 2>/dev/null | \
    awk '/no spots/{empty++; next} /spots decoded/{decoded++}
        END {total=decoded+empty; if (total>0) printf "%.1f\n", decoded/total*100; else print 0}')
if (( $(echo "$DECODE_RATE < 50" | bc -l) )); then
    echo "Low decode rate: $DECODE_RATE%" | \
        mail -s "WsprDaemon Alert: Low Decode Rate" admin@example.com
fi
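Alert scripts that run from cron will send the same mail on every run for as long as the condition persists. One way to limit this is a cooldown based on a small state file, sketched below. The function name should_alert and the state file path in the usage note are hypothetical.

```bash
# should_alert STATE_FILE COOLDOWN_SECONDS [NOW]
# Returns 0 and records NOW if no alert was recorded within the cooldown window,
# otherwise returns 1. NOW is an epoch timestamp and defaults to the current time.
should_alert() {
    local state_file="$1" cooldown="$2" now="${3:-$(date +%s)}" last=0
    [ -r "$state_file" ] && last=$(cat "$state_file")
    last=${last:-0}    # treat an empty state file as "never alerted"
    if [ $((now - last)) -ge "$cooldown" ]; then
        echo "$now" > "$state_file"
        return 0
    fi
    return 1
}
```

In an alert script the mail command would be wrapped as: if should_alert /tmp/wsprdaemon_alert.last 3600; then ... fi, which allows at most one mail per hour.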
Log Rotation and Archival
Automated Log Management:
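#!/bin/bash
# log_maintenance.sh - Automated log cleanup and archival
LOG_DIR="/tmp/wsprdaemon"
ARCHIVE_DIR="$HOME/wsprdaemon/archives"
RETENTION_DAYS=30
mkdir -p "$ARCHIVE_DIR"
# Archive old logs
find "$LOG_DIR" -name "*.log" -mtime +1 -exec gzip {} \;
find "$LOG_DIR" -name "*.log.gz" -mtime +7 -exec mv {} "$ARCHIVE_DIR"/ \;
# Clean up very old archives
find "$ARCHIVE_DIR" -name "*.log.gz" -mtime +"$RETENTION_DAYS" -delete
# Rotate large current logs: compress a timestamped copy, then truncate the original in place.
# (logrotate expects a configuration file as its argument, not a log file, so it cannot be run on {} directly.)
find "$LOG_DIR" -name "*.log" -size +10M \
    -exec sh -c 'gzip -c "$1" > "${1%.log}.$(date +%Y%m%d%H%M%S).log.gz" && : > "$1"' _ {} \;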
Health Monitoring Dashboard
Custom Monitoring Script
#!/bin/bash
# wsprdaemon_dashboard.sh - Comprehensive health dashboard
# Clear the screen only when attached to a terminal (cron redirects output to a file)
[ -t 1 ] && clear
echo "==============================================="
echo " WsprDaemon Health Dashboard"
echo " $(date)"
echo "==============================================="
# System Status
echo
echo "=== SYSTEM STATUS ==="
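if pgrep -f 'wsprdaemon\.sh' > /dev/null; then
    echo "✓ WsprDaemon: RUNNING"
    # Count wsprdaemon.sh only: a bare "wsprdaemon" pattern also matches this dashboard script
    PROCESS_COUNT=$(pgrep -f 'wsprdaemon\.sh' | wc -l)
    echo " Active processes: $PROCESS_COUNT"
else
    echo "✗ WsprDaemon: STOPPED"
fi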
# Resource Usage
echo
echo "=== RESOURCE USAGE ==="
LOAD=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//')
echo "System load: $LOAD"
DISK_USAGE=$(df /tmp/wsprdaemon 2>/dev/null | tail -1 | awk '{print $5}' | sed 's/%//')
if [ -n "$DISK_USAGE" ]; then
    if [ "$DISK_USAGE" -lt 80 ]; then
        echo "✓ Disk usage: ${DISK_USAGE}%"
    else
        echo "⚠ Disk usage: ${DISK_USAGE}% (HIGH)"
    fi
fi
# Recent Activity
echo
echo "=== RECENT ACTIVITY (Last 30 minutes) ==="
RECENT_DECODES=$(find /tmp/wsprdaemon/decoding.d -name "*.log" -mmin -30 \
-exec grep -c "spots decoded" {} \; 2>/dev/null | awk '{sum+=$1} END {print sum+0}')
echo "Successful decodes: $RECENT_DECODES"
RECENT_UPLOADS=$(find /tmp/wsprdaemon/uploads.d -name "*.log" -mmin -30 \
-exec grep -c "SUCCESS" {} \; 2>/dev/null | awk '{sum+=$1} END {print sum+0}')
echo "Successful uploads: $RECENT_UPLOADS"
# Error Summary
echo
echo "=== ERROR SUMMARY (Last hour) ==="
ERROR_COUNT=$(find /tmp/wsprdaemon -name "*.log" -mmin -60 \
-exec grep -c "ERROR" {} \; 2>/dev/null | awk '{sum+=$1} END {print sum+0}')
WARNING_COUNT=$(find /tmp/wsprdaemon -name "*.log" -mmin -60 \
-exec grep -c "WARNING" {} \; 2>/dev/null | awk '{sum+=$1} END {print sum+0}')
if [ "$ERROR_COUNT" -eq 0 ]; then
    echo "✓ Errors: $ERROR_COUNT"
else
    echo "⚠ Errors: $ERROR_COUNT"
fi
if [ "$WARNING_COUNT" -lt 5 ]; then
    echo "✓ Warnings: $WARNING_COUNT"
else
    echo "⚠ Warnings: $WARNING_COUNT"
fi
echo
echo "==============================================="
External Monitoring Integration
Prometheus Metrics Export
Metrics Collection Script:
#!/bin/bash
# prometheus_exporter.sh - Export WsprDaemon metrics
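# Keep the metrics file in its own directory so the HTTP server below does not expose all of /tmp
METRICS_DIR="/tmp/wsprdaemon_metrics"
METRICS_FILE="$METRICS_DIR/wsprdaemon_metrics.prom"
mkdir -p "$METRICS_DIR"
cat > "$METRICS_FILE" << EOF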
# Values below cover a sliding one-hour window and can decrease, so they are typed as gauges.
# HELP wsprdaemon_processes_total Number of active WsprDaemon processes
# TYPE wsprdaemon_processes_total gauge
wsprdaemon_processes_total $(pgrep -f 'wsprdaemon\.sh' | wc -l)
# HELP wsprdaemon_decodes_total Total successful decodes in last hour
# TYPE wsprdaemon_decodes_total gauge
wsprdaemon_decodes_total $(find /tmp/wsprdaemon/decoding.d -name "*.log" -mmin -60 -exec grep -c "spots decoded" {} \; | awk '{sum+=$1} END {print sum+0}')
# HELP wsprdaemon_uploads_total Total successful uploads in last hour
# TYPE wsprdaemon_uploads_total gauge
wsprdaemon_uploads_total $(find /tmp/wsprdaemon/uploads.d -name "*.log" -mmin -60 -exec grep -c "SUCCESS" {} \; | awk '{sum+=$1} END {print sum+0}')
# HELP wsprdaemon_errors_total Total errors in last hour
# TYPE wsprdaemon_errors_total gauge
wsprdaemon_errors_total $(find /tmp/wsprdaemon -name "*.log" -mmin -60 -exec grep -c "ERROR" {} \; | awk '{sum+=$1} END {print sum+0}')
EOF
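# Serve the metrics directory on port 9090 (9090 is also the default port of the Prometheus server itself).
# The file is exposed at /wsprdaemon_metrics.prom, so the scrape job needs a matching metrics_path.
# Start the server once; later runs only need to regenerate the file above.
python3 -m http.server 9090 --directory "$(dirname "$METRICS_FILE")" &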
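When the metrics file is regenerated while a scrape is in progress, the scraper can read a partially written file. A common remedy is to write to a temporary file and rename it into place. The sketch below shows that pattern together with a small formatter for the text exposition format. Both function names are hypothetical, and the metric name in the usage note is only an example.

```bash
# emit_metric NAME TYPE HELP VALUE: print one metric in the Prometheus text format.
emit_metric() {
    printf '# HELP %s %s\n# TYPE %s %s\n%s %s\n' "$1" "$3" "$1" "$2" "$1" "$4"
}

# write_metrics_atomically FILE: read metrics text on stdin and replace FILE in one
# rename step, so a reader sees either the old or the new content, never a mix.
write_metrics_atomically() {
    local tmp
    # The temporary file lives next to FILE so that mv is a rename on the same filesystem.
    tmp=$(mktemp "$1.XXXXXX") || return 1
    cat > "$tmp" && chmod 644 "$tmp" && mv "$tmp" "$1"
}
```

Several emit_metric calls can be grouped in braces and piped into write_metrics_atomically, for example: { emit_metric wsprdaemon_errors gauge "Errors in last hour" "$count"; } | write_metrics_atomically "$METRICS_FILE".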
MQTT Status Publishing
MQTT Publisher Script:
#!/bin/bash
# mqtt_publisher.sh - Publish status to MQTT broker
MQTT_BROKER="192.168.1.50"
MQTT_TOPIC="wsprdaemon/status"
# Collect status data
# pgrep matches 'wsprdaemon\.sh' rather than a bare 'wsprdaemon', which would also count
# unrelated processes started from /home/wsprdaemon. disk_usage falls back to 0 if df fails
# (the earlier "|| echo 0" never ran, because a pipeline returns the status of its last command).
STATUS_JSON=$(cat << EOF
{
"timestamp": "$(date -Iseconds)",
"running": $(pgrep -f 'wsprdaemon\.sh' > /dev/null && echo "true" || echo "false"),
"processes": $(pgrep -f 'wsprdaemon\.sh' | wc -l),
"recent_decodes": $(find /tmp/wsprdaemon/decoding.d -name "*.log" -mmin -10 -exec grep -c "spots decoded" {} \; | awk '{sum+=$1} END {print sum+0}'),
"recent_uploads": $(find /tmp/wsprdaemon/uploads.d -name "*.log" -mmin -10 -exec grep -c "SUCCESS" {} \; | awk '{sum+=$1} END {print sum+0}'),
"disk_usage": $(df /tmp/wsprdaemon 2>/dev/null | awk 'NR>1 {v=$5} END {print v+0}')
}
EOF
)
# Publish to MQTT
mosquitto_pub -h "$MQTT_BROKER" -t "$MQTT_TOPIC" -m "$STATUS_JSON"
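A heredoc produces invalid JSON as soon as one of the embedded commands prints nothing. A more defensive variant, sketched below, builds the payload from already-collected values and substitutes 0 for anything that is not a plain non-negative integer. The function names json_int and build_status_json are hypothetical; the field names match the payload above.

```bash
# json_int VALUE: print VALUE if it consists only of digits, otherwise print 0.
json_int() {
    case "$1" in
        ''|*[!0-9]*) echo 0 ;;
        *) echo "$1" ;;
    esac
}

# build_status_json TIMESTAMP RUNNING PROCESSES DECODES UPLOADS DISK_PCT
# RUNNING is emitted as true only when it is exactly the string "true".
build_status_json() {
    local running=false
    [ "$2" = "true" ] && running=true
    printf '{"timestamp":"%s","running":%s,"processes":%s,"recent_decodes":%s,"recent_uploads":%s,"disk_usage":%s}\n' \
        "$1" "$running" "$(json_int "$3")" "$(json_int "$4")" "$(json_int "$5")" "$(json_int "$6")"
}
```

The result can be passed to mosquitto_pub with -m in the same way as STATUS_JSON.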
Automated Monitoring Setup
Cron Job Configuration
# Add to crontab (crontab -e)
# Health check every 5 minutes (health_check.sh is not shown in this guide; error_monitor.sh above could fill this role)
*/5 * * * * /home/wsprdaemon/scripts/health_check.sh
# Performance monitoring every 15 minutes
*/15 * * * * /home/wsprdaemon/scripts/performance_monitor.sh
# Log maintenance daily at 2 AM
0 2 * * * /home/wsprdaemon/scripts/log_maintenance.sh
# Status dashboard update every minute
* * * * * /home/wsprdaemon/scripts/wsprdaemon_dashboard.sh > /tmp/wsprdaemon_status.txt
# MQTT status publishing every 5 minutes
*/5 * * * * /home/wsprdaemon/scripts/mqtt_publisher.sh
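Jobs scheduled every minute or every five minutes can overlap if one run takes longer than its interval, for example when find has to walk a large log tree. The sketch below guards a command with a lock directory, relying on mkdir being atomic. The function name run_locked, the exit code 75, and the lock path in the usage note are assumptions for illustration.

```bash
# run_locked LOCK_DIR COMMAND [ARGS...]
# Runs COMMAND only if LOCK_DIR can be created. mkdir either creates the directory
# or fails, so only one caller at a time gets past it.
# Returns 75 if the lock is already held, otherwise COMMAND's exit status.
run_locked() {
    local lock_dir="$1" status
    shift
    mkdir "$lock_dir" 2>/dev/null || return 75
    "$@"
    status=$?
    rmdir "$lock_dir"
    return "$status"
}
```

A wrapper script called from cron could use: run_locked /tmp/wsprdaemon_perf.lock /home/wsprdaemon/scripts/performance_monitor.sh. If the process is killed before rmdir runs, the lock directory stays behind and has to be removed by hand.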
Systemd Service for Monitoring
# /etc/systemd/system/wsprdaemon-monitor.service
[Unit]
Description=WsprDaemon Monitoring Service
After=wsprdaemon.service
[Service]
Type=simple
User=wsprdaemon
ExecStart=/home/wsprdaemon/scripts/continuous_monitor.sh
Restart=always
RestartSec=30
[Install]
WantedBy=multi-user.target
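The unit file starts continuous_monitor.sh, which this guide does not show. A minimal sketch of what such a script could contain is a loop that runs a check at a fixed interval and keeps going when a check fails. The function name monitor_loop and its arguments are hypothetical.

```bash
# monitor_loop INTERVAL_SECONDS MAX_RUNS COMMAND [ARGS...]
# Runs COMMAND every INTERVAL_SECONDS. MAX_RUNS=0 means run until killed,
# which suits a systemd Type=simple service.
monitor_loop() {
    local interval="$1" max_runs="$2" runs=0
    shift 2
    while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
        # A failing check is reported on stderr but must not stop the loop.
        "$@" || echo "$(date '+%Y-%m-%dT%H:%M:%S') check failed: $*" >&2
        runs=$((runs + 1))
        sleep "$interval"
    done
}
```

The last line of continuous_monitor.sh could then be, for example: monitor_loop 60 0 /home/wsprdaemon/scripts/error_monitor.sh. Messages written to stderr end up in the journal when the script runs under systemd.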
Taken together, these commands and scripts give real-time visibility into WsprDaemon operation, automated alerts when error or decode-rate thresholds are crossed, and integration with external monitoring systems such as Prometheus and MQTT.