Nature's Digital Garden

❯

❯

HoYoverse Project

❯

Insight Improvement Directions

Insight -- Improvement Directions

Nov 30, 20251 min read

Modular Probe Engine (Plug-in Architecture)

Problem (v1):
Ping tests are hard-coded into the service (Layer-7 Application)

Improvement:
Introduce a probe interface, then implement:

HTTPProbe
gRPCProbe
TCPConnectProbe
DatabaseProbe (ClickHouse / Redis ping), etc.

Benefit:

Easily extendable
Other teams can contribute probes
Keeps Insight maintainable long-term

Baseline Normalization & Noise Reduction

Problem (v1):

Raw network metrics are noisy (daily variation, cross-region jitter).

Improvement:

Add:

Baseline calculation per hour-of-day
Deviation-based alerts instead of static thresholds
Jitter smoothing (EWMA)
Automatic outlier filtering

Benefit:

Far fewer false alerts
Better signal quality
Helps identify true performance regressions

Probe Scheduling Improvements

Problem (v1):

Probes run at fixed intervals; not load aware.

Improvement:

Adaptive frequency (increase probing when degradation detected)
Randomized intervals to avoid thundering herd
Sliding window metrics

Benefit:
Lower overhead + more accurate detection.

Historical Trend Analysis

Problem (v1):

Only real-time Prometheus metrics, no long-term analytics.

Improvement:

Export metrics to Thanos for long-term retention
Add trend dashboards (weekly/monthly latency patterns)
Correlate network latency with deployment events

Benefit:

Understand long-term baseline
Detect slow regressions
Improves incident root-cause analysis

Graph View

Modular Probe Engine (Plug-in Architecture)
Baseline Normalization & Noise Reduction
Probe Scheduling Improvements
Historical Trend Analysis

© 2025 Nature's Digital Garden

GitHub
Linkedin