PURR Data Self-Healing System BUG1

SHI XIAOLONG

15 Feb 2026 — 6 min read

Data Self-Healing System Bug Analysis Report (Round 7)

System Call Chain Overview

graph TD
    A["TradingOrchestrator.start()"] --> B["DataHealingOrchestrator.__init__()"]
    B --> C["heal_and_prepare(required_count)"]
    C --> D["Phase 1: _load_zscore_history()"]
    D --> E["Phase 2: _diagnose()"]
    E --> F["Phase 3: RepairExecutor.repair()"]
    F --> G["_find_kline_gaps()"]
    G --> H["KlineRepository.query_range()"]
    G --> I["KlineDataFiller.validate_continuity()"]
    F --> J["_fill_kline_gaps()"]
    J --> K["KlineDataFiller.fill_missing_data_precise()"]
    K --> L["_fetch_ohlcv_native() / Hyperliquid SDK"]
    F --> M["_repair_from_klines()"]
    M --> H
    M --> N["_extract_kline_window()"]
    M --> O["_build_analysis_record()"]
    C --> P["Phase 4: _final_assessment()"]

    style G fill:#ff6b6b,color:#fff
    style O fill:#ff6b6b,color:#fff
    style L fill:#ffa94d,color:#fff
    style D fill:#ffa94d,color:#fff

BUG #7-1: Off-by-one `end_time` in `_find_kline_gaps` means the K-line at the last missing_time is never found

Severity: 🔴 Critical

Causal chain

Input: missing_times = [T1, T2, T3]   (3 zscore gap timestamps)
                                       │
State change: end_time = max(missing_times) = T3
                                       │
Call path: _find_kline_gaps()
  → kline_repo.query_range(start_time, end_time=T3)
                                       │
Failure point: query_range SQL: "time >= start_time AND time < end_time"
        ← half-open interval [start, end); the K-line at T3 is excluded
                                       │
Root cause: _find_kline_gaps does not apply the +1 interval compensation that _repair_from_klines does

Detailed analysis

In [repair_executor.py:206](file:///Users/test/Downloads/Trading-in-websocket/src/utils/data_healing/repair_executor.py#L206):

# _find_kline_gaps
end_time = max(missing_times)  # ❌ excludes the max timestamp itself

Meanwhile, in the same file at [repair_executor.py:115](file:///Users/test/Downloads/Trading-in-websocket/src/utils/data_healing/repair_executor.py#L115), _repair_from_klines already fixes this problem:

# _repair_from_klines (already fixed)
end_time = max(missing_times) + timedelta(minutes=interval_min)  # ✅

However, _find_kline_gaps is missing the same fix.

The SQL in [query_range](file:///Users/test/Downloads/Trading-in-websocket/src/utils/database/timescaledb.py#L443-L483) uses time < %s (a half-open interval), so end_time = T3 means the K-line at T3 is never returned.

Consequences

Even if the K-line at T3 exists, it is classified as a "gap"
This triggers unnecessary API calls to backfill data that already exists
If missing_times has only one element [T]: start_time = T - lookback and end_time = T, so the K-line at the target timestamp itself is excluded, and validate_continuity() will always find a "gap" near the target point in the returned data
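The off-by-one can be reproduced without a database. The minimal sketch below assumes only the half-open `time >= start AND time < end` semantics quoted above; `query_range_inmem` is a hypothetical in-memory stand-in for `KlineRepository.query_range`, and the timestamps are made up.

```python
from datetime import datetime, timedelta
from typing import List

INTERVAL_MIN = 240  # 4h bars, as in the report


def query_range_inmem(klines: List[datetime], start: datetime, end: datetime) -> List[datetime]:
    """Hypothetical stand-in for query_range: returns bars in the half-open range [start, end)."""
    return [t for t in klines if start <= t < end]


base = datetime(2026, 2, 1)
# Six consecutive 4h bars, all of them present in storage
klines = [base + timedelta(minutes=INTERVAL_MIN * i) for i in range(6)]
missing_times = [klines[3], klines[4], klines[5]]  # T1, T2, T3

# Small 3-bar lookback to keep the example short (the report uses 130 bars)
start_time = min(missing_times) - timedelta(minutes=INTERVAL_MIN * 3)

# Buggy: end_time = max(missing_times), so T3 falls outside [start, end)
buggy = query_range_inmem(klines, start_time, max(missing_times))
# Fixed: extend end by one interval so T3 is inside [start, end)
fixed = query_range_inmem(klines, start_time, max(missing_times) + timedelta(minutes=INTERVAL_MIN))

print(len(buggy), missing_times[-1] in buggy)  # 5 False
print(len(fixed), missing_times[-1] in fixed)  # 6 True
```

The bar at T3 exists in `klines` in both cases; only the query window decides whether the caller sees it.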

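BUG #7-2: `_build_analysis_record` writes `zscore_4h` unconditionally, overwriting the wrong column for non-4h repairs

Severity: 🟡 Medium

Causal chain

Input: timeframe = '4h' (the only current use case, but the method signature accepts any timeframe)
                                       │
State change: _build_analysis_record builds the record to be written
                                       │
Call path: repair_executor.py:320
    zscore_4h=zscore,   # ← whatever the timeframe, zscore is always written to zscore_4h
                                       │
Failure point: if timeframe='5m', zscore should be written only to zscore_5m;
        zscore_4h should be None (not computed)
                                       │
Root cause: zscore_5m/zscore_1h are written conditionally on timeframe, but zscore_4h is written unconditionally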

Detailed analysis

In [repair_executor.py:313-329](file:///Users/test/Downloads/Trading-in-websocket/src/utils/data_healing/repair_executor.py#L313-L329):

def _build_analysis_record(...) -> Dict:
    return {
        ...
        'zscore_5m': zscore if timeframe == '5m' else None,   # ✅ conditional write
        'zscore_1h': zscore if timeframe == '1h' else None,   # ✅ conditional write
        'zscore_4h': zscore,                                   # ❌ unconditional write!
        ...
    }

zscore_4h should be zscore if timeframe == '4h' else None, consistent with 5m/1h.
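A minimal sketch of the corrected column mapping, assuming the record is a plain dict keyed by `zscore_<timeframe>` as in the excerpt above. `build_zscore_columns` and `SUPPORTED_TIMEFRAMES` are hypothetical names, and the fields elided with `...` in the original are left out. Deriving every column from one loop keeps 4h consistent with 5m/1h by construction.

```python
from typing import Dict, Optional

SUPPORTED_TIMEFRAMES = ("5m", "1h", "4h")  # hypothetical: the three columns shown above


def build_zscore_columns(timeframe: str, zscore: float) -> Dict[str, Optional[float]]:
    """Write zscore only to the column matching timeframe; other columns stay None (not computed)."""
    if timeframe not in SUPPORTED_TIMEFRAMES:
        raise ValueError(f"unsupported timeframe: {timeframe!r}")
    return {
        f"zscore_{tf}": (zscore if tf == timeframe else None)
        for tf in SUPPORTED_TIMEFRAMES
    }


print(build_zscore_columns("5m", 1.5))
# {'zscore_5m': 1.5, 'zscore_1h': None, 'zscore_4h': None}
```

Rejecting unknown timeframes is a design choice in this sketch; it turns a silent wrong-column write into an immediate error.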

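Consequences

No current symptoms: self-healing only uses the 4h timeframe today
Latent risk: if repairs are extended to multiple timeframes, 5m/1h zscore values will overwrite the 4h column and corrupt the analysis results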

BUG #7-3: `KlineDataFiller.get_stats()` references `self.exchange`, raising `AttributeError` when the native SDK is used

Severity: 🟠 High

Causal chain

Input: exchange_id = 'hyperliquid'
                                       │
State change: __init__ takes the native SDK branch
    self.use_native_sdk = True
    self.info = Info(...)
    # self.exchange is never assigned!
                                       │
Call path: any call to get_stats()
                                       │
Failure point: kline_data_filler.py:858
    'exchange': self.exchange.id if self.exchange else None
                          ↑
              AttributeError: 'KlineDataFiller' object has no attribute 'exchange'
                                       │
Root cause: the native SDK branch never initializes self.exchange, and get_stats was not adapted for that branch

Detailed analysis

In the __init__ at [kline_data_filler.py:82-110](file:///Users/test/Downloads/Trading-in-websocket/src/utils/analysis/kline_data_filler.py#L82-L110):

if exchange_id == 'hyperliquid':
    self.info = Info(...)
    self.use_native_sdk = True           # self.exchange is not set
else:
    self.exchange = self._init_exchange(exchange_id)
    self.use_native_sdk = False

And in [kline_data_filler.py:847-859](file:///Users/test/Downloads/Trading-in-websocket/src/utils/analysis/kline_data_filler.py#L847-L859):

def get_stats(self) -> Dict:
    return {
        ...
        'exchange': self.exchange.id if self.exchange else None,  # ❌ AttributeError
    }

The if self.exchange guard does not help. Python evaluates the condition of a conditional expression first, and that condition is itself an attribute access: when self.exchange was never assigned, the access raises AttributeError immediately. A truthiness check protects against None, not against a missing attribute.
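The failure and both fixes from the priority table can be checked with a stripped-down class. `FillerSketch` is hypothetical and mirrors only the attribute layout described above (`use_native_sdk` set, `exchange` never assigned on the Hyperliquid branch). The `SimpleNamespace` object stands in for the exchange returned by `_init_exchange`, assumed here to be a ccxt-style object with an `.id`; `"binance"` is just a sample id.

```python
from types import SimpleNamespace
from typing import Dict, Optional


class FillerSketch:
    """Hypothetical stand-in that mirrors only KlineDataFiller's attribute layout."""

    def __init__(self, exchange_id: str, define_exchange_attr: bool = False):
        if define_exchange_attr:
            self.exchange = None  # Fix A: define the attribute on every branch
        if exchange_id == "hyperliquid":
            self.use_native_sdk = True  # native branch: self.exchange is not assigned here
        else:
            # Assumed ccxt-style exchange object exposing .id
            self.exchange = SimpleNamespace(id=exchange_id)
            self.use_native_sdk = False

    def get_stats_buggy(self) -> Dict[str, Optional[str]]:
        # The condition is evaluated first and raises if the attribute was never set
        return {"exchange": self.exchange.id if self.exchange else None}

    def get_stats_getattr(self) -> Dict[str, Optional[str]]:
        # Fix B: getattr with a default tolerates a missing attribute
        exchange = getattr(self, "exchange", None)
        return {"exchange": exchange.id if exchange else None}


try:
    FillerSketch("hyperliquid").get_stats_buggy()
except AttributeError:
    print("AttributeError on the native SDK branch")

print(FillerSketch("hyperliquid", define_exchange_attr=True).get_stats_buggy())  # {'exchange': None}
print(FillerSketch("hyperliquid").get_stats_getattr())  # {'exchange': None}
print(FillerSketch("binance").get_stats_getattr())  # {'exchange': 'binance'}
```

Fix A is usually the cleaner choice, because every method can then rely on the attribute existing; Fix B only protects the one call site that uses `getattr`.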

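Consequences

Any monitoring or diagnostic code that calls get_stats() crashes outright in Hyperliquid mode
Although get_stats() is not currently called on the main self-healing path, this is a time-bomb bug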

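BUG #7-4: `DISTINCT ON` + `DESC` ordering in `_load_zscore_history` means the truncation keeps the newest N rows

Severity: 🟢 Low (affects log accuracy only)

Causal chain

Input: required_count = 144, the database holds 200 records
                                       │
State change: SQL returns rows (ordered by kline_time DESC)
    rows = [T200, T199, T198, ..., T1]  (newest first)
                                       │
Call path: orchestrator.py:479
    records = list(reversed(rows[:required_count]))
    ↓
    rows[:144] = [T200, T199, ..., T57]  ← takes the newest 144
    reversed   = [T57, T58, ..., T200]  ← reversed into ascending order
                                       │
Analysis: this behavior is actually **correct**: it takes the newest 144 rows,
      then hands the complete, ordered sequence to diagnose for the continuity check
                                       │
But look at the elif branch (line 486):
    elif rows:
        records = rows   ← still in DESC order!
                                       │
    Then at line 495:
        final_records = sorted(records, key=lambda r: r['kline_time'])
                                       │
Analysis: the final sorted() restores the order, so the behavior is correct.
      The elif branch does not truncate, but when rows has only 100 entries (< 144),
      all 100 are kept and the downstream diagnose step handles the shortfall correctly.

Conclusion

Strictly speaking this is not a bug: the logic is correct, and sorted() fixes the ordering at the end. But the code is hard to read, because a reader can easily assume that rows[:required_count] under DESC ordering keeps the "oldest N rows".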
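To make the intent explicit, the truncation can be written in ascending order from the start. The sketch below uses hypothetical integer `kline_time` values in place of timestamps and checks that `reversed(rows[:n])` on DESC rows yields the same list as sorting ascending and keeping the last `n`. `newest_n_ascending` is a hypothetical helper name; its `[-n:]` slice also keeps every row when fewer than `n` exist, which covers the `elif rows:` case.

```python
from typing import Dict, List


def newest_n_ascending(rows: List[Dict[str, int]], n: int) -> List[Dict[str, int]]:
    """Keep the newest n rows, returned oldest-first (ascending kline_time)."""
    if n <= 0:
        return []  # guard: a [-0:] slice would return every row
    return sorted(rows, key=lambda r: r["kline_time"])[-n:]


# 200 rows in the order the DESC query returns them: newest first
rows_desc = [{"kline_time": t} for t in range(200, 0, -1)]

current = list(reversed(rows_desc[:144]))      # existing pattern (orchestrator.py:479)
explicit = newest_n_ascending(rows_desc, 144)  # ascending-first rewrite

print(current == explicit, explicit[0]["kline_time"], explicit[-1]["kline_time"])  # True 57 200
print(len(newest_n_ascending(rows_desc[:100], 144)))  # 100
```

A single function for both the full and the short case removes the need for the separate branch and the late `sorted()` call.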

BUG #7-5: `_find_kline_gaps` and `_repair_from_klines` use inconsistent `end_time` values, so the gap-detection and repair ranges are misaligned

Severity: 🔴 Critical (linked to BUG #7-1)

Causal chain

Input: missing_times = [T1, T2, T3], interval = 4h
                                       │
_find_kline_gaps:
    start_time = T1 - 130*4h
    end_time   = T3                    ← ❌ excludes T3
    → query_range: [start_time, T3)    ← K-line at T3 is missing
    → validate_continuity              ← may falsely report a "gap" near T3
    → spurious kline_gaps entry containing T3
                                       │
_fill_kline_gaps:
    → fill_missing_data_precise(kline_gaps containing T3)
    → API request fetches data around T3   ← ⚠️ re-fetches data that may already exist
    → batch_upsert_copy(on_conflict='ignore')
                                       │
_repair_from_klines:
    start_time = T1 - 130*4h
    end_time   = T3 + 4h              ← ✅ includes T3
    → query_range: [start_time, T3+4h) ← includes the K-line at T3
                                       │
Root cause: the two methods handle end_time inconsistently (_repair_from_klines adds +1 interval,
      _find_kline_gaps does not), which leads to:
      1. false gaps reported during detection
      2. unnecessary API calls
      3. a wasted 10-minute cooldown window
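One way to keep the two methods from drifting apart again is to compute the query window in a single place and call it from both. `kline_query_window` below is a hypothetical helper, not part of the codebase; the 130-bar lookback and 4h interval are taken from the causal chain above, and the dates are made up.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def kline_query_window(
    missing_times: List[datetime], lookback_bars: int, interval_min: int
) -> Tuple[datetime, datetime]:
    """Return (start, end) for a half-open [start, end) query that includes max(missing_times)."""
    if not missing_times:
        raise ValueError("missing_times must not be empty")
    step = timedelta(minutes=interval_min)
    start = min(missing_times) - lookback_bars * step
    end = max(missing_times) + step  # +1 interval so the last missing bar lies inside the range
    return start, end


t3 = datetime(2026, 2, 10, 8)
missing = [t3 - timedelta(hours=8), t3 - timedelta(hours=4), t3]  # T1, T2, T3
start, end = kline_query_window(missing, lookback_bars=130, interval_min=240)
print(start <= t3 < end, end - t3)  # True 4:00:00
```

If both `_find_kline_gaps` and `_repair_from_klines` obtained their range from one such function, gap detection and repair would always see the same bars.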

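Compound effect

The most damaging scenario is a wasted cooldown window:

_find_kline_gaps falsely reports a gap near T3
_fill_kline_gaps triggers fill_missing_data_precise once for each of the two symbols
Each (symbol, timeframe) pair enters a 10-minute cooldown
If real gaps remain after the first repair round (for example, the K-lines at T1 and T2 are genuinely missing), _fill_kline_gaps skips them in the second iteration because of the cooldown, so the genuinely missing data is never backfilled
The repair loop makes no progress and terminates early with repaired_count == 0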
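The cooldown interaction can be illustrated with a per-key gate. `CooldownGate` is a hypothetical sketch that assumes only what the report states (a 10-minute cooldown keyed by `(symbol, timeframe)`); the real implementation may store and check timestamps differently. Time is passed in as plain seconds so the example is deterministic, and `"PURR"` is used as a sample symbol.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (symbol, timeframe)


class CooldownGate:
    """Allow at most one fill per (symbol, timeframe) within cooldown_s seconds."""

    def __init__(self, cooldown_s: float = 600.0):
        self.cooldown_s = cooldown_s
        self._last_fill: Dict[Key, float] = {}

    def try_acquire(self, key: Key, now: float) -> bool:
        last = self._last_fill.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False  # still cooling down: this fill is skipped
        self._last_fill[key] = now
        return True


gate = CooldownGate()
key = ("PURR", "4h")
first = gate.try_acquire(key, now=0.0)     # round 1: spurious fill near T3 consumes the window
second = gate.try_acquire(key, now=120.0)  # round 2, two minutes later: real gap is skipped
third = gate.try_acquire(key, now=600.0)   # allowed again once 10 minutes have elapsed
print(first, second, third)  # True False True
```

Because the gate is keyed only by `(symbol, timeframe)`, it cannot tell a spurious fill from a real one; removing the false gap (BUG #7-1) is what frees the window for the genuine repair.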

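Recommended fix priorities

| BUG | Severity | Fix complexity | Recommendation |
|---|---|---|---|
| #7-1 + #7-5 | 🔴 Critical | One line | Fix immediately: `end_time = max(missing_times) + timedelta(minutes=interval_min)` |
| #7-3 | 🟠 High | Three lines | Fix immediately: add `self.exchange = None` in `__init__`, or use `getattr` in `get_stats` |
| #7-2 | 🟡 Medium | One line | Recommended fix: `zscore_4h=zscore if timeframe == '4h' else None` |
| #7-4 | 🟢 Low | No fix | Logic is correct; readability issue only |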
