Bugs in Replacing CCXT with the Official Hyperliquid SDK: BUG4

SHI XIAOLONG

15 Feb 2026 — 8 min read

Hyperliquid SDK Candle (K-line) Request Bugs: Round 7 Causal-Chain Analysis (Audit of Remaining Bugs)

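Analysis date: 2026-02-15
Background: After the round 5 and round 6 fixes, the three bugs on the core path (kline_data_filler.py, hyperliquid_candles.py) are fixed.
This round audits the whole codebase for serious bugs that still exist.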

Status of the Round 5/6 Fixes

Bug	File	Status	Verification
BUG5 #1: `limit→endTime` semantic error	`kline_data_filler.py`	✅ Fixed	`fill_missing_data_precise` now uses `fetch_candles_range_with_retry(since_ms, until_ms)`
BUG5 #2: global lock held across the HTTP request	`hyperliquid_candles.py`	✅ Fixed	`info.candles_snapshot()` moved outside the lock (line 113)
BUG5 #3: pagination advanced by `+1ms`	`hyperliquid_candles.py`	✅ Fixed	Uses `+ interval_ms` with a forward-progress guard (lines 193-196)
BUG6: real-time path took the wrong branch	`realtime_kline_service_base.py`	✅ Fixed	Fixed together with BUG5 #1; `fill_missing_data_precise` now uses the correct API

[!NOTE]
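The bugs on the core production path are fixed. The residual bugs below were found during the audit, in the backfill script and the repair executor.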

BUG #1 (Critical 🔴): `backfill_all_data.py` advances pagination by `+1ms`, the same class of defect as the fixed BUG5 #3

Causal chain

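Stage	Detail
Input	`backfill_klines()` backfills PURR 4h candles from 2024-11-01 to the present
State change	The first page returns 1500 candles; `last_ts` = timestamp of the last one
Call path	`backfill_klines()` → `fetch_ohlcv_page()` → `fetch_candles_with_retry(since_ms, limit=1500)`
Failure point	`backfill_all_data.py:142`: `since_ms = last_ts + 1`
Root cause	The `+1ms` step assumes the API will not return the candle that opens at `last_ts` again. `fetch_candles_with_retry` passes `since_ms` straight through as `start_time_ms`, so every page after the first starts off the candle grid; if the API returns the candle that opens at `last_ts` again, that row is duplicated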

Walkthrough

# backfill_all_data.py:126-142
while since_ms < until_ms:
    ohlcv = fetch_ohlcv_page(info, symbol, tf, since_ms, KLINE_FILLER_API_LIMIT)
    if not ohlcv:
        break
    all_rows.extend(ohlcv)
    last_ts = ohlcv[-1][0]
    since_ms = last_ts + 1       # ← ❌ advances by +1ms

    if len(ohlcv) < KLINE_FILLER_API_LIMIT:
        break

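Problem A: fetch_candles_with_retry internally computes end_ms = since_ms + limit * interval_ms,
which sizes each page window sensibly when candles are contiguous. The outer loop, however, advances since_ms by +1ms rather than + interval_ms,
which is inconsistent with the fixed logic at hyperliquid_candles.py:193.

Problem B: all_rows.extend(ohlcv) does no de-duplication. If two adjacent pages overlap (the API returns the closed interval [startTime, endTime]),
the same candle is written to the database twice. batch_upsert_copy(on_conflict='update') resolves the conflict,
but the overlap still wastes API quota and write bandwidth.

Problem C: More subtly, fetch_candles_with_retry itself still uses the limit→endTime semantics (line 149):

end_ms = since_ms + min(limit, api_limit) * interval_ms

limit=1500 is large enough in the backfill scenario, but the function's semantics remain flawed:
it treats limit (a cap on row count) as a time span when computing endTime, and the Hyperliquid API may return
fewer than limit rows for that range (if part of the range has no data).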
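
To make the limit→endTime conversion concrete, here is a minimal arithmetic sketch. `interval_to_ms` is a hypothetical stand-in for the project's `_interval_ms` helper (its body is not shown in this post), and `max_rows_in_window` is an illustrative helper that assumes a half-open window; only the `end_ms` expression is taken from the quoted line.

```python
# Hypothetical stand-in for the project's _interval_ms helper.
_UNIT_MS = {"m": 60_000, "h": 3_600_000, "d": 86_400_000}


def interval_to_ms(tf: str) -> int:
    """Convert a timeframe string such as '5m', '1h' or '4h' to milliseconds."""
    return int(tf[:-1]) * _UNIT_MS[tf[-1]]


def window_end(since_ms: int, limit: int, api_limit: int, tf: str) -> int:
    """Same expression as the quoted line: limit is converted into a time span."""
    return since_ms + min(limit, api_limit) * interval_to_ms(tf)


def max_rows_in_window(since_ms: int, end_ms: int, tf: str, missing_slots: int = 0) -> int:
    """Rows a half-open window [since_ms, end_ms) can hold, minus empty slots."""
    return (end_ms - since_ms) // interval_to_ms(tf) - missing_slots


since = 0
end = window_end(since, 1500, 1500, "4h")
print(end - since)                               # 21600000000 ms, i.e. 250 days
print(max_rows_in_window(since, end, "4h"))      # 1500 when candles are contiguous
print(max_rows_in_window(since, end, "4h", 99))  # 1401 when 99 slots are empty
```

The last line is the case that matters: the window still spans 250 days, but the row count falls below limit, so any caller that reads "fewer than limit rows" as "end of data" draws the wrong conclusion.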

Trigger scenarios

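Normal scenario (contiguous candles):
  Page 1: since=T0, end=T0+1500*4h → returns [T0, T1, ..., T1499]
  since_ms = T1499 + 1ms                  ← off the candle grid; T1500 > T1499+1ms, so nothing is skipped
  Page 2: since=T1499+1, end=T1499+1+1500*4h → may return T1499 again
  → duplicated data ⚠️

Gap scenario (candles missing):
  Page 1: since=T0, end=T0+1500*4h → returns [T0, ..., T800] and [T900, ..., T1499] (T801~T899 have no data)
  len(ohlcv) = 1401 < 1500 → break
  → everything after T1499 is lost, although until_ms has not been reached ❌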
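
A small simulation shows how a short page ends the loop early. It is a sketch under stated assumptions: `fake_fetch_page` is a hypothetical stand-in for `fetch_ohlcv_page` that returns every candle whose open time falls in the half-open window implied by limit, and candles are reduced to their open timestamps. Only the control flow of `buggy_backfill` mirrors the loop quoted from backfill_all_data.py.

```python
from typing import List

INTERVAL_MS = 4 * 3_600_000   # 4h candles
LIMIT = 1500


def fake_fetch_page(candles: List[int], since_ms: int, limit: int) -> List[int]:
    """Hypothetical API: open times in [since_ms, since_ms + limit * interval)."""
    end_ms = since_ms + limit * INTERVAL_MS
    return [t for t in candles if since_ms <= t < end_ms][:limit]


def buggy_backfill(candles: List[int], since_ms: int, until_ms: int) -> List[int]:
    """Same control flow as the loop quoted from backfill_all_data.py."""
    rows: List[int] = []
    while since_ms < until_ms:
        page = fake_fetch_page(candles, since_ms, LIMIT)
        if not page:
            break
        rows.extend(page)
        since_ms = page[-1] + 1
        if len(page) < LIMIT:   # a gap makes the page short, so the loop exits
            break
    return rows


# 3000 candle slots; slots 801..899 are missing, as in the gap scenario.
all_slots = [i * INTERVAL_MS for i in range(3000)]
available = [t for i, t in enumerate(all_slots) if not 801 <= i <= 899]

got = buggy_backfill(available, 0, 3000 * INTERVAL_MS)
print(len(available), len(got))        # 2901 1401
print(got[-1] == 1499 * INTERVAL_MS)   # True: nothing after slot 1499 was fetched
```

Under these assumptions the loop stops after the first page with 1500 candles still unfetched, even though `since_ms` is far below `until_ms`.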

Fix

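# Option A: consistent with the fix in fetch_candles_range_with_retry
# (the `len(ohlcv) < KLINE_FILLER_API_LIMIT` break must also go, or a gap still ends the loop early)
interval_ms = _interval_ms(tf)
since_ms = last_ts + interval_ms   # ✅ skip a whole interval

# Option B: call the already-fixed fetch_candles_range_with_retry directly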
all_rows = fetch_candles_range_with_retry(
    info, symbol, tf, since_ms, until_ms,
    api_limit=KLINE_FILLER_API_LIMIT,
)
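
For readers who want to see the shape of a range-driven loop, the following is a minimal sketch, not the project's `fetch_candles_range_with_retry`. It is a variation on Option A: the cursor advances by the whole window rather than by `last_ts + interval_ms`, so a short or empty page cannot end the loop. `backfill_range` and the `fetch_page(start_ms, end_ms)` callable are hypothetical names, and the sketch assumes each window holds no more candles than the API returns per request.

```python
from typing import Callable, Dict, List

Row = List[float]  # [open_time_ms, open, high, low, close, volume]


def backfill_range(
    fetch_page: Callable[[int, int], List[Row]],
    since_ms: int,
    until_ms: int,
    interval_ms: int,
    page_size: int,
) -> List[Row]:
    """Page through the range in fixed time windows, de-duplicating on open time."""
    by_ts: Dict[int, Row] = {}
    cursor = since_ms
    while cursor < until_ms:
        window_end = min(cursor + page_size * interval_ms, until_ms)
        for row in fetch_page(cursor, window_end):
            by_ts[int(row[0])] = row   # overlapping pages collapse to one row
        # Advance by the window, not by the last row, so the cursor always
        # moves forward and a gap does not terminate the loop.
        cursor = window_end
    return [by_ts[ts] for ts in sorted(by_ts)]


INTERVAL = 4 * 3_600_000
slots = [
    [i * INTERVAL, 1.0, 1.0, 1.0, 1.0, 0.0]
    for i in range(3000)
    if not 801 <= i <= 899   # same gap as the trigger scenario
]


def fake_fetch(start_ms: int, end_ms: int) -> List[Row]:
    """Hypothetical API with a closed interval, so adjacent pages overlap."""
    return [r for r in slots if start_ms <= r[0] <= end_ms]


rows = backfill_range(fake_fetch, 0, 3000 * INTERVAL, INTERVAL, 1500)
print(len(rows))   # 2901: every available candle, each exactly once
```

Keying rows by open time addresses Problem B at the source, so the database upsert only has to handle rows that already existed before the run.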

BUG #2 (Medium 🟠): `repair_executor._build_analysis_record` hard-codes `zscore_4h = zscore`, writing the wrong Z-score field

Causal chain

Stage	Detail
Input	`RepairExecutor.repair(missing_times, symbol, base_symbol, timeframe='4h')`
State change	`_repair_from_klines()` computes the zscore value
Call path	`_repair_from_klines()` → `_compute_zscore()` → `_build_analysis_record(timeframe=timeframe, zscore=zscore)`
Failure point	`repair_executor.py:320`: `'zscore_4h': zscore` is written to `zscore_4h` whatever `timeframe` is
Root cause	`_build_analysis_record` assumes `timeframe` is always `'4h'`, but the method signature defaults to `timeframe='5m'`, and only `zscore_5m` and `zscore_1h` are assigned conditionally

Walkthrough

# repair_executor.py:303-329
@staticmethod
def _build_analysis_record(
    missing_time, symbol, base_symbol,
    timeframe, zscore, corr,
) -> Dict:
    return {
        ...
        'zscore_5m': zscore if timeframe == '5m' else None,   # ← conditional on timeframe
        'zscore_1h': zscore if timeframe == '1h' else None,   # ← conditional on timeframe
        'zscore_4h': zscore,                                   # ← ❌ unconditional assignment!
        'corr_5m_7d': corr if timeframe == '5m' else None,
        'corr_1h_30d': corr if timeframe == '1h' else None,
        'corr_4h_60d': corr if timeframe == '4h' else None,   # ← corr_4h is conditional
        ...
    }

Inconsistency:

zscore_5m / zscore_1h / corr_*: assigned conditionally on timeframe ✅
zscore_4h: always set to zscore, even when timeframe='5m' ❌

When timeframe='5m':

{
    'zscore_5m': zscore,  # ✅ correct
    'zscore_1h': None,    # ✅ correct
    'zscore_4h': zscore,  # ❌ wrong! the 5m zscore lands in the zscore_4h field
    'corr_4h_60d': None,  # ✅ correct (guarded by the condition)
}

Current impact

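The actual caller, RepairExecutor.repair(), defaults to timeframe='4h',
and the data self-healing system, DataHealingOrchestrator, also calls it with repair_timeframe='4h'.
So this bug is not triggered in production today.

It is a latent bug, though: if someone later extends the repair logic to other timeframes,
or adds a 5m/1h repair entry point, it will silently corrupt the z-score data.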

Fix

'zscore_4h': zscore if timeframe == '4h' else None,  # ✅ consistent with the other fields
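
One way to keep this class of bug from recurring is to derive the per-timeframe columns from a lookup table instead of repeating the condition on every line. The sketch below is illustrative only: the column names come from the excerpt above, while `timeframe_fields` and the two mapping dicts are hypothetical and not part of repair_executor.py.

```python
from typing import Dict, Optional

# Column names as they appear in the excerpt; the mappings are an assumption.
_ZSCORE_FIELD = {"5m": "zscore_5m", "1h": "zscore_1h", "4h": "zscore_4h"}
_CORR_FIELD = {"5m": "corr_5m_7d", "1h": "corr_1h_30d", "4h": "corr_4h_60d"}


def timeframe_fields(timeframe: str, zscore: float, corr: float) -> Dict[str, Optional[float]]:
    """Return the z-score and correlation columns with only one timeframe set."""
    if timeframe not in _ZSCORE_FIELD:
        raise ValueError(f"unsupported timeframe: {timeframe!r}")
    record: Dict[str, Optional[float]] = {name: None for name in _ZSCORE_FIELD.values()}
    record.update({name: None for name in _CORR_FIELD.values()})
    record[_ZSCORE_FIELD[timeframe]] = zscore
    record[_CORR_FIELD[timeframe]] = corr
    return record


# Regression check: exactly the matching pair of columns is populated.
for tf in ("5m", "1h", "4h"):
    rec = timeframe_fields(tf, zscore=1.5, corr=0.8)
    populated = sorted(k for k, v in rec.items() if v is not None)
    assert populated == sorted([_ZSCORE_FIELD[tf], _CORR_FIELD[tf]])
```

The loop at the end doubles as a regression test: had `zscore_4h` been set unconditionally, the 5m and 1h iterations would fail.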

BUG #3 (Low 🟡): legacy `fetch_candles_with_retry` still has the wrong `limit→endTime` semantics

Causal chain

Stage	Detail
Input	`backfill_all_data.py` calls `fetch_candles_with_retry(info, symbol, tf, since_ms, limit=1500)`
State change	The function computes `end_ms = since_ms + min(1500, 1500) * interval_ms`
Call path	`fetch_candles_with_retry()` → `fetch_candles(info, symbol, interval, since_ms, end_ms)`
Failure point	`hyperliquid_candles.py:149`: `end_ms = since_ms + min(limit, api_limit) * interval_ms`
Root cause	The conversion works when `limit` equals the number of candles actually expected and the candles are contiguous (as in the backfill scenario), but the semantics are still muddled

Analysis

The core path (kline_data_filler.py) now uses fetch_candles_range_with_retry exclusively
and no longer calls fetch_candles_with_retry. The only remaining caller is backfill_all_data.py.

Comparison of the two versions:

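fetch_candles_with_retry(since_ms, limit):
  → end_ms = since_ms + limit * interval_ms     ← limit means "time span / interval"
  → returns every candle in [since_ms, end_ms]
  → Problem: correct only when limit = (expected row count) and the candles are contiguous

fetch_candles_range_with_retry(since_ms, until_ms):
  → pages through every candle in [since_ms, until_ms]    ← proper time-range semantics
  → handles pagination, retries and the forward-progress guard automatically
  → ✅ clear, unambiguous semantics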

Fix

Switch backfill_all_data.py to fetch_candles_range_with_retry as well, then consider deprecating fetch_candles_with_retry.

Combined effect of the bugs

graph TD
    A["Full backfill script<br>backfill_all_data.py"] --> B["fetch_candles_with_retry<br>(limit→endTime semantics)"]
    B --> C["Pagination advances by +1ms<br>❌ BUG #1"]
    C --> D{"Gaps in the data?"}
    D -->|"Yes"| E["Early break<br>later data lost ❌"]
    D -->|"No"| F["Data may be duplicated<br>⚠️ extra write cost"]

    G["Repair executor<br>repair_executor.py"] --> H["_build_analysis_record"]
    H --> I["zscore_4h = zscore<br>❌ BUG #2 unconditional assignment"]
    I --> J{"timeframe = '4h'?"}
    J -->|"Yes"| K["✅ Correct by coincidence"]
    J -->|"No"| L["❌ Wrong zscore_4h<br>degraded data quality"]

    M["Legacy function<br>fetch_candles_with_retry"] --> N["limit semantics<br>= time span / interval"]
    N --> B

    style C fill:#ff6b6b,color:#fff
    style E fill:#ff4444,color:#fff
    style I fill:#ffa07a,color:#fff
    style L fill:#ff4444,color:#fff
    style K fill:#4CAF50,color:#fff

Priority ranking

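Priority	Bug	Severity	Trigger	Impact	Fix complexity
P1	BUG #1: `+1ms` pagination in the backfill script	Critical 🔴	A data gap is hit while backfilling candles	Data after the early break is lost; otherwise rows may be duplicated	Low (switch to `+interval_ms` or the `range` version)
P2	BUG #2: unconditional `zscore_4h` assignment	Medium 🟠	When `timeframe != '4h'`	z-score field silently wrong (not triggered today)	Very low (one-line condition)
P3	BUG #3: legacy `fetch_candles_with_retry`	Low 🟡	Any caller that uses the function	Muddled semantics that new code may misuse	Low (unify on the `range` version)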

Root cause summary

What the round 5/6 fixes missed:
  ┌──────────────────────────────────────────────────────────────────────┐
  │ Core path (kline_data_filler + hyperliquid_candles)                  │
  │ → all 3 bugs fixed ✅                                                │
  │ → fill_missing_data_precise uses the range API ✅                    │
  │ → global lock narrowed to the rate-limiting logic ✅                 │
  │ → pagination uses +interval_ms plus a progress guard ✅              │
  └──────────────────────────────────────────────────────────────────────┘

  ┌──────────────────────────────────────────────────────────────────────┐
  │ Peripheral code (backfill script + repair executor)                  │
  │ → same class of defect still present ❌                              │
  │ → backfill_all_data.py pagination not updated in sync                │
  │ → repair_executor._build_analysis_record sets fields inconsistently  │
  │ → legacy fetch_candles_with_retry not yet cleaned up                 │
  └──────────────────────────────────────────────────────────────────────┘

Relationship to the previous six rounds

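Round	Finding	Layer	Status
Round 1 (BUG1.md)	`_find_kline_gaps` conceptual confusion	Business layer	✅ Fixed
Round 2 (BUG2.md)	Hard-coded interval / no freshness check	Business layer	✅ Fixed
Round 3 (BUG3.md)	SQL parameterization / window boundaries	Business layer	✅ Fixed
Round 4 (BUG4.md)	Unbounded timeline / hard-coded cointegration	Business layer	❌ Pending
Round 5 (BUG5.md)	limit→endTime semantic error + global lock + pagination	Infrastructure layer	✅ Fixed
Round 6 (BUG6.md)	Same root cause surfacing on the real-time path	Infrastructure layer	✅ Fixed
Round 7 (this round)	Backfill script pagination defect + repair executor field error	Peripheral code	❌ Pending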

[!IMPORTANT]
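The bugs found in this round sit in peripheral code (the backfill script and the repair executor), not on the core production path.
BUG #1 (backfill pagination) affects data completeness, but only when the backfill script is run by hand.
BUG #2 (the zscore_4h field) is latent: the current repair callers always use timeframe='4h',
so it is not triggered in production yet, but it should be fixed soon so that a future change does not expose it.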
