
Fix XLSX ingestion memory spikes with streaming parser #42

Open
plasma16 wants to merge 1 commit into VectifyAI:main from plasma16:fix/xlsx-memory-spike

Conversation


@plasma16 plasma16 commented May 7, 2026

Summary

  • route .xlsx conversion through a streaming openpyxl reader (read_only=True, data_only=True)
  • cap scan bounds (max_rows=5000, max_cols=64) to prevent pathological worksheet ranges from exploding memory
  • stop scanning after sustained empty tails to avoid sparse-sheet runaway processing
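
The capped, tail-aware scan described above can be sketched as a small helper over a row iterator. This is a minimal illustration, not the PR's actual implementation: the function name `scan_rows` and the empty-tail threshold of 100 rows are assumptions (the PR does not state a threshold), while the `max_rows=5000` / `max_cols=64` caps come from the summary.

```python
from itertools import islice

def scan_rows(rows, max_rows=5000, max_cols=64, empty_tail=100):
    """Collect worksheet rows with bounded scanning.

    `rows` is any iterable of cell-value tuples (e.g. what
    openpyxl's ws.iter_rows(values_only=True) yields in
    read_only mode). Rows beyond `max_rows` and cells beyond
    `max_cols` are ignored; scanning stops early after
    `empty_tail` consecutive empty rows, and trailing empty
    rows are trimmed from the result.
    """
    def is_empty(cells):
        return all(c is None or str(c).strip() == "" for c in cells)

    kept, empty_run = [], 0
    for row in islice(rows, max_rows):          # hard row cap
        cells = list(row)[:max_cols]            # hard column cap
        if is_empty(cells):
            empty_run += 1
            if empty_run >= empty_tail:         # sustained empty tail: stop
                break
        else:
            empty_run = 0
        kept.append(cells)
    while kept and is_empty(kept[-1]):          # drop the trailing blanks we buffered
        kept.pop()
    return kept
```

In the streaming setup the summary describes, this would be fed from a read-only workbook, e.g. `load_workbook(path, read_only=True, data_only=True)` followed by `ws.iter_rows(values_only=True)`, so only one row is materialized at a time.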

Why

Some workbooks report huge used ranges (e.g. max_row=1048571) despite having very little real data, which can cause generic converters to consume excessive RAM.

Result

Significantly lower memory use during XLSX ingest while preserving useful sheet content for KB compilation.

@plasma16 plasma16 force-pushed the fix/xlsx-memory-spike branch from 9075e4c to c3a1f11 Compare May 7, 2026 07:30
