Mitigate PageIndex file-descriptor buildup during long PDF indexing by plasma16 · Pull Request #44 · VectifyAI/OpenKB

plasma16 · 2026-05-07T08:35:39Z

Purpose

Reduce the risk of hitting [Errno 24] Too many open files when indexing long PDFs with retries.

Changes

Add a best-effort PageIndex client cleanup helper in openkb/indexer.py.
Create a fresh PageIndexClient per retry attempt.
Explicitly close backend storage after failed attempts and at function exit.
Trigger gc.collect() between failed attempts to accelerate descriptor release in long runs.
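
The four changes above can be sketched roughly as follows. This is a minimal illustration, not the code in openkb/indexer.py: FakeClient, FakeBackend, the storage attribute, the index() method, and the helper names are all hypothetical stand-ins, since the real PageIndexClient API is not shown in this PR description.

```python
import gc


class FakeBackend:
    """Hypothetical backend storage that tracks whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class FakeClient:
    """Hypothetical stand-in for PageIndexClient; the real API may differ."""

    def __init__(self, fail):
        self.storage = FakeBackend()
        self._fail = fail

    def index(self, path):
        if self._fail:
            raise OSError(24, "Too many open files")
        return "doc-123"


def close_client_quietly(client):
    """Best-effort cleanup: close the backend if possible, never raise."""
    close = getattr(getattr(client, "storage", None), "close", None)
    if callable(close):
        try:
            close()
        except Exception:
            pass  # cleanup must not mask the original indexing error


def index_with_retries(make_client, path, attempts=3):
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        client = make_client()  # fresh client per attempt
        try:
            return client.index(path)
        except Exception as exc:
            last_error = exc
        finally:
            close_client_quietly(client)  # runs on success and on failure
        gc.collect()  # reached only after a failed attempt
    raise last_error
```

With a factory that fails twice and then succeeds, index_with_retries creates three clients, closes the storage of each one, and returns the document id from the third attempt.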

Why this helps

In local mode, repeated failed or retried indexing attempts can keep resources such as open file descriptors alive longer than expected. This patch releases them at the end of each attempt instead of relying on eventual garbage collection.
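
One general way a descriptor can outlive its owner in CPython is a reference cycle: reference counting cannot free the object, so the file stays open until the cyclic collector runs. Whether pageindex objects actually form such cycles is an assumption here, not something this PR confirms. The sketch below uses a hypothetical CyclicHolder class to show the effect of an explicit gc.collect().

```python
import gc
import os
import tempfile


class CyclicHolder:
    """Hypothetical object that owns an open file and sits in a reference cycle."""

    def __init__(self, path):
        self.fh = open(path, "rb")
        self.self_ref = self  # cycle: reference counting alone cannot free this


def fd_is_open(fd):
    """Return True if the descriptor number still refers to an open file."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collection from changing the timing
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)
    try:
        holder = CyclicHolder(path)
        fd = holder.fh.fileno()
        del holder  # the last external reference is gone, but the cycle remains
        open_before = fd_is_open(fd)
        gc.collect()  # the cyclic collector frees the holder and closes the file
        open_after = fd_is_open(fd)
        return open_before, open_after
    finally:
        os.remove(path)
        if was_enabled:
            gc.enable()
```

On CPython, demo() reports the descriptor as still open after del and as closed after gc.collect(). Closing the resource explicitly, as the cleanup helper in this PR does, avoids depending on collector timing at all.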

Test evidence

python3 -m py_compile openkb/indexer.py (pass)
Full pytest suite not run in this environment because pytest is not installed in system Python.

Notes

This is an OpenKB-side mitigation; resource handling in upstream pageindex can still be hardened further.

linuxuser and others added 6 commits May 7, 2026 15:14

Fix XLSX ingestion memory spikes with streaming parser

c3a1f11

Incrementally resume long-PDF ingestion via cached PageIndex doc_id

06c5954

Merge pull request VectifyAI#1 from plasma16/fix/xlsx-memory-spike

554d93b

Merge branch 'feat/long-pdf-resume' into main

c5e1752

Mitigate PageIndex file descriptor buildup on retries

7b9a00e

Harden PDF ingestion fallback and filename normalization

48f9041

plasma16 wants to merge 6 commits into VectifyAI:main from plasma16:fix/pageindex-fd-cleanup
