
Mitigate PageIndex file-descriptor buildup during long PDF indexing #44

Open
plasma16 wants to merge 6 commits into VectifyAI:main from plasma16:fix/pageindex-fd-cleanup

plasma16 commented May 7, 2026

Purpose

Reduce the risk of "[Errno 24] Too many open files" errors during long PDF indexing runs that involve retries.

Changes

  • Add a best-effort cleanup helper for PageIndex clients in openkb/indexer.py.
  • Create a fresh PageIndexClient per retry attempt.
  • Explicitly close backend storage after failed attempts and at function exit.
  • Trigger gc.collect() between failed attempts to accelerate descriptor release in long runs.
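
The retry pattern from the list above can be sketched roughly as follows. This is a hypothetical illustration, not the code in openkb/indexer.py: FakePageIndexClient, its storage attribute, close_client_best_effort, and index_with_retries are invented names, and the real PageIndexClient API may differ. The sketch creates a fresh client per attempt, closes storage and client in a finally block (so cleanup runs after failures and on the successful return), and calls gc.collect() after each failed attempt.

```python
import gc
import os
import tempfile
from typing import Any, Callable, List, Optional


class FakePageIndexClient:
    """Hypothetical stand-in for a local-mode PageIndex client (NOT the real API).

    It keeps a "storage" file open, like a backend holding a database or cache file.
    """

    def __init__(self, workdir: str) -> None:
        self.storage = open(os.path.join(workdir, "storage.db"), "a+b")

    def index(self, pdf_path: str) -> str:
        raise OSError(f"simulated failure while indexing {pdf_path}")


def close_client_best_effort(client: Optional[Any]) -> None:
    """Hypothetical helper: close the client's storage, then the client itself.

    Exceptions from close() are swallowed so cleanup never masks the indexing error.
    """
    if client is None:
        return
    for target in (getattr(client, "storage", None), client):
        close = getattr(target, "close", None)
        if callable(close):
            try:
                close()
            except Exception:
                pass


def index_with_retries(
    pdf_path: str,
    make_client: Callable[[], Any],
    max_attempts: int = 3,
) -> str:
    """Retry indexing with a fresh client per attempt (hypothetical sketch)."""
    last_error: Optional[BaseException] = None
    for _ in range(max_attempts):
        client = make_client()  # fresh client, so no state carries over between attempts
        try:
            return client.index(pdf_path)
        except Exception as exc:
            last_error = exc
        finally:
            close_client_best_effort(client)  # runs on success and on failure
        del client
        gc.collect()  # reclaim reference cycles that may still pin descriptors
    raise RuntimeError(f"indexing failed after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        created: List[FakePageIndexClient] = []

        def make_client() -> FakePageIndexClient:
            client = FakePageIndexClient(workdir)
            created.append(client)
            return client

        try:
            index_with_retries("long.pdf", make_client)
        except RuntimeError as err:
            print(err)  # indexing failed after 3 attempts
        print(all(c.storage.closed for c in created))  # True
```

Closing explicitly matters even though each client goes out of scope: the saved exception's traceback still references the failed attempt's frames (including the client bound as self), so without an explicit close its file could stay open until that exception is discarded.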

Why this helps

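In local mode, repeated failed or retried indexing attempts can keep client and storage resources, along with their open file descriptors, alive longer than expected. This patch releases those resources promptly after each attempt instead of relying on garbage collection to reclaim them eventually.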
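
To make the mechanism concrete, the snippet below uses plain Python file handles (not PageIndex objects) to count this process's open descriptors while a simulated retry loop keeps a reference to each attempt's file. The helper names open_fd_count and retained_descriptors are hypothetical, and open_fd_count assumes Linux (/proc/self/fd) or macOS (/dev/fd).

```python
import os
import tempfile
from typing import BinaryIO, List


def open_fd_count() -> int:
    """Count this process's open file descriptors (Linux or macOS only).

    os.listdir briefly opens the directory itself, but it does so on every
    call, so the difference between two counts is unaffected.
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir))
    raise RuntimeError("no per-process descriptor directory on this platform")


def retained_descriptors(attempts: int, close_each: bool) -> int:
    """Simulate retries that each open a file and leave a reference behind.

    Returns how many extra descriptors are still open after the loop.
    """
    retained: List[BinaryIO] = []
    with tempfile.TemporaryDirectory() as tmp:
        before = open_fd_count()
        for i in range(attempts):
            handle = open(os.path.join(tmp, f"attempt-{i}.bin"), "wb")
            retained.append(handle)  # a lingering reference keeps the fd alive
            if close_each:
                handle.close()  # explicit close releases the fd immediately
        extra = open_fd_count() - before
        for handle in retained:
            handle.close()  # tidy up before the directory is removed
    return extra


if __name__ == "__main__":
    print(retained_descriptors(20, close_each=False))  # 20
    print(retained_descriptors(20, close_each=True))  # 0
```

Without explicit closes the count grows by one per attempt, and on Unix it can keep growing until it reaches the soft limit reported by resource.getrlimit(resource.RLIMIT_NOFILE), at which point open() fails with Errno 24.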

Test evidence

  • python3 -m py_compile openkb/indexer.py (pass)
  • The full pytest suite was not run in this environment because pytest is not installed in the system Python.

Notes

This is an OpenKB-side mitigation; resource handling in the upstream pageindex package could still be hardened further.
