Incrementally resume long-PDF ingestion using cached PageIndex doc_id by plasma16 · Pull Request #43 · VectifyAI/OpenKB

plasma16 · 2026-05-07T07:31:03Z

Summary

- Add long-PDF ingest checkpoint state in .openkb/long_pdf_jobs.json.
- Cache doc_id and description after successful PageIndex indexing.
- On re-run, reuse the cached doc_id for long PDFs and retry compilation directly.
- Persist index/compile failure state for troubleshooting and incremental retry.
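
The checkpoint-and-resume flow described above could look roughly like the sketch below. It is not the code from this PR: only the file path `.openkb/long_pdf_jobs.json` and the cached `doc_id` and `description` fields come from the summary. The JSON layout, the `status` and `error` fields, and the names `load_jobs`, `save_jobs`, `ingest_long_pdf`, `index_fn`, and `compile_fn` are hypothetical stand-ins for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Tuple

# Path taken from the PR summary; everything else in this file is assumed.
STATE_FILE = Path(".openkb") / "long_pdf_jobs.json"


def load_jobs(state_file: Path = STATE_FILE) -> dict:
    """Return the saved job table, or an empty one if no checkpoint exists."""
    if state_file.exists():
        return json.loads(state_file.read_text(encoding="utf-8"))
    return {}


def save_jobs(jobs: dict, state_file: Path = STATE_FILE) -> None:
    """Write the job table via a temp file so a crash cannot truncate it."""
    state_file.parent.mkdir(parents=True, exist_ok=True)
    tmp = state_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(jobs, indent=2), encoding="utf-8")
    tmp.replace(state_file)


def ingest_long_pdf(
    key: str,
    index_fn: Callable[[str], Tuple[str, str]],  # hypothetical: returns (doc_id, description)
    compile_fn: Callable[[str, str], None],      # hypothetical: compiles from a cached doc_id
    state_file: Path = STATE_FILE,
) -> str:
    """Index once, cache doc_id, and retry only compilation on later runs."""
    jobs = load_jobs(state_file)
    job = jobs.get(key, {})

    # Index only when no doc_id has been cached for this document.
    if "doc_id" not in job:
        try:
            doc_id, description = index_fn(key)
        except Exception as exc:
            jobs[key] = {"status": "index_failed", "error": str(exc)}
            save_jobs(jobs, state_file)
            raise
        job = {"doc_id": doc_id, "description": description, "status": "indexed"}
        jobs[key] = job
        save_jobs(jobs, state_file)

    # Compilation always runs from the cached doc_id, so a retry skips indexing.
    try:
        compile_fn(job["doc_id"], job["description"])
    except Exception as exc:
        job.update(status="compile_failed", error=str(exc))
        jobs[key] = job
        save_jobs(jobs, state_file)
        raise

    job["status"] = "done"
    job.pop("error", None)
    jobs[key] = job
    save_jobs(jobs, state_file)
    return job["doc_id"]
```

In this sketch, a run that fails during compilation leaves `doc_id` in the state file, so the next call with the same key goes straight to `compile_fn`. A run that fails during indexing records only the failure, so the next call indexes again.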

Why

When long-PDF ingestion fails after indexing has succeeded, re-running currently re-indexes the same document. This change makes retries incremental for long PDFs while leaving the existing skip behavior unchanged for other file types.

Scope

- Only the long-document (long_pdf) ingestion path.
- No queue/cursor behavior for non-PDF files.

Incrementally resume long-PDF ingestion via cached PageIndex doc_id

06c5954

plasma16 wants to merge 1 commit into VectifyAI:main from plasma16:feat/long-pdf-resume
