Problem
The cognify pipeline runs `extract_graph_from_data` and `summarize_text` sequentially for each batch of chunks. Both tasks are LLM-bound and independent of each other, so they could run concurrently.
Current flow (sequential):

```
extract_graph_from_data(chunks)  # LLM calls
        ↓
summarize_text(chunks)           # LLM calls
```
Proposed flow (parallel):

```python
await asyncio.gather(
    extract_graph_from_data(chunks),  # LLM calls
    summarize_text(chunks),           # LLM calls
)
```
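A minimal, self-contained sketch of the concurrency win (the two coroutines below are placeholder stand-ins for the real pipeline tasks, not cognee's actual implementations):

```python
import asyncio

# Hypothetical stand-ins for the real pipeline tasks; the actual
# functions take the pipeline's chunk objects and call an LLM.
async def extract_graph_from_data(chunks):
    await asyncio.sleep(0.1)  # simulates LLM latency
    return [f"graph:{c}" for c in chunks]

async def summarize_text(chunks):
    await asyncio.sleep(0.1)  # simulates LLM latency
    return [f"summary:{c}" for c in chunks]

async def process_batch(chunks):
    # Both tasks start immediately; wall time is the max of the two
    # latencies rather than their sum, which is where the speedup
    # for LLM-bound work comes from.
    graphs, summaries = await asyncio.gather(
        extract_graph_from_data(chunks),
        summarize_text(chunks),
    )
    return graphs, summaries

graphs, summaries = asyncio.run(process_batch(["c1", "c2"]))
```

`asyncio.gather` preserves argument order in its results, so the two outputs can be unpacked positionally.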
Impact
On a dataset with multiple chunks, this roughly halves the LLM-bound processing time for the cognify pipeline. In our testing with FalkorDB and OpenRouter, a 6KB document went from ~45s to ~25s for the graph+summary phase.
Implementation Notes
- Both functions accept the same `data_chunks` input and produce independent outputs
- The results need to be merged before passing to `add_data_points`
- Care is needed with shared state (e.g., chunk objects should not be mutated by both tasks simultaneously)
- Could be implemented as a configuration option (parallel vs. sequential) for safety
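One way the configuration toggle and result merge could be sketched (the `parallel` flag, the stand-in tasks, and the dict-merge step are illustrative assumptions, not cognee's actual API):

```python
import asyncio

async def extract_graph_from_data(chunks):  # stand-in for the real task
    return {"graph_nodes": len(chunks)}

async def summarize_text(chunks):  # stand-in for the real task
    return {"summaries": len(chunks)}

async def run_graph_and_summary(chunks, parallel=True):
    """Run both tasks, concurrently or sequentially per config flag."""
    if parallel:
        graph, summary = await asyncio.gather(
            extract_graph_from_data(chunks),
            summarize_text(chunks),
        )
    else:
        # Sequential fallback for safety: identical results,
        # roughly double the LLM-bound wall time.
        graph = await extract_graph_from_data(chunks)
        summary = await summarize_text(chunks)
    # Merge the independent outputs before handing off to add_data_points
    return {**graph, **summary}

merged = asyncio.run(run_graph_and_summary(["a", "b"], parallel=True))
```

Because both modes return the same merged structure, downstream code such as `add_data_points` would not need to know which mode ran.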