Commit 9761fb4
feat(rag): add searchable metadata context to vector chunks for filename/path queries
Problem:
When codebases and documents are ingested into VectorRAG, content loses its
source context. Agents cannot answer queries like "What's in ChatWidget.swift?"
or "Show me PDFs about X" because filenames and paths are stored in metadata
but not searchable via semantic search.
Root Cause:
VectorRAG stores the filename and path in each chunk's metadata dictionary, but
semantic search only matches against a chunk's text (its CONTENT, plus the
context field), never its metadata. The context field existed but wasn't being
used for source identification: it held only generic text such as
"Text from document.title" or "Code from document.title".
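A minimal sketch of the problem, assuming a hypothetical `VectorChunk` model in which the searchable text is the context field followed by the content, and the metadata dictionary is never embedded. The type and property names are illustrative, not the actual VectorRAGService API.

```swift
import Foundation

// Hypothetical chunk model; field names mirror the commit message,
// but the real VectorRAGService types may differ.
struct VectorChunk {
    let content: String              // chunk text
    let context: String              // short source description (searchable)
    let metadata: [String: String]   // stored with the chunk, never embedded

    // Assumption: only context + content are embedded for semantic search.
    var searchableText: String {
        context.isEmpty ? content : "\(context)\n\(content)"
    }
}

let oldChunk = VectorChunk(
    content: "struct ChatWidget: View { /* ... */ }",
    context: "Code from document.title",
    metadata: ["filename": "ChatWidget.swift", "path": "Sources/UserInterface/Chat/"]
)

// The filename exists only in metadata, so a "ChatWidget.swift" query has
// nothing to match in the searchable text.
print(oldChunk.searchableText.contains("ChatWidget.swift")) // false
print(oldChunk.metadata["filename"] ?? "none")            // ChatWidget.swift
```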
Solution:
Enhanced the context field (which IS searchable) for all chunk types with
structured metadata that enables filename, path, and source-based queries:
Code Files:
- Context: "File: {filename} | Path: {relativePath} | Type: code"
- Example: "File: ChatWidget.swift | Path: Sources/UserInterface/Chat/ | Type: code"
- Enables: "Show me Swift files in UserInterface", "What's in ChatWidget.swift?"
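One way to derive the code-file context from a repo-relative path is sketched below. `codeChunkContext(relativeFilePath:)` is a hypothetical helper; the real chunkCodeDocument() may obtain the filename and directory from its own document fields instead of splitting a path string.

```swift
// Hypothetical helper; the real chunkCodeDocument() may assemble this differently.
// Builds "File: {filename} | Path: {relativePath} | Type: code" from a repo-relative path.
func codeChunkContext(relativeFilePath: String) -> String {
    let parts = relativeFilePath.split(separator: "/")
    let filename = parts.last.map { String($0) } ?? relativeFilePath
    // Directory portion, each component followed by "/" to match the example format.
    let directory = parts.dropLast().map { "\($0)/" }.joined()
    return "File: \(filename) | Path: \(directory) | Type: code"
}

print(codeChunkContext(relativeFilePath: "Sources/UserInterface/Chat/ChatWidget.swift"))
// File: ChatWidget.swift | Path: Sources/UserInterface/Chat/ | Type: code
```

Keeping the directory as a separate `Path:` field lets a query such as "Swift files in UserInterface" match on the folder name even when the filename itself says nothing about it.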
Documents:
- Context: "Document: {filename} | Type: {format}"
- Example: "Document: report.pdf | Type: PDF"
- Enables: "Find PDF documents about X", "Show me Word docs"
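The `{format}` value has to come from somewhere; a minimal sketch, assuming it is derived from the file extension. The extension-to-name table (including "Word" for .doc/.docx) is an assumption for illustration, not the service's actual mapping.

```swift
// Hypothetical extension-to-format table; the real service may detect formats
// differently (e.g. from the MIME type or the parser that handled the file).
func documentChunkContext(filename: String) -> String {
    var ext = ""
    // Ignore a leading dot so names like ".gitignore" have no extension.
    if let dot = filename.lastIndex(of: "."), dot != filename.startIndex {
        ext = filename[filename.index(after: dot)...].lowercased()
    }
    let knownFormats = ["pdf": "PDF", "doc": "Word", "docx": "Word",
                        "md": "Markdown", "txt": "Text"]
    let format = knownFormats[ext] ?? (ext.isEmpty ? "Unknown" : ext.uppercased())
    return "Document: \(filename) | Type: \(format)"
}

print(documentChunkContext(filename: "report.pdf")) // Document: report.pdf | Type: PDF
```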
Web Content:
- Context: "Web: {title} | Source: {domain}"
- Example: "Web: Swift Documentation | Source: swift.org"
- Enables: "What did we research from swift.org?", "Show web content about X"
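For web content, the `{domain}` can be read from the page URL with Foundation's `URL` type. The helper name and the choice to strip a leading "www." are assumptions made so that the output reads "swift.org" as in the example above.

```swift
import Foundation

// Hypothetical helper: derives {domain} from the page URL via URL(string:)?.host.
// Stripping "www." is an assumption, not confirmed behaviour of the service.
func webChunkContext(title: String, urlString: String) -> String {
    var domain = URL(string: urlString)?.host ?? "unknown"
    if domain.hasPrefix("www.") {
        domain.removeFirst(4)
    }
    return "Web: \(title) | Source: \(domain)"
}

print(webChunkContext(title: "Swift Documentation",
                      urlString: "https://www.swift.org/documentation/"))
// Web: Swift Documentation | Source: swift.org
```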
Conversations:
- Context: "Conversation: {title} | Turn: {turnNumber}"
- Example: "Conversation: Mermaid Fixes | Turn: 5"
- Enables: "What did we discuss in conversation X?", "Find turn 5"
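A sketch of per-turn conversation chunks, assuming one chunk per turn with 1-based turn numbers in display order. `ConversationTurn`, `ConversationChunk`, and `conversationChunks(title:turns:)` are hypothetical; the real chunkConversationDocument() may group or split turns differently.

```swift
// Hypothetical types; the real chunkConversationDocument() may differ.
struct ConversationTurn {
    let role: String
    let text: String
}

struct ConversationChunk {
    let context: String
    let content: String
}

// One chunk per turn; turn numbers are assumed to be 1-based, in display order.
func conversationChunks(title: String, turns: [ConversationTurn]) -> [ConversationChunk] {
    turns.enumerated().map { pair in
        ConversationChunk(
            context: "Conversation: \(title) | Turn: \(pair.offset + 1)",
            content: "\(pair.element.role): \(pair.element.text)"
        )
    }
}

// Five alternating turns: odd-numbered from the user, even-numbered from the assistant.
let turns = (1...5).map {
    ConversationTurn(role: $0 % 2 == 1 ? "user" : "assistant", text: "message \($0)")
}
let chunks = conversationChunks(title: "Mermaid Fixes", turns: turns)
print(chunks[4].context) // Conversation: Mermaid Fixes | Turn: 5
```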
Changes:
- VectorRAGService.chunkCodeDocument(): Enhanced code chunk context
- VectorRAGService.chunkTextDocument(): Enhanced document/web chunk context
- VectorRAGService.chunkConversationDocument(): Enhanced conversation context
Testing:
✅ Build: PASS
✅ All chunk types create context with source metadata
✅ Metadata remains backward compatible (stored in metadata dict)
Impact:
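Agents can now answer filename- and path-based queries without departing from
the vector DB's semantic search model: source information is matched through
the searchable context field rather than through lookups on the metadata
dictionary. The context field now carries structured source metadata while
remaining searchable.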
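To illustrate why a filename query now resolves, here is a toy ranking that scores chunks by the number of tokens shared between the query and each chunk's context plus content. This is a deliberately simple lexical stand-in, not the embedding-based semantic search VectorRAG uses; in either case, though, the filename has to be present in the searchable text before it can be matched. All names are hypothetical.

```swift
// Toy lexical scorer standing in for semantic search (NOT how embeddings work).
struct SearchChunk {
    let context: String
    let content: String
}

// Lowercased tokens; "." is kept so "chatwidget.swift" stays a single token.
func tokens(_ text: String) -> Set<String> {
    let pieces = text.lowercased().split(whereSeparator: {
        !$0.isLetter && !$0.isNumber && $0 != "."
    })
    return Set(pieces.map { String($0) })
}

// Returns the chunk sharing the most tokens with the query (first one wins ties).
func bestMatch(for query: String, in chunks: [SearchChunk]) -> SearchChunk? {
    let queryTokens = tokens(query)
    return chunks.max { a, b in
        queryTokens.intersection(tokens(a.context + " " + a.content)).count <
            queryTokens.intersection(tokens(b.context + " " + b.content)).count
    }
}

let corpus = [
    SearchChunk(context: "Document: report.pdf | Type: PDF",
                content: "Quarterly revenue grew across all regions."),
    SearchChunk(context: "File: ChatWidget.swift | Path: Sources/UserInterface/Chat/ | Type: code",
                content: "struct ChatWidget: View { /* ... */ }"),
]

print(bestMatch(for: "What's in ChatWidget.swift?", in: corpus)?.context ?? "no match")
// File: ChatWidget.swift | Path: Sources/UserInterface/Chat/ | Type: code
```

With the old generic context ("Code from document.title"), the second chunk would share no tokens with this query, because "ChatWidget.swift" appeared only in metadata.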
Training Use Case:
When training data is exported, filenames are now included in the context,
making it easier to correlate training examples with their source files and
giving models a better picture of the codebase structure.
1 parent 8e122a1 commit 9761fb4
3 files changed
Lines changed: 98 additions & 13 deletions