Commit 6fa6278
Make benchmark dataset scrubbing metadata-driven and unify HDF5/MFD load behavior (#653)
- Added metadata-controlled benchmark loader behavior via `load_behavior` with `LEGACY_SCRUB` and `NO_SCRUB`.
- Introduced `DataSetUtils.processDataSet(...)` as the central processing path and kept the old scrubbing path behind a deprecated compatibility method.
- Updated both `DataSetLoaderHDF5` and `DataSetLoaderMFD` to carry full `DataSetProperties` through loading instead of reducing metadata to only similarity.
- Removed HDF5 filename-based similarity inference and made curated dataset metadata authoritative.
- Expanded `dataset_metadata.yml` to include explicit `similarity_function` and `load_behavior` entries for curated HDF5 and MFD datasets.
- Added console reporting of the dataset similarity function so the effective metadata-supplied similarity is visible during indexing runs.

1 parent 18488b8 commit 6fa6278
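
The metadata-driven dispatch described in the bullets above can be illustrated with a minimal, self-contained sketch. This is not the code from this commit: the `Props` class is a hypothetical, simplified stand-in for `DataSetProperties`, the `processDataSet` signature is assumed, and the scrubbing rule (drop all-zero vectors and exact duplicates) is an assumption made only so the example has something to do. Only the names `LEGACY_SCRUB`, `NO_SCRUB`, `load_behavior`, and `similarity_function` come from the commit message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LoadBehaviorSketch {
    /** The two load_behavior values named in the commit message. */
    enum LoadBehavior { LEGACY_SCRUB, NO_SCRUB }

    /** Hypothetical, simplified stand-in for DataSetProperties. */
    static final class Props {
        final String similarityFunction; // e.g. the similarity_function metadata entry
        final LoadBehavior loadBehavior; // e.g. the load_behavior metadata entry

        Props(String similarityFunction, LoadBehavior loadBehavior) {
            this.similarityFunction = similarityFunction;
            this.loadBehavior = loadBehavior;
        }
    }

    /** Assumed shape of a central processing path: metadata decides whether to scrub. */
    static List<float[]> processDataSet(List<float[]> vectors, Props props) {
        // Report the metadata-supplied similarity so it is visible on the console.
        System.out.println("similarity function: " + props.similarityFunction);
        switch (props.loadBehavior) {
            case NO_SCRUB:
                return vectors; // pass the data through untouched
            case LEGACY_SCRUB:
                return scrub(vectors);
            default:
                throw new IllegalArgumentException("unknown load behavior: " + props.loadBehavior);
        }
    }

    /** Assumed scrubbing rule: drop all-zero vectors and exact duplicates. */
    static List<float[]> scrub(List<float[]> vectors) {
        Set<String> seen = new HashSet<>();
        List<float[]> kept = new ArrayList<>();
        for (float[] v : vectors) {
            if (isZero(v)) {
                continue;
            }
            // Arrays.toString gives a content-based key; adequate for a sketch.
            if (seen.add(Arrays.toString(v))) {
                kept.add(v);
            }
        }
        return kept;
    }

    static boolean isZero(float[] v) {
        for (float x : v) {
            if (x != 0f) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<float[]> data = Arrays.asList(
                new float[] {1f, 0f}, new float[] {0f, 0f},
                new float[] {1f, 0f}, new float[] {0f, 2f});
        // "COSINE" is a placeholder value for illustration.
        Props legacy = new Props("COSINE", LoadBehavior.LEGACY_SCRUB);
        Props raw = new Props("COSINE", LoadBehavior.NO_SCRUB);
        System.out.println("scrubbed size: " + processDataSet(data, legacy).size());
        System.out.println("raw size: " + processDataSet(data, raw).size());
    }
}
```

Keeping the decision in one method means both loaders can hand over the full properties object and stay free of format-specific scrubbing logic, which appears to be the intent of the change.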
17 files changed
Lines changed: 418 additions & 187 deletions
File tree
- jvector-examples
- src/main/java/io/github/jbellis/jvector/example
- benchmarks/datasets
- yaml-configs
Lines changed: 1 addition & 0 deletions
Lines changed: 18 additions & 40 deletions
Lines changed: 8 additions & 8 deletions
Lines changed: 24 additions & 13 deletions
Lines changed: 39 additions & 0 deletions