Pull requests: UKGovernmentBEIS/inspect_ai

Pull requests list

test(extensions): correct ChatMessage/GenerateConfig types and add hook-ordering regression
#3865 opened May 7, 2026 by sjawhar Contributor
1 task done
test(conftest): dynamic ThreadedMotoServer port and xdist worker session-finish guard
#3864 opened May 7, 2026 by sjawhar Contributor
2 tasks done
fix(model): GenerateConfig() shared mutable default in get_model and resolve_models
#3863 opened May 7, 2026 by sjawhar Contributor
1 task done
fix(log/eval): preflight ETag check on S3 conditional write
#3862 opened May 7, 2026 by sjawhar Contributor
1 task done
fix(view): support FastAPI 0.118+ where fastapi._compat.v2 is removed
#3861 opened May 7, 2026 by sjawhar Contributor
1 task done
feat(model): per-attempt ModelEvent retry accounting and timing
#3860 opened May 7, 2026 by sjawhar Contributor
1 task done
Stop operator-interrupted samples from failing eval on scorer error
#3859 opened May 7, 2026 by rasmusfaber Contributor
1 task done
Add machine-readable logger format
#3857 opened May 6, 2026 by mindbomber
Add direct link to one-time front-end submodule setup in project README
#3856 opened May 6, 2026 by glasnt
1 of 5 tasks
bedrock: drop unsupported sampling params for Claude 4.7+ (#3766)
#3855 opened May 6, 2026 by WatchTree-19
1 of 5 tasks
feat(sagemaker): Add prompt_logprobs support for SageMaker perplexity scoring
#3853 opened May 6, 2026 by avadali-amzn Contributor
2 of 5 tasks
Fix MMLU CLI command on Evals page (#3834)
#3852 opened May 6, 2026 by antnewman
2 of 5 tasks
Anthropic: skip top-level cache_control on Bedrock/Vertex
#3851 opened May 6, 2026 by jon-aisi
1 of 5 tasks
Add aggregate(key, agg=...) metric factory (#3735)
#3850 opened May 6, 2026 by antnewman
2 of 5 tasks
fix: preserve operator interrupt context when scoring fails
#3847 opened May 6, 2026 by revmischa Contributor Draft
3 tasks done
fix(eval-set): bump retry log timestamp to avoid clobbering failed log
#3837 opened May 5, 2026 by ransomr Collaborator
1 task done
hf_dataset: retry transient HF errors
#3836 opened May 5, 2026 by FazeelUsmani Contributor
2 of 5 tasks
Fetch pending sample data directly from S3 in viewer
#3835 opened May 5, 2026 by rasmusfaber Contributor
3 of 5 tasks
fix: store and aggregate results for cancelled eval runs
#3828 opened May 4, 2026 by PranshuSrivastava
1 of 5 tasks
Agent bridge: keep ChatMessageSystem.id stable across content mutations
#3806 opened Apr 30, 2026 by ezra-apollo
3 tasks done
Agent bridge: preserve ChatMessage and ToolCall ids across turns
#3805 opened Apr 30, 2026 by ezra-apollo
3 tasks done
Fix(scorer): strip % in numeric match for face-value comparisons
#3782 opened Apr 28, 2026 by RecreationalMath Contributor
2 of 5 tasks
add redirect to profiles.ini
#3702 opened Apr 17, 2026 by anthonyduong9 Contributor Draft
1 of 5 tasks