PyTerrier Integration #79

Open
seanmacavaney wants to merge 44 commits into AnswerDotAI:main from seanmacavaney:pyterrier

@seanmacavaney

This PR adds support for using rerankers in PyTerrier pipelines. It adds an optional [pyterrier] dependency, and the PyTerrier transformer is constructed using a new .as_pyterrier_transformer() method on the reranker (following the convention from LangChain).

The integration allows rerankers to be composed into larger experimental retrieval pipelines, e.g.:

import pyterrier as pt
from rerankers import Reranker
from ir_measures import nDCG

dataset = pt.get_dataset('irds:msmarco-passage/trec-dl-2019/judged')
index = pt.Artifact.from_hf('macavaney/msmarco-passage.pisa') # load a PISA index for fast bm25 retrieval
reranker = Reranker('cross-encoder')
pipeline = index.bm25() % 100 >> dataset.text_loader() >> reranker.as_pyterrier_transformer()

pt.Experiment(
    [pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    eval_metrics=[nDCG@10],
)
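
For readers unfamiliar with what a reranking stage does inside such a pipeline, here is a rough conceptual sketch. It is not the implementation in this PR: it uses plain dicts instead of the DataFrames PyTerrier passes between stages, and a toy term-overlap scorer (the hypothetical `overlap_score`) in place of a real cross-encoder. The `rerank_rows` helper is likewise an illustrative name. The row keys mirror the usual PyTerrier column names (qid, query, docno, text, score, rank), and ranks are assumed to start at 0.

```python
from collections import defaultdict


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: number of distinct query terms found in the text.

    Hypothetical stand-in for a real reranking model.
    """
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return float(len(query_terms & text_terms))


def rerank_rows(rows, score_fn=overlap_score):
    """Re-score and re-rank candidate rows independently for each query.

    Each row is a dict with 'qid', 'query', 'docno' and 'text' keys.
    Returns new rows with 'score' and 'rank' added (rank 0 is the best).
    """
    # Group candidates by query so each query is ranked on its own.
    by_query = defaultdict(list)
    for row in rows:
        by_query[row["qid"]].append(row)

    reranked = []
    for candidates in by_query.values():
        scored = [
            dict(row, score=score_fn(row["query"], row["text"]))
            for row in candidates
        ]
        # Highest score first; Python's sort is stable, so ties keep input order.
        scored.sort(key=lambda r: r["score"], reverse=True)
        for rank, row in enumerate(scored):
            reranked.append(dict(row, rank=rank))
    return reranked


candidates = [
    {"qid": "1", "query": "what is bm25", "docno": "d1",
     "text": "PISA is a fast search engine"},
    {"qid": "1", "query": "what is bm25", "docno": "d2",
     "text": "BM25 is a ranking function"},
]
results = rerank_rows(candidates)
print([(r["docno"], r["rank"]) for r in results])
```

In the pipeline above, the first-stage retriever (BM25 cut off at 100 results) and the text loader supply the candidate rows, and the reranker only changes their scores and order.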

pt_docs/index.rst contains the documentation that will be pulled through into the PyTerrier documentation.

@Mandeep-Rathee (who co-authored it) and I have been using it for a while, and it's working great. We figure it would be helpful for others too.

seanmacavaney and others added 30 commits September 2, 2024 11:03
explicitely -> explicitly
* inference_mode instead of no_grad

* cleaner verbosity

* add image prep

* add image documents

* monovlm ranker

* version bump

* update pyproject

* wip

* wip

* example fully functional
…also fixed tests as Document has doc_id and not id. Last, add .conda/ to .gitignore (AnswerDotAI#44)
* Adding Pinecone support

* Adding documentation
* remove pydantic

* remove tqdm

* readme and project

* readme & version

* update broken test (pydantic)
Changed import paths:
- vicuna_reranker.py -> listwise/vicuna_reranker.py
- zephyr_reranker.py -> listwise/zephyr_reranker.py  
- rank_gpt.py -> listwise/rank_gpt.py
* wip

* modify prompt
* support TWOLAR

* support ecorank

* upr

* no ecorank for now

* update reranker

* update init

* add flashrank kwargs

* add monobert support
* fix rankedrsults for transformer ranker

* add mxbai reranker v2 (qwen-based)

* version bump
* fix: removed no longer existent argument from call

* feat: add support for Isaacus reranking API