Our research uses the MS MARCO dataset, a collection of 8.8 million documents. Efficiently retrieving relevant documents from this large pool requires an indexing step, which we perform using the \texttt{pyterrier} API.
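As an illustration, the indexing step can be sketched as follows. This is a sketch under assumptions, not our exact setup: it presumes \texttt{pyterrier} is installed and uses its bundled \texttt{msmarco\_passage} dataset handle; the function name \texttt{build\_index} and the index directory are illustrative choices.

```python
def build_index(index_dir="./msmarco_index"):
    """Build a Terrier inverted index over the MS MARCO passage corpus.

    Sketch only: assumes pyterrier is installed. The corpus is
    downloaded on first use, and indexing all ~8.8M documents takes
    considerable time and disk space, so nothing runs until called.
    """
    import pyterrier as pt
    if not pt.started():
        pt.init()
    dataset = pt.get_dataset("msmarco_passage")
    # IterDictIndexer streams {'docno', 'text'} records into the index
    indexer = pt.IterDictIndexer(index_dir)
    return indexer.index(dataset.get_corpus_iter())
```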
Section \ref{sec:problem} will articulate the specific problem we aim to address. In Section \ref{sec:related} we will explore literature relevant to our retrieval pipelines. We will then elaborate on the methodologies behind these pipelines in Sections \ref{sec:baseline}, \ref{sec:baseline+rm3}, \ref{sec:doc2query-method}, and \ref{sec:doc2query-method+rm3}. Our report will conclude in Section \ref{sec:results}, where we evaluate and compare the effectiveness of our systems, highlighting the strengths and potential areas of enhancement in our advanced approaches.
The codebase for our project can be found on GitHub\footnote{URL: \url{https://github.com/CodingTil/2023_24---IRTM---Group-Project}}.
In this Section, we review research relevant to conversational search engines and the broader area of information retrieval. While some of the highlighted studies do not directly address conversational search or explicit information retrieval, their techniques remain valuable at various stages of the conversational retrieval process.
\subsection*{Pseudo-Relevance Feedback by Query Expansion}\label{sec:prf}
\section{Incorporating Pseudo-Relevance Feedback into Our Baseline}\label{sec:baseline+rm3}
As stated in Section \ref{sec:baseline}, the baseline method can be parameterized in a few different ways. For this evaluation, we utilized the following configurations: The document retrieval (\texttt{BM25}) used the default parameters from \texttt{pyterrier}\footnote{URL: \url{https://pyterrier.readthedocs.io/en/latest/terrier-retrieval.html}} to retrieve the 1000 most-relevant documents for each query. All 1000 documents were then reranked using the \texttt{monoT5} reranker. Because of the high computational cost of the \texttt{duoT5} reranker, only the top 50 of those 1000 documents were then reordered using this reranker.
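A hedged sketch of how such a pipeline can be composed with \texttt{pyterrier} operators follows. It assumes the \texttt{pyterrier\_t5} plugin provides the two rerankers; \texttt{index\_ref} and \texttt{dataset} are placeholder handles, and, for simplicity, this sketch discards documents below rank 50 rather than leaving them in their \texttt{monoT5} order.

```python
def build_baseline_pipeline(index_ref, dataset):
    """Compose BM25 -> monoT5 (1000 docs) -> duoT5 (top 50).

    Sketch: assumes pyterrier and the pyterrier_t5 plugin are
    installed; `index_ref` points at a built index and `dataset`
    supplies the passage text needed by the neural rerankers.
    """
    import pyterrier as pt
    from pyterrier_t5 import MonoT5ReRanker, DuoT5ReRanker

    bm25 = pt.BatchRetrieve(index_ref, wmodel="BM25", num_results=1000)
    mono = MonoT5ReRanker()
    duo = DuoT5ReRanker()
    # fetch passage text, rerank all 1000 candidates with monoT5,
    # then cut to the best 50 and reorder those with duoT5
    return (bm25 >> pt.text.get_text(dataset, "text") >> mono) % 50 >> duo
```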
For the extension of the baseline, the baseline + \texttt{RM3} method, we utilized the same configuration for these components. The \texttt{RM3} query expansion component was parameterized to expand the query by 26 terms, using the top 17 documents retrieved by the initial \texttt{BM25} retrieval.
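The extended pipeline can be sketched as follows, under the assumption that \texttt{pt.rewrite.RM3} is available in the installed \texttt{pyterrier} version; the function name is illustrative.

```python
def build_rm3_pipeline(index_ref):
    """Compose BM25 -> RM3 query expansion -> BM25.

    Sketch: assumes pyterrier (with Terrier's RM3 support) is
    installed and `index_ref` points at a built index.
    """
    import pyterrier as pt

    bm25 = pt.BatchRetrieve(index_ref, wmodel="BM25")
    # expand each query by 26 terms drawn from the top 17
    # feedback documents of the first-pass BM25 ranking
    rm3 = pt.rewrite.RM3(index_ref, fb_terms=26, fb_docs=17)
    return bm25 >> rm3 >> bm25
```

The expanded queries produced by this stage would then feed the same \texttt{monoT5}/\texttt{duoT5} reranking stages as in the baseline.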
Summarize and discuss different challenges you faced and how you solved those. Include interpretations of the key facts and trends you observed and pointed out in the Results Section. Which method performed best, and why? Speculate: What could you have done differently, and what consequences would that have had?
%%
%% If your work has an appendix, this is the place to put it.