Cross-subject transfer learning via CrossSubjectEvaluation (calibration_size)#1093
Cross-subject transfer learning via CrossSubjectEvaluation (calibration_size)#1093bruAristimunha wants to merge 16 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da2b30015c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| md["X_target_labeled" if labeled else "X_target_unlabeled"] = X[calib] | ||
| if labeled: | ||
| md["y_target_labeled"] = y[calib] |
There was a problem hiding this comment.
Reject labeled 100% calibration to avoid leaking test labels
When calibration_size=1.0 is combined with labeled=True, TransferSplitter makes calib identical to test, so this block routes y[test] as y_target_labeled before the same trials are scored. Any target-aware estimator that requests y_target_labeled can train on the labels of every evaluated sample, making the reported score invalid; please disallow this combination or avoid passing labels when the calibration slice overlaps the test slice.
Useful? React with 👍 / 👎.
…aluation
Run target-aware transfer protocols through the existing CrossSubjectEvaluation,
instead of a dedicated evaluation engine or a separate transfer module.
- CrossSubjectEvaluation gains calibration_size (+ calibration_labeled): when
> 0 it wraps its CrossSubjectSplitter in TransferSplitter, so each fold yields
(train, calibration, test).
- base.py consumes the split with `train, *cal, test` (parallel + serial paths);
the held-out calibration slice is routed RAW to the pipeline steps that
request it via sklearn metadata routing (subjects, X_target_unlabeled /
X_target_labeled). Plain pipelines request nothing and are unaffected -- the
estimator owns the target representation, so there is no transform-through-steps.
- TransferSplitter is the single generic transfer splitter (subject / session /
dataset).
Removes moabb/evaluations/transfer.py and CrossSubjectTransferSplitter.
Usage:
CrossSubjectEvaluation(..., calibration_size=0.2).process(pipelines)
da2b300 to
2b06658
Compare
- calibration_size / calibration_labeled now ride cv_kwargs instead of bespoke
CrossSubjectEvaluation __init__ params: read from self.cv_kwargs and stripped
before the inner CV. No __init__ override.
- Expose cv_class as a documented option (like WithinSessionEvaluation); it
composes with calibration.
- base.py / serial evaluate() read calibration_labeled from cv_kwargs.
Numerically identical to CrossSubjectTargetAwareEvaluation.process() on
BNCI2014_004 (max abs score diff 0.0) when the target-aware estimator covs the
raw target and declares matching fit/transform metadata requests.
Usage:
CrossSubjectEvaluation(..., cv_kwargs={"calibration_size": 0.2}).process(pipelines)
Move the transfer calibration into the splitter so CrossSubjectEvaluation's _create_splitter is the plain original (no .get / .pop / wrapper). - CrossSubjectSplitter gains a calibration_size param: yields (train, calib, test) when > 0, otherwise the usual (train, test). Removes TransferSplitter. - _resolve_cv now always merges self.cv_kwargs over the defaults (a latent fix), so calibration_size flows via cv_kwargs with the default cv_class too. - Drop calibration_labeled: _evaluate_fold offers all transfer kwargs (subjects, X_target_unlabeled, X_target_labeled, y_target_labeled) and metadata routing (consumes) keeps only what the estimator requested, so the estimator's set_fit_request decides labeled vs unlabeled. Numerically identical to CrossSubjectTargetAwareEvaluation.process() on BNCI2014_004 (max abs score diff 0.0).
|
I will need more time to analyze your code Monday. Here are my initial comments:
For example: would lead in the RPA transformer to: Not sure how this can be handled. I also remind you that we can do a Teams meeting and discuss. It will be easier to discuss some details in person. |
gcattan
left a comment
There was a problem hiding this comment.
Thanks guys for looking into this.
I see good comments on both side.
May be we can have a short call next week to synchronize, and decide a common implementation?
| with config_context(enable_metadata_routing=True): | ||
| step = _TransferRecorder().set_fit_request(subjects=True, X_target_unlabeled=True) | ||
| pipe = make_pipeline(Covariances("oas"), step, CSP(8), LDA()) | ||
| ds = FakeDataset(["left_hand", "right_hand"], n_subjects=3, n_sessions=2, seed=9) |
There was a problem hiding this comment.
@toncho11 does this lines answer your concern on RPA?
There was a problem hiding this comment.
No. Bacause the test does not check if X and X_target_unlabeled have the same representation. A real RPA step would need either transformed target covariance data, or it would need to compute target covariances internally from the raw calibration slice.
|
I am preparing some improvements to this PR. |
- Added separate prediction modes: blockwise vs trialwise. - Enabled strict one-trial-at-a-time testing. - Added unlabeled target adaptation with 20%, 50%, or 100% target data. - Added labeled target calibration with 20% or 50% target data. - Fixed unsafe 100% labeled target usage by forbidding it. - Ensured labels are routed only when the protocol allows labeled calibration. - Ensured unlabeled protocols never pass target labels. - Updated the splitter to produce optional `(train, calibration, test)` folds. - Made protocols easier to report clearly in benchmarks and papers.
|
@bruAristimunha I have created a new Pull Request to add to your current code. It is here: bruAristimunha#8 Next will be the adapatation of RPA. I am currently thinking on what is the best strategy. |
|
hey @toncho11! I merged, but as maintainer, you can direct commit here ;) https://tighten.com/insights/adding-commits-to-a-pull-request/ |
|
I'm happy to discuss today if you have any time, guys! |
|
Nice! Sending invitation for 3pm CET time, to discuss last point on RPA. |
|
So the problem is not about just RPA algorithm, it is about all Transfer Learnig classfiers / tansformers: If we use a pipeline like:
then RPA.fit(X, ...) receives X after Covariances(), so source data is covariance matrices. But the routed metadata X_target_unlabeled / X_target_labeled usually comes directly from the evaluation, so it is still raw epochs. Therefore RPA may receive: That is inconsistent. Solutions:
The problem is that the pipeline before RPA must be the fitted clone from the current CV fold, not the original pipeline object. Otherwise we risk using an unfitted transformer, the wrong fitted state, or leakage between folds. |
|
After our meeting: quick check on the solution proposed by @bruAristimunha I think using the MOABB paradigm's process pipeline could solve the representation problem only for preprocessing steps that do not need to be fitted. If the transformation is fixed or stateless, for example a deterministic conversion such as covariance computation with Covariances(), then both source data and target calibration data can be converted before entering the evaluation pipeline, and RPA will receive them in the same representation. But this does not work for preprocessing steps that need to be fitted on the training fold, such as CSP, PCA, scaling, feature selection, or other learned transformations. The MOABB paradigm's postprocess pipeline is fixed and is not trained with fit(), so it cannot learn fold-specific parameters. |
Runnable example: a target-aware Riemannian Alignment step plugs into the
standard CrossSubjectEvaluation using cv_kwargs={"calibration_size": ...}
for the (train, calibration, test) split, metadata routing (set_fit_request)
to receive `subjects` + the unlabeled target slice, and
Pipeline(transform_input=["X_target_unlabeled"]) so the raw slice is
transformed through Covariances() before the alignment step -- no dedicated
evaluation engine, no separate transfer module.
|
Thanks @toncho11 — I went deep on all three points. Short version: each is addressable on the existing 1. Trialwise / one-shotFor any inductive estimator (every standard Riemannian/CSP source-only pipeline), If you want the ironclad runtime guarantee that the estimator only ever sees one trial, that's a tiny reusable wrapper at the pipeline level — 2. Named modes for comparabilityAgreed that a free float fragments benchmarks. We can ship named presets so papers cite a name, not a raw number: from moabb.evaluations.protocols import CALIB_20, CALIB_50 # {"calibration_size": 0.2}, ...
CrossSubjectEvaluation(cv_kwargs=CALIB_20)Sugar over 3. Representation (RPA after
|
|
Thank you Bruno! Here are my quick thoughts: 1) Trialwise / one-shotIt was never about different results. I never doubted that. It was about preventing the client (of the MOABB library) from accessing the whole block. It is about imposing a restriction over the user. If you do not have this rule, somebody might say “OK, I have access to the whole, block, so I am allowed to use it”. It is about preventing cheating and confusion. The user gets only what he is allowed to receive. So the responsibility should be on the evaluation/benchmark side, not the user. The objective is to prevent both “cheating” and unintended accidental leakage. Somebody gets better results, he publishes and later for everyone’s disappointment, he was using unfairly (for the category/mode he has selected) extra data, he was not supposed to. 2) Named modes for comparabilityThis is a bit related to 1). The idea is to impose clear rules and limitations for fair, proper benchmarking. The idea is that once you select a mode, you get only what you are allowed and not what you request. You might request labeled data, but if the official benchmark mode you have selected says “unlabeled_50” then you will get only unlabeled 50%. The idea in general is to select not only calibration size, but also labeled/unlabeled, trialwise or blockwise. Also, I cannot say right now if with this new way you suggest we can easily set all the modes/protocols that we need. In my PR we have almost maximum flexibility which modes to define - now and in the future. I predict there will be changes. Someone or me might ask for adding more modes/protocols. 3) Representation (RPA after Covariances())Sounds great! I did not know this was possible with sklearn. Thank you! I need to study it further and test it. I am also busy with my other projects. So, I might react with delays. And I agree that a meeting is a good idea :) |
|
On the blockwise vs trialwise: If |
|
Ok, I think everyone agree on Point 2 and 3 (nice finding btw). On the blockwise vs trialwise, I see it requires more discussion. @bruAristimunha @toncho11 same time next week? |
…cross_val_predict/LeaveOneOut)
|
Follow-up on point 1 (trialwise) — there's a pure scikit-learn recipe, no custom wrapper: from sklearn.frozen import FrozenEstimator # >=1.6, blocks re-fit
from sklearn.model_selection import cross_val_predict, LeaveOneOut
pred = cross_val_predict(FrozenEstimator(source_model), X_target, y_target,
cv=LeaveOneOut(), method="predict")
|
Remove the public predict_mode API and related prediction-mode enum/wrapper.Keep trialwise evaluation as an official CsMode preset instead of exposing it as a general user-configurable option. The trialwise behavior is now handled by a small private helper function at scoring time, not by a public wrapper class.
|
Thank you Bruno. I have been working on improvements of my own this morning. I agree that the I took into account some of your comments and I removed the broader public
This keeps the API simpler while also still allowing MOABB to take responsibility for the trialwise benchmark protocol. |
|
Also: MOABB still supports both:
|
|
Quck summary of my last changes:
|
|
I think I get the point. Even not on purpose, user can still do honest mistake. |
|
Here are my thoughts on two points:
I can work/discuss on this only Wednesday. So we can make a meeting Wednesday or 16 July. I will be on vacation next week. Wednesday could be to just sync ideas and then we continue around 16 of July. |
|
let's talk wed in the morning? |
|
OK, I have sent you a Teams invitation for 10h30 this morning on your gmail. |
|
Or it might be better if you create a meeting and I join? If this works better. |
|
|
we keep the enum functionality |
…algorithm to adapt to the selected Cross Subject benchmark mode and also to refuse running in certain benchmark modes for which it is was designed for.
… the RPA, but now it also uses the cs_mode. The second is subject-prototype MDM classifier, which also allows to be used only in TRAIN_TRIALWISE mode. Some small comment updates to protocols.py.
|
My last contributions are:
|
Addresses #1077. The split-based alternative to the dedicated evaluation engine in #1091, following the direction discussed in #1077: target-aware transfer learning runs through the existing
CrossSubjectEvaluationwith a few lines of change — no new evaluation class, no separate transfer module.Usage
What it adds
TransferSplitter(base_splitter, calibration_size)(splitters.py) — a generic wrapper that carves the firstcalibration_sizefraction off the held-out group of any leave-one-group-out splitter, yielding(train, calibration, test). Gives subject-, session-, and dataset-transfer from one mechanism.CrossSubjectEvaluationgainscalibration_size(+calibration_labeled); when> 0,_create_splitterwraps itsCrossSubjectSplitterinTransferSplitter.base.pyconsumes the split withtrain, *cal, test(parallel + serial paths). The held-out calibration slice is routed raw to the pipeline steps that request it via scikit-learn metadata routing (set_fit_request):subjects,X_target_unlabeled/X_target_labeled. Plain pipelines request nothing →{}→ the fit is unchanged.Design notes
base.pydoes not transform it through the pipeline (no transform-through-steps). The transfer estimator owns the target representation. This is what keeps the change minimal.subjects/target data to estimators.Verification
Full evaluation + splitter test suites pass. Integration test (
test_cross_subject_calibration_*): a target-aware step receives the routedsubjects+ non-emptyX_target_unlabeledon every fold, and a plain pipeline runs untouched atcalibration_size=0.5.cc @toncho11 — concrete, minimal counter-proposal to #1091; an existing target-aware estimator (e.g. RPA) drops in by declaring
set_fit_request(subjects=True, X_target_unlabeled=True, ...).