fix: adjust NDR threshold behavior. Previously, NDR = 1 would overrid… by SuiYue-2308 · Pull Request #559 · GoekeLab/bambu

SuiYue-2308 · 2026-04-15T07:19:22Z

fix: adjust NDR threshold behavior

Type of Change

Bug fix (non-breaking change which fixes an issue)

Description

Previously, NDR = 1 would override remove.subset and min.sampleNumber. Now, NDR = 0.999 and NDR = 1 produce identical results except for the NDR threshold itself.

Implementation Details

Before, if the read class is a subset or low confidence transcript (can not pass the min.sampleNumber), it's NDR will be assigned to 1, which will be further reported when setting NDR = 1. Now, I set their NDR to NA, and add another check step before report novel transcript (!is.na(NDR)) to make sure the subset and low confidence are not included.

Impact of Changes

Briefly describe the impact of the implemented changes. For example:

Results are identical to the previous version when NDR = 0.999.
Now "NDR = 1" will not report subset and low confidence transcripts

Checklist

My code follows the style guidelines of this project.
I have performed a self-review of my own code.
I have commented my code, particularly in hard-to-understand areas.
I have tested the code on a full dataset, and any differences have been described in the Impact of Changes section.

…e `remove.subset` and `min.sampleNumber`. Now, NDR = 0.999 and NDR = 1 produce identical results except for the NDR threshold itself.

cying111 · 2026-04-15T08:30:28Z

Adding in my PR review comment here (also with assistance of claude code), my summary is issue 1, 3 worth noting, issue 4 is a comment adding, maybe good to have, but also fine to not have

Prompt used:
now help me review this PR: #559

Worth addressing before merge

bambu.R:164 sets NDR <- 1 for mode == "fusion". This fix
therefore changes fusion-mode behaviour: fusion mode now respects
remove.subsetTx and min.sampleNumber, where previously it didn't.
That may be correct (and may be exactly what refine the NDR usage #560 wants) but it's not
mentioned in the PR description. Either:
- confirm it's intended and call it out in the PR body / NEWS, or
- switch fusion mode to a sentinel that preserves the old permissive
  behaviour (e.g. pass NDR = Inf plus a separate remove.subsetTx = FALSE in fusion mode).
No test. The PR's central claim — "NDR = 0.999 and NDR = 1 produce
identical results except for the NDR threshold itself" — is easy to
encode as a testthat expectation against the existing fixture
(seIsoReCombined_SGNex_..._chr9_1_1000000.rds + the annotation GR).
The table above is effectively that test; porting it would guard against
a silent regression.
test_setNDR.R line 68–82 (pre-existing). The block
test_that("setNDR works when no lowConfidenceTranscripts are present", ...)
computes extendedAnnotations via isore.extendAnnotations(..., NDR = 1, ...)
but then asserts on the pre-saved extendedAnnotationsExpected fixture
— the freshly computed extendedAnnotations is never used. Not a
blocker for this PR, but this was the test most likely to have caught
the pre-fix bug, and it silently passed because of that oversight. A
follow-up to rewire that test against the computed object is worth a
separate commit.

Minor

The new predicate !is.na(x) & x <= NDR is a clean way to kill off NAs,
but it's worth a brief code comment that NA NDR = "deliberately filtered"
(subset or scoreless) so a future reader doesn't "simplify" it back.

jonathangoeke · 2026-04-15T10:04:01Z

Code review

Top 3 issues by confidence score (lower threshold than usual):

(score: 75) setNDR() includeRef=FALSE branch lacks !is.na() guards. With this PR, transcripts in lowConfidenceTranscripts can have NDR = NA. The includeRef=TRUE branch guards against NA, but the default includeRef=FALSE branch does not — NA <= NDR propagates NA into toAdd/toRemove, then into logical subsetting of the GRangesList, which can produce corrupted/NA entries.

bambu/R/bambu-extendAnnotations-utilityExtend.R

Lines 855 to 866 in 25aa679

    
           #If reference annotations should be filtered too (note that reference annotations with no read support arn't filtered) 
        
           if(includeRef){ 
        
               toRemove <- (!is.na(mcols(extendedAnnotations)$NDR) & mcols(extendedAnnotations)$NDR > NDR) 
        
               toAdd <- !is.na(mcols(metadata(extendedAnnotations)$lowConfidenceTranscripts)$NDR) &  
        
                   mcols(metadata(extendedAnnotations)$lowConfidenceTranscripts)$NDR <= NDR   
        
           } else { 
        
               toRemove <- (mcols(extendedAnnotations)$NDR > NDR &  
        
                   grepl(prefix, mcols(extendedAnnotations)$TXNAME)) 
        
               toAdd <- (mcols(metadata(extendedAnnotations)$lowConfidenceTranscripts)$NDR <= NDR &  
        
                   grepl(prefix, mcols(metadata(extendedAnnotations)$lowConfidenceTranscripts)$TXNAME))      
        
           }

(score: 50) Fusion mode side effect. bambu.R sets NDR <- 1 for fusion mode. Combined with the new filter at utilityExtend.R:115 (which now drops NDR == NA rows), fusion-mode output may differ from prior behavior in ways that aren't covered by tests.

bambu/R/bambu.R

Lines 161 to 167 in 25aa679

    
               processByChromosome <- TRUE 
        
           } 
        
           if(mode == "fusion"){ 
        
               NDR <- 1 
        
               fusionMode <- TRUE 
        
           } 
        
           if(mode == "debug"){

(score: 25) Semantic change to prior intent. Commit 0b47cb8 introduced NDR <- 1 for subset/low-confidence transcripts with the stated purpose "Report all possible transcripts if NDR = 1". This PR reverses that intent. Likely intentional given the PR description, but worth calling out explicitly so downstream users relying on NDR = 1 to recover all candidates know the behavior has changed.

bambu/R/bambu-extendAnnotations-utilityExtend.R

Lines 223 to 229 in 25aa679

    
                     message("NDR will be approximated as: (1 - Transcript Model Prediction Score)") 
        
               } else { 
        
                   combinedTranscripts$NDR <- calculateNDR(combinedTranscripts$maxTxScore, equal) 
        
               } 
        
               combinedTranscripts$NDR[combinedTranscripts$maxTxScore==-1] <- NA 
        
               return(combinedTranscripts) 
        
           }

Only #1 would have cleared the default score threshold (80); #2 and #3 are included at the user's request for a lower-threshold review.

🤖 Generated with Claude Code

_{- If this code review was useful, please react with 👍. Otherwise, react with 👎.}

jonathangoeke

Did you confirm that you can reproduce results with NDR=1 (previous version) and NDR=1 (including subset and low confidence read classes)? Which parameters need to be selected to reproduce this behaviour?

Copilot

Pull request overview

Adjusts how Novel Discovery Rate (NDR) thresholding behaves during annotation extension so that NDR = 1 no longer unintentionally re-includes subset / low-confidence transcripts (e.g., those failing remove.subset or min.sampleNumber filters).

Changes:

Treat transcripts marked as filtered (maxTxScore == -1) as NDR = NA instead of NDR = 1.
Exclude NDR = NA transcripts from NDR-threshold filtering while still retaining "equal:compatible" classes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-16T01:37:45Z

        combinedTranscripts$NDR <- calculateNDR(combinedTranscripts$maxTxScore, equal)
    }
-    combinedTranscripts$NDR[combinedTranscripts$maxTxScore==-1] <- 1
+    combinedTranscripts$NDR[combinedTranscripts$maxTxScore==-1] <- NA


Changing filtered transcripts' NDR from 1 to NA means downstream thresholding logic must be NA-safe. In setNDR() (same file), the includeRef = FALSE branch builds toRemove/toAdd without !is.na(...), so transcripts with NDR = NA in metadata(extendedAnnotations)$lowConfidenceTranscripts can yield NA logical indices and break or mis-subset GRanges/GRangesList objects. Update those conditions to treat NA as FALSE (mirror the includeRef = TRUE branch by adding !is.na(mcols(...)$NDR) guards).

Copilot · 2026-04-16T01:37:45Z

+  filterSet <- ((!is.na(rowDataCombined$NDR) & rowDataCombined$NDR <= NDR) | rowDataCombined$readClassType == "equal:compatible")
  lowConfidenceTranscripts <- combindRowDataWithRanges(


This change alters the edge-case semantics for NDR = 1 vs NDR = 0.999 by excluding transcripts with NDR = NA (subset / min.sampleNumber-filtered). There isn’t currently a regression test covering this behavior (i.e., that NDR = 1 no longer re-includes subset/low-confidence transcripts, while NDR = 0.999 matches previous results). Consider adding a test that runs isore.extendAnnotations() with NDR = 1 and asserts subset/low-confidence transcripts are still excluded (e.g., via metadata(... )$lowConfidenceTranscripts / counts).

SuiYue-2308 · 2026-04-16T03:21:59Z

Did you confirm that you can reproduce results with NDR=1 (previous version) and NDR=1 (including subset and low confidence read classes)? Which parameters need to be selected to reproduce this behaviour?

Hi @jonathangoeke, Yes, we can get the exact same result with previous NDR = 1 by setting opt.discovery = list(remove.subsetTx = FALSE, min.readCount = 1, min.sampleNumber = 0) to include the subset and low confidence.

… fix

cying111 · 2026-04-16T03:24:10Z

        combinedTranscripts$NDR <- calculateNDR(combinedTranscripts$maxTxScore, equal)
    }
-    combinedTranscripts$NDR[combinedTranscripts$maxTxScore==-1] <- 1
+    combinedTranscripts$NDR[combinedTranscripts$maxTxScore==-1] <- NA


I have a similar comment as copilot, if set NDR equal to NA, then following commands related to NDR need to be NA aware. Need to check on this, and make sure all following commands pass.

Other than this, different scenarios have been tested, and all works as expected:
testing for isore.extendAnnotations reported set of transcripts are expectedly corrected to have subsetTranscripts kept even if rm.subsetTx = TRUE and all filtering pre-set thresholds
testing for whole bambu shows that quantification results impacted, cause reporting set of transcripts changed at last

Also one thing is that this fix does not only fix the override of return.subsetTx and min.SampleNumber, and also other base filtering, maybe can consider to make this more explicit

SuiYue-2308 · 2026-04-16T06:03:40Z

Hi @cying111 @jonathangoeke

Following the comments, I updated this pull request.

Fusion mode: previously, the fusion mode will set NDR = 1, which will include everything. Now I add the following setting to the fusion mode to return the same result with fusion mode.

opt.discovery$remove.subsetTx <- FALSE
opt.discovery$min.readCount <- 1
opt.discovery$min.sampleNumber <- 0

I have tested the fusion mode on a real data set, the rowRanges and the assays are identical, the only thing is that this branch not keep the subset in the metadata(rowRanges(se)). If we want to keep subset need to fix the fusion mode in future.

includeRef
I have removed this parameter since we never use this.
the third part from @jonathangoeke generated by code agent is not that clear to me.

cying111 · 2026-04-16T09:14:54Z

Hi @SuiYue-2308 , I think you are right, min.readCount is needed only for the change, but the se object changed in the end,
I think NDR, maxTxScore, maxTxScore.noFit changes are expected because change from 1 to NA, would cause these differences
The rowRanges(se_v4)$metadata vs rowRanges(se_pr)$metadata are details below as well, which I think are expected as well.

Error: Expected se_v4 to equal se_pr.
Differences:
Attributes: < Component “rowRanges”: Attributes: < Component “elementMetadata”: Attributes: < Component “listData”: Component “NDR”: Mean relative difference: 0.1815954 > > >
Attributes: < Component “rowRanges”: Attributes: < Component “elementMetadata”: Attributes: < Component “listData”: Component “maxTxScore”: Mean relative difference: 1.181595 > > >
Attributes: < Component “rowRanges”: Attributes: < Component “elementMetadata”: Attributes: < Component “listData”: Component “maxTxScore.noFit”: Mean relative difference: 1.181595 > > >
Attributes: < Component “rowRanges”: Attributes: < Component “metadata”: Names: 1 string mismatch > >
Attributes: < Component “rowRanges”: Attributes: < Component “metadata”: Length mismatch: comparison on first 2 components > >
Attributes: < Component “rowRanges”: Attributes: < Component “metadata”: Component 2: Lengths: 17, 0

> rowRanges(se_v4)@metadata
$NDRthreshold
[1] 1

$subsetTranscripts
GRangesList object of length 17:
$BambuTx1
GRanges object with 3 ranges and 2 metadata columns:
      seqnames      ranges strand | exon_rank exon_endRank
         <Rle>   <IRanges>  <Rle> | <integer>    <integer>
  [1]        9 14513-16876      - |         3            1
  [2]        9 16969-17166      - |         2            2
  [3]        9 17344-17410      - |         1            3
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

$BambuTx3
GRanges object with 3 ranges and 2 metadata columns:
      seqnames      ranges strand | exon_rank exon_endRank
         <Rle>   <IRanges>  <Rle> | <integer>    <integer>
  [1]        9 14517-14940      - |         3            1
  [2]        9 15081-15149      - |         2            2
  [3]        9 15909-16064      - |         1            3
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

$BambuTx5
GRanges object with 2 ranges and 2 metadata columns:
      seqnames      ranges strand | exon_rank exon_endRank
         <Rle>   <IRanges>  <Rle> | <integer>    <integer>
  [1]        9 14521-14940      - |         2            1
  [2]        9 15081-15750      - |         1            2
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

...
<14 more elements>

$lowConfidenceTranscripts
GRangesList object of length 0:
<0 elements>

> rowRanges(se_pr)@metadata
$NDRthreshold
[1] 1

$lowConfidenceTranscripts
GRangesList object of length 0:
<0 elements>

SuiYue-2308 · 2026-04-17T01:29:23Z

Thanks for the code review!

I will merge this pull request now, and we will need to look into the metadata to decide whether we still want to keep the subset and low confidence transcript.

fix: adjust NDR threshold behavior. Previously, NDR = 1 would overrid…

25aa679

…e `remove.subset` and `min.sampleNumber`. Now, NDR = 0.999 and NDR = 1 produce identical results except for the NDR threshold itself.

SuiYue-2308 requested review from jonathangoeke and lingminhao April 15, 2026 07:20

SuiYue-2308 linked an issue Apr 15, 2026 that may be closed by this pull request

refine the NDR usage #560

Open

SuiYue-2308 requested a review from cying111 April 15, 2026 09:07

jonathangoeke reviewed Apr 16, 2026

View reviewed changes

jonathangoeke requested a review from Copilot April 16, 2026 01:34

Copilot started reviewing on behalf of jonathangoeke April 16, 2026 01:34 View session

Copilot AI reviewed Apr 16, 2026

View reviewed changes

SuiYue-2308 added 2 commits April 16, 2026 11:28

Remove unused includeRef parameter from setNDR()

1ff42f0

Fix fusion mode to include subset and low-count transcripts after NDR…

dd43303

… fix

cying111 reviewed Apr 16, 2026

View reviewed changes

cying111 approved these changes Apr 16, 2026

View reviewed changes

jonathangoeke approved these changes Apr 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: adjust NDR threshold behavior. Previously, NDR = 1 would overrid…#559

fix: adjust NDR threshold behavior. Previously, NDR = 1 would overrid…#559
SuiYue-2308 wants to merge 3 commits intodevel_pre_v4from
devel_pre_v4_fix_NDR_1

SuiYue-2308 commented Apr 15, 2026 •

edited

Loading

Uh oh!

cying111 commented Apr 15, 2026 •

edited

Loading

Uh oh!

jonathangoeke commented Apr 15, 2026

Uh oh!

jonathangoeke left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 16, 2026

Uh oh!

Copilot AI Apr 16, 2026

Uh oh!

SuiYue-2308 commented Apr 16, 2026 •

edited

Loading

Uh oh!

cying111 Apr 16, 2026

Uh oh!

cying111 Apr 16, 2026

Uh oh!

SuiYue-2308 commented Apr 16, 2026 •

edited

Loading

Uh oh!

cying111 commented Apr 16, 2026 •

edited

Loading

Uh oh!

SuiYue-2308 commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		filterSet <- ((!is.na(rowDataCombined$NDR) & rowDataCombined$NDR <= NDR) \| rowDataCombined$readClassType == "equal:compatible")
		lowConfidenceTranscripts <- combindRowDataWithRanges(

Conversation

SuiYue-2308 commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Type of Change

Description

Implementation Details

Impact of Changes

Checklist

Uh oh!

cying111 commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Worth addressing before merge

Minor

Uh oh!

jonathangoeke commented Apr 15, 2026

Code review

Uh oh!

jonathangoeke left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

SuiYue-2308 commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cying111 Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

cying111 Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

SuiYue-2308 commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cying111 commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SuiYue-2308 commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

SuiYue-2308 commented Apr 15, 2026 •

edited

Loading

cying111 commented Apr 15, 2026 •

edited

Loading

SuiYue-2308 commented Apr 16, 2026 •

edited

Loading

SuiYue-2308 commented Apr 16, 2026 •

edited

Loading

cying111 commented Apr 16, 2026 •

edited

Loading