fix: adjust NDR threshold behavior. Previously, NDR = 1 would overrid…#559
fix: adjust NDR threshold behavior. Previously, NDR = 1 would overrid…#559SuiYue-2308 wants to merge 3 commits intodevel_pre_v4from
Conversation
…e `remove.subset` and `min.sampleNumber`. Now, NDR = 0.999 and NDR = 1 produce identical results except for the NDR threshold itself.
|
Adding in my PR review comment here (also with assistance of claude code), my summary is issue 1, 3 worth noting, issue 4 is a comment adding, maybe good to have, but also fine to not have Prompt used: Worth addressing before merge
Minor
|
Code reviewTop 3 issues by confidence score (lower threshold than usual):
bambu/R/bambu-extendAnnotations-utilityExtend.R Lines 855 to 866 in 25aa679
Lines 161 to 167 in 25aa679
bambu/R/bambu-extendAnnotations-utilityExtend.R Lines 223 to 229 in 25aa679 Only #1 would have cleared the default score threshold (80); #2 and #3 are included at the user's request for a lower-threshold review. 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
jonathangoeke
left a comment
There was a problem hiding this comment.
Did you confirm that you can reproduce results with NDR=1 (previous version) and NDR=1 (including subset and low confidence read classes)? Which parameters need to be selected to reproduce this behaviour?
There was a problem hiding this comment.
Pull request overview
Adjusts how Novel Discovery Rate (NDR) thresholding behaves during annotation extension so that NDR = 1 no longer unintentionally re-includes subset / low-confidence transcripts (e.g., those failing remove.subset or min.sampleNumber filters).
Changes:
- Treat transcripts marked as filtered (
maxTxScore == -1) asNDR = NAinstead ofNDR = 1. - Exclude
NDR = NAtranscripts from NDR-threshold filtering while still retaining"equal:compatible"classes.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| combinedTranscripts$NDR <- calculateNDR(combinedTranscripts$maxTxScore, equal) | ||
| } | ||
| combinedTranscripts$NDR[combinedTranscripts$maxTxScore==-1] <- 1 | ||
| combinedTranscripts$NDR[combinedTranscripts$maxTxScore==-1] <- NA |
There was a problem hiding this comment.
Changing filtered transcripts' NDR from 1 to NA means downstream thresholding logic must be NA-safe. In setNDR() (same file), the includeRef = FALSE branch builds toRemove/toAdd without !is.na(...), so transcripts with NDR = NA in metadata(extendedAnnotations)$lowConfidenceTranscripts can yield NA logical indices and break or mis-subset GRanges/GRangesList objects. Update those conditions to treat NA as FALSE (mirror the includeRef = TRUE branch by adding !is.na(mcols(...)$NDR) guards).
| filterSet <- ((!is.na(rowDataCombined$NDR) & rowDataCombined$NDR <= NDR) | rowDataCombined$readClassType == "equal:compatible") | ||
| lowConfidenceTranscripts <- combindRowDataWithRanges( |
There was a problem hiding this comment.
This change alters the edge-case semantics for NDR = 1 vs NDR = 0.999 by excluding transcripts with NDR = NA (subset / min.sampleNumber-filtered). There isn’t currently a regression test covering this behavior (i.e., that NDR = 1 no longer re-includes subset/low-confidence transcripts, while NDR = 0.999 matches previous results). Consider adding a test that runs isore.extendAnnotations() with NDR = 1 and asserts subset/low-confidence transcripts are still excluded (e.g., via metadata(... )$lowConfidenceTranscripts / counts).
Hi @jonathangoeke, Yes, we can get the exact same result with previous NDR = 1 by setting |
| combinedTranscripts$NDR <- calculateNDR(combinedTranscripts$maxTxScore, equal) | ||
| } | ||
| combinedTranscripts$NDR[combinedTranscripts$maxTxScore==-1] <- 1 | ||
| combinedTranscripts$NDR[combinedTranscripts$maxTxScore==-1] <- NA |
There was a problem hiding this comment.
I have a similar comment as copilot, if set NDR equal to NA, then following commands related to NDR need to be NA aware. Need to check on this, and make sure all following commands pass.
There was a problem hiding this comment.
Other than this, different scenarios have been tested, and all works as expected:
testing for isore.extendAnnotations reported set of transcripts are expectedly corrected to have subsetTranscripts kept even if rm.subsetTx = TRUE and all filtering pre-set thresholds
testing for whole bambu shows that quantification results impacted, cause reporting set of transcripts changed at last
Also one thing is that this fix does not only fix the override of return.subsetTx and min.SampleNumber, and also other base filtering, maybe can consider to make this more explicit
|
Following the comments, I updated this pull request.
I have tested the fusion mode on a real data set, the rowRanges and the assays are identical, the only thing is that this branch not keep the subset in the metadata(rowRanges(se)). If we want to keep subset need to fix the fusion mode in future.
|
|
Hi @SuiYue-2308 , I think you are right, min.readCount is needed only for the change, but the se object changed in the end, Error: Expected |
|
Thanks for the code review! I will merge this pull request now, and we will need to look into the metadata to decide whether we still want to keep the subset and low confidence transcript. |
fix: adjust NDR threshold behavior
Type of Change
Description
Previously, NDR = 1 would override
remove.subsetandmin.sampleNumber. Now, NDR = 0.999 and NDR = 1 produce identical results except for the NDR threshold itself.Implementation Details
Before, if the read class is a subset or low confidence transcript (can not pass the min.sampleNumber), it's NDR will be assigned to 1, which will be further reported when setting NDR = 1. Now, I set their NDR to NA, and add another check step before report novel transcript (!is.na(NDR)) to make sure the subset and low confidence are not included.
Impact of Changes
Briefly describe the impact of the implemented changes. For example:
Checklist