Skip to content

Implement opt.rcAssignment#568

Open
SuiYue-2308 wants to merge 1 commit intodevel_pre_v4from
devel_pre_v4_rcAssignment
Open

Implement opt.rcAssignment#568
SuiYue-2308 wants to merge 1 commit intodevel_pre_v4from
devel_pre_v4_rcAssignment

Conversation

@SuiYue-2308
Copy link
Copy Markdown
Collaborator

@SuiYue-2308 SuiYue-2308 commented Apr 16, 2026

Reorganize the parameters that related to the read class - transcript assignment into opt.rcAssignment

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (please specify if the change breaks existing functionality)
    • Non breaking change (the feature doesn't change existing functionality)
    • Breaking change (the feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance optimization

Description

opt.rcAssignment is for summarizing the parameters that related to the read class - transcript assignment together. This can be used later to assign more read class to transcript, which is a relatively dependent part from transcript discovery. The default value of the parameters inside still need to be further evaluated.

Implementation Details

Extract 4 parameters (min.exonDistance, min.primarySecondaryDist, min.primarySecondaryDistStartEnd1/2) from opt.discovery into a new opt.rcAssignment parameter with its own setRcAssignmentParameters(). The default value of these parameters are still the same with previous.
I have tested this branch in 2 ways:

  • in default version, se objects from 2 branches are identical
  • change the same parameter from opt.discovery and opt.rcAssignment to the same value, and se objects from 2 branches are identical;
  • change the parameters in opt.rcAssignment into non-default setting, the transcript discovery and quantification results change, which means the parameters are working

Impact of Changes

  • Results are identical to the previous version.

Checklist

  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation (vignettes, man pages).
  • I have tested the code on a full dataset, and any differences have been described in the Impact of Changes section.

…DistStartEnd1/2 from opt.discovery into a new opt.rcAssignment parameter with its own setRcAssignmentParameters().
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new opt.rcAssignment option group to centralize parameters used in read-class ↔ transcript assignment, separating them from transcript discovery configuration.

Changes:

  • Added setRcAssignmentParameters() with defaults for exon-distance / primary-secondary distance thresholds.
  • Added opt.rcAssignment to bambu() and threaded rcAssignmentParameters into annotation extension and distance/assignment steps.
  • Updated distance table computation and annotation extension to read thresholds from rcAssignmentParameters instead of opt.discovery.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
R/bambu_utilityFunctions.R Adds setRcAssignmentParameters() and updates calculateDistTable() to use rcAssignmentParameters.
R/bambu.R Adds opt.rcAssignment, initializes rcAssignmentParameters, and passes it into downstream steps.
R/bambu-extendAnnotations.R Updates bambu.extendAnnotations() signature and uses rcAssignmentParameters for relevant thresholds.
R/bambu-assignDist.R Updates assignment entrypoint signature and forwards rcAssignmentParameters into distance table calculation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +255 to 256
calculateDistTable <- function(readClassList, annotations, isoreParameters, rcAssignmentParameters, verbose, returnDistTable){
readClassDist <- isore.estimateDistanceToAnnotations(readClassList, annotations,
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calculateDistTable() still accepts isoreParameters but it is no longer used in the function body after switching to rcAssignmentParameters. Consider removing the unused parameter (and updating the one call site), or using it only as a backward-compat fallback to populate rcAssignmentParameters when needed.

Copilot uses AI. Check for mistakes.
Comment thread R/bambu.R
Comment on lines +186 to 187
rcAssignmentParameters <- setRcAssignmentParameters(rcAssignmentParameters = opt.rcAssignment)
emParameters <- setEmParameters(emParameters = opt.em)
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are existing testthat snapshots/expected objects for bambu() behavior (e.g. distTable + assignment). Since opt.rcAssignment is a new user-facing configuration point and legacy opt.discovery keys are being moved, adding a small regression test covering (1) opt.rcAssignment overrides and (2) legacy opt.discovery keys being honored/migrated would help prevent silent behavior changes.

Copilot uses AI. Check for mistakes.
Comment thread R/bambu-assignDist.R
Comment on lines +5 to +9
assignReadClasstoTranscripts <- function(readClassList, annotations, isoreParameters, rcAssignmentParameters,
verbose, sampleMetadata, demultiplexed,
returnDistTable = FALSE, trackReads = TRUE) {
if (is.character(readClassList)) readClassList <- readRDS(file = readClassList)
metadata(readClassList)$readClassDist <- calculateDistTable(readClassList, annotations, isoreParameters, verbose, returnDistTable)
metadata(readClassList)$readClassDist <- calculateDistTable(readClassList, annotations, isoreParameters, rcAssignmentParameters, verbose, returnDistTable)
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assignReadClasstoTranscripts() no longer uses isoreParameters except to forward it into calculateDistTable(), which also doesn’t use it. After cleaning up calculateDistTable(), consider dropping isoreParameters from this function’s signature/call sites as well to avoid confusion about which option list controls read-class assignment.

Copilot uses AI. Check for mistakes.
Comment on lines +6 to 7
isoreParameters, rcAssignmentParameters, stranded, bpParameters, fusionMode = FALSE, verbose = FALSE) {
start.ptm_all <- proc.time()
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bambu.extendAnnotations() now requires rcAssignmentParameters, but there are existing internal call sites still using the old positional argument order (e.g. isore.extendAnnotations.clusters() in R/bambu-extendAnnotations-utilityExtend.R calls bambu.extendAnnotations(..., isoreParameters, stranded, bpParameters, ...)). With the new signature, stranded will be passed as rcAssignmentParameters, causing runtime failures when indexing rcAssignmentParameters[["min.exonDistance"]]. Update those call sites (or provide a default/migration inside bambu.extendAnnotations) to pass a real rcAssignmentParameters list.

Suggested change
isoreParameters, rcAssignmentParameters, stranded, bpParameters, fusionMode = FALSE, verbose = FALSE) {
start.ptm_all <- proc.time()
isoreParameters, rcAssignmentParameters = NULL, stranded, bpParameters,
fusionMode = FALSE, verbose = FALSE) {
start.ptm_all <- proc.time()
## Backward-compatibility for older positional call sites that used:
## bambu.extendAnnotations(..., isoreParameters, stranded, bpParameters,
## fusionMode = FALSE, verbose = FALSE)
if (is.logical(rcAssignmentParameters) && is.list(stranded)) {
oldStranded <- rcAssignmentParameters
oldBpParameters <- stranded
oldFusionMode <- bpParameters
oldVerbose <- fusionMode
stranded <- oldStranded
bpParameters <- oldBpParameters
if (is.logical(oldFusionMode)) {
fusionMode <- oldFusionMode
verbose <- oldVerbose
}
rcAssignmentParameters <- NULL
}
if (is.null(rcAssignmentParameters) || !is.list(rcAssignmentParameters)) {
rcAssignmentParameters <- list()
}
if (is.null(rcAssignmentParameters[["min.exonDistance"]])) {
rcAssignmentParameters[["min.exonDistance"]] <- NULL
}
if (is.null(rcAssignmentParameters[["min.primarySecondaryDist"]])) {
rcAssignmentParameters[["min.primarySecondaryDist"]] <- NULL
}
if (is.null(rcAssignmentParameters[["min.primarySecondaryDistStartEnd1"]])) {
rcAssignmentParameters[["min.primarySecondaryDistStartEnd1"]] <- NULL
}

Copilot uses AI. Check for mistakes.
Comment thread R/bambu.R
@@ -183,7 +183,7 @@
isoreParameters <- setIsoreParameters(isoreParameters = opt.discovery)
#below line is to be compatible with earlier version of running bambu
if(!is.null(isoreParameters$max.txNDR)) NDR = isoreParameters$max.txNDR
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new opt.rcAssignment split will currently treat legacy opt.discovery values for min.exonDistance, min.primarySecondaryDist, and min.primarySecondaryDistStartEnd1/2 as “parameters that do not exist” (see updateParameters()), and those values are also ignored by downstream code now that these thresholds come from rcAssignmentParameters. To keep this change truly non-breaking, consider migrating these keys from opt.discovery into rcAssignmentParameters when opt.rcAssignment is NULL/doesn’t specify them (and optionally emit a deprecation warning).

Suggested change
if(!is.null(isoreParameters$max.txNDR)) NDR = isoreParameters$max.txNDR
if(!is.null(isoreParameters$max.txNDR)) NDR = isoreParameters$max.txNDR
legacyRcAssignmentKeys <- c("min.exonDistance",
"min.primarySecondaryDist",
"min.primarySecondaryDistStartEnd1",
"min.primarySecondaryDistStartEnd2")
migratedLegacyRcAssignmentKeys <- character(0)
if(!is.null(opt.discovery)){
if(is.null(opt.rcAssignment)) opt.rcAssignment <- list()
for(paramName in legacyRcAssignmentKeys){
legacyValue <- opt.discovery[[paramName]]
rcAssignmentValue <- opt.rcAssignment[[paramName]]
if(!is.null(legacyValue) && is.null(rcAssignmentValue)){
opt.rcAssignment[[paramName]] <- legacyValue
migratedLegacyRcAssignmentKeys <- c(migratedLegacyRcAssignmentKeys, paramName)
}
}
}
if(length(migratedLegacyRcAssignmentKeys) > 0){
warning(paste0(
"The following legacy opt.discovery parameter(s) are deprecated and have been migrated to opt.rcAssignment: ",
paste(migratedLegacyRcAssignmentKeys, collapse = ", "),
". Please pass them via opt.rcAssignment instead."
))
}

Copilot uses AI. Check for mistakes.
Comment thread R/bambu.R
Comment on lines 143 to 145
bambu <- function(reads, annotations = NULL, genome = NULL, NDR = NULL,
mode = NULL, opt.discovery = NULL, opt.em = NULL, rcOutDir = NULL, discovery = TRUE,
mode = NULL, opt.discovery = NULL, opt.rcAssignment = NULL, opt.em = NULL, rcOutDir = NULL, discovery = TRUE,
assignDist = TRUE, quant = TRUE, stranded = FALSE, ncore = 1, yieldSize = NULL,
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bambu() gained a new public argument opt.rcAssignment, but the roxygen docs for opt.discovery still document the moved parameters (min.exonDistance, min.primarySecondaryDist, min.primarySecondaryDistStartEnd1/2) and there is no @param opt.rcAssignment section. This will confuse users and typically triggers R CMD check warnings about undocumented arguments. Please update the roxygen parameter docs to reflect the new option structure.

Copilot uses AI. Check for mistakes.
@jonathangoeke
Copy link
Copy Markdown
Member

Code review

Found 4 issues:

  1. isore.extendAnnotations.clusters not updated with rcAssignmentParameters (positional argument mismatch bug)

bambu.extendAnnotations() now expects rcAssignmentParameters between isoreParameters and stranded, but both call sites inside isore.extendAnnotations.clusters() pass the old argument list without it. This causes stranded to be positionally matched to rcAssignmentParameters, bpParameters to stranded, etc. -- producing runtime errors or silent wrong behavior when the clusters code path is taken. The call from bambu() to isore.extendAnnotations.clusters() also does not pass rcAssignmentParameters.

Updated signature:

bambu.extendAnnotations <- function(readClassList, annotations, NDR,
isoreParameters, rcAssignmentParameters, stranded, bpParameters, fusionMode = FALSE, verbose = FALSE) {
start.ptm_all <- proc.time()

Stale call sites:

rcfs.clusters[[names(clusters)[i]]] <- rcf.filt
annotations.clusters[[names(clusters)[i]]] <- bambu.extendAnnotations(list(rcf.filt), annotations, NDR,
isoreParameters, stranded, bpParameters, fusionMode, verbose)
}
if(length(rcfs.clusters)>0){
print("--- Merging all individual clusters ---")
annotations.clusters[["merged"]] <- bambu.extendAnnotations(rcfs.clusters, annotations, NDR,
isoreParameters, stranded, bpParameters, fusionMode, verbose)
}

Caller missing param:

bambu/R/bambu.R

Lines 230 to 234 in dac871e

if(!is.null(clusters)){
annotations.clusters <- isore.extendAnnotations.clusters(readClassList,
annotations, clusters, NDR,
isoreParameters, stranded, bpParameters, fusionMode, verbose = FALSE)
metadata(extendedAnnotations)$clusters <- annotations.clusters

  1. Silent breaking change for users passing moved parameters via opt.discovery

Users who currently set min.exonDistance, min.primarySecondaryDist, min.primarySecondaryDistStartEnd1, or min.primarySecondaryDistStartEnd2 via opt.discovery will have their values silently ignored. updateParameters() emits only a message (not an error), and no migration or forwarding logic exists. Consider detecting these keys in opt.discovery and either forwarding them to opt.rcAssignment with a deprecation warning or raising an explicit error.

bambu/R/bambu.R

Lines 185 to 187 in dac871e

if(!is.null(isoreParameters$max.txNDR)) NDR = isoreParameters$max.txNDR
rcAssignmentParameters <- setRcAssignmentParameters(rcAssignmentParameters = opt.rcAssignment)
emParameters <- setEmParameters(emParameters = opt.em)

  1. Missing @param opt.rcAssignment roxygen documentation

The new opt.rcAssignment parameter (line 144) has no @param entry. The @param opt.discovery block (lines 19-64) still documents min.exonDistance, min.primarySecondaryDist, min.primarySecondaryDistStartEnd1, and min.primarySecondaryDistStartEnd2 as belonging to opt.discovery. This will cause R CMD check warnings about undocumented parameters and mislead users.

bambu/R/bambu.R

Lines 19 to 64 in dac871e

#' @param opt.discovery A list of controlling parameters for isoform
#' reconstruction process:
#' \describe{
#' \item{remove.subsetTx}{indicating whether filter to remove read classes
#' which are a subset of known transcripts(), defaults to TRUE}
#' \item{min.readCount}{specifying minimum read count to consider a read
#' class valid in a sample, defaults to 2}
#' \item{min.readFractionByGene}{specifying minimum relative read count per
#' gene, highly expressed genes will have many high read count low relative
#' abundance transcripts that can be filtered, defaults to 0.05}
#' \item{min.sampleNumber}{specifying minimum sample number with minimum read
#' count, defaults to 1}
#' \item{min.exonDistance}{specifying minum distance to known transcript
#' to be considered valid as new, defaults to 35bp}
#' \item{min.exonOverlap}{specifying minimum number of bases shared with
#' annotation to be assigned to the same gene id, defaults to 10bp}
#' \item{min.primarySecondaryDist}{specifying the minimum number of distance
#' threshold, defaults to 5bp}
#' \item{min.primarySecondaryDistStartEnd1}{specifying the minimum number
#' of distance threshold, used for extending annotation, defaults to 5bp}
#' \item{min.primarySecondaryDistStartEnd2}{specifying the minimum number
#' of distance threshold, used for estimating distance to annotation,
#' defaults to 5bp}
#' \item{min.txScore.multiExon}{specifying the minimum transcript level
#' threshold for multi-exon transcripts during sample combining,
#' defaults to 0}
#' \item{min.txScore.singleExon}{specifying the minimum transcript level
#' threshold for single-exon transcripts during sample combining, defaults
#' to 1}
#' \item{fitReadClassModel}{ A boolean specifying if Bambu should attempt
#' to train a transcript discovery model for all samples. Defaults to TRUE}
#' \item{defaultModels}{A model object obtained by code{\link{trainBambu}}
#' or when returnModel is TRUE}
#' \item{returnModel}{A boolean specifying if the trained model is output
#' with the readclass files. Defaults to FALSE}
#' \item{baselineFDR}{A number between 0 - 1, specifying the false discovery
#' rate used during NDR recomendation. Defaults to 0.1}
#' \item{min.readFractionByEqClass}{indicating the minimum relative read
#' count of a subset transcript compared to all superset transcripts
#' (ie the relative read count within the minimum equivalent class). This
#' filter is applied on the set of annotations across all samples using the
#' total read count, this is not a per-sample filter. Please use with
#' caution. defaults to 0}
#' \item{prefix}{specifying prefix for new gene Ids (genePrefix.number),
#' defaults to "Bambu"}
#' }

  1. isoreParameters is now unused in calculateDistTable()

After the refactoring, calculateDistTable() still accepts isoreParameters but never reads from it in the function body -- all parameter access now goes through rcAssignmentParameters. This dead parameter propagates up through assignReadClasstoTranscripts() as well.

#' Calculate the dist table used for Bambu Quantification
calculateDistTable <- function(readClassList, annotations, isoreParameters, rcAssignmentParameters, verbose, returnDistTable){
readClassDist <- isore.estimateDistanceToAnnotations(readClassList, annotations,
min.exonDistance = rcAssignmentParameters[["min.exonDistance"]],
min.primarySecondaryDist = rcAssignmentParameters[['min.primarySecondaryDist']],
min.primarySecondaryDistStartEnd = rcAssignmentParameters[['min.primarySecondaryDistStartEnd2']],
verbose = verbose)
metadata(readClassDist)$distTable <- modifyIncompatibleAssignment(metadata(readClassDist)$distTable)

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

Copy link
Copy Markdown
Member

@jonathangoeke jonathangoeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I suggest to change this to define parameters for discovery step and read class assignment step separately, even though that means the same function can be called in different modules with different arguments
  • add documentation for bambu in code, add readme updates later (document required changes)
  • add depcrecation warning in bambu()

Comment thread R/bambu-assignDist.R
#' @import data.table
#' @noRd
assignReadClasstoTranscripts <- function(readClassList, annotations, isoreParameters,
assignReadClasstoTranscripts <- function(readClassList, annotations, isoreParameters, rcAssignmentParameters,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove isoreParameters

Comment thread R/bambu.R
@@ -141,7 +141,7 @@
#' genome = fa.file, discovery = TRUE, quant = TRUE)
#' @export
bambu <- function(reads, annotations = NULL, genome = NULL, NDR = NULL,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update in-code documentation, general documation can be updated later, but create issue/draft issue to list all docuemntation changes needed to be updated as a consequence of this PR, link PR to draft isssue

Comment thread R/bambu.R
message("--- Start extending annotations ---")
extendedAnnotations <- bambu.extendAnnotations(readClassList, annotations, NDR,
isoreParameters, stranded, bpParameters, fusionMode, verbose)
isoreParameters, rcAssignmentParameters, stranded, bpParameters, fusionMode, verbose)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we split arguments to have only one for transcript discovery (isoreParameter) and one for reacClass assignment (rcAssignmentParameters)? They might call the same function, but can be different values in the different parts then? just default is the same is ok?

Comment thread R/bambu.R
isoreParameters = isoreParameters,
verbose = verbose,
annotations = annotations,
isoreParameters = isoreParameters,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can isoreparameters be removed here? so we match parameters to distinct bambu modules?

Comment on lines +43 to +47
rcAssignmentParameters.default <- list(
min.exonDistance = 35,
min.primarySecondaryDist = 5,
min.primarySecondaryDistStartEnd1 = 5, # for extending annotations
min.primarySecondaryDistStartEnd2 = 5) # for read class assignment
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe OK to have parameters twice if they call the same function, one for discovery one for assigning reads?


#' Calculate the dist table used for Bambu Quantification
calculateDistTable <- function(readClassList, annotations, isoreParameters, verbose, returnDistTable){
calculateDistTable <- function(readClassList, annotations, isoreParameters, rcAssignmentParameters, verbose, returnDistTable){
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only call rcAssignmentParameters. this funciton is only called in assignDist

Comment thread R/bambu.R
returnDistTable <- TRUE
}
}
if(lowMemory)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can add a code part here for deprecation warnings. include a warning for the change in arguments here, and by when this will not be supported anymore

min.exonDistance = 35,
min.primarySecondaryDist = 5,
min.primarySecondaryDistStartEnd1 = 5, # for extending annotations
min.primarySecondaryDistStartEnd2 = 5) # for read class assignment
Copy link
Copy Markdown
Collaborator

@lingminhao lingminhao Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

min.primarySecondaryDistStartEnd1 & min.primarySecondaryDistStartEnd2 always very confusing to me. What does 1 and 2 mean, there's no context here

Comment thread R/bambu-assignDist.R
#' @import data.table
#' @noRd
assignReadClasstoTranscripts <- function(readClassList, annotations, isoreParameters,
assignReadClasstoTranscripts <- function(readClassList, annotations, isoreParameters, rcAssignmentParameters,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am correct, isoreParameters refers to the parameters related to opt.discovery. But the name is also not very clear (something like rcAssignmentParameters is much clearer)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

implement opt.rcAssignment which integrates the parameters related to assigning read classes to transcripts

4 participants