Skip to content

Add TrackingValidation workflow for tracking performance studies#4

Open
ArinaPon wants to merge 18 commits intokey4hep:mainfrom
ArinaPon:pr-tracking-validation
Open

Add TrackingValidation workflow for tracking performance studies#4
ArinaPon wants to merge 18 commits intokey4hep:mainfrom
ArinaPon:pr-tracking-validation

Conversation

@ArinaPon
Copy link
Copy Markdown
Collaborator

BEGINRELEASENOTES

  • Add TrackingValidation consumer for track-finder and track-fitter performance studies
  • Add finder-level validation through particle-to-track and track-to-particle association trees
  • Provide fitter-level validation with residuals computed with respect to MC truth
  • Support optional comparison to perfectly associated fitted tracks
  • Add summary plots written to the output ROOT file (tracking efficiency, d0 resolution, momentum resolution, transverse-momentum resolution)
  • Support configurable validation modes (full, finder-only, fitter-only)
  • Support configurable finder-efficiency definitions and purity threshold
  • Add helper and plotting utilities used by the validation workflow
  • Add test steering and end-to-end test for reconstruction and validation
  • Add documentation in the README
    ENDRELEASENOTES

Hello ! This PR introduces the TrackingValidation workflow for studying tracking performance in a flexible and reusable way.

The implementation includes:

  • validation of track-finder output through truth-based hit associations,
  • validation of fitted tracks against MC truth,
  • optional comparison with perfect tracking, i.e. tracks fitted using the correct simhits from the particle truth information,
  • summary performance plots stored directly in the output ROOT file.

The test setup was also designed to run the full chain, with configurable options to enable or disable simulation, digitization, track finding, track fitting, perfect tracking, validation, and DCH usage.

A follow-up improvement that I plan to add is a more robust treatment of the residual distributions, using double-Gaussian approach for a more stable description of core and tail behaviour.

@andread3vita
Copy link
Copy Markdown

andread3vita commented Mar 20, 2026

Thank you @ArinaPon for this PR! Just a comment on the tests: @BrieucF, should we also include the CI tests here? Of course they will fail because we don't have the fitter and perfect track finder yet, but as soon as the other PR is accepted, they should succeed

@BrieucF
Copy link
Copy Markdown
Collaborator

BrieucF commented Mar 23, 2026

Hi! I would indeed prefer that we add the CI tests directly. And that we get the track fitting merged before to merge this one.

@BrieucF
Copy link
Copy Markdown
Collaborator

BrieucF commented Mar 23, 2026

We also need to add the truth track finding to k4RecTracker

Copy link
Copy Markdown
Collaborator

@BrieucF BrieucF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this! I had a first quick look and made already some comments. I propose to wait to have everything needed in the nightlies before to do a more thorough review. Thanks again!

Comment thread TrackingPerformance/README.md Outdated
Comment thread TrackingPerformance/README.md Outdated
Comment thread TrackingPerformance/test/runTrackingValidation.py
MODEL_FILE="${1:-}"

# Optional shell-level control of the simulation step
RUN_SIM="${RUN_SIM:-1}"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(the same comment applies to the other variables) Shall we ass some more "scope" to these environment variables? I mean that similar names might be used from somewhere else and we might end-up with conflicts. Something like TrackingPerf_RUN_SIM?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the custom shell override variables RUN_SIM and INPUT_FILE_OVERRIDE to TRACKINGPERF_RUN_SIM and TRACKINGPERF_INPUT_FILE_OVERRIDE to avoid possible conflicts, thank you.

Comment thread TrackingPerformance/test/testTrackingValidation.sh
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To minimize maintenance, it would be nice to take this file directly from $FCCCONFIG/FullSim/IDEA/IDEA_o1_v03/run_digi_reco.py and apply there the needed modifications

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I understand the maintenance point. I’d like to discuss this part with Andrea first and then come back with the best way to align it with the standard steering.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will also ensure we are aligned with e.g. the collection names (now Fitted_tracks_with_filtered_hits should be used)

Copy link
Copy Markdown
Member

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prompted by your presentation today, I had a quick look and I have left a few comments below.

It's mostly minor things, but I think the considerations about the inputs might be worth thinking about also in light of Davids comments at the meeting.

Some others are probably mainly on a "good to know" for now basis, but should probably be fixed at some point (e.g. the threading issues the current code has).

Comment on lines +52 to +53
/// Build a unique integer key from a podio object identifier
uint64_t oidKey(const podio::ObjectID& id);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

podio::ObjectID is hashable (since podio v01-03) and it should be possible to directly use it in e.g. std::unordered_map as a key type. There should be no need for constructing a uint64_t for this purpose.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! That makes much more sense. I updated the code to use podio::ObjectID directly as the map key and removed the oidKey helper.

float refX, float refY, float refZ);

/// Retrieve the track state stored at the interaction point
bool getAtIPState(const edm4hep::Track& trk, edm4hep::TrackState& out);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could return a std::optional instead of using an in-out parameter via reference. I have also opened key4hep/EDM4hep#493 which should make this obsolete.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I agree. I updated getAtIPState to return std::optional and adapted the call sites accordingly.

Comment on lines +139 to +140
const std::vector<const edm4hep::TrackerHitSimTrackerHitLinkCollection*>& planarLinksVec,
const std::vector<const edm4hep::TrackerHitSimTrackerHitLinkCollection*>& dchLinksVec,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at how this is handled below I don't see the need for two dedicated inputs here. Both of them go through the same loop and populate the same maps. Since the steering already has a --useDCH flag in any case already, just building the combined vector in python and then passing only one should work the same, but use less code below.

If one wants to take this one step further, one could in principle even make this a single collection argument only since there is a CollectionMerger which would allow to merge the collections up-front (again in python). The approach of using ObjectIDs to identify the elements will still work unchanged. However, that will only make sense if there is a canonical set of merged collections, and I don't know if that is the case here.

const edm4hep::TrackCollection& fittedTracks,
const std::vector<const edm4hep::TrackCollection*>& perfectFittedTracksVec) const override {

const int event = m_evt++;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need / want to link back this event to the event number from the input file? In that case you would probably have to add an EventHeaderCollection input and take the event number from there. If this is ever run on multiple threads, the current implementation will essentially make a random event assignment.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. Right now m_evt++ is only a local processing counter, not the real input event number. I agree that for a more robust implementation we should use EventHeaderCollection and read the event number from there, especially for multi-threaded running. I’ll keep this as a future improvement.

Comment on lines +731 to +737
mutable AssocTree m_finder_p2t;
mutable AssocTree m_finder_t2p;
mutable AssocTree m_perf_p2t;
mutable AssocTree m_perf_t2p;

mutable FitterTree m_fit_vs_mc;
mutable FitterTree m_fit_vs_perfect;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These do not look like they are synchronized in any way. Trying to run this algorithm on multiple threads will probably lead to a crash (or potentially just silently written garbage data). I am not sure marking the algorithm as non re-entrant will fix the problem because Gaudi might than just create two instances which both try to write to the same file.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, it's a very good point. Right now this is meant to run in serial mode, so it’s not thread-safe. I agree this should be improved in the future, so i will start working on it.

Comment on lines +35 to +37
float safeAtan2(float y, float x) {
return std::atan2(y, x);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't do any safety checks on top of std::atan2 (as the docstring claims)?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, you’re right. The helper did not add any safety checks, so I removed it and call std::atan2 directly.

Comment on lines +26 to +37
parser.add_argument("--runDigi", type=int, default=1,
help="0=skip digitization, 1=run digitization")
parser.add_argument("--runFinder", type=int, default=1,
help="0=skip track finder, 1=run track finder")
parser.add_argument("--runFitter", type=int, default=1,
help="0=skip reco fitter, 1=run reco fitter")
parser.add_argument("--runPerfectTracking", type=int, default=1,
help="0=skip perfect tracking/perfect fitter, 1=run them")
parser.add_argument("--runValidation", type=int, default=1,
help="0=skip validation, 1=run TrackingValidation")
parser.add_argument("--useDCH", type=int, default=1,
help="0=disable DCH collections, 1=use DCH collections")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could potentially be

parser.add_argument("--runXXX", action="store_true", default=False, help="Do the XXX")

The main disadvantage of that is that explicitly specifying the other value becomes harder (would need more python code).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, that would make sense of course, but I kept the explicit 0/1 style here because this steering is also called from shell scripts where passing both enabled and disabled states explicitly is more convenient I guess.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. Using a proper bool type would still be more idiomatic and would simplify the logic later on by removing the need for explicit numeric comparisons (e.g., if args.runDigi: instead of if args.runDigi == 1:)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May not be the most elegant, but I used this in the past:

def str2bool(v):
    if v.lower() in ('yes', 'true', 't', 'y', '1', 'True'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0', 'False'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')
     
parser.add_argument('-printlimits', dest='printlimits', type=str2bool, default="False", help='Print b2j3 and b2j4 to check run2 combination')

Comment on lines +40 to +43
parser.add_argument("--mode", type=int, default=0,
help="Validation mode: 0=Full, 1=FinderOnly, 2=FitterOnly")
parser.add_argument("--doPerfectFit", type=int, default=1,
help="0=do not fill fitter_vs_perfect, 1=fill fitter_vs_perfect")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a choices argument here would make it harder to provide non-supported values.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I added choices to all integer options with a fixed set of allowed values.

@BrieucF
Copy link
Copy Markdown
Collaborator

BrieucF commented Apr 30, 2026

Hi Arina, now that key4hep/k4RecTracker#77 has been merged and propagated to the nightlies, we can resume this PR!

parser.add_argument("--validationFile", default="validation.root",
help="Output ROOT file written by TrackingValidation")

parser.add_argument("--geom",
Copy link
Copy Markdown

@Victor-Schwan Victor-Schwan May 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the PR uses --geom for the detector geometry XML file. However, other parts of the codebase, such as CLD and ILD reconstruction, use --compactFile.

As discussed in key4hep/k4FWCore#407, we could consider standardizing on commonly used arguments. In fact, I often use aliases like export CIF1="--compactFile k4gDir/FCCee/ILD_FCCee/compact/ILD_FCCee_v01/ILD_FCCee_v01.xml" to easily execute things for different detector models.

If --geom is not widely used by major projects, I suggest dropping it in favor of --compactFile to improve consistency and allow for reuse of existing aliases or configurations.

fi
fi

EOS_DIR="/eos/user/a/aponomar/Tracking/validation/${THETA_DEG}deg"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest replacing the hardcoded directory with a more flexible approach, such as using environment variables or built-in shell variables. For example, you can use $USER to get the current username:

EOS_DIR="/eos/user/${USER:0:1}/${USER}/Tracking/validation/${THETA_DEG}deg"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants