Add TrackingValidation workflow for tracking performance studies#4
Add TrackingValidation workflow for tracking performance studies#4ArinaPon wants to merge 18 commits intokey4hep:mainfrom
Conversation
|
Hi! I would indeed prefer that we add the CI tests directly. And that we get the track fitting merged before to merge this one. |
|
We also need to add the truth track finding to k4RecTracker |
BrieucF
left a comment
There was a problem hiding this comment.
Thank you very much for this! I had a first quick look and made already some comments. I propose to wait to have everything needed in the nightlies before to do a more thorough review. Thanks again!
| MODEL_FILE="${1:-}" | ||
|
|
||
| # Optional shell-level control of the simulation step | ||
| RUN_SIM="${RUN_SIM:-1}" |
There was a problem hiding this comment.
(the same comment applies to the other variables) Shall we ass some more "scope" to these environment variables? I mean that similar names might be used from somewhere else and we might end-up with conflicts. Something like TrackingPerf_RUN_SIM?
There was a problem hiding this comment.
I updated the custom shell override variables RUN_SIM and INPUT_FILE_OVERRIDE to TRACKINGPERF_RUN_SIM and TRACKINGPERF_INPUT_FILE_OVERRIDE to avoid possible conflicts, thank you.
There was a problem hiding this comment.
To minimize maintenance, it would be nice to take this file directly from $FCCCONFIG/FullSim/IDEA/IDEA_o1_v03/run_digi_reco.py and apply there the needed modifications
There was a problem hiding this comment.
Thanks, I understand the maintenance point. I’d like to discuss this part with Andrea first and then come back with the best way to align it with the standard steering.
There was a problem hiding this comment.
It will also ensure we are aligned with e.g. the collection names (now Fitted_tracks_with_filtered_hits should be used)
tmadlener
left a comment
There was a problem hiding this comment.
Prompted by your presentation today, I had a quick look and I have left a few comments below.
It's mostly minor things, but I think the considerations about the inputs might be worth thinking about also in light of Davids comments at the meeting.
Some others are probably mainly on a "good to know" for now basis, but should probably be fixed at some point (e.g. the threading issues the current code has).
| /// Build a unique integer key from a podio object identifier | ||
| uint64_t oidKey(const podio::ObjectID& id); |
There was a problem hiding this comment.
podio::ObjectID is hashable (since podio v01-03) and it should be possible to directly use it in e.g. std::unordered_map as a key type. There should be no need for constructing a uint64_t for this purpose.
There was a problem hiding this comment.
Thank you! That makes much more sense. I updated the code to use podio::ObjectID directly as the map key and removed the oidKey helper.
| float refX, float refY, float refZ); | ||
|
|
||
| /// Retrieve the track state stored at the interaction point | ||
| bool getAtIPState(const edm4hep::Track& trk, edm4hep::TrackState& out); |
There was a problem hiding this comment.
This could return a std::optional instead of using an in-out parameter via reference. I have also opened key4hep/EDM4hep#493 which should make this obsolete.
There was a problem hiding this comment.
Thanks, I agree. I updated getAtIPState to return std::optional and adapted the call sites accordingly.
| const std::vector<const edm4hep::TrackerHitSimTrackerHitLinkCollection*>& planarLinksVec, | ||
| const std::vector<const edm4hep::TrackerHitSimTrackerHitLinkCollection*>& dchLinksVec, |
There was a problem hiding this comment.
Looking at how this is handled below I don't see the need for two dedicated inputs here. Both of them go through the same loop and populate the same maps. Since the steering already has a --useDCH flag in any case already, just building the combined vector in python and then passing only one should work the same, but use less code below.
If one wants to take this one step further, one could in principle even make this a single collection argument only since there is a CollectionMerger which would allow to merge the collections up-front (again in python). The approach of using ObjectIDs to identify the elements will still work unchanged. However, that will only make sense if there is a canonical set of merged collections, and I don't know if that is the case here.
| const edm4hep::TrackCollection& fittedTracks, | ||
| const std::vector<const edm4hep::TrackCollection*>& perfectFittedTracksVec) const override { | ||
|
|
||
| const int event = m_evt++; |
There was a problem hiding this comment.
Do you need / want to link back this event to the event number from the input file? In that case you would probably have to add an EventHeaderCollection input and take the event number from there. If this is ever run on multiple threads, the current implementation will essentially make a random event assignment.
There was a problem hiding this comment.
Thanks for the comment. Right now m_evt++ is only a local processing counter, not the real input event number. I agree that for a more robust implementation we should use EventHeaderCollection and read the event number from there, especially for multi-threaded running. I’ll keep this as a future improvement.
| mutable AssocTree m_finder_p2t; | ||
| mutable AssocTree m_finder_t2p; | ||
| mutable AssocTree m_perf_p2t; | ||
| mutable AssocTree m_perf_t2p; | ||
|
|
||
| mutable FitterTree m_fit_vs_mc; | ||
| mutable FitterTree m_fit_vs_perfect; |
There was a problem hiding this comment.
These do not look like they are synchronized in any way. Trying to run this algorithm on multiple threads will probably lead to a crash (or potentially just silently written garbage data). I am not sure marking the algorithm as non re-entrant will fix the problem because Gaudi might than just create two instances which both try to write to the same file.
There was a problem hiding this comment.
Thank you, it's a very good point. Right now this is meant to run in serial mode, so it’s not thread-safe. I agree this should be improved in the future, so i will start working on it.
| float safeAtan2(float y, float x) { | ||
| return std::atan2(y, x); | ||
| } |
There was a problem hiding this comment.
This doesn't do any safety checks on top of std::atan2 (as the docstring claims)?
There was a problem hiding this comment.
Thanks, you’re right. The helper did not add any safety checks, so I removed it and call std::atan2 directly.
| parser.add_argument("--runDigi", type=int, default=1, | ||
| help="0=skip digitization, 1=run digitization") | ||
| parser.add_argument("--runFinder", type=int, default=1, | ||
| help="0=skip track finder, 1=run track finder") | ||
| parser.add_argument("--runFitter", type=int, default=1, | ||
| help="0=skip reco fitter, 1=run reco fitter") | ||
| parser.add_argument("--runPerfectTracking", type=int, default=1, | ||
| help="0=skip perfect tracking/perfect fitter, 1=run them") | ||
| parser.add_argument("--runValidation", type=int, default=1, | ||
| help="0=skip validation, 1=run TrackingValidation") | ||
| parser.add_argument("--useDCH", type=int, default=1, | ||
| help="0=disable DCH collections, 1=use DCH collections") |
There was a problem hiding this comment.
These could potentially be
parser.add_argument("--runXXX", action="store_true", default=False, help="Do the XXX")The main disadvantage of that is that explicitly specifying the other value becomes harder (would need more python code).
There was a problem hiding this comment.
Thank you, that would make sense of course, but I kept the explicit 0/1 style here because this steering is also called from shell scripts where passing both enabled and disabled states explicitly is more convenient I guess.
There was a problem hiding this comment.
Fair point. Using a proper bool type would still be more idiomatic and would simplify the logic later on by removing the need for explicit numeric comparisons (e.g., if args.runDigi: instead of if args.runDigi == 1:)
There was a problem hiding this comment.
May not be the most elegant, but I used this in the past:
def str2bool(v):
if v.lower() in ('yes', 'true', 't', 'y', '1', 'True'):
return True
elif v.lower() in ('no', 'false', 'f', 'n', '0', 'False'):
return False
else:
raise argparse.ArgumentTypeError('Boolean value expected.')
parser.add_argument('-printlimits', dest='printlimits', type=str2bool, default="False", help='Print b2j3 and b2j4 to check run2 combination')| parser.add_argument("--mode", type=int, default=0, | ||
| help="Validation mode: 0=Full, 1=FinderOnly, 2=FitterOnly") | ||
| parser.add_argument("--doPerfectFit", type=int, default=1, | ||
| help="0=do not fill fitter_vs_perfect, 1=fill fitter_vs_perfect") |
There was a problem hiding this comment.
Using a choices argument here would make it harder to provide non-supported values.
There was a problem hiding this comment.
Thank you! I added choices to all integer options with a fixed set of allowed values.
|
Hi Arina, now that key4hep/k4RecTracker#77 has been merged and propagated to the nightlies, we can resume this PR! |
| parser.add_argument("--validationFile", default="validation.root", | ||
| help="Output ROOT file written by TrackingValidation") | ||
|
|
||
| parser.add_argument("--geom", |
There was a problem hiding this comment.
I noticed that the PR uses --geom for the detector geometry XML file. However, other parts of the codebase, such as CLD and ILD reconstruction, use --compactFile.
As discussed in key4hep/k4FWCore#407, we could consider standardizing on commonly used arguments. In fact, I often use aliases like export CIF1="--compactFile k4gDir/FCCee/ILD_FCCee/compact/ILD_FCCee_v01/ILD_FCCee_v01.xml" to easily execute things for different detector models.
If --geom is not widely used by major projects, I suggest dropping it in favor of --compactFile to improve consistency and allow for reuse of existing aliases or configurations.
| fi | ||
| fi | ||
|
|
||
| EOS_DIR="/eos/user/a/aponomar/Tracking/validation/${THETA_DEG}deg" |
There was a problem hiding this comment.
I suggest replacing the hardcoded directory with a more flexible approach, such as using environment variables or built-in shell variables. For example, you can use $USER to get the current username:
EOS_DIR="/eos/user/${USER:0:1}/${USER}/Tracking/validation/${THETA_DEG}deg"
BEGINRELEASENOTES
ENDRELEASENOTES
Hello ! This PR introduces the TrackingValidation workflow for studying tracking performance in a flexible and reusable way.
The implementation includes:
The test setup was also designed to run the full chain, with configurable options to enable or disable simulation, digitization, track finding, track fitting, perfect tracking, validation, and DCH usage.
A follow-up improvement that I plan to add is a more robust treatment of the residual distributions, using double-Gaussian approach for a more stable description of core and tail behaviour.