Add adaption example to Auto_FL by ZiyueXu77 · Pull Request #4560 · NVIDIA/NVFlare

ZiyueXu77 · 2026-05-08T19:02:32Z

Fixes # .

Description

Add a section and example to show how the existing concept of auto_fl can be adapted to a new task and execution environment.

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Quick tests passed locally by running ./runtest.sh.
In-line docstrings updated.
Documentation updated.

greptile-apps · 2026-05-08T19:06:55Z

Greptile Summary

This PR adds a vlm_local task profile showing how to adapt the Auto-FL research loop to a 3-site medical VLM federated learning scenario (Qwen3-VL LoRA on VQA-RAD, SLAKE, PathVQA) on a local single-GPU machine, reusing all shared scripts, ledger helpers, and aggregation utilities from the parent directory.

New profile files: client.py implements FedProx, FedDyn, and SAM training paths; job.py correctly forwards all client arguments including SAM/FedDyn args; model.py exposes only LoRA adapter tensors for efficient NVFlare aggregation; data/med_vlm_data_utils.py maps sites to VQA datasets.
Parent harness changes: extract_score.py prepends token_f1 to the metric key list; run_iteration.sh is parameterized with JOB_SCRIPT and CLIENT_CONTRACT_PATH so profiles can reuse the runner without copying it.
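The adapter-only idea described for model.py can be illustrated with a minimal sketch. It uses plain dictionaries in place of tensors. The `lora_` substring filter, the dot-to-double-underscore sanitization, and the function name `extract_adapter_state` are assumptions made for illustration and are not taken from the PR's code.

```python
from typing import Dict, List


def extract_adapter_state(state: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep only LoRA adapter entries and sanitize their names.

    Assumption: adapter parameters are identified by "lora_" in the key,
    and sanitizing means replacing "." with "__" so keys carry no dots.
    """
    adapter = {}
    for name, values in state.items():
        if "lora_" in name:
            adapter[name.replace(".", "__")] = values
    return adapter


full_state = {
    "layers.0.attn.q_proj.weight": [0.1, 0.2],           # frozen base weight
    "layers.0.attn.q_proj.lora_A.weight": [0.01, 0.02],  # trainable adapter
    "layers.0.attn.q_proj.lora_B.weight": [0.03, 0.04],  # trainable adapter
}
adapter_state = extract_adapter_state(full_state)
```

Only the two adapter entries remain in `adapter_state`, which is what keeps the payload exchanged with the server small compared with the full model.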

Confidence Score: 5/5

Safe to merge; changes are additive example files and a small backward-compatible parameterization of the shared runner script.

All changed files are either documentation, a new research example subdirectory, or a minor runner parameterization. Previously flagged bugs (torch_dtype parameter name, missing SAM/FedDyn forwarding) have been corrected. Remaining findings are defensive-coding suggestions with no impact on correctness for the normal execution path.
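One of the defensive-coding findings concerns the avg_loss computation when micro_steps is 0. A louder alternative to silently returning 0 could look like the sketch below. The function name `average_loss` and the choice to raise `ValueError` are assumptions, not the PR's behavior.

```python
def average_loss(total_loss: float, micro_steps: int) -> float:
    """Return the mean loss per micro-step, failing loudly on zero steps."""
    if micro_steps <= 0:
        # Raising makes an empty training pass visible instead of reporting 0.0.
        raise ValueError("no micro-steps were run; cannot average the loss")
    return total_loss / micro_steps
```

Whether raising is preferable to a guarded caller depends on where the empty case is already handled.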

vlm_local/data/med_vlm_data_utils.py has a hardcoded Phase_3.1 subdirectory path worth tracking if the VLM_Benchmark repo structure changes.
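One way to make the hardcoded subdirectory easier to track is to resolve it in a single place and fail early when it is missing. The sketch below is an illustration only: the environment variable `VLM_BENCHMARK_SUBDIR` and the function `add_benchmark_to_path` are hypothetical and do not exist in the PR.

```python
import os
import sys
from pathlib import Path


def add_benchmark_to_path(repo_root: str, default_subdir: str = "Phase_3.1") -> Path:
    """Extend sys.path with the benchmark code directory.

    VLM_BENCHMARK_SUBDIR is a hypothetical override, not defined by the PR.
    """
    subdir = os.environ.get("VLM_BENCHMARK_SUBDIR", default_subdir)
    code_dir = Path(repo_root) / subdir
    if not code_dir.is_dir():
        # Fail with a clear message if the benchmark repo was reorganized.
        raise FileNotFoundError(f"benchmark code directory not found: {code_dir}")
    if str(code_dir) not in sys.path:
        sys.path.append(str(code_dir))
    return code_dir
```

With this shape, a reorganized VLM_Benchmark checkout produces an explicit error instead of a later import failure.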

Important Files Changed

Filename	Overview
research/auto-fl-research/vlm_local/client.py	Full NVFlare FL client for 3-site medical VLM LoRA training; implements FedProx, FedDyn, and SAM regularization paths. Uses correct torch_dtype= parameter. SAM/FedDyn args are forwarded from job.py. One silent-zero-division pattern in avg_loss computation when micro_steps=0.
research/auto-fl-research/vlm_local/data/med_vlm_data_utils.py	Medical VQA dataset bridge; correctly guards answers[0] with empty-list check. Hardcodes Phase_3.1 subdirectory when extending sys.path, fragile against VLM_Benchmark repo reorganization.
research/auto-fl-research/vlm_local/job.py	NVFlare FedAvgRecipe job generator; correctly forwards all client args including sam_rho, sam_eps, and feddyn_alpha through build_train_args.
research/auto-fl-research/vlm_local/model.py	Adapter-only state model for NVFlare aggregation; correctly isolates LoRA tensors with sanitized parameter names and guards RNG state around model init.
research/auto-fl-research/vlm_local/train_utils.py	Training utilities; correctly raises on empty dataset. Fallback token_f1 catches ImportError only. Silent avg_loss=0 path if micro_steps==0 is guarded by main().
research/auto-fl-research/scripts/run_iteration.sh	Parameterizes JOB_SCRIPT and CLIENT_CONTRACT_PATH so vlm_local and future task profiles can reuse the parent runner without copying it.
research/auto-fl-research/scripts/extract_score.py	Prepends token_f1 to METRIC_KEYS so the shared score extractor finds VLM evaluation results before falling back to CIFAR-10 accuracy keys.
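The extract_score.py row describes a priority lookup: token_f1 is tried first, then the CIFAR-10 accuracy keys. A minimal sketch of that pattern follows. Apart from `token_f1`, the key names in `METRIC_KEYS` are placeholders, and `extract_score` here is an illustrative function rather than the script's actual code.

```python
from typing import Any, Dict, Optional, Tuple

# token_f1 comes first so VLM results win over the accuracy keys.
# The accuracy key names below are placeholders, not the real METRIC_KEYS.
METRIC_KEYS = ["token_f1", "accuracy", "val_accuracy"]


def extract_score(metrics: Dict[str, Any]) -> Optional[Tuple[str, float]]:
    """Return (key, value) for the first known metric key that holds a number."""
    for key in METRIC_KEYS:
        value = metrics.get(key)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return key, float(value)
    return None
```

Because the list is ordered, a CIFAR-10 run that never reports token_f1 still falls through to its accuracy key unchanged.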

Sequence Diagram

sequenceDiagram
    participant Runner as run_iteration.sh
    participant Job as vlm_local/job.py
    participant NVFlare as NVFlare Simulator
    participant Client as vlm_local/client.py
    participant VLM as Qwen3-VL + PEFT LoRA
    participant Server as NVFlare Server

    Runner->>Job: python $JOB_SCRIPT --cross_site_eval [args]
    Job->>Job: resolve_qwen3vl_adapter_shape()
    Job->>NVFlare: FedAvgRecipe.execute(SimEnv)

    loop num_rounds
        NVFlare->>Client: flare.receive() global adapter state
        Client->>VLM: load_state_dict via adapter_state_to_peft_state()
        Client->>VLM: local LoRA training (FedProx / FedDyn / SAM)
        Client->>Client: compute_model_diff(model, global_model)
        Client->>NVFlare: flare.send(ParamsType.DIFF + NUM_STEPS)
        NVFlare->>Server: aggregate DIFF tensors (weighted)
    end

    NVFlare->>Client: flare.is_evaluate() cross-site eval
    Client->>VLM: evaluate_vlm_generative() token_f1
    Client->>NVFlare: flare.send(metrics token_f1)
    NVFlare-->>Job: result_dir
    Job->>Runner: write AUTOFL_RESULT_DIR_FILE sidecar
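The DIFF round trip in the diagram can be sketched with plain lists standing in for tensors. This is an illustration under assumptions: the helper names `compute_diff`, `aggregate_diffs`, and `apply_diff` are hypothetical, and weighting each client by its step count is one plausible reading of "aggregate DIFF tensors (weighted)", not a statement about NVFlare's aggregator.

```python
from typing import Dict, List, Tuple

State = Dict[str, List[float]]


def compute_diff(local: State, global_state: State) -> State:
    """Per-parameter difference: local weights minus the received global weights."""
    return {
        name: [l - g for l, g in zip(local[name], global_state[name])]
        for name in global_state
    }


def aggregate_diffs(updates: List[Tuple[State, int]]) -> State:
    """Average DIFF updates, weighting each client by its number of steps."""
    total_steps = sum(steps for _, steps in updates)
    if total_steps <= 0:
        raise ValueError("total number of steps must be positive")
    first_diff = updates[0][0]
    return {
        name: [
            sum(diff[name][i] * steps for diff, steps in updates) / total_steps
            for i in range(len(first_diff[name]))
        ]
        for name in first_diff
    }


def apply_diff(global_state: State, diff: State) -> State:
    """Add the aggregated DIFF to the global weights."""
    return {
        name: [g + d for g, d in zip(global_state[name], diff[name])]
        for name in global_state
    }


global_state = {"w": [1.0, 2.0]}
site_1 = (compute_diff({"w": [2.0, 2.0]}, global_state), 1)  # 1 local step
site_2 = (compute_diff({"w": [1.0, 6.0]}, global_state), 3)  # 3 local steps
new_global = apply_diff(global_state, aggregate_diffs([site_1, site_2]))
```

In this toy run, site_2 contributes three times the weight of site_1, so `new_global["w"]` is `[1.25, 5.0]`.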

Reviews (7): Last reviewed commit: "Handle empty VLM validation answers"

Copilot

Pull request overview

Adds a new “vlm_local” task profile under research/auto-fl-research/ to demonstrate how the existing Auto-FL NVFlare harness can be adapted to a local single-GPU, 3-site medical VLM LoRA-adapter workflow, while reusing the parent harness scripts/templates/aggregators.

Changes:

Introduces a VLM-local profile (vlm_local/) with its own client loop, job generator, adapter-only model state, dataset bridge, metric utilities, and mutation schema.
Updates the parent scripts/run_iteration.sh to allow selecting alternate profile entrypoints via JOB_SCRIPT and CLIENT_CONTRACT_PATH.
Expands the parent research/auto-fl-research/README.md with documentation on using and adapting the new VLM profile.
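The JOB_SCRIPT and CLIENT_CONTRACT_PATH overrides can be illustrated by building the environment for a profile run. This is a sketch under assumptions: the helper `profile_env` is hypothetical, the paths are written relative to the repository root, and pointing CLIENT_CONTRACT_PATH at the profile's client.py is a guess about what the variable expects.

```python
import os
from pathlib import Path
from typing import Dict

HARNESS_ROOT = Path("research/auto-fl-research")


def profile_env(profile: str, base_env: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of base_env with the two profile overrides set."""
    profile_dir = HARNESS_ROOT / profile
    env = dict(base_env)
    env["JOB_SCRIPT"] = str(profile_dir / "job.py")
    # Assumption: the client contract is the profile's client.py.
    env["CLIENT_CONTRACT_PATH"] = str(profile_dir / "client.py")
    return env


env = profile_env("vlm_local", dict(os.environ))
command = ["bash", str(HARNESS_ROOT / "scripts" / "run_iteration.sh")]
# To launch the shared runner: subprocess.run(command, env=env, check=True)
```

Leaving both variables unset would keep the runner on its existing defaults, which is what makes the parameterization backward compatible.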

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
research/auto-fl-research/vlm_local/train_utils.py	Adds VLM evaluation (generative token-F1) and DIFF helper utilities.
research/auto-fl-research/vlm_local/requirements.txt	Defines Python dependencies for running the VLM-local profile.
research/auto-fl-research/vlm_local/README.md	Documents how the VLM profile layers onto the parent Auto-FL harness.
research/auto-fl-research/vlm_local/program.md	Defines the VLM profile contract, scope, and fixed baseline budget.
research/auto-fl-research/vlm_local/mutation_schema.yaml	Constrains the mutation/edit surface for the VLM profile.
research/auto-fl-research/vlm_local/model.py	Adds an adapter-only (LoRA) server-side state model for aggregation.
research/auto-fl-research/vlm_local/job.py	Adds a Recipe-based job generator for the local 3-site medical VLM simulation.
research/auto-fl-research/vlm_local/data/med_vlm_data_utils.py	Adds deterministic site→dataset mapping and dataset/collator wiring to VLM_Benchmark.
research/auto-fl-research/vlm_local/data/__init__.py	Declares the VLM profile data package.
research/auto-fl-research/vlm_local/client.py	Implements the NVFlare client loop for adapter DIFF training/evaluation on VLM.
research/auto-fl-research/scripts/run_iteration.sh	Adds env-overridable job/client paths for profile-based runs.
research/auto-fl-research/README.md	Adds documentation for running and adapting the VLM-local profile.
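For readers unfamiliar with the token-F1 metric mentioned for train_utils.py, a common SQuAD-style formulation is sketched below. The lowercasing and whitespace tokenization are assumptions; the PR's evaluation may normalize answers differently.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """SQuAD-style token-level F1 between two whitespace-tokenized strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, the prediction "the left lung" against the reference "left lung" has precision 2/3 and recall 1, giving an F1 of 0.8.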

add adaption example

78c471b

Copilot AI review requested due to automatic review settings May 8, 2026 19:02

Copilot started reviewing on behalf of ZiyueXu77 May 8, 2026 19:03

github-advanced-security AI found potential problems May 8, 2026

Comment thread research/auto-fl-research/vlm_local/client.py Fixed

Comment thread research/auto-fl-research/vlm_local/train_utils.py Fixed

Comment thread research/auto-fl-research/vlm_local/train_utils.py Fixed

Comment thread research/auto-fl-research/vlm_local/train_utils.py Fixed

Copilot AI reviewed May 8, 2026

Comment thread research/auto-fl-research/vlm_local/train_utils.py Outdated

Comment thread research/auto-fl-research/vlm_local/data/med_vlm_data_utils.py Outdated

Comment thread research/auto-fl-research/vlm_local/client.py

Comment thread research/auto-fl-research/vlm_local/client.py

greptile-apps Bot reviewed May 8, 2026

Comment thread research/auto-fl-research/vlm_local/client.py

Comment thread research/auto-fl-research/vlm_local/train_utils.py Outdated

ZiyueXu77 added 5 commits May 8, 2026 15:17

polish

af7cdb7

remove potential info leak

db888c6

Merge branch 'main' into auto_fl

43466eb

review fix

ba8b4db

review fix

dedb583

ZiyueXu77 requested a review from holgerroth May 8, 2026 19:47

holgerroth reviewed May 8, 2026

Comment thread research/auto-fl-research/README.md Outdated

holgerroth requested changes May 8, 2026

readme updates

3e1a420

greptile-apps Bot reviewed May 8, 2026

Comment thread research/auto-fl-research/vlm_local/data/med_vlm_data_utils.py Outdated

Handle empty VLM validation answers

f27e42b

ZiyueXu77 requested a review from holgerroth May 8, 2026 21:43

ZiyueXu77 wants to merge 8 commits into NVIDIA:main from ZiyueXu77:auto_fl

4 participants
