Summary
Allow a single experiment / model / dataset to handle mixed sample modalities and task types by making task_type a sample-level property. The model/experiment should be able to accept samples of these input modalities: 2D image, 3D image (volumetric), 2D point cloud (projected / organized), 3D point cloud — and produce any combination of outputs per sample: classification (cls), segmentation (seg), and detection (det). All modalities and tasks must be manageable in one experiment and one dataset.
Motivation
- task_type is currently treated as a dataset/experiment-level property, which prevents mixing modalities and tasks inside the same run.
- We want a flexible training/evaluation pipeline and model that can learn from and output different task types per sample (e.g., some samples with cls+seg, others with image_det, others with ptCloud_det).
- This simplifies multi-task, multi-modal research workflows and supports unified logging/metrics for experiments that combine 2D/3D and image/point-cloud data.
Proposal
- Sample schema
- Add/standardize a sample-level field task_type (string or enum) describing the sample's required outputs. Examples: "image_cls", "image_seg", "image_det", "vol3d_cls", "vol3d_seg", "vol3d_det", "pc2d_cls", "pc2d_seg", "pc2d_det", "pc3d_cls", "pc3d_seg", "pc3d_det".
- Add a modalities field that lists the inputs available in the sample (e.g., ["image2d"], ["depth_vol"], ["pc3d"], ["image2d","pc3d"]).
- Add a targets object that contains the ground-truth data keyed by task/subtype (e.g., targets.class, targets.segmentation, targets.bboxes, targets.pc_bboxes). Each target should be optional and present only when the sample's task_type requires it.
- Allow task_type to be set manually or inferred automatically from the presence of specific target fields (auto-detection fallback).
- Dataset format & ingestion
- When ingesting datasets, prefer reading task_type from the sample metadata when present.
- If absent, auto-detect task_type with a deterministic rule based on the targets fields that are present (e.g., presence of "masks" => image_seg; presence of "bboxes" with 2D image input => image_det; presence of point-cloud bbox fields => pc3d_det).
- Maintain backward compatibility: if a dataset-level task_type exists, allow it as a default for samples that do not specify task_type.
- Experiment / model API
- The experiment batch sampler must be able to create mixed batches (or stacked micro-batches) where each sample carries its task_type and targets.
- The model's forward pass must support conditional heads activated per sample (or per batch element), or use multi-head outputs where losses are masked by sample task_type.
- The training loop should compute losses only for the outputs relevant to each sample's task_type and aggregate them consistently (e.g., as a weighted sum; separate optimizers are optional).
- Evaluation/metrics pipeline should compute per-task and aggregate metrics and allow filtering by modality/task_type.
- Seg rendering from task_type
- When rendering segmentation annotations (for visualization or loss), use the sample's task_type to select which renderer to use (image masks vs. voxel masks vs. point-cloud labels).
- Provide a shared renderer interface that maps task_type to a rendering function.
- Detection (image and point cloud)
- Support both image_det and ptCloud_det targets. Use a unified bbox/annotation schema with modality-specific coordinate frames.
- Document coordinate transforms expected by evaluation modules.
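The sample schema, the task_type resolution order, and the renderer lookup described above can be sketched roughly as follows. This is a minimal illustration and not an existing API: the names Sample, resolve_task_type, RENDERERS, and render_segmentation are hypothetical, the target keys "masks", "bboxes", and "pc_bboxes" are taken from the examples in this proposal, and the precedence (explicit value, then auto-detection, then dataset-level default) is an assumption that still needs agreement. The renderers are placeholders that only return a label.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, List, Optional


class TaskType(str, Enum):
    """The task_type strings listed as examples in this proposal."""
    IMAGE_CLS = "image_cls"
    IMAGE_SEG = "image_seg"
    IMAGE_DET = "image_det"
    VOL3D_CLS = "vol3d_cls"
    VOL3D_SEG = "vol3d_seg"
    VOL3D_DET = "vol3d_det"
    PC2D_CLS = "pc2d_cls"
    PC2D_SEG = "pc2d_seg"
    PC2D_DET = "pc2d_det"
    PC3D_CLS = "pc3d_cls"
    PC3D_SEG = "pc3d_seg"
    PC3D_DET = "pc3d_det"


@dataclass
class Sample:
    """Hypothetical sample record: inputs present, optional targets, optional task_type."""
    modalities: List[str]
    targets: Dict[str, Any] = field(default_factory=dict)
    task_type: Optional[str] = None


def resolve_task_type(sample: Sample, dataset_default: Optional[str] = None) -> TaskType:
    # 1. An explicit sample-level value always wins.
    if sample.task_type is not None:
        return TaskType(sample.task_type)
    # 2. Deterministic auto-detection from the target keys that are present
    #    (rule order follows the examples in the proposal).
    targets = sample.targets
    if "masks" in targets:
        return TaskType.IMAGE_SEG
    if "bboxes" in targets and "image2d" in sample.modalities:
        return TaskType.IMAGE_DET
    if "pc_bboxes" in targets:
        return TaskType.PC3D_DET
    # 3. Backward compatibility: fall back to a dataset-level default (assumed order).
    if dataset_default is not None:
        return TaskType(dataset_default)
    raise ValueError("cannot resolve task_type: no explicit value, known targets, or default")


# Shared renderer interface: task_type -> rendering function (placeholders only).
Renderer = Callable[[Sample], str]
RENDERERS: Dict[TaskType, Renderer] = {
    TaskType.IMAGE_SEG: lambda sample: "image-mask overlay",
    TaskType.VOL3D_SEG: lambda sample: "voxel-mask overlay",
    TaskType.PC2D_SEG: lambda sample: "point-label colouring",
    TaskType.PC3D_SEG: lambda sample: "point-label colouring",
}


def render_segmentation(sample: Sample, dataset_default: Optional[str] = None) -> str:
    task_type = resolve_task_type(sample, dataset_default)
    if task_type not in RENDERERS:
        raise ValueError(f"no segmentation renderer for {task_type.value}")
    return RENDERERS[task_type](sample)
```

For example, resolve_task_type(Sample(["image2d"], {"bboxes": [[0, 0, 1, 1]]})) resolves to TaskType.IMAGE_DET, while a sample with an explicit task_type keeps it regardless of which targets are present.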
Acceptance criteria
- [...] task_type values that inform downstream ingestion and training pipelines.
- [...] task_type is missing and follows deterministic rules.
Implementation notes / suggestions
- Add enums/constants for supported task_types in a central place (e.g., weightslab.datasets.tasks.TaskType).
- Make targets a dict with well-known keys: class, mask_image, mask_volume, bboxes_2d, bboxes_3d, pc_labels, etc. Keep values optional.
- For training, have a wrapper Batch class that exposes per-sample masks for which loss terms should be computed.
- Consider micro-batching by task_type when mixed batches cause inefficient compute (e.g., different heads require different preprocessing).
- Provide utilities to migrate legacy datasets: a migration script that reads dataset metadata and emits sample-level task_type when safe.
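One possible shape for the Batch wrapper and the micro-batching idea from the notes above, written in plain Python so it stays framework-agnostic. Everything here is a hypothetical sketch: Batch, head_of, masked_mean_loss, and total_loss are illustrative names, and the convention that the head is the suffix after the last underscore of task_type (so "pc3d_det" maps to the "det" head) is an assumption that depends on the final naming scheme.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence

HEADS = ("cls", "seg", "det")


def head_of(task_type: str) -> str:
    """Assumed convention: the head is the suffix after the last underscore."""
    return task_type.rsplit("_", 1)[-1]


@dataclass
class Batch:
    """Carries the per-sample task_type of a mixed batch."""
    task_types: List[str]

    def loss_mask(self, head: str) -> List[float]:
        # 1.0 where the sample requires this head's output, 0.0 elsewhere.
        return [1.0 if head_of(t) == head else 0.0 for t in self.task_types]

    def micro_batches(self) -> Dict[str, List[int]]:
        # Group sample indices by task_type so each group can be run separately.
        groups: Dict[str, List[int]] = defaultdict(list)
        for index, task_type in enumerate(self.task_types):
            groups[task_type].append(index)
        return dict(groups)


def masked_mean_loss(per_sample_losses: Sequence[float], mask: Sequence[float]) -> float:
    # Average only over the samples selected by the mask; 0.0 if none are selected.
    selected = sum(mask)
    if selected == 0:
        return 0.0
    return sum(loss * m for loss, m in zip(per_sample_losses, mask)) / selected


def total_loss(batch: Batch,
               per_head_losses: Dict[str, Sequence[float]],
               weights: Dict[str, float]) -> float:
    # Weighted sum of the masked per-head losses.
    return sum(
        weights[head] * masked_mean_loss(per_head_losses[head], batch.loss_mask(head))
        for head in HEADS
    )
```

With Batch(["image_cls", "image_seg", "pc3d_det", "image_seg"]), loss_mask("seg") selects samples 1 and 3, and micro_batches() groups them under "image_seg" so the segmentation head can process them together. In a tensor framework the same masks would typically be applied as element-wise multiplications on per-sample loss tensors.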
Risks / open questions
- Mixed-modality batches may complicate GPU memory patterns and cause inefficiencies; we might need to allow batching by dominant modality/task_type.
- Standardizing target field names requires careful migration and clear docs to avoid dataset errors.
- Deciding the canonical set of task_type strings needs agreement.
Next steps
- Design the sample JSON schema and add to repository docs (examples for each modality).
- Add task_type enum and target keys as constants in the codebase.
- Implement dataset ingestion changes and automatic inference rules.
- Update model training loop and evaluators to support per-sample masking.
- Add tests (unit and integration) and a migration tool for legacy datasets.
Notes from user
Task type management should be sample related, i.e., seg rendering from tasktype, cls, image_det, ptCloud_det, all from samples (manual or auto). The goal is one experiment, one model, one dataset that can include 2D image, 3D image, 2D point cloud, 3D point cloud samples and produce cls/seg/det as appropriate per sample.