Evaluation Tool
===============

The Evaluation Tab in BirdNET Analyzer is a tool designed to assess the performance of deep learning models on bioacoustic data.
Whether you are dealing with binary or multi-label classification tasks, this interface calculates and visualizes essential performance metrics.
This guide explains each component of the Evaluation Tab and offers step-by-step instructions to ensure a smooth evaluation process.

1. Overview
-----------------

The Evaluation Tab works by comparing two primary inputs:

* Annotation Files: Files that provide the ground-truth labels, supplied as Raven selection tables.
* Prediction Files: Files generated by BirdNET Analyzer that contain your model’s prediction scores and labels.

By aligning predictions with annotations over uniform time intervals, the system computes a range of performance metrics such as:

* F1 Score
* Recall
* Precision
* Average Precision (AP)
* AUROC (Area Under the Receiver Operating Characteristic Curve)
* Accuracy

These metrics help you evaluate how well your model performs on bioacoustic data.
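
The alignment step can be pictured with a short sketch. This is a minimal illustration of the idea only, not the tool’s actual implementation; the function name, arguments, and defaults below are assumptions chosen to mirror the parameters described later in this guide.

.. code-block:: python

    def label_samples(annotations, recording_duration, sample_duration=3.0, min_overlap=0.5):
        """Return one 0/1 ground-truth label per fixed-length sample interval."""
        labels = []
        start = 0.0
        while start < recording_duration:
            end = min(start + sample_duration, recording_duration)
            # Largest overlap between this sample and any annotation interval
            overlap = max(
                (min(end, a_end) - max(start, a_start) for a_start, a_end in annotations),
                default=0.0,
            )
            labels.append(1 if overlap >= min_overlap else 0)
            start += sample_duration
        return labels

    # A 9-second recording with one annotation from 2.0 s to 4.0 s
    print(label_samples([(2.0, 4.0)], recording_duration=9.0))  # [1, 1, 0]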

2. File selection
-------------------

Annotations
#############

* **Purpose**: Provide the true labels for evaluation.
* **How to Use**: Upload one or more annotation files via the file dialog, or simply drag and drop them into the designated area.

Predictions
#############

* **Purpose**: Supply the model’s prediction data.
* **How to Use**: Upload one or more prediction files using the same drag-and-drop or file dialog method.
3. Column Mapping for Annotations and Predictions
---------------------------------------------------

Different input files may use different column names. To ensure the tool can correctly interpret your data, you can map the columns from your files to the expected parameters.

Annotations Mapping
####################

* **Start Time**: Marks the beginning of the annotation.
* **End Time**: Marks the end of the annotation.
* **Class**: Contains the label or category.
* **Recording**: Identifies the audio file.
* **Duration**: Indicates the total duration of the audio file.

Predictions Mapping
######################

* **Start Time**: Marks the beginning of the prediction.
* **End Time**: Marks the end of the prediction.
* **Class**: Contains the predicted label.
* **Confidence**: Holds the confidence scores of the predictions.
* **Recording**: Identifies the audio file.
* **Duration**: Indicates the total duration of the audio file.

.. note:: The system pre-populates these fields with default column names. If your files use different column names, simply select the appropriate ones from the drop-down menus.
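
For intuition, the mapping is equivalent to renaming columns before evaluation. The following sketch shows the idea with pandas; the source column names ("Begin Time (s)", "Species", and so on) are assumptions about one possible file layout, not names the tool requires.

.. code-block:: python

    import pandas as pd

    # Hypothetical mapping from one file layout to the expected parameters
    column_mapping = {
        "Begin Time (s)": "Start Time",
        "End Time (s)": "End Time",
        "Species": "Class",
        "Begin File": "Recording",
    }

    # Raven selection tables are tab-separated
    annotations = pd.read_csv("annotations.txt", sep="\t").rename(columns=column_mapping)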

4. Class Mapping (Optional)
------------------------------------

If there is a discrepancy between class names in your annotation and prediction files, you can reconcile these differences using a JSON mapping file.

* **Download Template**: Click the "Download Template" button to obtain a sample JSON file that shows how to map the predicted class names to the annotation class names. A hypothetical example is shown below.
* **Upload Mapping File**: After editing the template to match your naming conventions, upload the updated file to standardize class names across your data.
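
The authoritative layout comes from the downloaded template itself; the snippet below is only a guess at what such a mapping might look like, with predicted class names as keys and the corresponding annotation class names as values.

.. code-block:: json

    {
        "Turdus merula_Eurasian Blackbird": "Eurasian Blackbird",
        "Erithacus rubecula_European Robin": "European Robin"
    }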

5. Classes and Recordings Selection
----------------------------------------

Once you have uploaded and mapped your files, the system automatically extracts the available classes and recordings.

* **Select Classes**: Use the checkbox group to choose specific classes for evaluation. If no selection is made, all classes are included by default.
* **Select Recordings**: Similarly, select the recordings you wish to evaluate to focus on specific data subsets.
6. Parameters Configuration
---------------------------------

Customize the evaluation process by adjusting the following parameters; a small worked example follows the list.

* **Sample Duration (s)**: The length of each audio segment. (Default: 3 seconds, matching BirdNET’s prediction segment length.)
* **Recording Duration**: Explicitly set the recording duration. (Default: inferred automatically from your files.)
* **Minimum Overlap (s)**: The minimum time overlap between an annotation and a prediction for them to be considered a match. (Default: 0.5 seconds)
* **Threshold**: The confidence cut-off above which a prediction is counted as positive. (Default: 0.1)
* **Class-wise Metrics**: Toggle this option if you want to compute performance metrics for each class individually. If disabled, metrics are averaged across all classes.
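
As a quick sketch of how the threshold is applied (the values are illustrative, not tool output):

.. code-block:: python

    scores = [0.05, 0.32, 0.08, 0.91]  # per-segment confidence scores
    threshold = 0.1                    # default threshold

    # A segment counts as a positive prediction when its score reaches the threshold
    predictions = [1 if s >= threshold else 0 for s in scores]
    print(predictions)  # [0, 1, 0, 1]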

7. Metrics Selection
---------------------------------

Select the performance metrics you want to compute and visualize. The available options are listed below; a short code sketch after the list shows how each can be reproduced offline.

* **AUROC**: Measures the probability that the model will rank a random positive case higher than a random negative one.

  * Advantage: Provides an overall sense of the model’s discriminative power, especially with imbalanced data.
  * Disadvantage: Can be challenging to interpret.

* **Precision**: Indicates how often the model’s positive predictions are correct.

  * Advantage: Highlights the model’s accuracy in predicting positives.
  * Disadvantage: Does not account for missed positive cases.

* **Recall**: Measures the percentage of actual positive cases the model correctly identifies.

  * Advantage: Ensures that most positive cases are detected.
  * Disadvantage: May lead to many false positives if not balanced with precision.

* **F1 Score**: The harmonic mean of precision and recall, offering a balanced metric.

  * Advantage: Combines both false positives and false negatives into one score.
  * Disadvantage: Can be less intuitive if precision and recall values differ greatly.

* **Average Precision (AP)**: Summarizes the precision-recall curve by averaging the precision at each recall level.

  * Advantage: Provides a single metric across all thresholds.
  * Disadvantage: Can be noisy for classes with few positive cases.

* **Accuracy**: The overall percentage of correct predictions.

  * Advantage: Simple to understand and calculate.
  * Disadvantage: May be misleading in cases of class imbalance.
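
If you want to sanity-check these metrics outside the interface, all of them can be computed with scikit-learn. The labels and scores below are toy values, not output from the tool:

.. code-block:: python

    from sklearn.metrics import (
        accuracy_score,
        average_precision_score,
        f1_score,
        precision_score,
        recall_score,
        roc_auc_score,
    )

    y_true = [0, 1, 1, 0, 1, 0]                       # ground-truth labels per segment
    y_score = [0.05, 0.80, 0.40, 0.30, 0.65, 0.10]    # model confidence scores
    y_pred = [1 if s >= 0.1 else 0 for s in y_score]  # binarized at the default threshold

    print("AUROC    :", roc_auc_score(y_true, y_score))           # ranking-based, uses raw scores
    print("AP       :", average_precision_score(y_true, y_score))
    print("Precision:", precision_score(y_true, y_pred))          # threshold-based, uses binary preds
    print("Recall   :", recall_score(y_true, y_pred))
    print("F1       :", f1_score(y_true, y_pred))
    print("Accuracy :", accuracy_score(y_true, y_pred))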

8. Actions
-----------------

After configuring your files and parameters, use the action buttons to execute the evaluation and visualize the results.

* **Calculate Metrics**: Processes your input files and computes the selected performance metrics.
* **Plot Metrics**: Generates visualizations (line/bar plots) of the computed metrics.
* **Plot Confusion Matrix**: Displays a confusion matrix showing the correct and incorrect predictions for each class.
* **Plot Metrics All Thresholds**: Visualizes how performance metrics change across a range of threshold values, helping you understand trade-offs, e.g., between precision and recall (see the sketch after this list).
* **Download Results Table**: Exports a CSV file containing the computed metrics.
* **Download Data Table**: Exports a CSV file with the processed data that details the alignment between annotations and predictions.
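
The idea behind **Plot Metrics All Thresholds** can be sketched as a simple sweep: binarize the scores at each threshold and recompute the threshold-dependent metrics. Again, the values are illustrative:

.. code-block:: python

    import numpy as np
    from sklearn.metrics import precision_score, recall_score

    y_true = [0, 1, 1, 0, 1, 0]
    y_score = [0.05, 0.80, 0.40, 0.30, 0.65, 0.10]

    # Raising the threshold trades recall for precision
    for t in np.arange(0.1, 1.0, 0.2):
        y_pred = [1 if s >= t else 0 for s in y_score]
        p = precision_score(y_true, y_pred, zero_division=0)
        r = recall_score(y_true, y_pred, zero_division=0)
        print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")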

9. Step-by-Step Usage
---------------------------------

1. File Upload
######################

* Navigate to the File Selection section.
* Upload your annotation and prediction files using the provided file dialog or drag-and-drop interface.

2. Column Mapping
######################

* Review and adjust the column mappings using the drop-down menus to match your file’s structure.

3. Optional Class Mapping
##########################

* If your class names differ between annotation and prediction files, download the JSON template, update it, and then upload the class mapping file.

4. Select Classes and Recordings
#################################

* Use the checkbox groups to select the specific classes and recordings you want to evaluate.

5. Set Parameters
######################

* Adjust the sample duration, recording duration, minimum overlap, and threshold values.
* Toggle the Class-wise Metrics option if you require individual class evaluations.

6. Select Metrics
######################

* Check the boxes for the performance metrics (AUROC, Precision, Recall, F1 Score, AP, Accuracy) you wish to compute and visualize.

7. Execute Evaluation and Visualizations
##########################################

* Click Calculate Metrics to process the data.
* Generate visualizations by clicking Plot Metrics, Plot Confusion Matrix, or Plot Metrics All Thresholds.
* Download the results or processed data tables as needed.

.. note:: Before generating the visualizations, ensure that you have calculated the metrics by clicking the "Calculate Metrics" button.

10. Conclusion
-----------------

The Evaluation Tab in BirdNET Analyzer provides a comprehensive and flexible framework to assess the performance of bioacoustic classification models.
By following this guide, you can efficiently configure your inputs, adjust evaluation parameters, compute key performance metrics, and generate insightful visualizations.
This tool is designed to streamline your evaluation workflow and deepen your understanding of your model’s performance.