
Commit c80af76

included documentation for the eval-tool by rené

1 parent 40d124a commit c80af76

2 files changed

Lines changed: 196 additions & 1 deletion

File tree

docs/best-practices.rst

Lines changed: 3 additions & 1 deletion
@@ -7,4 +7,6 @@ Best practices
    best-practices/species-lists
    best-practices/segment-review
    best-practices/training
-   best-practices/embeddings
+   best-practices/evaluation-tool
+   best-practices/embeddings
+
docs/best-practices/evaluation-tool.rst

Lines changed: 193 additions & 0 deletions

@@ -0,0 +1,193 @@
Evaluation Tool
===============

The Evaluation Tab in BirdNET Analyzer is a tool designed to assess the performance of deep learning models on bioacoustic data.
Whether you are dealing with binary or multi-label classification tasks, this interface calculates and visualizes essential performance metrics.
This guide explains each component of the Evaluation Tab and offers step-by-step instructions to ensure a smooth evaluation process.

1. Overview
-----------------

The Evaluation Tab works by comparing two primary inputs:

* Annotation Files: Files that provide the ground-truth labels as Raven selection tables.
* Prediction Files: Files generated by BirdNET Analyzer that contain your model's prediction scores and labels.

By aligning predictions with annotations over uniform time intervals, the system computes a range of performance metrics, such as:

* F1 Score
* Recall
* Precision
* Average Precision (AP)
* AUROC (Area Under the Receiver Operating Characteristic curve)
* Accuracy

These metrics help you evaluate how well your model performs on bioacoustic data.
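
For intuition, the following is a minimal sketch of how such metrics can be computed with scikit-learn from ground-truth labels and confidence scores. It is only illustrative: the toy arrays and the 0.1 threshold (the tab's default) are assumptions, not the tool's internals.

.. code-block:: python

   import numpy as np
   from sklearn.metrics import (accuracy_score, average_precision_score,
                                f1_score, precision_score, recall_score,
                                roc_auc_score)

   # Toy data: 1 = class present in a segment, 0 = absent.
   y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
   # Confidence scores the model assigned to the same segments.
   y_score = np.array([0.9, 0.3, 0.75, 0.2, 0.05, 0.4, 0.8, 0.1])
   # A decision threshold turns scores into binary predictions.
   y_pred = (y_score >= 0.1).astype(int)

   # Threshold-dependent metrics use the binary predictions...
   print("Precision:", precision_score(y_true, y_pred))
   print("Recall:   ", recall_score(y_true, y_pred))
   print("F1 Score: ", f1_score(y_true, y_pred))
   print("Accuracy: ", accuracy_score(y_true, y_pred))
   # ...while AUROC and AP are threshold-free and use the raw scores.
   print("AUROC:    ", roc_auc_score(y_true, y_score))
   print("AP:       ", average_precision_score(y_true, y_score))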

2. File selection
-------------------

Annotations
#############

* **Purpose**: Provide the true labels for evaluation.
* **How to Use**: Upload one or more annotation files via the file dialog or simply drag and drop them into the designated area.

Predictions
#############

* **Purpose**: Supply the model's prediction data.
* **How to Use**: Upload one or more prediction files using the same drag-and-drop or file dialog method.
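
Both inputs are plain tabular text files, so you can sanity-check them before uploading. A rough sketch with pandas; the file names below are hypothetical, and Raven selection tables are tab-separated text:

.. code-block:: python

   import pandas as pd

   # Hypothetical file names; Raven selection tables are tab-separated.
   annotations = pd.read_csv("recording1.Table.1.selections.txt", sep="\t")
   predictions = pd.read_csv("recording1.BirdNET.results.txt", sep="\t")

   # Inspect the column names you will map in the next step.
   print(annotations.columns.tolist())
   print(predictions.columns.tolist())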

3. Column Mapping for Annotations and Predictions
---------------------------------------------------

Different input files may use different column names. To ensure the tool can correctly interpret your data, you can map the columns from your files to the expected parameters.

Annotations Mapping
####################

* **Start Time**: Marks the beginning of the annotation.
* **End Time**: Marks the end of the annotation.
* **Class**: Contains the label or category.
* **Recording**: Identifies the audio file.
* **Duration**: Indicates the total duration of the audio file.

Predictions Mapping
######################

* **Start Time**: Marks the beginning of the prediction.
* **End Time**: Marks the end of the prediction.
* **Class**: Contains the predicted label.
* **Confidence**: Holds the confidence scores of the predictions.
* **Recording**: Identifies the audio file.
* **Duration**: Indicates the total duration of the audio file.

.. note:: The system pre-populates these fields with default column names. If your files use different column names, simply select the appropriate ones from the drop-down menus.
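
Conceptually, this mapping step amounts to renaming your file's columns to the expected parameter names. The sketch below illustrates the idea with pandas; the source column names are examples and will differ between setups:

.. code-block:: python

   import pandas as pd

   annotations = pd.read_csv("annotations.txt", sep="\t")

   # Example mapping from this file's column names to the expected parameters.
   column_mapping = {
       "Begin Time (s)": "Start Time",
       "End Time (s)": "End Time",
       "Species": "Class",
       "Begin File": "Recording",
       "File Duration (s)": "Duration",
   }
   annotations = annotations.rename(columns=column_mapping)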

4. Class Mapping (Optional)
------------------------------------

If the class names in your annotation and prediction files differ, you can reconcile them using a JSON mapping file.

* **Download Template**: Click the "Download Template" button to obtain a sample JSON file that shows how to map the predicted class names to the annotation class names.
* **Upload Mapping File**: After editing the template to match your naming conventions, upload the updated file to standardize class names across your data.
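
The downloaded template defines the exact structure; as a rough illustration only, a mapping of this kind pairs each predicted class name with its annotation counterpart. The class names below are hypothetical, and the real template may be organized differently:

.. code-block:: python

   import json

   # Hypothetical mapping: predicted class name -> annotation class name.
   class_mapping = {
       "Turdus merula_Eurasian Blackbird": "Eurasian Blackbird",
       "Erithacus rubecula_European Robin": "European Robin",
   }

   with open("class_mapping.json", "w") as f:
       json.dump(class_mapping, f, indent=2)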

5. Classes and Recordings Selection
----------------------------------------

Once you have uploaded and mapped your files, the system automatically extracts the available classes and recordings.

* **Select Classes**: Use the checkbox group to choose specific classes for evaluation. If no selection is made, all classes are included by default.
* **Select Recordings**: Similarly, select the recordings you wish to evaluate to focus on specific data subsets.

6. Parameters Configuration
---------------------------------

Customize the evaluation process by adjusting the following parameters; a sketch of how the alignment could work follows this list.

* **Sample Duration (s)**: The length of each audio segment. (Default: 3 seconds, matching BirdNET's prediction segment.)
* **Recording Duration (s)**: Explicitly set the recording duration. (Default: the recording duration is automatically inferred from your files.)
* **Minimum Overlap (s)**: The minimum time overlap between an annotation and a prediction for them to be considered a match. (Default: 0.5 seconds)
* **Threshold**: The cut-off value above which a prediction counts as positive. (Default: 0.1)
* **Class-wise Metrics**: Toggle this option if you want to compute performance metrics for each class individually. If disabled, metrics are averaged across all classes.
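
To make these parameters concrete, here is a simplified sketch of how an alignment of this kind can work: the recording is cut into fixed-length samples, and an event (an annotation, or a prediction that passes the threshold) marks a sample as positive when it overlaps that sample by at least the minimum overlap. This is an illustrative assumption, not the tool's actual implementation:

.. code-block:: python

   def label_samples(events, recording_duration, sample_duration=3.0, min_overlap=0.5):
       """Mark each fixed-length sample as positive (1) if any event
       overlaps it by at least min_overlap seconds."""
       n_samples = int(recording_duration // sample_duration)
       labels = [0] * n_samples
       for i in range(n_samples):
           seg_start = i * sample_duration
           seg_end = seg_start + sample_duration
           for event_start, event_end in events:
               overlap = min(seg_end, event_end) - max(seg_start, event_start)
               if overlap >= min_overlap:
                   labels[i] = 1
                   break
       return labels

   # One annotated call from 2.0 s to 4.2 s in a 9-second recording:
   print(label_samples([(2.0, 4.2)], recording_duration=9.0))  # [1, 1, 0]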

7. Metrics Selection
---------------------------------

Select the performance metrics you want to compute and visualize. The available options include:

* **AUROC**: Measures the probability that the model will rank a random positive case higher than a random negative one.

  * Advantage: Provides an overall sense of the model's discriminative power, especially with imbalanced data.
  * Disadvantage: Can be hard to interpret on its own, since it summarizes performance across all thresholds rather than at the operating point you will actually use.

* **Precision**: Indicates how often the model's positive predictions are correct.

  * Advantage: Highlights the model's accuracy in predicting positives.
  * Disadvantage: Does not account for missed positive cases.

* **Recall**: Measures the percentage of actual positive cases the model correctly identifies.

  * Advantage: Ensures that most positive cases are detected.
  * Disadvantage: Optimizing for recall alone may admit many false positives if it is not balanced with precision.

* **F1 Score**: The harmonic mean of precision and recall, offering a balanced metric.

  * Advantage: Combines both false positives and false negatives into one score.
  * Disadvantage: Can be less intuitive when precision and recall differ greatly.

* **Average Precision (AP)**: Summarizes the precision-recall curve by averaging the precision at each recall level.

  * Advantage: Provides a single metric across all thresholds.
  * Disadvantage: Can be noisy for classes with few positive cases.

* **Accuracy**: The overall percentage of correct predictions.

  * Advantage: Simple to understand and calculate.
  * Disadvantage: May be misleading in cases of class imbalance (see the example after this list).
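
The class-imbalance caveat is easy to demonstrate with a toy example: when only 2 of 20 segments are positive, a model that never predicts the class still reaches 90% accuracy, while recall and F1 expose the failure.

.. code-block:: python

   from sklearn.metrics import accuracy_score, f1_score, recall_score

   y_true = [1, 1] + [0] * 18  # 2 positives out of 20 segments
   y_pred = [0] * 20           # a model that never predicts the class

   print(accuracy_score(y_true, y_pred))             # 0.9 -- looks good...
   print(recall_score(y_true, y_pred))               # 0.0 -- misses every positive
   print(f1_score(y_true, y_pred, zero_division=0))  # 0.0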

8. Actions
-----------------

After configuring your files and parameters, use the action buttons to execute the evaluation and visualize the results.

* **Calculate Metrics**: Processes your input files and computes the selected performance metrics.
* **Plot Metrics**: Generates visualizations (line/bar plots) of the computed metrics.
* **Plot Confusion Matrix**: Displays a confusion matrix showing the correct and incorrect predictions for each class.
* **Plot Metrics All Thresholds**: Visualizes how performance metrics change across a range of threshold values, helping you understand trade-offs (e.g., between precision and recall); a sketch of this idea follows the list.
* **Download Results Table**: Exports a CSV file containing the computed metrics.
* **Download Data Table**: Exports a CSV file with the processed data that details the alignment between annotations and predictions.
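
As a rough sketch of the idea behind "Plot Metrics All Thresholds", the snippet below sweeps a decision threshold over toy scores and plots how precision and recall trade off. The tool generates such plots for you; this is only an illustration:

.. code-block:: python

   import matplotlib.pyplot as plt
   import numpy as np
   from sklearn.metrics import precision_score, recall_score

   y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
   y_score = np.array([0.9, 0.3, 0.75, 0.2, 0.05, 0.4, 0.8, 0.1])

   thresholds = np.linspace(0.05, 0.95, 19)
   precisions = [precision_score(y_true, y_score >= t, zero_division=0)
                 for t in thresholds]
   recalls = [recall_score(y_true, y_score >= t, zero_division=0)
              for t in thresholds]

   plt.plot(thresholds, precisions, label="Precision")
   plt.plot(thresholds, recalls, label="Recall")
   plt.xlabel("Decision threshold")
   plt.ylabel("Score")
   plt.legend()
   plt.show()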

9. Step-by-Step Usage
---------------------------------

1. File Upload
######################

* Navigate to the File Selection section.
* Upload your annotation and prediction files using the provided file dialog or drag-and-drop interface.

2. Column Mapping
######################

* Review and adjust the column mappings using the drop-down menus to match your file's structure.

3. Optional Class Mapping
#########################

* If your class names differ between annotation and prediction files, download the JSON template, update it, and then upload the class mapping file.

4. Select Classes and Recordings
################################

* Use the checkbox groups to select the specific classes and recordings you want to evaluate.

5. Set Parameters
######################

* Adjust the sample duration, recording duration, minimum overlap, and threshold values.
* Toggle the Class-wise Metrics option if you require individual class evaluations.

6. Select Metrics
######################

* Check the boxes for the performance metrics (AUROC, Precision, Recall, F1 Score, AP, Accuracy) you wish to compute and visualize.

7. Execute Evaluation and Visualizations
########################################

* Click Calculate Metrics to process the data.
* Generate visualizations by clicking Plot Metrics, Plot Confusion Matrix, or Plot Metrics All Thresholds.
* Download the results or processed data tables as needed.

.. note:: Before generating the visualizations, ensure that you have calculated the metrics by clicking the "Calculate Metrics" button.

10. Conclusion
-----------------

The Evaluation Tab in BirdNET Analyzer provides a comprehensive and flexible framework to assess the performance of bioacoustic classification models.
By following this guide, you can efficiently configure your inputs, adjust evaluation parameters, compute key performance metrics, and generate insightful visualizations.
This tool is designed to streamline your evaluation workflow and deepen your understanding of your model's performance.
