Evaluation Tool
===============

The Evaluation Tab in BirdNET Analyzer is a tool designed to assess the performance of deep learning models on bioacoustic data.
Whether you are dealing with binary or multi-label classification tasks, this interface calculates and visualizes essential performance metrics.
This guide explains each component of the Evaluation Tab and offers step-by-step instructions to ensure a smooth evaluation process.

1. Overview
-----------------

The Evaluation Tab works by comparing two primary inputs:

* Annotation Files: Files that provide the ground-truth labels, supplied as Raven selection tables.
* Prediction Files: Files generated by BirdNET Analyzer that contain your model’s prediction scores and labels.

By aligning predictions with annotations over uniform time intervals, the system computes a range of performance metrics such as:

* F1 Score
* Recall
* Precision
* Average Precision (AP)
* AUROC (Area Under the Receiver Operating Characteristic Curve)
* Accuracy

These metrics help you evaluate how well your model performs on bioacoustic data.
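
The alignment step can be pictured with a short sketch. This is a minimal illustration of the idea only, not the tool’s actual implementation; the function name, arguments, and defaults below are assumptions chosen to mirror the parameters described later in this guide.

.. code-block:: python

    def label_samples(annotations, recording_duration, sample_duration=3.0, min_overlap=0.5):
        """Return one 0/1 ground-truth label per fixed-length sample interval."""
        labels = []
        start = 0.0
        while start < recording_duration:
            end = min(start + sample_duration, recording_duration)
            # Largest overlap between this sample and any annotation interval
            overlap = max(
                (min(end, a_end) - max(start, a_start) for a_start, a_end in annotations),
                default=0.0,
            )
            labels.append(1 if overlap >= min_overlap else 0)
            start += sample_duration
        return labels

    # A 9-second recording with one annotation from 2.0 s to 4.0 s
    print(label_samples([(2.0, 4.0)], recording_duration=9.0))  # [1, 1, 0]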

2. File selection
-------------------

Annotations
#############

* **Purpose**: Provide the true labels for evaluation.
* **How to Use**: Upload one or more annotation files via the file dialog, or simply drag and drop them into the designated area.

Predictions
#############

* **Purpose**: Supply the model’s prediction data.
* **How to Use**: Upload one or more prediction files using the same drag-and-drop or file dialog method.
3. Column Mapping for Annotations and Predictions
---------------------------------------------------

Different input files may use different column names. To ensure the tool can correctly interpret your data, you can map the columns from your files to the expected parameters.

Annotations Mapping
####################

* **Start Time**: Marks the beginning of the annotation.
* **End Time**: Marks the end of the annotation.
* **Class**: Contains the label or category.
* **Recording**: Identifies the audio file.
* **Duration**: Indicates the total duration of the audio file.

Predictions Mapping
######################

* **Start Time**: Marks the beginning of the prediction.
* **End Time**: Marks the end of the prediction.
* **Class**: Contains the predicted label.
* **Confidence**: Holds the confidence scores of the predictions.
* **Recording**: Identifies the audio file.
* **Duration**: Indicates the total duration of the audio file.

.. note:: The system pre-populates these fields with default column names. If your files use different column names, simply select the appropriate ones from the drop-down menus.
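
For intuition, the mapping is equivalent to renaming columns before evaluation. The following sketch shows the idea with pandas; the source column names ("Begin Time (s)", "Species", and so on) are assumptions about one possible file layout, not names the tool requires.

.. code-block:: python

    import pandas as pd

    # Hypothetical mapping from one file layout to the expected parameters
    column_mapping = {
        "Begin Time (s)": "Start Time",
        "End Time (s)": "End Time",
        "Species": "Class",
        "Begin File": "Recording",
    }

    # Raven selection tables are tab-separated
    annotations = pd.read_csv("annotations.txt", sep="\t").rename(columns=column_mapping)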

4. Class Mapping (Optional)
------------------------------------

If there is a discrepancy between class names in your annotation and prediction files, you can reconcile these differences using a JSON mapping file.

* **Download Template**: Click the "Download Template" button to obtain a sample JSON file that shows how to map the predicted class names to the annotation class names. A hypothetical example is shown below.
* **Upload Mapping File**: After editing the template to match your naming conventions, upload the updated file to standardize class names across your data.
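
The authoritative layout comes from the downloaded template itself; the snippet below is only a guess at what such a mapping might look like, with predicted class names as keys and the corresponding annotation class names as values.

.. code-block:: json

    {
        "Turdus merula_Eurasian Blackbird": "Eurasian Blackbird",
        "Erithacus rubecula_European Robin": "European Robin"
    }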

5. Classes and Recordings Selection
----------------------------------------

Once you have uploaded and mapped your files, the system automatically extracts the available classes and recordings.

* **Select Classes**: Use the checkbox group to choose specific classes for evaluation. If no selection is made, all classes are included by default.
* **Select Recordings**: Similarly, select the recordings you wish to evaluate to focus on specific data subsets.
6. Parameters Configuration
---------------------------------

Customize the evaluation process by adjusting the following parameters; a small worked example follows the list.

* **Sample Duration (s)**: The length of each audio segment. (Default: 3 seconds, matching BirdNET’s prediction segment length.)
* **Recording Duration**: Explicitly set the recording duration. (Default: inferred automatically from your files.)
* **Minimum Overlap (s)**: The minimum time overlap between an annotation and a prediction for them to be considered a match. (Default: 0.5 seconds)
* **Threshold**: The confidence cut-off above which a prediction is counted as positive. (Default: 0.1)
* **Class-wise Metrics**: Toggle this option if you want to compute performance metrics for each class individually. If disabled, metrics are averaged across all classes.
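
As a quick sketch of how the threshold is applied (the values are illustrative, not tool output):

.. code-block:: python

    scores = [0.05, 0.32, 0.08, 0.91]  # per-segment confidence scores
    threshold = 0.1                    # default threshold

    # A segment counts as a positive prediction when its score reaches the threshold
    predictions = [1 if s >= threshold else 0 for s in scores]
    print(predictions)  # [0, 1, 0, 1]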

7. Metrics Selection
---------------------------------

Select the performance metrics you want to compute and visualize. The available options are listed below; a short code sketch after the list shows how each can be reproduced offline.

* **AUROC**: Measures the probability that the model will rank a random positive case higher than a random negative one.

  * Advantage: Provides an overall sense of the model’s discriminative power, especially with imbalanced data.
  * Disadvantage: Can be challenging to interpret.

* **Precision**: Indicates how often the model’s positive predictions are correct.

  * Advantage: Highlights the model’s accuracy in predicting positives.
  * Disadvantage: Does not account for missed positive cases.

* **Recall**: Measures the percentage of actual positive cases the model correctly identifies.

  * Advantage: Ensures that most positive cases are detected.
  * Disadvantage: May lead to many false positives if not balanced with precision.

* **F1 Score**: The harmonic mean of precision and recall, offering a balanced metric.

  * Advantage: Combines both false positives and false negatives into one score.
  * Disadvantage: Can be less intuitive if precision and recall values differ greatly.

* **Average Precision (AP)**: Summarizes the precision-recall curve by averaging the precision at each recall level.

  * Advantage: Provides a single metric across all thresholds.
  * Disadvantage: Can be noisy for classes with few positive cases.

* **Accuracy**: The overall percentage of correct predictions.

  * Advantage: Simple to understand and calculate.
  * Disadvantage: May be misleading in cases of class imbalance.
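
If you want to sanity-check these metrics outside the interface, all of them can be computed with scikit-learn. The labels and scores below are toy values, not output from the tool:

.. code-block:: python

    from sklearn.metrics import (
        accuracy_score,
        average_precision_score,
        f1_score,
        precision_score,
        recall_score,
        roc_auc_score,
    )

    y_true = [0, 1, 1, 0, 1, 0]                       # ground-truth labels per segment
    y_score = [0.05, 0.80, 0.40, 0.30, 0.65, 0.10]    # model confidence scores
    y_pred = [1 if s >= 0.1 else 0 for s in y_score]  # binarized at the default threshold

    print("AUROC    :", roc_auc_score(y_true, y_score))           # ranking-based, uses raw scores
    print("AP       :", average_precision_score(y_true, y_score))
    print("Precision:", precision_score(y_true, y_pred))          # threshold-based, uses binary preds
    print("Recall   :", recall_score(y_true, y_pred))
    print("F1       :", f1_score(y_true, y_pred))
    print("Accuracy :", accuracy_score(y_true, y_pred))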

8. Actions
-----------------

After configuring your files and parameters, use the action buttons to execute the evaluation and visualize the results.

* **Calculate Metrics**: Processes your input files and computes the selected performance metrics.
* **Plot Metrics**: Generates visualizations (line/bar plots) of the computed metrics.
* **Plot Confusion Matrix**: Displays a confusion matrix showing the correct and incorrect predictions for each class.
* **Plot Metrics All Thresholds**: Visualizes how performance metrics change across a range of threshold values, helping you understand trade-offs, e.g., between precision and recall (see the sketch after this list).
* **Download Results Table**: Exports a CSV file containing the computed metrics.
* **Download Data Table**: Exports a CSV file with the processed data that details the alignment between annotations and predictions.
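
The idea behind **Plot Metrics All Thresholds** can be sketched as a simple sweep: binarize the scores at each threshold and recompute the threshold-dependent metrics. Again, the values are illustrative:

.. code-block:: python

    import numpy as np
    from sklearn.metrics import precision_score, recall_score

    y_true = [0, 1, 1, 0, 1, 0]
    y_score = [0.05, 0.80, 0.40, 0.30, 0.65, 0.10]

    # Raising the threshold trades recall for precision
    for t in np.arange(0.1, 1.0, 0.2):
        y_pred = [1 if s >= t else 0 for s in y_score]
        p = precision_score(y_true, y_pred, zero_division=0)
        r = recall_score(y_true, y_pred, zero_division=0)
        print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")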

9. Step-by-Step Usage
---------------------------------

1. File Upload
######################

* Navigate to the File Selection section.
* Upload your annotation and prediction files using the provided file dialog or drag-and-drop interface.

2. Column Mapping
######################

* Review and adjust the column mappings using the drop-down menus to match your file’s structure.

3. Optional Class Mapping
##########################

* If your class names differ between annotation and prediction files, download the JSON template, update it, and then upload the class mapping file.

4. Select Classes and Recordings
#################################

* Use the checkbox groups to select the specific classes and recordings you want to evaluate.

5. Set Parameters
######################

* Adjust the sample duration, recording duration, minimum overlap, and threshold values.
* Toggle the Class-wise Metrics option if you require individual class evaluations.

6. Select Metrics
######################

* Check the boxes for the performance metrics (AUROC, Precision, Recall, F1 Score, AP, Accuracy) you wish to compute and visualize.

7. Execute Evaluation and Visualizations
##########################################

* Click Calculate Metrics to process the data.
* Generate visualizations by clicking Plot Metrics, Plot Confusion Matrix, or Plot Metrics All Thresholds.
* Download the results or processed data tables as needed.

.. note:: Before generating the visualizations, ensure that you have calculated the metrics by clicking the "Calculate Metrics" button.

10. Conclusion
-----------------

The Evaluation Tab in BirdNET Analyzer provides a comprehensive and flexible framework to assess the performance of bioacoustic classification models.
By following this guide, you can efficiently configure your inputs, adjust evaluation parameters, compute key performance metrics, and generate insightful visualizations.
This tool is designed to streamline your evaluation workflow and deepen your understanding of your model’s performance.