RAC examples with empty `true_labels` are silently ignored, leading to incorrect classification #21

### Summary

When using the `ZeroShotClassificationPipeline` from `gliclass`, any example in `rac_examples` whose `true_labels` list is empty is **silently discarded**. This causes serious calibration issues and **inflated confidence in incorrect predictions**, especially in NLI-style classification tasks.

The result is the same when only **a single positive example** is passed. The output is biased because there is no negative or neutral signal, and the pipeline treats the lone positive example as sufficient evidence. Adding a counter-example with no true labels does not help, because that counter-example is dropped.

---

### 🔬 Minimal Working Example

#### ❌ With `rac_examples` — incorrect, high-confidence result

```python
from transformers import AutoTokenizer
from gliclass import GLiClassModel, ZeroShotClassificationPipeline 
import torch

model_str = "knowledgator/gliclass-base-v2.0-rac-init"

model = GLiClassModel.from_pretrained(model_str)
tokenizer = AutoTokenizer.from_pretrained(model_str)

device = 'mps' if torch.backends.mps.is_available() else 'cuda:0' if torch.cuda.is_available() else 'cpu'

pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device=device)

example_1 = {
    "text": "I submitted my application last week but haven’t heard back yet.",
    "all_labels": ["this is about post-application"],
    "true_labels": ["this is about post-application"]
}

# ❌ This negative example is silently discarded
example_2 = {
    "text": "I was filling out the job application form when the site crashed.",
    "all_labels": ["this is about post-application"],
    "true_labels": []
}

premise = "The job portal crashed while I was still filling out the application."
hypotheses = ["this is about post-application"]

results = pipeline(premise, hypotheses, threshold=0.0, rac_examples=[example_1, example_2])[0]
print(results)
```

**Output:**
```python
[{'label': 'this is about post-application', 'score': 0.9948280453681946}]
```

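> 🔍 Even though the premise is about the **pre-application** stage (the user was still filling out the form), the model outputs a high-confidence score for **post-application**. `example_2` describes exactly this pre-application situation, but because it is dropped it cannot counterbalance `example_1`.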

---

#### 🟢 Without `rac_examples` — correct behavior

```python
# Reuses model, tokenizer and device from the previous snippet
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device=device)

premise = "The job portal crashed while I was still filling out the application."
hypotheses = ["this is about post-application"]

results = pipeline(premise, hypotheses, threshold=0.0)[0]
print(results)
```

**Output:**
```python
[{'label': 'this is about post-application', 'score': 0.10260037332773209}]
```

> ✅ Without the misleading calibration, the model gives a low score — as expected.

---

### ⚠️ SINGLE POSITIVE EXAMPLE ONLY: same bad behavior

Even with just a **single positive RAC example** and no counter-examples, the score is identical, to every printed digit, to the two-example run above. In other words, `example_2` has no effect at all:

```python
example_1 = {
    "text": "I submitted my application last week but haven’t heard back yet.",
    "all_labels": ["this is about post-application"],
    "true_labels": ["this is about post-application"]
}

results = pipeline(premise, hypotheses, threshold=0.0, rac_examples=[example_1])[0]
print(results)
```

**Output:**
```python
[{'label': 'this is about post-application', 'score': 0.9948280453681946}]
```
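The identical scores are consistent with the pipeline dropping every example whose `true_labels` is empty before using the examples. The sketch below models that hypothesis in plain Python. `effective_rac_examples` is a hypothetical name, and we have not confirmed that gliclass filters examples this way internally; under this model, though, both calls receive the same effective input, which would explain the exact match.

```python
from typing import TypedDict


class RacExample(TypedDict):
    text: str
    all_labels: list[str]
    true_labels: list[str]


def effective_rac_examples(examples: list[RacExample]) -> list[RacExample]:
    """Hypothetical model of the observed behavior (not gliclass code):
    examples with an empty ``true_labels`` list contribute nothing."""
    return [ex for ex in examples if ex["true_labels"]]


example_1: RacExample = {
    "text": "I submitted my application last week but haven't heard back yet.",
    "all_labels": ["this is about post-application"],
    "true_labels": ["this is about post-application"],
}
example_2: RacExample = {
    "text": "I was filling out the job application form when the site crashed.",
    "all_labels": ["this is about post-application"],
    "true_labels": [],
}

# Under this model, the two pipeline calls in the report see the same input.
same_input = effective_rac_examples([example_1, example_2]) == effective_rac_examples([example_1])
print(same_input)  # True
```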

---

### ✅ Expected Behavior

- Examples with `true_labels=[]` should:
  - ❗ Act as **negative signals**, indicating “this text is *not* about the listed labels”; **OR**
  - ⚠️ Trigger a **clear warning** that the example will be ignored, so users can avoid false calibration.
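A minimal user-side sketch of both options, run before calling the pipeline. The helper `prepare_rac_examples` and the `false_labels` key are hypothetical: neither is part of the gliclass API, and we don't know whether the model could consume explicit negatives. The sketch only shows (a) deriving the negatives implied by each example, i.e. every label in `all_labels` that is not in `true_labels`, and (b) emitting a warning so that an example with empty `true_labels` is no longer dropped silently.

```python
import warnings
from typing import Any


def prepare_rac_examples(examples: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical pre-flight check, not part of gliclass.

    Warns about examples that have no true labels and returns copies
    annotated with ``false_labels``: the labels in ``all_labels`` that are
    not in ``true_labels``, i.e. the negative signal each example carries.
    """
    prepared = []
    for i, ex in enumerate(examples):
        true_labels = set(ex["true_labels"])
        false_labels = [label for label in ex["all_labels"] if label not in true_labels]
        if not true_labels:
            warnings.warn(
                f"RAC example {i} has empty true_labels; it may be ignored "
                f"instead of acting as a negative for {false_labels}.",
                UserWarning,
                stacklevel=2,
            )
        # Copy instead of mutating the caller's dict.
        prepared.append({**ex, "false_labels": false_labels})
    return prepared


examples = [
    {"text": "I submitted my application last week but haven't heard back yet.",
     "all_labels": ["this is about post-application"],
     "true_labels": ["this is about post-application"]},
    {"text": "I was filling out the job application form when the site crashed.",
     "all_labels": ["this is about post-application"],
     "true_labels": []},
]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    prepared = prepare_rac_examples(examples)

print(len(caught))  # 1
print(prepared[1]["false_labels"])  # ['this is about post-application']
```

Whether the pipeline tolerates the extra `false_labels` key is untested; strip it before passing the list on if needed.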

---

### 💡 Why This Matters

- In zero-shot or few-shot setups, users **expect every example to contribute** to the output decision.
- Silently discarding negative or neutral examples **skews predictions**, especially when examples are few.
- This reduces **trust** and interpretability, and can yield **confidently wrong classifications**, a critical issue in real-world deployments.

---

### 🧪 Environment

- Dependency snapshot: [uv.lock.txt](https://github.com/user-attachments/files/19771666/uv.lock.txt)
