Skip to content

fix(qwen): fix CFG failing when passing neg prompt embeds with none mask#13379

Merged
sayakpaul merged 12 commits intohuggingface:mainfrom
Sunhill666:fix/qwen-image-cfg-mask
Apr 17, 2026
Merged

fix(qwen): fix CFG failing when passing neg prompt embeds with none mask#13379
sayakpaul merged 12 commits intohuggingface:mainfrom
Sunhill666:fix/qwen-image-cfg-mask

Conversation

@Sunhill666
Copy link
Copy Markdown
Contributor

@Sunhill666 Sunhill666 commented Apr 1, 2026

What does this PR do?

Fixes #13377

This PR fixes a bug in QwenImagePipeline and QwenImageEditPipeline where Classifier-Free Guidance (CFG) is incorrectly disabled when users manually pass negative_prompt_embeds alongside a None mask.

Root cause:
The encode_prompt method naturally optimizes an all-ones prompt_embeds_mask by converting and returning it as None. However, the has_neg_prompt condition in the __call__ method strictly required negative_prompt_embeds_mask is not None to activate CFG. This logical mismatch caused do_true_cfg to evaluate to False even when valid negative embeddings were provided.

Changes:

  • Relaxed the has_neg_prompt condition.
  • It now correctly evaluates has_neg_prompt to True as long as negative_prompt_embeds is not None, since a None mask simply implies that no padding/masking is required.

Before submitting

Who can review?

@asomoza @yiyixuxu

@asomoza
Copy link
Copy Markdown
Member

asomoza commented Apr 2, 2026

@claude can you review this?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 2, 2026

Claude Code is working…

I'll analyze this and get back to you.

View job run

@asomoza
Copy link
Copy Markdown
Member

asomoza commented Apr 2, 2026

thanks! @sayakpaul should we add a test for this?

@sayakpaul
Copy link
Copy Markdown
Member

Yes there should be a test for it sure. I think this PR could handle it too.

@Sunhill666 Sunhill666 force-pushed the fix/qwen-image-cfg-mask branch from cc4ea27 to dbe4f95 Compare April 8, 2026 01:58
@github-actions github-actions bot added pipelines size/S PR with diff < 50 LOC and removed size/S PR with diff < 50 LOC labels Apr 8, 2026
@Sunhill666 Sunhill666 force-pushed the fix/qwen-image-cfg-mask branch from 9645089 to dbe4f95 Compare April 8, 2026 02:05
@github-actions github-actions bot added size/S PR with diff < 50 LOC and removed size/S PR with diff < 50 LOC labels Apr 8, 2026
@github-actions github-actions bot added size/S PR with diff < 50 LOC and removed size/S PR with diff < 50 LOC labels Apr 13, 2026
@github-actions github-actions bot added tests size/S PR with diff < 50 LOC modular-pipelines size/L PR with diff > 200 LOC and removed size/S PR with diff < 50 LOC labels Apr 13, 2026
@Sunhill666
Copy link
Copy Markdown
Contributor Author

@sayakpaul Thanks for the feedback! I completely agree.

I've pushed two commit that addresses the issues and adds the requested tests:

  1. Test Coverage: Added test_true_cfg_without_negative_prompt_embeds_mask to all QwenImage FastTests (base, controlnet, edit, edit_plus, img2img, inpaint).
  2. Fixed Legacy Mask Constraints: While adding the tests, I uncovered that some pipeline variants (like edit, edit_plus, layered, and controlnet_inpaint) had rigid raise ValueError blocks in check_inputs() preventing None masks. Their encode_prompt() also unconditionally called .repeat() on the mask, causing AttributeError when mask was None. I've thoroughly removed these blockers and safely guarded the tensor operations.
  3. Code Consistency: Addressed the review comment by standardizing the model_inputs accesses to uniformly use .get("...") (e.g., input_ids, attention_mask, pixel_values) in the edit pipelines.

Everything passes locally now. Let me know if there's anything else needed!

@github-actions github-actions bot removed size/S PR with diff < 50 LOC size/L PR with diff > 200 LOC labels Apr 16, 2026
@github-actions github-actions bot added the size/L PR with diff > 200 LOC label Apr 16, 2026
@github-actions github-actions bot added size/L PR with diff > 200 LOC and removed size/L PR with diff > 200 LOC labels Apr 16, 2026
@Sunhill666 Sunhill666 force-pushed the fix/qwen-image-cfg-mask branch from ee1e84b to 7fa33ee Compare April 16, 2026 14:43
@github-actions github-actions bot added size/L PR with diff > 200 LOC and removed size/L PR with diff > 200 LOC labels Apr 16, 2026
@Sunhill666
Copy link
Copy Markdown
Contributor Author

Hi @sayakpaul,

Thanks for the very helpful feedback! I have updated the PR to address your comments. Here is a summary of the changes I just pushed:

  1. Updated check_inputs across all QwenImage pipeline variants to emit a logger.warning(...) when prompt_embeds_mask is provided but negative_prompt_embeds_mask is missing.
  2. Refactored the dummy input setup in test_true_cfg_without_negative_prompt_embeds_mask. Instead of explicitly setting inputs["negative..."] = None, I'm now using inputs.pop("...", None) to more naturally simulate missing arguments without causing confusion.
  3. Replaced the hardcoded "cpu" explicitly with torch_device in the test files.
  4. Corrected the # Copied from markers which previously contained typos holding back make fix-copies. All duplicated code is now properly synchronized.

Let me know if anything else needs adjusting!

@sayakpaul
Copy link
Copy Markdown
Member

@claude could you do a review?

@sayakpaul sayakpaul requested a review from dg845 April 16, 2026 15:12
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 16, 2026

Claude finished @sayakpaul's task in 2m 44s —— View job


Review of PR #13379

  • Read repository guidelines (AGENTS.md, models.md, modular.md)
  • Review PR diff against origin/main
  • Review source changes in src/diffusers/
  • Review test changes
  • Verify # Copied from annotation correctness
  • Post review feedback

Summary

The core fix is correct: relaxing has_neg_prompt from negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None to just negative_prompt_embeds is not None addresses the bug where CFG was silently disabled when encode_prompt optimized an all-ones mask to None. The change is applied consistently across all 9 pipeline variants.

Issues Found

1. Bug: # \Copied from`— malformed annotation inpipeline_qwenimage_controlnet.py`

src/diffusers/pipelines/qwenimage/pipeline_qwenimage_controlnet.py:104

# `Copied from` diffusers.pipelines.qwenimage.pipeline_qwenimage.calculate_shift

The backtick-quoted format `Copied from` will not be recognized by make fix-copies. The standard annotation is # Copied from ... (no backticks). Per AGENTS.md: "Many classes are kept in sync with a source via a # Copied from ... header comment" — this needs to match the exact format the tooling expects.

The pipeline_qwenimage_controlnet_inpaint.py:77 version is correct: # Copied from diffusers.pipelines.qwenimage.pipeline_qwenimage.calculate_shift.

Fix this →

2. Potential silent failure: model_inputs.get("attention_mask") can return None and crash _extract_masked_hidden

src/diffusers/modular_pipelines/qwenimage/encoders.py:128 and :185

split_hidden_states = _extract_masked_hidden(hidden_states, model_inputs.get("attention_mask"))

_extract_masked_hidden calls mask.bool() on its second argument. If the processor returns a BatchFeature without attention_mask (unlikely for text, but the .get() pattern implies it's possible), this will raise AttributeError: 'NoneType' object has no attribute 'bool'. The .get() change for pixel_values and image_grid_thw makes sense (those truly may be absent for text-only inputs), but input_ids and attention_mask should always be present in the processor output for text inputs.

I'd recommend either:

  • Keeping model_inputs.get() only for pixel_values and image_grid_thw (the optional keys), and keeping direct attribute access for input_ids and attention_mask (the required keys), OR
  • Adding a guard before the _extract_masked_hidden call.

The same concern applies to the pipeline-level _get_qwen_prompt_embeds_edit methods in pipeline_qwenimage_edit.py:257, pipeline_qwenimage_edit_plus.py:270, and pipeline_qwenimage_edit_inpaint.py:268.

3. Nit: Missing blank line between raise ValueError and if prompt_embeds in some variants

src/diffusers/pipelines/qwenimage/pipeline_qwenimage.py:312-313, pipeline_qwenimage_controlnet_inpaint.py:359-360, pipeline_qwenimage_edit.py:360-361, pipeline_qwenimage_edit_plus.py:375-376, pipeline_qwenimage_layered.py:385-386:

            )
        if prompt_embeds is not None and prompt_embeds_mask is None:

The previous raise ValueError block ends on ); the next if immediately follows without a blank line. The base pipeline (pipeline_qwenimage.py:312-313) and some variants have this, while pipeline_qwenimage_img2img.py and pipeline_qwenimage_inpaint.py correctly have a blank line. This should be consistent — make style should catch it, but worth verifying.

4. # Copied from annotation correctness — the paths look correct but should be validated with make fix-copies

The renamed annotations (from # Coped from to # Copied from) in pipeline_qwenimage_controlnet.py and pipeline_qwenimage_controlnet_inpaint.py now correctly reference:

  • QwenImagePipeline._extract_masked_hidden
  • QwenImagePipeline._get_qwen_prompt_embeds
  • QwenImagePipeline.encode_prompt

These paths match the actual source locations. However, since these are now proper # Copied from annotations (they were broken before as # Coped from), make fix-copies will start syncing them. Please verify the copied methods are still in sync with their sources — notably the encode_prompt in controlnet_inpaint.py has a slightly different docstring (extra blank line at :282). Running make fix-copies will reveal any drift.

Tests

The tests are well-structured and directly test the fixed behavior. A few observations:

  • The inputs.get("image") if "image" in inputs else None pattern in edit/edit_plus tests (e.g., test_qwenimage_edit.py:256) can be simplified to just inputs.get("image") since .get() already returns None for missing keys.
  • Tests correctly use torch_device (addressing an earlier review comment).
  • Tests exercise the exact scenario described in the issue: passing negative_prompt_embeds without a mask and expecting CFG to still work.

Verdict

The core bug fix is sound and well-applied across all variants. The main items to address before merging:

  1. Fix the backtick typo in pipeline_qwenimage_controlnet.py:104 (this will break make fix-copies sync).
  2. Reconsider using .get() for input_ids and attention_mask — these are required keys and using .get() may mask real errors.
  3. Run make fix-copies to validate the newly corrected # Copied from annotations sync properly.

@github-actions github-actions bot added size/L PR with diff > 200 LOC and removed size/L PR with diff > 200 LOC labels Apr 17, 2026
@Sunhill666
Copy link
Copy Markdown
Contributor Author

@sayakpaul I've addressed all the points:

  1. Fixed the malformed # Copied from annotation (removed the backticks).
  2. Reverted .get() usage for required keys: Now using model_inputs["input_ids"] and model_inputs["attention_mask"] directly to avoid silent NoneType failures.
  3. Added missing blank lines for code style consistency.
  4. Simplified tests: Replaced .get("image") if "image" in inputs else None with just .get("image").
  5. Ran make fix-copies and make style successfully to sync the changes.

Comment on lines +315 to +317
logger.warning(
"If `prompt_embeds` are provided, `prompt_embeds_mask` also have to be passed. Make sure to generate `prompt_embeds_mask` from the same text encoder that was used to generate `prompt_embeds`."
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.warning(
"If `prompt_embeds` are provided, `prompt_embeds_mask` also have to be passed. Make sure to generate `prompt_embeds_mask` from the same text encoder that was used to generate `prompt_embeds`."
)
logger.warning(
"`prompt_embeds` is provided and `prompt_embeds_mask` is not provided, so the model will treat all"
" prompt tokens as valid. If `prompt_embeds` contains padding, you should provide the padding mask as"
" `prompt_embeds_mask`. Make sure to generate `prompt_embeds_mask` from the same text encoder that was"
" used to generate `prompt_embeds`."
)

Not sure about the exact wording, but I think the warning here should explain when not passing the corresponding mask is valid, and when passing it is necessary.

Comment on lines +320 to +322
logger.warning(
"If `negative_prompt_embeds` are provided, `negative_prompt_embeds_mask` also have to be passed. Make sure to generate `negative_prompt_embeds_mask` from the same text encoder that was used to generate `negative_prompt_embeds`."
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.warning(
"If `negative_prompt_embeds` are provided, `negative_prompt_embeds_mask` also have to be passed. Make sure to generate `negative_prompt_embeds_mask` from the same text encoder that was used to generate `negative_prompt_embeds`."
)
logger.warning(
"`negative_prompt_embeds` is provided and `negative_prompt_embeds_mask` is not provided, so the model"
" will treat all negative prompt tokens as valid. If `negative_prompt_embeds` contains padding, you"
" should provide the padding mask as `negative_prompt_embeds_mask`. Make sure to generate"
" `negative_prompt_embeds_mask` from the same text encoder that was used to generate"
" `negative_prompt_embeds`."
)

Analogous suggestion to #13379 (comment) for negative_prompt_embeds/negative_prompt_embeds_mask.

Copy link
Copy Markdown
Collaborator

@dg845 dg845 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Left some small suggestions :).

@github-actions github-actions bot added size/L PR with diff > 200 LOC and removed size/L PR with diff > 200 LOC labels Apr 17, 2026
@Sunhill666
Copy link
Copy Markdown
Contributor Author

Hi @dg845, I've updated the warning messages for both prompt_embeds and negative_prompt_embeds as suggested.

@github-actions github-actions bot added size/L PR with diff > 200 LOC and removed size/L PR with diff > 200 LOC labels Apr 17, 2026
Copy link
Copy Markdown
Collaborator

@dg845 dg845 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@sayakpaul sayakpaul merged commit 8d30d05 into huggingface:main Apr 17, 2026
16 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] QwenImagePipeline silently disables CFG when passing negative_prompt_embeds if mask is None (which encode_prompt returns by default)

5 participants