HiDream Full nf4 quantized #11337
-
Also, a GGUF version is available now, if someone knows how to use it in diffusers; some example code would be appreciated. I am confused, as there are 4 text encoders but only 3 here, plus the transformer and VAE.
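For reference, diffusers has generic GGUF support for transformer checkpoints through `GGUFQuantizationConfig` and `from_single_file`; whether that path already works for `HiDreamImageTransformer2DModel` is an assumption here, and the checkpoint filename below is a placeholder. A minimal sketch:

```python
import torch
from diffusers import GGUFQuantizationConfig, HiDreamImageTransformer2DModel

# Placeholder path: point this at an actual HiDream GGUF file.
ckpt_path = "path/to/hidream-i1-full-Q4_K_M.gguf"

# GGUF weights stay quantized in memory and are dequantized
# to compute_dtype on the fly during inference.
transformer = HiDreamImageTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
```

On the 3-vs-4 confusion: if I read the diffusers example correctly, text_encoder through text_encoder_3 ship with the base repo, while the fourth (Llama 3.1) encoder is loaded separately from transformers and passed in as text_encoder_4, which is why only 3 appear in the repo listing.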
-
I'll use that method as an option, but the problem with integrating it is that newgenai only has the Fast model, and I also need Dev and Full. The other one I was going to use initially is https://huggingface.co/azaneko/HiDream-I1-Full-nf4, and I'm wondering if that'd be compatible with this method (a possible fallback is sketched below). I'd also love to get the GGUF implementation too if that gets figured out; I kinda liked that format. BTW, this is for my app AEIONic.com, if you're curious.
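If the pre-quantized repo turns out not to be compatible, one fallback is quantizing the original Full weights to nf4 on the fly with bitsandbytes. A minimal sketch, assuming the standard diffusers `BitsAndBytesConfig` path applies to `HiDreamImageTransformer2DModel` as it does to other transformer models:

```python
import torch
from diffusers import BitsAndBytesConfig, HiDreamImageTransformer2DModel

# On-the-fly nf4 quantization of the original full-precision weights;
# trades load time for independence from any pre-quantized repo.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = HiDreamImageTransformer2DModel.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
```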
-
Pre-quantized repos:

- text_encoder_2, text_encoder_3 (int4): base_repo = "newgenai79/HiDream-I1-Fast-bnb-int4"
- Meta-Llama-3.1-8B-Instruct (GPTQ int4): "hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4"
- transformer (int4): base_repo = "newgenai79/HiDream-I1-Fast-bnb-int4"

Please NOTE: the above are pre-quantized model repos. Use the same repo for the transformer/VAE at inference. If you don't want int4 for text_encoder_2 and text_encoder_3, use the original repo by specifying base_repo =
I couldn't test this myself, as it runs OOM on 16 GB (8+8 GB) even for the text encoder. A sketch of how the pieces fit together follows below.
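A minimal sketch of wiring those repos into the pipeline (untested, per the OOM note above; it assumes the pre-quantized repos load directly through from_pretrained, and dtypes may need adjusting):

```python
import torch
from transformers import LlamaForCausalLM, PreTrainedTokenizerFast
from diffusers import HiDreamImagePipeline

base_repo = "newgenai79/HiDream-I1-Fast-bnb-int4"
llama_repo = "hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4"

# Llama 3.1 text encoder (text_encoder_4), pre-quantized with GPTQ int4;
# GPTQ kernels generally expect float16 compute.
tokenizer_4 = PreTrainedTokenizerFast.from_pretrained(llama_repo)
text_encoder_4 = LlamaForCausalLM.from_pretrained(
    llama_repo,
    output_hidden_states=True,
    output_attentions=True,
    torch_dtype=torch.float16,
)

# The bnb-int4 repo supplies the transformer, VAE, and the int4
# text_encoder_2 / text_encoder_3; the Llama encoder is passed in explicitly.
pipe = HiDreamImagePipeline.from_pretrained(
    base_repo,
    tokenizer_4=tokenizer_4,
    text_encoder_4=text_encoder_4,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # may still OOM on 16 GB, as noted above
```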
@Skquark
#11231 (comment)