CivArchive
    ERNIE-Image - Image Turbo

    Originally Posted: https://ernie.baidu.com/blog/posts/ernie-image

    ERNIE-Image is an open text-to-image model from the ERNIE-Image team at Baidu. It is built on a single-stream Diffusion Transformer (DiT) with 8B parameters in a latent diffusion model (LDM) framework, and it ships with a lightweight Prompt Enhancer that expands brief inputs into richer, more structured prompts to make better use of the model's capabilities. At this size, ERNIE-Image achieves state-of-the-art performance among open-weight text-to-image models. It is built for controllability as well as visual appeal: accurate content depiction matters as much as aesthetics. In practice, it excels at complex instruction following, precise text rendering, and structured image generation, areas where many existing open-weight models still fall short.

    Key Features

    • Competitive performance at compact scale: With only 8B DiT parameters, ERNIE-Image remains competitive with substantially larger models and achieves leading performance among open-weight models on several challenging benchmarks.

    • Precise text rendering: ERNIE-Image handles dense, long-form, and layout-sensitive text especially well, producing readable and faithful results in Chinese, English, and other languages.

    • Robust instruction following: The model reliably handles complex prompts, multi-object relations, and knowledge-intensive descriptions, making it well suited for tasks that demand fine-grained control.

    • Structured visual generation: ERNIE-Image is especially effective on images with clear layout or narrative structure — posters, manga/anime storyboards, multi-panel compositions, and cohesive multi-element visuals.

    • Broad stylistic range: Beyond clean graphic design and illustration-style outputs, the model supports realistic photography and distinctive stylized aesthetics, including softer, more cinematic and film-like tones.

    • Easy to deploy and adapt: Thanks to its compact size, ERNIE-Image runs on consumer-grade hardware (24 GB VRAM), bringing high-quality image generation within reach for both research and production use. The moderate parameter count also makes fine-tuning and adaptation straightforward for researchers and developers.
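
    The 24 GB figure can be sanity-checked with rough arithmetic on the weights alone. The sketch below is an estimate under stated assumptions, not a measurement: the bytes-per-parameter values are standard storage sizes rather than anything the model card specifies, and the helper name weight_memory_gib is made up for illustration. It counts only the 8B DiT parameters and ignores the text encoder, VAE, Prompt Enhancer, and activation memory.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# 8B DiT parameters, as stated in the model description.
DIT_PARAMS = 8e9

# Assumed storage precisions (not taken from the model card).
PRECISIONS = {"fp32": 4, "bf16/fp16": 2, "int8/fp8": 1}

for name, nbytes in PRECISIONS.items():
    gib = weight_memory_gib(DIT_PARAMS, nbytes)
    print(f"{name:>10}: {gib:5.1f} GiB")
```

    At 2 bytes per parameter the DiT weights come to roughly 14.9 GiB, which is consistent with the model fitting on a 24 GB card with room left for the other components. At 4 bytes per parameter the weights alone would exceed 24 GB.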

    Description

    Checkpoint
    Other

    Details

    Downloads: 9
    Platform: CivitAI
    Platform Status: Available
    Created: 4/16/2026
    Updated: 4/16/2026
    Deleted: -

    Files

    ernieImage_imageTurbo.safetensors