PRX: Open Text-to-Image Generative Model
PRX (Photoroom Experimental) is a 1.3-billion-parameter text-to-image model trained entirely from scratch and released under an Apache 2.0 license.
It is part of Photoroom’s broader effort to open-source the complete process behind training large-scale text-to-image models — covering architecture design, optimization strategies, and post-training alignment. The goal is to make PRX both a strong open baseline and a transparent research reference for those developing or studying diffusion-transformer models.
For more information, please read our announcement blog post.
Model description
PRX is designed to be lightweight yet capable, easy to fine-tune or extend, and fully open.
PRX generates high-quality images from text using a simplified MMDiT architecture in which the text tokens are not updated as they pass through the transformer blocks. It uses flow matching with discrete scheduling for efficient sampling and Google’s T5-Gemma-2B-2B-UL2 model for multilingual text encoding. At around 1.3B parameters, the model is designed for fast inference without sacrificing quality. Two latent backbones are available: the Flux VAE for balanced quality and speed, or DC-AE for higher latent compression and faster processing.
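To make "flow matching with discrete scheduling" more concrete, here is a minimal sketch of Euler sampling over a fixed grid of timesteps. It is a toy illustration, not PRX's actual sampler: the names `linear_timesteps`, `toy_velocity`, and `euler_sample` are hypothetical, the linear schedule is an assumption, and the learned transformer is replaced by a closed-form velocity for a single known target point.

```python
import random


def linear_timesteps(num_steps):
    """Discrete schedule (assumed linear): num_steps + 1 times from 1.0 (noise) to 0.0 (data)."""
    return [1.0 - i / num_steps for i in range(num_steps + 1)]


def toy_velocity(x, t, target):
    """Stand-in for the learned model.

    For the straight-line path x_t = (1 - t) * data + t * noise, the velocity is
    noise - data. When the data is a single known point, that equals (x_t - data) / t.
    """
    return [(xi - di) / t for xi, di in zip(x, target)]


def euler_sample(target, num_steps=28, seed=0):
    """Integrate from pure noise at t=1 to a sample at t=0 with explicit Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    ts = linear_timesteps(num_steps)
    for t, t_next in zip(ts[:-1], ts[1:]):
        v = toy_velocity(x, t, target)
        # t_next < t, so each step moves the sample from noise toward data.
        x = [xi + (t_next - t) * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    print(euler_sample([0.5, -1.0, 2.0]))
```

In this toy case the path is exactly straight, so the Euler steps land on the target. A real model only approximates the velocity field, which is why the number of sampling steps affects output quality.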
This card describes Photoroom/prx-1024-t2i-beta, a preview model from our upcoming 1024-pixel family and one of several PRX variants:
- Resolution: 1024 pixels
- Architecture: PRX (MMDiT-like diffusion transformer variant)
- Latent backbone: Flux VAE
- Text encoder: T5-Gemma-2B-2B-UL2
- Training stage: Supervised fine-tuning (SFT)
- License: Apache 2.0
For other checkpoints, browse the full PRX collection.
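The trade-off between the two latent backbones can be illustrated with some quick arithmetic on latent grid sizes. The sketch below is only indicative. It assumes an 8x spatial downsampling factor with 16 latent channels for the Flux VAE, and a 32x factor with 32 channels for DC-AE. The DC-AE figures in particular are an assumption about which variant is used, so check the checkpoint configs before relying on them. `latent_shape` and `ASSUMED_BACKBONES` are hypothetical names.

```python
def latent_shape(image_size, downsample, channels):
    """Return (channels, height, width) of the latent for a square image."""
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by the downsampling factor")
    side = image_size // downsample
    return (channels, side, side)


# Assumed (downsample factor, latent channels); verify against the real configs.
ASSUMED_BACKBONES = {
    "flux_vae": (8, 16),
    "dc_ae": (32, 32),
}

if __name__ == "__main__":
    for name, (factor, channels) in ASSUMED_BACKBONES.items():
        shape = latent_shape(1024, factor, channels)
        positions = shape[1] * shape[2]
        print(f"{name}: latent shape {shape}, {positions} spatial positions")
```

Under these assumptions, a 1024-pixel image maps to a 128×128 latent grid with the Flux VAE and a 32×32 grid with DC-AE. That is 16 times fewer spatial positions for the transformer to process, which is where the faster processing comes from.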
Example usage
You can use PRX directly in Diffusers:
import torch
from diffusers.pipelines.prx import PRXPipeline

# Load the checkpoint in bfloat16 and move it to the GPU.
pipe = PRXPipeline.from_pretrained(
    "Photoroom/prx-1024-t2i-beta",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "A front-facing portrait of a lion in the golden savanna at sunset"
# 28 sampling steps with a guidance scale of 5.0.
image = pipe(prompt, num_inference_steps=28, guidance_scale=5.0).images[0]
image.save("lion.png")
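For several prompts, a small wrapper with one fixed seed per prompt makes runs easier to reproduce. This is a sketch under assumptions rather than documented PRX behaviour. `prompt_to_filename` and `generate_batch` are hypothetical helpers, and passing `generator=` assumes that `PRXPipeline` follows the usual Diffusers convention for seeding. The heavy imports are done lazily so that the filename helper can be used without a GPU stack installed.

```python
import re


def prompt_to_filename(prompt, index, max_len=40):
    """Build a short, filesystem-safe PNG filename from a prompt."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    slug = slug[:max_len].rstrip("-")
    return f"{index:02d}-{slug}.png"


def generate_batch(prompts, seed=0, model_id="Photoroom/prx-1024-t2i-beta"):
    """Generate one image per prompt and return the saved file paths."""
    # Imported here so the helper above works without torch or diffusers.
    import torch
    from diffusers.pipelines.prx import PRXPipeline

    pipe = PRXPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")
    paths = []
    for i, prompt in enumerate(prompts):
        # Assumption: the pipeline accepts `generator`, like most Diffusers pipelines.
        generator = torch.Generator(device="cuda").manual_seed(seed + i)
        image = pipe(
            prompt,
            num_inference_steps=28,
            guidance_scale=5.0,
            generator=generator,
        ).images[0]
        path = prompt_to_filename(prompt, i)
        image.save(path)
        paths.append(path)
    return paths
```

Calling `generate_batch(["A lion at sunset", "A pagoda above the clouds"], seed=42)` would then write one PNG per prompt into the current directory.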
Visual examples and demo
Below are a few generations from this model (Photoroom/prx-1024-t2i-beta).

- Prompt: A close-up portrait in a photography studio, multiple soft light sources creating gradients of shadow on her face, minimal background, cinematic 4 K realism, artistic focus on light and emotion rather than glamour.
- Prompt: A massive black monolith standing alone in a mirror-like salt flat after rainfall, horizon dissolving into pastel pink and cyan, reflections perfect and infinite, minimalist 2.39:1 frame, cinematic atmosphere of silence, RED Komodo 6K capture, 35 mm lens, ND filter, high dynamic range, ultra-clean tones and soft ambient light.
- Prompt: In the courtyard of a coastal house, white sheets flap slowly in the wind, a woman pauses between hanging clothes, eyes closed, light flickering through the fabric. A flock of seagulls turns sharply overhead, casting moving shadows on the walls. The sound of waves faintly audible, palette of whites, greys, and sun-bleached blues, evokes transience and memory.
- Prompt: Rain has just ended on a green plain, puddles glistening under soft sunlight, an astronaut on horseback rides slowly through the mist, a vivid rainbow curving behind distant mountains, cinematic clarity, detailed wet suit reflections, volumetric light, sense of renewal and quiet beauty, captured as a wide 8K cinematic frame.
- Prompt: A woman standing ankle-deep in the ocean at dawn, gentle waves touching her feet, mist and pastel horizon, cinematic wide composition, calm and contemplative mood, filmic color grading reminiscent of Terrence Malick’s imagery.
- Prompt: A green hose lies coiled across a sunlit garden path, water still dripping, birdsong faint in the distance. A towel flutters on a clothesline behind it. Everything feels paused mid-movement, like someone left moments ago. Cinematic warm tones, shallow depth, serene but uncanny calm.
- Prompt: Hundreds of paper lanterns drifting along a quiet river at dusk, soft orange light piercing cold blue mist, reflections trembling across rippled water, camera at water level with shallow DOF, cinematic color contrast of warm and cool tones, shot on Sony Venice 2 with Cooke S4 50 mm lens, f/1.8, ISO 800, graded on Kodak 2383 film LUT.
- Prompt: Wide aerial shot over a black sand beach in Iceland, massive waves crashing with white foam, dramatic clouds opening to reveal a ray of sunlight, cinematic 16:9 composition, ultra-detailed texture of basalt cliffs, cool desaturated tones, evokes epic solitude.
- Prompt: Ancient pagoda rising above clouds, morning mist rolling over forested mountains, golden sunrise light illuminating temple roof tiles, cinematic wide-angle composition, ethereal atmosphere, ultra-detailed realism with painterly undertone.
PRX Demo on Hugging Face Spaces — interactive text-to-image demo for Photoroom/prx-1024-t2i-beta.
Training details
PRX models were trained from scratch using recent advances in diffusion and flow-matching training. We experimented with a range of modern techniques for efficiency, stability, and alignment, which we cover in more detail in our series of research posts:
- Part 0: Overview and release
- Part 1: Design experiments and architecture benchmark
- Part 2: Accelerating training (coming soon)
- Part 3: Post-pretraining (coming soon)
Other PRX models
You can find additional checkpoints in the PRX collection:
- Base: pretrained model before alignment; best starting point for fine-tuning or research
- SFT: supervised fine-tuned model; produces more aesthetically pleasing, ready-to-use generations
- Latent backbones: Flux VAE and DC-AE
- Distilled: 8-step generation with LADD
- Resolutions: 256, 512, and 1024 pixels
License
PRX is available under an Apache 2.0 license.
Use restrictions
You must not use PRX models for:
- any of the restricted uses set forth in the Gemma Prohibited Use Policy; or
- any activity that violates applicable laws or regulations.
