---
license: apache-2.0
---

## NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

[Homepage](https://stepfun.ai/research/en/nextstep-1)  \| [GitHub](https://github.com/stepfun-ai/NextStep-1)  \| [Paper](https://github.com/stepfun-ai/NextStep-1/blob/main/nextstep_1_tech_report.pdf)

We introduce NextStep-1, a 14B autoregressive model paired with a 157M flow matching head, trained on discrete text tokens and continuous image tokens with next-token prediction objectives.
NextStep-1 achieves state-of-the-art performance among autoregressive models on text-to-image generation tasks, exhibiting strong capabilities in high-fidelity image synthesis.
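The paragraph above names the flow matching head but not how it is trained or sampled. As a rough illustration only, the sketch below shows a generic flow matching (rectified flow) objective for a single continuous token vector, plus a matching Euler sampler, in plain Python. `velocity_fn(x_t, t, cond)` stands in for the flow matching head, with `cond` assumed to be the autoregressive model's output at the current position. The linear noise-to-data path, the uniform time grid and every name here are assumptions, not the NextStep-1 implementation; see the tech report for the actual formulation.

```python
import random


def interpolate(x0, x1, t):
    # Linear path from noise x0 (t = 0) to data x1 (t = 1).
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, x1, cond, rng):
    """Mean squared error between predicted and target velocity at a random t."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]      # Gaussian noise sample
    t = rng.random()                             # t ~ U[0, 1)
    xt = interpolate(x0, x1, t)
    target = [b - a for a, b in zip(x0, x1)]     # dx_t/dt along the linear path
    pred = velocity_fn(xt, t, cond)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(x1)


def euler_sample(velocity_fn, cond, dim, num_steps, rng):
    """Integrate dx/dt = v(x, t, cond) from noise at t = 0 to a token at t = 1."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t, cond)
        x = [a + dt * b for a, b in zip(x, v)]
    return x
```

For a toy check: if every training token equals a fixed vector `c`, the ideal velocity is `(c - x) / (1 - t)`. With it, the loss is zero up to rounding and `euler_sample` returns `c`.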

<div align='center'>
<img src="assets/teaser.jpg" class="interpolation-image" alt="arch." width="100%" />
</div>

## Environment Preparation

To avoid potential errors when loading and running the model, we recommend the following setup:

```shell
conda create -n nextstep python=3.11 -y
conda activate nextstep

pip install uv  # optional; without uv, run `pip install -r requirements.txt` instead of the next command

# download requirements.txt from this repository first
uv pip install -r requirements.txt

# pinned versions in requirements.txt:
# diffusers==0.34.0
# einops==0.8.1
# gradio==5.42.0
# loguru==0.7.3
# numpy==1.26.4
# omegaconf==2.3.0
# Pillow==11.0.0
# Requests==2.32.4
# safetensors==0.5.3
# tabulate==0.9.0
# torch==2.5.1
# torchvision==0.20.1
# tqdm==4.67.1
# transformers==4.55.0
```
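If loading fails, a version mismatch is a common cause. The optional helper below is not part of this repository (its names are illustrative). It uses only the standard library's `importlib.metadata` to compare installed package versions against the pins listed above. It ignores local build tags such as `+cu121`, which CUDA builds of `torch` typically carry.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements list above.
PINS = {
    "diffusers": "0.34.0",
    "einops": "0.8.1",
    "gradio": "5.42.0",
    "loguru": "0.7.3",
    "numpy": "1.26.4",
    "omegaconf": "2.3.0",
    "pillow": "11.0.0",
    "requests": "2.32.4",
    "safetensors": "0.5.3",
    "tabulate": "0.9.0",
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "tqdm": "4.67.1",
    "transformers": "4.55.0",
}


def find_mismatches(pins, lookup=version):
    """Return {package: (expected, installed_or_None)} for every pin that differs."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        # Compare only the public version, e.g. "2.5.1+cu121" -> "2.5.1".
        public = installed.split("+", 1)[0] if installed else None
        if public != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINS)
    for name, (expected, installed) in sorted(problems.items()):
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```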

## Usage

```python
# Run from the root of a local copy of this repository, which provides
# the models/, utils/ and assets/ directories used below.
from PIL import Image
from transformers import AutoTokenizer, AutoModel
from models.gen_pipeline import NextStepPipeline
from utils.aspect_ratio import center_crop_arr_with_buckets

HF_HUB = "stepfun-ai/NextStep-1-Large-Edit"

# load model and tokenizer
# (local_files_only=True loads from local files only; do not combine it with force_download=True)
tokenizer = AutoTokenizer.from_pretrained(HF_HUB, local_files_only=True, trust_remote_code=True)
model = AutoModel.from_pretrained(HF_HUB, local_files_only=True, trust_remote_code=True)
pipeline = NextStepPipeline(tokenizer=tokenizer, model=model).to(device="cuda")

# set prompts; the "<image>" placeholder stands for the reference image passed via images=[...]
positive_prompt = None
negative_prompt = "Copy original image."
example_prompt = "<image>" + "Add a pirate hat to the dog's head. Change the background to a stormy sea with dark clouds. Include the text 'NextStep-Edit' in bold white letters at the top portion of the image."

# load and preprocess reference image
IMG_SIZE = 512
ref_image = Image.open("./assets/origin.jpg")
ref_image = center_crop_arr_with_buckets(ref_image, buckets=[IMG_SIZE])

# generate edited image
image = pipeline.generate_image(
    example_prompt,
    images=[ref_image],
    hw=(IMG_SIZE, IMG_SIZE),
    num_images_per_caption=1,
    positive_prompt=positive_prompt,
    negative_prompt=negative_prompt,
    cfg=7.5,        # classifier-free guidance scale
    cfg_img=2,      # image guidance scale
    cfg_schedule="constant",
    use_norm=True,
    num_sampling_steps=50,
    timesteps_shift=3.2,
    seed=42,
)[0]
image.save("./assets/output.png")
```
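The usage example passes `cfg`, `cfg_img` and `timesteps_shift` without explaining them. The sketch below illustrates two widely used techniques these names suggest; treating them as what `NextStepPipeline` does internally is an assumption. `dual_guidance` is InstructPix2Pix-style classifier-free guidance over three predictions (unconditional, image-only, image plus text instruction), and `shift_timestep` is the Stable Diffusion 3 style timestep shift. All function names are hypothetical.

```python
def dual_guidance(v_uncond, v_img, v_full, cfg, cfg_img):
    """Combine three model predictions with separate text and image guidance scales.

    v_uncond: prediction without the text instruction or reference image
              (assumed here to be where a negative prompt would enter)
    v_img:    prediction conditioned on the reference image only
    v_full:   prediction conditioned on the reference image and the instruction
    """
    return [
        u + cfg_img * (i - u) + cfg * (f - i)
        for u, i, f in zip(v_uncond, v_img, v_full)
    ]


def shift_timestep(t, shift):
    """Map t in [0, 1] to shift * t / (1 + (shift - 1) * t).

    Endpoints are preserved and shift > 1 pushes interior values toward 1.
    With t = 1 meaning pure noise (the SD3 convention), a uniform grid is
    therefore warped to place more sampling steps at high noise levels.
    """
    return shift * t / (1.0 + (shift - 1.0) * t)


def shifted_schedule(num_steps, shift):
    """Return num_steps + 1 shifted time points from 0 to 1."""
    return [shift_timestep(i / num_steps, shift) for i in range(num_steps + 1)]


# Values mirroring the usage example above.
print(dual_guidance([0.0], [1.0], [1.5], cfg=7.5, cfg_img=2))    # -> [5.75]
print([round(t, 3) for t in shifted_schedule(4, shift=3.2)])     # -> [0.0, 0.516, 0.762, 0.906, 1.0]
```

Setting `cfg=1` and `cfg_img=1` reduces `dual_guidance` to the fully conditioned prediction, which is a quick sanity check on the formula.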

## Citation

If you find NextStep useful for your research and applications, please consider starring this repository and citing:

```bibtex
@misc{nextstep_1,
  title={NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale},
  author={NextStep Team},
  year={2025},
  url={https://github.com/stepfun-ai/NextStep-1},
}
```