
LTX-Video

LTX-Video is a diffusion transformer designed for fast, real-time generation of high-resolution videos from text and images. Its main feature is the Video-VAE, which has a higher pixel-to-latent compression ratio (1:192) that enables more efficient video data processing and faster generation. To keep finer details from being lost at this compression ratio, the Video-VAE decoder performs both the latent-to-pixel conversion and the last denoising step.
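As a rough illustration of where the 1:192 figure comes from, the sketch below works through the arithmetic. The downscaling factors (32 x 32 spatial, 8 temporal) and the 128 latent channels are assumptions taken from the LTX-Video paper rather than from this page, and the helper names are hypothetical.

```python
# Assumed figures from the LTX-Video paper (not stated on this page):
# each latent token covers a 32 x 32 x 8 block of RGB pixels and has 128 channels.
SPATIAL_FACTOR = 32
TEMPORAL_FACTOR = 8
RGB_CHANNELS = 3
LATENT_CHANNELS = 128


def per_token_compression() -> float:
    """Pixel values represented by one latent token, divided by its channel count."""
    pixel_values = SPATIAL_FACTOR * SPATIAL_FACTOR * TEMPORAL_FACTOR * RGB_CHANNELS
    return pixel_values / LATENT_CHANNELS


def latent_shape(num_frames: int, height: int, width: int) -> tuple[int, int, int]:
    """Approximate latent grid (frames, height, width) for a video.

    Assumes the first frame is encoded on its own and the remaining frames
    are grouped in blocks of TEMPORAL_FACTOR.
    """
    latent_frames = (num_frames - 1) // TEMPORAL_FACTOR + 1
    return latent_frames, height // SPATIAL_FACTOR, width // SPATIAL_FACTOR


print(per_token_compression())  # 192.0
frames, h, w = latent_shape(num_frames=161, height=512, width=768)
print((frames, h, w), frames * h * w)  # (21, 16, 24) 8064
```

Under these assumptions, a 161-frame 768x512 video is represented by about 8,000 latent tokens, which is why the transformer can denoise it quickly.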

You can find all the original LTX-Video checkpoints under the Lightricks organization.

[!TIP] Click on the LTX-Video models in the right sidebar for more examples of other video generation tasks.

The example below demonstrates how to generate a video optimized for memory or inference speed.

Refer to the [Reduce memory usage](../../optimization/memory) guide for more details about the various memory saving techniques.

The LTX-Video model below requires ~10GB of VRAM.

```py
import torch
from diffusers import LTXPipeline, AutoModel
from diffusers.hooks import apply_group_offloading
from diffusers.utils import export_to_video

# fp8 layerwise weight-casting
transformer = AutoModel.from_pretrained(
    "Lightricks/LTX-Video",
    subfolder="transformer",
    torch_dtype=torch.bfloat16
)
transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16
)

pipeline = LTXPipeline.from_pretrained("Lightricks/LTX-Video", transformer=transformer, torch_dtype=torch.bfloat16)

# group-offloading
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipeline.transformer.enable_group_offload(onload_device=onload_device, offload_device=offload_device, offload_type="leaf_level", use_stream=True)
apply_group_offloading(pipeline.text_encoder, onload_device=onload_device, offload_type="block_level", num_blocks_per_group=2)
apply_group_offloading(pipeline.vae, onload_device=onload_device, offload_type="leaf_level")

prompt = """
A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage
"""
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=768,
    height=512,
    num_frames=161,
    decode_timestep=0.03,
    decode_noise_scale=0.025,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=24)
```

[Compilation](../../optimization/fp16#torchcompile) is slow the first time but subsequent calls to the pipeline are faster. [Caching](../../optimization/cache) may also speed up inference by storing and reusing intermediate outputs.

```py
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipeline = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
)

# torch.compile
pipeline.transformer.to(memory_format=torch.channels_last)
pipeline.transformer = torch.compile(
    pipeline.transformer, mode="max-autotune", fullgraph=True
)

prompt = """
A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage
"""
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=768,
    height=512,
    num_frames=161,
    decode_timestep=0.03,
    decode_noise_scale=0.025,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=24)
```
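The examples above pass `width=768`, `height=512`, and `num_frames=161`. These values are consistent with the guidance on the LTX-Video model card that resolutions should be divisible by 32 and frame counts should have the form 8k + 1. That constraint is an assumption here, since this page does not state it. The hypothetical helper below, which is not part of diffusers, sketches one way to round arbitrary requests down to values of that form before calling the pipeline.

```python
def snap_video_dims(width: int, height: int, num_frames: int) -> tuple[int, int, int]:
    """Round a requested size down to the nearest values the model is assumed to accept.

    Assumption (from the model card, not this page): width and height must be
    multiples of 32, and num_frames must equal 8 * k + 1 for some integer k >= 0.
    """
    snapped_width = max(32, (width // 32) * 32)
    snapped_height = max(32, (height // 32) * 32)
    snapped_frames = max(1, ((num_frames - 1) // 8) * 8 + 1)
    return snapped_width, snapped_height, snapped_frames


# The values used in the examples already satisfy the assumed constraint.
print(snap_video_dims(768, 512, 161))  # (768, 512, 161)

# An arbitrary request is rounded down on each axis.
print(snap_video_dims(770, 500, 160))  # (768, 480, 153)
```

Rounding down rather than up keeps memory use at or below what was requested, which matters for the offloading setup in the memory-optimized example.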

Notes

LTXPipeline

[[autodoc]] LTXPipeline

LTXImageToVideoPipeline

[[autodoc]] LTXImageToVideoPipeline

LTXConditionPipeline

[[autodoc]] LTXConditionPipeline

LTXLatentUpsamplePipeline

[[autodoc]] LTXLatentUpsamplePipeline

LTXPipelineOutput

[[autodoc]] pipelines.ltx.pipeline_output.LTXPipelineOutput