Cache methods speed up diffusion transformers by storing and reusing the intermediate outputs of specific layers, such as attention and feedforward layers, instead of recomputing them at every inference step.
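As a minimal sketch of the general workflow, a cache is enabled by passing one of the config objects documented below to the transformer's `enable_cache` method provided by `CacheMixin`. The checkpoint and the specific parameter values here are illustrative, assuming a CogVideoX pipeline and the Pyramid Attention Broadcast config:

```python
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig

# Load a pipeline whose transformer inherits from CacheMixin.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Reuse spatial attention outputs across consecutive steps inside the given timestep range.
config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(100, 800),
    current_timestep_callback=lambda: pipe.current_timestep,
)
pipe.transformer.enable_cache(config)
```

The other cache methods follow the same pattern: construct their config (for example `FasterCacheConfig` or `FirstBlockCacheConfig`) and pass it to `enable_cache`, or use the corresponding `apply_*` helper to attach the cache to a module directly.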
[[autodoc]] CacheMixin
[[autodoc]] PyramidAttentionBroadcastConfig
[[autodoc]] apply_pyramid_attention_broadcast
[[autodoc]] FasterCacheConfig
[[autodoc]] apply_faster_cache
[[autodoc]] FirstBlockCacheConfig
[[autodoc]] apply_first_block_cache
[[autodoc]] TaylorSeerCacheConfig
[[autodoc]] apply_taylorseer_cache