LLM Laboratory

Date: 25.12.2025

FlashAttention, NVIDIA GPU architectures, and CUDA — quick cheat sheet

The table below summarizes the compatibility of FlashAttention releases, including intermediate versions, with each NVIDIA GPU architecture generation.

FlashAttention primarily targets NVIDIA Ampere and newer architectures:

| FA generation | FA version    | CUDA   | GPU architecture     | PyTorch |
|---------------|---------------|--------|----------------------|---------|
| FA1           | 1.0.9         | ≥ 11.4 | Turing, Ampere       | ≥ 1.12  |
| FA2           | 2.0.9         | ≥ 11.4 | Ampere, Ada, Hopper  | ≥ 1.12  |
| FA2           | 2.1.2.post3   | ≥ 11.4 | Ampere, Ada, Hopper  | ≥ 1.12  |
| FA2           | 2.2.5         | ≥ 11.4 | Ampere, Ada, Hopper  | ≥ 1.12  |
| FA2           | 2.3.6         | ≥ 11.4 | Ampere, Ada, Hopper  | ≥ 1.12  |
| FA2           | 2.4.3.post1   | ≥ 11.6 | Ampere, Ada, Hopper  | ≥ 1.12  |
| FA2           | 2.5.9.post1   | ≥ 11.6 | Ampere, Ada, Hopper  | ≥ 1.12  |
| FA2           | 2.6.3         | ≥ 11.6 | Ampere, Ada, Hopper  | ≥ 1.12  |
| FA3           | 2.7.4.post1   | ≥ 12.0 | Hopper               | ≥ 2.2   |
| FA3           | 2.8.3         | ≥ 12.0 | Hopper               | ≥ 2.2   |
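
To check which row of the table applies to a given machine, a minimal Python sketch like the one below can help. It assumes PyTorch is installed and maps the GPU's compute capability to an architecture generation (Turing = SM 7.5, Ampere = SM 8.0/8.6, Ada = SM 8.9, Hopper = SM 9.0); the flash-attn version check at the end is optional.

```python
# Sketch: report PyTorch, CUDA, and GPU architecture, then suggest a
# FlashAttention generation based on the compatibility table above.
import torch


def detect_architecture() -> str:
    """Map the first CUDA device's compute capability to an architecture name."""
    if not torch.cuda.is_available():
        return "no CUDA device"
    major, minor = torch.cuda.get_device_capability(0)
    sm = major * 10 + minor
    if sm == 75:
        return "Turing"
    if sm in (80, 86, 87):
        return "Ampere"
    if sm == 89:
        return "Ada"
    if sm == 90:
        return "Hopper"
    return f"unknown (SM {sm})"


arch = detect_architecture()
print(f"PyTorch : {torch.__version__}")
print(f"CUDA    : {torch.version.cuda}")
print(f"GPU arch: {arch}")

# Suggested FA generation per the table above.
if arch == "Hopper":
    print("FA3 (or any FA2 release) applies")
elif arch in ("Ampere", "Ada"):
    print("FA2 releases apply")
elif arch == "Turing":
    print("Only FA1 (1.0.9) supports this GPU")
else:
    print("FlashAttention is not supported on this device")

# Optional: show the installed flash-attn version, if any.
try:
    import flash_attn
    print(f"flash-attn installed: {flash_attn.__version__}")
except ImportError:
    print("flash-attn is not installed")
```

Once the row is known, the package is typically pinned to that version, e.g. `pip install flash-attn==2.6.3 --no-build-isolation` (the `--no-build-isolation` flag is the one the flash-attn README recommends).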