Kvax: fast Flash Attention for JAX with context parallelism
Nebius's open-source implementation of Flash Attention 2 for JAX. It is built on Triton kernels and offers efficient document-mask computation and context parallelism for long-sequence training with FSDP/HSDP sharding on GPU clusters.
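To make "document mask" concrete, here is a minimal plain-JAX sketch. It is not Kvax's API or kernel. When several documents are packed into one training sequence, each token should attend only to earlier tokens of its own document. The function names (`document_causal_mask`, `masked_attention`, `nonempty_blocks`) and the `segment_ids` encoding (one integer document id per token) are hypothetical. The last function assumes square tiles of size `block` and shows why a blockwise Flash Attention kernel can benefit from such masks: any tile with no unmasked entry can be skipped entirely.

```python
import jax
import jax.numpy as jnp


def document_causal_mask(segment_ids: jax.Array) -> jax.Array:
    """Boolean [T, T] mask: query i may attend to key j only if both tokens
    belong to the same document (same segment id) and j <= i (causal)."""
    same_doc = segment_ids[:, None] == segment_ids[None, :]
    t = segment_ids.shape[0]
    causal = jnp.tril(jnp.ones((t, t), dtype=bool))
    return same_doc & causal


def masked_attention(q, k, v, segment_ids):
    """Naive O(T^2) reference attention with a document mask.
    q, k, v have shape [T, D]; no tiling or online softmax is done here."""
    scale = 1.0 / jnp.sqrt(q.shape[-1])
    scores = (q @ k.T) * scale
    mask = document_causal_mask(segment_ids)
    # Use the most negative finite value so every row stays finite
    # (each query can always attend to itself, so no row is fully masked).
    scores = jnp.where(mask, scores, jnp.finfo(scores.dtype).min)
    weights = jax.nn.softmax(scores, axis=-1)
    return weights @ v


def nonempty_blocks(segment_ids: jax.Array, block: int) -> jax.Array:
    """Boolean [T/block, T/block] map of which (query tile, key tile) pairs
    contain at least one unmasked entry. Assumes T is divisible by block."""
    mask = document_causal_mask(segment_ids)
    n = mask.shape[0] // block
    tiles = mask.reshape(n, block, n, block)
    return tiles.any(axis=(1, 3))
```

For example, with `segment_ids = jnp.array([0, 0, 0, 1, 1, 2, 2, 2])` and `block=2`, only 7 of the 16 tiles are non-empty, compared with 10 for a plain causal mask, so a kernel that skips empty tiles does less work as documents get shorter.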