Run a FlashInfer example
2026-02-27
Applies to Linux
The examples folder in the ROCm/flashinfer repository contains example code that you can run with FlashInfer. Once FlashInfer is installed, you can try it out with the short script below, which runs single-request decode attention on GPU device 0.
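Before running the example, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of the FlashInfer repository: `check_environment` is a hypothetical helper name, and it assumes the package is importable as `flashinfer` and that the GPU is exposed through PyTorch's `torch.cuda` interface.

```python
import importlib.util


def check_environment():
    """Report whether the packages the example needs can be found.

    Hypothetical helper for illustration; it only inspects the current
    Python environment and does not run any FlashInfer kernels.
    """
    report = {
        "torch_importable": importlib.util.find_spec("torch") is not None,
        "flashinfer_importable": importlib.util.find_spec("flashinfer") is not None,
        "gpu_visible": False,
    }
    if report["torch_importable"]:
        import torch

        # The example moves tensors to device 0, so a GPU must be visible.
        report["gpu_visible"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```

If any entry prints `False`, the example script is likely to fail at the corresponding import or at the `.to(0)` call.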
Save the following code snippet to a Python script named flashinfer_example.py.

    import torch
    import flashinfer

    kv_len = 2048
    num_kv_heads = 32
    head_dim = 128

    k = torch.randn(kv_len, num_kv_heads, head_dim).half().to(0)
    v = torch.randn(kv_len, num_kv_heads, head_dim).half().to(0)

    # decode attention
    num_qo_heads = 32
    q = torch.randn(num_qo_heads, head_dim).half().to(0)

    # decode attention without RoPE on-the-fly
    o = flashinfer.single_decode_with_kv_cache(q, k, v)

    # decode with LLaMA style RoPE on-the-fly
    o_rope_on_the_fly = flashinfer.single_decode_with_kv_cache(q, k, v, pos_encoding_mode="ROPE_LLAMA")
Run the script to use FlashInfer.
python flashinfer_example.py
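The script above prints nothing, so a successful run only shows that the kernels executed. To sanity-check the result, you can compare `o` against a plain PyTorch reference. The sketch below is an illustration rather than FlashInfer's implementation: `reference_single_decode` is a hypothetical helper, and it assumes the layout used in the example (`k` and `v` shaped `[kv_len, num_heads, head_dim]`), equal query and KV head counts, a softmax scale of `1 / sqrt(head_dim)`, and no positional encoding.

```python
import math

import torch


def reference_single_decode(q, k, v):
    """Plain-PyTorch decode attention for a single query token.

    q: [num_heads, head_dim]
    k, v: [kv_len, num_heads, head_dim]
    Assumes equal query and KV head counts and no positional encoding.
    """
    if q.shape[0] != k.shape[1]:
        raise ValueError("this sketch assumes num_qo_heads == num_kv_heads")
    head_dim = q.shape[-1]
    # Compute in float32 so the reference is more precise than the fp16 kernel.
    qf, kf, vf = q.float(), k.float(), v.float()
    # scores[h, n] = dot(q[h], k[n, h]) / sqrt(head_dim)
    scores = torch.einsum("hd,nhd->hn", qf, kf) / math.sqrt(head_dim)
    probs = torch.softmax(scores, dim=-1)
    # out[h] = sum over n of probs[h, n] * v[n, h]
    return torch.einsum("hn,nhd->hd", probs, vf)


if __name__ == "__main__":
    # Small CPU-only demonstration with random inputs.
    q = torch.randn(4, 8)
    k = torch.randn(16, 4, 8)
    v = torch.randn(16, 4, 8)
    print(reference_single_decode(q, k, v).shape)
```

Appended to flashinfer_example.py, a check such as `torch.allclose(o.float(), reference_single_decode(q, k, v), atol=1e-2)` should hold for the call without RoPE. The tolerance here is an assumption chosen to allow for half-precision rounding, not a documented guarantee.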