Archives
All the articles I've archived.
2025 9
December 1
-
matmul
Creating a CUDA kernel for image processing
November 4
-
2D Workloads
Creating a CUDA kernel for image processing
-
Fused Softmax::P3::Cuda Kernel
Updated: Creating a CUDA kernel for fused softmax
-
Fused Softmax::P2::Triton optimization
Debugging a Triton kernel optimization issue
-
Fused Softmax::P1::Naive & Triton Implementation
Implementing softmax in Torch and Triton
October 3
-
Vector Addition::P4::Optimizing
Optimizing CUDA vector addition kernels to match Triton & Torch
-
Vector Addition::P3::Benchmarking
Benchmarking vector addition kernels in CUDA, Triton & Torch
-
Vector Addition::P2::Cuda Kernel
Investigating a vector addition kernel in CUDA
September 1
-
Vector Addition::P1::Triton Kernel
Investigating a vector addition kernel in Triton