Matrix Multiplication: Unleashing the Power of Tensors! ⚡

"Behold! The sacred art of matrix multiplication - where dimensions dance and vectors bend to my will!" — Professor Victor py Torchenstein

The Attention Formula (Preview of Things to Come)

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

Where:

$Q$ is the Query matrix
$K$ is the Key matrix
$V$ is the Value matrix
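$d_k$ is the dimension of the key vectors; dividing by $\sqrt{d_k}$ keeps the scores from growing with that dimension
$\text{softmax}$ normalizes each row of scores into attention weights that sum to 1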
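As a preview, the formula above can be sketched in a few lines of PyTorch. This is a minimal illustration rather than a production implementation: the function name `scaled_dot_product_attention` and the tensor sizes (5 tokens, $d_k = 8$, value dimension 16) are assumptions chosen for the example, and there is no masking, batching, or dropout.

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """Sketch of softmax(Q K^T / sqrt(d_k)) V for 2D inputs (hypothetical helper)."""
    d_k = K.shape[-1]                                   # dimension of the key vectors
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # (queries, keys) similarity scores
    weights = torch.softmax(scores, dim=-1)             # normalize over the keys
    return weights @ V, weights

torch.manual_seed(0)
Q = torch.randn(5, 8)    # 5 queries, d_k = 8 (assumed sizes)
K = torch.randn(5, 8)    # 5 keys
V = torch.randn(5, 16)   # 5 values, each of dimension 16

output, weights = scaled_dot_product_attention(Q, K, V)
print("Output shape:", output.shape)        # torch.Size([5, 16])
print("Row sums:", weights.sum(dim=-1))     # every row of weights sums to 1
```

Note that both steps that matter here, $QK^T$ and the final product with $V$, are plain matrix multiplications, which is why the rest of this section focuses on them.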

Basic Matrix Operations

Let's start with the fundamentals before we conquer attention mechanisms!

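Element-wise multiplication ($A$ and $B$ must have the same shape):

$C_{ij} = A_{ij} \times B_{ij}$

Matrix multiplication (the number of columns of $A$ must equal the number of rows of $B$):

$C_{ij} = \sum_{k} A_{ik} \times B_{kj}$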
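The difference between the two formulas is easiest to see on a small example. The sketch below uses two hand-picked 2×2 matrices (the values are arbitrary, chosen so the arithmetic can be checked by hand): `*` multiplies matching entries, while `@` sums products along rows of `A` and columns of `B`.

```python
import torch

A = torch.tensor([[1., 2.],
                  [3., 4.]])
B = torch.tensor([[5., 6.],
                  [7., 8.]])

# Element-wise: C[i, j] = A[i, j] * B[i, j]
elementwise = A * B
print(elementwise)   # [[ 5., 12.], [21., 32.]]

# Matrix product: C[i, j] = sum_k A[i, k] * B[k, j]
product = A @ B
print(product)       # [[19., 22.], [43., 50.]]
```

For instance, the top-left entry of the matrix product is $1 \times 5 + 2 \times 7 = 19$, whereas the element-wise result there is simply $1 \times 5 = 5$.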


import torch

# Create some matrices for experimentation
A = torch.randn(3, 4)
B = torch.randn(4, 2)

print("Matrix A shape:", A.shape)
print("Matrix B shape:", B.shape)

# Matrix multiplication
C = torch.matmul(A, B)
print("Result C shape:", C.shape)
print("\nMwahahaha! The matrices have been multiplied!")

Matrix A shape: torch.Size([3, 4])
Matrix B shape: torch.Size([4, 2])
Result C shape: torch.Size([3, 2])

Mwahahaha! The matrices have been multiplied!

PyTorch Matrix Multiplication Methods

Professor Torchenstein's arsenal includes multiple ways to multiply matrices:

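torch.matmul() - The general matrix multiplication function; broadcasts over leading batch dimensions
@ operator - Pythonic matrix multiplication (equivalent to torch.matmul)
torch.mm() - For 2D matrices only, with no broadcasting
torch.bmm() - Batch matrix multiplication for two 3D tensors with the same batch size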
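To see how the four methods relate, here is a small comparison. The shapes and the batch size of 10 are assumptions picked for illustration; the point is that the first three agree on 2D inputs, while `torch.bmm` expects 3D tensors and `torch.matmul` can broadcast a 2D matrix across a batch.

```python
import torch

torch.manual_seed(0)
A = torch.randn(3, 4)
B = torch.randn(4, 2)

# Three ways to multiply two 2D matrices
via_matmul = torch.matmul(A, B)
via_operator = A @ B
via_mm = torch.mm(A, B)
print("All 2D results agree:",
      torch.allclose(via_matmul, via_operator) and torch.allclose(via_matmul, via_mm))

# torch.bmm needs two 3D tensors with matching batch size
batch_A = torch.randn(10, 3, 4)
batch_B = torch.randn(10, 4, 2)
via_bmm = torch.bmm(batch_A, batch_B)
print("bmm result shape:", via_bmm.shape)          # torch.Size([10, 3, 2])

# torch.matmul broadcasts the 2D matrix B across the batch of 10
broadcasted = torch.matmul(batch_A, B)
print("Broadcast result shape:", broadcasted.shape)  # torch.Size([10, 3, 2])
```

A reasonable rule of thumb is to reach for `@` by default, and to use `torch.mm` or `torch.bmm` when you want an error instead of silent broadcasting on an unexpected shape.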

Mathematical Foundations

For matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$:

$$C = AB \quad \text{where} \quad C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$
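The summation can be checked directly by writing it out as three nested loops and comparing against `torch.matmul`. The helper name `matmul_by_definition` is hypothetical and exists only for this sketch; it is far slower than the built-in and is meant purely to mirror the indices $i$, $j$, $k$ in the formula.

```python
import torch

def matmul_by_definition(A, B):
    """Compute C[i, j] = sum_k A[i, k] * B[k, j] with explicit loops (illustrative only)."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError(f"Inner dimensions must match, got {n} and {n_b}")
    C = torch.zeros(m, p, dtype=A.dtype)
    for i in range(m):          # rows of A
        for j in range(p):      # columns of B
            for k in range(n):  # shared inner dimension
                C[i, j] += A[i, k] * B[k, j]
    return C

torch.manual_seed(0)
A = torch.randn(3, 4)   # m = 3, n = 4
B = torch.randn(4, 2)   # n = 4, p = 2

C_loops = matmul_by_definition(A, B)
C_torch = torch.matmul(A, B)
print("Shapes:", C_loops.shape, C_torch.shape)
print("Match:", torch.allclose(C_loops, C_torch, atol=1e-5))
```

The shape check at the top of the helper reflects the requirement that $A$ has $n$ columns and $B$ has $n$ rows.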

This operation is fundamental to:

Linear transformations
Neural network forward passes
Attention mechanisms in Transformers
And much more! 🧠⚡
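As one concrete case from the list above, the forward pass of a fully connected layer is a matrix multiplication plus a bias. The sketch below assumes a batch of 32 inputs with 4 features mapped to 2 outputs (arbitrary sizes) and compares the hand-written product with `torch.nn.functional.linear`, which stores the weight as (out_features, in_features).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(32, 4)   # batch of 32 inputs, 4 features each (assumed sizes)
W = torch.randn(2, 4)    # weight with shape (out_features, in_features)
b = torch.randn(2)       # one bias per output feature

# Forward pass written as an explicit matrix multiplication
manual = x @ W.T + b

# The same computation through PyTorch's functional API
builtin = F.linear(x, W, b)

print("Output shape:", manual.shape)   # torch.Size([32, 2])
print("Match:", torch.allclose(manual, builtin, atol=1e-5))
```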