Tensor Metamorphosis: Shape-Shifting Mastery¶
Module 1 | Lesson 2b
Professor Torchenstein's Grand Directive¶
Ah, my brilliant apprentice! Do you feel it? That electric tingle of mastery coursing through your neural pathways? You have learned to slice tensors with surgical precision and fuse them into magnificent constructions! But now... NOW we transcend mere cutting and pasting!
Today, we unlock the ultimate power: METAMORPHOSIS! We shall transform the very essence of tensor structure without disturbing a single precious datum within! Think of it as the most elegant magic—changing form while preserving the soul!
"Behold! We shall reshape() reality itself and make dimensions unsqueeze() from the void! The tensors... they will obey our geometric commands!"

Your Mission Briefing¶
By the time you emerge from this metamorphosis chamber, you will command the arcane arts of:
- 🔄 The Great Reshape & View Metamorphosis: Transform tensor structures with torch.reshape() and torch.view() while understanding memory layout secrets.
- 🗜️ The Squeeze & Unsqueeze Dimension Dance: Add and remove dimensions of size 1 with surgical precision using squeeze() and unsqueeze().
- 📊 Specialized Shape Sorcery: Flatten complex structures into submission with torch.flatten() and restore them with torch.unflatten().
Estimated Time to Completion: 20 minutes of pure shape-shifting enlightenment.
What You'll Need:
- The wisdom from our previous experiments: tensor summoning and tensor surgery.
- A willingness to bend reality to your computational will!
- Your PyTorch laboratory, humming with metamorphic potential.
Part 1: Memory Layout Foundations 🧱¶
The Deep Theory Behind Memory Layout Magic¶
Ah, my curious apprentice! To truly master tensor metamorphosis, you must understand the fundamental secret that lies beneath: how tensors live in your computer's memory! This knowledge will separate you from the mere code-monkeys and elevate you to the ranks of true PyTorch sorcerers!
The Universal Truth: Everything is a 1D Array! 📏 No matter how many dimensions a tensor claims to have, computer memory is just one long, sequential line of storage locations:
Computer Memory (Always 1D):
[addr_0][addr_1][addr_2][addr_3][addr_4][addr_5][addr_6][addr_7]...
The Multi-Dimensional Illusion: When we have a "2D tensor" or "3D tensor," it's really just our interpretation of how to read this 1D memory! The computer doesn't care about rows and columns—that's just how WE choose to organize and access the data.
Row-Major vs Column-Major: The Ancient Battle! ⚔️¶
There are two ways to store multi-dimensional data in this 1D memory:
🇨 Row-Major (C-style) - PyTorch's Choice: Store data row by row, left to right, then move to the next row.
🇫 Column-Major (Fortran-style): Store data column by column, top to bottom, then move to the next column.
Let's visualize this with a 3×4 matrix containing numbers 1-12:
Visual Matrix:
[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
Row-Major Memory Layout (PyTorch default):
Memory: [1][2][3][4][5][6][7][8][9][10][11][12]
└─ row1 ─┘└─ row2 ─┘└─ row3 ──┘
Column-Major Memory Layout (Not PyTorch):
Memory: [1][5][9][2][6][10][3][7][11][4][8][12]
└ col1 ┘└ col2 ┘└─ col3 ─┘└─ col4 ─┘
PyTorch uses row-major order because that is the convention of C/C++ and of NumPy, the ecosystem PyTorch interoperates with. It does not depend on your OS or hardware; it is a software design choice.
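To make the two orders concrete, here is a minimal sketch (plain PyTorch, nothing beyond functions used elsewhere in this lesson): flattening a matrix walks it in row-major order, while flattening its transpose walks the same data in column-major order, reproducing the two memory sequences drawn above.

import torch

matrix = torch.arange(1, 13).reshape(3, 4)

# Row-major reading order (PyTorch's native layout)
print(matrix.flatten().tolist())      # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# Column-major reading order: transpose first, then read row by row
print(matrix.t().flatten().tolist())  # [1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12]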
What Makes Memory "Contiguous"? 🧩¶
Contiguous memory: the tensor's elements sit in the 1D memory array in exactly the order you would read them row by row, one after another, with no jumps.
Non-contiguous memory: the elements all exist in memory, but not in the order you would expect when reading the tensor row by row; stepping through them sequentially requires jumping around.
The Transpose Tragedy - Why Memory Becomes Non-Contiguous¶
Let's witness the moment when contiguous memory becomes scattered:
Original 3×4 Tensor (Contiguous):
Visual:
[ 1  2  3  4]
[ 5  6  7  8]
[ 9 10 11 12]
Memory Layout:
[1][2][3][4][5][6][7][8][9][10][11][12]
After Transpose to 4×3 (Non-Contiguous):
Visual:
[ 1  5  9]
[ 2  6 10]
[ 3  7 11]
[ 4  8 12]
Expected Memory for New Shape:
[1][5][9][2][6][10][3][7][11][4][8][12]
But ACTUAL memory is still:
[1][2][3][4][5][6][7][8][9][10][11][12]
The Problem: To read row 1 of the transposed tensor [1, 5, 9], PyTorch must jump around in memory: address 0 → address 4 → address 8. This "jumping around" makes it non-contiguous!
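You can watch this tragedy unfold with two standard tensor methods, is_contiguous() and stride(). A minimal sketch: the strides report how many elements PyTorch must jump in the 1D buffer to move one step along each dimension, and the transpose makes that jump large.

import torch

mat = torch.arange(1, 13).reshape(3, 4)
mat_t = mat.t()  # 4×3 view over the SAME memory buffer

print(mat.is_contiguous(), mat.stride())      # True  (4, 1): step 4 to the next row, 1 to the next column
print(mat_t.is_contiguous(), mat_t.stride())  # False (1, 4): reading one "row" of mat_t jumps 4 elements at a time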
import torch
# Set the seed for cosmic consistency
torch.manual_seed(42)
print("🔬 MEMORY LAYOUT IN ACTION - ROW-MAJOR DEMONSTRATION")
print("=" * 65)
# Create our test subject: numbers 1-12 in sequential memory
data = torch.arange(1, 13)
print("🧠 Raw Data in Computer Memory (1D Reality):")
print(f" Memory: {data.tolist()}")
print(f" Shape: {data.shape} ← This is how it ACTUALLY lives!")
print(f"\n📐 ROW-MAJOR INTERPRETATION AS 3×4 MATRIX:")
matrix_3x4 = data.reshape(3, 4)
print(f" Same memory: {data.tolist()}")
print(f" But interpreted as 3×4:")
print(matrix_3x4)
print(f" 💡 Row 1: [1,2,3,4] from memory positions 0-3")
print(f" 💡 Row 2: [5,6,7,8] from memory positions 4-7")
print(f" 💡 Row 3: [9,10,11,12] from memory positions 8-11")
print(f"\n🔄 DIFFERENT INTERPRETATION: 4×3 MATRIX:")
matrix_4x3 = data.reshape(4, 3)
print(f" Same memory: {data.tolist()}")
print(f" But interpreted as 4×3:")
print(matrix_4x3)
print(f" 💡 Row 1: [1,2,3], Row 2: [4,5,6], Row 3: [7,8,9], Row 4: [10,11,12]")
print(f"\n✨ THE FUNDAMENTAL INSIGHT:")
print(f" - Memory never changes: {data.tolist()}")
print(f" - Only our INTERPRETATION changes!")
print(f" - This is the foundation of tensor metamorphosis!")
🔬 MEMORY LAYOUT IN ACTION - ROW-MAJOR DEMONSTRATION
=================================================================
🧠 Raw Data in Computer Memory (1D Reality):
Memory: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Shape: torch.Size([12]) ← This is how it ACTUALLY lives!
📐 ROW-MAJOR INTERPRETATION AS 3×4 MATRIX:
Same memory: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
But interpreted as 3×4:
tensor([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
💡 Row 1: [1,2,3,4] from memory positions 0-3
💡 Row 2: [5,6,7,8] from memory positions 4-7
💡 Row 3: [9,10,11,12] from memory positions 8-11
🔄 DIFFERENT INTERPRETATION: 4×3 MATRIX:
Same memory: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
But interpreted as 4×3:
tensor([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
💡 Row 1: [1,2,3], Row 2: [4,5,6], Row 3: [7,8,9], Row 4: [10,11,12]
✨ THE FUNDAMENTAL INSIGHT:
- Memory never changes: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
- Only our INTERPRETATION changes!
- This is the foundation of tensor metamorphosis!
🧟♂️ Remember, dear tensor alchemist! ✨
Shape and form are but illusions!
The memory remains unchanged; it's only our interpretation that morphs!
– Prof. Torchenstein
PyTorch's Memory Management System 🏭¶
Now that you understand how memory is fundamentally organized, prepare to witness PyTorch's DIABOLICAL system for managing that memory! This is where the magic happens, my apprentice—where PyTorch transforms from a simple library into a memory manipulation GENIUS!
🧠 The Trinity of Tensor Existence - Storage, Data Pointers, and Views¶
PyTorch has crafted an elegant three-tier system to encapsulate how tensor data lives, breathes, and transforms in memory. Understanding this trinity will separate you from the memory-blind masses forever!
🎭 tensor - The Mask of Interpretation
- What it REALLY is: Your personal window into the memory abyss! A tensor is merely an interpretation layer that can represent the entire memory buffer, a clever view of it, or just a slice of the underlying numerical reality.
- The Secret: Multiple tensors can wear different masks while peering into the SAME underlying memory vault!
📦 tensor.storage() - The Memory Vault Master
- What it is: PyTorch's high-level Storage object—the supreme overlord that commands the actual data buffer in the memory depths!
- When shared: Multiple tensor minions can pledge allegiance to the same Storage master, but each can gaze upon different regions of its domain (like examining different rows of the same data matrix)
- Think of it as: The entire memory palace that hoards all your numerical treasures, while individual tensors are merely different keys to access various chambers within!
🎯 tensor.data_ptr() - The Exact Memory Coordinates
- What it is: The raw memory address (a cold, hard integer) that points to the EXACT byte where this particular tensor's data journey begins in the vast memory ocean!
- When different: When tensors are views gazing upon different territories of the same memory kingdom (like viewing different slices of the same storage empire)
- Think of it as: The precise GPS coordinates within the memory warehouse: while .storage() tells you which warehouse, .data_ptr() tells you the exact shelf, row, and position!
⚡ The Torchenstein Memory Hierarchy:
🏰 Computer Memory (The Kingdom)
└── 📦 Storage Object (The Memory Palace)
├── 🎯 data_ptr() #1 (Throne Room) ← tensor_a points here
├── 🎯 data_ptr() #2 (Armory) ← tensor_b[10:] points here
└── 🎯 data_ptr() #3 (Treasury) ← tensor_c.view(...) points here
💡 The Memory Sharing Conspiracy Matrix:
| Scenario | Same Storage? | Same data_ptr? | What's Really Happening | Example |
|---|---|---|---|---|
| True Copy | ❌ No | ❌ No | Complete independence—separate kingdoms! | tensor.clone() |
| Shape Change | ✅ Yes | ✅ Yes | Same palace, same throne room, different interpretation | tensor.reshape(3,4) |
| Slice View | ✅ Yes | ❌ No | Same palace, different room within it | tensor[2:] |
The ultimate truth: PyTorch's genius lies in maximizing memory sharing while maintaining the illusion of independence! Mwahahaha!
Let's witness this diabolical PyTorch memory system in action and see the conspiracy unfold!
print("🏭 PYTORCH'S MEMORY MANAGEMENT IN ACTION")
print("=" * 55)
# Create original tensor
original = torch.arange(1, 13)
print(f"Original tensor: {original}")
# Scenario 1: Shape change (should share storage AND data_ptr)
reshaped = original.reshape(3, 4)
print(f"\n📐 SCENARIO 1: Shape Change (reshape)")
print(f" Reshaped: \n{reshaped}")
print(f" 📦 Same storage? {original.storage().data_ptr()==reshaped.storage().data_ptr()} ")
print(f"\toriginal.storage().data_ptr()={original.storage().data_ptr()} \n\treshaped.storage().data_ptr()={reshaped.storage().data_ptr()}")
print(f" 🎯 Same data_ptr? {original.data_ptr() == reshaped.data_ptr()}")
print(f"\toriginal.data_ptr()={original.data_ptr()} \n\treshaped.data_ptr()={reshaped.data_ptr()}")
# Scenario 2: Slice view (should share storage but DIFFERENT data_ptr)
sliced = original[4:] # Elements from index 4 onwards
print(f"\n✂️ SCENARIO 2: Slice View")
print(f" Sliced tensor: {sliced}")
print(f" 📦 Same storage? {original.storage().data_ptr() == sliced.storage().data_ptr()}")
print(f"\toriginal.storage().data_ptr()={original.storage().data_ptr()} \n\tsliced.storage().data_ptr()={sliced.storage().data_ptr()}")
print(f" 🎯 Same data_ptr? {original.data_ptr() == sliced.data_ptr()}")
print(f"\toriginal.data_ptr()={original.data_ptr()} \n\tsliced.data_ptr()={sliced.data_ptr()}")
# Calculate the offset for sliced tensor
element_size = original.element_size()
offset = sliced.data_ptr() - original.data_ptr()
print(f" 🧮 Memory offset: {offset} bytes = {offset // element_size} elements")
# Scenario 3: True copy (different storage AND data_ptr)
copied = original.clone()
print(f"\n📋 SCENARIO 3: True Copy (clone)")
print(f" Cloned tensor: {copied}")
print(f" 📦 Same storage? {original.storage().data_ptr() == copied.storage().data_ptr()}")
print(f" 🎯 Same data_ptr? {original.data_ptr() == copied.data_ptr()}")
print(f"\n💡 PYTORCH'S MEMORY EFFICIENCY:")
print(f" - Reshape: FREE! (same memory, different interpretation)")
print(f" - Slice: EFFICIENT! (same memory, different starting point)")
print(f" - Clone: EXPENSIVE! (new memory allocation)")
🏭 PYTORCH'S MEMORY MANAGEMENT IN ACTION
=======================================================
Original tensor: tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
📐 SCENARIO 1: Shape Change (reshape)
Reshaped:
tensor([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
📦 Same storage? True
original.storage().data_ptr()=2814071805632
reshaped.storage().data_ptr()=2814071805632
🎯 Same data_ptr? True
original.data_ptr()=2814071805632
reshaped.data_ptr()=2814071805632
✂️ SCENARIO 2: Slice View
Sliced tensor: tensor([ 5, 6, 7, 8, 9, 10, 11, 12])
📦 Same storage? True
original.storage().data_ptr()=2814071805632
sliced.storage().data_ptr()=2814071805632
🎯 Same data_ptr? False
original.data_ptr()=2814071805632
sliced.data_ptr()=2814071805664
🧮 Memory offset: 32 bytes = 4 elements
📋 SCENARIO 3: True Copy (clone)
Cloned tensor: tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
📦 Same storage? False
🎯 Same data_ptr? False
💡 PYTORCH'S MEMORY EFFICIENCY:
- Reshape: FREE! (same memory, different interpretation)
- Slice: EFFICIENT! (same memory, different starting point)
- Clone: EXPENSIVE! (new memory allocation)
torch.view() - The Memory-Efficient Shape Changer 👁️¶
Now that we understand WHY we need shape transformation, let's master the first tool: torch.view()!
🎯 What is torch.view() and What is it FOR?¶
torch.view() is PyTorch's memory-efficient shape transformation method. It creates a new tensor with a different shape that shares the same underlying data as the original tensor.
🚀 Use view() when:
- You want maximum performance (no data copying)
- You know your tensor has contiguous memory layout
- You need guaranteed memory sharing (changes to one tensor affect the other)
⚠️ Limitations:
- Requires contiguous memory - fails if memory is scattered
- Throws error rather than automatically fixing problems
- Purist approach - no fallback mechanisms
📐 How view() Works: The Shape Mathematics¶
The Golden Rule: Total elements must remain constant!
Original shape: (A, B, C, D) → Total elements: A × B × C × D
New shape: (W, X, Y, Z) → Total elements: W × X × Y × Z
Valid only if: A × B × C × D = W × X × Y × Z
🔢 The Magic -1 Parameter:
Use -1 in one dimension to let PyTorch calculate it automatically:
tensor.view(batch_size, -1) # PyTorch figures out the second dimension
Let's see view() in action with real examples!
print("👁️ TORCH.VIEW() MASTERCLASS")
print("=" * 40)
# Create a contiguous tensor for our experiments
data = torch.arange(24) # 24 elements: 0, 1, 2, ..., 23
print(f"Original data: {data}")
print(f"Shape: {data.shape}, Elements: {data.numel()}")
print(f"\n✅ SUCCESS SCENARIOS - view() works perfectly:")
# Scenario 1: 1D to 2D
matrix_4x6 = data.view(4, 6)
print(f" 1D→2D: {data.shape} → {matrix_4x6.shape}")
print(f" Calculation: 24 elements = 4×6? {4*6 == 24} ✓")
# Scenario 2: Using -1 for automatic calculation
auto_matrix = data.view(3, -1) # PyTorch calculates: 24/3 = 8
print(f" Auto-calc: {data.shape} → {auto_matrix.shape}")
print(f" PyTorch figured out: 24/3 = 8")
# Scenario 3: 1D to 3D (more complex)
cube_2x3x4 = data.view(2, 3, 4)
print(f" 1D→3D: {data.shape} → {cube_2x3x4.shape}")
print(f" Calculation: 24 elements = 2×3×4? {2*3*4 == 24} ✓")
# Scenario 4: Memory sharing verification
print(f"\n🔗 MEMORY SHARING TEST:")
print(f" Original data_ptr: {data.data_ptr()}")
print(f" Matrix data_ptr: {matrix_4x6.data_ptr()}")
print(f" Same memory? {data.data_ptr() == matrix_4x6.data_ptr()} ✓")
# Modify original - should affect the view!
data[0] = 999
print(f" Changed data[0] to 999...")
print(f" Matrix[0,0] is now: {matrix_4x6[0,0]} (shares memory!)")
print(f"\n❌ FAILURE SCENARIOS - view() throws errors:")
# Reset data
data = torch.arange(24)
# Error 1: Impossible shape (wrong total elements)
try:
impossible = data.view(5, 5) # 5×5=25, but we have 24 elements
print(" Impossible shape: Success?!")
except RuntimeError as e:
print(f" ❌ Impossible shape (5×5=25≠24): {str(e)[:50]}...")
# Error 2: Non-contiguous memory (after transpose)
matrix = data.view(4, 6)
transposed = matrix.t() # Creates non-contiguous memory
print(f" Non-contiguous tensor: {transposed.is_contiguous()}")
try:
flattened = transposed.view(-1)
print(" view() on non-contiguous: Success?!")
except RuntimeError as e:
print(f" ❌ Non-contiguous memory: {str(e)}...")
❌ FAILURE SCENARIOS - view() throws errors:
 ❌ Impossible shape (5×5=25≠24): shape '[5, 5]' is invalid for input of size 24...
 Non-contiguous tensor: False
 ❌ Non-contiguous memory: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead....
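The error message itself points at two escape hatches. One is reshape(), covered next; the other is calling .contiguous() first, which copies the scattered elements into a fresh row-major buffer so that view() works again. A minimal sketch of that manual fix:

import torch

matrix = torch.arange(24).view(4, 6)
transposed = matrix.t()                    # non-contiguous view of the same buffer

fixed = transposed.contiguous().view(-1)   # .contiguous() allocates a row-major copy, then view() succeeds
print(fixed.shape)                                  # torch.Size([24])
print(transposed.data_ptr() == fixed.data_ptr())    # False - a new buffer was allocated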
torch.reshape() - The Diplomatic Shape Changer 🤝¶
Now let's master torch.reshape() - the more forgiving, intelligent cousin of view()!
🎯 What is torch.reshape() and What is it FOR?¶
torch.reshape() is PyTorch's diplomatic shape transformation method. It tries to return a view when possible, but creates a copy when necessary to ensure the operation always succeeds.
🤝 Use reshape() when:
- You want reliability over maximum performance
- You're not sure if your tensor memory is contiguous
- You want PyTorch to handle memory layout automatically
- You're prototyping and want to avoid memory errors
✅ Advantages:
- Always succeeds (if the shape math is valid)
- Automatically handles contiguous vs non-contiguous memory
- Beginner-friendly - less likely to cause frustrating errors
- Smart fallback - returns view when possible, copy when necessary
⚠️ Trade-offs:
- Less predictable performance - you don't know if it creates a copy
- Potentially slower than view() in some cases
- Less explicit about memory sharing
📊 reshape() vs view() - When to Use Which?¶
| Scenario | Use view() | Use reshape() |
|---|---|---|
| Performance critical | ✅ Guaranteed no copying | ❌ Might copy data |
| Beginner-friendly | ❌ Can throw errors | ✅ Always works |
| Prototyping | ❌ Interrupts workflow | ✅ Smooth development |
| Production code | ✅ Predictable behavior | ⚠️ Less predictable |
| Memory sharing required | ✅ Guaranteed sharing | ⚠️ Depends on layout |
Let's see how reshape() handles the scenarios where view() fails!
print("🤝 TORCH.RESHAPE() - THE DIPLOMATIC SOLUTION")
print("=" * 52)
# Create test data
data = torch.arange(24)
print(f"Original data: {data.shape} → {data[:6].tolist()}... (24 elements)")
print(f"\n✅ SCENARIO 1: Contiguous tensor (reshape returns view)")
matrix_4x6 = data.reshape(4, 6)
print(f" Original data_ptr: {data.data_ptr()}")
print(f" Reshaped data_ptr: {matrix_4x6.data_ptr()}")
print(f" Same memory (view)? {data.data_ptr() == matrix_4x6.data_ptr()} ✓")
print(f"\n⚠️ SCENARIO 2: Non-contiguous tensor (reshape creates copy)")
# First transpose to make it non-contiguous
transposed = matrix_4x6.t() # Now 6x4, non-contiguous
print(f" Transposed contiguous? {transposed.is_contiguous()}")
# Now reshape the non-contiguous tensor
flattened = transposed.reshape(-1) # This works! (unlike view)
print(f" Transposed data_ptr: {transposed.data_ptr()}")
print(f" Reshaped data_ptr: {flattened.data_ptr()}")
print(f" Same memory? {transposed.data_ptr() == flattened.data_ptr()}")
print(f" Conclusion: reshape() created a COPY to make it work ✓")
print(f"\n🆚 DIRECT COMPARISON: view() vs reshape()")
print(" Testing on the same non-contiguous tensor...")
# Test view() - should FAIL
try:
view_result = transposed.view(-1)
print(" view(): SUCCESS (unexpected!)")
except RuntimeError as e:
print(f" view(): FAILED ❌ - {str(e)[:40]}...")
# Test reshape() - should SUCCEED
try:
reshape_result = transposed.reshape(-1)
print(f" reshape(): SUCCESS ✅ - Shape: {reshape_result.shape}")
except RuntimeError as e:
print(f" reshape(): FAILED - {e}")
print(f"\n🔍 INVESTIGATING: When does reshape() return view vs copy?")
# Case 1: Simple reshape of contiguous tensor
simple_data = torch.arange(12)
reshaped_simple = simple_data.reshape(3, 4)
shares_memory_1 = simple_data.data_ptr() == reshaped_simple.data_ptr()
print(f" Contiguous reshape → View: {shares_memory_1}")
# Case 2: Reshape after making non-contiguous
non_contig = reshaped_simple.t() # Non-contiguous
reshaped_non_contig = non_contig.reshape(-1)
shares_memory_2 = non_contig.data_ptr() == reshaped_non_contig.data_ptr()
print(f" Non-contiguous reshape → View: {shares_memory_2} (Creates copy)")
print(f"\n💡 RESHAPE() WISDOM:")
print(f" 1. Always succeeds (if math is valid)")
print(f" 2. Returns view when memory layout allows")
print(f" 3. Creates copy when necessary")
print(f" 4. Perfect for beginners and prototyping")
print(f" 5. Use view() only when you need guaranteed performance")
🤝 TORCH.RESHAPE() - THE DIPLOMATIC SOLUTION
====================================================
Original data: torch.Size([24]) → [0, 1, 2, 3, 4, 5]... (24 elements)
✅ SCENARIO 1: Contiguous tensor (reshape returns view)
 Original data_ptr: 5037313032192
 Reshaped data_ptr: 5037313032192
 Same memory (view)? True ✓
⚠️ SCENARIO 2: Non-contiguous tensor (reshape creates copy)
 Transposed contiguous? False
 Transposed data_ptr: 5037313032192
 Reshaped data_ptr: 5037313032384
 Same memory? False
 Conclusion: reshape() created a COPY to make it work ✓
🆚 DIRECT COMPARISON: view() vs reshape()
 Testing on the same non-contiguous tensor...
 view(): FAILED ❌ - view size is not compatible with input t...
 reshape(): SUCCESS ✅ - Shape: torch.Size([24])
🔍 INVESTIGATING: When does reshape() return view vs copy?
 Contiguous reshape → View: True
 Non-contiguous reshape → View: False (Creates copy)
💡 RESHAPE() WISDOM:
 1. Always succeeds (if math is valid)
 2. Returns view when memory layout allows
 3. Creates copy when necessary
 4. Perfect for beginners and prototyping
 5. Use view() only when you need guaranteed performance
🧩 The Element Flow Mystery: How Tensors Rearrange Themselves¶
THE CRUCIAL QUESTION: When you transform a tensor from one shape to another, exactly HOW do the elements flow into their new positions? This is where many apprentices stumble: they understand the math (6×4 = 24 = 2×3×4) but don't visualize the element migration patterns!
Fear not! Professor Torchenstein shall illuminate this dark mystery with surgical precision! Understanding element flow is THE difference between tensor confusion and tensor mastery!
🔍 The Row-Major Flow Principle¶
Remember our fundamental truth: PyTorch always reads and writes elements in row-major order—left to right, then top to bottom, like reading English text!
The Sacred Rule: Elements always flow in this order: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11...]
No matter what shape transformation you perform, elements maintain their reading order but get reinterpreted into new dimensional coordinates.
print("🧩 ELEMENT FLOW MASTERCLASS - THE MIGRATION PATTERNS")
print("=" * 65)
# Create our test subject: 2D matrix with clearly identifiable elements
data_2d = torch.arange(24).view(6, 4) # 6 rows × 4 columns
print("📊 STARTING POINT: 6×4 Matrix (24 elements)")
print(f" Row-major memory order: {data_2d.flatten().tolist()}")
print(f" Visual layout:\n{data_2d}")
print(f"\n🎯 TRANSFORMATION 1: 2D → 3D (6×4 → 2×3×4)")
print(" Question: How do elements flow into the new 3D structure?")
transform_3d_v1 = data_2d.view(2, 3, 4)
print(f" Result shape: {transform_3d_v1.shape}")
print(f" Element flow visualization:")
print(f" 📦 Batch 0 (elements 0-11):")
print(transform_3d_v1[0])
print(f" 📦 Batch 1 (elements 12-23):")
print(transform_3d_v1[1])
print(f" 💡 Pattern: First 12 elements → Batch 0, Next 12 elements → Batch 1")
print(f"\n🔄 TRANSFORMATION 2: 2D → 3D (6×4 → 3×2×4)")
print(" Same 24 elements, different 3D arrangement!")
transform_3d_v2 = data_2d.view(3, 2, 4)
print(f" Result shape: {transform_3d_v2.shape}")
print(f" Element flow visualization:")
for i in range(3):
print(f" 📦 Batch {i} (elements {i*8}-{i*8+7}):")
print(f" {transform_3d_v2[i]}")
print(f" 💡 Pattern: Every 8 elements form a new batch!")
print(f"\n🎲 TRANSFORMATION 3: 2D → 3D (6×4 → 4×3×2)")
print(" Yet another way to slice the same 24 elements!")
transform_3d_v3 = data_2d.view(4, 3, 2)
print(f" Result shape: {transform_3d_v3.shape}")
print(f" Element flow visualization:")
for i in range(4):
print(f" 📦 Batch {i} (elements {i*6}-{i*6+5}):")
print(f" {transform_3d_v3[i]}")
print(f" 💡 Pattern: Every 6 elements form a new batch!")
print(f"\n🧠 THE ELEMENT FLOW ALGORITHM:")
print(f" 1. Elements are read in row-major order: 0,1,2,3,4,5...")
print(f" 2. They fill the NEW shape dimensions from right to left:")
print(f" - Last dimension fills first: [0,1,2,3] if last dim = 4")
print(f" - Then second-to-last: next group of 4 elements")
print(f" - Then third-to-last: next group of groups")
print(f" 3. The memory order NEVER changes, only the interpretation!")
🧩 ELEMENT FLOW MASTERCLASS - THE MIGRATION PATTERNS
=================================================================
📊 STARTING POINT: 6×4 Matrix (24 elements)
Row-major memory order: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
Visual layout:
tensor([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])
🎯 TRANSFORMATION 1: 2D → 3D (6×4 → 2×3×4)
Question: How do elements flow into the new 3D structure?
Result shape: torch.Size([2, 3, 4])
Element flow visualization:
📦 Batch 0 (elements 0-11):
tensor([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
📦 Batch 1 (elements 12-23):
tensor([[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])
💡 Pattern: First 12 elements → Batch 0, Next 12 elements → Batch 1
🔄 TRANSFORMATION 2: 2D → 3D (6×4 → 3×2×4)
Same 24 elements, different 3D arrangement!
Result shape: torch.Size([3, 2, 4])
Element flow visualization:
📦 Batch 0 (elements 0-7):
tensor([[0, 1, 2, 3],
[4, 5, 6, 7]])
📦 Batch 1 (elements 8-15):
tensor([[ 8, 9, 10, 11],
[12, 13, 14, 15]])
📦 Batch 2 (elements 16-23):
tensor([[16, 17, 18, 19],
[20, 21, 22, 23]])
💡 Pattern: Every 8 elements form a new batch!
🎲 TRANSFORMATION 3: 2D → 3D (6×4 → 4×3×2)
Yet another way to slice the same 24 elements!
Result shape: torch.Size([4, 3, 2])
Element flow visualization:
📦 Batch 0 (elements 0-5):
tensor([[0, 1],
[2, 3],
[4, 5]])
📦 Batch 1 (elements 6-11):
tensor([[ 6, 7],
[ 8, 9],
[10, 11]])
📦 Batch 2 (elements 12-17):
tensor([[12, 13],
[14, 15],
[16, 17]])
📦 Batch 3 (elements 18-23):
tensor([[18, 19],
[20, 21],
[22, 23]])
💡 Pattern: Every 6 elements form a new batch!
🧠 THE ELEMENT FLOW ALGORITHM:
1. Elements are read in row-major order: 0,1,2,3,4,5...
2. They fill the NEW shape dimensions from right to left:
- Last dimension fills first: [0,1,2,3] if last dim = 4
- Then second-to-last: next group of 4 elements
- Then third-to-last: next group of groups
3. The memory order NEVER changes, only the interpretation!
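A quick way to convince yourself of point 3 is to flatten before and after any of these shape changes; for a contiguous tensor the element sequence never moves. A minimal check, recreating the 6×4 tensor from above so the snippet stands alone:

import torch

data_2d = torch.arange(24).view(6, 4)

# Every reshape of the same contiguous data flattens back to the same sequence
for shape in [(2, 3, 4), (3, 2, 4), (4, 3, 2)]:
    same_order = torch.equal(data_2d.flatten(), data_2d.view(shape).flatten())
    print(shape, "preserves element order:", same_order)  # True for all three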
🚀 Real-World Neural Network Shape Challenges¶
Now that you understand how elements flow, let's tackle the exact scenarios where neural network engineers use view() and reshape() every single day! These are the problems that can ONLY be solved with shape transformations (not permutation or other operations).
💡 Challenge 1: CNN Feature Maps → Linear Layer¶
The Problem: You've extracted features from images using CNN layers, but now you need to feed them into a fully connected (Linear) layer for classification. The shapes are incompatible!
print("🚀 SOLVING REAL-WORLD NEURAL NETWORK SHAPE CHALLENGES")
print("=" * 65)
# =============================================================================
# CHALLENGE 1: CNN Feature Maps → Linear Layer
# =============================================================================
print("💡 CHALLENGE 1: CNN Feature Maps → Linear Layer")
print("-" * 55)
# The scenario: You've processed a batch of images through CNN layers
batch_size = 16
channels = 128 # Feature maps from CNN
height, width = 7, 7 # Spatial dimensions after convolutions
# This is what you get after CNN feature extraction
cnn_features = torch.randn(batch_size, channels, height, width)
print(f"📊 CNN output shape: {cnn_features.shape}")
print(f" Interpretation: {batch_size} images, {channels} feature maps, {height}×{width} spatial size")
# The problem: Linear layer expects (batch_size, input_features) -> (batch_size, output)
print(f"\n🎯 Linear layer expects: (batch_size, {channels*height*width})")
print(f"❌ But we have: {cnn_features.shape}")
# THE SOLUTION: Flatten spatial dimensions while keeping batch dimension
flattened_features = cnn_features.view(batch_size, -1)
print(f"\n✅ SOLUTION: view({batch_size}, -1)")
print(f" Result shape: {flattened_features.shape}")
# Verify the calculation
expected_features = channels * height * width
print(f" Calculation: {channels} × {height} × {width} = {expected_features}")
print(f" Matches? {flattened_features.shape[1] == expected_features}")
🚀 SOLVING REAL-WORLD NEURAL NETWORK SHAPE CHALLENGES
=================================================================
💡 CHALLENGE 1: CNN Feature Maps → Linear Layer
-------------------------------------------------------
📊 CNN output shape: torch.Size([16, 128, 7, 7])
 Interpretation: 16 images, 128 feature maps, 7×7 spatial size
🎯 Linear layer expects: (batch_size, 6272)
❌ But we have: torch.Size([16, 128, 7, 7])
✅ SOLUTION: view(16, -1)
 Result shape: torch.Size([16, 6272])
 Calculation: 128 × 7 × 7 = 6272
 Matches? True
⚡ Challenge 2: Multi-Head Attention Setup¶
The Problem: You have embeddings for a batch of text sequences, but you need to split the embedding dimension into multiple attention heads for parallel processing.
Let's solve these with code and see exactly how the transformations work:
print("🚀 MULTI-HEAD ATTENTION TRANSFORMATION - 3D → 4D")
print("=" * 60)
# Simulate the exact scenario from real Transformers!
batch_size, seq_len, hidden_size = 2, 4, 8 # Small example for clarity
num_heads = 2
head_dim = hidden_size // num_heads # 8 // 2 = 4
# Create embeddings tensor like in a real Transformer
embeddings_3d = torch.arange(batch_size * seq_len * hidden_size).view(batch_size, seq_len, hidden_size)
print(f"🧠 TRANSFORMER EMBEDDINGS: Shape {embeddings_3d.shape}")
print(f" [batch_size, sequence_length, hidden_size] = [{batch_size}, {seq_len}, {hidden_size}]")
print(f" This represents {batch_size} sequences, each with {seq_len} tokens, each token has {hidden_size} features")
print("\n Embeddings tensor:")
for b in range(batch_size):
print(f" Batch {b}:")
for s in range(seq_len):
print(f" Token {s}: {embeddings_3d[b, s].tolist()} (features for this token)")
# THE TRANSFORMATION: Split hidden_size into multiple attention heads
multi_head_4d = embeddings_3d.view(batch_size, seq_len, num_heads, head_dim)
print(f"\n⚡ MULTI-HEAD TRANSFORMATION:")
print(f" Original: [{batch_size}, {seq_len}, {hidden_size}] → New: [{batch_size}, {seq_len}, {num_heads}, {head_dim}]")
print(f" Translation: [batch, tokens, features] → [batch, tokens, heads, features_per_head]")
print(f"\n🔍 ELEMENT FLOW ANALYSIS:")
print(f" Where do the original 8 features go for each token?")
for b in range(batch_size):
for s in range(seq_len):
original_features = embeddings_3d[b, s]
print(f"\n Batch {b}, Token {s} - Original features: {original_features.tolist()}")
for h in range(num_heads):
head_features = multi_head_4d[b, s, h]
start_idx = h * head_dim
end_idx = start_idx + head_dim
print(f" Head {h}: {head_features.tolist()} (original features [{start_idx}:{end_idx}])")
print(f"\n💡 THE ATTENTION HEAD PATTERN:")
print(f" • Each token's {hidden_size} features get split into {num_heads} groups of {head_dim}")
print(f" • Head 0 gets features [0:{head_dim}], Head 1 gets features [{head_dim}:{hidden_size}]")
print(f" • This allows each attention head to focus on different aspects!")
print(f" • The element order is preserved: [0,1,2,3,4,5,6,7] → Head0:[0,1,2,3], Head1:[4,5,6,7]")
print(f"\n🎯 WHY THIS TRANSFORMATION IS GENIUS:")
print(f" • Same memory, but now we can process {num_heads} attention heads in parallel")
print(f" • Each head learns different patterns (grammar, semantics, etc.)")
print(f" • This is the SECRET behind Transformer's incredible power!")
print(f" • GPT, BERT, ChatGPT - they ALL use this exact transformation!")
🚀 MULTI-HEAD ATTENTION TRANSFORMATION - 3D → 4D
============================================================
🧠 TRANSFORMER EMBEDDINGS: Shape torch.Size([2, 4, 8])
[batch_size, sequence_length, hidden_size] = [2, 4, 8]
This represents 2 sequences, each with 4 tokens, each token has 8 features
Embeddings tensor:
Batch 0:
Token 0: [0, 1, 2, 3, 4, 5, 6, 7] (features for this token)
Token 1: [8, 9, 10, 11, 12, 13, 14, 15] (features for this token)
Token 2: [16, 17, 18, 19, 20, 21, 22, 23] (features for this token)
Token 3: [24, 25, 26, 27, 28, 29, 30, 31] (features for this token)
Batch 1:
Token 0: [32, 33, 34, 35, 36, 37, 38, 39] (features for this token)
Token 1: [40, 41, 42, 43, 44, 45, 46, 47] (features for this token)
Token 2: [48, 49, 50, 51, 52, 53, 54, 55] (features for this token)
Token 3: [56, 57, 58, 59, 60, 61, 62, 63] (features for this token)
⚡ MULTI-HEAD TRANSFORMATION:
Original: [2, 4, 8] → New: [2, 4, 2, 4]
Translation: [batch, tokens, features] → [batch, tokens, heads, features_per_head]
🔍 ELEMENT FLOW ANALYSIS:
Where do the original 8 features go for each token?
Batch 0, Token 0 - Original features: [0, 1, 2, 3, 4, 5, 6, 7]
Head 0: [0, 1, 2, 3] (original features [0:4])
Head 1: [4, 5, 6, 7] (original features [4:8])
Batch 0, Token 1 - Original features: [8, 9, 10, 11, 12, 13, 14, 15]
Head 0: [8, 9, 10, 11] (original features [0:4])
Head 1: [12, 13, 14, 15] (original features [4:8])
Batch 0, Token 2 - Original features: [16, 17, 18, 19, 20, 21, 22, 23]
Head 0: [16, 17, 18, 19] (original features [0:4])
Head 1: [20, 21, 22, 23] (original features [4:8])
Batch 0, Token 3 - Original features: [24, 25, 26, 27, 28, 29, 30, 31]
Head 0: [24, 25, 26, 27] (original features [0:4])
Head 1: [28, 29, 30, 31] (original features [4:8])
Batch 1, Token 0 - Original features: [32, 33, 34, 35, 36, 37, 38, 39]
Head 0: [32, 33, 34, 35] (original features [0:4])
Head 1: [36, 37, 38, 39] (original features [4:8])
Batch 1, Token 1 - Original features: [40, 41, 42, 43, 44, 45, 46, 47]
Head 0: [40, 41, 42, 43] (original features [0:4])
Head 1: [44, 45, 46, 47] (original features [4:8])
Batch 1, Token 2 - Original features: [48, 49, 50, 51, 52, 53, 54, 55]
Head 0: [48, 49, 50, 51] (original features [0:4])
Head 1: [52, 53, 54, 55] (original features [4:8])
Batch 1, Token 3 - Original features: [56, 57, 58, 59, 60, 61, 62, 63]
Head 0: [56, 57, 58, 59] (original features [0:4])
Head 1: [60, 61, 62, 63] (original features [4:8])
💡 THE ATTENTION HEAD PATTERN:
• Each token's 8 features get split into 2 groups of 4
• Head 0 gets features [0:4], Head 1 gets features [4:8]
• This allows each attention head to focus on different aspects!
• The element order is preserved: [0,1,2,3,4,5,6,7] → Head0:[0,1,2,3], Head1:[4,5,6,7]
🎯 WHY THIS TRANSFORMATION IS GENIUS:
• Same memory, but now we can process 2 attention heads in parallel
• Each head learns different patterns (grammar, semantics, etc.)
• This is the SECRET behind Transformer's incredible power!
• GPT, BERT, ChatGPT - they ALL use this exact transformation!
🎯 Key Takeaways: When to Use view()/reshape() in Neural Networks¶
✅ Perfect for view()/reshape():
- CNN → Linear: Flattening spatial dimensions (B, C, H, W) → (B, C×H×W)
- Multi-head attention: Splitting features (B, S, E) → (B, S, H, E/H)
- Batch reshaping: Organizing data (N, F) → (B, N/B, F) (see the short sketch after this list)
- Any scenario where total elements stay the same and no dimension reordering is needed
❌ NOT suitable for view()/reshape():
- Dimension reordering: (H, W, C) → (C, H, W) (use permute() or transpose())
- Broadcasting preparation: Adding singleton dimensions (use unsqueeze())
- Changing data layout: Converting between different memory formats
🧠 Remember the Golden Rules:
- Total elements must match: original.numel() == reshaped.numel()
- Element flow follows row-major order: Last dimension fills first
- Memory is shared: Changes to original affect all views
- Use -1 for automatic calculation: Let PyTorch figure out one dimension
You now possess the complete knowledge of tensor shape transformation! These patterns appear in every modern neural network architecture. 🚀
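The batch-reshaping pattern from the list above deserves a quick illustration of its own. A minimal sketch; the sizes (120 samples, batches of 8, 10 features) are arbitrary and chosen only so the sample count divides evenly:

import torch

dataset = torch.randn(120, 10)               # 120 samples, 10 features each

batch_size = 8
batched = dataset.view(-1, batch_size, 10)   # (N, F) -> (num_batches, batch_size, F)
print(batched.shape)                         # torch.Size([15, 8, 10])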
🗜️ The Squeeze & Unsqueeze Dimension Dance¶
Welcome to the most elegant manipulation in all of tensor sorcery! While view() and reshape() rearrange elements, squeeze() and unsqueeze() perform a completely different kind of magic—they add and remove dimensions of size 1 without touching a single element!
Think of it as dimensional origami—folding and unfolding the very fabric of tensor space while keeping every number exactly where it is!
🎭 The Dimensional Masqueraders: What Are Size-1 Dimensions?¶
Size-1 dimensions are like invisible spacers in tensor shapes—they don't contain extra data, but they change how PyTorch interprets the structure:
tensor_1d = torch.tensor([1, 2, 3, 4]) # Shape: (4,) - 1D tensor
tensor_2d = torch.tensor([[1, 2, 3, 4]]) # Shape: (1, 4) - 2D tensor (1 row)
tensor_3d = torch.tensor([[[1, 2, 3, 4]]]) # Shape: (1, 1, 4) - 3D tensor (1×1×4)
Same 4 numbers, completely different dimensional personalities! This is where squeeze/unsqueeze become your dimensional choreographers.
🔍 The Twin Operations¶
squeeze() - The Dimension Destroyer 🗜️
- Removes dimensions of size 1
- Makes tensors "smaller" dimensionally (but same data)
- (1, 4, 1, 3) → (4, 3) (removes the size-1 dimensions)
- Important: squeeze(dim) on non-size-1 dimensions does nothing (no error!)
unsqueeze() - The Dimension Creator 🎈
- Adds dimensions of size 1 at specified positions
- Makes tensors "bigger" dimensionally (but same data)
- (4, 3) → (1, 4, 1, 3) (adds size-1 dimensions where you specify)
Let's witness this dimensional dance in action!
print("🗜️ THE SQUEEZE & UNSQUEEZE DIMENSIONAL DANCE")
print("=" * 55)
# Create our test subject with some size-1 dimensions
original = torch.randn(1, 4, 1, 3, 1) # Tensor with multiple size-1 dimensions
print(f"🎭 Original tensor shape: {original.shape}")
print(f" Total elements: {original.numel()}")
print(f" Contains dimensions of size 1 at positions: 0, 2, 4")
print(f"\n🗜️ SQUEEZE OPERATIONS - Removing Size-1 Dimensions")
print("-" * 35)
# Squeeze all size-1 dimensions
squeezed_all = original.squeeze()
print(f" squeeze() → {squeezed_all.shape}")
print(f" Removed ALL size-1 dimensions!")
# Squeeze specific dimension
squeezed_dim0 = original.squeeze(0) # Remove dimension 0 (size 1)
print(f" squeeze(0) → {squeezed_dim0.shape}")
print(f" Removed only dimension 0")
squeezed_dim2 = original.squeeze(2) # Remove dimension 2 (size 1)
print(f" squeeze(2) → {squeezed_dim2.shape}")
print(f" Removed only dimension 2")
# What happens if we try to squeeze a dimension that's NOT size 1?
print(f"\n⚠️ IMPORTANT BEHAVIOR: squeeze() on non-size-1 dimensions")
print("-" * 30)
test_behavior = torch.randn(2, 3, 4) # No size-1 dimensions
print(f" Test tensor: {test_behavior.shape}")
result = test_behavior.squeeze(1) # Try to squeeze dimension 1 (size 3)
print(f" After squeeze(1): {result.shape} (unchanged!)")
print(f" squeeze() silently ignores non-size-1 dimensions - no error thrown!")
🗜️ THE SQUEEZE & UNSQUEEZE DIMENSIONAL DANCE
=======================================================
🎭 Original tensor shape: torch.Size([1, 4, 1, 3, 1])
 Total elements: 12
 Contains dimensions of size 1 at positions: 0, 2, 4
🗜️ SQUEEZE OPERATIONS - Removing Size-1 Dimensions
-----------------------------------
 squeeze() → torch.Size([4, 3])
 Removed ALL size-1 dimensions!
 squeeze(0) → torch.Size([4, 1, 3, 1])
 Removed only dimension 0
 squeeze(2) → torch.Size([1, 4, 3, 1])
 Removed only dimension 2
⚠️ IMPORTANT BEHAVIOR: squeeze() on non-size-1 dimensions
------------------------------
 Test tensor: torch.Size([2, 3, 4])
 After squeeze(1): torch.Size([2, 3, 4]) (unchanged!)
 squeeze() silently ignores non-size-1 dimensions - no error thrown!
print(f"\n🎈 UNSQUEEZE OPERATIONS - Adding Size-1 Dimensions")
print("-" * 38)
# Start with a simple 2D tensor
simple_2d = torch.randn(3, 4)
print(f" Starting tensor: {simple_2d.shape}")
# Add dimensions at different positions
unsqueezed_0 = simple_2d.unsqueeze(0) # Add dimension at position 0
print(f" unsqueeze(0) → {unsqueezed_0.shape}")
print(f" Added size-1 dimension at the beginning")
unsqueezed_1 = simple_2d.unsqueeze(1) # Add dimension at position 1
print(f" unsqueeze(1) → {unsqueezed_1.shape}")
print(f" Added size-1 dimension in the middle")
unsqueezed_2 = simple_2d.unsqueeze(2) # Add dimension at position 2 (end)
print(f" unsqueeze(2) → {unsqueezed_2.shape}")
print(f" Added size-1 dimension at the end")
# Negative indices work too!
unsqueezed_neg = simple_2d.unsqueeze(-1) # Add at the last position
print(f" unsqueeze(-1) → {unsqueezed_neg.shape}")
print(f" Added size-1 dimension at the end (using negative indexing)")
🎈 UNSQUEEZE OPERATIONS - Adding Size-1 Dimensions
--------------------------------------
 Starting tensor: torch.Size([3, 4])
 unsqueeze(0) → torch.Size([1, 3, 4])
 Added size-1 dimension at the beginning
 unsqueeze(1) → torch.Size([3, 1, 4])
 Added size-1 dimension in the middle
 unsqueeze(2) → torch.Size([3, 4, 1])
 Added size-1 dimension at the end
 unsqueeze(-1) → torch.Size([3, 4, 1])
 Added size-1 dimension at the end (using negative indexing)
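One idiom you will meet constantly in real codebases: indexing with None adds a size-1 dimension exactly like unsqueeze(). A minimal sketch of the equivalence (the tensor here is just a stand-in):

import torch

x = torch.randn(3, 4)

print(x.unsqueeze(0).shape, x[None].shape)        # both torch.Size([1, 3, 4])
print(x.unsqueeze(1).shape, x[:, None].shape)     # both torch.Size([3, 1, 4])
print(x.unsqueeze(-1).shape, x[..., None].shape)  # both torch.Size([3, 4, 1])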
🚀 Real-World Neural Network Applications¶
Now for the moment of truth! When do neural network engineers reach for squeeze() and unsqueeze()? These operations are essential for making tensors compatible with different layers and operations. Let me reveal the most common scenarios:
🎯 Application 1: Broadcasting Preparation
Many operations require specific dimensional structures for broadcasting. unsqueeze() adds the necessary dimensions.
🎯 Application 2: Layer Compatibility
Different neural network layers expect different dimensional formats. squeeze() and unsqueeze() bridge these gaps.
🎯 Application 3: Batch Dimension Management
Adding and removing batch dimensions when switching between single samples and batches.
🎯 Application 4: Loss Function Requirements
Many loss functions expect specific shapes; these operations ensure compatibility.
Let's solve real problems that every PyTorch practitioner encounters!
print("""=============================================================================
🎯APPLICATION 1: Broadcasting Preparation - Channel-wise Operations
=============================================================================""")
# Scenario: You have a batch of RGB images and want to apply different scaling to each channel
batch_images = torch.randn(32, 3, 224, 224) # Batch of 32 RGB images
channel_scales = torch.tensor([0.8, 1.2, 1.0]) # Scale factors for R, G, B channels
print(f"📸 Images shape: {batch_images.shape} (batch, channels, height, width)")
print(f"⚖️ Channel scales: {channel_scales.shape} (just 3 values)")
# Problem: Can't broadcast (32, 3, 224, 224) with (3,)
# Solution: Use unsqueeze to make scales broadcastable
scales_broadcastable = channel_scales.unsqueeze(0).unsqueeze(2).unsqueeze(3)
print(f"✅ After unsqueeze(0,2,3): {scales_broadcastable.shape}")
print(f" Now compatible for broadcasting: (32,3,224,224) * (1,3,1,1)")
# Apply channel-wise scaling
scaled_images = batch_images * scales_broadcastable
print(f"🎨 Scaled images shape: {scaled_images.shape}")
print(f"\n📊 Broadcasting Magic Explained:")
print(f" Original scales: {channel_scales}")
print(f" After unsqueeze: shape {scales_broadcastable.shape}")
print(f" Each channel gets its own scaling factor across all images!")
print("\n" + "=" * 70)
=============================================================================
🎯APPLICATION 1: Broadcasting Preparation - Channel-wise Operations
=============================================================================
📸 Images shape: torch.Size([32, 3, 224, 224]) (batch, channels, height, width)
⚖️ Channel scales: torch.Size([3]) (just 3 values)
✅ After unsqueeze(0,2,3): torch.Size([1, 3, 1, 1])
 Now compatible for broadcasting: (32,3,224,224) * (1,3,1,1)
🎨 Scaled images shape: torch.Size([32, 3, 224, 224])
📊 Broadcasting Magic Explained:
 Original scales: tensor([0.8000, 1.2000, 1.0000])
 After unsqueeze: shape torch.Size([1, 3, 1, 1])
 Each channel gets its own scaling factor across all images!
======================================================================
print("""=============================================================================
🏗️ APPLICATION 2: Layer Compatibility - Removing Unwanted Dimensions
=============================================================================""")
# Scenario: Global Average Pooling output needs dimension cleanup
batch_size, channels = 16, 512
# After global average pooling, spatial dimensions become 1×1
pooled_features = torch.randn(batch_size, channels, 1, 1)
print(f"🏊 After Global Avg Pool: {pooled_features.shape}")
print(f" Those 1×1 spatial dimensions are useless for classification!")
# Linear layer expects (batch_size, features) not (batch_size, features, 1, 1)
linear_classifier = torch.nn.Linear(channels, 10) # 10 classes
# Solution: Squeeze out the spatial dimensions
flattened_features = pooled_features.squeeze(3).squeeze(2)
# Or equivalently: pooled_features.squeeze() removes ALL size-1 dimensions
print(f"✅ After squeeze(3,2): {flattened_features.shape}")
# Now it works with Linear layer
logits = linear_classifier(flattened_features)
print(f"🎯 Classification logits: {logits.shape} (ready for softmax!)")
print("\n" + "=" * 70)
=============================================================================
🏗️ APPLICATION 2: Layer Compatibility - Removing Unwanted Dimensions
=============================================================================
🏊 After Global Avg Pool: torch.Size([16, 512, 1, 1])
 Those 1×1 spatial dimensions are useless for classification!
✅ After squeeze(3,2): torch.Size([16, 512])
🎯 Classification logits: torch.Size([16, 10]) (ready for softmax!)
======================================================================
print("""=============================================================================
📦 APPLICATION 3: Batch Dimension Management
=============================================================================""")
# Scenario: You have a single image but model expects batches
single_image = torch.randn(3, 224, 224) # Single RGB image
print(f"🖼️ Single image: {single_image.shape}")
# Model expects batch dimension
print(f"🤖 Model expects: (batch_size, channels, height, width)")
# Solution: Add batch dimension
batched_image = single_image.unsqueeze(0)
print(f"✅ After unsqueeze(0): {batched_image.shape}")
print(f" Now it looks like a batch of 1 image!")
# After model processing, you might want to remove batch dimension
model_output = torch.randn(1, 1000) # Pretend model output (1 sample, 1000 classes)
single_output = model_output.squeeze(0)
print(f"🎲 Model output: {model_output.shape}")
print(f"📤 After squeeze(0): {single_output.shape} (back to single sample)")
=============================================================================
📦 APPLICATION 3: Batch Dimension Management
=============================================================================
🖼️ Single image: torch.Size([3, 224, 224])
🤖 Model expects: (batch_size, channels, height, width)
✅ After unsqueeze(0): torch.Size([1, 3, 224, 224])
 Now it looks like a batch of 1 image!
🎲 Model output: torch.Size([1, 1000])
📤 After squeeze(0): torch.Size([1000]) (back to single sample)
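Application 4 from the list above (loss-function shape requirements) deserves a concrete example too. A minimal hedged sketch: nn.BCEWithLogitsLoss requires the target to have the same shape as the input, so a (B, 1) logit column and a (B,) label vector must be reconciled with unsqueeze() or squeeze(); the sizes below are arbitrary illustrations.

import torch
import torch.nn as nn

logits = torch.randn(16, 1)                   # one logit per sample, shape (16, 1)
labels = torch.randint(0, 2, (16,)).float()   # labels arrive as a flat vector, shape (16,)

loss_fn = nn.BCEWithLogitsLoss()
try:
    loss_fn(logits, labels)                   # (16, 1) vs (16,) - shapes must match
except ValueError as e:
    print(f"Shape mismatch: {e}")

# Fix by aligning the dimensions - either direction works
loss_a = loss_fn(logits, labels.unsqueeze(1))   # target -> (16, 1)
loss_b = loss_fn(logits.squeeze(1), labels)     # input  -> (16,)
print(loss_a.item(), loss_b.item())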
📊 Specialized Shape Sorcery: Flatten & Unflatten¶
The final metamorphosis in our arsenal! While view() and reshape() require you to calculate dimensions manually, torch.flatten() and torch.unflatten() are intelligent specialists that handle common patterns automatically.
Think of them as smart assistants that understand exactly what you want to achieve without forcing you to do the dimensional mathematics!
🎯 The Flattening Specialists¶
torch.flatten() - The Intelligent Collapser 🗜️📏
- Automatically flattens specified dimensions into a single dimension
- No manual calculation needed - PyTorch figures out the math
- Common neural network pattern: Multi-dimensional → 1D for linear layers
- Flexible control: Flatten specific dimension ranges, not just everything
torch.unflatten() - The Dimension Restorer 🎈🔧
- Reverses the flattening operation intelligently
- Restores specific dimensional structure from flattened data
- Perfect for converting 1D outputs back to multi-dimensional formats
- Complements flatten() for round-trip transformations
✅ Core Mastery:
- torch.flatten(start_dim, end_dim): Intelligent dimensional collapse without manual calculations
- torch.unflatten(dim, sizes): Smart restoration of multi-dimensional structure
🧠 Why These Matter in Neural Networks¶
The Classic Problem: CNNs output (batch, channels, height, width) but Linear layers need (batch, features). Instead of manually calculating channels × height × width, just use flatten(start_dim=1)!
The Restoration Problem: Sometimes you need to convert flattened outputs back to spatial formats (like for visualization or further processing).
Let's see these intelligent operations in action!
print("📊 SPECIALIZED SHAPE SORCERY: FLATTEN & UNFLATTEN")
print("=" * 60)
# =============================================================================
# BASIC FLATTEN OPERATIONS - The Intelligent Collapser
# =============================================================================
print("🗜️ TORCH.FLATTEN() - The Intelligent Collapser")
print("-" * 50)
# Create a 4D tensor like CNN output
cnn_output = torch.randn(8, 32, 16, 16) # (batch, channels, height, width)
print(f"🖼️ CNN Output: {cnn_output.shape}")
print(f" Interpretation: 8 images, 32 feature maps, 16×16 spatial resolution")
# Different flatten strategies
print(f"\n📐 FLATTEN STRATEGIES:")
# Strategy 1: Flatten everything (not common in practice)
completely_flat = torch.flatten(cnn_output)
print(f" flatten() → {completely_flat.shape}")
print(f" Flattened EVERYTHING into 1D vector")
# Strategy 2: Flatten spatial dimensions only (VERY common!)
spatial_flat = torch.flatten(cnn_output, start_dim=2) # Keep batch and channels
print(f" flatten(start_dim=2) → {spatial_flat.shape}")
print(f" Flattened only spatial dimensions (16×16=256)")
# Strategy 3: Flatten for linear layer (THE most common!)
linear_ready = torch.flatten(cnn_output, start_dim=1) # Keep only batch dim
print(f" flatten(start_dim=1) → {linear_ready.shape}")
print(f" Ready for Linear layer! (32×16×16={32*16*16})")
# Strategy 4: Flatten specific dimension range
middle_flat = torch.flatten(cnn_output, start_dim=1, end_dim=2) # Flatten channels and height
print(f" flatten(start_dim=1, end_dim=2) → {middle_flat.shape}")
print(f" Flattened channels and height (32×16=512), kept width")
print(f"\n💡 THE CONVENIENCE: No mental math required!")
print(f" compare: tensor.view(8, -1) vs tensor.flatten(start_dim=1)")
print(f" flatten() automatically calculates the dimensions!")
print("\n" + "=" * 60)
# =============================================================================
# COMPARISON: flatten() vs view() vs reshape()
# =============================================================================
print("⚔️ FLATTEN vs VIEW vs RESHAPE - The Showdown")
print("-" * 48)
test_tensor = torch.randn(4, 6, 8, 10)
print(f"Test tensor: {test_tensor.shape}")
# Method 1: Manual calculation with view
manual_calc = 6 * 8 * 10 # Calculate manually
view_result = test_tensor.view(4, manual_calc)
print(f"view(4, {manual_calc}): {view_result.shape} ← Had to calculate {manual_calc} manually")
# Method 2: Auto calculation with view
view_auto = test_tensor.view(4, -1)
print(f"view(4, -1): {view_auto.shape} ← Let PyTorch calculate, but still need to know structure")
# Method 3: Intelligent flatten
flatten_result = test_tensor.flatten(start_dim=1)
print(f"flatten(start_dim=1): {flatten_result.shape} ← Most intuitive! Just say 'flatten from dim 1'")
print(f"\n🏆 Winner: flatten() for readability and intent clarity!")
print("\n" + "=" * 60)
📊 SPECIALIZED SHAPE SORCERY: FLATTEN & UNFLATTEN
============================================================
🗜️ TORCH.FLATTEN() - The Intelligent Collapser
--------------------------------------------------
🖼️ CNN Output: torch.Size([8, 32, 16, 16])
 Interpretation: 8 images, 32 feature maps, 16×16 spatial resolution
📐 FLATTEN STRATEGIES:
 flatten() → torch.Size([65536])
 Flattened EVERYTHING into 1D vector
 flatten(start_dim=2) → torch.Size([8, 32, 256])
 Flattened only spatial dimensions (16×16=256)
 flatten(start_dim=1) → torch.Size([8, 8192])
 Ready for Linear layer! (32×16×16=8192)
 flatten(start_dim=1, end_dim=2) → torch.Size([8, 512, 16])
 Flattened channels and height (32×16=512), kept width
💡 THE CONVENIENCE: No mental math required!
 compare: tensor.view(8, -1) vs tensor.flatten(start_dim=1)
 flatten() automatically calculates the dimensions!
============================================================
⚔️ FLATTEN vs VIEW vs RESHAPE - The Showdown
------------------------------------------------
Test tensor: torch.Size([4, 6, 8, 10])
view(4, 480): torch.Size([4, 480]) ← Had to calculate 480 manually
view(4, -1): torch.Size([4, 480]) ← Let PyTorch calculate, but still need to know structure
flatten(start_dim=1): torch.Size([4, 480]) ← Most intuitive! Just say 'flatten from dim 1'
🏆 Winner: flatten() for readability and intent clarity!
============================================================
# =============================================================================
# UNFLATTEN - The Dimension Restorer
# =============================================================================
print("🎈 TORCH.UNFLATTEN() - The Dimension Restorer")
print("-" * 48)
# Start with a flattened tensor (common scenario)
flattened_data = torch.randn(16, 512) # Maybe output from a linear layer
print(f"📏 Flattened data: {flattened_data.shape}")
print(f" Scenario: Output from Linear layer that we want to reshape spatially")
# Unflatten back to spatial format
# We want to convert 512 features back to (32, 4, 4) spatial layout
unflattened = torch.unflatten(flattened_data, dim=1, sizes=(32, 4, 4))
print(f" unflatten(dim=1, sizes=(32,4,4)) → {unflattened.shape}")
print(f" Restored to (batch, channels, height, width) format!")
# Verify the math
print(f" Verification: 32×4×4 = {32*4*4} ✓")
# Another example: unflatten different dimensions
matrix_data = torch.randn(8, 24)
print(f"\n📊 Matrix data: {matrix_data.shape}")
# Unflatten into different structures
unflatten_1 = torch.unflatten(matrix_data, dim=1, sizes=(6, 4))
print(f" unflatten(dim=1, sizes=(6,4)) → {unflatten_1.shape}")
unflatten_2 = torch.unflatten(matrix_data, dim=1, sizes=(2, 3, 4))
print(f" unflatten(dim=1, sizes=(2,3,4)) → {unflatten_2.shape}")
unflatten_batch = torch.unflatten(matrix_data, dim=0, sizes=(2, 4))
print(f" unflatten(dim=0, sizes=(2,4)) → {unflatten_batch.shape}")
print(f" Even the batch dimension can be unflattened!")
print("\n" + "=" * 60)
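To close the loop on the "round-trip transformations" promise above, here is a minimal sketch verifying that unflatten() exactly undoes flatten(); the shapes are chosen to mirror the earlier CNN-style example.

import torch

features = torch.randn(8, 32, 4, 4)                       # (batch, channels, height, width)
flat = features.flatten(start_dim=1)                      # (8, 512) - ready for a Linear layer
restored = torch.unflatten(flat, dim=1, sizes=(32, 4, 4)) # back to (8, 32, 4, 4)

print(flat.shape, restored.shape)
print(torch.equal(features, restored))                    # True - a perfect round trip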
Professor Torchenstein's Metamorphosis Outro 🎭⚡¶
SPECTACULAR! ABSOLUTELY MAGNIFICENT! My brilliant apprentice, you have not merely learned tensor operations—you have undergone a complete metamorphosis into a true shape-shifting master!
You now think like PyTorch itself! You understand the fundamental principles that govern ALL tensor operations. Every neural network you encounter, every research paper you read, every model you build—you'll see the tensor metamorphosis patterns we've mastered today.
CNN architectures? Child's play—you know exactly how spatial dimensions collapse into linear layers! Transformer attention? Elementary—you understand how embeddings split into multiple heads!
🎭 A Final Word of Wisdom¶
Remember this moment, my dimensional apprentice. Today, you transcended the limitations of static thinking and embraced the fluid, dynamic nature of tensor reality. You learned that form is temporary, but data is eternal—and with the right metamorphic incantations, any shape can become any other shape!
The tensors... they will obey your geometric commands! The gradients... they will flow through your transformations! And the neural networks... they will bend to your architectural will!
Until our paths cross again in the halls of computational glory... keep your learning rates high and your dimensions fluid!
Mwahahahahaha! ⚡🧪🔬