Converting Python NumPy Arrays to PyTorch Tensors - Be on the Right Side of Change

💡 Problem Formulation: Data scientists and ML engineers often switch between NumPy arrays and PyTorch tensors during their workflow. For instance, suppose we have a NumPy array holding image pixel values or sensor data and want to convert it to a PyTorch tensor for training a deep learning model. This article explains several ways to perform this conversion, taking a NumPy array as input and producing a PyTorch tensor as output.

Method 1: Using `torch.from_numpy()`

The torch.from_numpy() function is a straightforward way to create a tensor from a NumPy array without copying the data. It creates a tensor that shares the same memory with the NumPy array. However, keep in mind that changes to the tensor will also affect the original NumPy array and vice versa.

Here’s an example:

import numpy as np
import torch

numpy_array = np.array([1, 2, 3])
torch_tensor = torch.from_numpy(numpy_array)
print(torch_tensor)

Output:

tensor([1, 2, 3], dtype=torch.int32)

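This code snippet demonstrates the conversion of a simple NumPy array to a PyTorch tensor. The resulting tensor shares the same memory as numpy_array, so no data is copied. The tensor also inherits the array's dtype: the dtype=torch.int32 suffix appears because this output was produced where NumPy's default integer type is 32-bit (for example, Windows with NumPy 1.x). Where the default integer is 64-bit, the tensor is int64 and prints as tensor([1, 2, 3]).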
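To make the shared-memory behavior concrete, here is a minimal sketch that writes through each object and reads the result through the other. The explicit dtype=np.int64 is an addition for this illustration so the result does not depend on the platform.

```python
import numpy as np
import torch

# Explicit dtype (an assumption for this sketch) keeps the result platform-independent
numpy_array = np.array([1, 2, 3], dtype=np.int64)
torch_tensor = torch.from_numpy(numpy_array)

# A write through the array is visible in the tensor...
numpy_array[0] = 100
# ...and a write through the tensor is visible in the array
torch_tensor[1] = 200

print(torch_tensor.tolist())  # [100, 200, 3]
print(numpy_array.tolist())   # [100, 200, 3]
```

Both objects report the same values because they are two views of one buffer.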

Method 2: Using `torch.tensor()` Constructor

The torch.tensor() constructor can also create a PyTorch tensor from a NumPy array. It always copies the array's data, so later changes to the tensor do not affect the original array, and changes to the array do not affect the tensor.

Here’s an example:

import numpy as np
import torch

numpy_array = np.array([4, 5, 6])
torch_tensor = torch.tensor(numpy_array)
print(torch_tensor)

Output:

tensor([4, 5, 6])

The code snippet creates a new PyTorch tensor from the NumPy array, completely independent of the original array, which may be more suitable in scenarios where the data needs to remain unchanged across both structures.
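The independence described above can be checked with a short sketch. It also shows the optional dtype argument of torch.tensor(), which converts the data while copying; the explicit dtypes are choices made for this illustration.

```python
import numpy as np
import torch

numpy_array = np.array([4, 5, 6], dtype=np.int64)
torch_tensor = torch.tensor(numpy_array)  # copies the data

# Changing the array afterwards leaves the tensor untouched
numpy_array[0] = -1
print(torch_tensor.tolist())  # [4, 5, 6]

# The dtype argument converts during the copy
float_tensor = torch.tensor(numpy_array, dtype=torch.float32)
print(float_tensor.dtype)  # torch.float32
```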

Method 3: Using `Tensor.clone()` and `torch.from_numpy()`

If you want to create a PyTorch tensor from a NumPy array and ensure no shared memory, you can first create a tensor using torch.from_numpy() and then clone it using the Tensor.clone() method.

Here’s an example:

import numpy as np
import torch

numpy_array = np.array([7, 8, 9])
torch_tensor = torch.from_numpy(numpy_array).clone()
print(torch_tensor)

Output:

tensor([7, 8, 9])

The code initially creates a tensor sharing memory with the NumPy array and then creates a cloned instance of this tensor. This cloned tensor does not share memory with the initial NumPy array, allowing for safe manipulation of tensor values without affecting the original data.
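As a rough check of this claim, the sketch below modifies the clone and confirms that the array is unchanged. Comparing data_ptr() values is one way, chosen here for illustration, to see that the clone lives in its own buffer.

```python
import numpy as np
import torch

numpy_array = np.array([7, 8, 9], dtype=np.int64)
shared = torch.from_numpy(numpy_array)  # shares the array's buffer
cloned = shared.clone()                 # copies into a new buffer

# Modifying the clone does not touch the array or the shared tensor
cloned[0] = 0
print(numpy_array.tolist())  # [7, 8, 9]
print(shared.tolist())       # [7, 8, 9]

# Different starting addresses confirm separate storage
print(shared.data_ptr() == cloned.data_ptr())  # False
```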

Method 4: Using `torch.Tensor.numpy()` in Reverse

Sometimes you start with a PyTorch tensor, convert it to a NumPy array with Tensor.numpy(), modify the data, and then need a tensor again. For CPU tensors, Tensor.numpy() returns an array that shares memory with the tensor, so in-place changes to the array are visible in the tensor.

Here’s an example:

import torch

# Let's assume tensor_a is a pre-existing PyTorch tensor
tensor_a = torch.tensor([10.0, 11.0, 12.0])

# Convert the tensor to NumPy, modify the array in place, and wrap it as a tensor again
numpy_array = tensor_a.numpy()
numpy_array += 1
tensor_b = torch.from_numpy(numpy_array)

print(tensor_b)

Output:

tensor([11., 12., 13.])

In this snippet, we start from an existing PyTorch tensor, obtain a NumPy view of it with Tensor.numpy(), and update that array in place. Because the array shares memory with tensor_a, tensor_a now holds the updated values as well. torch.from_numpy() simply wraps the same buffer again as tensor_b, so no data is copied at any step.
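The following sketch illustrates the link between the tensor and its NumPy view, plus one common caveat: a tensor that requires gradients has to be detached before Tensor.numpy() can be called. The variable names weights and weights_array are made up for this example.

```python
import torch

tensor_a = torch.tensor([10.0, 11.0, 12.0])
numpy_array = tensor_a.numpy()  # shares memory with tensor_a (CPU tensor)

# The in-place update on the array shows up in the original tensor
numpy_array += 1
print(tensor_a.tolist())  # [11.0, 12.0, 13.0]

# A tensor that tracks gradients must be detached first
weights = torch.tensor([1.0, 2.0], requires_grad=True)
weights_array = weights.detach().numpy()
print(weights_array.tolist())  # [1.0, 2.0]
```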

Bonus One-Liner Method 5: As Type Casting

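For a quick conversion that also casts the data to PyTorch's default floating-point type, pass the array to the legacy torch.Tensor() constructor in a single line. The result is torch.float32 unless the default tensor type has been changed. Do not rely on this for memory isolation: when the array is already float32, as in the example below, the constructor may reuse the array's buffer instead of copying it. Use torch.tensor() when an independent copy is required.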

Here’s an example:

import numpy as np
import torch

numpy_array = np.array([13, 14, 15], dtype=np.float32)
torch_tensor = torch.Tensor(numpy_array)

print(torch_tensor)

Output:

tensor([13., 14., 15.])

In this one-liner, torch.Tensor() builds a tensor of PyTorch's default data type, torch.float32, from the NumPy array. Here the array is already float32, so no cast is needed; an integer array would be converted to float32.
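To show how the resulting dtype differs between approaches, here is a small comparison on an integer array. It assumes PyTorch's default tensor type has not been changed from float32; the variable names are illustrative.

```python
import numpy as np
import torch

int_array = np.array([13, 14, 15], dtype=np.int64)

kept = torch.from_numpy(int_array)                       # keeps the array's dtype
cast = torch.Tensor(int_array)                           # casts to the default float type
explicit = torch.tensor(int_array, dtype=torch.float32)  # cast stated explicitly

print(kept.dtype)      # torch.int64
print(cast.dtype)      # torch.float32
print(explicit.dtype)  # torch.float32
```

Spelling out the dtype with torch.tensor(..., dtype=...) makes the cast visible to readers of the code, which is one reason to prefer it over the implicit cast.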

Summary/Discussion

Method 1: torch.from_numpy(). Simple and efficient. However, because it shares memory with the original array, this method may not be suitable when the elements of the NumPy array or the tensor need to be independently modified.
Method 2: torch.tensor() Constructor. Safe and self-contained; creates a copy of the array’s data. It is a bit less memory-efficient but ideal when data isolation between array and tensor is required.
Method 3: Tensor.clone() Combined with torch.from_numpy(). from_numpy() wraps the array without copying, and clone() then makes the single copy that isolates the tensor. The end result is as isolated as with Method 2, and the pattern suits cases where you already hold a shared tensor and later need an independent one.
Method 4: Using torch.Tensor.numpy() in Reverse. An intuitive approach when you’re frequently switching between tensor and array representations. It’s highly memory efficient but may lead to unintentional data changes if not used carefully.
Method 5: As Type Casting. Quick and convenient for one-off conversions to float32. It produces PyTorch's default floating-point type, so integer data is silently converted, and it does not guarantee a copy when the array is already float32. Prefer torch.tensor() when isolation matters.
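When it is unclear whether a given conversion shares memory, comparing buffer addresses is a simple check. The helper below, shares_memory, is a hypothetical name introduced for this sketch; it only compares the starting addresses, which is enough for the whole-array conversions discussed here.

```python
import numpy as np
import torch

def shares_memory(array, tensor):
    """Hypothetical helper: True if both objects start at the same address."""
    return array.ctypes.data == tensor.data_ptr()

arr = np.array([1.0, 2.0, 3.0], dtype=np.float32)

print(shares_memory(arr, torch.from_numpy(arr)))          # True  (Method 1)
print(shares_memory(arr, torch.tensor(arr)))              # False (Method 2)
print(shares_memory(arr, torch.from_numpy(arr).clone()))  # False (Method 3)
```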
