Clarifying the DataLoaders object

Vishal Bakshi
August 26, 2020

While going over my last blog post, I realized that I still felt a bit uncertain about the contents of the dls object that gets sent to the learner. I decided to break down the structure of dls a little more thoroughly by printing out some of its characteristics when the 100 CAMVID images are grouped in batches of 4.

In [76]:
import fastai
In [77]:
from fastai.vision.all import *
In [78]:
# Create the dls object from 100 images
path = untar_data(URLs.CAMVID_TINY)
dls = SegmentationDataLoaders.from_label_func(
        path, bs=4, fnames = get_image_files(path/"images"),
        label_func = lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
        codes = np.loadtxt(path/'codes.txt', dtype=str)
    )
In [81]:
# BATCH SIZE = 4
number_of_train_images = 0
number_of_valid_images = 0
number_of_dataloaders = 0
number_of_train_tensors = 0
number_of_valid_tensors = 0
valid_image_tensor_size = 0
train_image_tensor_size = 0
for dl in dls:
    number_of_dataloaders += 1
    # Each batch is a tuple: (image tensor, mask tensor)
    for tensor_tuples in dl:
        for tensor in tensor_tuples:
            # Count only the image tensors, not the segmentation masks
            if isinstance(tensor, fastai.torch_core.TensorImage):
                # size(0) is the batch dimension, i.e. the images per tensor.
                # len(tensor.size()) would give the number of dimensions instead,
                # which only happens to equal 4 here as well.
                if dl is dls.train:
                    number_of_train_tensors += 1
                    train_image_tensor_size = tensor.size(0)
                if dl is dls.valid:
                    number_of_valid_tensors += 1
                    valid_image_tensor_size = tensor.size(0)
                for image in tensor:
                    if dl is dls.train: number_of_train_images += 1
                    if dl is dls.valid: number_of_valid_images += 1

print('Number of DataLoaders:', number_of_dataloaders)
print('Number of Training Images:', number_of_train_images)
print('Number of Validation Images:', number_of_valid_images)
print('Number of Training Tensors (with', train_image_tensor_size, 'images each):', number_of_train_tensors)
print('Number of Validation Tensors (with', valid_image_tensor_size, 'images each):', number_of_valid_tensors)
Number of DataLoaders: 2
Number of Training Images: 80
Number of Validation Images: 20
Number of Training Tensors (with 4 images each): 20
Number of Validation Tensors (with 4 images each): 5

In summary, SegmentationDataLoaders.from_label_func() returns a dls object that wraps two DataLoaders (train and valid). The training DataLoader (dls[0]) contains 80 images (80% of the full set of 100 images) split into 20 groups of 4 images each (since bs=4). The validation DataLoader (dls[1]) contains 20 images (20% of the full set) split into 5 groups of 4 images each.
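As a quick sanity check on those numbers, here is a minimal sketch of the arithmetic in plain Python, with no fastai required. The helper expected_batches is hypothetical (it is not part of fastai), and it rests on two assumptions that I believe match the defaults used above: a 20% validation split (valid_pct=0.2), and a training DataLoader that drops an incomplete final batch while the validation DataLoader keeps it.

```python
def expected_batches(n_items, valid_pct=0.2, bs=4, drop_last_train=True):
    """Hypothetical helper: predict split sizes and batch counts.

    Returns (n_train, n_valid, train_batches, valid_batches).
    """
    # Assumption: the validation set gets int(valid_pct * n_items) items
    n_valid = int(valid_pct * n_items)
    n_train = n_items - n_valid

    # Assumption: the training loader drops a final batch smaller than bs
    if drop_last_train:
        train_batches = n_train // bs
    else:
        train_batches = -(-n_train // bs)  # ceiling division

    # Assumption: the validation loader keeps a smaller final batch
    valid_batches = -(-n_valid // bs)

    return n_train, n_valid, train_batches, valid_batches


# 100 CAMVID_TINY images in batches of 4
print(expected_batches(100))  # (80, 20, 20, 5)
```

With 100 images and bs=4 both splits divide evenly, so the drop-last assumption makes no difference here. It would start to matter with a batch size that does not divide 80 or 20.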