Intro1 - Homework

Vishal Bakshi
Tuesday, August 25, 2020

Batch Size

I changed the batch size for one of the models we trained in the intro notebook of this course and noticed that the valid_loss value changed significantly.

I'll re-run the models at each batch size to show you what I mean:

In [5]:
from fastai.vision.all import *
path = untar_data(URLs.PETS)/'images'
def is_cat(x): return x[0].isupper()
log = {'pets': {
    2 : 0,
    4 : 0,
    8 : 0,
    16: 0
}, 'segmentation': {
    2 : 0,
    4 : 0,
    8 : 0,
    16: 0
}}

for i in range(4):
    batch_size = 2**(i+1)  # 2, 4, 8, 16
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), bs=batch_size, valid_pct=0.2, seed=42,
        label_func=is_cat, item_tfms=Resize(224))
    learn = cnn_learner(dls, resnet34, metrics=error_rate)
    learn.fine_tune(1)
    log['pets'][batch_size] = learn.final_record[1]  # final_record is [train_loss, valid_loss, *metrics]
Batch size 2, frozen epoch:
epoch train_loss valid_loss error_rate time
0 0.671831 0.144316 0.050744 01:51
Batch size 2, fine-tune epoch:
epoch train_loss valid_loss error_rate time
0 0.646514 0.352912 0.138024 02:21

Batch size 4, frozen epoch:
epoch train_loss valid_loss error_rate time
0 0.418658 0.077647 0.024357 01:00
Batch size 4, fine-tune epoch:
epoch train_loss valid_loss error_rate time
0 0.179817 0.050372 0.011502 01:13

Batch size 8, frozen epoch:
epoch train_loss valid_loss error_rate time
0 0.246792 0.049255 0.017591 00:32
Batch size 8, fine-tune epoch:
epoch train_loss valid_loss error_rate time
0 0.125174 0.023476 0.006089 00:43

Batch size 16, frozen epoch:
epoch train_loss valid_loss error_rate time
0 0.137739 0.052795 0.014208 00:24
Batch size 16, fine-tune epoch:
epoch train_loss valid_loss error_rate time
0 0.057923 0.013332 0.003383 00:33
In [6]:
path = untar_data(URLs.CAMVID_TINY)

for i in range(4):
    batch_size = 2**(i+1)
    dls = SegmentationDataLoaders.from_label_func(
        path, bs=batch_size, fnames = get_image_files(path/"images"),
        label_func = lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
        codes = np.loadtxt(path/'codes.txt', dtype=str)
    )
    learn = unet_learner(dls, resnet34)
    learn.fine_tune(8)
    log['segmentation'][batch_size] = learn.final_record[1]  # index 1 = valid_loss
Batch size 2, frozen epoch:
epoch train_loss valid_loss time
0 1.753514 1.250025 00:04
Batch size 2, fine-tune epochs:
epoch train_loss valid_loss time
0 1.133678 1.048806 00:03
1 1.102538 1.008577 00:03
2 0.981480 0.827040 00:03
3 0.845606 0.736060 00:03
4 0.742655 0.778525 00:03
5 0.644585 0.667181 00:03
6 0.571420 0.669122 00:03
7 0.525055 0.656813 00:03

Batch size 4, frozen epoch:
epoch train_loss valid_loss time
0 2.513621 2.418835 00:03
Batch size 4, fine-tune epochs:
epoch train_loss valid_loss time
0 1.627781 1.192425 00:02
1 1.382911 1.021066 00:02
2 1.208196 0.872863 00:02
3 1.083026 0.861396 00:02
4 0.955392 0.667616 00:02
5 0.841550 0.637717 00:02
6 0.751018 0.609362 00:02
7 0.684736 0.602068 00:02

Batch size 8, frozen epoch:
epoch train_loss valid_loss time
0 2.916275 2.496579 00:03
Batch size 8, fine-tune epochs:
epoch train_loss valid_loss time
0 2.078728 1.570321 00:01
1 1.735551 1.315718 00:01
2 1.524138 1.064513 00:01
3 1.349213 0.929341 00:01
4 1.203711 0.811217 00:01
5 1.077349 0.751397 00:01
6 0.973786 0.733066 00:01
7 0.896015 0.722129 00:01

Batch size 16, frozen epoch:
epoch train_loss valid_loss time
0 3.222250 2.311272 00:02
Batch size 16, fine-tune epochs:
epoch train_loss valid_loss time
0 2.168895 1.820541 00:01
1 1.927316 1.797157 00:01
2 1.764727 1.526917 00:01
3 1.617956 1.200572 00:01
4 1.495027 1.048440 00:01
5 1.390191 0.985106 00:01
6 1.297539 0.928951 00:01
7 1.223987 0.911800 00:01
In [8]:
temp_log = log.copy()  # note: .copy() is shallow, so the nested dicts are still shared

Pets Image Classification Learner

For the pets dataset, the final validation loss (what final_record[1] logs, and what's plotted below) decreases as the batch size increases; the error rate follows the same trend.

In [19]:
plt.plot(log['pets'].keys(), log['pets'].values())
[plot output: final valid_loss vs. batch size for the pets classifier, trending downward]

Segmentation Learner

For the segmentation learner, the lowest final validation loss (about 0.602) came from a batch size of 4 images.

In [21]:
plt.plot(log['segmentation'].keys(), log['segmentation'].values())
[plot output: final valid_loss vs. batch size for the segmentation learner, lowest at 4]
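
For reference, here's a combined, labeled version of the two plots (my own embellishment, not a cell from the original run; I wrap the dict views in list() to be safe across matplotlib versions):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for task in ('pets', 'segmentation'):
    ax.plot(list(log[task].keys()), list(log[task].values()), marker='o', label=task)
ax.set_xlabel('batch size')
ax.set_ylabel('final valid_loss')
ax.legend()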

The next day, I came across a very relevant fast.ai forum post that discussed several reasons why smaller batch sizes may result in a lower validation loss. That post led me to an exchange on the same question, which referenced the following excerpt from a paper on the topic:

In this paper, we present ample numerical evidence that supports the view that large-batch methods tend to converge to sharp minimizers of the training and testing functions -- and that sharp minima lead to poorer generalization.

What I took away from a very quick skim of that paper is that larger batch sizes are drawn to sharp minima and can't find their way out as readily as smaller batches can; one result is that the model does not generalize well. A sharp minimum means very few x-values correspond to a y-value close to the minimum, while a smooth or flat minimum means many x-values do. As your model is tested with new data, a flat minimum offers a wider breadth of inputs that can lead to the ideal output: a model trained to a sharp minimum is picky, while a model trained to a flat minimum is less picky. The paper ended with a series of questions for next steps, and my favorite one was:

(e) is it possible, through algorithmic or regulatory means to steer LB methods away from sharp minimizers?

And I'll have to go and find what's already out there on this!
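
Before I do, here's a toy picture of the sharp-versus-flat idea (my own construction, not from the paper): two loss curves, each with its minimum of 0 at x = 0. The same small input shift barely moves the loss in the flat basin but nearly maxes it out in the sharp one.

import numpy as np

def sharp(x): return 1 - np.exp(-50 * x**2)   # narrow, sharp basin
def flat(x):  return 1 - np.exp(-0.5 * x**2)  # wide, flat basin

eps = 0.3  # a small input shift, standing in for unseen test data
print(f'sharp: loss at 0 = {sharp(0):.3f}, at {eps} = {sharp(eps):.3f}')  # 0.000 vs ~0.989
print(f'flat:  loss at 0 = {flat(0):.3f}, at {eps} = {flat(eps):.3f}')    # 0.000 vs ~0.044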

Data Loaders and TensorImages

As I learn the fastai library piece by piece, I learn more about how to program a learner. I decided to figure out what was in dls, the return value of SegmentationDataLoaders.from_label_func().

It turns out that dls is an iterable that holds two DataLoaders: one for the training set, and one for the validation set:

In [24]:
dls[0] == dls.train
Out[24]:
True
In [25]:
dls[1] == dls.valid
Out[25]:
True

Iterating a DataLoader yields batches; each batch here is a tuple of (images, masks), and the images tensor holds batch_size images (recall that the final learner was trained with a batch size of 16).
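A quick sketch of one batch's shapes (my own snippet; the mask shape is my inference from the image size shown below):

xb, yb = next(iter(dls.train))
print(xb.shape)  # images: torch.Size([16, 3, 96, 128])
print(yb.shape)  # masks (expected): torch.Size([16, 96, 128])

Pulling a single image out of a batch to plot it: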

In [31]:
# grab the first batch from the training DataLoader
imgs = next(iter(dls.train))[0]
print('batch size: ', len(imgs))
img = imgs[15]  # the 16th (last) image in the batch
print('tensor:', img.size())
print('tensor.permute(1,2,0):', img.cpu().permute(1,2,0).size())
plt.imshow(img.cpu().permute(1,2,0))
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
batch size:  16
tensor: torch.Size([3, 96, 128])
tensor.permute(1,2,0): torch.Size([96, 128, 3])

The TensorImage has to be restructured a bit before it fits the shape plt.imshow expects for RGB data: (rows, columns, 3). Since these images are wider than they are tall, the rows (96 of them) come first and the columns (128 of them) second, keeping the landscape orientation. This is why the inputs to permute are 1, 2, 0. The original order is 3, 96, 128, and the corresponding indices are 0, 1, 2: the 0th dimension (the 3 channels) is sent to the end, while the 1st (96) and 2nd (128) keep their relative order, so 3, 96, 128 becomes 96, 128, 3. If 96 and 128 were swapped as well, every image would come out transposed into a vertical orientation:

In [33]:
# same batch inspection, but with rows and columns swapped in the permute
imgs = next(iter(dls.train))[0]
print('batch size: ', len(imgs))
img = imgs[15]
print('tensor:', img.size())
print('tensor.permute(2,1,0):', img.cpu().permute(2,1,0).size())
plt.imshow(img.cpu().permute(2,1,0))
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
batch size:  16
tensor: torch.Size([3, 96, 128])
tensor.permute(2,1,0): torch.Size([128, 96, 3])
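
The index shuffling is easy to see on a tiny made-up tensor (a standalone example of my own):

import torch

t = torch.arange(24).reshape(3, 2, 4)  # a pretend CHW image: 3 channels, 2 rows, 4 columns
print(t.shape)                   # torch.Size([3, 2, 4])
print(t.permute(1, 2, 0).shape)  # torch.Size([2, 4, 3]): channels sent to the end
print(t.permute(2, 1, 0).shape)  # torch.Size([4, 2, 3]): rows and columns also swapped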

Lastly, the image looks burnt, meaning the pixel values likely aren't normalized to the [0..1] range that plt.imshow expects for floats (that's also what the clipping warning above is about). Thanks to ptrblck's consistently helpful replies on the PyTorch forums, I was able to fix that by rescaling each image to [0, 1]:

In [35]:
# rescale the image to [0, 1] before plotting
imgs = next(iter(dls.train))[0]
print('batch size: ', len(imgs))
img = imgs[15]
img -= img.min()  # shift the smallest value to 0
img /= img.max()  # scale the largest value to 1
print('tensor:', img.size())
print('tensor.permute(1,2,0):', img.cpu().permute(1,2,0).size())
plt.imshow(img.cpu().permute(1,2,0))
batch size:  16
tensor: torch.Size([3, 96, 128])
tensor.permute(1,2,0): torch.Size([96, 128, 3])
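
Since I'll want to display batch images again, here's a small helper that bundles the rescale and permute steps (my own function, not part of fastai):

def to_displayable(img):
    "Min-max rescale a CHW float tensor to [0, 1] and reorder it to HWC for plt.imshow."
    img = img.cpu().clone()  # clone so the batch isn't mutated in place
    img -= img.min()
    img /= img.max()
    return img.permute(1, 2, 0)

# usage: plt.imshow(to_displayable(next(iter(dls.train))[0][15]))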