I've learned and clarified a few things about learning rates, though I'll have to validate them once I have a deeper understanding of these concepts. The first two came from this helpful fastai forum post:

  • Each time you run {learner_name}.lr_find() from the fastai library, it trains on a random set of batches of data.
  • lr_find() saves the model before it trains on the data, and then loads the model again once it's done. This is why you still have to unfreeze() the model afterwards before you can fit it again (if you want to fit all layers), as sketched below.
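Here's roughly what that flow looks like in code. This is only a sketch: it assumes the fastai v1 style API and an already-built Learner named learn (e.g. from cnn_learner), neither of which is shown here.

```python
# Sketch only: assumes fastai v1 and an existing Learner called `learn`.
learn.lr_find()        # trains briefly on random batches; weights are saved first and restored afterwards
learn.recorder.plot()  # inspect the learning rate vs. loss curve

learn.unfreeze()       # per the forum post, still needed before fitting all layers again
```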

The next set of observations came from trying different start and end learning rates (which you really only need to do for the actual fit). Here are screenshots of the learning rate vs. loss plots for different start_lr and end_lr values:

Learning Rate vs Loss for Learning Rates between 1e-20 and 1e-1
Learning Rate vs Loss for Learning Rates between 1e-30 and 1e-1
Learning Rate vs Loss for Learning Rates between 1e-100 and 1e-1
Learning Rate vs Loss for Learning Rates between 1e-200 and 1e-1
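In each of these runs the range just gets handed to lr_find() directly. A sketch for the first plot, with the same assumptions about learn and the v1 signature as above:

```python
# Sketch: sweep a deliberately low range, as in the 1e-20 plot above.
learn.lr_find(start_lr=1e-20, end_lr=1e-1)
learn.recorder.plot()
```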

At some point, a low enough start_lr causes a ZeroDivisionError:

Learning Rate vs Loss for Learning Rates between 1e-1000 and 1e-1
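My best guess at why: a literal like 1e-1000 is smaller than the smallest positive Python float (around 5e-324), so it underflows to 0.0, and lr_find()'s exponential schedule presumably divides by start_lr at some point. The snippet below only illustrates the underflow; it doesn't touch fastai:

```python
# Plain Python illustration of the underflow (no fastai involved).
start_lr, end_lr = 1e-1000, 1e-1
print(start_lr)            # 0.0 -- the literal underflows to zero
ratio = end_lr / start_lr  # ZeroDivisionError: float division by zero
```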

I then increased start_lr to 1e-3, and lr_find() basically didn't run:

Learning Rate vs Loss for Learning Rates between 1e-3 and 1e-1

Without any given start_lr or end_lr, lr_find() defaults to start_lr=1e-7 and end_lr=10.
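So, as far as I can tell, calling it with no arguments should be equivalent to this:

```python
# Equivalent to the default call in fastai v1, as far as I can tell.
learn.lr_find(start_lr=1e-7, end_lr=10)
learn.recorder.plot()
```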

Learning Rate vs Loss for Learning Rates with default values of start_lr=1e-7 and end_lr=10

After thinking about all of this for a while, I realized this: the real benefit of fixing a minimum and maximum learning rate comes when you are actually fitting the model, not when you are trying to find a good learning rate range to use.
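In code, that means handing the range to the fit itself rather than to lr_find(). A sketch, again assuming fastai v1's fit_one_cycle; the epoch count here is arbitrary, not the one from my actual run:

```python
# Sketch: apply the chosen range when actually fitting, not when searching for it.
learn.unfreeze()
learn.fit_one_cycle(4, max_lr=slice(1e-6, 1e-4))  # lower LR for early layers, higher for the head
```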

Model fit using learning rate between 1e-6 and 1e-4 resulting in an error rate of 6.22%