one major reason i am enjoying learning deep learning is that the concepts behind it open up endless analogies with my personal emotional and intellectual development.
one of the topics covered in Lesson 2 of the fastai course is the drivetrain approach. jeremy howard co-authored an informative and inspiring article about it, from which i took away the following two key points:
- optimization should be focused on the objective
- optimization leads to new insights beyond a prediction
in my experience, humans start optimizing from birth, and children seem to care only about optimization, which leads to the never-ending question: "why?". curiosity, then, is maybe an optimizing function.
think about the relationship between language, culture, and meaning specifically for languages with grammatical gender. in gujarati, the word for chair (khurashee) is feminine, and the word for countertop (otlo) is masculine. why? what is feminine about a chair? is that how it even works?
i've found that the gujarati khurashee is loaned from the arabic kursi which means "throne".
kursi comes from the root krs, which relates to compacting. other words derived from the same root mean to consolidate, to become compacted and cohering, and a parcel of paper.
a natural language processing model may not understand gender, but i imagine that it would probably recognize the features of words that are gendered, based on how that culture thought of gender when root words were formed and as they evolved.
an interesting quora response and blog post i came across along the way:
the model can only predict based on what is known. imagine if gujarati were modified by consensus to change the gender of chair to masculine, so that maybe it becomes khurashow. you can imagine how long it would take before a dataset picked at random from all existing gujarati literature showed equal usage of khurashee and khurashow. i imagine it takes a lot of time for changes in language to permeate through literature.
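to make that concrete for myself, here is a minimal sketch of the idea, not a real experiment. the two word lists and their counts are made up (khurashow is hypothetical to begin with), and `usage_share` is just a helper name i picked. each list entry stands for one use of the word for chair in a sampled text.

```python
from collections import Counter

# hypothetical toy data: one entry per use of the word for "chair".
# the counts are invented purely for illustration.
older_texts = ["khurashee"] * 100
recent_texts = ["khurashee"] * 60 + ["khurashow"] * 40


def usage_share(tokens):
    """return each word form's share of the given tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# sampling from everything ever written dilutes the newer form
share_all = usage_share(older_texts + recent_texts)

# sampling only from recent writing shows the change more clearly
share_recent = usage_share(recent_texts)

print(share_all)
print(share_recent)
```

with these made-up counts, the newer form is a smaller share of the combined sample than of the recent sample alone, which is the lag i am imagining: a model trained on the combined sample would mostly see khurashee.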
this is where an optimizer would be handy. suppose our objective is to generate text (predict the next word) based on gujarati literature throughout history. i imagine we would have some rule in our optimizer that favors khurashee. on the other hand, if our objective is to generate text based on modern gujarati, we would favor the hypothetical khurashow.
i think of the optimizer as a sort of bias we can add to the model. a model's accuracy measures how well its predictions for new data match the actual labels, given what it learned from training data. a model's optimizer, though, seems to give the modeler the power to steer the prediction in a fundamentally different direction than the one it was trained toward.
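here is a rough sketch of how i picture that bias, under some loud assumptions. the scores, the bias values, and the names `model_log_scores`, `objective_bias`, and `pick_word` are all hypothetical. this is also not how a training optimizer like SGD works. it is only my reading of an optimizer as a rule layered on top of a model's predictions to serve a chosen objective.

```python
import math

# hypothetical log-scores a trained model might give the next word,
# reflecting that it saw khurashee far more often in training.
model_log_scores = {
    "khurashee": math.log(0.8),
    "khurashow": math.log(0.2),
}

# hypothetical objective-specific adjustments added to the scores.
# "historical" leaves the model alone; "modern" boosts the newer form.
objective_bias = {
    "historical": {"khurashee": 0.0, "khurashow": 0.0},
    "modern": {"khurashee": 0.0, "khurashow": 2.0},
}


def pick_word(log_scores, bias):
    """add the objective's bias to each score and return the top word."""
    adjusted = {
        word: score + bias.get(word, 0.0)
        for word, score in log_scores.items()
    }
    return max(adjusted, key=adjusted.get)


historical_choice = pick_word(model_log_scores, objective_bias["historical"])
modern_choice = pick_word(model_log_scores, objective_bias["modern"])

print(historical_choice)
print(modern_choice)
```

the model's scores never change here. only the objective changes, and with it the word that gets picked, which is the kind of influence i mean.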
i bring this analogy back to my personal development. when i learn from others, i am essentially using their optimizers to reach a different conclusion (prediction) than what i would with my own logic model (architecture + parameters). i label what i perceive as `right`, `wrong`, `adaptive`, `maladaptive`, `true`, `false`, etc. in vulnerability, i present those predictions to others through conversation, as they do with me. we explain our thinking to each other, showing the other which parameters to set in our architecture to reach a certain prediction, and how to optimize for a certain objective. over time, we both perceive new data and continue this exchange, until each of our models has integrated the other's parameters.
an effective teacher, then, in my opinion, is someone who listens to their students, understands how the student's logic model translates data to conclusions, and then articulates how to implement a different set of parameters, or a different optimizer, to reach different conclusions. imagine the teacher as a deep learning recommendation system, making predictions about what the student understands based on their communication skills, given the inputs from the student through discourse, assignments, and exams.