In [20]:
# thanks to Jeremy's twitter
import graphviz
def gv(s): return graphviz.Source('digraph G{ ' + s + '; }')

Chapter 1 - Questionnaire

Vishal Bakshi
August 26, 2020

These are my answers to the Chapter 1 questionnaire...

  1. Do you need these for deep learning?

    • Lots of math T / F
    • Lots of data T / F
    • Lots of expensive computers T / F
    • A PhD T / F
  2. Name five areas where deep learning is now the best in the world.

    • Natural language processing
    • Computer vision
    • Medicine
    • Biology
    • Image generation
  3. What was the name of the first device that was based on the principle of the artificial neuron?

    • Mark I Perceptron
  4. Based on the book of the same name, what are the requirements for parallel distributed processing (PDP)?

    • A set of processing units
    • A state of activation
    • An output function for each unit
    • A pattern of connectivity among units
    • A propagation rule for propagating patterns of activities through the network of connectivities
    • An activation rule for combining the inputs impinging on a unit with the current state of that unit to produce an output for the unit
    • A learning rule whereby patterns of connectivity are modified by experience
    • An environment within which the system must operate
  5. What were the two theoretical misunderstandings that held back the field of neural networks?

    • The first misunderstanding was that neural networks were ineffective because a single layer of artificial neurons was unable to learn some simple but critical mathematical functions (such as XOR). Marvin Minsky and Seymour Papert's book Perceptrons was the source of this. The same book showed that adding layers removes the limitation, but that point was not widely recognized.
    • The second misunderstanding was that deep learning practitioners did not know, hear about, or understand the importance of having many layers in the neural net. The research was there, but the awareness was not, and neither was the hardware.
  6. What is a GPU?

    • Graphics Processing Unit. A processor that performs operations on many objects at the same time. As an example, an operation on every pixel in an image happens at the same time.
  7. Open a notebook and execute a cell containing: 1+1. What happens?

In [3]:
  1. Follow through each cell of the stripped version of the notebook for this chapter. Before executing each cell, guess what will happen.
    • Didn't do this one as I watched the video first. Will do this first for Week 2. I like this.
  2. Complete the Jupyter Notebook online appendix.
    • I did! But some explanatory text is missing (or am I supposed to fill them in?) I'll check the forums.
  3. Why is it hard to use a traditional computer program to recognize images in a photo?

    • Because we don't exactly know the steps we take to recognize images in a photo, we can't program it.
  4. What did Samuel mean by "weight assignment"?

    • The current values of the weights, which determine how the program operates. Changing the weight assignment changes the machine's behavior, so it represents the machine's current strategy, so to speak.
  5. What term do we normally use in deep learning for what Samuel called "weights"?

    • Parameters
  6. Draw a picture that summarizes Samuel's view of a machine learning model.
In [22]:
gv('''"weight assignment"[shape=box3d width=1. height=0.7]
"weight altering mechanism"->"weight assignment"->results->"automatic test"->"weight altering mechanism"''')
[Output: diagram of the cycle weight altering mechanism → weight assignment → results → automatic test → weight altering mechanism]
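The loop in the diagram can be sketched in a few lines of plain Python. This is an illustrative toy, not Samuel's program: the one-weight model, the mean-squared-error test, the gradient step, and all function names are assumptions made for the example.

```python
# Toy version of Samuel's loop: one weight, a test of performance,
# and a mechanism that alters the weight to do better on that test.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # (input, target) pairs; target = 3 * input

def results(weight, x):
    """The model: produce a result from an input and the current weight assignment."""
    return weight * x

def automatic_test(weight):
    """Measure performance as mean squared error (lower is better)."""
    return sum((results(weight, x) - y) ** 2 for x, y in data) / len(data)

def alter_weight(weight, step=0.01):
    """Weight altering mechanism: move the weight against the error gradient."""
    grad = sum(2 * (results(weight, x) - y) * x for x, y in data) / len(data)
    return weight - step * grad

weight = 0.0  # initial weight assignment
for _ in range(200):
    weight = alter_weight(weight)

print(round(weight, 3), round(automatic_test(weight), 6))
```

Each pass through the loop is one trip around the diagram: the weight produces results, the test scores them, and the mechanism adjusts the weight.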
  1. Why is it hard to understand why a deep learning model makes a particular prediction?

    • The "why" is represented by the parameters, and there are millions of them, so interpreting their values would be difficult.
  2. What is the name of the theorem that shows that a neural network can solve any mathematical problem to any level of accuracy?

    • Universal approximation theorem
  3. What do you need in order to train a model?
    • Data
    • and labels if it's supervised learning
  4. How could a feedback loop impact the rollout of a predictive policing model?
    • The model will predict arrests not crime. If policing uses that to inform their policies, the prediction of the model will become a self-fulfilling prophecy.
  5. Do we always have to use 224×224-pixel images with the cat recognition model?
    • No
  6. What is the difference between classification and regression?
    • Classification is used when the dependent variable is discrete ("dog", "cat")
    • Regression is used for continuous dependent variables (1.2, 3.33)
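A minimal sketch of the two kinds of output, using made-up numbers. The scores, the `predict_price` function, and its weight and bias are hypothetical and do not come from any trained model.

```python
# Classification: the target is one of a fixed set of categories.
labels = ["cat", "dog"]
class_scores = [0.2, 0.8]  # hypothetical model scores, one per category
predicted_label = labels[class_scores.index(max(class_scores))]

# Regression: the target is a number on a continuous scale.
def predict_price(area_m2, weight=2.5, bias=10.0):
    """Hypothetical linear model returning a continuous value."""
    return weight * area_m2 + bias

predicted_price = predict_price(40.0)
print(predicted_label, predicted_price)
```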
  7. What is a validation set? What is a test set? Why do we need them?
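    • The validation set is data not used to train the model. It is used to measure how well the trained parameters generalize and to compare hyperparameter choices.
    • The test set is not used to train the model or to choose hyperparameters. It is held back until the end to give a final measure of performance.
    • We need both because a model can fit the training data too closely, and repeated hyperparameter tuning can fit the validation data too closely as well.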
  8. What will fastai do if you don't provide a validation set?
    • Set 20% of the data aside as the validation set
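A random 20% holdout can be sketched in plain Python as below. The `random_split` function is hypothetical. The name `valid_pct` echoes the argument fastai uses, but this is not the library's implementation.

```python
import random

def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle the items and hold out `valid_pct` of them for validation."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]

train, valid = random_split(range(100))
print(len(train), len(valid))
```

Fixing the seed matters: if the split changed on every run, differences in validation results could come from the split rather than from the model.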
  9. Can we always use a random sample for a validation set? Why or why not?
    • No! When images have similar features, like the same person making different gestures, you don't want to train the model using one and then validate using the other, as your model will be "cheating".
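One way to avoid this kind of leakage is to split by person rather than by image. The sketch below assumes each sample records who appears in it; the `split_by_group` function and the sample data are hypothetical.

```python
def split_by_group(samples, valid_groups):
    """Send every sample from a held-out person to validation, the rest to training."""
    train = [s for s in samples if s["person"] not in valid_groups]
    valid = [s for s in samples if s["person"] in valid_groups]
    return train, valid

samples = [
    {"person": "ana", "gesture": "wave"},
    {"person": "ana", "gesture": "point"},
    {"person": "ben", "gesture": "wave"},
    {"person": "cho", "gesture": "point"},
]
train, valid = split_by_group(samples, valid_groups={"ben"})
train_people = {s["person"] for s in train}
valid_people = {s["person"] for s in valid}
print(train_people & valid_people)  # empty set: no person appears in both
```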
  10. What is overfitting? Provide an example.
    • Overfitting is when a model memorizes the specifics of its training set (for example, the particular training images) instead of learning general features. Training a model for too many epochs can result in an overfit model. For example, when fitting a curve to points on an x-y plot, an overfit model passes through nearly every training point by bending sharply, with many inflection points. A better-fitted model is smoother (fewer inflection points), and while it may not pass through as many training points, it comes closer to new data.
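The curve-fitting picture can be reproduced with a small, deterministic toy. Everything here is an assumption for illustration: the data follow y = 2x with alternating noise, the "overfit" model is a nearest-neighbor memorizer, and the "better" model is a least-squares line.

```python
# Training data: y = 2x plus alternating noise of +1 / -1.
train = [(float(x), 2.0 * x + (1.0 if x % 2 == 0 else -1.0)) for x in range(10)]
# Validation data: new inputs with noise-free targets from the same underlying line.
valid = [(x + 0.25, 2.0 * (x + 0.25)) for x in range(10)]

def memorizer(x):
    """Overfit model: return the target of the nearest training input."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(train)

def line(x):
    """Smoother model: the fitted straight line."""
    return slope * x + intercept

def mse(model, points):
    """Mean squared error of a model over a list of (x, y) points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

for name, model in [("memorizer", memorizer), ("line", line)]:
    print(name, round(mse(model, train), 3), round(mse(model, valid), 3))
```

The memorizer reproduces every training point exactly, noise included, so its training error is zero while its error on the new inputs is larger than the line's.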
  11. What is a metric? How does it differ from "loss"?
    • A metric is a measure of a model's performance
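    • Loss also measures the difference between the actual result and the model's prediction, but it is the measure the training process uses to update the parameters, whereas a metric is chosen for people to read and interpret.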
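A small sketch of how a metric and a loss can behave differently on the same predictions. The probabilities and labels are made up, and accuracy and binary cross-entropy are used only as one common metric/loss pairing.

```python
import math

# Hypothetical predicted probabilities that each image is a cat, with true labels (1 = cat).
probs = [0.9, 0.6, 0.4, 0.2]
targets = [1, 1, 0, 1]

def accuracy(probs, targets, threshold=0.5):
    """Metric: fraction of predictions on the correct side of the threshold."""
    hits = sum((p > threshold) == bool(t) for p, t in zip(probs, targets))
    return hits / len(targets)

def binary_cross_entropy(probs, targets):
    """Loss: changes smoothly with the probabilities and penalizes confident mistakes."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(probs, targets)) / len(targets)

print(accuracy(probs, targets), round(binary_cross_entropy(probs, targets), 3))

# Nudging one probability changes the loss but leaves the accuracy unchanged.
nudged = [0.95, 0.6, 0.4, 0.2]
print(accuracy(nudged, targets), round(binary_cross_entropy(nudged, targets), 3))
```

Because the loss responds to small changes that accuracy ignores, it gives the training process a signal to follow even when the metric has not moved.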
  12. How can pretrained models help?
    • Mainly the last few layers of a pretrained model need to be trained on your data, since the earlier layers have already learned many general features.
  13. What is the "head" of a model?
    • The final layers of a pretrained model, which are replaced with new layers and trained on the new data
  14. What kinds of features do the early layers of a CNN find? How about the later layers?
    • Early layers find simple shapes and gradients. Later layers find more complex features.
  15. Are image models only useful for photos?
    • No! Any data represented by an image can be used to teach an image model.
  16. What is an "architecture"?
    • The functional form of the model: the function that receives the inputs and the parameters and produces the predictions
  17. What is segmentation?
    • The identification of pixels in an image with some code indicating a category or item
  18. What is y_range used for? When do we need it?
    • y_range is the range of the continuous value being predicted. We need to tell the model this range when a neural network is predicting a continuous target.
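One common way to enforce such a range is to pass the network's raw output through a scaled sigmoid. The sketch below is a pure-Python illustration under that assumption. The name `sigmoid_range` mirrors a fastai helper, but this is not the library's code, and the rating bounds are hypothetical.

```python
import math

def sigmoid_range(x, low, high):
    """Squash any real number into the open interval (low, high) with a scaled sigmoid."""
    return 1 / (1 + math.exp(-x)) * (high - low) + low

y_range = (0.5, 5.5)  # hypothetical bounds for a 1-to-5 star rating
for raw in (-10.0, 0.0, 10.0):
    print(raw, round(sigmoid_range(raw, *y_range), 3))
```

The bounds sit slightly outside the 1-to-5 scale because a sigmoid only approaches its limits, so the extreme ratings stay reachable.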
  19. What are "hyperparameters"?
    • Hyperparameters are settings such as the learning rate that we choose ourselves rather than learn from the data. They affect the values the parameters end up with.
  20. What's the best way to avoid failures when using AI in an organization?
    • Withhold some data as a validation set.