Notes on "Intriguing Properties of Neural Networks", and two other papers (2014)
\[ \DeclareMathOperator*{\argmax}{argmax} \]
Attention conservation notice: Slides full of bullet points are never good reading; why would you force yourself to read painfully obsolete slides (including even more painfully dated jokes) about a rapidly moving subject?
These are basically the slides I presented at CMU's Statistical Machine Learning Reading Group on 13 November 2014, on the first paper on what have come to be called "adversarial examples". It includes some notes I made after the group meeting about the Q-and-A, though I may not have properly credited (or understood) everyone's contributions even at the time. It also includes some even rougher notes on two relevant papers that came out the following month. Presented now because I'm procrastinating on preparing my fall class, and in the interest of the historical record.
Background
- Nostalgia for the early 1990s: G. Hinton and company are poised to take over the world, NIPS is mad for neural networks, Clinton is running for President...
- Learning about neural networks for the first time in cog. sci. 1
- Apocrypha: a neural network supposed to distinguish tanks from trucks in aerial photographs actually learned about parking lots...
- The models
- Multilayer perceptron \[ \phi(x) = (\phi_K \circ \phi_{K-1} \circ \cdots \circ \phi_1)(x) \] (a minimal code sketch follows this list)
- This paper not concerned with training protocol, just following what others have done
- The applications
- MNIST digit-recognition
- ImageNet
- \(10^7\) images from YouTube
- So we've got autoencoders, we've got convolutional networks, we've got your favorite architecture and way of training it
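Since the paper treats the trained network as a black-box composition of layers, here is a minimal numpy sketch of the composed map \(\phi = \phi_K \circ \cdots \circ \phi_1\). The layer widths and the ReLU nonlinearity are illustrative choices of mine, not the architectures used in any of these papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(n_in, n_out):
    """One phi_k: an affine map followed by an elementwise ReLU."""
    W = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
    b = np.zeros(n_out)
    return lambda h: np.maximum(0.0, W @ h + b)

# phi = phi_3 o phi_2 o phi_1, with made-up layer widths
layers = [make_layer(784, 256), make_layer(256, 64), make_layer(64, 10)]

def phi(x):
    h = x
    for layer in layers:        # phi_1 is applied first, phi_K last
        h = layer(h)
    return h

x = rng.random(784)             # stand-in for a flattened 28x28 image
print(phi(x).shape)             # (10,)
```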
Where Are the Semantics?
- The claim under examination: individual hidden units carry semantic content, as judged by \(\mathcal{X}_i\), the set of held-out images maximizing the activation along a coordinate direction, \(\argmax_{x} \langle \phi(x), e_i \rangle\) and its runners-up
- The paper's critique: the set \(\mathcal{X}_v\) of images maximizing the activation along a random direction \(v\), \(\argmax_{x} \langle \phi(x), v \rangle\) and its runners-up, looks just as semantically coherent, so the coordinate axes of the feature space are not special
- My assessment of the critique:
- Weakness of both claim and critique: people are very good at finding semantic features that link random objects (e.g., Zhu, Rogers and Gibson, "Human Rademacher Complexity", NIPS 2009, where random word lists come up with semantics like "related to motel service")
- How much semantics would people be able to read into a random collection of training \(x\)'s of equal size to \(\mathcal{X}_i\) or \(\mathcal{X}_v\)?
- Implications of the critique: This is, if anything, a vindication of good old-fashioned parallel distributed processing (as in the 1990s...)
- Doesn't matter as engineering...
- Also doesn't matter as a caricature of animal nervous systems: just as there are no grandmother cells in the brain, there is no "white flower" cell in the network; the representation is distributed across the network
- Suggestions from the audience:
- Ryan Tibshirani: maybe the basis vectors are more "prototypical" than the random vectors? Referenced papers by Nina Balcan and by Robert Tibshirani on prototype clustering
- Yu-Xiang Wang: What about images corresponding to weights of hidden-layer neurons in convolutional networks? Me: I need to see what's going on in that paper before I can comment...
The Learned Classifier Isn't Perceptually Continuous
- Claim: generalization based on semantics, or at least on features not local in the input space
- "Adversarial examples"
- Find the smallest perturbation \(r\) we can apply to a given \(x\) to drive it to the desired class \(l\), i.e., the smallest \(r\) s.t. \(f(x+r) = l\) (a code sketch follows this list)
- Robustness: use the same perturbation on a different network (different hyperparameters, different training set) and see whether \(f^{\prime}(x+r) = l\) as well
- Some adversarial images
- "All images in the right column are predicted to be an"ostrich, Struthio camelus"
- "The examples are strictly randomly chosen. There is not any postselection involved."
- Some more adversarial images: car vs. not-car
- Some adversarial digits
- RMS distortion \(0.063\) on a 0-1 scale, accuracy 0
- Some non-adversarial digits
- Gaussian white noise, RMS distortion \(0.1\), accuracy 51% (!)
- \(\therefore\) it's not just the magnitude of the noise, it's something about the structure
- Assessing the adversarial examples
- Obviously all adversarial examples are mis-classified by the \(f\) they're built for
- But much higher mis-classification rates on other networks as well
- Much higher mis-classification rates than much bigger Gaussian noise
- To be fair, what would adversarial examples look like for any other classifier? For any other classifier using the kernel trick? Someone should try this for a support vector machine
- Suggestions from the audience:
- Ryan: stability somehow? Changing one data point (say, swapping one dog for the adversarial dog) must really change the function. Me: but isn't stability about the stability of the risk (a number in \(\mathbb{R}\)), not the stability of the classifier (a point in function space)? The risk averages over the measure again, so it's smoother than the function as such. Martin Azizyan: so even if you always got a dense subset wrong, it wouldn't affect your risk. Me: right, though good luck proving that there is a countable dense subset of adversarial dogs. Ryan: the senses of stability ought to be related somehow? Me, after: maybe, see London et al.
- David Choi: might this be some sort of generic phenomenon in very high dimensions? Since most of the volume of a closed body is very close to its surface, unless the geometric margin is very large, most points will be very close in the input space to crossing some decision boundary
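To make the adversarial-example construction concrete: Szegedy et al. find the perturbation with a box-constrained L-BFGS solver on the real networks; the numpy sketch below is my own much cruder stand-in, plain gradient descent on a penalized objective applied to a toy softmax classifier with random weights, but it exercises the same idea of making \(f(x+r) = l\) while keeping \(\|r\|\) small.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_classes = 784, 10

# Toy linear softmax classifier standing in for f; the random weights are only
# there so the code runs end to end, not a model of anything in the paper.
W = rng.standard_normal((n_classes, d)) * 0.01

def probs(x):
    z = W @ x
    z = z - z.max()                  # for numerical stability
    p = np.exp(z)
    return p / p.sum()

def f(x):
    return int(np.argmax(probs(x)))

x = rng.random(d)                    # stand-in for an image on a 0-1 scale
target = (f(x) + 1) % n_classes      # any class other than the current one

# Minimize cross-entropy toward the target class plus a penalty on ||r||^2,
# a crude surrogate for "the smallest r such that f(x + r) = target".
r = np.zeros(d)
c, step = 0.01, 1.0
for _ in range(500):
    p = probs(x + r)
    # gradient of -log p[target] w.r.t. the input, plus the penalty's gradient
    grad = W.T @ (p - np.eye(n_classes)[target]) + 2 * c * r
    r = r - step * grad

rms = np.sqrt(np.mean(r ** 2))
print("f(x):", f(x), " f(x+r):", f(x + r), " target:", target,
      " RMS distortion:", round(float(rms), 3))
```

The RMS-distortion line is there to echo the paper's 0.063-on-a-0-1-scale figure, though whatever number this toy prints means nothing beyond the toy.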
How Can This Be?
- The networks aren't over-fitting in any obvious sense
- CV shows they generalize to new instances from the same training pool
- The paper suggests looking at "instability", i.e., the Lipschitz constant of the \(\phi\) mapping
- They can only upper-bound this, by multiplying together the per-layer operator norms (a code sketch follows this list)
- And frankly it's not very persuasive as an answer (the Lipschitz constant is a global property, not a local one)
- Speculative thought 1: what is it like to be an autoencoder?
- Adversarial examples are perceptually indistinguishable for humans but not for the networks
- \(\therefore\) human perceptual feature space is very different from the network's feature space
- What do adversarial examples look like for humans? (Possible psych. experiment with Mechanical Turk)
- Speculative thought 2: "be careful what you wish for"
- Good generalization to the distribution generating instances
- This is \(n^{-1}\sum_{i=1}^{n}{\delta(x-x_i)}\), and \(n = 10^7\) is vanishingly small compared to what it would take to fill out the input space
- Big gaps around every point in the support
- \(\therefore\) training to generalize to this distribution doesn't care about small-scale continuity...
- IOW, the network is doing exactly what it's designed to do
- Do we need lots more than \(10^7\) images? How many does a baby see by the time it's a year old, anyway?
- What if we added absolutely-continuous noise to every image every time it was used? (Not enough, they use Gaussians)
- They do better when they feed adversarial examples into the training process (natch), but it's not clear whether that just shifts the adversarial examples around...
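On the "instability" point: each layer is Lipschitz, Lipschitz constants compose multiplicatively, and the paper's upper bound comes from the per-layer operator norms of the weight matrices. The numpy sketch below does that calculation for a made-up stack of ReLU layers of my own, and compares it with the sensitivity actually observed at one point in one direction; the gap between the two numbers is one way to see why a global bound says little about local behavior.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up weight matrices for a small stack of ReLU layers (illustrative only).
weights = [rng.standard_normal((256, 784)) / np.sqrt(784),
           rng.standard_normal((64, 256)) / np.sqrt(256),
           rng.standard_normal((10, 64)) / np.sqrt(64)]

def phi(x):
    h = x
    for W in weights:
        h = np.maximum(0.0, W @ h)   # ReLU layers; ReLU is 1-Lipschitz
    return h

# Global upper bound on the Lipschitz constant of phi: the product of the
# layers' operator (largest-singular-value) norms.
upper_bound = np.prod([np.linalg.norm(W, 2) for W in weights])

# Sensitivity actually observed at one input, in one random direction.
x = rng.random(784)
u = rng.standard_normal(784)
u = u / np.linalg.norm(u)
eps = 1e-4
local = np.linalg.norm(phi(x + eps * u) - phi(x)) / eps

print("product-of-norms upper bound:", round(float(upper_bound), 2))
print("observed local sensitivity:  ", round(float(local), 2))
```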
- Szegedy et al. looked for minimally-small perturbations which changed the predicted class
- Nguyen et al. look for the largest possible perturbations which leave the class alone: "it is easy to produce images that are completely unrecognizable to humans, but that state-of-the-art DNNs believe to be recognizable objects with 99.99% confidence (e.g. labeling with certainty that white noise static is a lion)."
- Used evolutionary algorithms to find these examples, either "directly encoded" (genome = the pixels) or "indirectly encoded" (genome = a picture-producing network, with more structure in the image).
- Also used gradient ascent on the network's confidence in the target class (a code sketch follows the figures below)
"Evolved images that are unrecognizable to humans, but that state-of-the-art DNNs trained on ImageNet believe with \(\geq 99.6\)% certainty to be a familiar object. ... Images are either directly (top) or indirectly (bottom) encoded."
As you can tell from that figure, the evolved images look nothing at all like what human beings would think of as examples of those classes.
Results for MNIST are similar: in each plot, we see images classified (with \(\geq 99.9\)% confidence) as the digits given by the columns; the rows are 5 different evolutionary runs, with either direct (white-noise-y panel) or indirect (abstract-art-y panel) encodings.
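A rough numpy sketch of the gradient-ascent route to such fooling images, again using a toy softmax classifier with random weights of my own rather than the ImageNet/MNIST networks in the paper: start from noise and climb the model's confidence in a fixed class, with nothing constraining the result to look like anything to a human.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_classes = 784, 10

# Toy linear softmax classifier standing in for the trained DNN; the random
# weights just let the code run, they are not a model of the paper's networks.
W = rng.standard_normal((n_classes, d)) * 0.05

def probs(x):
    z = W @ x
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

target = 3                       # the class whose confidence we drive up
x = rng.random(d)                # start from noise, pixels on a 0-1 scale

step = 1.0
for _ in range(1000):
    p = probs(x)
    # gradient of log p[target] with respect to the input pixels
    grad = W.T @ (np.eye(n_classes)[target] - p)
    x = np.clip(x + step * grad, 0.0, 1.0)   # keep pixels in [0, 1]

print("confidence in the target class:", round(float(probs(x)[target]), 4))
```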
- Paper I says that imperceptible-to-humans perturbations make huge differences to neural networks; Paper II says that huge-to-humans perturbations can leave neural networks utterly indifferent. And this is, N.B., on networks that do well under cross-validation on the data.
- Conclusion: Clearly human iso-classification contours are nothing at all like DNN iso-classification contours.
- It's entirely possible that any method which works with a high-dimensional feature space is vulnerable to similar attacks. I'm not sure how we'd do this with human beings as the classifiers (some sort of physiological measure of bogglement in place of the network's numerical confidence?), but it'd be in-principle straightforward enough to do with a common-or-garden support vector machine.
- Desimone et al. (1984), in a now-classic paper in neuroscience, demonstrated that macaques had neurons which selectively responded to faces. They went on to show that at least some of those cells still responded to quite creepy altered faces, with their features re-arranged, blanking bars drawn across them, and even faces of a different species (as in this portion of their Figure 6A):
It'd seem in-principle possible to use the technique of this paper to evolve images for maximal response from these cells, if you could get the monkey to tolerate the apparatus for long enough.
- The basic idea: defining what microscopic, low-level features cause an image to have macroscopic, high-level properties
- This is a counterfactual / interventional definition, not just about probabilistic association
- This gives an in-principle sound way of constraining a learning system to only pay attention to the causally relevant features
- With observational data, you'd need to do causal inference
- They have a very clever way of doing that here, based on a coarsening theorem, which says that the correct causal model is (almost always) a coarsening of the optimally-predictive observational model (a toy numerical illustration follows this list)
- This in turn gets into issues of predictive states...
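The coarsening point is easiest (for me) to see in a toy discrete example, entirely my own construction and much simpler than anything in the paper: an "image" has two binary features, only one of which actually causes the target behavior, while a hidden common cause makes the other feature observationally predictive. The observational partition of images (by \(P(T=1 \mid x)\)) then has four cells, the causal partition (by \(P(T=1 \mid do(x))\)) has two, and each causal cell is a union of observational cells.

```python
from itertools import product

# Toy generative model (all probabilities invented for illustration):
# hidden confounder C influences the spurious image feature S and the behavior T;
# the image feature H causes T directly.  An "image" is the pair x = (H, S).
p_C = {0: 0.5, 1: 0.5}
p_S_given_C = {0: {0: 0.9, 1: 0.1},     # S is a noisy copy of C
               1: {0: 0.1, 1: 0.9}}

def p_T1(h, c):                         # T depends on H (causal) and C (confounder)
    return 0.1 + 0.7 * h + 0.1 * c

def observational(h, s):
    """P(T=1 | X=(h,s)): conditioning on S carries information about C."""
    joint = {c: p_C[c] * p_S_given_C[c][s] for c in (0, 1)}
    z = sum(joint.values())
    return sum(joint[c] / z * p_T1(h, c) for c in (0, 1))

def interventional(h, s):
    """P(T=1 | do(X=(h,s))): setting the image severs the C -> S edge."""
    return sum(p_C[c] * p_T1(h, c) for c in (0, 1))

for h, s in product((0, 1), repeat=2):
    print(f"x=({h},{s})   P(T=1|x) = {observational(h, s):.2f}   "
          f"P(T=1|do(x)) = {interventional(h, s):.2f}")
# Four distinct observational values, but only two interventional ones:
# the causal partition of the images coarsens the observational partition.
```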
I guess a more purely Greek phrase would be "isotaxon", but that's just a guess.
i.e., I learned about this paper from Shallice and Cooper's excellent The Organisation of Mind, but felt dumb for not knowing about it before.
Posted at August 06, 2019 15:17 | permanent link