## May 22, 2015

### 36-402, Advanced Data Analysis, Spring 2015: Self-Evaluation and Lessons Learned

Attention conservation notice: 2000+ words of academic navel-gazing about teaching a weird class in an obscure subject at an unrepresentative school; also, no doubt, more complacent than it ought to be.

Once again, it's the brief period between submitting all the grades for 402 and the university releasing the student evaluations (for whatever they're worth), so time to think about what I did, what worked, what didn't, and what to do better.

My self-evaluation was that the class went decently, but very far from perfectly, and needs improvement in important areas. I think the subject matter is good, the arrangement is at least OK, and the textbook a good value for the price. Most importantly, the vast majority of the students appear to have learned a lot about stuff they would not have picked up without the class. Since my goal is not for the students to have fun[0] but to challenge them to learn as much as possible, and assist them in doing so, I think the main objective was achieved, though not in ways which will make me beloved or even popular.

All that is much as it was in previous iterations of the class; the big changes from the last time I taught this were the assignments, using R Markdown, and the size of the class.

Writing (almost) all new assignments — ten homeworks and three exams — was good; it reduced cheating[1] to negligible proportions[2] and kept me interested in the material. It was also a lot more work, but I think it was worth it. Basing them on real papers, mostly but not exclusively from economics, seems to have gone over well, especially considering how many students were in the joint major in economics and statistics. (It also led to a gratifying number of students reporting crises of faith about what they were being taught in their classes in other departments.) Relatedly, having the technical content of each homework only add up to 90 points, with the remaining 10 allocated for following a writing rubric[3], seems to have led to better writing, easier grading, and, I think, a greater perception of fairness in the grading.

Encouraging the use of R Markdown so that the students' data analyses were executable and replicable was a very good call. (I have to thank Jerzy Wieczorek for overcoming my skepticism by showing me R Markdown.) In fact, I think it worked well enough that in the future I will make it mandatory, with a teaching session at the beginning of the semester (and exceptions, with permission in advance, for those who want to use knitr and LaTeX). However, I may have to reconsider my use of the np package for kernel regression, since it is very aggressive about printing out progress messages which are not useful in a report.
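(For what it's worth, a sketch of one way to quiet that chatter in a knitted report: `np.messages` is, if I recall the documentation correctly, a global option of the np package, and setting it in an early chunk keeps the "Multistart 1 of N..." lines out of the output. The toy data here are purely for illustration.)

```r
library(np)  # kernel regression with data-driven bandwidth selection

# Suppress np's progress messages, which otherwise
# end up interleaved with the knitted report:
options(np.messages = FALSE)

# Toy data, just to have something to smooth:
set.seed(1)
df <- data.frame(x = runif(200))
df$y <- sin(2 * pi * df$x) + rnorm(200, sd = 0.1)

# Bandwidth selection and fitting now run quietly:
fit <- npreg(y ~ x, data = df)
summary(fit)
```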

The big challenge of the class was sheer size. The first time I taught this class, in 2011, it had 63 students; we hit 120 this year. (And the department expects about 50% more next year.) This, of course, made it impossible to get to know most of the students — at best I got a sense of the ones who were regulars at my office hours or spoke up in lecture, and of those who sent me e-mail frequently. (Linking the faces of the former to the names of the latter remains one of my weak points.) It also meant I would have gone crazy if it weren't for the very good TAs (Dena Asta, Collin Eubanks, Sangwon "Justin" Hyun and Natalie Klein), and the assistance of Xizhen Cai, acting as my (as it were) understudy — but coordinating six people for teaching is also not one of my strengths. Over the four months of the semester I sent over a thousand e-mails about the class, roughly three quarters to students and a quarter among the six of us; I feel strongly that there have to be more efficient ways of doing this part of my job.

The "quality control" samples — select six students at random every week, have them in for fifteen minutes or so to talk about what they did on the last assignment and anything that leads to, with a promise that their answers will not hurt their grades — continue to be really informative. In particular, I made a point of asking every student how long they spent on that assignment and on previous ones, and most (though not all) were within the university's norms for a nine-credit class. Some students resisted participation, perhaps because they didn't trust the wouldn't-hurt-their-grades bit; if so, I failed at "drive out fear". Also, it needs a better name, since the students keep thinking it's their quality that's being controlled, rather than that of the teaching and grading.

Things that did not work so well:

• I did not do enough to ensure consistency across graders, especially when it came to the depth of the feedback to students. Unfortunately, the only things I can think of to improve this will use a lot of my time, but I'll just have to do them. Also, towards the end of the semester, we were slow in getting things graded, largely because of organizational failures on my part. Again, something I will just need to invest more time in doing right.
• A vocal and numerically non-trivial minority of the students keep finding it hard to see the connections between lectures, the text, and the assignments. If they were just the ones who were obviously slacking or not getting it, I'd be more inclined to dismiss them, but some otherwise very good ones were in this group. This means that the way I'm teaching is not working for people I ought to be able to reach, and I need to somehow change it — while still serving the ones my current style works for.
• This is a distinct complaint from those who dislike not having an explicit example in the text or lecture to copy for each homework assignment. This is not something I plan to change.
• This is also distinct from the complaint about having to do too much programming. That also isn't going to go away, and statistical computing isn't going to become a pre-requisite (whether or not that would be a good idea...), so there needs to be more provision of support for that.
• Many students continue to have difficulty with "what does the model say would happen under situation $X$?" questions, especially when situation $X$ does not occur in the training data. The handout on predict seems to have helped, but not gone far enough. This needs to be made even more explicit, and perhaps a whole lecture given over to hypotheticals, to predictive comparisons, and to average predictive comparisons.
• Once again, I dropped the lowest three homework grades, no questions asked, and didn't give extensions; this is so I don't have to try to decide whether a grandparent is sick enough, or a job interview demanding enough, or extra-curricular activities at Carnival are important enough, to merit an extension. The drawback is that this leads to lots of students not doing some of the last problem sets (especially the ones who have done well before that), and so being thrown when the final is cumulative.
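(To make the predict difficulty above concrete, here is a toy illustration — simulated data and a made-up linear model, nothing from the class — of asking a fitted model about a situation absent from the training data:

```r
set.seed(42)
# Simulated training data: x only ranges over [0, 1]
df <- data.frame(x = runif(100, 0, 1))
df$y <- 3 * df$x + rnorm(100, sd = 0.1)
fit <- lm(y ~ x, data = df)

# "What does the model say would happen when x = 2?"
# x = 2 never occurs in the training data, but the model still answers:
predict(fit, newdata = data.frame(x = 2))  # close to 6 for this toy model
```

The point, of course, is that `predict` will cheerfully extrapolate; whether the answer deserves any trust is exactly what a lecture on hypotheticals would have to take up.)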

Things I am considering trying next time:

• Setting up a website where students can ask, and answer, questions with persistent pseudonyms[4], so they're not embarrassed to ask for help, and I don't have to repeat myself. (I actually looked into that this semester, but didn't find anything which didn't seem awful. Basically, what I want is a private Stack Exchange.) To prevent this degenerating into either a sewer or a forum for cheating, it will need moderation and monitoring, and perhaps need to be seeded with some planted questions, to encourage participation.
• I am not sure that setting take-home exams really accomplishes anything that wouldn't be done just as well by more homework assignments. Except that I can say they're not to collaborate on exams, and they (mostly, apparently) listen. I might, however, just make it one homework assignment a week, with three of them requiring the report format I've used for take-homes.
• Making participation in the quality-control sampling a small but non-zero part of the class grade, maybe 5% — full credit if you're either called up and do it, or never get called on, 0 if you refuse. (But maybe "a fine is a price" effects would then lead to less participation?)
• Include in each lecture (but not in the online notes?) a short question, to be answered by the next day, which is either conceptual or a tiny bit of theory, totaling, say, 5% of the grade. This should give me feedback on how well the lectures are working, and give them some feedback on how well they're actually understanding the ideas behind the methods. (The last thing I want to produce is people who just think they know which commands to type in R.)
• Consider moving the dependent-data lectures after graphical models but before causal inference, so as to end with the latter. I might also remove the new lecture on experimental design, because while it's a worthy subject it doesn't excite me, and it fits somewhat awkwardly with the others. (Perhaps I'm not reading the right stuff on experimental design.)
• Consider finding a replacement for the competition to find the most typos in the text.
• Consider promising to feed the class for the last lecture if the response rate on course evaluations goes above, say, 90%.

— Naturally, while proofing this before posting, the university e-mailed me the course evaluations. They were unsurprisingly bimodal.

[0] I have no objection to fun, or to fun classes, or even to students having fun in my classes; it's just not what I'm aiming at here. ^

[1] I am sorry to have to say that there are some students who have tried to cheat by re-using old solutions. This is why I no longer put solutions on the public web, and part of why I made sure to write new assignments this time or, if I did recycle, to make substantial changes. ^

[2] At least, cheating that we caught. (I will not describe how we caught anyone.) ^

[3] This evolved a little over the semester; here's the final version.

The text is laid out cleanly, with clear divisions between problems and sub-problems. The writing itself is well-organized, free of grammatical and other mechanical errors, and easy to follow. Figures and tables are easy to read, with informative captions, axis labels and legends, and are placed near the text of the corresponding problems. All quantitative and mathematical claims are supported by appropriate derivations, included in the text, or calculations in code. Numerical results are reported to appropriate precision. Code is either properly integrated with a tool like R Markdown or knitr, or included as a separate R file. In the former case, both the knitted and the source file are included. In the latter case, the code is clearly divided into sections referring to particular problems. In either case, the code is indented, commented, and uses meaningful names. All code is relevant to the text; there are no dangling or useless commands. All parts of all problems are answered with actual coherent sentences, and never with raw computer code or its output. For full credit, all code runs, and the Markdown file knits (if applicable). ^

[4] The North American Mammals Paleofauna Database for homework 5 has about two thousand entries, so my thought would be to assign each student a random extinct species as their pseudonym. These should be socially neutral, and more memorable than numbers, but no doubt I'll discover that some students have profound feelings about the Amphicyonidae. ^

Posted at May 22, 2015 19:34 | permanent link