## April 23, 2004

### Monday Exam Blogging

In honor of the end of the semester and the arrival of spring, here is my first, and so far only, attempt at writing an exam. This was given as a take-home midterm by a friend teaching a statistics-for-people-who-don't-like-math course at a school which, to protect the innocent, I'll call the University of Winnemac. (It's a long story.) I'm fond of it, but the students at Winnemac hated it, and apparently none of them got any of the jokes. Solutions are available upon request. (Note that Problem 5 is a simplified version of the "Carnival Booth" algorithm due to Samidh Chakrabarti and Aaron Strauss.)

### Take-Home Midterm, Generic Statistics Service Course, Generic Department, University of Winnemac at J. Random College Town

Problems are longer and harder than exercises; they also count for twice as much. Some questions are in multiple choice format, but you should always show your work.

#### Exercise 1: Average Models

A designer measures the height of a hundred models, randomly chosen from the runways in Milan. The sample mean height is 5.85 feet, with a sample standard deviation of 0.15 feet.
(a) What is the 95% confidence interval for the mean height, in feet, of Milanese models?
(b) What is the standard deviation of their height in inches?
(c) What is the confidence interval for their height in inches?
(d) What is the confidence interval (in feet or inches) for their height while wearing three-inch platform shoes?
(e) The standard deviation of their height in platforms?
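
For readers who want to check their arithmetic, here is a minimal sketch of the interval calculations in this exercise. It assumes the large-sample normal interval is what is wanted (reasonable with a hundred models); the function name `mean_ci` is made up for illustration and is not part of the original exam.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(mean, sd, n, level=0.95):
    """Large-sample (normal) confidence interval for a mean. Illustrative helper."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width

FEET_TO_INCHES = 12
mean_ft, sd_ft, n = 5.85, 0.15, 100

ci_feet = mean_ci(mean_ft, sd_ft, n)
# Changing units rescales both the mean and the standard deviation.
sd_inches = sd_ft * FEET_TO_INCHES
ci_inches = mean_ci(mean_ft * FEET_TO_INCHES, sd_inches, n)
# Platform shoes add a constant 3 inches: the mean shifts, the spread does not.
ci_platforms = mean_ci(mean_ft * FEET_TO_INCHES + 3, sd_inches, n)

print(ci_feet, sd_inches, ci_inches, ci_platforms)
```

The design point is simply that a change of units multiplies everything, while a constant shift moves the interval without widening it.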

#### Exercise 2: Statistics for Con Artists

Every week the market price for frozen concentrated orange juice (FCOJ) futures either goes up or down, with equal probability. A market analyst obtains a list of 1024 FCOJ traders. At the beginning of the year he sends them a letter announcing a free trial period of his new market prediction service; half the letters say the market will rise that week, and half that it will fall. The next week he discards the names of the traders to whom he made the wrong prediction, and repeats the process. Thus after k weeks, the remaining names on the list have received k correct predictions in a row for free.
(a) What is the probability that any given trader is still on the list after seven weeks?
(b) How many names are still on the list after seven weeks?
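
The analyst's scheme can be sketched in a few lines. This assumes the list splits exactly in half each week, as the problem describes; the function names are hypothetical.

```python
from fractions import Fraction

def survivors(n_traders: int, weeks: int) -> int:
    """Names left after discarding the wrongly-advised half each week."""
    remaining = n_traders
    for _ in range(weeks):
        remaining //= 2  # half the letters were wrong, whichever way the market went
    return remaining

def survival_probability(weeks: int) -> Fraction:
    """Chance a given trader got the right letter every week."""
    return Fraction(1, 2) ** weeks

print(survival_probability(7), survivors(1024, 7))
```

Note that nothing in the code needs to know which way the market actually moved.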

#### Exercise 3: Coins

What are the probabilities of getting the following sequences of heads and tails from 30 consecutive tosses of a fair coin?

• HHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
• HTHTHTHTHTHTHTHTHTHTHTHTHTHTHT
• HHTTTTHHTHHHHHTHHHTTHTHTTHTTTH
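
A small sketch for checking this one, assuming a fair coin and independent tosses; `sequence_probability` is an illustrative name, not anything from the exam.

```python
from fractions import Fraction

def sequence_probability(sequence: str) -> Fraction:
    """Probability of one exact heads/tails sequence from a fair coin."""
    if set(sequence) - {"H", "T"}:
        raise ValueError("sequence may contain only 'H' and 'T'")
    # Each toss matches its required face with probability 1/2, independently.
    return Fraction(1, 2) ** len(sequence)

for seq in ("H" * 30, "HT" * 15, "HHTTTTHHTHHHHHTHHHTTHTHTTHTTTH"):
    print(len(seq), sequence_probability(seq))
```

The function looks only at the length of the sequence, which is the whole point of the exercise.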

#### Exercise 4: Beating the Market

Mutual fund annual rates of return are normally distributed. What fraction of funds will have returns between one and three standard deviations above the mean in a given year?
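
As a quick numerical check, the fraction can be read off the standard normal CDF. This is only a sketch using Python's `statistics.NormalDist`; the helper name is made up.

```python
from statistics import NormalDist

def fraction_between(z_low: float, z_high: float) -> float:
    """Share of a normal population lying between two z-scores."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(fraction_between(1, 3))
```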

#### Exercise 5: Cats and Their Humans

A survey of cats in J. Random College Town finds their weight is normally distributed, with a mean of 9 pounds and a standard deviation of 1.5 pounds. The same survey finds that the weight of cat owners is also normally distributed, with a mean of 150 pounds and a standard deviation of 15 pounds. Describe the distribution of the weight of cat-owners holding their cats, assuming feline and human weights are independent random variables.
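
One way to check an answer here: `statistics.NormalDist` supports adding two independent normal variables directly. The variable names below are illustrative, and independence is assumed, as in the problem.

```python
from statistics import NormalDist

cat = NormalDist(mu=9, sigma=1.5)
owner = NormalDist(mu=150, sigma=15)

# For independent normals the means add and the variances (not the
# standard deviations) add; NormalDist's "+" applies exactly that rule.
combined = owner + cat

print(combined.mean, combined.stdev)
```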

#### Fun with Venn Diagrams

In Exercises 6–8, A, B and C are three events, with P(A) = 0.75, P(B) = 0.65 and P(C) = 0.40. Note: Drawing Venn diagrams is not required to solve these problems, but it may help.

#### Exercise 6

(a) What is the smallest possible value of P(A and B)?
(b) The largest possible value?
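
For those who would rather compute than draw, here is a sketch of the bounds on an intersection. It uses only the facts that P(A or B) cannot exceed 1 and that (A and B) sits inside both A and B; the function name is hypothetical.

```python
from fractions import Fraction

def intersection_bounds(p_a: Fraction, p_b: Fraction):
    """Smallest and largest possible P(A and B) given only P(A) and P(B)."""
    # P(A or B) = P(A) + P(B) - P(A and B) <= 1 forces a minimum overlap.
    lowest = max(Fraction(0), p_a + p_b - 1)
    # The overlap cannot be bigger than the smaller of the two events.
    highest = min(p_a, p_b)
    return lowest, highest

print(intersection_bounds(Fraction(75, 100), Fraction(65, 100)))
```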

#### Exercise 7

Which of the following statements could be true?

1. C is the complement of A
2. B completely contains A
3. C is the complement of (A and B)
4. C is the complement of (A or B)
5. A and B are mutually exclusive

#### Exercise 8

(a) If C = (A and B), what is P(A or B)?
(b) If C = (A and B), what is P(B|A)?
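
Both parts follow from inclusion-exclusion and the definition of conditional probability. A minimal sketch with exact fractions (the variable names are mine, not the exam's):

```python
from fractions import Fraction

p_a, p_b, p_c = Fraction(75, 100), Fraction(65, 100), Fraction(40, 100)

# Hypothesis of the exercise: C is exactly the event (A and B).
p_a_and_b = p_c

p_a_or_b = p_a + p_b - p_a_and_b   # inclusion-exclusion
p_b_given_a = p_a_and_b / p_a      # definition of conditional probability

print(p_a_or_b, p_b_given_a)
```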

#### Exercise 9: Truly Great Poetry

In his book I Am Sickened by Your Ignorance, the critic Orpheus Bruno declared that "no more than one poem in ten thousand is truly great; the rest might as well be shopping lists". Bruno was subsequently abducted by renegade experimental psychologists and made to rate a large number of randomly-selected poems from 0 to 100, and also to say which ones were "truly great". He, of course, ignored the 0–100 scale entirely, preferring a boundless scale to mirror his boundless magnificence. His ratings, in fact, were normally distributed (or, as he said, "followed the law of the immortal Gauss"): the mean rating was 45, with a standard deviation of 16, and poems which scored 93 or above were "truly great". What is the proportion of truly great poetry?
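
A sketch for comparing Bruno's ratings with his own pronouncement, taking the stated normal distribution at face value. The variable names are illustrative only.

```python
from statistics import NormalDist

ratings = NormalDist(mu=45, sigma=16)

# Upper tail at the "truly great" cutoff of 93.
truly_great = 1 - ratings.cdf(93)

# Bruno's published claim: at most one poem in ten thousand.
claimed = 1 / 10_000

print(truly_great, truly_great / claimed)
```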

#### Problem 1: Educational Testing

Professor Sheila Nagig of the Miskatonic University Department of Statistics refuses to give tests to her students, saying that most students who get high scores are just lucky, not knowledgeable, so the test isn't informative. Pressed by the Dean to explain herself, she argues as follows. Consider a test with 100 yes-or-no questions. A student's degree of knowledge of the subject (say, finite-temperature canonical quantum gravity) can be measured by the probability p of their answering a given question correctly. Assuming the questions are independent (which is the case on Prof. Nagig's tests), a student's score is therefore a binomial random variable, B(p, 100). Normally a passing score is 70% correct, or 70 questions out of 100. Note that someone who knows nothing and guesses completely at random has p = 0.5. (In the following, you may use the normal approximation if you wish.)
(a) What is the probability of a student passing if their p = 0.5?
(b) What is the probability of a student passing if their p = 0.7?
(c) Assume that one American in a million has a finite-temperature canonical quantum gravity p of 0.7, and the rest have p = 0.5, i.e., they know nothing about it. What is the probability that a random American who passes a test in the subject knows nothing about it?
(d) Explain what is wrong with Prof. Nagig's argument.
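
Parts (a) through (c) can be checked exactly rather than with the normal approximation. The sketch below assumes "passing" means at least 70 correct answers out of 100, and it says nothing about part (d), which is the part that matters. Function names are hypothetical.

```python
from math import comb

def pass_probability(p: float, n: int = 100, passing: int = 70) -> float:
    """Exact binomial chance of getting at least `passing` of n questions right."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(passing, n + 1))

def prob_ignorant_given_pass(prior_expert: float = 1e-6,
                             p_expert: float = 0.7,
                             p_guess: float = 0.5) -> float:
    """Bayes' rule: P(knows nothing | passed) for a randomly drawn test-taker."""
    joint_guess = (1 - prior_expert) * pass_probability(p_guess)
    joint_expert = prior_expert * pass_probability(p_expert)
    return joint_guess / (joint_guess + joint_expert)

print(pass_probability(0.5), pass_probability(0.7), prob_ignorant_given_pass())
```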

#### Problem 2: Glenn and Glenda Martingale and Their Angora Goats

Glenn and Glenda Martingale have a successful angora sweater business, and, in a fit of vertical integration, buy an angora goat farm. As you know, the most important trait of an angora goat is its fuzziness, measured in hairs per square millimeter. The Martingales, being statistically sophisticated, determine the fuzziness of their goats as follows. For each goat in the herd, fuzziness is measured at a random spot on its body, and then averaged across all goats in the herd. These are their results.

| Quantity | Value |
| --- | --- |
| Sample mean fuzziness | 10.3 |
| Sample standard deviation | 1.21 |
| Low end of 95% confidence interval | 9.99 |
| High end of 95% confidence interval | 10.61 |

Assume, like the Martingales, that fuzziness is normally distributed. Calculate the number of goats in the herd, i.e., the number of samples. Round to the nearest goat. You may assume that there are a lot of goats.

Extra credit. Can you re-do the calculation, without assuming the number of samples is large?
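
For the large-herd version of the question, the calculation amounts to inverting the formula for the half-width of a normal confidence interval. A minimal sketch follows; it does not attempt the extra credit, which would need the t distribution, and the function name is made up.

```python
from statistics import NormalDist

def implied_sample_size(sd: float, ci_low: float, ci_high: float,
                        level: float = 0.95) -> float:
    """Solve half_width = z * sd / sqrt(n) for n (large-sample assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = (ci_high - ci_low) / 2
    return (z * sd / half_width) ** 2

n_goats = implied_sample_size(sd=1.21, ci_low=9.99, ci_high=10.61)
print(n_goats, round(n_goats))
```

Since the published interval endpoints are themselves rounded, the unrounded result should be taken with a grain of salt.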

#### Problem 3: Glenn and Glenda Martingale, cont'd.

The International Group of Angora Fanciers (IGAF) stipulates that only wool from goats whose fuzziness is at least 9.09 can be used to make angora sweaters. Assume that the fuzziness is normally distributed with mean 10.30 and standard deviation 1.21 (as in the previous problem).
(a) What is the probability that a random goat on the farm is fuzzy enough for IGAF?
(b) What is the probability that at least 25 out of a random group of 30 goats meet the IGAF standard? Calculate this exactly, using the binomial distribution.
(c) Repeat the previous calculation using the normal approximation.
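
The three parts chain together, so a sketch may help in checking the plumbing. It assumes goats are independent, and for part (c) it applies a continuity correction, which is my choice rather than something the problem specifies.

```python
from math import comb, sqrt
from statistics import NormalDist

# (a) chance that one goat clears the IGAF threshold
fuzziness = NormalDist(mu=10.30, sigma=1.21)
p_ok = 1 - fuzziness.cdf(9.09)

# (b) exact binomial tail: at least 25 acceptable goats out of 30
n, k_min = 30, 25
exact = sum(comb(n, k) * p_ok**k * (1 - p_ok) ** (n - k) for k in range(k_min, n + 1))

# (c) normal approximation to the count, with a continuity correction
#     (an assumption of this sketch; drop the 0.5 to see the uncorrected value)
count = NormalDist(mu=n * p_ok, sigma=sqrt(n * p_ok * (1 - p_ok)))
approx = 1 - count.cdf(k_min - 0.5)

print(p_ok, exact, approx)
```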

#### Problem 4: Cagliostro Consulting PLC

In his book Alchemical Management: Getting the Lead Out and the Gold In, Alex Cagliostro, the famous business consultant, profiles ten companies that achieved excellence after adopting his system of alchemical management (seminars available through appointment with Cagliostro Consulting PLC). The rival consultancy of Hooke, Waterhouse, Comstock and Root points out that there were seventy-two other companies which tried alchemical management without achieving excellence.
(a) Assuming these 82 firms form a representative sample (aside: is that reasonable?), calculate a 95% confidence interval for the proportion of excellent firms among alchemically-managed companies.
(b) It is known that 12% of all firms achieve excellence. Test HWCR's claim that excellence is no more common among alchemically-managed firms than among non-alchemical companies. State the null and alternative hypotheses. Is this a one-sided or two-sided test? Calculate the p-value.
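
Here is one way the numbers might be worked, as a sketch only. It assumes the usual normal-approximation interval for a proportion, and for (b) it reads HWCR's claim as the null hypothesis (proportion at most 12%) against the one-sided alternative that alchemy helps. Other readings are defensible.

```python
from math import sqrt
from statistics import NormalDist

excellent, n = 10, 82
p_hat = excellent / n

# (a) normal-approximation 95% interval, using the estimated standard error
z95 = NormalDist().inv_cdf(0.975)
se_hat = sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z95 * se_hat, p_hat + z95 * se_hat)

# (b) one-sided z-test of H0: p <= 0.12 against H1: p > 0.12,
#     with the standard error computed under the null
p0 = 0.12
se_null = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_null
p_value = 1 - NormalDist().cdf(z)

print(ci, z, p_value)
```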

#### Problem 5: Airport Screening

Airport security cannot give a detailed screening to everybody trying to fly by plane, so they select a fraction for detailed screening. Suppose there are four kinds of passengers: innocent-looking law-abiding citizens, suspicious-looking law-abiding citizens, suspicious-looking terrorists and innocent-looking terrorists. Security officials decide to screen all suspicious-looking passengers, and a random 2% of all innocent-looking people just to be safe. 10% of the total population is suspicious-looking, but 80% of all terrorists are.
(a) What is the probability that no one in a group of four random terrorists will be screened the next time they fly?
(b) Say a terrorist has evaded scrutiny if they have taken five flights without being screened once. What is the probability that a terrorist is innocent-looking, given that he has evaded scrutiny?
(c) Supposing one has a group of four terrorists who have all evaded scrutiny, what is the probability that none of them will be screened the next time they fly?
(d) Repeat the calculation in (c), supposing that airport security ignores who looks suspicious or innocent, and screens 12% of all passengers completely at random.
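
The four parts can be sketched as below. The assumptions are mine: screening decisions are independent across passengers and flights, the four terrorists are independent draws, and a suspicious-looking terrorist is screened on every flight. The constant names are illustrative.

```python
P_SUSPICIOUS_GIVEN_TERRORIST = 0.80
P_SCREEN_INNOCENT_LOOKING = 0.02
P_SCREEN_RANDOM = 0.12
GROUP, FLIGHTS = 4, 5

# (a) a terrorist escapes one flight only if innocent-looking AND not picked
p_unscreened = (1 - P_SUSPICIOUS_GIVEN_TERRORIST) * (1 - P_SCREEN_INNOCENT_LOOKING)
part_a = p_unscreened ** GROUP

# (b) Bayes' rule over FLIGHTS unscreened trips; suspicious-looking
#     passengers are always screened, so they can never evade
prior_innocent = 1 - P_SUSPICIOUS_GIVEN_TERRORIST
evade_innocent = (1 - P_SCREEN_INNOCENT_LOOKING) ** FLIGHTS
evade_suspicious = 0.0
part_b = (prior_innocent * evade_innocent) / (
    prior_innocent * evade_innocent + (1 - prior_innocent) * evade_suspicious
)

# (c) each proven evader is unscreened next time with probability
#     P(innocent-looking | evaded) * P(not randomly picked)
part_c = (part_b * (1 - P_SCREEN_INNOCENT_LOOKING)) ** GROUP

# (d) under purely random screening, past evasion tells the terrorists nothing
part_d = (1 - P_SCREEN_RANDOM) ** GROUP

print(part_a, part_b, part_c, part_d)
```

Comparing `part_c` with `part_d` is the heart of the "Carnival Booth" argument: a profile that can be probed is a profile that can be gamed.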

Posted at April 23, 2004 18:46