Problem of the Week 1060

Detecting Disease Efficiently

This is based on a true story about detection of syphilis in U.S. military recruits during World War II. Source: D. Teague, College Math Journal 37:4 Sept 2006, 261-268.

Suppose that in a group of 1,000,000 men, 100 of them have a disease that can be detected by an expensive blood test, but it is not known which 100 have the disease. Suppose one has an unlimited supply of blood from each recruit, but is trying to minimize the number of tests needed to determine the 100 diseased cases.

For example, one could combine the blood of #1 and #2, combine the blood of #3 and #4, and so on, performing 500,000 tests. At most 100 of these will be positive. Then one could individually test the 200 men still in doubt, for a total of 500,200 tests. It is possible to do much better; for example, one can combine the samples after the first round, too.

Design a method that succeeds with, in the worst case, as few tests as possible.

Actually, I don't have a proof of optimality, so the idea is to get the number as low as you can. Perhaps a reader can provide a proof of optimality. An obvious lower bound is 100. The source I got this from does it with about 2500 tests, and it is certainly very close to optimal, but the source is considering a continuous version of the problem, and it seems possible to me that the answer in the discrete case posed above could differ by a small amount from this.

Note:

  1. The blood test is a simple YES/NO test that detects the disease or proves it is not present.
  2. What is sought is an explicit method that yields an explicit upper bound on the number of tests. It has been pointed out to me that one can design a method that in theory would give the upper bound, but might not be workable in practice. As noted, my source gives 2500 as an upper bound. It is indeed possible to do significantly better!
  3. I'll also repeat that for this problem we take it as known that there are EXACTLY 100 diseased individuals. Of course, in practice one would not know this. We will stick with the problem as given, where the 100 is known to be the number, though I imagine the variations where one knows the diseased number to be at most 100, say, are also interesting.

© Copyright 2006 Stan Wagon. Reproduced with permission.



19 September 2006