Given a population of an unknown number of items, estimate the size of the population from a random sample using the algorithm below.

Accuracy can be improved by taking several samples then taking the average of the estimated sizes.

Plotting the size estimates as a histogram may also give you a feel for the data.

Watch the video in the links below.

Create an interactive program to estimate the size of a population from a random sample.

- create a test population of integers (1 to N)
- loop
  - ask the user for the sample size
  - use the algorithm below to estimate the size of the population
  - display the results
    - actual size
    - estimated size
    - size difference
    - accuracy percentage
    - other stats
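The steps above could be sketched roughly as follows. This is one possible layout, not a required solution. The names `estimate_population_size`, `report` and `interactive_loop`, the default N of 2000, and the definition of "accuracy percentage" as 100 minus the percent difference are all assumptions made for illustration.

```python
import random


def estimate_population_size(sample):
    """Estimate N as MAX + AVERAGE_GAP for a sample drawn from 1..N."""
    s = sorted(sample)
    k = len(s)
    # Gap before the first element plus the gaps between neighbours.
    sum_of_gaps = (s[0] - 1) + sum(s[i] - s[i - 1] - 1 for i in range(1, k))
    return s[-1] + sum_of_gaps / k


def report(population, sample_size, rng=random):
    """Draw one sample (without replacement) and return the listed statistics."""
    sample = rng.sample(population, sample_size)
    actual = len(population)
    estimated = estimate_population_size(sample)
    difference = estimated - actual
    return {
        "actual size": actual,
        "estimated size": estimated,
        "size difference": difference,
        # Assumed definition: 100% minus the absolute percent difference.
        "accuracy percentage": 100 - abs(difference) / actual * 100,
    }


def interactive_loop(n=2000):
    """Ask for sample sizes until the user enters a blank line."""
    population = range(1, n + 1)  # test population 1..N, no gaps
    while True:
        text = input("sample size (blank to quit): ").strip()
        if not text:
            break
        try:
            k = int(text)
        except ValueError:
            print("please enter a whole number")
            continue
        if not 1 <= k <= n:
            print(f"sample size must be between 1 and {n}")
            continue
        for name, value in report(population, k).items():
            print(f"{name:20}: {value:.1f}")
```

Calling `interactive_loop()` starts the prompt. Keeping the calculation in `report` separate from the input loop makes the statistics easy to check without typing anything.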

Create an interactive program to estimate the size of a population from several random samples.

- create a test population of integers (1 to N)
- loop
  - ask the user for the sample size
  - ask the user how many times (X) to sample the population
  - loop X times
    - use the algorithm below to estimate the size of the population
  - average the estimated population sizes
  - plot the estimated sizes
  - display the results
    - actual size
    - estimated size
    - accuracy percentage
    - size difference
    - other stats
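A minimal sketch of the repeated-sampling steps above, under a few assumptions: the function names `repeated_estimates`, `summarize` and `plot_estimates` are made up for illustration, the standard deviation stands in for "other stats", and "accuracy percentage" is taken to mean 100 minus the percent difference. The interactive prompts are left out so the calculation stands on its own.

```python
import random
import statistics


def estimate_population_size(sample):
    """Estimate N as MAX + AVERAGE_GAP for a sample drawn from 1..N."""
    s = sorted(sample)
    k = len(s)
    sum_of_gaps = (s[0] - 1) + sum(s[i] - s[i - 1] - 1 for i in range(1, k))
    return s[-1] + sum_of_gaps / k


def repeated_estimates(population, sample_size, times, rng=random):
    """Sample the population `times` times and return one estimate per sample."""
    return [
        estimate_population_size(rng.sample(population, sample_size))
        for _ in range(times)
    ]


def summarize(estimates, actual):
    """Average the estimates and compare the average with the actual size."""
    mean = statistics.mean(estimates)
    difference = mean - actual
    return {
        "actual size": actual,
        "estimated size": mean,
        "size difference": difference,
        # Assumed definition: 100% minus the absolute percent difference.
        "accuracy percentage": 100 - abs(difference) / actual * 100,
        "std deviation": statistics.stdev(estimates) if len(estimates) > 1 else 0.0,
    }


def plot_estimates(estimates, actual):
    """Histogram of the estimates with the actual size marked."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    plt.hist(estimates, bins=20)
    plt.axvline(actual, color="red", linestyle="--", label="actual size")
    plt.xlabel("estimated population size")
    plt.ylabel("number of samples")
    plt.legend()
    plt.show()
```

For example, `estimates = repeated_estimates(range(1, 2001), 20, 100)` followed by `summarize(estimates, 2000)` and `plot_estimates(estimates, 2000)` covers the averaging, display and plotting steps.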

matplotlib.pyplot (documentation and examples)

To see more random data generation and pyplot examples, click HERE.

Estimate the population size using several sample sizes. For each sample size, estimate the size of the population repeatedly (100 times, say). Output the data to a file for further analysis.

Using the data in the file:

- Plot the sample sizes vs the accuracy percentage.
- Plot the sample sizes vs the differences between the actual and estimated sizes.
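One way to approach the file step is sketched below, assuming a CSV file with one row per estimate. The file layout, the column names, and the functions `write_trials`, `read_columns` and `plot_file` are illustrative choices, not part of the assignment. "Accuracy percentage" is again assumed to be 100 minus the percent difference.

```python
import csv
import random


def estimate_population_size(sample):
    """Estimate N as MAX + AVERAGE_GAP for a sample drawn from 1..N."""
    s = sorted(sample)
    k = len(s)
    sum_of_gaps = (s[0] - 1) + sum(s[i] - s[i - 1] - 1 for i in range(1, k))
    return s[-1] + sum_of_gaps / k


def write_trials(path, n, sample_sizes, trials=100, rng=random):
    """Write `trials` estimates for each sample size to a CSV file."""
    population = range(1, n + 1)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(
            ["sample_size", "actual", "estimated", "difference", "accuracy_pct"]
        )
        for k in sample_sizes:
            for _ in range(trials):
                estimated = estimate_population_size(rng.sample(population, k))
                difference = estimated - n
                accuracy = 100 - abs(difference) / n * 100  # assumed definition
                writer.writerow([k, n, estimated, difference, accuracy])


def read_columns(path):
    """Return (sample sizes, differences, accuracy percentages) from the file."""
    sizes, differences, accuracies = [], [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            sizes.append(int(row["sample_size"]))
            differences.append(float(row["difference"]))
            accuracies.append(float(row["accuracy_pct"]))
    return sizes, differences, accuracies


def plot_file(path):
    """Scatter plots of accuracy and difference against sample size."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    sizes, differences, accuracies = read_columns(path)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.scatter(sizes, accuracies, s=5)
    ax1.set_xlabel("sample size")
    ax1.set_ylabel("accuracy percentage")
    ax2.scatter(sizes, differences, s=5)
    ax2.set_xlabel("sample size")
    ax2.set_ylabel("estimated - actual")
    plt.show()
```

A call such as `write_trials("estimates.csv", 2000, [5, 10, 20, 50, 100])` followed by `plot_file("estimates.csv")` would produce both plots; the file name is arbitrary.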

For example, assuming

- a population "P" of integers (1 to N) with no gaps
- the population is a Python list or tuple (iterable)
- a random sample "S" from the population "P"
- a sample size of "k" elements
- the sample sorted into ascending order
- the sample is a Python list or tuple (iterable)

- Find the largest/maximum sample value
MAX = max(S) or MAX = S[-1]
- Calculate the sum of the gaps between the elements in the sample
SUM_OF_GAPS = (S_{0} - 1) + (S_{1} - S_{0} - 1) + (S_{2} - S_{1} - 1) + (S_{3} - S_{2} - 1) + ... + (S_{k-1} - S_{k-2} - 1)
- Calculate the average gap size
AVERAGE_GAP = SUM_OF_GAPS / k
- Estimate the size of the population P
SIZE = MAX + AVERAGE_GAP
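The four steps can be written as a short Python function. This is a sketch under the assumptions listed above (integers 1 to N with no gaps, sampled without replacement). The function name `estimate_population_size` is an illustrative choice, and the sample values are the ones shown in the example output at the end of this page.

```python
def estimate_population_size(sample):
    """Estimate N from a sample of the integers 1..N."""
    s = sorted(sample)          # ascending order, as assumed above
    k = len(s)                  # sample size
    largest = s[-1]             # MAX
    # Gap before the first element, then the gaps between neighbours.
    gaps = [s[0] - 1] + [s[i] - s[i - 1] - 1 for i in range(1, k)]
    sum_of_gaps = sum(gaps)     # SUM_OF_GAPS
    average_gap = sum_of_gaps / k
    return largest + average_gap


sample = [40, 44, 84, 320, 362, 423, 591, 628, 742, 907,
          909, 1048, 1154, 1261, 1293, 1329, 1497, 1777, 1962, 1970]
print(estimate_population_size(sample))  # prints 2067.5
```

The sum of the gaps telescopes to MAX - k (here 1970 - 20 = 1950), so the estimate can also be written as MAX + MAX / k - 1.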

The Clever Way to Count Tanks (YouTube)

Your output does not need to look like this.

sample size: 20
population size: 2000
gap between 0 to 40 is 39
gap between 40 to 44 is 3
gap between 44 to 84 is 39
gap between 84 to 320 is 235
gap between 320 to 362 is 41
gap between 362 to 423 is 60
gap between 423 to 591 is 167
gap between 591 to 628 is 36
gap between 628 to 742 is 113
gap between 742 to 907 is 164
...
number of gaps : 20
sum of gaps : 1950
average gap size: 97.5
sample size : 20
sample sorted : [ 40, 44, 84, 320, 362, 423, 591, 628,
742, 907, 909, 1048, 1154, 1261, 1293,
1329, 1497, 1777, 1962, 1970 ]
actual population size is 2000
estimated population size is 2067.5
difference is 67.5
percent difference is 3.4%