Hodgepodge

Introduction

Given a population of an unknown number of items, estimate the size of the population from a random sample using the algorithm below.

Accuracy can be improved by taking several samples and averaging the estimated sizes.

Plotting the size estimates as a histogram may also give you a feel for the data.

Watch the video in the links below.

Project #1

Create an interactive program to estimate the size of a population from a random sample.

create a test population of integers (1 to N)
loop
1. ask the user for the sample size
2. use the algorithm below to estimate the size of the population
3. display the results
  - actual size
  - estimated size
  - size difference
  - accuracy percentage
  - other stats
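
One round of the Project #1 loop might look like the sketch below. The names `run_trial` and `main`, the population size of 2000, and the definition of accuracy (100% minus the percent difference) are assumptions made for illustration; the assignment does not define them.

```python
import random


def estimate_size(sample):
    """MAX plus the average gap, as described in the Algorithm section."""
    s = sorted(sample)
    gaps = [s[0] - 1] + [b - a - 1 for a, b in zip(s, s[1:])]
    return s[-1] + sum(gaps) / len(s)


def run_trial(population, sample_size, rng=random):
    """Sample once and return the figures Project #1 asks to display."""
    sample = rng.sample(population, sample_size)  # sampling without replacement
    actual = len(population)
    estimated = estimate_size(sample)
    difference = estimated - actual
    # Assumption: accuracy is 100% minus the percent difference.
    accuracy = 100 - abs(difference) / actual * 100
    return {
        "actual size": actual,
        "estimated size": estimated,
        "size difference": difference,
        "accuracy percentage": accuracy,
    }


def main():
    """Interactive loop: ask for a sample size, estimate, display, repeat."""
    population = list(range(1, 2001))  # test population of integers 1 to N
    while True:
        text = input("sample size (blank to quit): ").strip()
        if not text:
            break
        if not text.isdigit() or not 1 <= int(text) <= len(population):
            print(f"enter a whole number from 1 to {len(population)}")
            continue
        for name, value in run_trial(population, int(text)).items():
            print(f"{name:>20}: {value:,.1f}")
```

Call `main()` to start the interactive loop. Keeping the calculation in `run_trial` separate from the input handling makes it easy to reuse in the later projects.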

Project #2

Create an interactive program to estimate the size of a population from several random samples.

create a test population of integers (1 to N)
loop
1. ask the user for the sample size
2. ask the user how many times (X) to sample the population
3. Loop X times
  1. use the algorithm below to estimate the size of the population
4. average the estimated population sizes
5. plot the estimated sizes
6. display the results
  - actual size
  - estimated size
  - accuracy percentage
  - size difference
  - other stats

matplotlib.pyplot (documentation and examples)
To see more random data generation and pyplot examples click HERE
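
Steps 3 to 5 of Project #2 could be sketched as follows. The function names, the fixed seed, and the choice of 20 histogram bins are illustrative assumptions. `matplotlib` is imported inside `plot_estimates` so the sampling and averaging part still runs where it is not installed.

```python
import random
import statistics


def estimate_size(sample):
    """MAX plus the average gap, as described in the Algorithm section."""
    s = sorted(sample)
    gaps = [s[0] - 1] + [b - a - 1 for a, b in zip(s, s[1:])]
    return s[-1] + sum(gaps) / len(s)


def repeated_estimates(population, sample_size, repeats, rng=random):
    """Draw `repeats` independent samples and return one estimate per sample."""
    return [estimate_size(rng.sample(population, sample_size))
            for _ in range(repeats)]


def plot_estimates(estimates, actual_size):
    """Show a histogram of the estimates with the actual size and the mean marked."""
    # Imported here so the rest of the sketch works without matplotlib.
    import matplotlib.pyplot as plt

    plt.hist(estimates, bins=20, edgecolor="black")
    plt.axvline(actual_size, color="red", linestyle="--", label="actual size")
    plt.axvline(statistics.mean(estimates), color="green", label="average estimate")
    plt.xlabel("estimated population size")
    plt.ylabel("number of samples")
    plt.title("Distribution of size estimates")
    plt.legend()
    plt.show()


population = list(range(1, 2001))
estimates = repeated_estimates(population, sample_size=20, repeats=200,
                               rng=random.Random(42))
average = statistics.mean(estimates)
print(f"average of {len(estimates)} estimates: {average:.1f}")
print(f"difference from actual size: {average - len(population):.1f}")
# plot_estimates(estimates, len(population))  # uncomment to display the histogram
```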

Project #3

Estimate the population size using several sample sizes. For each sample size, estimate the size of the population repeatedly (for example, 100 times). Output the data to a file for further analysis.

Using the data in the file

Plot the sample sizes vs the accuracy percentage.

Plot the sample sizes vs the differences between the actual and estimated.
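
One possible shape for the Project #3 data file is a CSV with one row per trial. The sketch below is only an illustration: the column names, the helper names `write_trials` and `mean_accuracy_by_sample_size`, and the accuracy definition (100% minus the percent difference) are assumptions, not part of the assignment.

```python
import csv
import random


def estimate_size(sample):
    """MAX plus the average gap, as described in the Algorithm section."""
    s = sorted(sample)
    gaps = [s[0] - 1] + [b - a - 1 for a, b in zip(s, s[1:])]
    return s[-1] + sum(gaps) / len(s)


def write_trials(path, n, sample_sizes, repeats=100, seed=None):
    """Write one CSV row per trial: sample size, estimate, difference, accuracy."""
    rng = random.Random(seed)
    population = list(range(1, n + 1))
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sample_size", "estimated", "difference", "accuracy_pct"])
        for k in sample_sizes:
            for _ in range(repeats):
                estimated = estimate_size(rng.sample(population, k))
                difference = estimated - n
                # Assumption: accuracy is 100% minus the percent difference.
                accuracy = 100 - abs(difference) / n * 100
                writer.writerow([k, estimated, difference, accuracy])


def mean_accuracy_by_sample_size(path):
    """Read the CSV back and average the accuracy for each sample size."""
    totals = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            k = int(row["sample_size"])
            total, count = totals.get(k, (0.0, 0))
            totals[k] = (total + float(row["accuracy_pct"]), count + 1)
    return {k: total / count for k, (total, count) in totals.items()}
```

For example, `write_trials("trials.csv", 2000, [5, 10, 20, 50, 100])` followed by `mean_accuracy_by_sample_size("trials.csv")` gives a dictionary whose keys and values can be plotted as sample size vs accuracy percentage.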

Algorithm (from the video)

For example, assuming

a population "P" of integers (1 to N) with no gaps
the population is a Python list or tuple (iterable)
a random sample "S" from the population "P"
a sample size of "k" elements
the sample sorted into ascending order
the sample is a Python list or tuple (iterable)

Find the largest/maximum sample value
MAX = max(S) or, because the sample is sorted, MAX = S[-1]

Calculate the sum of the gaps between the elements in the sample
SUM_OF_GAPS = (S₀ - 1) + (S₁ - S₀ - 1) + (S₂ - S₁ - 1) + (S₃ - S₂ - 1) + ... + (Sₖ₋₁ - Sₖ₋₂ - 1)
Don't forget there is a gap from the first integer (1) to the first element in the sample (S₀).
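
As a quick check on the arithmetic (worked out here, not taken from the video), the intermediate sample values cancel in pairs, so the sum of the gaps depends only on the largest sample value and the sample size k:

```latex
\begin{aligned}
\text{SUM\_OF\_GAPS} &= (S_0 - 1) + \sum_{i=1}^{k-1} \left( S_i - S_{i-1} - 1 \right) \\
                     &= (S_0 - 1) + (S_{k-1} - S_0) - (k - 1) \\
                     &= S_{k-1} - k
\end{aligned}
```

In the output display example at the end of this page, the largest sample value is 1970 and k is 20, which gives 1970 - 20 = 1950, the same sum of gaps the example reports.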

Calculate the average gap size
AVERAGE_GAP = SUM_OF_GAPS / k

Estimate the size of the population P
SIZE = MAX + AVERAGE_GAP
The maximum sample value plus the average gap size.
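
The steps above can be sketched in Python as follows. The function name `estimate_size` is an illustrative choice, and the sample is the one shown in the output display example on this page.

```python
def estimate_size(sample):
    """Estimate the population size from distinct integers sampled from 1..N."""
    s = sorted(sample)           # the sample in ascending order
    k = len(s)                   # sample size
    if k == 0:
        raise ValueError("sample must not be empty")
    largest = s[-1]              # MAX: last element of the sorted sample

    # Gap from the first integer (1) up to the first sample element.
    gaps = [s[0] - 1]
    # Gaps between neighbours: how many integers lie strictly between them.
    gaps += [b - a - 1 for a, b in zip(s, s[1:])]

    average_gap = sum(gaps) / k
    return largest + average_gap


sample = [40, 44, 84, 320, 362, 423, 591, 628, 742, 907,
          909, 1048, 1154, 1261, 1293, 1329, 1497, 1777, 1962, 1970]
print(estimate_size(sample))  # prints 2067.5
```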

Links

The Clever Way to Count Tanks (YouTube)

Output Display Example

Your output does not need to look like this.

sample size: 20
population size: 2000
gap between 0 to 40 is 39
gap between 40 to 44 is 3
gap between 44 to 84 is 39
gap between 84 to 320 is 235
gap between 320 to 362 is 41
gap between 362 to 423 is 60
gap between 423 to 591 is 167
gap between 591 to 628 is 36
gap between 628 to 742 is 113
gap between 742 to 907 is 164
...
number of gaps  : 20
sum of gaps     : 1950
average gap size: 97.5
sample size     : 20
sample sorted   : [ 40, 44, 84, 320, 362, 423, 591, 628, 742, 907, 909, 1048, 1154, 1261, 1293, 1329, 1497, 1777, 1962, 1970 ]
actual population size is 2000
estimated population size is 2067.5
difference is 67.5
percent difference is 3.4%