Bell Curve

Standard Deviation

Using graphics.py, plot a bell curve.

- plot X from -200 to 200 in increments of 10
- with YMAX = 200
- and a standard deviation (σ) of 50

Vary these values and see what you get.
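
Before drawing anything, it can help to compute the points that will be plotted. The sketch below does only that and makes no graphics.py calls. The function name `bell_curve_points` is hypothetical, and the mean of 0 is an assumption, since the exercise does not state one.

```python
import math

def bell_curve_points(x_min=-200, x_max=200, step=10, ymax=200.0, mean=0.0, sd=50.0):
    """Return (x, y) pairs along the curve y = ymax * e^(-(x-mean)^2 / (2*sd^2))."""
    points = []
    for x in range(x_min, x_max + 1, step):  # + 1 so that x_max itself is included
        y = ymax * math.exp(-((x - mean) ** 2) / (2.0 * sd * sd))
        points.append((x, y))
    return points

points = bell_curve_points()
print(len(points))   # how many points lie between -200 and 200 in steps of 10
print(points[20])    # the point at the peak (x equals the assumed mean of 0)
```

Each pair can then be handed to whatever plotting routine is in use, after converting the curve's coordinates to window coordinates.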

*Are there other Python modules that can plot data?*

Create a file containing population data that follows a bell curve (normal distribution). This file can serve as the population when generating statistics (see project #3).

- generate 1000 to 3000 data points
- data values should be integers
- data values should be one per line
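
One possible way to produce such a file is sketched below, using only the standard library. The function name `write_population_file`, the default mean of 100, and the default standard deviation of 15 are all assumptions chosen for illustration; nothing in the project fixes them.

```python
import random

def write_population_file(path, count=2000, mean=100.0, sd=15.0, seed=None):
    """Write `count` normally distributed integers to `path`, one per line."""
    rng = random.Random(seed)  # passing a seed makes the file reproducible
    with open(path, "w") as f:
        for _ in range(count):
            value = round(rng.gauss(mean, sd))  # round() with one argument returns an int
            f.write(f"{value}\n")
```

Calling `write_population_file("population.txt", count=2000, seed=1)` would write 2000 integers to a file named `population.txt` (a hypothetical name) in the current directory.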

What does random.normalvariate() do?

What does numpy.random.normal() do?

What does scipy.stats.norm() do?
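
A quick way to start answering these questions is to call the functions and look at what comes back. The sketch below compares the standard library's `random.gauss()` with `numpy.random.normal()`; the mean of 100 and standard deviation of 15 are arbitrary values assumed for the example.

```python
import random
import numpy as np

# Standard library: one normally distributed float per call.
single = random.gauss(100.0, 15.0)   # arguments are (mu, sigma)

# NumPy: a whole array of draws in one call.
draws = np.random.normal(loc=100.0, scale=15.0, size=1000)

print(type(single).__name__)
print(draws.shape)
print(draws.mean(), draws.std())     # should land near 100 and 15
```

The printed mean and standard deviation change from run to run because the draws are random.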

Create an interactive program to

- ask the user for a population data file
- read population data from a file (see project #2)
- calculate the population's mean and standard deviation
- display the population's mean and standard deviation
- Loop...
  - ask the user for a sample size
  - create a random sample from the population
  - calculate the sample's mean and standard deviation
  - display the sample's mean and standard deviation
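
The steps above could be organized roughly as follows. This is a minimal sketch, not a finished program: the function names are hypothetical, it assumes the file holds one integer per line, it samples with replacement, and it uses the standard library's `statistics` module instead of numpy.

```python
import random
import statistics

def read_population(path):
    """Read one integer per line, skipping blank lines."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]

def summarize(values):
    """Return (mean, population standard deviation) for a list of numbers."""
    return statistics.mean(values), statistics.pstdev(values)

def main():
    path = input("Population data file: ").strip()
    population = read_population(path)
    mean, sd = summarize(population)
    print(f"population: n={len(population)} mean={mean:.2f} sd={sd:.2f}")
    while True:
        reply = input("Sample size (blank to quit): ").strip()
        if not reply:
            break
        size = int(reply)
        if size < 1:
            print("sample size must be at least 1")
            continue
        sample = random.choices(population, k=size)  # sampling with replacement
        mean, sd = summarize(sample)
        print(f"sample:     n={size} mean={mean:.2f} sd={sd:.2f}")
```

Calling `main()` starts the interactive loop. The sketch does not handle a missing file or non-numeric input; a fuller program would catch those errors.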

Several Python modules can calculate the mean and standard deviation of a list of numbers (see **numpy**).

$$Y = K e^{-\frac{(X - M)^2}{2\sigma^2}}$$

| Symbol | Meaning |
|---|---|
| X, Y | the curve's x, y coordinates (used for plotting, etc.) |
| K | the maximum Y coordinate; used to scale the Y coordinates (height in Y units) |
| M | the curve's mathematical mean (X coordinate of the mean) |
| σ | the curve's standard deviation; determines how fat or skinny the curve is (width in X units) |
| e | Euler's number, an irrational constant (defined in the Python numpy module and other libraries) |

With this equation the user can:

- select which part of the curve to calculate (used for plotting, etc.)
- set the X coordinate of the curve's mean
- set the maximum value of the curve's Y coordinates
- set how fat or skinny the curve is, via the standard deviation (in X units)
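
As a quick check of the equation Y = K·e^(-(X-M)²/(2σ²)), here are three values worked by hand, using the plotting values suggested earlier (K = 200, σ = 50). The results do not depend on where the mean M sits:

$$Y(M) = K e^{0} = K = 200$$

$$Y(M \pm \sigma) = K e^{-1/2} \approx 0.607\,K \approx 121.3$$

$$Y(M \pm 2\sigma) = K e^{-2} \approx 0.135\,K \approx 27.1$$

So the curve drops to roughly 61% of its peak one standard deviation away from the mean, and to roughly 14% two standard deviations away. Doubling σ doubles the X distance at which those heights are reached, which is what makes the curve "fatter".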

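Population mean:

$$m = \frac{1}{n}\sum_{i=1}^{n} x_i$$

- m = the population mean
- n = the size of the population
- x_i = each value from the population

Population standard deviation:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - m)^2}$$

- σ = the population standard deviation
- n = the size of the population
- x_i = each value from the population
- m = the population mean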
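
Assuming the usual population formulas (the mean is the sum of the values divided by n; the standard deviation is the square root of the average squared distance from the mean), the quantities defined above can be computed without any library beyond `math`. The function names and the small data set are illustrative only.

```python
import math

def population_mean(values):
    """m = (sum of all x) / n"""
    return sum(values) / len(values)

def population_sd(values):
    """sigma = sqrt( (sum of (x - m)^2) / n )"""
    m = population_mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_mean(data))  # 5.0
print(population_sd(data))    # 2.0
```

Writing the calculation out once by hand is a useful cross-check on the values a library returns for the same data.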

    # -------------------------------------------------------------------
    # ---- return the bell curve's y coordinate for a given x coordinate
    # ---- x     bell curve x coordinate
    # ---- ymax  bell curve maximum y coordinate (height of the peak)
    # ---- mean  bell curve data arithmetic mean (x coordinate)
    # ---- sd    bell curve data standard deviation
    # -------------------------------------------------------------------
    import numpy as np

    def BellCurveValue(x, ymax, mean, sd):
        y = ymax * np.exp(-pow(x - mean, 2) / (2.0 * sd * sd))
        return y

    # -------------------------------------------------------------------
    # ---- return a list of random samples from a population list,
    # ---- together with the sample's mean and standard deviation
    # ---- poplst - population data list
    # ---- samsiz - size of sample
    # -------------------------------------------------------------------
    # ---- What kind of sampling does your problem require?
    # ---- When a value is drawn from a finite population and is
    # ---- returned to that population before the next draw, sampling
    # ---- is said to be "sampling with replacement". This means a value
    # ---- can be selected more than once. When we sample with
    # ---- replacement, the sample values are independent. For example,
    # ---- if we roll a die or toss a coin more than once, we are
    # ---- "sampling with replacement".
    # -------------------------------------------------------------------
    import numpy as np
    import random

    def RandomSample(poplst, samsiz):
        poplen = len(poplst)
        # ---- collect samsiz samples (with replacement)
        sam = []  # list of samples
        for _ in range(samsiz):
            i = random.randint(0, poplen - 1)  # randint includes both endpoints
            sam.append(poplst[i])
        # ---- calculate mean and standard deviation
        avg = np.mean(sam)  # mean (average)
        std = np.std(sam)   # standard deviation (divides by n, i.e. ddof=0)
        return (sam, avg, std)
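
If the problem calls for sampling without replacement instead, so that no population entry is picked twice, the standard library's `random.sample()` is one option. The sketch below is a hypothetical counterpart to the function above, not part of the original project. Note one deliberate difference: it uses `statistics.stdev()`, which divides by n - 1 (the sample standard deviation), whereas `np.std()` divides by n by default.

```python
import random
import statistics

def random_sample_without_replacement(poplst, samsiz):
    """Return (sample, mean, sample standard deviation).

    Requires 2 <= samsiz <= len(poplst): random.sample() raises ValueError
    if samsiz exceeds the population size, and statistics.stdev() needs at
    least two values.
    """
    sam = random.sample(poplst, samsiz)   # each list position is used at most once
    avg = statistics.mean(sam)
    std = statistics.stdev(sam)           # divides by n - 1
    return sam, avg, std

population = list(range(100))
sample, avg, std = random_sample_without_replacement(population, 10)
print(sample, avg, std)
```

Which divisor is appropriate depends on whether the numbers are treated as the whole population or as a sample used to estimate the population's spread.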

Formula for the Normal Distribution or Bell Curve
*Note: This has a slightly different version of the equation.
Read the article for more information.*

Standard Deviation (Wikipedia)

Normal Distribution (Wikipedia)

Standard deviation (simply explained) (YouTube)

Sometimes suspect, bad, or outlier data can be part of a sample taken from a population. For a method to eliminate outliers, click HERE.