Thank you, John! Your solution is just what I was looking for: it resolves my conflict between preserving all the data and excluding outliers, it is simple enough to implement on a microcontroller, and it seems to work perfectly. I'm guessing that the rolling mean will also smooth out any potential swings due to mechanical resonance.
I know this is an Arduino forum, but I find it quicker and easier to write interactive code in Python, so I've written a crude 'test bed' implementation of your algorithm to get a feel for how it works in practice, with keyboard input simulating real-life sensor readings. I've avoided any libraries or esoteric Python syntax, so this should be easy to adapt to C++/Arduino. I'm posting it here in case it is of interest to the community.
I've used the formula for the sample standard deviation (dividing by n-1) rather than the population SD (dividing by n), which I think is correct in this context, but please let me know if not.
In LaTeX: $SD=\sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{n-1}}$
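To see how much the choice of divisor matters, here is a minimal sketch comparing the two formulas on a small set of made-up readings. The function names `sample_sd` and `population_sd` and the data are illustrative only and are not part of the test bed.

```python
def sample_sd(data):
    # Standard deviation with the n-1 divisor; needs at least 2 readings
    n = len(data)
    mean = sum(data) / n
    return (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5

def population_sd(data):
    # Standard deviation with the n divisor
    n = len(data)
    mean = sum(data) / n
    return (sum((x - mean) ** 2 for x in data) / n) ** 0.5

readings = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up readings, mean = 5
print("Sample SD:     %f" % sample_sd(readings))
print("Population SD: %f" % population_sd(readings))
```

The sample SD is always the larger of the two, and the gap shrinks as the buffer gets longer. One side effect worth noting for the strict "within 1 SD" test used in the listing: with the n-1 divisor and a non-zero SD, at least one reading always falls inside the band, whereas with the n divisor a two-reading buffer such as 1 and 3 puts both readings exactly on the boundary and excludes them both.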
If there are any other algorithms or solutions to the problem in my post, I'd be very interested to hear about them. Thanks again.
#===================================================
# Written for Python 3
def SampleSD(data, bufLen):
    # Sample standard deviation (n-1 divisor), so bufLen must be at least 2
    datasum = 0.0  # NB float accumulator; matters when porting to C++, where integer maths would truncate
    diffsum = 0.0  # NB float accumulator for the squared deviations
    for m in range(0, bufLen):
        datasum += data[m]
    mean = datasum / bufLen
    for m in range(0, bufLen):
        diffsum += (data[m] - mean) ** 2
    sd = (diffsum / (bufLen - 1)) ** 0.5
    print("n: %d\tSum: %f\tMean: %f\tSD: %f" % (bufLen, datasum, mean, sd))
    return mean, sd
#=====================================
def addData(data, bufLen):
    print("\n------------------------------")
    new = float(input("Sensor reading: "))
    # Shift every reading one slot towards the end, dropping the oldest
    for m in range(bufLen - 1, 0, -1):
        data[m] = data[m - 1]
    data[0] = new  # newest reading goes at the front
    return data
#======================================
bufLen = int(input("Buffer Length: "))
print()
data = [0.0] * bufLen  # Declare an array of suitable length in Python
n = 0
# Fill buffer with initial data
while n < bufLen:
    new = float(input("Initial sensor reading: "))
    data[n] = new
    n += 1
print("\nBuffer initialised.")
print("Data: ", data)
while True:
    addData(data, bufLen)
    print("Data: ", data)
    mean, sd = SampleSD(data, bufLen)
    #===========================================
    # Recalculate Mean only using data within range +/- 1 SD of mean
    if sd != 0:  # Error trap: if sd = 0 every reading fails the test below, so newLen = 0 and the division fails
        print("\nChecking for outliers:")
        newSum = 0.0  # NB float accumulator, as above
        newLen = 0
        n = 0
        while n < bufLen:
            if (data[n] < (mean + sd)) and (data[n] > (mean - sd)):
                print("Data [%d] = %g Included" % (n, data[n]))
                newSum += data[n]  # sum of data which are within +/- 1 SD of original mean
                newLen += 1  # number of data which are within +/- 1 SD of original mean
            else:
                print("Data [%d] = %g Excluded: Out of range" % (n, data[n]))
            n += 1
        CorrMean = newSum / newLen
        print("New n: %d\tNew Sum: %f\tCorrected Mean: %f" % (newLen, newSum, CorrMean))
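For anyone who wants to try the outlier step without typing readings in by hand, the same calculation can be sketched as a single function. This is only an illustration under a couple of assumptions: the name `corrected_mean` and the sample buffer are made up, and falling back to the plain mean when the SD is zero is my own choice (the listing above simply skips the recalculation in that case).

```python
def corrected_mean(data):
    # Mean of the readings lying strictly within +/- 1 sample SD of the raw mean.
    # Needs at least 2 readings because of the n-1 divisor.
    n = len(data)
    mean = sum(data) / n
    sd = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5
    if sd == 0:
        # Assumption: all readings are identical, so the raw mean is the answer
        return mean
    newSum = 0.0
    newLen = 0
    for x in data:
        if (x < mean + sd) and (x > mean - sd):
            newSum += x
            newLen += 1
    return newSum / newLen

readings = [20, 21, 19, 20, 95]  # hypothetical buffer; 95 plays the glitch
print("Raw mean:       %f" % (sum(readings) / len(readings)))
print("Corrected mean: %f" % corrected_mean(readings))
```

Here the raw mean is pulled up to 35 by the single glitch, while the corrected mean uses only the four readings near 20.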