Dave

Thank you John! Your solution is what I was looking for: it resolves my conflict between preserving all the data and excluding the skewing effects of outliers; is simple enough to implement on a microcontroller; and seems to work perfectly. I'm guessing that the rolling mean will also smooth out any potential swings due to mechanical resonance.
I know this is an Arduino forum, but I find it quicker and easier to write interactive code in Python, so I've written a crude 'test bed' implementation of your algorithm to get a feel for how it works in practice, with keyboard input simulating real-life sensor readings. I've avoided any libraries or esoteric Python syntax, so it should be easy to adapt to C++/Arduino. I'm posting it here in case it is of interest to the community.
I've used the formula for the sample standard deviation (dividing by n-1) rather than the population SD (dividing by n), which I think is correct in this context, but please let me know if not:
in LaTeX: $SD=\sqrt{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{n-1}}$
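
As a quick check of the n-1 versus n choice, here is a minimal sketch that computes both versions on a handful of made-up readings. The function names `sample_sd` and `population_sd` and the readings themselves are only for illustration and are not part of the test bed.

```python
def sample_sd(data):
    # Sample standard deviation: divides by n - 1, so it needs at least 2 readings
    n = len(data)
    mean = sum(data) / float(n)
    diffsum = 0.0
    for x in data:
        diffsum += (x - mean) ** 2
    return (diffsum / (n - 1)) ** 0.5

def population_sd(data):
    # Population standard deviation: same sum of squares, but divides by n
    n = len(data)
    mean = sum(data) / float(n)
    diffsum = 0.0
    for x in data:
        diffsum += (x - mean) ** 2
    return (diffsum / n) ** 0.5

readings = [10, 12, 11, 13, 50]  # made-up readings; 50 plays the outlier
print("Sample SD:     %f" % sample_sd(readings))
print("Population SD: %f" % population_sd(readings))
```

With only a few readings in the buffer, the n-1 version comes out noticeably larger, so the +/- 1 SD window is a little wider than it would be with the population formula.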

If there are any other algorithms or solutions to my problem, I'd be very interested to know. Thanks again.

#===================================================
# Return the mean and the sample standard deviation (n-1) of the buffer.
# bufLen must be at least 2, otherwise (bufLen-1) is zero.
def SampleSD(data,bufLen):
    datasum = 0.0 # NB declare as float to prevent integer rounding error
    diffsum = 0.0 # NB declare as float to prevent integer rounding error
    for m in range(bufLen):
        datasum += data[m]
    mean = datasum / bufLen
    for m in range(bufLen):
        diffsum += (data[m] - mean) ** 2
    sd = (diffsum / (bufLen-1)) ** 0.5
    print("n: %d\tSum: %f\tMean: %f\tSD: %f" % (bufLen,datasum,mean,sd))
    return mean,sd

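#=====================================
# Shift the buffer along by one place and store a new reading at data[0].
def addData(data,bufLen):
    print("\n------------------------------")
    new = float(input("Sensor reading: ")) # float() also converts the text returned by Python 3's input()
    for m in range(bufLen-1,0,-1):
        data[m]=data[m-1]
    data[0]=new
    return data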

#======================================
bufLen = int(input("Buffer Length: ")) # must be at least 2 for the (n-1) sample SD
print("")
data = [0.0] * bufLen # Declare an array of suitable length in Python
n = 0

# Fill buffer with initial data
while n < bufLen:
    new = float(input("Initial sensor reading: "))
    data[n] = new
    n += 1
print("\nBuffer initialised.")
print("Data: %s" % data)

while True:
    addData(data,bufLen)
    print("Data: %s" % data)
    mean,sd = SampleSD(data,bufLen)
    #===========================================
    # Recalculate Mean only using data within range +/- 1 SD of mean
    if sd != 0: # If sd = 0 all readings are identical and none would pass the strict tests below, leaving newLen = 0
        print("\nChecking for outliers:")
        newSum = 0.0 # NB declare as float to prevent integer rounding error
        newLen = 0
        n = 0
        while n < bufLen:
            if ((data[n] < (mean + sd)) and (data[n] > (mean - sd))):
                print("Data [%d] = %g Included" % (n, data[n]))
                newSum += data[n] # sum of data which are within +/- 1SD of original mean
                newLen += 1 # number of data which are within +/- 1SD of original mean
            else:
                print("Data [%d] = %g Excluded: Out of range" % (n, data[n]))
            n += 1
        CorrMean = newSum / newLen
    else:
        # No spread at all, so there is nothing to exclude: keep every reading
        newLen = bufLen
        newSum = mean * bufLen
        CorrMean = mean
    print("New n: %d\tNew Sum: %f\tCorrected Mean: %f" % (newLen,newSum,CorrMean))
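
To try the outlier step without typing readings at the keyboard, the same +/- 1 SD filter can be wrapped in a function and fed fixed lists. This is only a sketch under the same assumptions as the test bed (sample SD, strict comparisons, at least two readings); the name `corrected_mean` and the example readings are hypothetical.

```python
def corrected_mean(data):
    # Mean of the readings that lie strictly within +/- 1 sample SD of the plain mean
    n = len(data)  # assumed to be at least 2, because the sample SD divides by n - 1
    mean = sum(data) / float(n)
    diffsum = 0.0
    for x in data:
        diffsum += (x - mean) ** 2
    sd = (diffsum / (n - 1)) ** 0.5
    if sd == 0:
        # All readings identical: nothing to exclude
        return mean
    new_sum = 0.0
    new_len = 0
    for x in data:
        if (x < mean + sd) and (x > mean - sd):
            new_sum += x
            new_len += 1
    return new_sum / new_len

window = [10, 12, 11, 13, 50]  # made-up buffer contents; 50 is the outlier
print("Plain mean:     %f" % (sum(window) / float(len(window))))
print("Corrected mean: %f" % corrected_mean(window))
```

For this window the plain mean is 19.2, while the corrected mean is 11.5 because the reading of 50 falls outside the +/- 1 SD band and is left out of the second average.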