1

At the request of a user, I have rewritten this question with more information and tried to clarify it as much as I possibly can.

I have code that reads a range into an array. Many calculations are performed. The resulting array contains an ID and two values:

ID   Seq   Value
a    1     100
a    2     150
a    3     200
b    1     10
b    2     10
b    3     10

However, the calculation step uses Redim Preserve so I have to store the array as TestArray(1 To 3, 1 To 6).

I need to filter the array for duplicate ID's.

If there is no duplicate, I need to store ID, seq and value.

If there is a duplicate ID, I need to store the ID, seq and value where value is the maximum value for a given ID.

If there is a duplicate ID and there are multiple instances of a maximum value, I want to keep the ID, date and value where the value is the maximum value for a given ID and seq is the minimum seq for a given ID.

Basically, for each ID I want the maximum value and if there are multiple maximums, default to the earliest sequence number.

This is a sample of code that shows how the array is structured and what I need the results to look like.

Sub TestArray()

  Dim TestArray() As Variant
  Dim DesiredResults() As Variant

  TestArray = Array(Array("a", "a", "a", "b", "b", "b"), _
    Array(1, 2, 3, 1, 2, 3), _
    Array(100, 150, 200, 10, 10, 10))
  DesiredResults = Array(Array("a", "b"), Array(3, 1), Array(200, 10))

End Sub

Is there some way to loop through the array and find duplicates and then compare them? I could do this easily in SQL but I am struggling in VBA.

11
  • it would be easier to check for duplicates while its still a range in a sheet Commented Aug 27, 2013 at 19:08
  • is that an option to it first? thats what i meant Commented Aug 27, 2013 at 19:09
  • No unfortunatley not. I have to pull the raw data into VBA and do a lot of processing because it's an unreliable database output that I can't control. Commented Aug 27, 2013 at 19:11
  • this is certainly possible and I will post a solution later (I have to pop out now) but can you please confirm whether or not the data is meant to be in a jagged array? The structure you describe is not a jagged array, but the test data of TestArray is. Commented Aug 27, 2013 at 19:15
  • 1
    To find/eliminate duplicates you can use a Dictionary object as explained here: stackoverflow.com/questions/915317/… It allows you to "Add" all IDs as "Keys". In the end, you get a collection of unique keys. Commented Aug 27, 2013 at 19:25

1 Answer 1

5

I kept my test code in so you can inspect the results and play around. I commented why certain things are being done - hope it helps.

The return array is base 1, in the format (column, row). You can of course change this.

Option Explicit

Public Sub TestProcess()

    Dim testResults
    testResults = GetProcessedArray(getTestArray)
    With ActiveSheet
        .Range( _
            .Cells(1, 1), _
            .Cells( _
                1 + UBound(testResults, 1) - LBound(testResults, 1), _
                1 + UBound(testResults, 2) - LBound(testResults, 2))) _
            .Value = testResults
    End With

End Sub

Public Function GetProcessedArray(dataArr As Variant) As Variant

    Dim c As Collection
    Dim resultsArr
    Dim oldResult, key As String
    Dim i As Long, j As Long, lb1 As Long

    Set c = New Collection
    lb1 = LBound(dataArr, 1) 'just cache the value of the lower bound as we use it a lot

    For j = LBound(dataArr, 2) To UBound(dataArr, 2)

        'extract current result for the ID, if any
        '(note that if the ID's aren't necessarily the same type you can add
        ' the key with  prefix of VarType or TypeName as something like key = CStr(VarType(x)) & "|" & CStr(x))
        key = CStr(dataArr(lb1 + 0, j))
        On Error Resume Next
        oldResult = c(key)

        If Err.Number = 5 Then 'error number if record does not exist

            On Error GoTo 0
            'record doesn't exist so add it
            c.Add Array( _
                key, _
                dataArr(lb1 + 1, j), _
                dataArr(lb1 + 2, j)), _
                key

        Else

            On Error GoTo 0
            'test if new value is greater than old value
            If dataArr(lb1 + 2, j) > oldResult(2) Then
                'we want the new one, so:
                'Collection.Item reference is immutable so remove the record
                c.Remove key
                'and Add the new one
                c.Add Array( _
                    key, _
                    dataArr(lb1 + 1, j), _
                    dataArr(lb1 + 2, j)), _
                    key
            ElseIf dataArr(lb1 + 2, j) = oldResult(2) Then
                'test if new sequence number is less than old sequence number
                If dataArr(lb1 + 1, j) < oldResult(1) Then
                    'we want the new one, so:
                    'Collection.Item reference is immutable so remove the record
                    c.Remove key
                    'and Add the new one
                    c.Add Array( _
                        key, _
                        dataArr(lb1 + 1, j), _
                        dataArr(lb1 + 2, j)), _
                        key
                End If
            End If

        End If

    Next j

    'process results into the desired array format
    ReDim resultsArr(1 To 3, 1 To c.Count)
    For j = 1 To c.Count
        For i = 1 To 3
            resultsArr(i, j) = c(j - LBound(resultsArr, 2) + 1)(i - LBound(resultsArr, 1))
        Next i
    Next j

    GetProcessedArray = resultsArr

 End Function

Private Function getTestArray()

  Dim testArray() As Variant
  Dim flatArray
  Dim i As Long
  ReDim flatArray(0 To 2, 0 To 5)

  testArray = Array( _
    Array("a", "a", "a", "b", "b", "b"), _
    Array(1, 2, 3, 1, 2, 3), _
    Array(100, 150, 200, 10, 10, 10))

  For i = 0 To 5

    flatArray(0, i) = testArray(0)(i)
    flatArray(1, i) = testArray(1)(i)
    flatArray(2, i) = testArray(2)(i)

  Next i

  getTestArray = flatArray

End Function
Sign up to request clarification or add additional context in comments.

3 Comments

This is insane, THANK YOU! I will learn a lot from breaking down how this works and understanding it.
@JeffreyKramer you're welcome - let me know if it doesn't work on the full data set! I did make an assumption that all ID's are strings (or at least that the default string conversion is ok), and the odd other small assumption (like an err.number <>5 means the oldResult was retrieved ok). But it should be alright...
Yes, it's working fine for the whole set. This is really interesting to know, because this problem has vexed me for a while. I could always filter for duplicates but I never managed to connect how to do it in conjunction with other criteria. This isn't something that comes up often but it's a huge hassle when it does and this will handle it great. I can also expand upon it to take care of a lot of other stuff.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.