0

This following code works perfectly fine on a small data set. However, the GetMatchCount and BuildMatchArrary are very sluggish on large result. Can anyone recommend any different approach so save processing time? Would it be better to write the array to a file? Are lists just generally slow and not the best option?

using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Client
{
    public int Id;

    public string FirstName
    {
        get
        {
            var firstName = //<call to get from database via Id>

            return firstName;
        }
    }

    public string MiddleName
    {
        get
        {
            var middleName =  //<call to get from database via Id>

            return middleName;
        }
    }

    public string LastName
    {
        get
        {
            var lastName =  //<call to get from database via Id>

            return lastName;
        }
    }

    public string FullName
    {
        get
        {
            return FirstName + " " + MiddleName + " " + LastName;
        }
    }

    public int GetMatchCount(IEnumerable<string> clientFirstNames, IEnumerable<string> clientMiddleNames, IEnumerable<string> clientLastNames)
    {
        var clientFullNames = BuildMatchArray(clientFirstNames, clientMiddleNames, clientLastNames);
        return clientFullNames.Count(x => x == FullName);
    }


    public string[] BuildMatchArray(IEnumerable<string> clientFirstNames, IEnumerable<string> clientMiddleNames, IEnumerable<string> clientLastNames)
    {
        Debug.Assert(clientFirstNames.Count() == clientMiddleNames.Count() && clientMiddleNames.Count() == clientLastNames.Count());

        var clientFullNames = new List<string>();
        for (int i = 0; i < clientFirstNames.Count(); i++)
        {
            clientFullNames.Add(clientFirstNames.ElementAt(i) + " " + clientMiddleNames.ElementAt(i) + " " + clientLastNames.ElementAt(i));
        }
        return clientFullNames.ToArray();
    }
}
3
  • Can you tell us more about the call to the database? Are your first middle and last names stored in separate tables for some reason? Is each property indexed? As it is, even getting one full name looks like it requires 3 select statements in your database. Could that be trimmed? Commented Apr 10, 2018 at 23:43
  • 1
    seems like it needs triple zip stackoverflow.com/questions/10297124/…, and a Lookup Commented Apr 10, 2018 at 23:54
  • Yes, you could also use the proposed ZipThree method in one of the answers on that post. firsts.ZipThree(middles, lasts, (f, m, l) => $"{f} {m} {l}"). This is probably a better and more reusable solution than my answer. Commented Apr 11, 2018 at 0:34

1 Answer 1

1

Where are you getting these strings? If you are using lazy sequences, every time you call Count() you will have to iterate the entire sequence to count how many objects are in the sequence. If the IEnumerable<T> is really a T[] or List<T>, then Count() is optimized to just call the Length or Count property, which isn't expensive. Similarly, ElementAt is also very inefficient and iterates the collection. So with an in-memory lazy sequence this performance will be bad, but if you are streaming results from SQL or an external source, it will be really bad or possibly even incorrect.

A more performant implementation of of BuildMatchArray would be like this:

public IEnumerable<string> ZipNames(IEnumerable<string> firsts, 
    IEnumerable<string> middles, IEnumerable<string> lasts) 
{
    using(var e1 = firsts.GetEnumerator())
    using(var e2 = middles.GetEnumerator())
    using(var e3 = lasts.GetEnumerator())
    {
        var stop = false;

        while (!stop)
        {
            var hasNext1 = e1.MoveNext();
            var hasNext2 = e2.MoveNext();
            var hasNext3 = e3.MoveNext();

            if (hasNext1 && hasNext2 && hasNext3) 
            {
                yield return $"{e1.Current} {e2.Current} {e3.Current}";
            }
            else
            {
                stop = true;
                Debug.Assert(!(hasNext1 || hasNext2 || hasNext3));
            }
        }
    }
}

This requires only one iteration of each input collection, and doesn't need to copy elements to a new List<T>. Another point to note, is that List<T> starts with capacity for 4 elements, and when it fills up, it copies all elements to a new list with double capacity. So if you have a large sequence, you will copy many times.

This implementation is very similar to System.Linq.Enumerable.Zip

In your case, you also shouldn't do a ToArray to your sequence. This will require another copying, and can potentially be a huge array. If you are only sending that array to .Count(x => x == y), then keeping a lazy IEnumerable would be better, because Count operates lazily for lazy sequences and streams data in and counts elements as it sees them, without ever requiring the full collection to be in memory.

See IEnumerable vs List - What to Use? How do they work?

Sign up to request clarification or add additional context in comments.

7 Comments

Shouldn't there be some kind of loop..?
Yes @ThomasHilbert, my mistake, it should be a while not an if/else
Iterators should be Move()d inside that loop :-)
You wouldn't want to open a using block inside a loop, you need to call MoveNext and Current in the loop, in order to get a different result each time.
Well currently neither the iterator positions nor hasNext[1,2,3] will change and the loop will just spin forever.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.