group by elements of array

Question

Wednesday, October 13, 2010 11:36 PM

Hi,

We have an array of PurchaseOrder objects: PurchaseOrder[] purchaseorders = new PurchaseOrder[Size].

Each PurchaseOrder object has an array member called ProductID.

public class PurchaseOrder
{
    public int[] ProductID;
}

The number of elements in ProductID can be any number greater than 0. All PurchaseOrder objects contained in the same PurchaseOrder[] array have the same number of elements in ProductID.

e.g.

purchaseOrder1.ProductID = new int[]{1,2,3,4,5};

purchaseOrder2.ProductID = new int[]{2,3,4,5,6};

purchaseOrder3.ProductID = new int[]{3,4,5,6,7};

purchaseOrder4.ProductID = new int[]{4,5,6,7,8};

purchaseOrder5.ProductID = new int[]{5,6,7,8,9};

purchaseOrder6.ProductID = new int[]{2,3,4,5,6};

purchaseOrder7.ProductID = new int[]{1,2,3,4,5};

Would it be possible to use LINQ to produce a jagged array int[][] that holds each distinct ProductID array once, i.e. to group the orders by all the product ids in ProductID?

The 7 purchase orders above, placed in an array, should produce the result below:

int[][]{

                 {1,2,3,4,5},

                 {2,3,4,5,6},

                 {3,4,5,6,7},

                 {4,5,6,7,8},

                 {5,6,7,8,9}

}

because purchaseOrder1 and purchaseOrder7, as well as purchaseOrder2 and purchaseOrder6, contain the same array of product ids.

The code below selects every ProductID array. How could I group them by all the product ids in each array? If LINQ cannot do it, what would be the best way?

var productids = (from p in purchaseorders

      select p.ProductID).ToArray();

Thanks

All replies (6)

Thursday, October 14, 2010 11:39 AM ✅Answered

A numeric key is probably faster than a string key, both to create
and to use in Enumerable.GroupBy() or any other LINQ method that
comes into play.

Also, if the key is a lambda expression rather than a simple
field or property of a class, its body runs for every element each
time the query is enumerated, which also hurts performance.

So an efficient algorithm that builds a numeric key
(a hash code) from the elements of the ProductID array
will most likely speed up your code.

In other words:

public class PurchaseOrder
{
    private int[] _productID;
    private int _hashCode;

    // Returns the key cached by the ProductID setter.
    public override int GetHashCode() { return _hashCode; }

    public int[] ProductID
    {
        get { return _productID; }
        set
        {
            _productID = value;
            // Computed once per assignment, not once per query.
            _hashCode = /* your algorithm goes here */;
        }
    }
}


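// Group the orders by the cached key. Orders whose keys collide land in
// the same group, so the key must be unique per distinct array.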
var groupedIds = purchaseOrders
   .GroupBy(p => p.GetHashCode())
   .Select(g => g.First().ProductID)
   .ToArray();
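
One way to fill in the hash placeholder and guard against collisions is sketched below. It is only an illustration, not Chris's actual algorithm: the class name CachedPurchaseOrder, the helper PurchaseOrderGrouping, and the 17/31 combine are all assumptions. The idea is to cache an order-sensitive hash in the setter and let Equals confirm the match element by element, so two different arrays that happen to share a hash are not merged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of PurchaseOrder: the hash is computed once in the setter.
public class CachedPurchaseOrder
{
    private int[] _productID = new int[0];
    private int _hashCode;

    public int[] ProductID
    {
        get { return _productID; }
        set
        {
            _productID = value ?? new int[0];
            // Order-sensitive combine; 17 and 31 are conventional choices, not requirements.
            int hash = 17;
            unchecked
            {
                foreach (int id in _productID)
                    hash = hash * 31 + id;
            }
            _hashCode = hash;
        }
    }

    public override int GetHashCode() { return _hashCode; }

    // Equal hash codes only suggest equality; confirm by comparing the elements.
    public override bool Equals(object obj)
    {
        var other = obj as CachedPurchaseOrder;
        return other != null
            && _hashCode == other._hashCode
            && _productID.SequenceEqual(other._productID);
    }
}

public static class PurchaseOrderGrouping
{
    // Groups by the order itself, so GroupBy uses the Equals/GetHashCode overrides above.
    public static int[][] DistinctProductIds(IEnumerable<CachedPurchaseOrder> orders)
    {
        return orders
            .GroupBy(o => o)
            .Select(g => g.Key.ProductID)
            .ToArray();
    }
}
```

Note that the cached hash goes stale if the array is modified in place after it has been assigned, so this only works when ProductID is treated as read-only once set.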

Chris


Thursday, October 14, 2010 3:48 PM ✅Answered

Hi Reed,

Thanks for your reply.

In terms of performance, will the IEqualityComparer<int[]> method perform well? What would be the best way to implement the GetHashCode function?

We need to run it on a huge amount of data.

The best GetHashCode implementation depends on how many elements the arrays hold.  What I posted isn't bad, though it produces the same hash for different orderings of the same numbers.  If you have long arrays, hash only a subset of the elements, since the hash needs to be fast.

If you're talking about a very large dataset, you might want to consider putting the data into a database and doing the query there.  LINQ's GroupBy and Distinct both require some hashing (though the comparer from my earlier reply will have quite a bit less overhead than string joining + grouping).
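
A rough sketch of both suggestions (an order-sensitive hash, and hashing only part of a long array) might look like the following. The class name SampledIntArrayComparer and the maxHashed parameter are made up for illustration, and the 17/31 combine is just a common convention, not something this thread prescribes. Equality still compares every element, so a cheaper hash only costs extra collisions, never wrong results.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: hashes at most maxHashed leading elements, compares all of them.
public sealed class SampledIntArrayComparer : IEqualityComparer<int[]>
{
    private readonly int _maxHashed;

    public SampledIntArrayComparer(int maxHashed)
    {
        _maxHashed = Math.Max(1, maxHashed);
    }

    public bool Equals(int[] x, int[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
            if (x[i] != y[i]) return false;
        return true;
    }

    public int GetHashCode(int[] arr)
    {
        if (arr == null) return 0;
        int count = Math.Min(arr.Length, _maxHashed);
        unchecked
        {
            // Seed with the length, then fold in elements so that order matters.
            int hash = 17 * 31 + arr.Length;
            for (int i = 0; i < count; i++)
                hash = hash * 31 + arr[i];
            return hash;
        }
    }
}
```

It plugs into Distinct or GroupBy the same way as any other IEqualityComparer<int[]>, e.g. Distinct(new SampledIntArrayComparer(8)). Whether a prefix is a good sample depends on the data: if many arrays share their first few ids, sampling elements spread across the array would collide less.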

 

 

Reed Copsey, Jr. - http://reedcopsey.com


Wednesday, October 13, 2010 11:53 PM

You would need an IEqualityComparer<int[]> to do this:

 

using System.Collections.Generic;
using System.Linq;

public class IntArrayEqualityComparer : IEqualityComparer<int[]>
{
  // Two arrays are equal when they hold the same values in the same order.
  public bool Equals(int[] array1, int[] array2)
  {
     return array1.SequenceEqual(array2);
  }

  public int GetHashCode(int[] arr)
  {
    // Implement some hash - this isn't great, but it'll work
    int hash = arr.Length;
    foreach(var i in arr)
      hash = hash ^ i;
    return hash;
  }
}

 

With this method, you could then do:

 

    int[][] productIds = purchaseorders.Select(p => p.ProductID).Distinct(new IntArrayEqualityComparer()).ToArray();
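
If the orders behind each distinct array are also needed, the same comparer can be passed to GroupBy instead of Distinct. The sketch below repeats the comparer from this reply so that it compiles on its own; the OrderGrouping class and the CountByProductIds method are hypothetical names, and ProductID is assumed to be publicly accessible.

```csharp
using System.Collections.Generic;
using System.Linq;

public class PurchaseOrder
{
    public int[] ProductID;
}

// Same comparer as in the reply, repeated so the sketch is self-contained.
public class IntArrayEqualityComparer : IEqualityComparer<int[]>
{
    public bool Equals(int[] array1, int[] array2)
    {
        return array1.SequenceEqual(array2);
    }

    public int GetHashCode(int[] arr)
    {
        int hash = arr.Length;
        foreach (var i in arr)
            hash = hash ^ i;
        return hash;
    }
}

public static class OrderGrouping
{
    // Maps each distinct ProductID array to the number of orders that share it.
    public static KeyValuePair<int[], int>[] CountByProductIds(PurchaseOrder[] purchaseorders)
    {
        return purchaseorders
            .GroupBy(p => p.ProductID, new IntArrayEqualityComparer())
            .Select(g => new KeyValuePair<int[], int>(g.Key, g.Count()))
            .ToArray();
    }
}
```

Each group still exposes its PurchaseOrder objects, so replacing g.Count() with g.ToArray() would return the orders themselves rather than a count.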

 

Reed Copsey, Jr. - http://reedcopsey.com


Thursday, October 14, 2010 2:42 AM

Hi

You could join the int[] into a string and use that string as the key to group by:

 PurchaseOrder[] purchaseOrders = {
                           new PurchaseOrder{ProductID = new []{1,2,3,4,5}},
                           new PurchaseOrder{ProductID = new []{2,3,4,5,6}},
                           new PurchaseOrder{ProductID = new []{3,4,5,6,7}},
                           new PurchaseOrder{ProductID = new []{4,5,6,7,8}},
                           new PurchaseOrder{ProductID = new []{5,6,7,8,9}},
                           new PurchaseOrder{ProductID = new []{2,3,4,5,6}},
                           new PurchaseOrder{ProductID = new []{1,2,3,4,5}},
                       };

var groupedIds = purchaseOrders
          .GroupBy(p => string.Join(",", p.ProductID))
          .Select(g => g.First().ProductID)
          .ToArray();

If this is exercised on huge data amounts then building some kind
of unique int32 / int64 out of the productid - which is calculated
only once per instance - and that serves as a key to group by -
is recommended.

Chris


Thursday, October 14, 2010 8:17 AM

Hi Chris,

Thanks for your reply.

The query will be running on a huge amount of data.

Could you please explain in more detail: **"If this is exercised on huge data amounts then building some kind of unique int32 / int64 out of the productid - which is calculated only once per instance - and that serves as a key to group by - is recommended."**?

Thanks


Thursday, October 14, 2010 8:19 AM

Hi Reed,

Thanks for your reply.

In terms of performance, will the IEqualityComparer<int[]> method perform well? What would be the best way to implement the GetHashCode function?

We need to run it on a huge amount of data.