Question
Wednesday, October 13, 2010 11:36 PM
Hi,
We have an array of PurchaseOrder objects: PurchaseOrder[] purchaseorders = new PurchaseOrder[Size];
Each PurchaseOrder object has an array member called ProductID.
public class PurchaseOrder
{
public int[] ProductID;
}
The ProductID array can have any number of elements greater than 0. Within a single PurchaseOrder[] array, every ProductID array has the same number of elements.
e.g.
purchaseOrder1.ProductID = new int[]{1,2,3,4,5};
purchaseOrder2.ProductID = new int[]{2,3,4,5,6};
purchaseOrder3.ProductID = new int[]{3,4,5,6,7};
purchaseOrder4.ProductID = new int[]{4,5,6,7,8};
purchaseOrder5.ProductID = new int[]{5,6,7,8,9};
purchaseOrder6.ProductID = new int[]{2,3,4,5,6};
purchaseOrder7.ProductID = new int[]{1,2,3,4,5};
Would it be possible to use LINQ to produce a jagged array (int[][]) of the distinct ProductID arrays, grouping on all the product IDs in each array?
The 7 purchase orders above should produce the result below:
new int[][] {
    new[] {1,2,3,4,5},
    new[] {2,3,4,5,6},
    new[] {3,4,5,6,7},
    new[] {4,5,6,7,8},
    new[] {5,6,7,8,9}
}
since purchaseOrder1 and purchaseOrder7, and likewise purchaseOrder2 and purchaseOrder6, contain the same array of product IDs.
The code below selects every ProductID array. How can I group them by all the product IDs in each array? If LINQ cannot do it, what is the best way?
var productids = (from p in purchaseorders
select p.ProductID).ToArray();
Thanks
All replies (6)
Thursday, October 14, 2010 11:39 AM ✅Answered
A numeric key is probably faster than a string key, both to create
and to use in Enumerable.GroupBy() or any other LINQ
method that comes into play.
Also, if the key is a lambda expression rather than a simple
field/property of a class, its body is executed for every element
each time the query runs, which also has a negative performance impact.
So an efficient algorithm that builds a numeric key
(a hash code) from the ProductID array elements, computed once,
will most likely speed up your code.
In other words:
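using System.Linq;
public class PurchaseOrder
{
    private int[] _productID;
    private int _hashCode;
    public int[] ProductID
    {
        get { return _productID; }
        set
        {
            _productID = value;
            _hashCode = ComputeHash(value); // calculated only once per assignment
        }
    }
    public override int GetHashCode() { return _hashCode; }
    // Different arrays can share a hash code, so equality must still compare the IDs.
    public override bool Equals(object obj)
    {
        var other = obj as PurchaseOrder;
        return other != null && other._hashCode == _hashCode
            && other._productID.SequenceEqual(_productID);
    }
    private static int ComputeHash(int[] ids)
    {
        int hash = 17;
        foreach (int id in ids)
            hash = unchecked(hash * 31 + id); // your algorithm goes here; this is one simple option
        return hash;
    }
}

//group the orders: GroupBy(p => p) uses the cached GetHashCode plus Equals
var groupedIds = purchaseOrders
    .GroupBy(p => p)
    .Select(g => g.First().ProductID)
    .ToArray();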
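Grouping on an int hash alone treats two orders as duplicates whenever their arrays happen to share a hash code. The sketch below is illustrative only: it uses a hypothetical, deliberately weak XorHash (not the algorithm left as a placeholder above), works on plain int[] arrays instead of PurchaseOrder objects, and relies on C# 9 top-level statements. It shows two different arrays collapsing into one group when the int is the whole key, and staying apart once the int is used only as a bucket and the elements inside each bucket are compared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[][] orders = { new[] { 1, 2, 3 }, new[] { 3, 2, 1 } };

// Hypothetical weak hash for illustration: XOR ignores order, so both arrays hash to 3.
static int XorHash(int[] ids) => ids.Aggregate(ids.Length, (h, id) => h ^ id);

// Grouping on the int alone merges two different arrays into a single group.
int groupsByHashOnly = orders.GroupBy(ids => XorHash(ids)).Count();

// Using the int only as a cheap bucket, then comparing elements inside each bucket,
// keeps genuinely different arrays apart.
int[][] distinctIds = orders
    .GroupBy(ids => XorHash(ids))
    .SelectMany(bucket => bucket.Aggregate(
        new List<int[]>(),
        (kept, ids) =>
        {
            if (!kept.Any(k => k.SequenceEqual(ids)))
                kept.Add(ids);
            return kept;
        }))
    .ToArray();

Console.WriteLine(groupsByHashOnly);   // 1
Console.WriteLine(distinctIds.Length); // 2
```

This bucket-then-compare pattern is what a HashSet or GroupBy does internally when a type supplies both GetHashCode and Equals, which is why the hash should never be the only test for equality.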
Chris
Thursday, October 14, 2010 3:48 PM ✅Answered
Hi Reed,
Thanks for your reply.
In terms of performance, will the IEqualityComparer<int[]> approach perform well? What is the best way to implement the GetHashCode function?
We need to run it on a huge amount of data.
The GetHashCode implementation will depend on how many elements are in the array. What I posted isn't bad, though it returns the same hash for different orderings of the same numbers. If you have long arrays, you should hash only a subset of the elements, since you want GetHashCode to be fast.
If you're talking about a very large dataset, you might want to consider putting the data into a database and doing the query there. LINQ's GroupBy and Distinct both require some hashing (though my comparer has quite a bit less overhead than string joining plus grouping).
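As a minimal sketch of both points, assuming a hypothetical helper HashIds and an arbitrary cap of 16 elements chosen only for illustration: multiplying by a prime at each step makes the hash order-sensitive, and the cap keeps it cheap for long arrays. Because arrays that differ only past the cap get the same hash, Equals (SequenceEqual in the comparer) still has to check every element. It uses C# 9 top-level statements.

```csharp
using System;

// Hypothetical helper: an order-sensitive hash over at most maxElements leading IDs.
static int HashIds(int[] ids, int maxElements = 16)
{
    unchecked // allow the arithmetic to wrap around on overflow
    {
        int hash = 17 * 31 + ids.Length;            // mix in the length first
        int count = Math.Min(ids.Length, maxElements);
        for (int i = 0; i < count; i++)
            hash = hash * 31 + ids[i];              // earlier IDs get multiplied more times, so position matters
        return hash;
    }
}

int forward = HashIds(new[] { 1, 2, 3 });
int reversed = HashIds(new[] { 3, 2, 1 });
Console.WriteLine(forward != reversed); // True: reordering now changes the hash

// Arrays that differ only after the cap share a hash, so Equals must still compare every element.
Console.WriteLine(HashIds(new[] { 1, 2, 9 }, maxElements: 2) == HashIds(new[] { 1, 2, 7 }, maxElements: 2)); // True
```

In the comparer, GetHashCode(int[] arr) would simply return HashIds(arr); the right cap depends on how long the arrays really are.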
Reed Copsey, Jr. - http://reedcopsey.com
Wednesday, October 13, 2010 11:53 PM
You would need an IEqualityComparer<int[]> to do this:
using System.Collections.Generic;
using System.Linq;

public class IntArrayEqualityComparer : IEqualityComparer<int[]>
{
    public bool Equals(int[] array1, int[] array2)
    {
        if (ReferenceEquals(array1, array2)) return true;
        if (array1 == null || array2 == null) return false;
        return array1.SequenceEqual(array2);
    }

    public int GetHashCode(int[] arr)
    {
        // Implement some hash - this isn't great, but it'll work
        int hash = arr.Length;
        foreach (var i in arr)
            hash = hash ^ i;
        return hash;
    }
}
With this comparer, you could then do:
int[][] productIds = purchaseorders.Select(p => p.ProductID).Distinct(new IntArrayEqualityComparer()).ToArray();
Reed Copsey, Jr. - http://reedcopsey.com
Thursday, October 14, 2010 2:42 AM
Hi
you could join the int[] into a string and use that string as the key to group by:
PurchaseOrder[] purchaseOrders = {
new PurchaseOrder{ProductID = new []{1,2,3,4,5}},
new PurchaseOrder{ProductID = new []{2,3,4,5,6}},
new PurchaseOrder{ProductID = new []{3,4,5,6,7}},
new PurchaseOrder{ProductID = new []{4,5,6,7,8}},
new PurchaseOrder{ProductID = new []{5,6,7,8,9}},
new PurchaseOrder{ProductID = new []{2,3,4,5,6}},
new PurchaseOrder{ProductID = new []{1,2,3,4,5}},
};
var groupedIds = purchaseOrders
.GroupBy(p => string.Join(",", p.ProductID))
.Select(g => g.First().ProductID)
.ToArray();
If this is run on huge amounts of data, I'd recommend building
some kind of unique int32 / int64 key out of the ProductID,
calculated only once per instance, and using it as the key to group by.
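One way to get such a key, sketched under loudly stated assumptions: if every order has at most 5 IDs and every ID fits in 12 bits (0 to 4095), the array length plus the IDs can be packed into a single long with no collisions. The limits and the PackIds helper are hypothetical; real product IDs may need a different bit width or may not fit at all, in which case a hash plus an equality check is needed. In the PurchaseOrder class the key would be computed once, for example in the ProductID setter, rather than per query as in this standalone sketch (C# 9 top-level statements, plain int[] arrays).

```csharp
using System;
using System.Linq;

// Hypothetical limits for illustration: at most 5 IDs per order, each ID in 0..4095 (12 bits).
// Length (3 bits) + 5 * 12 bits = 63 bits, so the packed key fits in a non-negative long,
// and two arrays get the same key only if they hold the same IDs in the same order.
static long PackIds(int[] ids)
{
    if (ids.Length > 5)
        throw new ArgumentException("More than 5 IDs cannot be packed into 63 bits.", nameof(ids));
    long key = ids.Length;                  // start with the length so {1} and {0, 1} differ
    foreach (int id in ids)
    {
        if (id < 0 || id > 4095)
            throw new ArgumentOutOfRangeException(nameof(ids), "Each ID must fit in 12 bits.");
        key = (key << 12) | (long)id;       // shift earlier IDs left, append this one
    }
    return key;
}

int[][] orders =
{
    new[] { 1, 2, 3, 4, 5 }, new[] { 2, 3, 4, 5, 6 }, new[] { 3, 4, 5, 6, 7 },
    new[] { 4, 5, 6, 7, 8 }, new[] { 5, 6, 7, 8, 9 }, new[] { 2, 3, 4, 5, 6 },
    new[] { 1, 2, 3, 4, 5 },
};

// Under the limits above the long key is exact, so no element-by-element Equals is needed.
int[][] distinctIds = orders
    .GroupBy(ids => PackIds(ids))
    .Select(g => g.First())
    .ToArray();

Console.WriteLine(distinctIds.Length); // 5
```

GroupBy yields groups in the order their keys first appear, so the result lists the arrays in the same order as the expected output in the question.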
Chris
Thursday, October 14, 2010 8:17 AM
Hi Chris,
Thanks for your reply.
The query will be running on a huge amount of data.
Could you please explain in more detail what you mean by building some kind of unique int32 / int64 out of the ProductID, calculated only once per instance, and using it as the key to group by?
Thanks
Thursday, October 14, 2010 8:19 AM
Hi Reed,
Thanks for your reply.
In terms of performance, will the IEqualityComparer<int[]> approach perform well? What is the best way to implement the GetHashCode function?
We need to run it on a huge amount of data.