Share via


OverflowException with IEnumerable.Sum() but not with a foreach loop

Question

Tuesday, September 27, 2011 6:48 AM

I'm trying to understand why I'm seeing what I'm seeing and also learn the best way to handle it. The situation can be illustrated by the following code block

 

int[] data = new int[]{int.MaxValue, 1};

int test1 = 0;
foreach (int datum in data)
    test1 += datum;
//test1 equals "-2147483648"

int test2 = data.Sum();
//Throws an "OverflowException"

 

I vastly prefer the declarative style of linq in comparison to the foreach loop approach. The .Sum() method seemed especially appropriate for GetHashCode() overrides for immutable objects that contain collections. The integer looping back around is perfectly acceptable in this situation.

So for example, I'd prefer to do this implementation:

public override int GetHashCode()
{
    return 17 + 23 * _prop1.GetHashCode()
        + 23 * _prop2.GetHashCode()
        + _sequence1.Sum(item => 23 * item.GetHashCode())
        + _sequence2.Sum(item => 23 * item.GetHashCode());
}

...instead of this:

 

public override int GetHashCode()
{
    int ret = 17 + 23 * _prop1.GetHashCode()
        + 23 * _prop2.GetHashCode();
    foreach (var item in _sequence1)
    {
        ret += 23 * item.GetHashCode();
    }
    foreach (var item in _sequence2)
    {
        ret += 23 * item.GetHashCode();
    }
    return ret;
}

But I certainly couldn't use the linq if it risked throwing these OverflowExceptions. Does anyone know of a solution?

 

All replies (13)

Tuesday, September 27, 2011 10:03 AM âś…Answered

As an aside, is there a simple way to see whether the library you're using is compiled with or without boundary checking?

Btw, a simple solution would be to create a duplicate of System.Linq.Enumerable that contains a SumUnchecked, and apply the unchecked keyword for that.

 

The code of the sum method is very similar to yours:

publicstaticintSum(thisIEnumerable<int> source)
{
    if (source == null)
    {
        throwError.ArgumentNull("source");
    }
    intnum = 0;
    foreach (intnum2insource)
    {
        num += num2;
    }
    returnnum;
}

Tuesday, September 27, 2011 7:14 AM

Karatin,

The best way of learning things is trying to solve your problem in the shortest time. It means also that you will try to use things like Linq and other ways of helpful pieces of code.

But be aware that Linq is behind the scene also using for each loops, it only gives you a more declarative way of writing. Don't assume a for each loop cost time. Those nanoseconds used for most for each loops are seldom to recognise. 

To say it in another way, don't do optimization while you are learning, do that if you have full knowledge of all aspects and even then do it only if you have a real time problem in your solution. 

Success
Cor


Tuesday, September 27, 2011 7:39 AM

In C# specification, section 7.8.4:

In a checked context, if the sum is outside the range of the result type, a System.OverflowException is thrown. In an unchecked context, overflows are not reported and any significant high-order bits outside the range of the result type are discarded.

And section 7.6.12: 

For non-constant expressions (expressions that are evaluated at run-time) that are not enclosed by any checked or unchecked operators or statements, the default overflow checking context is unchecked unless external factors (such as compiler switches and execution environment configuration) call for checked evaluation.

So actually this will throw OverflowException too:

            int[] data = new int[] { int.MaxValue, 1 };







            int test1 = 0;



            checked



            {



                foreach (int datum in data)



                    test1 += datum;



            }

On the other hand, since linq Sum() executes on a new thread (You should be able to see the ThreadStart in exception stacktrace), unchecked keyword will have no effect on it (out of scope).

For your case, you could try something like this if you want:

public override int GetHashCode()



{



    return (int)(17 + 23 * _prop1.GetHashCode()



        + 23 * _prop2.GetHashCode()



        + _sequence1.Sum(item => 23 * (double)item.GetHashCode())



        + _sequence2.Sum(item => 23 * (double)item.GetHashCode()));



}

It'll extend the "in-memory" copy of variable to double size to get around the exception, then you could cast it back to integer. I've compiled it to check that it should run without giving you the exception.

EDIT: It would also work too:

            int[] data = new int[] { int.MaxValue, 1 };

            int test2 = data.Aggregate((temp, next) => unchecked(temp + next));

Tuesday, September 27, 2011 8:18 AM

I'm trying to understand why I'm seeing what I'm seeing and also learn the best way to handle it. The situation can be illustrated by the following code block

Linq's Sum() method (along with many others) does checked arithmetic, and unfortunately there's no way of turning it off. Your only recourse is to write your own loop.

(Aside: I notice you are using exactly the same hash function that I usually do. ;)


Tuesday, September 27, 2011 8:19 AM

On the other hand, since linq Sum() executes on a new thread (You should be able to see the ThreadStart in exception stacktrace), unchecked keyword will have no effect on it (out of scope).

This has nothing to do with it being threaded. It's because the "checked" keyword is applied to the already compiled (to IL) code in the Linq library.


Tuesday, September 27, 2011 11:46 AM

As an aside, is there a simple way to see whether the library you're using is compiled with or without boundary checking?

I doubt it. You can, for example, just mark individual blocks of code as "checked".


Tuesday, September 27, 2011 12:36 PM

By looking at the code in Reflector, you can't see it directly, but by looking at the IL, you'll see it's using add.ovf instead of add:

 

.method public hidebysig static int32 Sum(class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> source) cil managed
{
    .custom instance void System.Runtime.CompilerServices.ExtensionAttribute::.ctor()
    .maxstack 2
    .locals init (
        [0] int32 num,
        [1] int32 num2,
        [2] class [mscorlib]System.Collections.Generic.IEnumerator`1<int32> enumerator)
    L_0000: ldarg.0 
    L_0001: brtrue.s L_000e
    L_0003: ldstr "source"
    L_0008: call class [mscorlib]System.Exception System.Linq.Error::ArgumentNull(string)
    L_000d: throw 
    L_000e: ldc.i4.0 
    L_000f: stloc.0 
    L_0010: ldarg.0 
    L_0011: callvirt instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0> [mscorlib]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator()
    L_0016: stloc.2 
    L_0017: br.s L_0024
    L_0019: ldloc.2 
    L_001a: callvirt instance !0 [mscorlib]System.Collections.Generic.IEnumerator`1<int32>::get_Current()
    L_001f: stloc.1 
    L_0020: ldloc.0 
    L_0021: ldloc.1 
    L_0022: add.ovf
    L_0023: stloc.0 
    L_0024: ldloc.2 
    L_0025: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
    L_002a: brtrue.s L_0019
    L_002c: leave.s L_0038
    L_002e: ldloc.2 
    L_002f: brfalse.s L_0037
    L_0031: ldloc.2 
    L_0032: callvirt instance void [mscorlib]System.IDisposable::Dispose()
    L_0037: endfinally 
    L_0038: ldloc.0 
    L_0039: ret 
    .try L_0017 to L_002e finally handler L_002e to L_0038
}

I can't find an attribute or other indicator on the Assembly level which shows whether the assembly was built with or without the /checked option.


Tuesday, September 27, 2011 3:46 PM

Okay. There were a lot of great responses but I think I'm just going to write a utility extension method that does an unchecked Sum().


Wednesday, September 28, 2011 9:18 AM

Or you can use the Aggregate method:

int test2 = data.Aggregate((a, b) => a + b);

Wednesday, September 28, 2011 9:32 AM

Where do thos numbers come from? What would be the difference if you didn't add 17 to the sum and didn't multiply that sum by 23? You're limiting the usable hashcodes to those in the form 17+23x without adding variety. Shouldn't you use distinct factors for each operand of the sum? Couldn't you get rid of that added constant? Am I missing something?


Wednesday, September 28, 2011 10:35 AM

Where do thos numbers come from? What would be the difference if you didn't add 17 to the sum and didn't multiply that sum by 23? You're limiting the usable hashcodes to those in the form 17+23x without adding variety. Shouldn't you use distinct factors for each operand of the sum? Couldn't you get rid of that added constant? Am I missing something?

That form of hash code (with two primes) works pretty well. It's a mini form of a random number generator. Specifically, it's a form of a linear congruential generator.

 


Wednesday, September 28, 2011 2:27 PM

Isn't linear congruential generator supposed to have a recurrence relation somewhere?

The OP's hashcode is nothing more than 17+23*SumOfHashCodesOfConstituents.

I wouldn't have raised an eyebrow with the following:

public override int GetHashCode()
{
        int ret = 17 * 23 + _prop1.GetHashCode();
        ret = 23 * ret + _prop2.GetHashCode();
        foreach (var item in _sequence1)
        {
                ret = 23 * ret + item.GetHashCode();
        }
        foreach (var item in _sequence2)
        {
                ret = 23 * ret + item.GetHashCode();
        }
        return ret;
}

While writing it, I remembered seeing something like this in a post of yours, and found it:

public override int GetHashCode()
{
    int hash = 17;
    hash = hash * 23 + ID.GetHashCode();
    hash = hash * 23 + Salary.GetHashCode();
    return hash;
}

 


Wednesday, September 28, 2011 5:54 PM

Oh, you're right! I misread the OP's code (my eyes saw the same code that I use... I even commented to that effect! ;)

So I agree - it looks a bit iffy!