Removing duplicate results in a foreach loop

Question

Tuesday, September 2, 2008 8:37 AM

How can I remove duplicate results when using the following foreach loop, and then display each distinct result in a separate textbox?

Thank you in advance.

                Regex rx = new Regex("(?:value=\")(?<Value>[^\"\n\r]*)(?:\")");
                MatchCollection ms = rx.Matches(MSAGResult);

                foreach (Match m in ms)
                {
                    for (int i = 0; i < m.Groups["Value"].Captures.Count; i++)
                    {
                        if (m.Groups["Value"].Equals(m.Groups["Value"]))
                        {
                        }
                    }
                }

All replies (7)

Tuesday, September 2, 2008 9:22 AM ✅ Answered

 

Hashtable htMatches = new Hashtable();
Regex rx = new Regex("abc");
MatchCollection ms = rx.Matches(text);

foreach (Match m in ms)
{
    if (!htMatches.ContainsKey(m.Value))
        htMatches.Add(m.Value, m);
}

htMatches.Values will then contain all the distinct matches.
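
One thing to keep in mind with the Hashtable approach is that a Hashtable does not keep its entries in insertion order, so the values may not come back in the order they appear on the page. Below is a minimal sketch of an order-preserving variant, assuming the goal is the text inside each value="..." attribute. The names ValueExtractor and DistinctValues are made up for illustration, and the pattern is only a guess at what the regex in the question was meant to be. It sticks to List<T> and Dictionary<TKey, TValue>, so it should not need .NET 3.5.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original thread.
public static class ValueExtractor
{
    // Assumed pattern: capture whatever sits between value=" and the next quote.
    private static readonly Regex ValueRegex =
        new Regex("value=\"(?<Value>[^\"\r\n]*)\"");

    // Returns each captured value once, in order of first appearance.
    public static List<string> DistinctValues(string html)
    {
        List<string> ordered = new List<string>();
        Dictionary<string, bool> seen = new Dictionary<string, bool>();

        foreach (Match m in ValueRegex.Matches(html))
        {
            string value = m.Groups["Value"].Value;

            // Only keep a value the first time it shows up.
            if (!seen.ContainsKey(value))
            {
                seen.Add(value, true);
                ordered.Add(value);
            }
        }

        return ordered;
    }
}
```

Each entry of the returned list could then be assigned to its own textbox (for example textBox1.Text = values[0]), after checking values.Count first.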


Tuesday, September 2, 2008 8:51 AM

Try storing the results from the for loop in a generic dictionary or a hashtable, or use LINQ to get the distinct values from ms, for example ms.Cast<Match>().Select(m => m.Value).Distinct().


Tuesday, September 2, 2008 8:54 AM

Thank you, Narotham. But, I am still running into the problem of how to remove the duplicate results. 


Tuesday, September 2, 2008 8:58 AM

First: You generally can't change a collection while enumerating it - your iterator will break. If you have List<T> you can use the predicate-based List<T>.RemoveAll(Predicate<T>), otherwise the typical approach is either to create a new list of the items you want to keep, or a list of the items you want to remove, and do the actual removal afterwards. Alternatively you can do things like enumerating via an indexer in reverse. However, I'm not sure that you can mutate a MatchCollection, so this probably isn't an option.
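
The two list-based options mentioned above can be sketched roughly as follows. This assumes the match values have first been copied into a List<string>, since the MatchCollection itself is not being modified here. The class and method names are hypothetical and only for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of the original thread.
public static class RemovalSketch
{
    // Option 1: let List<T>.RemoveAll do the removal with a predicate.
    // Returns the number of items removed.
    public static int RemoveEmpty(List<string> items)
    {
        return items.RemoveAll(delegate(string s) { return string.IsNullOrEmpty(s); });
    }

    // Option 2: walk the indexer in reverse, so that removing an item
    // never shifts the positions of items that have not been visited yet.
    public static void RemoveDuplicatesInPlace(List<string> items)
    {
        for (int i = items.Count - 1; i >= 0; i--)
        {
            // IndexOf returns the first occurrence; if that is before i,
            // the item at i is a repeat and can be dropped.
            if (items.IndexOf(items[i]) < i)
            {
                items.RemoveAt(i);
            }
        }
    }
}
```

The reverse loop is quadratic in the list length, which is unlikely to matter for a handful of form fields but would for large inputs.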

If you are using .NET 3.5, LINQ provides an interesting option:

var values = ms.Cast<Match>().Select(match => match.Groups["Value"].Value).Distinct();
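
Distinct() compares items using their default equality, and as far as I know Match and Group do not override Equals, so each Group object would count as different even when the captured text is the same. A small sketch that projects each match to its string value before calling Distinct() is shown below. It assumes .NET 3.5 and a named group called Value, and the wrapper class name is made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original thread.
public static class LinqDistinctSketch
{
    // Projects every match to the text of its "Value" group,
    // then lets Distinct() compare the strings by content.
    public static List<string> DistinctValues(MatchCollection matches)
    {
        return matches.Cast<Match>()
                      .Select(m => m.Groups["Value"].Value)
                      .Distinct()
                      .ToList();
    }
}
```

Called as LinqDistinctSketch.DistinctValues(ms), this returns a plain list of strings that can be bound or copied to the textboxes.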


Marc


Tuesday, September 2, 2008 9:09 AM

Thank you, Marc. It looks like I need to upgrade from 3.0 :-) . If I can't remove the duplicate results, is there a way to show each result only once instead of twice?

The reason I ask is that I am attempting my first web scrape, from one of my work sites. Here is the page source that MSAGResult above contains:

MSAG Number:</td>

                    <td colspan="3" style="width: 784px" valign="top">

                        <div style="padding-left:3px;"><input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtMSANO" type="text" value="1860" maxlength="10" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtMSANO" disabled="disabled" title="SANO: MSAG Addr Number (10an)" class="clsTextBox" style="width:80px;" /> 

                        <input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtMSASF" type="text" maxlength="4" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtMSASF" disabled="disabled" title="SASF: MSAG Addr Number Suffix (4an)" class="clsTextBox" style="width:40px;" /> 

                        <input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtMSASD" type="text" maxlength="2" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtMSASD" disabled="disabled" title="SASD: MSAG Addr Street Directional Prefix (2an)" class="clsTextBox" style="width:30px;" /></div>

                    </td>

                </tr>

                <tr>

                    <td valign="top" class="LeftMarginTd">

                        Svc Street:</td>

                    <td valign="top" class="LeftBodyTd">

                        <input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtSASN" type="text" value="STIREWALT" maxlength="60" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtSASN" disabled="disabled" title="SASN: Svc Addr Street Name (60an)" class="clsTextBox" style="background-color:White;width:240px;" /> 

                        <input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtSATH" type="text" value="RD" maxlength="7" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtSATH" disabled="disabled" title="SATH: Svc Addr Street Type (7an)" class="clsTextBox" style="background-color:White;width:60px;" /> 

                        <input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtSASS" type="text" maxlength="2" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtSASS" disabled="disabled" title="SASS: Svc Addr Street Suffix (2an)" class="clsTextBox" style="background-color:White;width:30px;" />

                    </td>

                    <td valign="top" class="CenterMarginTd">

                        MSAG Street:</td>

                    <td colspan="3" style="width: 784px" valign="top">

                        <div style="padding-left:3px;"><input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtMSASN" type="text" value="STIREWALT" maxlength="60" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtMSASN" disabled="disabled" title="SASN: MSAG Addr Street Name (60an)" class="clsTextBox" style="width:240px;" /> 

                        <input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtMSATH" type="text" value="RD" maxlength="7" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtMSATH" disabled="disabled" title="SATH: MSAG Addr Street Type (7an)" class="clsTextBox" style="width:60px;" /> 

                        <input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtMSASS" type="text" maxlength="2" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtMSASS" disabled="disabled" title="SASS: MSAG Addr Street Suffix (2an)" class="clsTextBox" style="width:30px;" /></div>

                    </td>

                </tr>

                <tr>

                    <td valign="middle" class="LeftMarginTd">

                        Svc City/State:</td>

                    <td class="LeftBodyTd">

                        <input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtCITY2" type="text" value="CHINA GROVE" maxlength="32" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtCITY2" disabled="disabled" title="CITY1: City (32an)" class="clsTextBox" style="background-color:White;width:130px;" /> 

                        <input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtSTATE2" type="text" value="NC" maxlength="2" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtSTATE2" disabled="disabled" title="STATE1: State (2a)" class="clsTextBox" style="background-color:White;width:30px;" /> 

                        <input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtZipCode2" type="text" value="280237710" maxlength="9" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtZipCode2" disabled="disabled" title="ZipCode1: Zip/Postal Code (9an)" class="clsTextBox" style="background-color:White;width:75px;" />

                    </td>

                    <td class="CenterMarginTd">

                        MSAG City/State:</td>

                    <td colspan="3" style="width: 784px">

                        <div style="padding-left:3px;"><input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtMCity" type="text" value="CHINA GROVE" maxlength="32" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtMCity" disabled="disabled" title="CITY: City (32an)" class="clsTextBox" style="width:130px;" /> 

                        <input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtMState" type="text" value="NC" maxlength="2" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtMState" disabled="disabled" title="STATE: State (2a)" class="clsTextBox" style="width:30px;" /> 

                        <input name="gridOrder:_ctl9:multilineControl:orderLevelControl:txtMZipCode" type="text" value="280237710" maxlength="9" id="gridOrder__ctl9_multilineControl_orderLevelControl_txtMZipCode" disabled="disabled" title="

 

As you can see if you search this section (Ctrl+F), I am trying to get every result that follows ' value=" '. The foreach loop does that, except that I run into duplicates when I want to display them.


Tuesday, September 2, 2008 9:12 AM

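Try this:

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class RegexTest
{
    public static void Match(string text)
    {
        Regex rx = new Regex("[abc]");
        MatchCollection ms = rx.Matches(text);

        // Distinct uses MyComparer, so matches with the same text count as duplicates.
        IEnumerable<Match> matches = ms.Cast<Match>().Distinct(new MyComparer());
    }
}

// Treats two matches as equal when their matched text is equal.
public class MyComparer : IEqualityComparer<Match>
{
    #region IEqualityComparer<Match> Members

    public bool Equals(Match x, Match y)
    {
        return x.Value == y.Value;
    }

    public int GetHashCode(Match obj)
    {
        return obj.Value.GetHashCode();
    }

    #endregion
}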


Tuesday, September 2, 2008 9:39 AM

Thank you, you have been a lot of help.