Share via


Sorting DataTable on multiple columns with Linq

Question

Friday, July 31, 2020 6:23 PM

I know there are a lot of threads related to sorting a DataTable, but I was not able to find anything that gives a clear solution to my problem. I currently have something that works, but I am curious if there is a better way to do the sorting.

The problem seems simple enough: I have a DataTable that I want to sort on multiple columns that can change (the user wants to sort on selected columns) and the sorting needs to be custom because of certain business rules. So the DataView Sort cannot be used (unless I would create some types that have a strange implementations of ToString). The next best thing is Linq with OrderBy and ThenBy. In the end, the solution I currently have is based on this, although the code is somewhat fragile.

I will simplify the problem and eliminate the custom sorting in order to illustrate the issue with the Linq code. A table with people that have first, last and middle names is sorted by First, Last, then Middle.

static void TestSortTable()
{
    DataTable people = new DataTable();
    people.Columns.Add("First", typeof(string));
    people.Columns.Add("Last", typeof(string));
    people.Columns.Add("Middle", typeof(string));
            
    people.Rows.Add("A", "M", "D");
    people.Rows.Add("A", "B", "E");
    people.Rows.Add("B", "B", "E");
    people.Rows.Add("B", "B", "A");
    people.Rows.Add("B", "B", "D");
    people.Rows.Add("C", "M", "E");
    people.Rows.Add("C", "D", "A");
    PrintPeople(people, "All people:");

    string[] sortedColumns = { "First", "Last", "Middle" };
    DataTable sortedPeople;

    sortedPeople = SortWithLinqGood(people, sortedColumns);
    PrintPeople(sortedPeople, "LINQ good:");
    
    sortedPeople = SortWithLinqBad(people, sortedColumns);
    PrintPeople(sortedPeople, "LINQ bad:");
    
    Console.ReadLine();
}

static DataTable SortWithLinqGood(DataTable data, string[] sortedColumns)
{
    string sortedColumn = sortedColumns[0];
    OrderedEnumerableRowCollection<DataRow> sorted = data.AsEnumerable().OrderBy(x => x.Field<string>(sortedColumn));
    for(int i = 1; i < sortedColumns.Count(); i++)
    {
        string additionalColumn = sortedColumns[i];
        sorted = sorted.ThenBy(x => x.Field<string>(additionalColumn));
    }

    return sorted.CopyToDataTable();
}

static DataTable SortWithLinqBad(DataTable data, string[] sortedColumns)
{
    string sortedColumn = sortedColumns[0];
    OrderedEnumerableRowCollection<DataRow> sorted = data.AsEnumerable().OrderBy(x => x.Field<string>(sortedColumn));
    for(int i = 1; i < sortedColumns.Count(); i++)
    {
        sortedColumn = sortedColumns[i];
        sorted = sorted.ThenBy(x => x.Field<string>(sortedColumn));
    }

    return sorted.CopyToDataTable();
}

The output from executing TestSortTable() is:

All people:
First, Last, Middle
    A     M       D
    A     B       E
    B     B       E
    B     B       A
    B     B       D
    C     M       E
    C     D       A

LINQ good:
First, Last, Middle
    A     B       E
    A     M       D
    B     B       A
    B     B       D
    B     B       E
    C     D       A
    C     M       E

LINQ bad:
First, Last, Middle
    B     B       A
    C     D       A
    A     M       D
    B     B       D
    A     B       E
    B     B       E
    C     M       E

The first Linq function where a new variable additionalColumn is declared in the loop is good. Trying to reuse the sortedColumn variable results in a bad sort.

Trying to change the loop and not use any variable, but sortedColumns[i] will result in an exception during the execution with index out of range. This is actually the clue to what happens: the expression is evaluated only when the result is used, so at that point i is out of range.

The documentation for OrderBy hints at that:

This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.

This does not make it clear that there is an issue with the code.

The method that creates a new variable inside the loop works, but somebody looking at the code could think that it can be "optimized" and make a change that might seem OK although the result will be affected.

Does anyone have a better idea for sorting a DataTable on multiple columns using a method that also allows for custom sorting?

Thanks!

All replies (5)

Saturday, August 1, 2020 2:19 AM âś…Answered | 2 votes

Taking into consideration a series of articles such as /en-us/archive/blogs/ericlippert/closing-over-the-loop-variable-considered-harmful, try this approach too:

static DataTable SortWithLinqGood2( DataTable data, string[] sortedColumns )
{
   var sorted = data.AsEnumerable( ).OrderBy( x => x.Field<string>( sortedColumns.First( ) ) );

   foreach( var column in sortedColumns.Skip( 1 ) )
   {
      sorted = sorted.ThenBy( x => x.Field<string>( column ) );
   }

   return sorted.CopyToDataTable( );
}

In your bad functions, sortedColumn and i represent a kind of shared variables that are overwritten, keeping the last values.

In case of foreach loop in current C#, a fresh copy of variable is made.


Friday, July 31, 2020 9:10 PM

Does your DataTable have a primary key? 

Not sure if this is relevant, but I got some wrong results before and this was the solution in my case

https://stackoverflow.com/questions/14217285/linq-select-on-a-sql-view-gets-wrong-answer (making sure the result has a PK and adding AsNoTracking())

For every expert, there is an equal and opposite expert. - Becker's Law

My blog

My TechNet articles


Friday, July 31, 2020 10:02 PM | 1 vote

I have not fully tested this so it may be imperfect. Requires a CheckedListBox and button.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace SortDataTable
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
            Shown += Form1_Shown;
        }

        private DataTable _peopleDataTable;
        private void Form1_Shown(object sender, EventArgs e)
        {
            dataGridView1.AllowUserToAddRows = false;
            _peopleDataTable = peopleDataTable();

            checkedListBox1.DataSource = _peopleDataTable.Columns
                .Cast<DataColumn>()
                .Select(col => col.ColumnName)
                .ToList();
        }

        private DataTable peopleDataTable()
        {
            var people = new DataTable();

            people.Columns.Add("First", typeof(string));
            people.Columns.Add("Last", typeof(string));
            people.Columns.Add("Middle", typeof(string));

            people.Rows.Add("A", "M", "D");
            people.Rows.Add("A", "B", "E");
            people.Rows.Add("B", "B", "E");
            people.Rows.Add("B", "B", "A");
            people.Rows.Add("B", "B", "D");
            people.Rows.Add("C", "M", "E");
            people.Rows.Add("C", "D", "A");

            return people;
        }

        private void button1_Click(object sender, EventArgs e)
        {

            string[] sortedColumns = checkedListBox1.CheckedNames();

            if (sortedColumns.Length != 0)
            {
                dataGridView1.DataSource = Work(sortedColumns);
            }
        }

        private DataTable Work(string[] sortedColumns)
        {
            var results = _peopleDataTable.AsEnumerable()
                .OrderBy(col => col.Field<string>(sortedColumns
                    .First()));

            if (sortedColumns.Length >1)
            {

                results = sortedColumns.Skip(1)
                    .Aggregate(results, (current, column) =>
                        current.ThenBy(dataRow => dataRow.Field<string>(column)));
            }


            return results.CopyToDataTable();
        }
    }

    public static class CheckedListBoxExtensions
    {
        /// <summary>
        /// Get checked items as string
        /// </summary>
        /// <param name="sender"></param>
        /// <returns></returns>
        public static string[] CheckedNames(this CheckedListBox sender) =>
            sender.Items.Cast<string>().Where((stringToken, index) => sender.GetItemChecked(index))
                .Select(stringToken => stringToken)
                .Select(x => x).ToArray();
    }
}

Please remember to mark the replies as answers if they help and unmarked them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.

NuGet BaseConnectionLibrary for database connections.

StackOverFlow


Wednesday, August 12, 2020 7:26 PM

I think this makes the code clear - thanks. The fact that foreach creates a new variable each time is "hidden", but it is definitely less likely for somebody to come in and change this loop.

I was looking into the for loop with an index since in the real project the sorted columns were actually extracted from another data structure with a way to change the sorting order and direction. But I think that code can be adapted to use the foreach since it is safer long term.


Wednesday, August 12, 2020 7:36 PM

Thank you, this is another way to do it.