Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Question
Wednesday, August 27, 2014 5:50 PM
Hello, I have the following sample data(below) that is in a datatable. I need to select into a new datatable distinct rows by colname and where the most occurance of Datatype, then if a tie exists with DataType use Length.
I can get the correct data with this sql statement if the data was in a sql database table:
SELECT
*
FROM (
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY Datatype
ORDER BY length DESC) AS w
FROM TN_ImportFileColumns
) x
WHERE w = 1
What I need to have:
ColumnName DataType Length
created datetime 10
phone int 10
name string 4
From this sample data:
ColumnName ExampleData DataType Length
name scott string 4
name ben string 3
name kim string 3
phone 1111111111 int 10
phone 222222 int 6
phone string 0
created 12-31-2014 datetime 10
created 11-09-2013 datetime 10
created 11092013 int 8
Some background info:
The data in the datatable comes from a parse of a text file where I'm trying to determine the most likely datatype for a give row. In my sample data there are 3 columns: name, phone and created. But you see that created has 2 entries for datetime and one for int. Since column(datatype) is most likely datetime then that is the row I need for the given columnName.
newb
All replies (7)
Wednesday, August 27, 2014 6:29 PM ✅Answered
You will most likely use a Linq projection like in the code below and build a collection of custom objects. Then you can walk the collection with foreach loop and build table records from the objects. You use a distinct function. But more likely, you'll have to go into some kind of foreach loop to detemine what the column type is and get records that way.
Yiu might be able to even use Linq with a Where clause to selete based on column type.
http://www.blackwasp.co.uk/LinqProjection.aspx
http://blogs.msdn.com/b/charlie/archive/2006/11/19/linq-farm-group-and-distinct.aspx
// Fill the DataSet.
DataSet ds = new DataSet();
ds.Locale = CultureInfo.InvariantCulture;
FillDataSet(ds);
DataTable products = ds.Tables["Product"];
var query = products.AsEnumerable().
Select(product => new
{
ProductName = product.Field<string>("Name"),
ProductNumber = product.Field<string>("ProductNumber"),
Price = product.Field<decimal>("ListPrice")
});
Console.WriteLine("Product Info:");
foreach (var productInfo in query)
{
Console.WriteLine("Product name: {0} Product number: {1} List price: ${2} ",
productInfo.ProductName, productInfo.ProductNumber, productInfo.Price);
}
Thursday, August 28, 2014 12:42 PM ✅Answered
I started looking into writing something for this.
Working with datatables in LINQ is a right nuisance.
Personally....
I would probably write the data to a database.
A local sql ce database perhaps.
You can then use sql to manipulate the data, which will be way less painfull.
I'm not sure if sqlbulkcopy works with sql ce but it would with proper sql server to a temporary table.
public bool BulkInsertDataTable(string tableName, DataTable dataTable)
{
bool isSuccuss;
try
{
SqlConnection SqlConnectionObj = GetSQLConnection();
SqlBulkCopy bulkCopy = new SqlBulkCopy(SqlConnectionObj, SqlBulkCopyOptions.TableLock | SqlBulkCopyOptions.FireTriggers | SqlBulkCopyOptions.UseInternalTransaction, null);
bulkCopy.DestinationTableName = tableName;
bulkCopy.WriteToServer(dataTable);
isSuccuss = true;
}
catch (Exception ex)
{
isSuccuss = false;
}
return isSuccuss;
}
Once you have the data there you can use sql.
That partition stuff probably wouldn't work with sql ce, you might need to nest queries.
Hope that helps
Please don't forget to up vote answers you like or which help you and mark one(s) which answer your question.
Thursday, August 28, 2014 3:39 PM ✅Answered
Try this
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
DataTable dt = new DataTable();
DataTable dt1 = new DataTable();
List<List<object>> input = new List<List<object>>() {
new List<object>{ "name", "scott", typeof(string) , 4},
new List<object>{ "name", "ben", typeof(string), 3},
new List<object>{ "name", "kim", typeof(string), 3},
new List<object>{ "phone", 1111111111, typeof(int), 10},
new List<object>{ "phone", 222222, typeof(int), 6},
new List<object>{ "phone",null, typeof(string),0},
new List<object>{ "created", 12-31-2014, typeof(DateTime), 10},
new List<object>{ "created", 11-09-2013, typeof(DateTime),10},
new List<object>{ "created", 11092013, typeof(int),8}
};
public Form1()
{
InitializeComponent();
dt.Columns.Add("ColumnName", typeof(string));
dt.Columns.Add("ExampleData", typeof(object));
dt.Columns.Add("DataType", typeof(Type));
dt.Columns.Add("Length", typeof(int));
foreach(List<object> row in input)
{
DataRow newRow = dt.Rows.Add();
newRow.ItemArray = row.ToArray();
dataGridView1.DataSource = dt;
}
}
private void button1_Click(object sender, EventArgs e)
{
var results = (dt.AsEnumerable()
.GroupBy(x => x.Field<string>("ColumnName"))
.ToList()
.Select( x => new {
ColumnName = x.Key,
Type = x.Take(1).Single().Field<Type>("DataType"),
Length = x.Select(y => y.Field<int>("Length")).Max()
}));
dt1.Columns.Add("ColumnName", typeof(string));
dt1.Columns.Add("DataType", typeof(Type));
dt1.Columns.Add("Length", typeof(int));
foreach (var row in results)
{
dt1.Rows.Add(new object[]{ row.ColumnName, row.Type, row.Length });
}
dataGridView1.DataSource = dt1;
}
}
}
jdweng
Friday, August 29, 2014 4:04 PM ✅Answered
Joel's query is close, but doesn't return the "DataType" column from the row that has the largest Length.
You can grab the target row first, then select the values from that.
var results = dt.AsEnumerable()
.GroupBy(x => x.Field<string>("ColumnName"))
.Select(x => new
{
Key = x.Key,
FirstRow = x.OrderBy(y => y.Field<int>("Length")).First()
})
.Select( x => new
{
ColumnName = x.Key,
DataType = x.FirstRow.Field<Type>("DataType"),
Length = x.FirstRow.Field<int>("Length")
}
);
David
David http://blogs.msdn.com/b/dbrowne/
Wednesday, August 27, 2014 7:40 PM
I'm not sure it's that hard.
Breaking the algorithm down:
First group by ColumnName, Datatype, Count(*) as c, Max(Length) as mx
Then group by Columname, Datatype, c, mx, sorted by Columname, Datatype, c desc, mx desc
Hope that helps
Please don't forget to up vote answers you like or which help you and mark one(s) which answer your question.
Thursday, August 28, 2014 2:47 AM
Worst case scenario. If you cannot do everything in one LINQ statement then you can always spilt them each SELECT a LINQ query, and loop in through to filter.
chanmm
chanmm
Friday, August 29, 2014 12:43 PM
Thank you everyone for your assistance. I learning and testing. I will get back to everyone with my solution. I like jdweng's solution best so far.
newb