Question
Wednesday, June 30, 2010 9:27 AM
Hello,
Is there a way to automatically determine the separator character in C#? I don't know whether I should have the user choose the character from a combobox or determine it in code. Any help would be appreciated.
All replies (10)
Wednesday, June 30, 2010 9:37 AM
Personally, I would never try to decide this automatically. Even Excel works with a set of predefined separators (such as ",", ";" and "|", to mention a few).
If you want to implement automatic detection anyway, you are "dangerously" ( ;-) ) heading toward parsing (a démodé concept nowadays, like the compiler theory few programmers know). You need a way to find the most frequent characters in the file that are not typical of ordinary sentences, and make assumptions from them. Based on those assumptions, pass the file to your parser along with the presumed separator, and check whether the resulting table satisfies some "correctness" rule, which you should define according to the domain you work in (coordinates? commands? recipes in a cookbook?).
Giuseppe
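The counting heuristic described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `SeparatorGuesser`, the candidate set, and the "same count on every sampled line" rule are hypothetical choices, not something from this thread. It also works line by line, so it does not handle line breaks inside quoted values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SeparatorGuesser
{
    // Hypothetical candidate set; adjust it to the files you expect.
    private static readonly char[] Candidates = { ',', ';', '|', '\t', ':' };

    // Returns the candidate that occurs the same non-zero number of times
    // on every sampled line (preferring the highest count), or null when
    // no candidate is consistent.
    public static char? Guess(IEnumerable<string> sampleLines)
    {
        var lines = sampleLines.Where(l => l.Length > 0).ToList();
        if (lines.Count == 0) return null;

        char? best = null;
        int bestCount = 0;
        foreach (char c in Candidates)
        {
            int first = CountOutsideQuotes(lines[0], c);
            if (first == 0) continue;

            bool consistent = lines.All(l => CountOutsideQuotes(l, c) == first);
            if (consistent && first > bestCount)
            {
                best = c;
                bestCount = first;
            }
        }
        return best;
    }

    // Counts occurrences of c that are not inside a double-quoted section.
    private static int CountOutsideQuotes(string line, char c)
    {
        int count = 0;
        bool inQuotes = false;
        foreach (char ch in line)
        {
            if (ch == '"') inQuotes = !inQuotes;
            else if (ch == c && !inQuotes) count++;
        }
        return count;
    }
}
```

For example, `SeparatorGuesser.Guess(new[] { "a;b;c", "1,5;2;3" })` picks `';'`, because the comma does not appear on the first line. A null result is the cue to fall back to asking the user.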
Wednesday, June 30, 2010 9:51 AM
Given what you have written, I'll go with a predefined set in a combobox. My only concern is whether the user will know which separator is used in the file he's opening.
Wednesday, June 30, 2010 10:06 AM
As a hint, don't simply put the chars in a combobox; load the combobox from an enumeration of separators. I would typically go overboard with my academic approach to OOP and write a class "Separator", then build a List<ISeparator> and object-bind it to the combobox, keeping the list as a member of the form's controller class (the MVC pattern is quite useful here). If you are more moderate than me and use plain ASCII chars, I would store them as char (string) values in a section of the XML application configuration file, to make future updates easy.
Giuseppe
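A minimal sketch of the "Separator" idea might look like this. The member names (`DisplayName`, `Character`) and the `SeparatorCatalog` helper are assumptions made for illustration; the post only names the `Separator` class and `ISeparator`.

```csharp
using System.Collections.Generic;

// Hypothetical shape for the ISeparator mentioned above.
public interface ISeparator
{
    string DisplayName { get; }
    char Character { get; }
}

public sealed class Separator : ISeparator
{
    public Separator(string displayName, char character)
    {
        DisplayName = displayName;
        Character = character;
    }

    public string DisplayName { get; }
    public char Character { get; }

    // Lets list controls show a readable label even without explicit binding.
    public override string ToString() => DisplayName;
}

public static class SeparatorCatalog
{
    // Illustrative default set; it could equally be loaded from configuration.
    public static List<ISeparator> CreateDefault() => new List<ISeparator>
    {
        new Separator("Comma (,)", ','),
        new Separator("Semicolon (;)", ';'),
        new Separator("Pipe (|)", '|'),
        new Separator("Tab", '\t'),
    };
}
```

In a WinForms form, the list could then be assigned to the combobox's `DataSource` with `DisplayMember` set to `"DisplayName"`, so the user sees labels such as "Tab" rather than an invisible character.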
Wednesday, June 30, 2010 12:41 PM
Automatically determine which is the separator in a comma-separated values file?
Wednesday, June 30, 2010 12:56 PM
Heh, that's what I wondered too. :)
Wednesday, June 30, 2010 1:19 PM
All joking about Comma Separated Values and non-comma separators aside..
You can try to guess, but if you do, you might show the first two or three rows to the user so they can verify and alter parsing parameters (separator char, first-row is header, etc.) if incorrect.
Show me your CSV parsing code and I'll confirm that you did it incorrectly. Ninety-nine out of every one hundred programmers get it wrong. I've lost count of the number of "professional" applications I've seen with broken CSV parsers.
Hint #1: Quotation marks are used where a value includes the separator character.
Hint #2: EOL marks can be part of a value if enclosed in quotation marks.
Hint #3: Quotation marks in values are escaped with double-quotation marks.
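The three hints can be illustrated with a small character-by-character parser. This is a sketch only, not a complete CSV implementation: the name `QuotedCsv` is hypothetical, whitespace outside quotes is kept as-is (no trimming), and an empty line produces a row with one empty field.

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuotedCsv
{
    // Parses the whole text into rows of fields using the given separator.
    public static List<List<string>> Parse(string text, char separator)
    {
        var rows = new List<List<string>>();
        var row = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool rowStarted = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            rowStarted = true;

            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < text.Length && text[i + 1] == '"')
                    {
                        field.Append('"'); // Hint #3: "" is an escaped quote
                        i++;
                    }
                    else
                    {
                        inQuotes = false;  // closing quote
                    }
                }
                else
                {
                    field.Append(c);       // Hints #1 and #2: separators and EOLs are data here
                }
            }
            else if (c == '"')
            {
                inQuotes = true;
            }
            else if (c == separator)
            {
                row.Add(field.ToString());
                field.Clear();
            }
            else if (c == '\r' || c == '\n')
            {
                // Treat \r\n as a single end-of-line mark.
                if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n') i++;
                row.Add(field.ToString());
                field.Clear();
                rows.Add(row);
                row = new List<string>();
                rowStarted = false;
            }
            else
            {
                field.Append(c);
            }
        }

        // Flush the last row when the text does not end with an EOL mark.
        if (rowStarted)
        {
            row.Add(field.ToString());
            rows.Add(row);
        }
        return rows;
    }
}
```

Parsing `"One\r\nSplit","""Two"", and",Three` with `','` as the separator yields one row with three fields: `One` + EOL + `Split`, then `"Two", and`, then `Three`.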
Wednesday, June 30, 2010 5:26 PM
Use the TextFieldParser (Reference Microsoft.VisualBasic) and try possible delimiters until you don't get a MalformedLineException while reading the file.
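The trial-and-error idea above might be sketched like this. The helper name `DelimiterProbe.FirstWorkingDelimiter` is hypothetical. One addition that is not part of the suggestion: a file without any quotation marks never raises MalformedLineException, so the sketch also requires that at least one row splits into more than one field.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class DelimiterProbe
{
    // Returns the first candidate that parses the whole text without a
    // MalformedLineException and splits at least one row into several
    // fields; returns null when no candidate qualifies.
    public static string FirstWorkingDelimiter(string text, params string[] candidates)
    {
        foreach (string delimiter in candidates)
        {
            try
            {
                using (var parser = new TextFieldParser(new StringReader(text)))
                {
                    parser.TextFieldType = FieldType.Delimited;
                    parser.SetDelimiters(delimiter);
                    parser.HasFieldsEnclosedInQuotes = true;

                    bool sawMultipleFields = false;
                    while (!parser.EndOfData)
                    {
                        string[] fields = parser.ReadFields();
                        if (fields != null && fields.Length > 1)
                        {
                            sawMultipleFields = true;
                        }
                    }

                    if (sawMultipleFields) return delimiter;
                }
            }
            catch (MalformedLineException)
            {
                // This delimiter does not fit the file; try the next one.
            }
        }
        return null;
    }
}
```

A call such as `DelimiterProbe.FirstWorkingDelimiter(text, ",", ";", "|", "\t")` tries the candidates in the order given, so the order acts as a priority when more than one delimiter would parse cleanly.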
Wednesday, June 30, 2010 9:17 PM
Use the TextFieldParser (Reference Microsoft.VisualBasic) and try possible delimiters until you don't get a MalformedLineException while reading the file.
I wasn't aware of that one, so I gave it a test:
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

string fileData = "\"One\r\nSplit\", \"\"\"Two\"\", and\", \tThree, \"\tFour\tboats\"";
// "One|Split", """Two"", and", |Three, "|Four|boats"
//
// correct result is 1 row, 4 fields
//
// field 1 is: One|Split
//   where the | is an EOL mark
// field 2 is: "Two", and
// field 3 is: Three
//   -- or --
// field 3 is: |Three
//   trimming field values is often debated, but considering the quoted rule (below), I say trim is correct
// field 4 is: |Four|boats
//   where the | is a tab character (trimming inside quotes is not desired)
var parser = new TextFieldParser(new StringReader(fileData))
    { Delimiters = new string[] { "," } };
string[] fields = parser.ReadFields();
Console.WriteLine(fields.Length);
foreach (string field in fields)
    Console.WriteLine(field);
It fails on field #4 by trimming whitespace inside the quotation marks.
Wednesday, July 7, 2010 10:07 PM
Hi,
I would use a default separator, for instance ";". But if you want to let the user choose any separator, you can use a regex pattern to parse the files:
namespace Test.CSVSeparators
{
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            // possible separators : ',' ';' ':' '|'
            string csvPattern = "(,)|(;)|(:)|"
                + "(" + Regex.Escape("|") + ")";

            string csv1 = "a ; b ; c ; d "; // ;
            string csv2 = "a , b , c , d "; // ,
            string csv3 = "a | b | c | d "; // |
            string csv4 = "a : b : c : d "; // :

            // Regex.Split also returns the captured separators,
            // so the Where clause filters them out again.
            List<string> keywords1 = Regex.Split(csv1, csvPattern)
                .Where(x => !Regex.Match(x, csvPattern).Success).ToList();
            List<string> keywords2 = Regex.Split(csv2, csvPattern)
                .Where(x => !Regex.Match(x, csvPattern).Success).ToList();
            List<string> keywords3 = Regex.Split(csv3, csvPattern)
                .Where(x => !Regex.Match(x, csvPattern).Success).ToList();
            List<string> keywords4 = Regex.Split(csv4, csvPattern)
                .Where(x => !Regex.Match(x, csvPattern).Success).ToList();
        }
    }
}
Regards,
Monday, November 7, 2016 1:22 PM | 2 votes
Depending on the current culture, a CSV file might be delimited by, e.g., ',' or ';'.
A proper way to get the separator is to read the System.Globalization.TextInfo.ListSeparator property of a given culture, e.g.
System.Globalization.CultureInfo.CurrentCulture.TextInfo.ListSeparator
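As a small illustration, the same property can be read for a named culture instead of the current one. The helper name `CultureSeparators.For` is hypothetical, and the value returned for a given culture depends on the culture data available on the machine, so none is asserted here.

```csharp
using System.Globalization;

public static class CultureSeparators
{
    // Returns the list separator configured for the named culture,
    // for example "en-US" or "de-DE". An empty name selects the
    // invariant culture.
    public static string For(string cultureName) =>
        CultureInfo.GetCultureInfo(cultureName).TextInfo.ListSeparator;
}
```

This gives a sensible default for files written on the same machine. A file produced under a different culture may use a different separator, so the value is probably best treated as a first guess that the user can override.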