LINQ search ignoring accents

Question

Tuesday, August 10, 2010 1:56 AM

I have a collection of objects. Each object has a variable of type String. I want to search this collection while ignoring any accents that the String variable may contain.

 

Cheers.

All replies (18)

Wednesday, August 11, 2010 12:50 AM ✅Answered | 2 votes

Normalize the string using FormD then remove all characters with values outside the ASCII space:

 

  String text = "A big text comes here with many words that have accents for example: Usuários, Você, Instantânea, etc.";
  string ntext = new string(text.Normalize(NormalizationForm.FormD).Where(c=>c<128).ToArray());
  var i = ntext.IndexOf("Usuario", StringComparison.InvariantCultureIgnoreCase);

 


Wednesday, August 11, 2010 1:03 AM ✅Answered | 1 vote

Is this what you are looking for?

var search = searchText.ToCharArray();
var q = from obj in collection
    where obj.Text.Normalize(NormalizationForm.FormD).ToCharArray().Contains(search)
    select obj;

I'm really having a hard time trying to understand what you need.

Paulo Morgado


Tuesday, August 10, 2010 3:24 AM

Hi,

String comparisons are done in the current culture of the UI thread. If you're asking how to do an invariant culture comparison, you can use the String.Equals overload that allows you to specify a StringComparison value.

Otherwise, I'm not sure how the culture you're dealing with will handle accents during a sort. Let me know if this isn't the direction of help you're looking for. 

Good luck!

-Scosby


Tuesday, August 10, 2010 10:23 PM

Hi,

 

Well, I tried to use your tips but I didn't get what I want. So I will try to explain what I need with code.

String text = "A big text comes here with many words that have accents for example: Usuários, Você, Instantânea, etc.";
if (text.IndexOf("Usuario", StringComparison.InvariantCultureIgnoreCase) >= 0) {
  MessageBox.Show("Ok");
}

You see, if the text contains words with accents, I want to be able to search without typing the accents and still find those words, whether they are accented or not. The code above does not find "Usuários" when I search for "Usuario".


Wednesday, August 11, 2010 12:23 AM

Hi Eliezer,

Can you give more information on your problem?

What exactly are you having difficulties with?

What LINQ provider are you working with?

Paulo Morgado


Wednesday, August 11, 2010 11:19 AM

Yes guys, this is it. I didn't know about NormalizationForm. What I need to do is exactly what Paulo Morgado provided in his LINQ query.

Cheers.


Wednesday, August 11, 2010 12:02 PM

Just another question. Why do I have to use this Where clause, as Louis.fr did in his code?

(text.Normalize(NormalizationForm.FormD).Where(c=>c<128)

I tried other ways to normalize, but without this Where clause my search doesn't work.


Wednesday, August 11, 2010 1:49 PM

ToCharArray().Contains(search)

 You cannot find a string inside an array of chars.


Wednesday, August 11, 2010 2:04 PM

To simplify, the Normalize method turns a string like

Usuários não têm café.

into a string similar to

Usua´rios na~o te^m cafe´.

Each accented letter is replaced by the bare letter followed by a combining diacritic. Because the combining diacritics sit in ranges starting at U+0300 (character code 768), removing all non-ASCII characters from the normalized string leaves only

Usuarios nao tem cafe.

The Where method does just that: keep only the characters with codes less than 128.
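To make the two steps concrete, here is a minimal sketch. `KeepAscii` is the approach described above. `RemoveCombiningMarks` is a variant that is not from this thread: it filters on `UnicodeCategory.NonSpacingMark` instead of the code value, so letters with no decomposition (such as ß) survive. The class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class AccentStripping
{
    // Approach from this thread: decompose with FormD, then keep only
    // characters whose code is below 128 (plain ASCII).
    public static string KeepAscii(string text) =>
        new string(text.Normalize(NormalizationForm.FormD)
                       .Where(c => c < 128)
                       .ToArray());

    // Variant (an assumption, not the thread's code): drop only the
    // combining marks, so other non-ASCII characters are preserved.
    public static string RemoveCombiningMarks(string text) =>
        new string(text.Normalize(NormalizationForm.FormD)
                       .Where(c => CharUnicodeInfo.GetUnicodeCategory(c)
                                   != UnicodeCategory.NonSpacingMark)
                       .ToArray())
            .Normalize(NormalizationForm.FormC);

    public static void Main()
    {
        const string sample = "Usuários não têm café.";
        Console.WriteLine(KeepAscii(sample));            // Usuarios nao tem cafe.
        Console.WriteLine(RemoveCombiningMarks(sample)); // Usuarios nao tem cafe.

        // The two differ on letters that FormD does not decompose.
        Console.WriteLine(KeepAscii("Straße"));            // Strae
        Console.WriteLine(RemoveCombiningMarks("Straße")); // Straße
    }
}
```

For Portuguese text both give the same result. The difference only matters if the input may contain characters like ß, ø or non-Latin scripts, which the `c < 128` filter silently deletes.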


Wednesday, August 11, 2010 2:22 PM

Louis,

search is searchText.ToCharArray().

Paulo Morgado


Wednesday, August 11, 2010 2:54 PM

I still wonder if your first suggestion of using a String.Equals overload that lets you specify a StringComparison is good enough. It has never failed me, and I speak nearly the same language as Eliezer.

Paulo Morgado


Wednesday, August 11, 2010 3:30 PM

You cannot find a char array inside a char array.


Wednesday, August 11, 2010 3:43 PM

None of the 166 specific cultures found on my computer consider strings equal when they differ by accent.

Portuguese is certainly not a language where one would expect such strings to be considered equal.

"tem" and "têm" are really two different words.


Wednesday, August 11, 2010 4:16 PM

Huge mistake!!! My bad!!! :P

I usually deal with identifiers that carefully don't use accented characters and use databases case insensitive and accent insensitive.

I guess a culture dependent accent ignoring comparison would be useful.

Paulo Morgado
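For what it's worth, the framework does seem to offer something along these lines: `CompareInfo.IndexOf` accepts `CompareOptions.IgnoreNonSpace`, which asks the comparison to ignore nonspacing combining characters such as diacritics. The sketch below assumes that option behaves this way for the culture and platform in use. The class name, the `ContainsIgnoringAccents` helper and the default to the invariant culture are my own choices for illustration, not something from this thread.

```csharp
using System;
using System.Globalization;

public static class AccentInsensitiveSearch
{
    // Hypothetical helper: culture-aware "contains" that ignores case and
    // diacritics without building a stripped copy of either string.
    public static bool ContainsIgnoringAccents(
        string source, string value, CultureInfo culture = null)
    {
        CompareInfo compareInfo =
            (culture ?? CultureInfo.InvariantCulture).CompareInfo;

        const CompareOptions options =
            CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;

        return compareInfo.IndexOf(source, value, options) >= 0;
    }

    public static void Main()
    {
        string text = "A big text with accents, for example: Usuários, Você, Instantânea.";
        if (ContainsIgnoringAccents(text, "Usuario"))
        {
            Console.WriteLine("Ok");
        }
    }
}
```

Since nothing is normalized or copied, this also avoids allocating a new string per item when used inside a LINQ `where` clause.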


Wednesday, August 11, 2010 4:56 PM

The Where method does just that: keep only the characters with codes less than 128.

Do all accents have codes over 128? Is that true for all languages that use accents?

What is the performance cost of using this Where clause to check every char code in each string I need to search?

I usually deal with identifiers that carefully don't use accented characters and use databases case insensitive and accent insensitive.

In this case I'm not using databases. I'm reading information sent by other applications over a socket. I can't control which programming language or even which charset they use, so I have to be careful.

 

Just to help someone in the future, here is the result using the code from Louis.fr and Paulo Morgado:

 

var search = new String(MyTextBox.Text.Trim().Normalize(NormalizationForm.FormD).Where(c => c < 128).ToArray());

var result = (from obj in myCollection
    where
    new String(obj.Name.Normalize(NormalizationForm.FormD).Where(c => c < 128).ToArray())
    .ToLower().Contains(search.ToLower())
    select obj).ToList();

 

Cheers.

 


Wednesday, August 11, 2010 8:46 PM

Do all accents have codes over 128? Is that true for all languages that use accents?

There is no accent below code 128. It is not language dependent.

You can see all ASCII characters here: http://en.wikipedia.org/wiki/ASCII

I would need to look at a chart to know the codes of the diacritics, but what's sure is they all have codes over 128.
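Instead of looking at a chart, you can print the codes yourself. This is a small sketch (the class and method names are made up) that decomposes a few accented letters with FormD and lists each resulting char as a code point:

```csharp
using System;
using System.Linq;
using System.Text;

public static class DecomposedCodePoints
{
    // Returns each UTF-16 char of the FormD-normalized text as "U+XXXX".
    public static string[] Describe(string text) =>
        text.Normalize(NormalizationForm.FormD)
            .Select(c => $"U+{(int)c:X4}")
            .ToArray();

    public static void Main()
    {
        // Each letter becomes a base letter plus one combining mark.
        Console.WriteLine(string.Join(" ", Describe("ãéêç")));
        // U+0061 U+0303 U+0065 U+0301 U+0065 U+0302 U+0063 U+0327
    }
}
```

The base letters (U+0061, U+0065, U+0063) are below 128 and the combining marks (U+0301, U+0302, U+0303, U+0327) are all at U+0300 or above, which is why the `c < 128` filter keeps the former and drops the latter.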


Thursday, August 12, 2010 1:15 AM

Eliezer,

You should avoid creating unnecessary strings or arrays.

For better performance, you should also create your own Contains method that doesn't need strings to be built.

Paulo Morgado


Thursday, September 13, 2018 1:50 PM

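With your help I solved the problem by writing:

// Needs System.Linq and System.Text.
// Strip the accents from the search term once, outside the lambda.
var term = new string(buscar
    .Normalize(NormalizationForm.FormD)
    .Where(c => c < 128)   // drop the combining marks left by FormD
    .ToArray()).ToUpper();

lstUsers.Where(x =>
    new string(x.Email
        .Normalize(NormalizationForm.FormD)
        .Where(c => c < 128)   // without this filter the accents are still there
        .ToArray())
    .ToUpper()
    .Contains(term)
).ToList();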