Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Question
Thursday, June 29, 2017 2:44 PM
How to find 10 most common value (words) from a column table? What sql query to use?
All replies (9)
Monday, July 3, 2017 1:40 PM âś…Answered
I get values from a MatchCollection and this doesn't work.
I solved it using this query, to get all rows as one single row:
db.QuerySingle(@"SELECT STUFF((SELECT TOP 100 ',' + Cities FROM EuMap FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1,'') AS One")
And this function:
public static List<string> CommonTags(string mct)
{
var hashtags = new List<string>();
MatchCollection matchtgs = Regex.Matches(mct, @"\B#\w\w+", RegexOptions.IgnoreCase | RegexOptions.Compiled);
matchtgs.Cast<Match>().Select(m => m.Value.ToLowerInvariant()).GroupBy(s => s).OrderByDescending(g => g.Count()).Select(g => g.Key).Distinct().ToList().ForEach(delegate (string s)
{
hashtags.Add(s);
}
);
return hashtags;
}
Thursday, June 29, 2017 4:30 PM
Are their individual words in the specific columns (i.e. single words and not paragraphs)? If so, you could simply group on the content of the specific column and order the size of the groupings:
SELECT TOP 10
YourColumn,
COUNT(YourColumn) AS Appearances
FROM YourTable
GROUP BY YourColumn
ORDER BY Appearances DESC
Thursday, June 29, 2017 5:07 PM
Are their individual words in the specific columns
No, but they start with #. e.g. #eggs and are part of text paragraphs.
Friday, June 30, 2017 11:53 AM
that will help
Friday, June 30, 2017 1:43 PM
No, but they start with #. e.g. #eggs and are part of text paragraphs.
This will complicate things a bit as you'll have to manually split the string up using a given delimiter (likely a space). SQL doesn't necessarily support this type of behavior out of the box (as easy as you might in C#), but the link that jkjhse provided will define a function that should handle that for you.
You might consider if this needs to be done entirely in SQL, as it may be much easier to do this within your C# code (via a simple Split() call and a little bit of LINQ).
Friday, June 30, 2017 4:10 PM
I may consider something like this:
Regex.Split("Hello World This is a great world, This World is simply great".ToLower(), @"\W+")
.Where(s => s.Length > 3)
.GroupBy(s => s)
.OrderByDescending(g => g.Count())
it may be much easier to do this within your C# code
I think so, but what if i consider 100 rows or more? Any e.g.?
Friday, June 30, 2017 9:32 PM
I think so, but what if i consider 100 rows or more? Any e.g.?
It shouldn't be any issue.
Now if you start getting into thousands and thousands of rows, then you might start to run into some issues, but it really just depends on your scenario. Additionally, you could use a Take(10) on your current query to only grab the top 10 as well.
Sunday, July 2, 2017 9:22 AM
Just one last thing for the following code i'm trying to achieve it with Linq but there's an error i get
System.Linq.Enumerable+WhereSelectEnumerableIterator`2[System.Linq.IGrouping`2[System.Char,System.Char],System.Char]
var query = db.Query("SELECT TOP 100 Cities FROM EuMap");
foreach (var row in query)
{
MatchCollection links = Regex.Matches(row.Cities, @"(\B#)(\w\w+)", RegexOptions.IgnoreCase | RegexOptions.Compiled);
foreach (Match link in links)
{
<p>@link.Value.GroupBy(s => s).OrderByDescending(g => g.Count()).Select(g => g.Key)</p>
}
}
If i don't use
.GroupBy(s => s).OrderByDescending(g => g.Count()).Select(g => g.Key)
i get this results
#Berlin
#Bremen
#Paris
#Madrid
#Munich
#Berlin
#Lyon
#Berlin
#Munich
...
Monday, July 3, 2017 7:27 AM
Hi dow7,
you can also try to refer example below may help you.
Dictionary<string, int> GetTopKWords(string input, int k)
{
string[] words = Regex.Split(input, @"\W");
var occurrences = new Dictionary<string, int>();
foreach (var word in words)
{
string lowerWord = word.ToLowerInvariant();
if (!occurrences.ContainsKey(lowerWord))
occurrences.Add(lowerWord, 1);
else
occurrences[lowerWord]++;
}
return (from wp in occurrences.OrderByDescending(kvp => kvp.Value) select wp).Take(k).ToDictionary (kw => kw.Key, kw => kw.Value);
}
var input = "the quick brown fox is brown and jumps over the brown log over the long fire and quickly jumps to a brown fire fox";
GetTopKWords(input, 10);
Output:
Reference:
Using LINQ in C# to easily get a list of most often used words
Regards
Deepak