Share via


Find repeating patterns (that you do not know in advance) in string

Question

Thursday, September 24, 2009 3:58 AM

Hello everyone,

I was wandering how can one find a repeated pattern in a string. for example:

asdxgkeopgkajdflkjbpoijadadafhjkafikeoadkjhadfkjhocihakeo

If you see the keo is repeated. BUT that I don't know it in advance. I want a function that can identify a repeating pattern and save it saying me how many times it found it.

Thanks in advance for any help

idipous

All replies (12)

Thursday, September 24, 2009 5:17 AM ✅Answered

Hi,
I just tried this one. I am not tested well. May be it will help you.

string value = textBox1.Text.Trim();
            int length=3;
            ArrayList list = new ArrayList();
            for (int i = 0; i < value.Length; i++)
            {
                if (i + length < value.Length)
                {
                    string subValue = value.Substring(i, length);
                    list.Add(subValue);
                    i = i + length;
                }
            }
            foreach (string lst in list)
            {
                foreach (string lstIn in list)
                {
                    if (Regex.IsMatch(lstIn, lst, RegexOptions.IgnoreCase))
                    {
                        MessageBox.Show("Pattern= "+lst+" "+"match= "+lstIn);
                    }
                }
            }

 

Best Regards, C.Gnanadurai Please mark the post as answer if it is helpfull to you


Thursday, September 24, 2009 5:39 AM ✅Answered | 2 votes

Probably the best answer for this question is to use algorithms for pattern discovery.

But here is a code I wrote for your example.

        private static Dictionary<string, int> FindPatterns(string value)
        {
            List<string> patternToSearchList = new List<string>();
            for (int i = 0; i < value.Length; i++)
            {
                for (int j = 2; j <= value.Length / 2; j++)
                {
                    if (i + j <= value.Length)
                    {
                        patternToSearchList.Add(value.Substring(i, j));
                    }
                }
            }
            Dictionary<string, int> results = new Dictionary<string, int>();
            foreach (string pattern in patternToSearchList)
            {
                int occurence = Regex.Matches(value, pattern, RegexOptions.IgnoreCase).Count;
                if (occurence > 1)
                {
                    results[pattern] = occurence;
                }
            }

            return results;
        }
        static void Main(string[] args)
        {
            Dictionary<string, int> result = FindPatterns("asdxgkeopgkajdflkjbpoijadadafhjkafikeoadkjhadfkjhocihakeo");
            foreach (KeyValuePair<string, int> res in result.OrderByDescending(r => r.Value))
            {
                Console.WriteLine("Pattern:" + res.Key + " occurence:" + res.Value.ToString());
            }
            Console.Read();

Note: This is not a very efficient way, using an algorith is the best solution.


Thursday, September 24, 2009 5:53 AM ✅Answered

For BoyerMore algorithm you can visit

http://www.codeproject.com/KB/recipes/BoyerMooreSearch.aspx


Thursday, September 24, 2009 6:44 AM ✅Answered | 1 vote

Hello, here is another possibility.

Code Snippet

  1. using System;
  2. using System.Collections.Generic;
  3. using System.Text.RegularExpressions;
  4.  
  5. class Program
  6. {
  7.     static void Main(string[] args) {
  8.         string str = "asdxgkeopgkajdflkjbpoijadadafhjkafikeoadkjhadfkjhocihakeo";
  9.  
  10.         foreach (KeyValuePair<string, int> pair in GetPatterns(str, 3))
  11.         {
  12.             if (pair.Value > 1)
  13.             {
  14.                 Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
  15.             }
  16.         }
  17.  
  18.         Console.ReadKey();
  19.     }
  20.  
  21.     private static IEnumerable<KeyValuePair<string, int>> GetPatterns(string value, int blockLength) {
  22.         string currentBlock = string.Empty;
  23.         IList<string> list = new List<string>();
  24.         for (int i = 0; i < value.Length; i++)
  25.         {
  26.             if (i + blockLength <= value.Length)
  27.             {
  28.                 currentBlock = value.Substring(i, blockLength);
  29.                 if (!list.Contains(currentBlock))
  30.                 {
  31.                     list.Add(currentBlock);
  32.                     MatchCollection match = Regex.Matches(value, currentBlock);
  33.                     yield return new KeyValuePair<string, int>(currentBlock, match.Count);
  34.                 }
  35.             }
  36.         }
  37.     }
  38. }

Eyal, Regards.

blog.eyalsh.net


Thursday, September 24, 2009 8:55 AM ✅Answered

This can also be done fairly efficiently by generating a suffix tree for the string.

See http://en.wikipedia.org/wiki/Suffix_tree

I don't have C# code for that though.


Thursday, September 24, 2009 9:27 AM ✅Answered

This can also be done fairly efficiently by generating a suffix tree for the string.

See http://en.wikipedia.org/wiki/Suffix_tree

I don't have C# code for that though.

Just doing a google search on the behalf of Matthew Watson. :)

http://code.google.com/p/csharsuffixtree/source/browse/#svn/trunk/suffixtree

Nice thing too, also.

http://en.wikipedia.org/wiki/Aho-Corasick_algorithmEyal, Regards.

blog.eyalsh.net


Thursday, September 24, 2009 4:04 AM

What are your pattern? Are they prefined? If you don't know what are your patterns look like, it is impossible.


Thursday, September 24, 2009 5:41 AM

This is the basis for the PKZip deflate algorithm.  Look at the reference source for DeflateStream.


Saturday, September 26, 2009 12:14 AM

Thank you guys for all the help. I will give it a try with the stuff here and let you know what worked better.


Saturday, September 26, 2009 12:55 PM

Hello everyone,

I was wandering how can one find a repeated pattern in a string. for example:

asdxgkeopgkajdflkjbpoijadadafhjkafikeoadkjhadfkjhocihakeo

If you see the keo is repeated. BUT that I don't know it in advance. I want a function that can identify a repeating pattern and save it saying me how many times it found it.

Thanks in advance for any help

idipous

You have more patterns besides keo, ad, ke, eo, kj, gk, ka, df, da, af, kjh,  jh, ha


John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com


Saturday, September 26, 2009 1:11 PM

Tamer's example caught every pattern.John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com


Thursday, October 1, 2009 1:38 AM

Hi,

Does it work for you ?

HarryPlease remember to mark the replies as answers if they help and unmark them if they provide no help.
Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.