
How to use Regex to parse and break up a URL using C#.NET

Question

Tuesday, October 25, 2011 7:52 AM

For a sample URL below:

"http://servername/username/collectionkey/sectionkey.extension"

The URL is in the following format:

1. Literal “http://” (case-insensitive)

2. Server name: a non-whitespace value that does not contain “/”, followed by /

3. User name: a non-whitespace value that does not contain “/”, followed by /

4. Collection key: a numeric value, followed by an optional / or .xml

5. Optional section key: a numeric value, followed by an optional .xml

How can we use Regex to parse and break up the above URL?

All replies (14)

Tuesday, October 25, 2011 8:36 AM ✅Answered | 3 votes

Using a Regex lets you split the string, check that all required parts are present, and check that the keys are numeric in a single step:

// requires: using System.Text.RegularExpressions;
string servername = null, username = null, collectionkey = null, sectionkey = null;
// Anchored with ^ and $ so the whole URL must fit the format; + requires at least one character per part.
Match m = Regex.Match(url, @"^(?i:http://)(?<servername>[^\s/]+)/(?<username>[^\s/]+)/(?<collectionkey>\d+)(/(?<sectionkey>\d*))?(\.xml)?$");
if (m.Success)
{
    servername = m.Groups["servername"].Value;
    username = m.Groups["username"].Value;
    collectionkey = m.Groups["collectionkey"].Value;
    sectionkey = m.Groups["sectionkey"].Value; // empty string when the URL has no section key
}

Make sure you understand the regex if you plan to use it. If you don't, you won't be able to modify it later when you need to.
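
To make a pattern of this kind easier to read and modify, it can be written with RegexOptions.IgnorePatternWhitespace and a comment per part. The following is only a sketch under assumptions: the class name UrlPattern and the method TryParse are hypothetical, and the pattern is a stricter reading of the format in the question (whole input must match, every part needs at least one character, .xml is matched case-sensitively), not necessarily what the asker needs.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class UrlPattern
{
    private static readonly Regex Pattern = new Regex(@"
        ^(?i:http://)                # literal scheme, matched case-insensitively
        (?<servername>[^\s/]+)/      # server name: no whitespace, no slash
        (?<username>[^\s/]+)/        # user name: same rule
        (?<collectionkey>\d+)        # collection key: digits only
        (?:/(?<sectionkey>\d+)?)?    # optional slash, optionally followed by a numeric section key
        (?:\.xml)?\z                 # optional .xml, then end of input
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns true and fills the out parameters when the URL fits the format.
    // sectionKey is null when the URL has no section key.
    public static bool TryParse(string url, out string server, out string user,
                                out string collectionKey, out string sectionKey)
    {
        server = user = collectionKey = sectionKey = null;
        Match m = Pattern.Match(url);
        if (!m.Success)
            return false;

        server = m.Groups["servername"].Value;
        user = m.Groups["username"].Value;
        collectionKey = m.Groups["collectionkey"].Value;
        if (m.Groups["sectionkey"].Success)
            sectionKey = m.Groups["sectionkey"].Value;
        return true;
    }
}
```

Called as UrlPattern.TryParse(url, out server, out user, out collectionKey, out sectionKey), it avoids var and LINQ, so it should also compile with the C# 2.0 compiler in Visual Studio 2005.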


Tuesday, October 25, 2011 9:48 AM ✅Answered | 1 vote

Below is a class that parses and validates URLs as required.
 

using System;
using System.Linq;
using System.Text.RegularExpressions;
...
public class Info
{
    public Info(string url)
    {
        Uri uri = new Uri(url);
        if(uri.Segments.Length < 3 || uri.Segments.Length > 4)
            throw new ArgumentException("Segments.Length");
        // server
        this.Server = uri.Host;
        // user
        this.User = uri.Segments[1].TrimEnd('/');
        if(this.User.Contains(' '))
            throw new ArgumentException("User");
                
        // Digits with an optional .xml suffix; \d+ is mandatory so an empty segment is rejected.
        Regex regex = new Regex(@"^\d+(\.xml)?$", RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);

        // collection key
        this.CollectionKey = uri.Segments[2].TrimEnd('/');
        if(regex.Match(this.CollectionKey).Success == false)
            throw new ArgumentException("CollectionKey");
        // sectionkey
        if(uri.Segments.Length == 4)
        {
            this.SectionKey = uri.Segments[3].TrimEnd('/');
            if(regex.Match(this.SectionKey).Success == false)
                throw new ArgumentException("SectionKey");
        }
    }
    public readonly string Server;
    public readonly string User;
    public readonly string CollectionKey;
    public readonly string SectionKey;
}
...
// test
var inf1 = new Info("http://servername/username/123.xml");
var inf2 = new Info("http://servername/username/123/45.xml");

Tuesday, October 25, 2011 8:00 AM

> ... to parse and breakup URL using c#.net

The simplest way is to use the Uri class.

var uri = new Uri("http://servername/username/collectionkey/sectionkey.extension");

Tuesday, October 25, 2011 8:12 AM

Thank you for your quick response.

Could you please tell me how to split and validate this using Uri?

The last field, "sectionkey.extension", is optional.

Sometimes the URL will be like "http://servername/username/collectionkey.extension".

Even the extension is optional.

I need to extract each field into a string and pass these values to a method.

Ullas_Joseph


Tuesday, October 25, 2011 8:13 AM

Hello,

You don't need Regex here; it's pretty simple.

string s = "http://servername/username/collectionkey/sectionkey.extension";

int index = s.IndexOf("//");

index = index > 0 ? index + 2 : 0;

s = s.Remove(0, index);

string[] values = s.Split('/');

If the format is guaranteed to be valid, you can remove the ternary check I made there.
Eyal (http://shilony.net), Regards.


Tuesday, October 25, 2011 8:17 AM

Thank you, Eyal, for your suggestions.

Actually, I used Split, but it was suggested that I use Regex.

In this way, we don’t have to split the single search logic into multiple lines of code, and therefore we can avoid checking validity in each line.

Ullas_Joseph


Tuesday, October 25, 2011 8:21 AM

> i need to take each fields to a string ...

They are in Uri.Segments:

var uri = new Uri("http://servername/username/collectionkey/sectionkey.extension");
foreach(var str in uri.Segments.Select(s => s.TrimEnd('/')))
{
}

 


Tuesday, October 25, 2011 8:29 AM

Malobukv, 

I'm using Visual Studio 2005, so var is not available and I declared the variable as System.Uri instead.

However, uri.Segments.Select is not available either.

Ullas_Joseph


Tuesday, October 25, 2011 8:33 AM | 1 vote

> I'm using visual studio 2005, so var is not available so i used System.Uri. but uri.Segment.Select is not available.

Uri uri = new Uri("http://servername/username/collectionkey/sectionkey.extension");
foreach(string segment in uri.Segments)
{
    string str = segment.TrimEnd('/');
    System.Diagnostics.Trace.WriteLine(str);
}

Tuesday, October 25, 2011 8:42 AM

Thank you very much Malobukv, let me try this and get back to you.

 

Ullas_Joseph


Tuesday, October 25, 2011 8:43 AM

Thank you, Louis, this is what I am looking for. Let me try this and I will get back to you.

Ullas_Joseph


Tuesday, October 25, 2011 8:58 AM | 1 vote

> In this way, we don’t have to split the single search logic into multiple lines of code, and therefore we can avoid checking validity in each line.

You do understand the implications? You shouldn't use a sledgehammer to kill a fly.

Regex is awesome for complex string manipulation, not for simple things such as this.

There you go.

string s = "http://servername/username/collectionkey/sectionkey.extension";

MatchCollection matches = Regex.Matches(s, @"(?<=/)[\w\.]+");

foreach (Match match in matches)
{
    Console.WriteLine(match.Value);
}

You can also use Regex.Split.
Eyal (http://shilony.net), Regards.
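
As a rough illustration of the Regex.Split route mentioned above, the sketch below strips the scheme and splits the rest on slashes. The class and method names (UrlSplitter, SplitParts) are hypothetical. It only splits; it does not check that the keys are numeric.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class UrlSplitter
{
    public static string[] SplitParts(string url)
    {
        // Remove a leading "http://" (any letter case), if present.
        string rest = Regex.Replace(url, @"^(?i:http://)", "");

        // Trim outer slashes so Regex.Split does not return empty strings
        // at either end, then split on runs of one or more slashes.
        return Regex.Split(rest.Trim('/'), "/+");
    }
}
```

For the sample URL this returns four parts: servername, username, collectionkey and sectionkey.extension.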


Tuesday, October 25, 2011 9:03 AM

Thank you Eyal for your valuable suggestion.

 

Ullas_Joseph


Tuesday, October 25, 2011 11:19 AM

Thank you, Malobukv.

Ullas_Joseph