Extract links using regular expressions with VB.NET

Here is a bit of code to get the URLs inside a text. It specifically deals with a kind of double URL link, like http://domain.com/?link=http://www.otherdomain.com/blablalba. I did some research to get the regular expression working on a recent project, so this may help others who have the same annoying issue. The key is to search for a word: in this case I needed to check whether there is a link containing a specific domain name. This expression finds every link whose host contains that word, as a whole word, right after http:// (or https://), with or without a leading www. (or any other single subdomain).

To make it work, I first prepared a String with the regex, containing a placeholder to be replaced with the subject of the match. In this case I named it :server, because I have a String Collection containing the servers I wanted to find.

'prepare the regex
Dim the_magic_pattern As String = "https?://((\w+\.)?(:server)\b\w*)+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?"
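To see the double-link behaviour described at the top of the post, here is a minimal sketch that plugs a word into the template and returns the first match. The module name `PatternDemo`, the helper `FirstLink` and the sample text are hypothetical; only the pattern string and the `Replace(":server", ...)` step come from the post.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternDemo
    ' The template from the post; ":server" is the placeholder for the domain word.
    Const MagicPattern As String = "https?://((\w+\.)?(:server)\b\w*)+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?"

    ' Hypothetical helper: returns the first link whose host contains the word, or Nothing.
    Public Function FirstLink(server As String, text As String) As String
        Dim r As New Regex(MagicPattern.Replace(":server", server), RegexOptions.IgnoreCase)
        Dim m As Match = r.Match(text)
        If m.Success Then Return m.Value Else Return Nothing
    End Function

    Sub Main()
        Dim text As String = "Go to http://domain.com/?link=http://www.otherdomain.com/blablalba now"
        ' Only the inner link has "otherdomain" right after http:// (with www.)
        Console.WriteLine(FirstLink("otherdomain", text)) ' http://www.otherdomain.com/blablalba
        ' The outer link matches "domain", and the trailing character class swallows the rest
        Console.WriteLine(FirstLink("domain", text))      ' http://domain.com/?link=http://www.otherdomain.com/blablalba
    End Sub
End Module
```

Note that when the word belongs to the outer host, the whole double link comes back as one match, because characters such as `?`, `=`, `:` and `/` are all in the trailing character class.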

Then, each time you need to search for matches in a block of text, create the Regex object from the pattern, replacing :server with the word you are looking for.

'replace :server and make the new Regex instance
Dim the_regex As Regex = New Regex(the_magic_pattern.Replace(":server", "example"), RegexOptions.IgnoreCase)
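If the value you put in place of :server contains regex metacharacters, for example the dot in a full domain name like example.com, it is safer to escape it first with `Regex.Escape`. This is a suggested variation, not part of the original code; the module name `ServerRegex`, the helper `BuildServerRegex` and the decoy URL are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ServerRegex
    ' The template from the post, with ":server" as the placeholder.
    Const MagicPattern As String = "https?://((\w+\.)?(:server)\b\w*)+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?"

    ' Hypothetical helper: escapes the server name so "." matches a literal dot.
    Public Function BuildServerRegex(server As String) As Regex
        Dim safeServer As String = Regex.Escape(server)
        Return New Regex(MagicPattern.Replace(":server", safeServer), RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim naive As New Regex(MagicPattern.Replace(":server", "example.com"), RegexOptions.IgnoreCase)
        Dim safe As Regex = BuildServerRegex("example.com")
        Dim decoy As String = "http://exampleXcom.net/"
        Console.WriteLine(naive.IsMatch(decoy))                        ' True: the unescaped "." matched "X"
        Console.WriteLine(safe.IsMatch(decoy))                         ' False
        Console.WriteLine(safe.IsMatch("http://www.example.com/page")) ' True
    End Sub
End Module
```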

Next, call the Regex.Matches method, which returns a MatchCollection with one Match object per link found.

'Regex.Matches
Dim the_block_of_text As String = "Your text with a lot of links"
Dim match_list As MatchCollection = the_regex.Matches(the_block_of_text)
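Each item in the MatchCollection is a Match, and its `Value` property holds the link text. A minimal sketch that collects the values into a list; `MatchListDemo`, `ExtractLinks` and the sample text are hypothetical names for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module MatchListDemo
    ' The template from the post, with ":server" as the placeholder.
    Const MagicPattern As String = "https?://((\w+\.)?(:server)\b\w*)+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?"

    ' Hypothetical helper: copies the text of every Match into a List(Of String).
    Public Function ExtractLinks(server As String, text As String) As List(Of String)
        Dim theRegex As New Regex(MagicPattern.Replace(":server", server), RegexOptions.IgnoreCase)
        Dim links As New List(Of String)
        For Each m As Match In theRegex.Matches(text)
            links.Add(m.Value)
        Next
        Return links
    End Function

    Sub Main()
        Dim text As String = "Read http://example.com/a then https://www.example.org/b?x=1 and skip http://other.com/c"
        ' Prints the two example links; other.com does not contain the word
        For Each link As String In ExtractLinks("example", text)
            Console.WriteLine(link)
        Next
    End Sub
End Module
```

Keep in mind that the trailing character class includes the comma, so a link followed directly by "," in prose will come back with the comma attached.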

And that’s it: you can put all of this code inside a For Each block and get the links for every domain you want.
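Putting the steps together, here is a sketch of that outer For Each over a String Collection of servers. It assumes `System.Collections.Specialized.StringCollection` as the "String Collection" and escapes each name with `Regex.Escape` (an addition, not in the original code); `AllServersDemo`, `LinksForServers` and the sample data are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Collections.Specialized
Imports System.Text.RegularExpressions

Module AllServersDemo
    ' The template from the post, with ":server" as the placeholder.
    Const MagicPattern As String = "https?://((\w+\.)?(:server)\b\w*)+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?"

    ' Hypothetical helper: builds one Regex per server and gathers every matched link,
    ' in server order.
    Public Function LinksForServers(servers As StringCollection, text As String) As List(Of String)
        Dim links As New List(Of String)
        For Each server As String In servers
            Dim theRegex As New Regex(MagicPattern.Replace(":server", Regex.Escape(server)), RegexOptions.IgnoreCase)
            For Each m As Match In theRegex.Matches(text)
                links.Add(m.Value)
            Next
        Next
        Return links
    End Function

    Sub Main()
        Dim servers As New StringCollection()
        servers.Add("example")
        servers.Add("otherdomain")
        Dim text As String = "A http://www.example.com/x B http://otherdomain.net/y C http://unrelated.org/z"
        For Each link As String In LinksForServers(servers, text)
            Console.WriteLine(link)
        Next
    End Sub
End Module
```

Because each server gets its own pass, a double link whose outer and inner hosts both appear in the collection can show up once per server; de-duplicate the list afterwards if that matters.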

Regular expressions are awesome if you know how to use them. It takes some time to understand the concepts and the syntax, but then they can save the day.

Here is the documentation for the .NET Regex class.

Here is a very, VERY useful tool for playing with regular expressions that I found in one of my books. The source is included :)

See you soon.

