Regular Expressions

How many times have you had to dig through a long, confusing string for information? This has been a common problem for years. What’s the solution? Regular expressions.

What are regular expressions? Well...

[Image: a long and complex regular expression]

Now that you’ve seen that… guess why so many people do NOT program in Perl? :(

To put it very simply, regular expressions are strings that describe patterns to search for in other strings. A pattern can be a few literal letters or an enormous string with many wildcards. In this tutorial, I will outline some basics of regular expressions (also known as regexes).

Before I get started, I must tell you that what you saw before is an extremely complicated regular expression that even I do not fully understand. Please do not get discouraged because of it. Perl is an excellent language that I recently used to create my Twitter application. I guarantee this article will be easy to follow!

In Perl, every regex is written between a matching pair of start and end characters. Extra characters after the end character are referred to as modifiers. The letter in front of the start character is called the operator. In the middle lies the real pattern.

This real pattern contains everything that should be matched. For example, if you wanted to match the pattern “ab” in Perl, use:

$str =~ m/ab/

Notice the “m” operator. It is one of the most common operators in Perl, and it is used for simple matching. If no operator is specified and the start and end characters are /, the match operator is used by default, so /ab/ means the same as m/ab/.

In the above case, / is our start and end character. We could just as well have used a different one, such as %, as long as the “m” is written out:

$str =~ m%ab%
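To see that the choice of start and end character really does not change the result, here is a small sketch. The sample string "cabbage" and the variable names are made up for illustration. The third form, with braces, is an assumption on top of this article: Perl also accepts bracket pairs, which comes in handy when the pattern itself contains slashes.

```perl
use strict;
use warnings;

my $str = "cabbage";    # illustrative sample; it contains "ab"

# Three spellings of the same match. Only the start and end characters differ.
my $with_slashes  = ( $str =~ m/ab/ ) ? 1 : 0;
my $with_percents = ( $str =~ m%ab% ) ? 1 : 0;
my $with_braces   = ( $str =~ m{ab} ) ? 1 : 0;

print "slashes: $with_slashes, percents: $with_percents, braces: $with_braces\n";
```

All three tests should agree, since the pattern between the delimiters is the same each time.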

Be careful! Regular expressions are case sensitive. If you were actually trying to match “AB”, the above pattern would not work. To make any regex case insensitive, use the “i” modifier:

$str =~ m/ab/i

Well, now that you’ve seen two simple examples, you are probably wondering how to use them. In Perl, it’s as easy as placing them in an if statement:

if( $my_str =~ m/ab/i ) {
	# The code to execute assuming "ab" is in $my_str
}

The match returns true if the pattern is found anywhere in the string. Otherwise, it returns false.
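Here is a minimal sketch that puts the “i” modifier and the if-style test together. The helper name contains_ab and the sample words are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if "ab" occurs in the text
# in any mix of upper and lower case, and 0 otherwise.
sub contains_ab {
    my ($text) = @_;
    return ( $text =~ m/ab/i ) ? 1 : 0;
}

for my $sample ( "cabbage", "CABBAGE", "cAbin", "carrot" ) {
    my $verdict = contains_ab($sample) ? "match" : "no match";
    print "$sample => $verdict\n";
}
```

The first three samples match because of the “i” modifier; "carrot" does not contain "ab" in any case, so it fails.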

How about something a bit more advanced? Let’s say you wanted to match “cat”:

$string =~ m/cat/

But wait! What if any of “cat”, “hat”, or “bat” would do? You can use () and | to represent “or”:

$string =~ m/(c|h|b)at/

Between each | lies a possible pattern to match. Now, let’s say you wanted to match “him” or “Robert”:

$string =~ m/(him|Robert)/
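As a quick sketch of alternation in action, the following filters a list of words with the “cat”, “hat”, or “bat” pattern. The word list is made up for illustration, and grep is simply a convenient way to run the same match over several strings.

```perl
use strict;
use warnings;

# Illustrative word list; keep only the words the pattern matches.
my @words   = ( "cat", "hat", "bat", "rat", "chat", "at" );
my @matched = grep { m/(c|h|b)at/ } @words;

print join( ", ", @matched ), "\n";    # prints: cat, hat, bat, chat
```

Note that "chat" gets through as well: the pattern only has to be found somewhere inside the string, and "chat" contains "hat".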

It’s very simple! For more practice, repeat this process with “child” and “children”:

$info =~ m/(child|children)/

Now, take a look at that pattern. Doesn’t it seem a bit redundant? You are matching the word “child” in both “child” and “children”. Shouldn’t there be a way to make part of a regular expression optional? Well, there is! The “?” can be used in a regular expression to make the preceding character, or the preceding group in parentheses, optional:

$info =~ m/child(ren)?/
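A small sketch can confirm that the shorter pattern accepts and rejects the same strings as the longer one. The three sample phrases and the @report array are assumptions made up for this comparison.

```perl
use strict;
use warnings;

# Run both forms of the pattern over the same illustrative samples.
my @report;
for my $info ( "one child", "two children", "a chill" ) {
    my $alternation = ( $info =~ m/(child|children)/ ) ? "yes" : "no";
    my $optional    = ( $info =~ m/child(ren)?/ )      ? "yes" : "no";
    push @report, "$info | alternation: $alternation | optional: $optional";
}

print "$_\n" for @report;
```

Both columns should agree on every line: "yes" for the first two phrases and "no" for "a chill", which never contains "child".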

If you are just making one character optional, you can omit the parentheses:

$var =~ m/favou?rite/

The above pattern will match “favorite” or “favourite”. To finish up this lesson on regular expressions, let’s go through how to get the actual match from a string.

After a successful match in Perl, the text matched by each part of the pattern enclosed in parentheses may be accessed using the variables $1, $2, $3, and so on. For example:

if( $my_var =~ m/(cat)/ ) {
	print $1; # This will print "cat"
}

But why would you want to do this? You already know the match is going to be “cat”. Parentheses are mainly used when certain things are optional, or many different matches may occur:

if( $my_var =~ m/(honou?r)/ ) {
	print $1; # This will print either "honor" or "honour", depending on which was matched
}

Remember that there are different variables ($1, $2, $3, and so on) to represent these matches. Each variable holds the text matched by one pair of parentheses. The variables are numbered in the order in which the opening parentheses appear in the pattern, from left to right:

if( $some_str =~ m/((r|m|s)at)/ ) {
	print $1; # This will print "rat", "mat", or "sat"
	print $2; # This will print "r", "m", or "s"
}
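To make the numbering concrete, here is a sketch that runs the nested pattern against a sample sentence of my own choosing. The names $outer and $inner are hypothetical; they just hold copies of $1 and $2.

```perl
use strict;
use warnings;

my $some_str = "the cat sat on the mat";    # illustrative sample
my ( $outer, $inner );

if ( $some_str =~ m/((r|m|s)at)/ ) {
    # The first place in the string where the pattern fits is "sat";
    # "cat" is skipped because "c" is not one of r, m, or s.
    $outer = $1;    # outer parentheses open first, so they are group 1
    $inner = $2;    # inner parentheses open second, so they are group 2
}

print "outer: $outer, inner: $inner\n";    # prints: outer: sat, inner: s
```

Only the first match in the string is captured, which is why "mat" at the end never shows up.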

To make everything a bit easier, Perl also returns the complete matched string through $&. So:

if( $some_info =~ m/hello/ ) {
	print $&; # This will print "hello"
}

is an alternative (and easier) way to do this:

if( $some_info =~ m/(hello)/ ) {
	print $1; # This will also print "hello"
}
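The two only give the same answer when the parentheses wrap the whole pattern. As a sketch of the difference, the following uses a made-up sample string and a pattern where the parentheses cover just part of what is matched; $whole and $group are hypothetical names.

```perl
use strict;
use warnings;

my $some_info = "an honourable mention";    # illustrative sample
my ( $whole, $group );

if ( $some_info =~ m/(honou?r)able/ ) {
    $whole = $&;    # everything the pattern matched
    $group = $1;    # only the part inside the parentheses
}

print "whole match: $whole\n";    # prints: whole match: honourable
print "group 1:     $group\n";    # prints: group 1:     honour
```

So $& is the one to reach for when you want the full match, and $1 when you only care about one piece of it.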

That’s all there is for today! There will soon be more in this series. Thank you for reading!

This entry was posted on Thursday, April 16th, 2009 at 23:50:44.