Object-Oriented Regular Expressions

Extracted from the book "Programming Ruby - The Pragmatic 
Programmer's Guide" by Dave Thomas and Andrew Hunt
Copyright © 2001 by Addison Wesley Longman, Inc. This material may be 
distributed only subject to the terms and conditions set forth in the 
Open Publication License, v1.0 or later (the latest version is 
presently available at http://www.opencontent.org/openpub/).

Distribution of substantively modified versions of this document is 
prohibited without the explicit permission of the copyright holder.

Distribution of the work or derivative of the work in any standard 
(paper) book form is prohibited unless prior permission is obtained 
from the copyright holder.

Start using Ruby's regular expressions, and you'll soon come across a whole menagerie of weird-looking variables, $&, $1, $' and others. We have to admit that while all these weird variables are very convenient to use, they aren't very object oriented, and they're certainly cryptic. And don't we say that everything in Ruby is an object? What's gone wrong here?

Nothing, really. It's just that when Matz designed Ruby, he produced a fully object-oriented regular expression handling system. He then made it look familiar to Perl programmers by wrapping all these $-variables on top of it all. The objects and classes are still there, underneath the surface. So let's spend a while digging them out.

We're all familiar with one class: regular expression literals create instances of class Regexp.

  re = /cat/
  re.class       # -> Regexp

The method Regexp#match matches a regular expression object against a string. If unsuccessful, the method returns nil. On success, it returns an instance of class MatchData. And that MatchData object gives you access to all available information about the match. All that good stuff that you can get from the $-variables is bundled in a handy little object.

  re = /(\d+):(\d+)/     # match a time hh:mm
  md = re.match("Time: 12:34am")
  md.class               # -> MatchData
  md[0]         # == $&  # -> "12:34"
  md[1]         # == $1  # -> "12"
  md[2]         # == $2  # -> "34"
  md.pre_match  # == $`  # -> "Time: "
  md.post_match # == $'  # -> "am"
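
Because Regexp#match returns nil when nothing matches, code that works with the MatchData object usually checks the result first. The following is only a minimal sketch: parse_time is a hypothetical helper name invented for illustration, not something from Ruby or from the book.

```ruby
# parse_time is a hypothetical helper, used here only for illustration.
# It returns [hours, minutes] as integers, or nil if no hh:mm is found.
def parse_time(text)
  md = /(\d+):(\d+)/.match(text)
  return nil if md.nil?        # Regexp#match returns nil when nothing matches
  [md[1].to_i, md[2].to_i]     # md[1] and md[2] are the two captured groups
end

parse_time("Time: 12:34am")    # -> [12, 34]
parse_time("no time here")     # -> nil
```

Everything the helper needs comes from the MatchData object it was handed, so it never touches the $-variables.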

Because the match data is stored in its own object, you can keep the results of two or more pattern matches available at the same time, something you can't do using the $-variables. In the next example, we're matching the same Regexp object against two strings. Each match returns a unique MatchData object, which we verify by examining the two subpattern fields.

  re = /(\d+):(\d+)/     # match a time hh:mm
  md1 = re.match("Time: 12:34am")
  md2 = re.match("Time: 10:30pm")
  md1[1,2]       # -> ["12", "34"]
  md2[1,2]       # -> ["10", "30"]

So how do the $-variables fit in? Well, after every pattern match, Ruby stores a reference to the result (either nil or a MatchData object) in a thread-local variable (accessible using $~). All the other regular expression variables are then derived from this object. Although we can't really think of a use for the following code, it demonstrates that all the other regexp-related $-variables are indeed derived from the value in $~.

  re = /(\d+):(\d+)/
  md1 = re.match("Time: 12:34am")
  md2 = re.match("Time: 10:30pm")
  [ $1, $2 ]   # last successful match      # -> ["10", "30"]
  $~ = md1
  [ $1, $2 ]   # previous successful match  # -> ["12", "34"]
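
The "either nil or a MatchData object" part can be checked the same way. The sketch below assumes a hypothetical method name, last_match_demo, chosen only for illustration. It records what $~ and $1 hold after a successful match and again after a failed one.

```ruby
# last_match_demo is a hypothetical name, used only for illustration.
def last_match_demo
  re = /(\d+):(\d+)/
  md = re.match("Time: 12:34am")
  same_object = $~.equal?(md)    # $~ refers to the MatchData just returned
  first_group = $1               # derived from $~
  re.match("no digits here")     # this match fails, so $~ becomes nil
  [same_object, first_group, $~, $1]
end

last_match_demo   # -> [true, "12", nil, nil]
```

After the failed match, $~ is nil, and $1 goes to nil along with it.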

Having said all this, we have to 'fess up. Andy and Dave normally use the $-variables rather than worrying about MatchData objects. For everyday use, they just end up being more convenient. Sometimes we just can't help being pragmatic ;-)






 