This post will only be of interest to people writing scripts in Ruby. Seriously, zero utility if you’re not using Ruby. Though I would be curious how you accomplish the same thing in other languages like Rust and Python, because I’ve never gotten too deep with string manipulation in anything other than Ruby, Swift, and Objective-C. If you care to leave a comment with pointers, I’m all ears.

I do a lot of string manipulation in Ruby. One of the things that always gets me is that Regexp#match returns a MatchData object, groups and all, but only for the first match. To enumerate every match, you have to use String#scan. But scan returns plain strings (or arrays of captured strings) rather than MatchData, so you lose named groups. A while back I figured out a solution, and I thought I’d share it for any aspiring Ruby scripters.
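To make the difference concrete, here’s a minimal sketch. The sample string and the group names (key, value) are made up for illustration.

```ruby
str = 'a=1, b=2'
regex = /(?<key>\w)=(?<value>\d)/

# Regexp#match returns a MatchData, but only for the first match.
first = regex.match(str)
p first[:key]   # prints "a"
p first[:value] # prints "1"

# String#scan finds every match, but returns arrays of captured strings,
# so the group names are gone.
pairs = str.scan(regex)
p pairs # prints [["a", "1"], ["b", "2"]]
```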

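The trick is to turn scan into an enumerator and map each result to Regexp.last_match, which returns the MatchData (including numbered and named groups) for the most recent match. Because scan updates that value for every match it yields, each pass through the block sees the MatchData for the current match. Thus: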

str.to_enum(:scan, regex).map { Regexp.last_match }

results in an array of MatchData. Then you can iterate through it and use indexes or group names to pull out particular groups of each match.
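As a rough sketch of what that iteration can look like (the sample string and the key/value group names are invented for this example):

```ruby
str = 'width=80 height=24'
regex = /(?<key>\w+)=(?<value>\d+)/

matches = str.to_enum(:scan, regex).map { Regexp.last_match }

matches.each do |m|
  # m[0] is the whole match, m[1] is the first group by index,
  # m[:value] is a group by name, and m.begin(0) is the match offset.
  puts "#{m[0]}: #{m[1]} -> #{m[:value]} (offset #{m.begin(0)})"
end
# prints:
#   width=80: width -> 80 (offset 0)
#   height=24: height -> 24 (offset 9)
```

Since each element is a full MatchData, you also get things like pre_match, post_match, and offsets, which plain scan results don’t carry.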

I’ve combined this with a few other methods to create a general string handling routine that I use regularly.

# frozen_string_literal: true

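# String helpers
class ::String
  # Like scan, but returns an array of MatchData objects, one per match
  def match_scan(regex)
    to_enum(:scan, regex).map { Regexp.last_match }
  end

  # Returns an array of hashes, one per match, mapping each named group
  # (as a symbol) to its captured text
  def matches(regex)
    match_scan(regex).match_to_h.map(&:symbolize_keys)
  end
end

# Array helpers
class ::Array
  # Converts an array of MatchData to an array of hashes of named captures,
  # stripping whitespace from each value (unmatched groups stay nil)
  def match_to_h
    map { |m| m.named_captures.each_with_object({}) { |(k, v), h| h[k] = v&.strip } }
  end
end

# Hash helpers
class ::Hash
  # Returns a copy with keys converted to symbols, recursing into nested hashes
  def symbolize_keys
    each_with_object({}) { |(k, v), hsh| hsh[k.to_sym] = v.is_a?(Hash) ? v.symbolize_keys : v }
  end
end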

With the above methods available, you can do something like:

str = <<~EOEMAILS
  Arthur P. Dent <arthur@example.com>
  Ford Prefect <perfect@example.com>
  Zaphod Beeblebrox <zaph@example.com>
  Mrs. Alice Beeblebrox <zaphsfav@example.com>
  Slartibartfast <fjordmaster@example.com>
  Marvin the Paranoid Android <planetbrain@example.com>
EOEMAILS

rx = /(?<prefix>\S+\. )?(?<first>.*?)(?:( (?<middle>\w+\.?))*(?: (?<last>[\w-]+)))? <(?<email>.*?)>/i

pp str.matches(rx)

Running that results in:

[{:prefix=>nil,
  :first=>"Arthur",
  :middle=>"P.",
  :last=>"Dent",
  :email=>"arthur@example.com"},
 {:prefix=>nil,
  :first=>"Ford",
  :middle=>nil,
  :last=>"Prefect",
  :email=>"perfect@example.com"},
 {:prefix=>nil,
  :first=>"Zaphod",
  :middle=>nil,
  :last=>"Beeblebrox",
  :email=>"zaph@example.com"},
 {:prefix=>"Mrs.",
  :first=>"Alice",
  :middle=>nil,
  :last=>"Beeblebrox",
  :email=>"zaphsfav@example.com"},
 {:prefix=>nil,
  :first=>"Slartibartfast",
  :middle=>nil,
  :last=>nil,
  :email=>"fjordmaster@example.com"},
 {:prefix=>nil,
  :first=>"Marvin",
  :middle=>"Paranoid",
  :last=>"Android",
  :email=>"planetbrain@example.com"}]

That’s a silly example, but hopefully you can see the utility of turning a regular expression’s matches into an array of hashes, one per match, with each named group’s value keyed by its name.
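If you’d rather not reopen String, Array, and Hash, the same idea fits in a single standalone method. This is only a sketch: named_matches is a hypothetical name, and the log lines and group names are invented for the example.

```ruby
# Hypothetical standalone helper: scans str with regex and returns one hash
# per match, keyed by group name (as symbols), with values stripped.
def named_matches(str, regex)
  str.to_enum(:scan, regex).map do
    Regexp.last_match.named_captures.to_h { |name, value| [name.to_sym, value&.strip] }
  end
end

log = "GET /index.html 200\nPOST /login 302\n"
rows = named_matches(log, /(?<verb>[A-Z]+) (?<path>\S+) (?<status>\d{3})/)

# Each row is an ordinary hash, so the usual Enumerable tools apply.
rows.each { |row| puts "#{row[:status]} #{row[:verb]} #{row[:path]}" }
redirects = rows.select { |row| row[:status].start_with?('3') }
```

The block form of Hash#to_h needs Ruby 2.6 or later.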