Reason #30 • January 30th, 2026

String#scan

A useful method for extracting substrings from a string based on a pattern is Ruby's String#scan. This method takes a regular expression and returns an array of all non-overlapping matches of the pattern, in the order they appear.

Ruby
log = "path=/reasons duration=21 host=lovingruby.com protocol=https"

# If we scan without capture groups, we get an array of matches
log.scan(/\w+=[^ ]+/)
# => ["path=/reasons", "duration=21", "host=lovingruby.com", ...]

# With capture groups, we get an array of arrays
log.scan(/(\w+)=([^ ]+)/)
# => [["path", "/reasons"], ["duration", "21"], ...]

# When the groups come in pairs, we can convert to a hash using #to_h
parsed_logfmt = log.scan(/(\w+)=([^ ]+)/).to_h
# => {
#  "path" => "/reasons",
#  "duration" => "21",
#  "host" => "lovingruby.com",
#  "protocol" => "https",
# }
      
JavaScript
const log = "path=/reasons duration=21 host=lovingruby.com protocol=https";

// matchAll is similar to Ruby's String#scan but returns an iterator
const matches = [...log.matchAll(/\w+=[^ ]+/g)];
// => [["path=/reasons"], ["duration=21"], ...]

// With capture groups, results contain both the full match & groups
const matchesWithGroups = [...log.matchAll(/(\w+)=([^ ]+)/g)];
// => [["path=/reasons", "path", "/reasons"], ...]

// We can convert the pairs to an object using Object.fromEntries
const result = Object.fromEntries(
  matchesWithGroups.map(([, key, value]) => [key, value])
);
// => {
//  "path": "/reasons",
//  "duration": "21",
//  "host": "lovingruby.com",
//  "protocol": "https",
// }
      

What I find particularly interesting about the design of String#scan is how it adapts its return value based on the apparent intent of the user. If no capture groups are used, it returns an array of matches. But if capture groups are present, it returns an array of arrays containing only the captured groups.
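One consequence of this rule is worth seeing in isolation: even a single capture group switches the result to an array of arrays, while a non-capturing group (?:...) does not. The following is a minimal sketch reusing the log line from above; the variable names are only illustrative.

```ruby
log = "path=/reasons duration=21 host=lovingruby.com protocol=https"

# A single capture group still yields an array of one-element arrays
keys_nested = log.scan(/(\w+)=/)
# => [["path"], ["duration"], ["host"], ["protocol"]]

# Flatten when plain strings are wanted
keys = log.scan(/(\w+)=/).flatten
# => ["path", "duration", "host", "protocol"]

# A non-capturing group (?:...) groups without capturing,
# so scan goes back to returning the whole matches
selected = log.scan(/(?:path|host)=[^ ]+/)
# => ["path=/reasons", "host=lovingruby.com"]
```

Reaching for (?:...) is a handy way to keep alternation or repetition in the pattern without changing the shape of what scan returns.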

In the case of our simple logfmt parser, this sets us up perfectly to convert the resulting array of pairs directly into a Hash using #to_h.
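If the keys or values need massaging on the way in, #to_h also accepts a block (Ruby 2.6 and later) that maps each pair to a new pair. The sketch below assumes we want symbol keys and integers for all-digit values; those choices are illustrative and not part of the parser above.

```ruby
log = "path=/reasons duration=21 host=lovingruby.com protocol=https"

# The block receives each [key, value] pair and returns the pair to store
parsed = log.scan(/(\w+)=([^ ]+)/).to_h do |key, value|
  # Assumption for illustration: all-digit values become Integers,
  # everything else stays a String
  [key.to_sym, value.match?(/\A\d+\z/) ? value.to_i : value]
end
# => { path: "/reasons", duration: 21, host: "lovingruby.com", protocol: "https" }
```

Note that if a key appears more than once in the line, the last occurrence wins, since later pairs overwrite earlier ones in the resulting Hash.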

History

String#scan has been part of Ruby since version 1.0, released in 1996.

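The idea of adapting return values based on the presence of capture groups was likely inspired by Perl, which takes a similar approach when a regex with the /g modifier is matched in list context: without capture groups it returns the whole matches, and with capture groups it returns the captured substrings instead (as one flat list rather than nested arrays).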

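Python's re.findall function, added in Python 1.5.2 in 1999, adapts its return value in a similar way: it returns a list of strings when the pattern has no groups or a single group, and a list of tuples when it has more than one.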