Regular expressions (regex) are a critical feature of Ruby. However, developers often avoid them, saying, "Regex is complicated and causes bugs." This seems to stem from a misunderstanding of how regex matching works. The behavior of regex matching is actually simple, and a small regex engine (kantan-regex) can be implemented in fewer than 300 lines.
In this talk, I will describe the behavior of regex matching through the implementation of kantan-regex. Furthermore, I will show how, with a few modifications, extensions such as look-around and optimizations such as memoization can be easily implemented. I believe that this talk will help deepen our understanding of regex implementations and make using regex in everyday life more enjoyable.
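To give a feel for how small such a matcher can be, here is a minimal sketch of a backtracking matcher over a hand-built syntax tree. It is an illustration only and is not taken from kantan-regex: the node names (Lit, Cat, Alt, Star) and the methods match_node and full_match? are hypothetical, there is no parser, and only literals, concatenation, alternation, and greedy repetition are covered.

```ruby
# Hypothetical AST nodes for illustration; not kantan-regex's actual types.
Lit  = Struct.new(:char)          # a single literal character
Cat  = Struct.new(:left, :right)  # concatenation: left, then right
Alt  = Struct.new(:left, :right)  # alternation: left | right
Star = Struct.new(:body)          # greedy repetition: body*

# Try to match `node` against `str` starting at `pos`. Each successful way of
# matching calls the continuation `k` with the position after the match.
# Returning a falsy value makes the caller try its next option (backtracking).
def match_node(node, str, pos, &k)
  case node
  when Lit
    pos < str.length && str[pos] == node.char && k.call(pos + 1)
  when Cat
    match_node(node.left, str, pos) { |mid| match_node(node.right, str, mid, &k) }
  when Alt
    match_node(node.left, str, pos, &k) || match_node(node.right, str, pos, &k)
  when Star
    # Greedy: try one more iteration first, then fall back to stopping here.
    # The `mid > pos` guard prevents endless loops on empty iterations.
    match_node(node.body, str, pos) { |mid| mid > pos && match_node(node, str, mid, &k) } ||
      k.call(pos)
  end
end

# True only if the whole string is matched.
def full_match?(node, str)
  match_node(node, str, 0) { |pos| pos == str.length } ? true : false
end

# (a|b)*c built by hand
re = Cat.new(Star.new(Alt.new(Lit.new("a"), Lit.new("b"))), Lit.new("c"))
full_match?(re, "abac")  # => true
full_match?(re, "abab")  # => false
```

In a design like this, an extension such as look-ahead could plausibly be one more case that runs the inner pattern and then continues from the original position, and memoization could cache (node, position) pairs that are already known to fail. Both are assumptions about this sketch, not a description of how kantan-regex does it.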
A tutorial on implementing kantan-regex is available on the web (Japanese only).