(Re)make Regexp in Ruby: Democratizing internals for the JIT
With the rise of JIT implementations like YJIT and ZJIT, rewriting Ruby primitives in Ruby has become a realistic optimization strategy. This opens up the dream of a pure-Ruby Regexp engine that benefits from the JIT and integrates naturally with Ruby features (e.g., timeouts).
However, turning this dream into a production-ready reality faces a massive hurdle: compatibility. To replace Onigmo, we must replicate its exact behavior, which is notoriously difficult. In my experience, the hardest barriers to a compatible reimplementation are parsing and character class semantics.
I propose a pragmatic solution: exposing Onigmo's parser and character class logic as Ruby APIs. This democratizes the engine, offloading the complexity of compatibility to C while allowing us to focus on rewriting the matching logic in Ruby. I will demonstrate a proof of concept (PoC) and discuss the future of a hackable, JIT-friendly Regexp engine.
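To make the proposed split concrete, here is a minimal sketch of what "matching logic in Ruby" could look like once parsing and character classes are handled elsewhere. It is not the PoC from the talk and not an existing Onigmo API: the node types (`Lit`, `Cls`, `Cat`, `Alt`, `Star`) and the helpers `class_member?`, `match_at`, and `search` are hypothetical names invented for illustration, and the AST is written by hand where a parser API would produce it. Character class membership is delegated to the built-in engine through a one-character `Regexp`, standing in for an exposed character class API.

```ruby
# Hypothetical AST node types. A real parser API would supply its own.
Lit  = Struct.new(:char)          # a single literal character
Cls  = Struct.new(:source)        # a character class, kept as pattern source
Cat  = Struct.new(:left, :right)  # concatenation
Alt  = Struct.new(:left, :right)  # alternation
Star = Struct.new(:node)          # greedy zero-or-more repetition

# Ask the built-in engine whether one character belongs to a class.
# This stands in for a (hypothetical) exposed character class API.
def class_member?(cls, ch)
  Regexp.new("\\A#{cls.source}\\z").match?(ch)
end

# Backtracking matcher written in continuation-passing style.
# The block k receives the position after a successful sub-match and
# returns the final end position, or nil to force backtracking.
def match_at(node, str, pos, &k)
  case node
  when Lit
    str[pos] == node.char ? k.call(pos + 1) : nil
  when Cls
    ch = str[pos]
    ch && class_member?(node, ch) ? k.call(pos + 1) : nil
  when Cat
    match_at(node.left, str, pos) { |mid| match_at(node.right, str, mid, &k) }
  when Alt
    match_at(node.left, str, pos, &k) || match_at(node.right, str, pos, &k)
  when Star
    # Greedy: try one more iteration first, then fall back to zero more.
    # The nxt > pos guard prevents looping on empty sub-matches.
    match_at(node.node, str, pos) { |nxt| nxt > pos ? match_at(node, str, nxt, &k) : nil } ||
      k.call(pos)
  end
end

# Try every start position; return [start, stop] of the first match, or nil.
def search(node, str)
  (0..str.length).each do |start|
    stop = match_at(node, str, start) { |pos| pos }
    return [start, stop] if stop
  end
  nil
end

# Hand-built AST for the pattern [[:alpha:]]+\d
alpha   = Cls.new("[[:alpha:]]")
pattern = Cat.new(Cat.new(alpha, Star.new(alpha)), Cls.new("\\d"))
p search(pattern, "42 abc7!")  # => [3, 7]
```

The matcher itself contains no knowledge of what `[[:alpha:]]` or `\d` mean. That is the point of the split: the compatibility-sensitive parts stay with Onigmo, while the control flow is plain Ruby that a JIT can optimize and that other Ruby code can inspect or interrupt.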
-
Hiroya Fujinami
Ph.D. student at SOKENDAI (NII, National Institute of Informatics). Researcher in information security and formal languages. Ruby committer. I am the author of the Regexp optimization to prevent ReDoS in Ruby 3.2.0.