
Mari Imaizumi
  • @ima1zumi

Software engineer at ESM, Inc. 👾

String Meets Encoding

In Ruby, every String carries its own Encoding, which makes character encoding handling very flexible. What is the trade-off for that flexibility? I recently looked for the bottleneck in CSV.read and found that, for one file encoded in CP932, 30% of the processing time was spent in String#split. From the perspective of optimizing String#split, I will explain the relationship between String and Encoding in Ruby, how a String knows its own Encoding, and which part of the processing is the bottleneck. I will then discuss approaches to making encoding handling faster.
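As a rough illustration of the String and Encoding relationship described above, the following minimal sketch shows that a String keeps an Encoding label next to its bytes, and times String#split on the same text in UTF-8 and in CP932 (which Ruby names Windows-31J). The sample text, the helper name `time_split`, and the iteration counts are assumptions made for this example; it is not the measurement from the talk, and the timings it prints will vary by machine and Ruby version.

```ruby
# "\u..." escapes always yield a UTF-8 string, whatever the source encoding is.
utf8  = "\u65E5\u672C\u8A9E,\u30C6\u30AD\u30B9\u30C8" # "日本語,テキスト"
cp932 = utf8.encode("Windows-31J")                    # transcodes the bytes

# Same characters, but a different byte sequence and Encoding label.
utf8_info  = [utf8.encoding.name,  utf8.length,  utf8.bytesize]
cp932_info = [cp932.encoding.name, cp932.length, cp932.bytesize]

# Splitting on an ASCII-only separator works in either encoding,
# and every resulting field keeps the receiver's Encoding.
field_encodings = cp932.split(",").map { |field| field.encoding.name }

# force_encoding only changes the label; the bytes stay as they are,
# so CP932 bytes relabeled as UTF-8 are no longer a valid string.
relabeled_valid = cp932.dup.force_encoding("UTF-8").valid_encoding?

# Hypothetical helper: wall-clock seconds spent calling split n times.
def time_split(line, n = 200)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  n.times { line.split(",") }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Build one long comma-separated row in each encoding (assumed sample data).
row_utf8  = ([utf8] * 1_000).join(",")
row_cp932 = row_utf8.encode("Windows-31J")

puts format("UTF-8 split: %.4fs", time_split(row_utf8))
puts format("CP932 split: %.4fs", time_split(row_cp932))
```

A micro-benchmark like this only hints at where time goes; profiling the real CSV.read call on the actual file is what identifies the bottleneck.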