Ruby's String carries its own Encoding, which makes character encoding handling very flexible. What is the trade-off for that flexibility? While recently investigating a bottleneck in CSV.read, I found that for one CP932-encoded file, 30% of the processing time was spent in String#split. From the perspective of optimizing String#split, this talk explains the relationship between String and Encoding in Ruby, how a String knows its own Encoding, and which part of the processing is the bottleneck. It then discusses approaches toward faster encoding handling.
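As a rough illustration of the String and Encoding relationship described above, the following sketch shows that each String object carries its own encoding label alongside its bytes. The sample row and the `describe` helper are hypothetical and not taken from the talk; only standard String methods are used.

```ruby
# Hypothetical sample data: one CSV-like row (not from the talk).
utf8_row = "名前,住所,電話"

# Transcode to CP932 (Ruby names this encoding Windows-31J).
# The characters stay the same, but the underlying bytes change.
cp932_row = utf8_row.encode("Windows-31J")

# Hypothetical helper: summarize what a String knows about itself.
def describe(str)
  { encoding: str.encoding.name, bytes: str.bytesize, chars: str.length }
end

utf8_info  = describe(utf8_row)   # 20 bytes, 8 characters
cp932_info = describe(cp932_row)  # 14 bytes, 8 characters

# force_encoding only relabels the bytes, it does not convert them,
# so CP932 bytes labeled as UTF-8 are not a valid UTF-8 sequence.
mislabeled = cp932_row.dup.force_encoding("UTF-8")
mislabeled_valid = mislabeled.valid_encoding?

# String#split has to respect the receiver's encoding to find
# character boundaries; the resulting fields keep that encoding.
fields = cp932_row.split(",")
field_encodings = fields.map { |f| f.encoding.name }.uniq
```

Because the encoding is a per-object property, methods such as String#split consult it on every call, which is the kind of work the talk examines as a possible bottleneck.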
Software engineer at ESM, Inc. 👾