Ruby strings serve two distinct purposes: the representation of textual data and the representation of binary data. These two use cases generally require different operations, but today they're both accessible via String
. Combining the two creates a discoverability issue and can be error-prone. Many String
operations have no logical meaning for arbitrary binary data. Having to use strings with a special encoding to pass binary data around is a non-obvious solution and hampers Ruby's usability. Moreover, binary data can sometimes look like ASCII text, which may help build false trust in code with logic errors. Such errors are nuanced and difficult to debug.
This talk takes a high-level look at Ruby's strings and encodings, highlighting potentially problematic areas and suggesting ways to improve. While the emphasis is on the logical interface for text and binary data, we'll also look at the performance ramifications of the current design and how that might improve as well.