Recently, I stumbled upon a nifty feature in Ruby’s CSV library that makes for a very neat preprocessing experience. The usual entry point of Ruby’s CSV module is parse. It takes an option called converters, which accepts an array of anonymous functions that are used to transform the data. For example, consider the following CSV data:
There are a couple of problems with our data. First, the age column has numbers, but when we parse the file using CSV.parse, the first column's data will be strings. Second, some of the fields have spaces embedded in them, so we get strings with those spaces once we parse. This is what the first row looks like, for example, with the following parsing code:
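As a minimal sketch of the problem, assume a small made-up data set with age as the first column and stray spaces around some fields. The sample rows and the variable names are illustrative only, not the data from the original post.

```ruby
require "csv"

# Hypothetical sample data: age comes first, some fields carry stray spaces.
data = "age,name\n 25 , Alice\n30 ,Bob\n"

# Parse with the header row, but without any converters.
rows = CSV.parse(data, headers: true)
first_row = rows.first.to_h

first_row["age"]   # => " 25 "   (still a String, spaces included)
first_row["name"]  # => " Alice"
```

Without converters, every field comes back as the raw string found between the separators.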
To clean this data up, we can use the converter feature:
The string_converter function strips each data point and converts it to an integer.
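One way such a converter could look is sketched below. It reuses the name string_converter from the text, but the body is an assumption: it strips every field and converts it to an integer only when the stripped text looks like a whole number, so that non-numeric fields stay as strings. The sample data is made up for illustration.

```ruby
require "csv"

# Hypothetical sample data: age comes first, some fields carry stray spaces.
data = "age,name\n 25 , Alice\n30 ,Bob\n"

# Assumed implementation: strip the field, then convert it to an Integer
# only if what remains is a whole number; otherwise keep the stripped string.
string_converter = lambda do |field|
  stripped = field.strip
  stripped.match?(/\A-?\d+\z/) ? stripped.to_i : stripped
end

rows = CSV.parse(data, headers: true, converters: [string_converter])
cleaned = rows.map(&:to_h)

cleaned[0]["age"]   # => 25
cleaned[0]["name"]  # => "Alice"
```

The lambda takes a single argument, so CSV passes it only the field value; a two-argument lambda would also receive field metadata.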
By default, the CSV library ships with a bunch of converters, which are available via the CSV::Converters constant. For example, to convert the first column's values to Float, we can use the built-in :float converter like so:
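A small sketch of the built-in converter in use, again on made-up sample data (kept free of stray spaces here so the example only shows the :float conversion):

```ruby
require "csv"

# Hypothetical sample data with clean fields.
data = "age,name\n25,Alice\n30.5,Bob\n"

# Built-in converters are referenced by their symbol name.
rows = CSV.parse(data, headers: true, converters: [:float])
ages = rows.map { |row| row["age"] }
names = rows.map { |row| row["name"] }

ages   # => [25.0, 30.5]
names  # => ["Alice", "Bob"]
```

Fields that cannot be read as a Float, such as the names here, are left unchanged.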
This API is so much fun to use!