Recently, I stumbled upon a nifty feature in Ruby’s CSV library that makes for a very neat preprocessing experience. The usual entry point of the CSV module is `parse`. It takes an option called `converters`, which accepts an array of anonymous functions that are used to transform the data. For example, consider the following CSV data:
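A small made-up sample along these lines works for illustration. The column names and values are hypothetical, and the stray spaces around some fields are deliberate:

```ruby
# Hypothetical sample data, held in a string so it can be fed to CSV.parse.
# Several fields carry leading or trailing spaces on purpose.
data = <<~CSV
  age,name,city
  25 , Alice ,Paris
  31,Bob , Berlin
CSV

puts data
```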
There are a couple of problems with our data. First, the `age` column holds numbers, but when we parse the file using `CSV.parse`, the first column's data will be strings. Second, some of the fields have spaces embedded in them, so we get strings with those spaces once we parse. For example, this is what the first row looks like with the following parsing code:
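As a minimal sketch, assuming a made-up sample with an `age` column first and a few padded fields (the data is hypothetical), a plain `CSV.parse` call with no options returns every field as an untouched string:

```ruby
require "csv"

# Hypothetical sample data with stray spaces around some fields.
data = <<~CSV
  age,name,city
  25 , Alice ,Paris
  31,Bob , Berlin
CSV

# No options: every field comes back as a String, whitespace included.
rows = CSV.parse(data)

# rows[0] is the header line because the headers: option is not set.
first_row = rows[1]
p first_row # => ["25 ", " Alice ", "Paris"]
```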
To clean this data up, we can use the converter feature:
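Here is one way such a converter could look. This is a sketch under my own assumptions: the sample data is made up, and I chose to turn only all-digit fields into integers while leaving everything else as a stripped string, so that names do not collapse to `0`.

```ruby
require "csv"

# Hypothetical sample data with stray spaces around some fields.
data = <<~CSV
  age,name,city
  25 , Alice ,Paris
  31,Bob , Berlin
CSV

# A converter is a lambda that receives each field and returns its replacement.
# Assumption: only fields made up entirely of digits become integers.
string_converter = lambda do |field|
  stripped = field.strip
  stripped.match?(/\A-?\d+\z/) ? stripped.to_i : stripped
end

rows = CSV.parse(data, converters: [string_converter])

# rows[0] is still the header line; rows[1] is the first data row.
p rows[1] # => [25, "Alice", "Paris"]
```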
The `string_converter` function strips each data point and converts it to an integer.
By default, the CSV library ships with a bunch of converters, which you can use via the `CSV::Converters` constant. For example, to convert the first column's values to Float, we can use the built-in `:float` converter like so:
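A minimal sketch with made-up, whitespace-free data follows. Note that a built-in converter is tried on every field, not just the first column. Fields that cannot be read as floats are left as strings.

```ruby
require "csv"

# Hypothetical sample data.
data = <<~CSV
  age,name
  25,Alice
  31,Bob
CSV

# Built-in converters are referenced by their symbol key in CSV::Converters.
rows = CSV.parse(data, converters: [:float])

p rows[1] # => [25.0, "Alice"]

# The names of all built-in converters can be listed from the constant.
p CSV::Converters.keys
```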
This API is so much fun to use!