Before I grokked the Unmarshaler interface, it was hard to know how to parse a complex JSON string into a type in one shot, with or without preprocessing. There are many good blog posts on techniques for parsing JSON in Go, but I had to experiment before I finally wrapped my head around it.
I’ll use an example from GitHub’s /commits REST API, using PR ruby/ruby#3365. I’ve saved the response in the repo that holds the full implementation of the example used in this post. The commits response from the GitHub REST API is verbose, grows with the size of the PR, and is nested more than one level deep. In the hypothetical application that I’m writing, I need a list of “objects” that have the following information:
That is, I want to parse this response into a []MetaData slice. I do not want to traverse structs shaped like the response in my main “business logic”, as that makes it hard to follow the important bits. I don’t want to use interface{} as a placeholder either. A better trade-off, in my opinion and for my use case, is to do as much as possible during the parse phase to massage the data into the structure you want.[1] I’m positive that this is a common use case. I ended up learning one way to do this cleanly almost by accident. First, the components involved:
Use anonymous structs
Anonymous structs let you avoid defining and naming a concrete type for one-off use cases. They’re heavily used in parsing and marshalling code paths, and in tests. In our case, this technique can be used to define a “dirty” struct on the fly inside the UnmarshalJSON function, and use that for parsing the JSON.
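As a minimal sketch of the idea (the parseSHA helper and the sample JSON are made up for illustration), an anonymous struct can pick a single field out of a commit object without declaring a named type:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseSHA is a hypothetical helper: it decodes one commit object
// and returns only its "sha" field.
func parseSHA(data []byte) (string, error) {
	// Anonymous struct: declared and used in place, never given a name.
	// Keys that have no matching field (such as "url") are ignored.
	var raw struct {
		SHA string `json:"sha"`
	}
	if err := json.Unmarshal(data, &raw); err != nil {
		return "", err
	}
	return raw.SHA, nil
}

func main() {
	sha, err := parseSHA([]byte(`{"sha": "abc123", "url": "ignored"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(sha)
}
```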
Implementing the Unmarshaler interface
Any type that has an UnmarshalJSON([]byte) error method implements the json.Unmarshaler interface. Such a type can then be used as the target for parsing a JSON subtree or the entire JSON document!
Implementation
First step is to mock out the main function:
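A rough sketch of what that could look like follows. The MetaData fields (SHA, Author, Message), the parseCommits helper, and the inline sample are my assumptions for illustration; the real program reads the saved response from the repo. UnmarshalJSON is only a stub at this point:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MetaData is the narrow view of a commit. The fields are illustrative.
type MetaData struct {
	SHA     string
	Author  string
	Message string
}

// UnmarshalJSON is a stub for now. It only shows that the decoder
// calls it once per element of the JSON array.
func (m *MetaData) UnmarshalJSON(data []byte) error {
	fmt.Printf("UnmarshalJSON called with %d bytes\n", len(data))
	return nil
}

// sample stands in for the saved /commits response (heavily trimmed).
const sample = `[
  {"sha": "a1", "commit": {"author": {"name": "Ann"}, "message": "Add parser"}},
  {"sha": "b2", "commit": {"author": {"name": "Bob"}, "message": "WIP: tests"}}
]`

// parseCommits parses the whole response into a []MetaData slice.
func parseCommits(data []byte) ([]MetaData, error) {
	var metadatas []MetaData
	err := json.Unmarshal(data, &metadatas)
	return metadatas, err
}

func main() {
	metadatas, err := parseCommits([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(metadatas), "commits parsed")
}
```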
The JSON response of the /commits endpoint is a list of commit objects, and I’m using a slice of MetaData values to match that shape. For each commit item in the JSON array, the raw bytes get passed as the argument to the UnmarshalJSON method on MetaData.
The next step is to implement the UnmarshalJSON function, using an anonymous struct to parse the raw commit object JSON into:
The final step is to process the commit struct and set the appropriate fields on the MetaData struct:
That’s it! An additional advantage of narrow types like this is that they’re easier to test.
Bonus: Filtering the slice further
For bonus points, I want to skip certain []MetaData elements based on a condition. A way to do this, keeping the same principles in mind, is to define a type that wraps []MetaData and implements the Unmarshaler interface:
Like before, I’m using a temporary variable whose type matches the underlying type of our main type, and parsing into that. Then I clean out the slice based on a condition: I want to skip all the commits whose messages start with WIP. Note that the metadatas variable inside the UnmarshalJSON function is declared as []MetaData and not as MetaDatas, since the latter would make json.Unmarshal call this same UnmarshalJSON method again and recurse forever. By design, var metadatas MetaDatas and var metadatas []MetaData are not the same type.
Finally, the filtered slice gets assigned to the underlying object that the JSON is getting parsed into.
A note about performance
In these examples, the parse flow creates the entire []MetaData slice, even though we filter out many of the elements. To my knowledge, this is a necessary hit to take. I’m not aware of a way to pre-process the incoming bytes so that the allocation is avoided in the first place. My reasoning is that if we didn’t filter or clean up the JSON data, all the objects would be allocated anyway, so this may not make a big difference in allocations per se, but that’s just my opinion at this point.
[1] This may not apply everywhere. There are valid cases where parsing should be as fast and light as possible.