Composing Promises like there's no tomorrow

One of the features of hubot-pr-fu, a Slack/HipChat bot that provides commands to aggregate information around GitHub Pull Requests, is listing all the PRs that are mergeable. The GitHub API for listing Pull Requests doesn’t include the mergeability status of each PR object it returns. Instead, we have to iterate over the list and make one more call per PR to fetch its details. Written in CoffeeScript, it looks like:

@allPrs =
  repo.pulls.fetch({status: "open"}).then (prs) =>
    Q.all _.map(prs, (pr) => repo.pulls(pr.number).fetch())


I used the equivalent of Promise.all back then because I didn’t know better. I’ve since found a neat way to think about the problem, and a more fine-grained solution for “making one request after the other”, also known as cooperative scheduling in computer-speak.

Documentation for the pulls and pull APIs can be found on GitHub.

Preliminaries
Problem
Cooperative Scheduling
Aside
- Error handling
Conclusion
Bonus
Resources

Preliminaries

In this article I’m using a couple of modern JavaScript syntax features and APIs. Skip this part if you are already familiar with arrow functions, the spread operator, Array.reduce and the fetch API.

Arrow functions

JavaScript engines now have a new syntax for defining functions:

const myFunction = () => { console.log('I\'m a function') };

If the braces are omitted, the expression after the arrow is returned implicitly. That is, the following functions are equivalent:

const myFunction1 = () => { return 12 };
const myFunction2 = () => 12;
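One detail worth knowing before the later examples, which return objects from arrow functions: when the implicitly returned value is an object literal, it has to be wrapped in parentheses, otherwise the braces are read as a function body. A small sketch (the function names are made up for illustration):

```javascript
// Braces right after the arrow start a function body, so `pr: 1100` is parsed
// as a label followed by an expression, and the function returns undefined.
const withoutParens = () => { pr: 1100 };

// Wrapping the object literal in parentheses makes it an expression,
// which is then returned implicitly.
const withParens = () => ({ pr: 1100, mergeable: true });

console.log(withoutParens()); // undefined
console.log(withParens()); // { pr: 1100, mergeable: true }
```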

Spread operator

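A newer feature in JavaScript is the ... syntax. Ruby programmers would know this as the “splat” operator. Any iterable value, such as an array, can be “spread” into another construct with it, and the same syntax collects the “rest” of the elements when destructuring. Object literals support a similar spread of an object’s own properties (objects aren’t iterable, so this is a separate feature). So you can do things like concatenating arrays, shallow-cloning JavaScript objects and more: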

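// Rest syntax in destructuring collects the remaining elements
const [ head, ...tail ] = [ 1, 2, 3 ];
console.log(head); // 1
console.log(tail); // [2, 3]

// Spread syntax expands an iterable in place
const first = [ 1 ];
const rest = [ 2, 3 ];
const combined = [ ...first, ...rest ];
console.log(combined); // [ 1, 2, 3 ]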

const headers = {
	'Content-Type': 'application/json',
	Host: 'www.github.com',
	Accept: 'text/html'
};
const cloned = { ...headers, Accept: 'application/json' };
// Overwriting Accept keeps its original position in the key order
console.log(cloned); // { 'Content-Type': 'application/json', Host: 'www.github.com', Accept: 'application/json' }

Array.reduce

If you’re unfamiliar with Array.reduce, Sarah Drasner has a very accessible introduction to this very powerful function: https://css-tricks.com/understanding-the-almighty-reducer/. She does a super job explaining this possibly-confusing function.
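As a quick refresher, reduce walks an array while threading an accumulator through each step, starting from its second argument. A minimal sketch; the PR objects below are made-up sample data:

```javascript
// The accumulator starts at 0 (the second argument to reduce)
// and each step returns the next accumulator value.
const total = [1, 2, 3].reduce((sum, n) => sum + n, 0);
console.log(total); // 6

// The accumulator can be any value, such as an object keyed by PR number.
// These PR objects are hypothetical sample data.
const prs = [
	{ number: 1100, mergeable: false },
	{ number: 1101, mergeable: true }
];
const mergeabilityByPr = prs.reduce(
	(acc, pr) => ({ ...acc, [pr.number]: pr.mergeable }),
	{}
);
console.log(mergeabilityByPr); // { '1100': false, '1101': true }
```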

fetch API

I won’t go into too much detail, but fetch is a replacement for XMLHttpRequest in browsers (there’s a package for Node.js). It replaces the old callback-heavy API with a Promise-based one, and its high-level API is closer to the ajax function in jQuery. More on the Mozilla Developer Network website.

Problem

Given a list of pull request URLs (if you’re unfamiliar with GitHub, think of any other list of URLs), make a call to each of them and aggregate the “mergeable” status of each into something like this:

[
	{ pr: 1100, mergeable: false },
	{ pr: 1101, mergeable: true },
	...
]

There are a couple of ways to do this. Promise.all is one:

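function aggregate(pulls) {
	return pulls.map(pull => {
		// `number` is the PR number shown on GitHub (e.g. 1100)
		const status = { pr: pull.number, mergeable: pull.mergeable };

		return status;
	});
}

const urls = [
	'https://api.github.com/repos/sinatra/sinatra/pulls/1455',
	'https://api.github.com/repos/sinatra/sinatra/pulls/1450'
	// ...and so on
];

// fetch resolves with a Response object, so parse each body as JSON first
Promise.all(urls.map(url => fetch(url).then(response => response.json())))
	.then(pulls => aggregate(pulls))
	.then(mergeability => console.log(mergeability));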

Promise.all resolves once all the fetch calls have completed. While this works fine, every request is started at the same time, which might choke the network or the target server if the list of URLs is very long! Another problem with Promise.all is that if any of the requests fails, the combined promise is rejected and the results of the successful requests are lost. We only need to mark the failed request as such, and continue on to the next one.
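The all-or-nothing behaviour is easy to see without touching the network. In this sketch, Promise.resolve and Promise.reject stand in for real fetch calls, and the values are made up:

```javascript
// Stand-ins for two requests: one succeeds, one fails.
const succeeded = Promise.resolve({ number: 1100, mergeable: true });
const failed = Promise.reject(new Error('PR 1101: request failed'));

const outcome = Promise.all([succeeded, failed]).then(
	results => `all resolved: ${results.length}`,
	// A single rejection rejects the whole Promise.all, and the
	// successful result for PR 1100 is not available here.
	error => `rejected: ${error.message}`
);

outcome.then(message => console.log(message)); // rejected: PR 1101: request failed
```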

What if we could, given a list of URLs, make the calls one by one? Can that be done without an external library?

Cooperative Scheduling

Promises are first-class values: we can create promises and pass them around like any other value, such as a number or a string. They are also composable: calling then on a promise returns a new promise, so small promise chains can be combined into bigger ones. The other key idea is that our input is an array, and modern JavaScript has a good set of functions that operate on arrays. Combining both helps us achieve what we want.

We’ll call fetch with each URL from the list, one after the other, and only try the next URL once the current request has completed (with either an error or a success). Calling fetch(url) returns a promise and starts the request (almost) immediately, so we need to create each fetch promise only when we are ready for it. Once the call finishes, we move on to the next URL in the list. We have to figure out a way to chain the requests to each other, and chaining is a super-power of Promises: calling then (or catch) on a promise always returns a new promise, so promises are infinitely chainable.
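A small sketch of the difference between creating a promise and deferring its creation. fakeFetch is a hypothetical stand-in for fetch that only records when it is called:

```javascript
const started = [];

// Hypothetical stand-in for fetch: it records the call and resolves immediately.
function fakeFetch(url) {
	started.push(url);
	return Promise.resolve(url);
}

// Creating the promises starts both "requests" right away.
const eager = [fakeFetch('/pulls/1'), fakeFetch('/pulls/2')];
console.log(started); // [ '/pulls/1', '/pulls/2' ]

// Wrapping each call in a function defers it until the function is invoked.
const lazy = [() => fakeFetch('/pulls/3'), () => fakeFetch('/pulls/4')];
console.log(started.length); // 2, nothing new has started

lazy[0]();
console.log(started.length); // 3
```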

The final action when unrolled should look something like this:

fetch(url1)
	.then(() => fetch(url2))
	.then(() => fetch(url3));

So given a list of URLs, we can use Array.reduce to chain fetch promises over multiple iterations:

function fetchPr(url) {
	return fetch(url);
}

// Each step waits for the previous promise before starting the next request
[url1, url2, url3].reduce((promise, url) => {
	return promise.then(() => fetchPr(url));
}, Promise.resolve());

Here, Promise.resolve() is the initial value of the reduction: an already-resolved promise that the first request can be chained onto. I like to think of Promise.resolve() as the undefined-equivalent of Promises :)

In our code, if we unroll the loop, the chain would look like:

Promise.resolve()
	.then(() => fetch(url1))
	.then(() => fetch(url2))
	.then(() => fetch(url3));
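To convince ourselves that the requests really run one after the other, here is a runnable sketch where a hypothetical fakeFetch resolves after a timeout and logs when each request starts and ends. The delays are made up, with the slowest request first:

```javascript
const events = [];

// Hypothetical stand-in for fetch: resolves after `delay` milliseconds.
function fakeFetch(url, delay) {
	events.push(`start ${url}`);
	return new Promise(resolve =>
		setTimeout(() => {
			events.push(`end ${url}`);
			resolve(url);
		}, delay)
	);
}

// Even though url1 is the slowest, it finishes before url2 starts.
const requests = [['url1', 30], ['url2', 10], ['url3', 20]];

const done = requests.reduce(
	(promise, [url, delay]) => promise.then(() => fakeFetch(url, delay)),
	Promise.resolve()
);

done.then(() => console.log(events.join(' -> ')));
// start url1 -> end url1 -> start url2 -> end url2 -> start url3 -> end url3
```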

Now to handle the data that’s coming in.

Aggregating state – using a global object

I’m going to use a slightly different aggregate example compared to the one mentioned above. The structure is going to be:

{
	[<pr id 1>]: <mergeable state>
}

We need an object to aggregate each Pull Request’s mergeable state. This object should be passed down to each loop iteration, and should store the value of the mergeable key from the network response. One way to do this is to have each reduce iteration take in and return an aggregate state object that looks like this:

{
	promise: <the promise from the previous iteration>,
	mergeability: <object that stores per-PR-id mergeability state>,
	id: <PR ID of the request being made>
}

And our program could look like this:

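const openPrs = [1455, 1450, 1448];

function fetchPr(prId) {
	const url = `https://api.github.com/repos/sinatra/sinatra/pulls/${prId}`;

	return fetch(url);
}

const [firstId, ...restIds] = openPrs;

// Start with the first request already in flight
const initialAggregate = {
	promise: fetchPr(firstId),
	id: firstId,
	mergeability: {}
};

const combinedPromise = restIds.reduce((aggregate, nextPrId) => {
	// Assuming response is an already parsed JSON object.
	const promise = aggregate.promise.then(response => {
		aggregate.mergeability[aggregate.id] = response.mergeable;
		return fetchPr(nextPrId);
	});

	return {
		promise,
		id: nextPrId,
		mergeability: aggregate.mergeability
	};
}, initialAggregate);

// The last response still has to be recorded after the loop
combinedPromise.promise
	.then(response => {
		combinedPromise.mergeability[combinedPromise.id] = response.mergeable;
	})
	.then(_ => console.log(combinedPromise.mergeability));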

For every loop iteration, we return a new object that represents the aggregate state of the iteration: a new promise that, once the previous request resolves, records its mergeability status and starts the next request; the ID of the PR now being fetched, for use in the next iteration; and the mergeability tracker object, which is shared between iterations rather than copied.

If you’re finding it hard to understand this, you’re not alone. “What is the value of aggregate.mergeability in mergeability: aggregate.mergeability above?”, “Why are we aliasing it outside and modifying it on the inside? Will there be race conditions?”, and “Why are we resolving the combinedPromise promise in the end, but discarding the value?” might be some of the confusing questions that readers of this code (even you, in 3 months’ time) might have.

In general, when the value a promise resolves with is discarded and the real result lives in some outside object, it’s not clear from reading the code what shape that result actually has.

Aggregating state – without globals

We need a better way to pass the values resolved by the promises from one reduce iteration to the next. Instead of thinking about the iteration first, let’s think about the loop function. It needs to know:

the mergeability status object, in order to set the status in the current loop
the PR ID, to make the fetch call in that loop once the previous one resolves

Let’s imagine a pure* function that takes each of these values, and returns a promise that resolves with the current (post-resolve) mergeability state object.

function fetchPrAndUpdateMergeability(id, mergeability) {
	return new Promise(resolve => {
		return fetchPr(id).then(response => {
			mergeability[id] = response.mergeable;

			return resolve(mergeability);
		});
	});
}

*Pure in the sense that it takes all the values it needs as arguments, without relying on the surrounding context. Not a pure function in the Haskell sense: it still mutates its argument and makes a network request.

By using the Promise constructor, we control what the promise’s final “fulfilled” value looks like, and when the promise is considered fulfilled. Instead of the response object from fetch, the promise resolves with the modified aggregate state once the fetch resolves. That is, in fetchPrAndUpdateMergeability(id, {}).then(variable => ...), variable will be set to the aggregate object, which can then be passed down to the subsequent fetchPrAndUpdateMergeability call:

fetchPrAndUpdateMergeability(pr1, {}).then(aggregate => fetchPrAndUpdateMergeability(pr2, aggregate))

This is now similar to what our reduce step did in the initial examples! Plugging this back into our reduce iterator, we end up with:

const openPrs = [1455, 1450, 1448];

function fetchPr(prId) {
	const url = `https://api.github.com/repos/sinatra/sinatra/pulls/${prId}`;

	return fetch(url);
}

function fetchPrAndUpdateMergeability(id, mergeability) {
	return new Promise(resolve => {
		return fetchPr(id).then(response => {
			mergeability[id] = response.mergeable;

			return resolve(mergeability);
		});
	});
}

const [firstId, ...restIds] = openPrs;

// The first request seeds the chain with an empty mergeability object
const finalPromise = restIds.reduce((promise, prId) => {
	return promise.then(mergeability =>
		fetchPrAndUpdateMergeability(prId, mergeability)
	);
}, fetchPrAndUpdateMergeability(firstId, {}));

finalPromise.then(mergeability => console.log('mergeability: ', mergeability));

This is way cleaner, and we get the advantage of not maintaining a global object, which makes it easy to add more logic or decompose further, as I’ll explain below.
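As an aside, the Promise constructor is not strictly required here: whatever a then callback returns becomes the fulfillment value of the promise that then returns. One practical difference is that, in the constructor version, a rejected fetchPr leaves the returned promise pending forever, while a plain chain passes the rejection on. A sketch of that variant, where fakeFetchPr is a hypothetical stand-in for fetchPr that resolves with made-up, already-parsed PR data:

```javascript
// Hypothetical stand-in for fetchPr: resolves with fake, already-parsed PR data.
function fakeFetchPr(id) {
	return Promise.resolve({ number: id, mergeable: id % 2 === 0 });
}

// Returning `mergeability` from the then callback makes it the fulfillment
// value; if fakeFetchPr rejects, the rejection propagates down the chain.
function fetchPrAndUpdateMergeability(id, mergeability) {
	return fakeFetchPr(id).then(pullData => {
		mergeability[id] = pullData.mergeable;
		return mergeability;
	});
}

const [firstId, ...restIds] = [1455, 1450, 1448];
const finalPromise = restIds.reduce(
	(promise, prId) =>
		promise.then(mergeability => fetchPrAndUpdateMergeability(prId, mergeability)),
	fetchPrAndUpdateMergeability(firstId, {})
);

finalPromise.then(mergeability => console.log(mergeability));
// { '1448': true, '1450': true, '1455': false }
```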

JSON parse handling

So far we’ve assumed that the response object is the parsed JSON of the pull request data, but in reality fetch resolves with a Response object, whose body we have to parse into a JSON object. The Response object has a built-in method for this: json(), which returns a promise that resolves with the parsed object once the body has been read and parsed. Let’s add the parsing step inside the success handler of the fetchPr call.

const openPrs = [1455, 1450, 1448];

function fetchPr(prId) {
	const url = `https://api.github.com/repos/sinatra/sinatra/pulls/${prId}`;

	return fetch(url);
}

function fetchPrAndUpdateMergeability(id, mergeability) {
	return new Promise(resolve => {
		return fetchPr(id).then(response => {
			return response.json().then(pullData => {
				mergeability[id] = pullData.mergeable;

				return resolve(mergeability);
			});
		});
	});
}

const [firstId, ...restIds] = openPrs;
const finalPromise = restIds.reduce((promise, prId) => {
	return promise.then(mergeability =>
		fetchPrAndUpdateMergeability(prId, mergeability)
	);
}, fetchPrAndUpdateMergeability(firstId, {}));

finalPromise.then(mergeability => console.log('mergeability: ', mergeability));

Apart from the obvious increase in indentation, one more complication is that we haven’t yet added error handling. Doing that will complicate this function even further, to the point where it starts to resemble callback hell. Isn’t this what Promises were supposed to help avoid? This is a natural progression, and something I’ve seen happen a lot. Without careful thought about the right abstraction at every step of change, it’s not easy to make such code…less indented.

Instead of adding the JSON parsing step in the fetchPrAndUpdateMergeability function, let’s move it to the fetchPr function instead:

const openPrs = [1455, 1450, 1448];

function fetchPr(prId) {
	const url = `https://api.github.com/repos/sinatra/sinatra/pulls/${prId}`;

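	// fetch resolves with a Response; json() returns a promise of the parsed body
	return fetch(url).then(response => response.json());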
}

function fetchPrAndUpdateMergeability(id, mergeability) {
	return new Promise(resolve => {
		return fetchPr(id).then(pullData => {
			mergeability[id] = pullData.mergeable;

			return resolve(mergeability);
		});
	});
}

const [firstId, ...restIds] = openPrs;
const finalPromise = restIds.reduce((promise, prId) => {
	return promise.then(mergeability =>
		fetchPrAndUpdateMergeability(prId, mergeability)
	);
}, fetchPrAndUpdateMergeability(firstId, {}));

finalPromise.then(mergeability => console.log('mergeability: ', mergeability));

We saved the indentation, but still haven’t handled error cases. Each promise can “redirect” its output to either the success or the error callback of the next promise. In that sense, it’s almost like *nix pipelines, if you’re familiar with those (the standard output and standard error streams). Think of two data pipelines between which the data flows, where we control how it gets passed around. In *nix, if we want only the error output of a particular command, we can redirect it to a different file without mixing it up with the normal output. A similar technique can be used in our example:

function fetchPr(prId) {
	const url = `https://api.github.com/repos/sinatra/sinatra/pulls/${prId}`;

	return fetch(url).then(
		rawresponse => Promise.resolve(rawresponse.json()),
		// A rejected fetch carries an error object, not a Response, so there's no body to parse
		error => Promise.reject(error)
	);
}

If the fetch call succeeds, we parse the body and pass the parsed object down the success pipeline; if it fails (with a network error, for instance), there is no body to parse, so we explicitly pass the error down the error pipeline. If we simply returned the error instead of wrapping it in Promise.reject, it would get diverted to the next success callback. (Wrapping json() in Promise.resolve is harmless but optional, since a promise returned from a then callback is adopted either way.) Any further chaining receives the actual pull request data in the success handler:

fetchPr(12)
	.then(response => response.mergeable /* => true/false */)
	.then(undefined, error => error /* => network error, or a JSON parsing error */);
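The difference between returning a value and returning Promise.reject from an error callback can be seen in isolation. This sketch uses a made-up error instead of a real network failure:

```javascript
const networkError = new Error('network down');

// Returning a plain value from the error callback "recovers":
// the next joint sees it in its success callback.
const recovered = Promise.reject(networkError)
	.then(undefined, error => error.message)
	.then(
		value => `success pipeline: ${value}`,
		error => `error pipeline: ${error.message}`
	);

// Returning Promise.reject keeps the value in the error pipeline.
const stillFailing = Promise.reject(networkError)
	.then(undefined, error => Promise.reject(error))
	.then(
		value => `success pipeline: ${value}`,
		error => `error pipeline: ${error.message}`
	);

recovered.then(message => console.log(message)); // success pipeline: network down
stillFailing.then(message => console.log(message)); // error pipeline: network down
```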

Aside

There is one more principle around promises in this example. If you pass undefined (or anything that isn’t a function) as one of the callbacks, the corresponding value flows through to the next joint in the pipeline. In the first then joint, even though it’s not explicit, we are passing undefined as the callback for the error case. If fetchPr rejected, the error would be passed down to the second then. In the second then joint, we are passing undefined explicitly for the success case, so a successful value from the previous then flows through to the next one. This technique can be used to create composable functions that can be reused in multiple places. I found this very useful at work while refactoring a complicated API-calling interface.
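A minimal sketch of values flowing through joints that have no matching callback (the values here are made up):

```javascript
// A rejection skips joints that have no error callback...
const errorPassedThrough = Promise.reject(new Error('boom'))
	.then(value => `success: ${value}`) // skipped: no error callback here
	.then(undefined, error => `handled: ${error.message}`);

// ...and a fulfillment skips joints that have no success callback.
const successPassedThrough = Promise.resolve(41)
	.then(undefined, () => 0) // skipped: no success callback here
	.then(value => value + 1);

errorPassedThrough.then(result => console.log(result)); // handled: boom
successPassedThrough.then(result => console.log(result)); // 42
```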

Error handling

Now that we have our parsing step in place, we can move on to error handling. fetch resolves, and so invokes the success handler, for any HTTP response, including 4xx and 5xx statuses; it only rejects when the request itself fails, for example because of a network error. The response object has an ok property that signifies a 2xx status. For our use case, all non-200 statuses can be treated as errors, and that PR’s mergeability status should be marked as ‘unknown’. To do this, we’ll modify the success handler in fetchPr like so:

function fetchPr(prId) {
	const url = `https://api.github.com/repos/sinatra/sinatra/pulls/${prId}`;

	return fetch(url).then(
		rawresponse => {
			if (rawresponse.status === 200) {
				return rawresponse.json();
			}

			return Promise.reject({ mergeable: 'unknown' });
		},
		error => Promise.reject(error)
	);
}

Again, indentation alarm rings! We can do better. Instead of adding the if condition inline, we could move that out into a function on its own:

function parseIfSuccess(rawresponse) {
	if (rawresponse.status === 200) {
		return rawresponse.json();
	}

	return Promise.reject({ mergeable: 'unknown' });
}

function fetchPr(prId) {
	const url = `https://api.github.com/repos/sinatra/sinatra/pulls/${prId}`;

	return fetch(url).then(parseIfSuccess, error => Promise.reject(error));
}
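parseIfSuccess only reads status and calls json(), so plain objects can stand in for Response objects when checking it. A sketch of such a check, where the stub objects are hypothetical and no test framework is assumed:

```javascript
// Copy of parseIfSuccess from above, so this sketch runs on its own.
function parseIfSuccess(rawresponse) {
	if (rawresponse.status === 200) {
		return rawresponse.json();
	}

	return Promise.reject({ mergeable: 'unknown' });
}

// Hypothetical stubs: just enough of the Response shape for parseIfSuccess.
const okResponse = {
	status: 200,
	json: () => Promise.resolve({ number: 1455, mergeable: true })
};
const notFoundResponse = { status: 404, json: () => Promise.resolve({}) };

const okResult = parseIfSuccess(okResponse);
const notFoundResult = parseIfSuccess(notFoundResponse).then(
	() => 'unexpected success',
	error => error
);

okResult.then(pull => console.log(pull.mergeable)); // true
notFoundResult.then(error => console.log(error)); // { mergeable: 'unknown' }
```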

This decomposition also helps with testing: testing parseIfSuccess doesn’t need any HTTP mocks! Now on to handling both network-level errors and any JSON-parsing errors:

function readMergeableStatusIfSuccess(rawresponse) {
	if (rawresponse.status === 200) {
		// note: always return promises. Never just call.
		// If the body isn't valid JSON, json() rejects; normalize that too.
		return rawresponse.json().then(undefined, () =>
			Promise.reject({ mergeable: 'unknown' })
		);
	}

	// We are losing information about the actual error that happened, but for
	// this example I'm skipping that part.
	return Promise.reject({ mergeable: 'unknown' });
}

function fetchPr(prId) {
	const url = `https://api.github.com/repos/sinatra/sinatra/pulls/${prId}`;

	// We can reuse the function above for the error block: an error
	// object has no status property, so the `if` check fails and we
	// fall through to the 'unknown' rejection. We don't care about
	// the actual error in our case!
	return fetch(url).then(
		readMergeableStatusIfSuccess,
		readMergeableStatusIfSuccess
	);
}

function fetchPrAndUpdateMergeability(id, mergeability) {
	return new Promise(resolve => {
		return fetchPr(id).then(
			parsedResponse => {
				mergeability[id] = parsedResponse.mergeable;

				return resolve(mergeability);
			},
			normalizedError => {
				mergeability[id] = normalizedError.mergeable;

				// technically, we should be maintaining a list of all the
				// errors and the reasons why they happened, and/or log the
				// error in a structured log format. They help in debugging why
				// something failed. By resolving the error here, we're losing
				// information about why a PR's status is set to 'unknown'.
				return resolve(mergeability);
			}
		);
	});
}

const openPrs = [1455, 1450, 1448];
const [firstId, ...restIds] = openPrs;
const finalPromise = restIds.reduce((promise, prId) => {
	return promise.then(mergeability =>
		fetchPrAndUpdateMergeability(prId, mergeability)
	);
}, fetchPrAndUpdateMergeability(firstId, {}));

finalPromise.then(mergeability => console.log('mergeability: ', mergeability));

We can move the success and error callbacks inside fetchPrAndUpdateMergeability into separate functions. I’m leaving that as an exercise to the reader. The full working example can be found here: kgrz-promises-composition-example
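To try the whole pipeline without hitting the GitHub API, one option is to swap fetch for a hypothetical fakeFetch that returns Response-like stubs. This condensed sketch (not the code from the linked repository) uses made-up responses chosen to exercise each branch: one success, one 404 and one network failure:

```javascript
// Hypothetical stand-in for fetch with canned, Response-like results.
function fakeFetch(url) {
	const id = Number(url.split('/').pop());
	if (id === 1455) {
		return Promise.resolve({ status: 200, json: () => Promise.resolve({ mergeable: true }) });
	}
	if (id === 1450) {
		return Promise.resolve({ status: 404, json: () => Promise.resolve({ message: 'Not Found' }) });
	}
	return Promise.reject(new TypeError('Failed to fetch'));
}

function readMergeableStatusIfSuccess(rawresponse) {
	if (rawresponse.status === 200) {
		return rawresponse.json();
	}
	return Promise.reject({ mergeable: 'unknown' });
}

function fetchPr(prId) {
	const url = `https://api.github.com/repos/sinatra/sinatra/pulls/${prId}`;
	return fakeFetch(url).then(readMergeableStatusIfSuccess, readMergeableStatusIfSuccess);
}

function fetchPrAndUpdateMergeability(id, mergeability) {
	// Both outcomes record a status and resolve, so the chain keeps going.
	const record = result => {
		mergeability[id] = result.mergeable;
		return mergeability;
	};
	return fetchPr(id).then(record, record);
}

const [firstId, ...restIds] = [1455, 1450, 1448];
const finalPromise = restIds.reduce(
	(promise, prId) => promise.then(mergeability => fetchPrAndUpdateMergeability(prId, mergeability)),
	fetchPrAndUpdateMergeability(firstId, {})
);

finalPromise.then(mergeability => console.log(mergeability));
// { '1448': 'unknown', '1450': 'unknown', '1455': true }
```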

Conclusion

By using the basic building blocks provided by JavaScript, we’ve been able to build a solution in a small sequence of steps. We could extend our program’s functionality a little further and make it truly batched, where we make n queries concurrently, wait for all of them to resolve, and then move on to the next batch: for example, by converting our PR ID list into a 2-D array and using Promise.all inside each loop.
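A sketch of that batched variant, assuming a hypothetical fakeFetchPr stand-in and a batch size of 2. Requests within a batch run concurrently via Promise.all, while the batches themselves still run one after another. fakeFetchPr always resolves here; with real requests, each one would need its own error handling so that a single failure doesn’t reject its whole batch:

```javascript
// Hypothetical stand-in for fetchPr that always resolves with fake PR data.
function fakeFetchPr(id) {
	return new Promise(resolve =>
		setTimeout(() => resolve({ number: id, mergeable: id % 2 === 0 }), 10)
	);
}

// Split a flat list into a 2-D array of batches of at most `size` items.
function toBatches(items, size) {
	const batches = [];
	for (let i = 0; i < items.length; i += size) {
		batches.push(items.slice(i, i + size));
	}
	return batches;
}

const openPrs = [1455, 1450, 1448, 1447, 1440];

const allDone = toBatches(openPrs, 2).reduce(
	(promise, batch) =>
		promise.then(mergeability =>
			// Start every request in this batch at once, then wait for all of them.
			Promise.all(batch.map(id => fakeFetchPr(id))).then(pulls => {
				pulls.forEach(pull => {
					mergeability[pull.number] = pull.mergeable;
				});
				return mergeability;
			})
		),
	Promise.resolve({})
);

allDone.then(mergeability => console.log(mergeability));
```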

Some of the techniques explained here were used in refactoring a gnarly piece of API-calling code at work. The result was a set of simple functions like readMergeableStatusIfSuccess that could be reused if we were to change a small part of the entire pipeline, instead of duplicating the majority of the logic. Try out these techniques in your next project and see if they work for you.

Bonus

async-await is the new 🔥 in JavaScript. These syntax elements were added to simplify working with promises. Instead of the pipelines you get with Promises, async-await code reads more “serially”, so it’s a little easier to follow. I’ve put up an example based on this post here: kgrz/promises-composition-example/async-await
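For comparison, a minimal async-await sketch of the sequential loop (not the code from the linked repository), again with a hypothetical fakeFetchPr in place of the real request; it rejects for one made-up PR to show the error path:

```javascript
// Hypothetical stand-in for fetchPr: rejects for PR 1450 to show error handling.
function fakeFetchPr(id) {
	if (id === 1450) {
		return Promise.reject(new Error(`PR ${id}: request failed`));
	}
	return Promise.resolve({ number: id, mergeable: true });
}

async function collectMergeability(prIds) {
	const mergeability = {};
	// `await` inside a plain for...of loop waits for each request
	// before starting the next one, just like the reduce chain.
	for (const id of prIds) {
		try {
			const pull = await fakeFetchPr(id);
			mergeability[id] = pull.mergeable;
		} catch (error) {
			mergeability[id] = 'unknown';
		}
	}
	return mergeability;
}

collectMergeability([1455, 1450, 1448]).then(mergeability => console.log(mergeability));
// { '1448': true, '1450': 'unknown', '1455': true }
```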

I personally use Promises a lot because of the environment we use at work, and I like them. If you don’t have that restriction, I recommend using async-await, but learn just enough about Promises that you won’t get stuck searching the internet.

Resources

There have been some very informative posts on how Promises work (or don’t). Here’s a list that I think will help you if you’ve read this far, in no specific order:

github.com/getify/You-Dont-Know-JS

Where Kyle Simpson does a super great job going in depth about callbacks and promises. The entire series is a must read for any JavaScript developer.

dist-prog-book.com/chapter/2/futures

A more holistic view of Futures and Promises that goes through some details on internal implementation, and execution semantics.

2ality.com/promise-callback-data-flow

A simple and informative post on various ways to pass data from one promise to another.

jcoglan.com/callbacks-are-imperative-promises-are-functional

A thorough post on how Promises help you write more abstractions based off of other existing abstractions to build more complex programs.

mathiasbynens.be/notes/async-stack-traces

Article on why capturing stack traces when using async-await is cheaper than with promises. Also mentions some details about the differences between async-await and promises.

staltz.com/promises-are-not-neutral-enough

A post that goes through some design issues in the Promise API.

brianmckenna.org/blog/category_theory_promisesaplus

A post that looks at the design of the Promises/A+ specification from a category-theory point of view.

Lastly, I gave a lightning talk at DotJS 2018 on the same topic.
