Color Your Functions!

Faithlife’s Android engineers took a departure from our typical boring approach to adopt Kotlin Coroutines early—before structured concurrency landed in 0.26.0. In other words, when we departed the boring tech train, we hopped on an experimental maglev train’s maiden voyage. We weren’t sure we’d make it, but fortunately we did, and our bet paid off handsomely. Some of our codebases have never seen an AsyncTask implementation, which is a breath of fresh air. kotlinx.coroutines is generally loved by Android engineers.

Function Coloring

The function color metaphor is most often applied to distinguish synchronous vs asynchronous functions and comes from Bob Nystrom’s post “What Color is Your Function?”. In his system, red functions are those that can only be called from other red functions. Blue functions can be called from anywhere. This neatly maps onto many programming languages’ concurrency models. Kotlin’s suspending functions are red. They can only be called from other suspending functions.

Kotlin chose a colored system for coroutines, as opposed to Go’s colorless system. Java’s Project Loom is also aiming for a colorless approach.

When a function is marked with the suspend modifier, the compiler adds the ability to pause and later resume the execution of the function. Since a suspending function knows how to suspend itself, it can also suspend while waiting on the suspending functions it calls. Function coloring makes this concurrency explicit in the function signature.
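To make the coloring rule concrete, here is a minimal sketch. The function names are invented for illustration, and runBlocking is used only to keep the example self-contained; it blocks the calling thread, so it is rarely what you want in app code.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Red: marked suspend, so it may pause at delay() without blocking a thread.
suspend fun fetchGreeting(name: String): String {
    delay(10) // stands in for network or disk I/O
    return "Hello, $name"
}

// Also red: a suspending function may call other suspending functions directly.
suspend fun fetchGreetings(names: List<String>): List<String> =
    names.map { fetchGreeting(it) }

// Blue: an ordinary function. Calling fetchGreetings(...) directly here would not
// compile, so it has to start a coroutine first (here with runBlocking).
fun greetBlocking(names: List<String>): List<String> =
    runBlocking { fetchGreetings(names) }

fun main() {
    println(greetBlocking(listOf("Ada", "Grace")))
}
```

The compiler enforces the boundary: the only way from blue to red is through a coroutine builder, which is where the question of scope comes in.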

I’ve written on the mechanics of kotlinx.coroutines in “Kotlin Suspend Bytecode”.

Structured Concurrency

Structured concurrency is a mechanism for limiting the lifetime of running coroutines to objects that have a well-defined lifetime, like UI elements.

In early versions of kotlinx.coroutines, launch (and the other coroutine builders) could be called from anywhere. They were essentially free functions. launch returned a Job, which provided a mechanism to join the coroutine on the current thread or to cancel it. Jobs had to be tracked manually, and forgetting to cancel them appropriately was a pretty easy mistake. The design of the coroutine library didn’t compel consideration of the asynchronous task’s lifetime.

When kotlinx.coroutines shipped structured concurrency, the improved API design compelled reconsidering every coroutine builder call in the codebase. The coroutine builder functions were now extensions on CoroutineScope, so they could only be called with an associated scope. This scope dictated the limits of the associated coroutine’s life. If a coroutine is suspended waiting on the network and its CoroutineScope is associated with the lifetime of a UI element that was just dismissed, it’s cancelled.
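A rough sketch of that behavior on the plain JVM follows. Screen is a hypothetical stand-in for a UI element, and Dispatchers.Default is used only so the example runs outside Android.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.cancel
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for a UI element with a well-defined lifetime.
class Screen {
    private val scope = CoroutineScope(Job() + Dispatchers.Default)

    @Volatile
    var loaded = false
        private set

    // launch is an extension on CoroutineScope, so the new coroutine belongs to `scope`.
    fun load(): Job = scope.launch {
        delay(1_000) // stands in for a slow network call
        loaded = true
    }

    // Called when the screen is dismissed: cancels every coroutine started in `scope`.
    fun dismiss() = scope.cancel()
}

fun main(): Unit = runBlocking {
    val screen = Screen()
    val job = screen.load()
    screen.dismiss() // the user navigates away before the request finishes
    job.join()       // wait for the coroutine to finish cancelling
    println("cancelled=${job.isCancelled}, loaded=${screen.loaded}")
}
```

Nothing in load has to remember to cancel anything. The scope's owner decides when the work ends.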

Swift adopted structured concurrency recently for its concurrency system, so iOS engineers can learn from our mistakes!

Breaking Structured Concurrency

While the kotlinx.coroutines API is designed to encourage best practices, it’s pretty easy to break structured concurrency. One mistake is implementing CoroutineScope on an object without a well-defined lifetime.

This is a real function in one of our codebases, slightly modified to remove irrelevant details:

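class ReadingPlanClient : CoroutineScopeBase() {

    @Synchronized
    fun getPlanTemplates(): List<ReadingPlanTemplate> {
        // Fresh cache entry: return it without touching the network.
        dtoCache.getUnexpiredCachedDto()?.let { resourceIdList ->
            return resourceIdList.readingPlanTemplateList
        }

        // Stale cache entry: return it now and refresh in the background.
        // The refresh is launched in this client's own scope.
        dtoCache.getExpiredCachedDto()?.let {
            launch(defaultErrorHandler) {
                fetchAndCache()
            }
            return it.readingPlanTemplateList
        }

        // ...
    }
}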

ReadingPlanClient is a web service client that creates new coroutines in its own scope via the call to launch. That job has no chance of being cancelled, since ReadingPlanClient does not have a well-defined lifetime and therefore has no good place to cancel running work. This undermines the design of structured concurrency.

Imagine a static field kept a reference to an instance of this class. When a fragment fetches reading plan templates and the user immediately navigates away from the screen, all of the fetching and caching work would still execute.

Consider another function that cooperates better with the structured concurrency model:

@ViewModelKey(ConversationViewModel::class)
@ContributesMultibinding(AppScope::class)
class ConversationViewModel @Inject constructor(
    // ...
) : ViewModel() {
    // ...

    suspend fun prepareConversation(conversationId: Long, threadId: Long? = null) {
        this.conversationId = conversationId
        this.threadId = threadId

        if (_conversation.value == null) {
            coroutineScope {
                val deferredConversation = async {
                    if (initialConversation != null) {
                        initialConversation
                    } else {
                        messagesRepository.getConversation(conversationId.toString())
                    }
                }
                val deferredDraftMessage = async {
                    messagesRepository.loadDraftMessage(conversationId)
                }
                val deferredRecipients = async {
                    messagesRepository.getConversationRecipientsWithPresence(conversationId)
                }

                _conversation.value = deferredConversation.await()
                deferredDraftMessage.await()?.let(_draftMessage::setValue)
                _recipients.value = deferredRecipients.await()
            }
        }
    }
}

The important difference here is that the function is suspending, a red function in Nystrom’s taxonomy. Since it can only be called from other suspending functions, it pushes the responsibility for a coroutine scope to the caller. Suspending functions don’t have a scope of their own. The call to coroutineScope creates a child scope that inherits the caller’s coroutine context, so the suspending function can launch its own child coroutines via async while staying tied to the caller’s Job. coroutineScope also does not return until all of those children have completed.

In circumstances where prepareConversation is called from a fragment’s view lifecycle scope, the child coroutines will automatically be cancelled when the parent scope is cancelled. This might save a handful of network requests and subsequent caching work.
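The cancellation path can be sketched without Android. In this hedged example, prepare has the same shape as prepareConversation, slowRequest is a hypothetical stand-in for a repository call, and a launched coroutine plays the role of the view lifecycle scope.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.atomic.AtomicInteger

// Counts how many simulated requests ran to completion.
val completedRequests = AtomicInteger(0)

// Hypothetical stand-in for a repository call.
suspend fun slowRequest(): Int {
    delay(1_000)
    return completedRequests.incrementAndGet()
}

// A suspending function that fans out with async inside coroutineScope
// and owns no long-lived scope of its own.
suspend fun prepare(): Int = coroutineScope {
    val first = async { slowRequest() }
    val second = async { slowRequest() }
    first.await() + second.await()
}

// Starts prepare() from a caller, cancels the caller early,
// and reports how many requests still completed.
fun cancelWhilePreparing(): Int = runBlocking {
    val caller = launch { prepare() } // plays the role of the view lifecycle scope
    delay(100)                        // let both requests start
    caller.cancelAndJoin()            // the "view" goes away
    completedRequests.get()
}

fun main() {
    println("completed requests: ${cancelWhilePreparing()}")
}
```

Cancelling the caller reaches both async children, so neither simulated request runs to completion.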

That is what structured concurrency buys.

Keeping a few simple guidelines in mind goes a long way toward getting the most out of the concurrency model.

API Design & Scope Lifetimes

When building an API with coroutines, take care to avoid hiding suspending functions behind blue functions unless the class can reasonably own the coroutine scope required to house its coroutines. Once the red function calls become implementation details of blue functions, the interface seems colorless. Ultimately that tactic scatters the details of coroutine execution across the codebase and makes for a leaky abstraction.

Consider:

  • Who’s responsible for cancelling these coroutines?
  • When are these coroutines being cancelled?
  • What’s the default dispatcher for this scope?
    (hint: it should always be the main dispatcher)
  • Has someone overridden the default error handler?

Many Android classes already provide a scope, such as lifecycleScope on lifecycle owners or viewModelScope on a ViewModel. So unless you have a really specific requirement, prefer what is provided there. When a new CoroutineScope appears in review, ask if there’s an existing scope that already meets the requirements.

Unless it is reasonable for your class to own the coroutine scope, expose suspending functions instead of blue functions that launch coroutines. This provides more flexibility to callers and better indicates the function’s nature.
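As an illustration of that guideline, here is a small sketch of a client that exposes a red function. TemplateClient, its cache, and the placeholder template names are all invented for this example; this is not the real ReadingPlanClient.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical cache-backed client. It owns no CoroutineScope at all.
class TemplateClient {
    // Not thread-safe; a real client would guard the cache.
    private var cached: List<String>? = null

    // Red: the caller supplies the coroutine, and with it the lifetime.
    suspend fun getTemplates(): List<String> =
        cached ?: fetchTemplates().also { cached = it }

    private suspend fun fetchTemplates(): List<String> {
        delay(10) // stands in for the web request
        return listOf("template-a", "template-b") // placeholder data
    }
}

fun main() {
    // runBlocking is only here to make the sketch runnable from main.
    val templates = runBlocking { TemplateClient().getTemplates() }
    println(templates)
}
```

Because getTemplates suspends instead of launching, a caller that is cancelled mid-fetch takes the fetch down with it, and no job is left to track.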

Suspending Functions

An API that exposes suspending functions is easier to fold into new suspending functions than one that exposes synchronous functions, which may or may not launch coroutines with a dubious lifetime and no control mechanism. Pushing asynchronous work into suspending functions means that once you have bridged the red/blue gap by creating a coroutine, your functions are semantically similar to any other function. Calling red functions from a red context is simple. They accept parameters and return values. They execute from top to bottom.

Bridging the red/blue gap might seem tricky, but since suspending functions can only be called from other suspending functions, calling red functions from a blue context requires consideration of the most appropriate coroutine scope in which to create a new coroutine. This is a benefit of the design.

Red function semantics aren’t a bug, they’re a feature. Avoid coroutine scopes that have unbounded lifetimes. When in doubt, paint functions responsible for asynchronous work red. Be mindful of coroutine scopes and of function colors.

If you’re interested in a more complete description of structured concurrency’s design benefits, check out “Notes on structured concurrency, or: Go statement considered harmful”.

Posted by Justin Brooks on June 10, 2022