Being Like Water

I often forget just how young the software craft is. The landscape moves quickly, and many I’ve known have difficulty keeping up with the shifting ground beneath them. A couple of years ago I stumbled onto Huyen Tue Dao’s talk Be Like Water. She highlights the importance of being adaptable and reminds us that the skills to build good software aren’t necessarily coupled to the systems we work with on any given day. Shortly after joining Faithlife, a colleague shared Dan McKinley’s essay Choose Boring Technology, which tempers the virtue of adaptability in a helpful way. These ideas have stuck with me in one form or another and have substantially influenced the direction of the Faithlife Android app’s systems under the hood. In the year since I last wrote about the Faithlife app, several things have changed. Many have stayed the same.

Architecture

Overall, our high-level architecture is pretty similar to the one described in the previous post. The notable difference is that we have come to adopt more of a Model-View-ViewModel (MVVM) paradigm.

Current Architecture Overview

Instead of contract interfaces facilitating precise communication between the view and presentation/business logic in the presenter, the view takes inputs from the user and requests that the viewmodel perform some action. The view is notified of new information by observing properties on the viewmodel.

Android Jetpack

Jetpack continues to be a huge boon to our small team’s productivity. We make use of several components like navigation, viewmodel, and livedata and continue to look to Jetpack components first when we have need of a supplement to the framework itself. Google forms the vessel for Android development, so using their materials will make us more adaptable in the future with less hassle.

ViewModel & LiveData

ViewModel has made handling the lifecycle issues of the old days much less painful. There were solutions before (like headless fragments, shudder) for handling orientation changes well, but none provided the simplicity of ViewModel. Recently, some nice Kotlin coroutine features were added to the class that make structured concurrency easier than ever.

LiveData<T> makes observing changes in viewmodel state a breeze by providing observability in a lifecycle-aware way. I wish the API had a more idiomatic affordance for setting values on the main thread with coroutines. Currently, you either have to call postValue or use withContext(Dispatchers.Main) and set the value property directly. The latter is more idiomatic; the former is a little safer, as it’s impossible to forget to set values on the main thread. We’ve made a habit of the latter since our viewmodel suspend functions are typically called from the main dispatcher context anyway. It’s a small concern for now.
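To make that concrete, here’s a minimal sketch of both affordances. Song and fetchSongsFromNetwork are hypothetical stand-ins, and viewModelScope comes from the lifecycle-viewmodel-ktx artifact:

import androidx.lifecycle.LiveData
import androidx.lifecycle.MutableLiveData
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.withContext

class PlaylistViewModel : ViewModel() {
    private val _songs = MutableLiveData<List<Song>>()
    val songs: LiveData<List<Song>> get() = _songs

    fun refreshSongs() {
        viewModelScope.launch(Dispatchers.IO) {
            val latest = fetchSongsFromNetwork() // hypothetical network call

            // Option 1: postValue is safe to call from any thread.
            // _songs.postValue(latest)

            // Option 2 (our habit): hop to the main dispatcher and set value directly.
            withContext(Dispatchers.Main) { _songs.value = latest }
        }
    }
}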

Data Binding

The Jetpack data binding library is something we’re gravitating away from. It has some nice qualities that definitely encourage a separation of view and logic concerns. However, the build time impact, while not huge by any means, is considerable. We decided to try the system some time after making the decision to adopt the MVVM paradigm. Before then, we just modified view properties directly in the LiveData observers we registered in their parent Fragment. The code generator for data binding produces some impressively elaborate Java code, but it doesn’t play all that well with Kotlin. One example: to use a top-level extension function in a data binding expression, you had to know that the function was compiled into a class named after the file in which it was defined, suffixed with ‘Kt’. You also had to know how extension functions are compiled: the object being extended (the receiver) is passed as the first argument of the function, followed by the remaining arguments.

For example, to use an AliasKind extension defined in MentionExtensions.kt:

<layout>
    <data>
    	<import type="com.faithlife.mobile.MentionExtensionsKt" />
        <variable
            name="aliasKind"
            type="com.faithlife.mobile.AliasKind" />
    </data>

    <androidx.constraintlayout.widget.ConstraintLayout>
    	<TextView
    		android:text="@{MentionExtensionsKt.getDisplayName(aliasKind, context)}"
    		/>
    </androidx.constraintlayout.widget.ConstraintLayout>
</layout>

Slightly modified markup from an internal discussion on the subject

The data binding expression language also had magic variables that you could seemingly reference out of nowhere (like context). None of this is insurmountable, but we’ve agreed that manipulating views in observer listeners is better.

We’re likely going to try out the relatively new view binding system that seems to be a more type-safe way to get references to views from framework components. Adios findViewById.
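As a quick sketch of what that could look like once view binding is enabled in the module’s build script (FragmentProfileBinding is a hypothetical class generated from a fragment_profile layout):

import android.os.Bundle
import android.view.View
import androidx.fragment.app.Fragment

class ProfileFragment : Fragment(R.layout.fragment_profile) {
    override fun onViewCreated(view: View, savedInstanceState: Bundle?) {
        // Generated at compile time; every view with an id becomes a typed property.
        val binding = FragmentProfileBinding.bind(view)
        binding.displayName.text = "Hello, view binding"
    }
}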

We’re also keeping an eye on Jetpack Compose. Data binding expressions are a step in the wrong direction considering the new UI toolkit doesn’t use any markup at all.

Dagger 2

We’re also continuing to improve our use of Dagger as a facilitator of dependency injection. We never got into Dagger-Android, as it seemed somehow both more magical and more constraining than it was worth. Google recently showed they agree by announcing its deprecation. We’ve worked toward isolating components to subgraphs where possible, and we’ve used the relatively recent support for setting a custom fragment factory on any FragmentManager to make dependency injection possible in framework components, where passing arguments to constructors was historically a bad idea. Our fragments and viewmodels can be passed dependencies via simple @Inject-annotated constructors.
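A sketch of the fragment factory piece, assuming the usual Dagger multibindings setup (the names here are illustrative, not our production code):

import androidx.fragment.app.Fragment
import androidx.fragment.app.FragmentFactory
import javax.inject.Inject
import javax.inject.Provider

class InjectingFragmentFactory @Inject constructor(
    // Populated by Dagger multibindings: one Provider per fragment class.
    private val providers: Map<Class<out Fragment>, @JvmSuppressWildcards Provider<Fragment>>
) : FragmentFactory() {
    override fun instantiate(classLoader: ClassLoader, className: String): Fragment {
        val fragmentClass = FragmentFactory.loadFragmentClass(classLoader, className)
        // Fall back to the framework's reflective path for fragments Dagger doesn't know about.
        return providers[fragmentClass]?.get() ?: super.instantiate(classLoader, className)
    }
}

The host activity assigns the factory (supportFragmentManager.fragmentFactory = factory) before super.onCreate so the framework uses it whenever fragments are recreated.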

All of this makes isolating tests much easier, encourages reuse among different systems, and makes the separation of concerns much clearer among components in the system.

More Than Water

Engineering requires balance. Every choice is a trade-off. Two of our company values, shipping and elegance, characterize this tension well. We were early adopters of Kotlin coroutines and Jetpack Navigation. We didn’t find much value in Koin. We saw promise in data binding, so we gave it a shot, but we’re reconsidering so that we might more easily pick up Compose down the road. Choosing how you invest your effort is a tremendously important skill for building good software. We don’t always nail it, but we certainly aspire to. We are adaptable, but we’re more than water: aware of our environment, we strive to make choices that will put us in the best position to build a great product.

Posted by Justin Brooks on October 30, 2019


Improving WPF Text Display Performance

Logos 8 is a WPF Windows desktop application. It includes fonts that are embedded in the application itself (instead of being installed in the system’s Fonts folder). When we added a Chinese Bible to the Logos library, we bundled Noto Sans CJK for display of the Chinese text. But we found that drawing a screen of text would take multiple seconds (even on a high-end machine) and allocate gigabytes of RAM.

Time usage of ReadFileFragment

Memory usage of ReadFileFragment

The time is spent in WPF functions called by DWrite. (These are also the functions responsible for the excessive memory allocation.) DirectWrite is a hardware-accelerated text layout and rendering subsystem in Windows. It should provide high-performance text rendering, but it clearly isn’t in this situation.

Like many Win32 APIs, DirectWrite is built with COM, a 25-year-old technology for creating software components. DWrite provides COM interfaces that an application can call to render text, and also defines interfaces that a client can implement to extend DWrite, e.g., by providing a custom font loader.

WPF implements a number of DWrite COM interfaces to make fonts embedded in .NET assemblies available to DWrite.

Our primary problem is with IDWriteFontFileStream. Thanks to Microsoft making WPF open source (and recently adding all the code), we can see exactly where the problem lies: ReadFileFragment allocates a buffer, copies unmanaged memory into it, pins the buffer, then returns its address to DWrite (source code link).

This is almost a worst-case scenario for the .NET garbage collector: hundreds of megabytes of memory are being allocated, every buffer is being pinned so the GC can’t compact the heap, and the buffers are probably living long enough (at least across multiple native/managed interop calls) to make it out of gen 0.

Moreover, it’s completely unnecessary. IDWriteFontFileStream only needs to return a pointer to a portion of the font (no copying is necessary), which is simple if the font is already loaded in memory. And it is: embedded fonts are concatenated in the “Resources” section of the assembly, and every .NET DLL is mapped into the virtual address space of the process.

I wrote an implementation of IDWriteFontFileStream that is initialised with a pointer to the beginning of the font data. ReadFileFragment becomes simply a pointer addition and assignment: no allocation, no memcpy, extremely low overhead.

Getting a pointer to the beginning of the font data is somewhat trickier. We can parse the PE header to find the .text section that contains the CLR Header. From this we can find the offset to the embedded resources and use ResourceReader.cs as a guide for parsing the binary format. This will give us the address of each font file in memory, and enough information to construct a pack URI for each font.

We now just have to find a way to replace WPF’s inefficient FontFileStream with our optimised version. The interface-based nature of COM works to our advantage here. If we can replace the IDWriteFactory interface that WPF calls into when it calls RegisterFontCollectionLoader, we could substitute our own IDWriteFontCollectionLoader implementation, and ultimately return the efficient stream from IDWriteFontFileLoader::CreateStreamFromKey.

I first looked into replacing WPF’s reference to the IDWriteFactory object, e.g., by using reflection to change a private static field. But it was created by a static constructor and didn’t seem possible to intercept early enough in the application’s lifetime.

Instead, I found a great library for Windows API hooking: EasyHook. This let me override the DWriteCreateFactory method exported from DWrite.dll for our application’s process. We hook the API very early in startup, before WPF has called it. Then, when WPF does call it, we instantiate the real DWriteFactory but return our own custom implementation of IDWriteFactory that simply forwards most API calls to the real factory.

The two calls that aren’t forwarded directly are RegisterFontCollectionLoader and RegisterFontFileLoader. Instead, another proxy object is created (around WPF’s loaders) and the proxy is registered with the real DWrite. Finally, when DWrite calls IDWriteFontFileLoader::CreateStreamFromKey on our proxy, we examine the “reference key” that’s supplied as a font identifier. If we don’t recognise it, we forward the call to the original loader. But if it’s a pack URI that matches one created for our assembly resources (above), an optimised IDWriteFontFileStream is created and returned instead.

The results are incredible: instead of displaying one page of text every 2–3 seconds, we can now refresh the display dozens of times per second. Managed allocations have been eliminated, there are no native/managed interop calls from DWrite to the font loader, and CPU usage has been reduced by at least 100×.

Updated profile with optimised code

Faithlife isn’t using WPF on .NET Core yet but if we do, we’ll consider contributing this back to dotnet/wpf so that we don’t have to use hacks like API hooking and so that all WPF applications can benefit.

(The code mentioned in this post was primarily written in 2017, extending code we first wrote in 2015 to integrate HarfBuzz with DWrite, so structuring it to be contributed to an open source project was not a concern at the time.)

Posted by Bradley Grainger on June 03, 2019


Model-View-Presenter on Android with Kotlin Coroutines

Kotlin just released a major version that brings coroutines out of experimental status. We’ve been using coroutines for around six months now and have learned quite a bit about how to use coroutines and how not to. The coroutines library is a powerful tool and seems to be on the rise in popularity. There is a lot of great info out there about using coroutines in MVVM projects and in projects that make heavy use of Android Jetpack’s Architecture Components. Instead, we’d like to share how we are using coroutines in a model-view-presenter architected application.

Disclaimer: I’ll assume you’re familiar with coroutines at a high level. If you need an intro, there are plenty of resources online that would serve you better than I would. I’ll link some in the Extras section.

Our Architecture at a Glance

MVP is a tried and true architectural pattern for Android. Recently we’ve been successfully leveraging new technologies, like coroutines and Architecture Components, to make MVP nicer than ever. Our current implementation is a stepping stone toward a better future that embraces a reactive paradigm with coroutines as the bedrock.

Current Architecture Overview

Implementation

There are certainly ways we can improve our architecture. We know we have a way to go yet, but we’ve found success on the road to a clean architecture.

Structured Concurrency

A driving principle of coroutines development is that coroutines are like lightweight threads. The early experimental versions of coroutines were a bit of a wild west that encouraged creating new coroutines all the time. Prior to coroutines version 0.26, a typical presenter might have looked something like this:

class PlaylistPresenter : PlaylistContract.Presenter {

    private var job: Job? = null

    override fun fetchSongs() {
        // The default context was 'CommonPool' which 
        // delegated the work to a pool of non-ui threads.
        // A new job is created for this coroutine.
        job = launch {
            val songs = fetchSongsFromNetwork()
            withContext(UI) {
                view?.updateSongList(songs)
            }
        }
    }

    override fun cleanup() {
        // imagine tracking many job objects for multiple
        // coroutines executing simultaneously. _shudder_
        job?.cancel()
    }
}

Unlike threads, which are often created in a global scope, coroutines can be tightly scoped to the entities that own them. Instead of firing coroutines into the ether and hoping everything goes well, the team working on coroutines introduced a better way with kotlinx.coroutines 0.26.

The CoroutineScope interface facilitates constraining a coroutine’s lifetime by an object with lifetime semantics.

class PlaylistPresenter : PlaylistContract.Presenter, CoroutineScope {

    private val job = Job()
    override val coroutineContext: CoroutineContext = job + Dispatchers.IO

    override fun fetchSongs() {
        // This launch uses the coroutineContext defined
        // by the coroutine presenter.
        launch {
            val songs = fetchSongsFromNetwork()
            withContext(Dispatchers.Main) {
                view?.updateSongList(songs)
            }
        }
    }

    override fun cleanup() {
        // By default, every coroutine initiated in this context
        // will use the job and dispatcher specified by the 
        // coroutineContext.
        // The coroutines are scoped to their execution environment.
        job.cancel()
    }
}

Now the presenter is a CoroutineScope, and coroutines started in this scope can more appropriately be cleaned up or cancelled when the presenter is no longer necessary.

A View-Presenter Contract for Coroutines

Fortunately, generalizing the scope implementation to all presenters in a project is fairly straightforward.

interface BaseContract {
    interface View
    interface Presenter<T : View> {
        // Kotlin interface properties can't have initializers,
        // so implementers provide the backing field.
        var view: T?

        @CallSuper
        fun takeView(view: T) {
            this.view = view
        }

        @CallSuper
        fun releaseView() {
            this.view = null
        }
    }
}

interface CoroutineContract {
    // Interfaces can't hold state, so View implementers (typically a Fragment)
    // supply a main-thread scope, e.g. Job() + Dispatchers.Main.
    interface View : BaseContract.View, CoroutineScope

    abstract class Presenter<T : View> : BaseContract.Presenter<T>, CoroutineScope {
        override var view: T? = null

        private val job = Job()
        override val coroutineContext: CoroutineContext = job + Dispatchers.IO

        override fun releaseView() {
            job.cancel()
            super.releaseView()
        }
    }
}

View methods will be called from the presenter. They should be modified with suspend and, importantly, they are responsible for changing the coroutine execution context back to the main thread via withContext(this.coroutineContext) where appropriate.

Presenter methods are also suspend functions. Since presenters are coroutine scopes, you can use a coroutine builder like launch on the presenter to call its suspend methods.
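Concretely, a contract built on those pieces might look like this sketch (PlaylistContract, Song, songAdapter, and fetchSongsFromNetwork are illustrative stand-ins):

import androidx.fragment.app.Fragment
import kotlin.coroutines.CoroutineContext
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.withContext

interface PlaylistContract {
    interface View : CoroutineContract.View {
        suspend fun updateSongList(songs: List<Song>)
    }

    abstract class Presenter : CoroutineContract.Presenter<View>() {
        abstract suspend fun fetchSongs()
    }
}

class PlaylistPresenter : PlaylistContract.Presenter() {
    override suspend fun fetchSongs() {
        val songs = fetchSongsFromNetwork() // runs on Dispatchers.IO via the presenter's context
        view?.updateSongList(songs)         // the view switches itself back to the main thread
    }
}

class PlaylistFragment : Fragment(), PlaylistContract.View {
    override val coroutineContext: CoroutineContext = Job() + Dispatchers.Main

    override suspend fun updateSongList(songs: List<Song>) = withContext(this.coroutineContext) {
        songAdapter.submitList(songs)
    }

    // e.g. in onStart: presenter.launch { presenter.fetchSongs() }
}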

Example MVP Interaction

Testing

Isolating and testing presenters on the JVM is easy if a few rules are followed:

Rules for Presenters
  1. Avoid direct references to the Android framework. If necessary, reference framework components through the view interface. This is antithetical to the passive view philosophy, but the trade-off is worth it to seal the Android framework in a box for testing purposes.
  2. Presenter methods that are called by the view are suspending functions. Those functions should always be called within the presenter’s coroutine scope.
  3. takeView is the cue for the presenter that the UI is ready for information. releaseView is the cue that the UI is being torn down. Hard dependencies on the MVP view’s lifecycle are discouraged; use the contract interfaces to communicate when necessary.

Android Test is out now and provides hope for a future where testing on the JVM can involve the Android framework. Fingers crossed.

The Future

We aspire to a more reactive variant of the MVP pattern. The good news is that we have several paths to making our reactive aspirations a reality.

Future Architecture

A possible future using Room for a local data store that serves as an authority on data presented to the user. Maybe Room will support coroutine channels by the time we take this on. Maybe we’ll use LiveData. Who knows?

Dagger 2, Retrofit, Lifecycle, & Coroutines make the foundation of our stack, but we’re always evaluating technologies that can help us write better code faster.

Extras

We like Kotlin a lot. As it currently stands, the app is exclusively written in the language. If you’re an Android developer and find this interesting, we’re hiring!

We have some exciting things planned for the Faithlife App.


Thanks to Logan Ash and Jacob Peterson for reviewing early versions of this post.

Update 01/19/2019

An earlier version of this post suggested that presenter methods should return an instance of Job so that tests could synchronize on the called coroutine in order to prevent race conditions. We ran into some problems with the way exceptions of child coroutines within a runBlocking coroutine builder are handled: runBlocking immediately cancels child coroutines when they throw an exception. After some consideration, we determined that the presenter should be responsible for exactly how its methods are called; callers only need to know that they are suspending.

The structured concurrency updates and the CoroutineScope interface helped us achieve this separation. We can start presenter coroutines in the presenter’s scope from the view and in tests we wrap the entire test in a runBlocking coroutine builder so that suspend methods are executed synchronously.
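For example, a presenter test under this scheme might look like the following sketch (FakePlaylistView and the presenter’s data source are hypothetical):

import kotlin.coroutines.CoroutineContext
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import org.junit.Assert.assertNotNull
import org.junit.Test

class FakePlaylistView : PlaylistContract.View {
    // Unconfined keeps the fake free of any main-thread dependency on the JVM.
    override val coroutineContext: CoroutineContext = Dispatchers.Unconfined
    var lastSongs: List<Song>? = null
    override suspend fun updateSongList(songs: List<Song>) { lastSongs = songs }
}

class PlaylistPresenterTests {
    @Test
    fun fetchSongsPushesSongsToTheView() = runBlocking {
        val view = FakePlaylistView()
        val presenter = PlaylistPresenter() // a real test would inject a fake song source
        presenter.takeView(view)

        presenter.fetchSongs() // executes synchronously inside runBlocking

        assertNotNull(view.lastSongs)
    }
}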

We’re happier with the new system and its flexibility to treat suspending functions differently depending on the context in which they are called.

Posted by Justin Brooks on November 05, 2018


Making renders faster with the React 16.5 profiler

React 16.5 recently shipped, which added support for some new Profiling tools. We recently used these tools to identify a major source of slow render performance.

Faithlife.com is a web application powered by React 16.3. The homepage consists of a reverse-chronological timeline of posts. We received some reports that interactions with posts (such as replying) caused the browser to lag, depending on how far down the post was on the page. The further down the page the post was, the more lag occurred.

After updating React to 16.5 on a local copy of Faithlife, our next step was to start profiling and capture which components were re-rendering. Below is a screenshot of what the tools showed after clicking the ‘Like’ button on any post:

Slow renders screenshot

The blue blocks below NewsFeed show render being called on all the posts in the feed. If there were 10 items loaded, NewsFeedItem and all its children would get rendered 10 times. This can be fine for small components, but if the render tree is deep, rendering a component and its children unnecessarily can cause performance problems. As a user scrolls down on the page, more posts get loaded in the feed. This causes render to get called for posts all the way at the top, even though they haven’t changed!

This seemed like a good time to try changing NewsFeedItem to extend PureComponent, which will skip re-rendering the component and its children if the props have not changed (a shallow comparison is used for this check).

Unfortunately applying PureComponent was not enough - profiling again showed that unnecessary component renders were still happening. We then uncovered two issues preventing us from leveraging PureComponent’s optimizations:

First roadblock: Use of children props.

We had a component that looked something like this:

<NewsFeedItem contents={item.contents}>
  <VisibilitySensor itemId={item.id} onChange={this.handleVisibilityChange} />
</NewsFeedItem>

This compiles down to:

React.createElement(
  NewsFeedItem,
  { contents: item.contents },
  React.createElement(VisibilitySensor, { itemId: item.id, onChange: this.handleVisibilityChange })
);

Because React creates a new instance of VisibilitySensor during each render, the children prop always changes, so making NewsFeedItem a PureComponent would make things worse: the shallow comparison in shouldComponentUpdate isn’t free, and it would always return true anyway.

Our solution here was to move VisibilitySensor into a render prop and use a bound function:

<NewsFeedItemWithHandlers
  contents={item.contents}
  itemId={item.id}
  handleVisibilityChange={this.handleVisibilityChange}
/>

class NewsFeedItemWithHandlers extends PureComponent {
  // The arrow function needs to get created outside of render, or the shallow comparison will fail
  renderVisibilitySensor = () => (
    <VisibilitySensor
      itemId={this.props.itemId}
      onChange={this.props.handleVisibilityChange}
    />
  );

  render() {
    return (
      <NewsFeedItem
        contents={this.props.contents}
        renderVisibilitySensor={this.renderVisibilitySensor}
      />
    );
  }
}

Because the bound function only gets created once, the same function instance will be passed as props to NewsFeedItem.

Second roadblock: Inline object created during render

We had some code that was creating a new instance of a url helper in each render:

getUrlHelper = () => new NewsFeedUrlHelper(
	this.props.moreItemsUrlTemplate,
	this.props.pollItemsUrlTemplate,
	this.props.updateItemsUrlTemplate,
);

<NewsFeedItemWithHandlers
	contents={item.contents}
	urlHelper={this.getUrlHelper()} // new object created with each method call
/>

Since getUrlHelper is computed from props, there’s no point in creating more than one instance if we can cache the previous result and re-use that. We used memoize-one to solve this problem:

import memoizeOne from 'memoize-one';

const memoizedUrlHelper = memoizeOne(
	(moreItemsUrlTemplate, pollItemsUrlTemplate, updateItemsUrlTemplate) =>
		new NewsFeedUrlHelper(
			moreItemsUrlTemplate,
			pollItemsUrlTemplate,
			updateItemsUrlTemplate,
		),
);

// in the component
getUrlHelper = () =>
	memoizedUrlHelper(
		this.props.moreItemsUrlTemplate,
		this.props.pollItemsUrlTemplate,
		this.props.updateItemsUrlTemplate
	);

Now we will create a new url helper only when the dependent props change.

Measuring the difference

The profiler now shows much better results: rendering NewsFeed is now down from ~50ms to ~5ms!

Better renders screenshot

PureComponent may make your performance worse

As with any performance optimization, it’s critical to measure how changes impact performance.

PureComponent is not an optimization that can blindly be applied to all components in your application. It’s good for components in a list with deep render trees, which was the case in this example. If you’re passing arrow functions, inline objects, or inline arrays as props to a PureComponent, both shouldComponentUpdate and render will always get called, because new instances of those props get created each time! Measure the performance of your changes to be sure they are an improvement.

It may be perfectly fine for your team to use inline arrow functions on simple components, such as binding onClick handlers on button elements inside a loop. Prioritize readability of your code first, then measure and add performance optimizations where it makes sense.

Bonus experiment

Since the pattern of creating components just to bind callbacks to props is pretty common in our codebase, we wrote a helper for generating components with pre-bound functions. Check it out on our GitHub repo.

You can also use windowing libraries, such as react-virtualized, to avoid rendering components that aren’t in view.

Thanks to Ian Mundy, Patrick Nausha, and Auresa Nyctea for providing feedback on early drafts of this post.

Posted by Dustin Masters on September 21, 2018


Converting BibleWorks Hebrew

We recently announced that we’ll support importing BibleWorks notes into Logos 7. This is mostly a matter of converting RTF into our internal format, a fairly well-understood process.

The one wrinkle is supporting BibleWorks Greek and Hebrew fonts. BibleWorks didn’t support Unicode for many years; instead it used the Bwhebl and Bwgrkn 8-bit fonts to simulate Greek and Hebrew characters.

In Unicode, b and β and ב are all separate characters (that in some cases are all supported by a single font). With 8-bit fonts, one uses Latin characters (a, b, c) but changes the font so that “b” looks like β or ב. Behind the scenes, κύριος is stored as ku,rioj and אֲדֹנָ֣י as yn’ådoa}. This makes text processing more difficult: you can no longer search for Greek or Hebrew using the Unicode versions of those characters. It also means the user must have these specific fonts installed and can’t change their preferred Greek or Hebrew font. For a good customer experience, we needed to convert the characters for users who had BibleWorks notes predating Unicode support.

The Greek was relatively straightforward, but Hebrew presented a bigger challenge. Not only is BibleWorks Hebrew stored using Latin characters, it’s also stored in display (i.e., left-to-right) order. In Unicode, characters are stored in logical order (which is right-to-left for fragments of Hebrew text); the display system will lay them out correctly. The string needs to be reversed, but with a catch: in both BibleWorks Hebrew and in Unicode, Hebrew vowels and accents are entered after the character that they’re positioned on top of. We can’t naively reverse the entire string; we have to reverse it one grapheme cluster at a time.

Moreover, Unicode has a concept of bidirectional mirroring in which “neutral” characters are replaced by their mirrored versions in a RTL run; for example ( will be displayed as ) in right-to-left text. When reversing the string, these characters need to be replaced by their mirrored version.
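The reversal and mirroring steps are straightforward to sketch. Our production code is C#, but here’s the same idea in Kotlin, using java.text.BreakIterator to walk grapheme clusters (base characters plus any combining vowels and accents):

import java.text.BreakIterator

private val mirrored = mapOf(
    '(' to ')', ')' to '(',
    '[' to ']', ']' to '[',
    '{' to '}', '}' to '{'
)

fun reverseHebrewFragment(text: String): String {
    // Collect the grapheme clusters so combining marks stay attached to their base character.
    val clusters = mutableListOf<String>()
    val boundary = BreakIterator.getCharacterInstance()
    boundary.setText(text)
    var start = boundary.first()
    var end = boundary.next()
    while (end != BreakIterator.DONE) {
        clusters.add(text.substring(start, end))
        start = end
        end = boundary.next()
    }

    // Reverse the clusters, flipping paired punctuation as we go.
    return clusters.reversed().joinToString("") { cluster ->
        if (cluster.length == 1) (mirrored[cluster[0]] ?: cluster[0]).toString() else cluster
    }
}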

Finally, the documentation we found gave BibleWorks Hebrew characters as decimal numbers representing entries in an 8-bit font; due to the way we were reading the RTF source of BW Notes, these bytes had already gone through a Windows-1252 to Unicode conversion, so our character map had to be based off the Unicode characters that corresponded to Windows 1252 bytes.

Step | Result
---- | ------
Initial input | 191 121 110 39 229 100 111 97 125 192
Decode Windows-1252 to Unicode | ¿ y n å d o a } À
Untransliterate | ( י נ ָ ֣ ד ֹ א ֲ )
Reverse grapheme clusters | ) א ֲ ד ֹ נ ָ ֣ י (
Flip punctuation | ( א ֲ ד ֹ נ ָ ֣ י )

The final result: (אֲדֹנָ֣י)

Our complete BibleWorks Hebrew mapping table is available here.

Posted by Bradley Grainger on August 15, 2018


Faster API calls with JWT access tokens

At Faithlife, we’ve been using OAuth 1.0a to handle authentication between services. Instead of designing our apps as monoliths, we’ve preferred to build lightweight frontend applications that call RESTful microservices, returning entities as JSON. These frontend applications don’t touch our databases directly. Among other benefits, this allows us to better allocate hardware resources (CPU, RAM, disk) to applications that need them.

A typical request to Faithlife might look something like this:

sequenceDiagram
	participant Frontend
	participant Accounts
	participant Community Newsfeed
	participant Amber API
	participant Notifications API
	participant OAuth
	Frontend->> OAuth: Authenticate user
	Frontend->> Accounts: List groups
	Frontend->> Community Newsfeed: Fetch newsfeed
	Community Newsfeed->> OAuth: Authenticate user
	Community Newsfeed->> Amber API: Get post images
	Amber API->> OAuth: Authenticate user
	Frontend->> Notifications API: Get notifications
	Notifications API->> OAuth: Authenticate user

At the beginning of the request, Faithlife makes a call to OAuth API to ensure the current OAuth access token and secret are still valid. After that check passes, the current user’s OAuth credentials are also passed to all downstream services that require auth.

An authorization header presented to a downstream API looks something like:

Authorization: OAuth oauth_consumer_key="1E18E56BD0C3A51A945D98136D6462FCEAE65199",oauth_signature="0B847E32C6DE692A7BA899DF67EF5C1BCCAEFA89%262D3F6B2BD18B2DD85821EFF0F07EB130AD46E5C5",oauth_signature_method="PLAINTEXT",oauth_version="1.0",oauth_token="FE009074810F3D2E3A2EB6BF5603B1CA08082AB7"

However, this poses a problem: our microservices do not have access to the OAuth database directly, and can’t validate the current user’s authorization header without first calling OAuth API. These calls are not free: on average, we measured them taking 10–35 ms for apps within the same data center, depending on a variety of factors. As we add more API calls to Faithlife, it gets progressively worse:

  • Pages take longer to load, as each API dependency needs to call OAuth API.
  • APIs may need to fetch data from other downstream APIs, and each API needs to validate the authenticated user.
  • APIs hosted outside the datacenter (Azure, GKE, etc) can magnify this problem significantly if the app is not hosted geographically close to where OAuth API lives.
  • Locally caching the oauth validation state on a web node only solves part of the problem. Many APIs are backed by multiple web nodes, with round robin request balancing, so there could be a cache miss.

JWT Access Tokens

What we needed was a way to pass a token to downstream APIs that identifies the current user. OAuth 1.0a was suitable for a long time, and several years ago we decided to hold off on migrating to OAuth 2 because the need was not strong enough. OAuth 2 adds a few steps to the authentication flow:

  1. When signing a user in, obtain a refresh token and an access token.
  2. Use the access token for API calls. The OAuth 2 and OpenID Connect standards do not define the format that these access tokens have to be in, but OpenID Connect mandates JSON web tokens (JWTs) for identity tokens, and identity tokens can be used as access tokens. To future-proof our implementation, we chose to use JWTs signed with ES256. This blog post explains the token differences in greater detail.
  3. When the access token expires, use the refresh token to obtain a new access token

JWT access tokens have a few desirable properties for our use case. Tokens contain claims about the current user (such as the user ID and current roles), an expiration date, and are signed with a public/private key pair. Downstream APIs can validate their integrity using the public key, but only the signing authority can issue new ones. For our use case, we established some requirements for all JWT access tokens:

  • Only OAuth API has access to the private key and is solely responsible for issuing JWT access tokens. The public key is available via a public API.
  • JWT access tokens can only be created from plaintext OAuth 1 access tokens. Creating a JWT access token from a previous JWT access token is not allowed.
  • JWT access tokens are signed with ES256. All other signatures must be rejected.
  • Tokens expire 10 minutes after they are issued, and are not valid more than 10 minutes before the current time (to tolerate clock skew).
  • JWT access tokens contain claims for the current user ID, frontend app consumer ID, and any other properties that would be normally obtained when validating the current user via OAuth API.

JWT access tokens are presented to the downstream APIs that Faithlife calls. On the very first request, the current public key is requested from OAuth API. Using this public key, tokens can now be validated locally. All future tokens for the lifetime of the app are validated with this public key. Because the token has claims stored within it, we now have no need to call OAuth API for successful authentication attempts. If the token can’t be validated locally, either because the token appears to be expired due to clock skew, or because the signing key was changed, the downstream API makes a validation call to OAuth API (just like it did before).
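As a sketch of that local check (our services are .NET; this is the same idea expressed in Kotlin with the jjwt library, and the names are illustrative):

import io.jsonwebtoken.Claims
import io.jsonwebtoken.JwtException
import io.jsonwebtoken.Jwts
import java.security.PublicKey

// publicKey is the ES256 public key fetched once from OAuth API and cached.
fun validateLocally(token: String, publicKey: PublicKey): Claims? = try {
    val jws = Jwts.parser()
        .setSigningKey(publicKey)
        .parseClaimsJws(token) // verifies the signature and the exp/nbf claims

    // Per our requirements, reject anything not signed with ES256.
    if (jws.header.algorithm == "ES256") jws.body else null
} catch (e: JwtException) {
    null // the caller falls back to a validation call against OAuth API
}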

We did not end up implementing the full OAuth 2 authorization flow when adding support for JWT access tokens. Instead, we used the OAuth 1 credentials in place of OAuth 2 refresh tokens to obtain access tokens.

How is this approach different from OAuth 1?

This approach follows the full OAuth 1.0a authorization flow, but replaces OAuth 1 plaintext tokens with JWT access tokens when communicating with downstream APIs. An OAuth 1 plaintext token is still obtained and stored by the frontend web application (in an encrypted cookie), and then upgraded to a JWT access token at the beginning of a frontend request. We could have migrated our auth services to a full OAuth 2 implementation, but that would have been a non-trivial amount of work, and more than we wanted to take on in this iteration. We were mainly interested in the scalability wins of JWT access tokens, while leaving open the possibility of adopting the full OAuth 2 authorization flow later.

Tokens in action

When Faithlife gets a web request, it makes a call to OAuth API:

GET /oauth/v1/users/current
Authorization: OAuth oauth_consumer_key="1E18E56BD0C3A51A945D98136D6462FCEAE65199",oauth_signature="0B847E32C6DE692A7BA899DF67EF5C1BCCAEFA89%262D3F6B2BD18B2DD85821EFF0F07EB130AD46E5C5",oauth_signature_method="PLAINTEXT",oauth_version="1.0",oauth_token="FE009074810F3D2E3A2EB6BF5603B1CA08082AB7"

And gets back a JWT:

X-Bearer-Authorization: Bearer eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIyOTg2Njg5IiwiY29uc3VtZXJOYW1lIjoiRmFpdGhsaWZlIiwiY29uc3VtZXJUb2tlbiI6IjRFNTdGQTk1MDE1MTJDMUM0RjdFMzQ1NzE0NjNDMjI0QjBCMzc1NEQiLCJpc0FkbWluQ29uc3VtZXIiOiJ0cnVlIiwibmJmIjoxNTMyOTgyMDQ2LCJleHAiOjE1MzI5ODMyNDYsImlhdCI6MTUzMjk4MjY0NiwiaXNzIjoiYXV0aC5mYWl0aGxpZmUuY29tIiwiYXVkIjoiZmFpdGhsaWZlLWJhY2tlbmQtYXBpcyJ9.7GSdItnnCr8QOLS3uCbJMY0X-D7jTjp_XUAp8clo9LY4X5Zlf_5I7RSMZr3J6kOihwbHjEbuh0AFMXmF5YQZLg

Which decodes to:

{
  "sub": "2986689",
  "consumerName": "Faithlife",
  "consumerToken": "4E57FA9501512C1C4F7E34571463C224B0B3754D",
  "isAdminConsumer": "true",
  "nbf": 1532982046,
  "exp": 1532983246,
  "iat": 1532982646,
  "iss": "auth.faithlife.com",
  "aud": "faithlife-backend-apis"
}

This JWT is passed to all downstream services in the Authorization header. Our request graph now looks much better:

sequenceDiagram
	participant Frontend
	participant Accounts
	participant Community Newsfeed
	participant Amber API
	participant Notifications API
	participant OAuth
	Frontend->> OAuth: Authenticate user and obtain JWT
	Frontend->> Accounts: List groups
	Frontend->> Community Newsfeed: Fetch newsfeed
	Community Newsfeed->> Amber API: Get post images
	Frontend->> Notifications API: Get notifications

On average we measured validating this token taking less than 4 ms per request. We’re happy with the results so far and are in the process of rolling support out to all of our APIs.

Alternate strategies

There is more than one way to solve this problem. A few other strategies we considered:

  • Using a shared service account to communicate with downstream services. This increases the chance that a frontend regression reveals access to data that the user is not allowed to see (e.g. posts to a secret group). There are many different teams in charge of the APIs that Faithlife calls (e.g. commerce APIs are separated from community APIs), and validation at the API layer is much easier to enforce across team boundaries.

  • Using a shared key and passing a signed token or cookie. This presents several security problems. Rotating the shared token would have been very complicated, and having the signing key on all of our web nodes increases the attack surface. We wanted a solution that was standardized and would scale well into the future. Some of these solutions are also tightly coupled to the web application stack (ASPXAUTH cookies, for example), and our auth solution needs to work on multiple API platforms.

Technologies used

We primarily use ASP.NET here at Faithlife, although we also host services using NodeJS and .NET Core. A sample .NET Core app that demonstrates both signing and validating ES256 tokens is hosted here:

https://github.com/Faithlife/ES256-Demo

This demo uses these NuGet packages:

Thanks for reading!

If you’re interested in working on projects like this, come work with us! Thanks to Robert Bolender, Justin Brooks, and Bradley Grainger for giving feedback on early drafts of this post.

Posted by Dustin Masters on August 06, 2018


Inspecting Response Headers in the Android WebView

The Android WebView is great for presenting users with web content in native or hybrid applications. Its ability to bind JavaScript code to a client-side interface, its navigation controls, and its relatively small API surface are all great. However, this week I ran into an unfortunate omission: there is no proper mechanism for inspecting response headers on WebView requests.

Getting the Response Headers

In our particular case, we needed some information from faithlife.com that is returned as a custom header in order to handle a specific edge-case in our app well. After some exploration, it seemed our best bet was to make the HTTP request ourselves without the help of WebView so we could inspect the headers. We use OkHttp for networking, but the idea is the same so long as you’re able to inspect response headers using your networking client of choice. That looks something like this:

val okHttpClient = OkHttpClient.Builder().build()
val request = Request.Builder()
		.url(requestUrl)
		.build()
val response = okHttpClient.newCall(request).execute()
val importantInfo = response.header("X-Important-Info")

importantInfo?.let {
	// We have our information.
}

Note: When using this type of logic in shouldOverrideUrlLoading(WebView, WebViewRequest), be sure to copy the request headers from the second parameter into the headers of the initial GET request (via the Request.Builder if you use OkHttp), as sketched below.
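A sketch of what that looks like, assuming the OkHttp setup above (loadPageWithHeaders is a hypothetical helper wrapping the fetch-and-inspect logic):

import android.webkit.WebResourceRequest
import android.webkit.WebView
import android.webkit.WebViewClient
import okhttp3.Request

class HeaderInspectingWebViewClient : WebViewClient() {
    override fun shouldOverrideUrlLoading(view: WebView, request: WebResourceRequest): Boolean {
        val builder = Request.Builder().url(request.url.toString())
        // Copy the WebView's request headers onto our own request so the server
        // sees the same request it would have received from the WebView itself.
        for ((name, value) in request.requestHeaders) {
            builder.header(name, value)
        }
        loadPageWithHeaders(view, builder.build())
        return true
    }

    private fun loadPageWithHeaders(webView: WebView, request: Request) {
        // ...execute the request off the main thread, inspect the headers, and
        // call loadDataWithBaseURL as shown below...
    }
}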

Handling Cookies

Since this WebView is showing user content on faithlife.com and is running within the context of our app, we’ll need to make sure faithlife.com knows that we’re authenticated. The site handles authentication via cookies. That cookie information is passed back from the initial request via Set-Cookie headers. Fortunately, it’s pretty easy to get that data where it needs to go.

response.headers("Set-Cookie")?.forEach { setCookieHeader ->
	CookieManager.getInstance()
		.setCookie(requestUrl, setCookieHeader)
}

Loading the Web Content

Now we have access to the response headers and we’ve made sure our cookies are up to date, but we still have to show the web page. Since we already have the HTML of the page from the initial GET, we can avoid loading the webpage again by loading that HTML into the WebView directly. The thing to pay attention to here is that the first parameter to loadDataWithBaseURL is used to resolve relative paths and to apply JavaScript’s same-origin policy. Be sure to use the response’s last known URL when loading this data, in case the first call to the requestUrl triggered a redirect. response.request().url() is the okhttp3.Response way of getting the last URL in the redirect chain.

if (response.isSuccessful) {
	val responseBody = response.body()?.string()
	responseBody?.let { body ->
		webView.loadDataWithBaseURL(
			response.request().url().toString(),
			body,
			response.header("Content-Type") ?: "text/html",
			null,
			requestUrl
		)
	}
}

Now the WebView takes over again as it renders the HTML, loading other resources just as if it had handled the initial GET request itself.

Extra Reading

We like Kotlin a lot. As it currently stands, the upcoming version of the app is exclusively written in the language. If you’re an Android developer and find problems like this fun, we’re hiring!

We have some exciting things planned for the Faithlife App [play store link]. Stay tuned!


Thanks to Dustin Masters & Logan Ash for reviewing earlier versions of the post.

Posted by Justin Brooks on July 31, 2018


Inspecting application state with the SOS debugging tools

In this post, we’ll cover how to use the SOS debugging tools to inspect variables from a process dump of a .NET Framework / .NET Core application.

Required:

Obtaining a memory dump

In this first example, we’ll use a running ASP.NET MVC 5 application hosted by IIS, but the steps here can be used on a normal .NET framework Windows application. Let’s start by taking a full memory dump of a running application.

Download ProcDump and copy it to the server that runs the application you want to debug. Obtain the process ID from the application you want to profile by using Task Manager, and then pass it as an argument to procdump.

procdump -ma <pid>

You should now have a dump named similar to w3wp_171229_151050.dmp in the working directory.

Note:

If you’re running several applications under a single app pool in IIS, it may be easier to debug by changing the app to run under its own application pool, which allows the ASP.NET app to run under a dedicated process.

Inspecting the ASP.NET application state (.NET Framework)

Now that we have a memory dump, it’s time to look at the suspended state of the application. Copy the dump file to your workstation, and then open it via File > Open Crash Dump in WinDBG. Your screen should look like this:

Load the SOS debugging extension, which will allow us to inspect the managed threads:

!loadby sos clr

Then, list the stack trace of every thread:

!eestack

Note:

If you get an exception when running this command and you are using IIS Express, try the command again. There appears to be a bug that throws an exception only for the first command run from WinDbg; it should not affect the rest of your debugging session.

You should see a lot of threads in the output. To narrow the results down, search for the namespace of your project in the output text.

We can see that there is an external web request being made in Thread 34. Let’s look at what external URL is being requested. Switch to the thread, and then run !clrstack -p to get some more detailed information about each method call.

~34 s
!clrstack -p

Note:

You may see many arguments that contain the value <no data>. This can be caused by compiler optimizations; inspecting the state of these parameters is beyond the scope of this article.

The controller is present in this call stack, so let’s inspect the object instance by clicking on the this instance address, which is a shortcut for the !DumpObj command.

This instance contains a field named _request, which contains a field named requestUri, which holds the original URI for this request:

That’s it! The commands vary slightly for dumping different field types.


.NET Core application on Linux

Required:

  • LLDB 3.9
  • Locally-built copy of the SOS plugin in the CoreCLR repo - instructions

In this next scenario, we’ll look at inspecting a core dump from a .NET Core app running on an Ubuntu x64 instance. The instance will have a core dump taken while a request is processing, which we will then inspect.

Take a core dump of the process using the createdump utility. These commands assume you have the coreclr repo checked out to ~/git/coreclr, and that you’re running an application built with .NET Core 2.0.

sudo ~/git/coreclr/bin/Product/Linux.x64.Debug/createdump -u (pid)

Load the dump in LLDB. This command also loads the SOS debugging extension.

lldb-3.9 dotnet -c /tmp/coredump.18842 -o "plugin load ~/git/coreclr/bin/Product/Linux.x64.Debug/libsosplugin.so"

After a few moments, a CLI will become available. Run eestack to dump the state of all CLR threads. If you get an empty output or a segmentation fault, verify that you are running the correct version of lldb and are loading libsosplugin from the bin directory, and that you have created the core dump with createdump.

eestack

There is an instance of HomeController in the stack of Thread 17. Switch to it to reveal more information about the current request. This time, we’ll inspect the state of an internal .NET Core request frame, since information about the current request isn’t as accessible as it was in ASP.NET MVC 5.

thread select 17
sos DumpStackObjects

Look for the address of Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Frame`1[[Microsoft.AspNetCore.Hosting.Internal.HostingApplication+Context, Microsoft.AspNetCore.Hosting]] in the output, and then dump the object. The name of this class might differ slightly based on the version of the framework you’re running.

Identify the QueryString field address:

Dumping that field reveals the query part of the URL the browser requested!


If debugging applications like this sounds interesting to you, join us! Thanks to Kyle Sletten, Justin Brooks, and Bradley Grainger for reviewing early drafts of this post.

Further reading:

Posted by Dustin Masters on January 09, 2018


Mitigating cross-site scripting with Content Security Policy

In this post, we’re going to look at using Content Security Policy (CSP) as a defense-in-depth technique to block script injection attacks.

When building a website that hosts user-generated content, such as:

Great to be here!
<script>window.location='https://example.com'</script>

It’s necessary to encode user-generated content so that browsers don’t mistake it for markup and execute an untrusted script. This is easy to do for plain text, but what if a page needs to render user-generated HTML? Here’s an example of HTML that contains inline Javascript, which browsers could execute:

<p>Great to <b>be</b> here!</p>
<img src="" onerror="alert(0)" />
<a href="javascript:alert(0)">Hi</a>
<script>window.location='https://example.com'</script>

This content must be sanitized before rendering. Libraries such as HTMLAgilityPack or DOMPurify provide a way to parse the HTML and strip out elements or attributes known to execute scripts.

Sanitization is important, but what if an attacker has discovered a way around the filter? This is where Content Security Policy comes in.

If the Content-Security-Policy header is present when retrieving the page, and contains a script-src definition, scripts will be blocked unless they match one of the sources specified in the policy. A policy might look something like:

script-src 'self'; object-src 'none'; base-uri 'none';

This policy disallows:

  • External scripts not hosted on the same domain as the current page.
  • Inline script elements, such as <script>
  • Evaluated Javascript, such as <img src="" onerror="alert(0)" />
  • Base elements, which could break scripts loaded from a relative path
  • Object elements, which can host interactive content, such as Flash

Whitelisting inline scripts

Sometimes it is necessary to run inline scripts on your page. In these cases, the nonce attribute on script elements can be used to whitelist scripts that you control.

<script nonce="00deadbeef">doSomething()</script>

A matching nonce must be present in the CSP for the script to run. For compatibility with older browsers that don’t support the nonce attribute, unsafe-inline allows scripts to run; browsers that do support nonces ignore unsafe-inline when a nonce is present.

script-src 'self' 'nonce-00deadbeef' 'unsafe-inline'; object-src 'none'; base-uri 'none';

It is critical that this nonce is generated by a cryptographically secure random number generator so that an attacker can’t guess a future nonce. In .NET, RNGCryptoServiceProvider.GetBytes can be used to fill a 16-byte array:

using (var random = new RNGCryptoServiceProvider())
{
    byte[] nonce = new byte[16];
    random.GetBytes(nonce);
    return Convert.ToBase64String(nonce);
}

Whitelisting external scripts

strict-dynamic can be used to allow scripts hosted on a third-party domain to be loaded by scripts that you control. However, at the time of writing, this isn’t supported by all major browsers, so a host whitelist should also be used as a fallback until it has broad support.

script-src 'self' 'nonce-00deadbeef' 'unsafe-inline' 'strict-dynamic' https://example.com; object-src 'none'; base-uri 'none';

When using strict-dynamic, you will also need to add a nonce to any external scripts that are referenced.

<script nonce="00deadbeef" src="https://example.com/analytics.js"></script>

Note:

Be careful what sources you whitelist. If any endpoints return JSONP and fail to sanitize the callback, it is possible to inject code. For example:

<script src="https://example.com/getuser?callback=window.location='http://google.com';test"></script>

Might return window.location='http://google.com';test({}) in the JSONP response, which would cause arbitrary code to be executed!

There are other policies that you can define to strengthen your site’s security, such as restricting where stylesheets are loaded from. This post only focuses on mitigating cross-site scripting attacks.

Further Reading

Thanks to Bradley Grainger and Kyle Sletten for reviewing this implementation.

Posted by Dustin Masters on December 21, 2017


‘in’ will make your code slower

Problem

The new in keyword (for parameters) in C# 7.2 promises to make code faster:

When you add the in modifier to pass an argument by reference, you declare your design intent is to pass arguments by reference to avoid unnecessary copying.

However, naïve use of this modifier will result in more copies (and slower code)!

This side-effect is implied by the MSDN documentation:

You can call any instance method that uses pass-by-value semantics. In those instances, a copy of the in parameter is created.

It’s also mentioned in passing in the readonly ref proposal:

After adding support for in parameters and ref redonly [sic] returns the problem of defensive copying will get worse since readonly variables will become more common.

Consider the example method from MSDN:

private static double CalculateDistance(in Point3D point1, in Point3D point2)
{
    double xDifference = point1.X - point2.X;
    double yDifference = point1.Y - point2.Y;
    double zDifference = point1.Z - point2.Z;

    return Math.Sqrt(xDifference * xDifference + yDifference * yDifference + zDifference * zDifference);
}

And assume this implementation of Point3D:

public struct Point3D
{
    public Point3D(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    public double X { get; }
    public double Y { get; }
    public double Z { get; }
}

A number of C# features now combine in an unfortunate way:

  1. An in parameter is readonly
  2. Calling an instance method on a readonly struct makes a copy
    • Because the method might mutate this, a copy has to be made to ensure the readonly value isn’t modified
  3. Property accessors are instance methods

Every time a property on an in parameter is accessed in CalculateDistance, the compiler has to defensively create a temporary copy of the parameter. We’ve now gone from avoiding one copy per argument (at the call site) to three copies per argument (inside the method body)!

This is not a new problem; see Jon Skeet’s post on The Surprising Inefficiency of Readonly Fields. But using in makes it a much more common problem.

Solution

The solution is also in C# 7.2: readonly struct.

If we change public struct Point3D to public readonly struct Point3D (the implementation doesn’t have to change because all fields are already readonly), then the compiler knows it can elide the temporary copy inside the body of CalculateDistance. This makes the method faster than passing the structs by value.

Note that we could have achieved the same effect in C# 7.1 by passing the struct by ref. However, ref allows the called method to mutate the struct’s fields (if it’s mutable) or reassign the entire variable to a new value. Using in expresses the intent that the method will not modify the variable at all (and the compiler enforces that).

Demonstration

I’ve created a test harness that benchmarks the various combinations of in, ref, struct and readonly struct. (Note that I increased the struct size to 56 bytes to make the differences more obvious; smaller structs may not be impacted as much.) The full benchmark results are in that repo; the summary is:

Method | Mean
------ | ----
PointByValue | 25.09 ns
PointByRef | 21.77 ns
PointByIn | 34.59 ns
ReadOnlyPointByValue | 25.29 ns
ReadOnlyPointByRef | 21.78 ns
ReadOnlyPointByIn | 21.79 ns

Summary

  • If you’re using in to express design intent (instead of ref), be aware that there may be a slight performance penalty when passing large structs.
  • If you’re using in to avoid copies and improve performance, only use it with readonly struct.

Posted by Bradley Grainger on December 07, 2017