Improving WPF Text Display Performance

« Model-View-Presenter on Android with Kotlin Coroutines

Improving WPF Text Display Performance

Logos 8 is a WPF Windows desktop application. It includes fonts that are embedded in the application itself (instead of being installed in the system’s Fonts folder). When we added a Chinese Bible to the Logos library, we bundled Noto Sans CJK for display of the Chinese text. But we found that drawing a screen of text would take multiple seconds (even on a high-end machine) and allocate gigabytes of RAM.

Time usage of ReadFileFragment

Memory usage of ReadFileFragment

The time is spent in WPF functions called by DWrite. (These are also the same functions responsible for the excessive memory allocation.) DirectWrite is a hardware-accelerated text layout and rendering subsystem in Windows. It should provide high-performance text rendering, but it’s clearly not in this situation.

Like many Win32 APIs, DirectWrite is built with COM, a 25-year-old technology for creating software components. DWrite provides COM interfaces that an application can call to render text, and also defines interfaces that a client can implement to extend DWrite, e.g., by providing a custom font loader.

WPF implements a number of DWrite COM interfaces to make fonts embedded in .NET assemblies available to DWrite:

IDWriteFontCollectionLoader: creates a IDWriteFontFileEnumerator that enumerates fonts
IDWriteFontFileEnumerator: enumerates a collection of IDWriteFontFile objects
IDWriteFontFile: provides a “reference key” that identifies a font and a IDWriteFontFileLoader that can load the font
IDWriteFontFileLoader: creates a stream from a reference key
IDWriteFontFileStream: gets a font file’s size and reads chunks of the file

Our primary problem is with IDWriteFontFileStream and now, thanks to Microsoft making WPF open source, then recently adding all the code, we can see exactly where the problem lies. ReadFileFragment allocates a buffer, copies unmanaged memory into it, pins the buffer, then returns its address to DWrite (source code link).

This is almost a worst-case scenario for the .NET garbage collector: hundreds of megabytes of memory are being allocated, every buffer is being pinned so the GC can’t compact the heap, and the buffers are probably living long enough (at least across multiple native/managed interop calls) to make it out of gen 0.

Moreover, it’s completely unnecessary. IDWriteFontFileStream only needs to return a pointer to a portion of the font (no copying is necessary), which is simple if the font is already loaded in memory. And it is: embedded fonts are concatenated in the “Resources” section of the assembly, and every .NET DLL is mapped into the virtual address space of the process.

I wrote an implementation of IDWriteFontFileStream that is initialised with a pointer to the beginning of the font data. ReadFileFragment becomes simply a pointer addition and assignment: no allocation, no memcpy, extremely low overhead.

Getting a pointer to the beginning of the font data is somewhat trickier. We can parse the PE header to find the .text section that contains the CLR Header. From this we can find the offset to the embedded resources and use ResourceReader.cs as a guide for parsing the binary format. This will give us the address of each font file in memory, and enough information to construct a pack URI for each font.

We now just have to find a way to replace WPF’s inefficient FontFileStream with our optimised version. The interface-based nature of COM works to our advantage here. If we can replace the IDWriteFactory interface that WPF calls into when it calls RegisterFontCollectionLoader, we could substitute our own IDWriteFontCollectionLoader implementation, and ultimately return the efficient stream from IDWriteFontFileLoader::CreateStreamFromKey.

I first looked into replacing WPF’s reference to the IDWriteFactory object, e.g., by using reflection to change a private static field. But it was created by a static constructor and didn’t seem possible to intercept early enough in the application’s lifetime.

Instead, I found a great library for Windows API hooking: EasyHook. This let me override the DWriteCreateFactory method exported from DWrite.dll for our application’s process. We hook the API very early in startup, before WPF has called it. Then, when WPF does call it, we instantiate the real DWriteFactory but return our own custom implementation of IDWriteFactory that simply forwards most API calls to the real factory.

The two calls that aren’t forwarded directly are RegisterFontCollectionLoader and RegisterFontFileLoader. Instead, another proxy object is created (around WPF’s loaders) and the proxy is registered with the real DWrite. Finally, when DWrite calls IDWriteFontFileLoader::CreateStreamFromKey on our proxy, we examine the “reference key” that’s supplied as a font identifier. If we don’t recognise it, we forward the call to the original loader. But if it’s a pack URI that matches one created for our assembly resources (above), an optimised IDWriteFontFileStream is created and returned instead.

The results are incredible: instead of displaying one page of text every 2–3 seconds, we can now refresh the display dozens of times per second. Managed allocations have been eliminated, there are no native/managed interop calls from DWrite to the font loader, and CPU usage has been reduced by at least 100×.

Updated profile with optimised code

Faithlife isn’t using WPF on .NET Core yet but if we do, we’ll consider contributing this back to dotnet/wpf so that we don’t have to use hacks like API hooking and so that all WPF applications can benefit.

Update: We’ve opened PR 6254 to fix this problem in WPF.

(The code mentioned in this post was primarily written in 2017, extending code we first wrote in 2015 to integrate HarfBuzz with DWrite, so structuring it to be contributed to an open source project was not a concern at the time.)

Posted by Bradley Grainger on June 03, 2019