Logos 8 is a WPF Windows desktop application. It includes fonts that are embedded in the application itself (instead of being installed in the system’s Fonts folder). When we added a Chinese Bible to the Logos library, we bundled Noto Sans CJK for display of the Chinese text. But we found that drawing a screen of text would take multiple seconds (even on a high-end machine) and allocate gigabytes of RAM.
The time is spent in WPF functions called by DirectWrite (DWrite); these same functions are responsible for the excessive memory allocation. DirectWrite is a hardware-accelerated text layout and rendering subsystem in Windows. It should provide high-performance text rendering, but it clearly doesn’t in this situation.
Like many Win32 APIs, DirectWrite is built with COM, a 25-year-old technology for creating software components. DWrite provides COM interfaces that an application can call to render text, and also defines interfaces that a client can implement to extend DWrite, e.g., by providing a custom font loader.
WPF implements a number of DWrite COM interfaces to make fonts embedded in .NET assemblies available to DWrite:
- IDWriteFontCollectionLoader: creates an IDWriteFontFileEnumerator that enumerates fonts
- IDWriteFontFileEnumerator: enumerates a collection of IDWriteFontFile objects
- IDWriteFontFile: provides a “reference key” that identifies a font and an IDWriteFontFileLoader that can load the font
- IDWriteFontFileLoader: creates a stream from a reference key
- IDWriteFontFileStream: gets a font file’s size and reads chunks of the file

Our primary problem is with IDWriteFontFileStream, and now, thanks to Microsoft making WPF open source, then recently adding all the code, we can see exactly where the problem lies. ReadFileFragment allocates a buffer, copies unmanaged memory into it, pins the buffer, then returns its address to DWrite (source code link).
This is almost a worst-case scenario for the .NET garbage collector: hundreds of megabytes of memory are being allocated, every buffer is being pinned so the GC can’t compact the heap, and the buffers are probably living long enough (at least across multiple native/managed interop calls) to make it out of gen 0.
Moreover, it’s completely unnecessary. IDWriteFontFileStream only needs to return a pointer to a portion of the font (no copying is necessary), which is simple if the font is already loaded in memory. And it is: embedded fonts are concatenated in the “Resources” section of the assembly, and every .NET DLL is mapped into the virtual address space of the process.
I wrote an implementation of IDWriteFontFileStream that is initialised with a pointer to the beginning of the font data. ReadFileFragment becomes simply a pointer addition and assignment: no allocation, no memcpy, extremely low overhead.
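To make the "pointer addition and assignment" concrete, here is a minimal sketch in portable C++. It is not the code described in this post: `InMemoryFontStream` is a hypothetical stand-in that only mirrors the parameter shape of `IDWriteFontFileStream::ReadFileFragment`, and it returns `bool` where the real COM interface (declared in `dwrite.h`) returns an `HRESULT`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an IDWriteFontFileStream implementation.
// The real interface is a COM interface; this only models its shape.
class InMemoryFontStream {
public:
    // `data` must stay valid for the lifetime of the stream, e.g. because it
    // points into an assembly that is already mapped into the process.
    InMemoryFontStream(const std::uint8_t* data, std::uint64_t size)
        : data_(data), size_(size) {
        assert(data != nullptr || size == 0);
    }

    // Hands back a pointer into the existing font data: no allocation, no copy.
    bool ReadFileFragment(const void** fragmentStart, std::uint64_t fileOffset,
                          std::uint64_t fragmentSize, void** fragmentContext) const {
        // Overflow-safe check that [fileOffset, fileOffset + fragmentSize) is in range.
        if (fileOffset > size_ || fragmentSize > size_ - fileOffset) {
            *fragmentStart = nullptr;
            *fragmentContext = nullptr;
            return false;
        }
        *fragmentStart = data_ + fileOffset;
        *fragmentContext = nullptr;  // nothing to release later
        return true;
    }

    // Nothing was allocated per fragment, so there is nothing to free.
    void ReleaseFileFragment(void* /*fragmentContext*/) const {}

    std::uint64_t GetFileSize() const { return size_; }

private:
    const std::uint8_t* data_;
    std::uint64_t size_;
};
```

Because the stream never owns a buffer, there is nothing to pin and nothing for a garbage collector to track; the only requirement is that the underlying font bytes outlive the stream.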
Getting a pointer to the beginning of the font data is somewhat trickier. We can parse the PE header to find the .text section that contains the CLR Header. From this we can find the offset to the embedded resources and use ResourceReader.cs as a guide for parsing the binary format. This will give us the address of each font file in memory, and enough information to construct a pack URI for each font.
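As an illustration of the first step only (locating a section from the PE headers), the following portable C++ sketch walks the section table of an image held in memory. The names `findSection` and `SectionInfo` are hypothetical; the field offsets follow the published PE/COFF layout. Locating the CLR header and parsing the .resources format are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct SectionInfo {
    std::uint32_t virtualSize;     // size of the section once mapped
    std::uint32_t virtualAddress;  // RVA: offset from the image base in memory
};

// PE fields are little-endian; read them byte by byte to stay portable.
static std::uint16_t readU16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t readU32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Looks up a section (e.g. ".text") by name. Returns false if the headers are
// malformed or the section is absent.
bool findSection(const std::uint8_t* image, std::size_t size,
                 const char* name, SectionInfo* out) {
    assert(image != nullptr && name != nullptr && out != nullptr);
    const std::size_t nameLen = std::strlen(name);
    if (nameLen > 8) return false;
    char padded[8] = {};  // section names are 8 bytes, NUL-padded
    std::memcpy(padded, name, nameLen);

    // DOS header: "MZ" magic, then e_lfanew at offset 0x3C points to the PE header.
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z') return false;
    const std::size_t pe = readU32(image + 0x3C);
    if (pe > size || size - pe < 24) return false;
    if (std::memcmp(image + pe, "PE\0\0", 4) != 0) return false;

    // COFF header follows the 4-byte signature.
    const std::size_t sectionCount = readU16(image + pe + 6);
    const std::size_t optionalHeaderSize = readU16(image + pe + 20);
    const std::size_t table = pe + 24 + optionalHeaderSize;

    for (std::size_t i = 0; i < sectionCount; ++i) {
        const std::size_t off = table + i * 40;  // each section header is 40 bytes
        if (off > size || size - off < 40) return false;
        if (std::memcmp(image + off, padded, 8) == 0) {
            out->virtualSize = readU32(image + off + 8);
            out->virtualAddress = readU32(image + off + 12);
            return true;
        }
    }
    return false;
}
```

For a module that is already mapped into the process, the section's data starts at the module base address plus `virtualAddress`, which is why no file I/O is needed to reach the embedded bytes.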
We now just have to find a way to replace WPF’s inefficient FontFileStream with our optimised version. The interface-based nature of COM works to our advantage here. If we can replace the IDWriteFactory interface that WPF calls into when it calls RegisterFontCollectionLoader, we could substitute our own IDWriteFontCollectionLoader implementation, and ultimately return the efficient stream from IDWriteFontFileLoader::CreateStreamFromKey.
I first looked into replacing WPF’s reference to the IDWriteFactory object, e.g., by using reflection to change a private static field. But it was created by a static constructor, and it didn’t seem possible to intercept it early enough in the application’s lifetime.
Instead, I found a great library for Windows API hooking: EasyHook. This let me override the DWriteCreateFactory method exported from DWrite.dll for our application’s process. We hook the API very early in startup, before WPF has called it. Then, when WPF does call it, we instantiate the real DWriteFactory but return our own custom implementation of IDWriteFactory that simply forwards most API calls to the real factory.
The two calls that aren’t forwarded directly are RegisterFontCollectionLoader and RegisterFontFileLoader. Instead, another proxy object is created (around WPF’s loaders) and the proxy is registered with the real DWrite. Finally, when DWrite calls IDWriteFontFileLoader::CreateStreamFromKey on our proxy, we examine the “reference key” that’s supplied as a font identifier. If we don’t recognise it, we forward the call to the original loader. But if it’s a pack URI that matches one created for our assembly resources (above), an optimised IDWriteFontFileStream is created and returned instead.
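The dispatch on the reference key can be sketched as a small proxy in portable C++. Everything here is a simplified assumption rather than the real code: the type names are hypothetical, the key is modelled as a `std::string` (DWrite actually passes an opaque byte buffer and its length), and `CopyingLoader` merely stands in for WPF’s original loader.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins for the DWrite stream and loader interfaces.
struct FontStream {
    virtual ~FontStream() {}
    virtual bool IsZeroCopy() const = 0;
};

struct FontFileLoader {
    virtual ~FontFileLoader() {}
    virtual std::unique_ptr<FontStream> CreateStreamFromKey(const std::string& key) = 0;
};

// Address and length of a font already present in the process's memory.
struct EmbeddedFont {
    const std::uint8_t* data;
    std::uint64_t size;
};

// Optimised stream: serves fragments straight from the mapped bytes.
class MappedStream : public FontStream {
public:
    explicit MappedStream(EmbeddedFont font) : font_(font) {
        assert(font.data != nullptr);
    }
    bool IsZeroCopy() const override { return true; }

private:
    EmbeddedFont font_;
};

// Stand-in for the original loader, whose streams copy into new buffers.
struct CopyingStream : FontStream {
    bool IsZeroCopy() const override { return false; }
};

struct CopyingLoader : FontFileLoader {
    std::unique_ptr<FontStream> CreateStreamFromKey(const std::string&) override {
        return std::unique_ptr<FontStream>(new CopyingStream());
    }
};

// Proxy registered in place of the original loader.
class ProxyLoader : public FontFileLoader {
public:
    ProxyLoader(FontFileLoader& original, std::map<std::string, EmbeddedFont> known)
        : original_(original), known_(std::move(known)) {}

    std::unique_ptr<FontStream> CreateStreamFromKey(const std::string& key) override {
        std::map<std::string, EmbeddedFont>::const_iterator it = known_.find(key);
        if (it == known_.end()) {
            // Unrecognised key: defer to the original loader unchanged.
            return original_.CreateStreamFromKey(key);
        }
        // Recognised key: return the optimised stream instead.
        return std::unique_ptr<FontStream>(new MappedStream(it->second));
    }

private:
    FontFileLoader& original_;
    std::map<std::string, EmbeddedFont> known_;
};
```

The useful property of this shape is that the proxy only changes behaviour for keys it has explicitly registered, so fonts it knows nothing about keep loading exactly as before.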
The results are incredible: instead of displaying one page of text every 2–3 seconds, we can now refresh the display dozens of times per second. Managed allocations have been eliminated, there are no native/managed interop calls from DWrite to the font loader, and CPU usage has been reduced by at least 100×.
Faithlife isn’t using WPF on .NET Core yet but if we do, we’ll consider contributing this back to dotnet/wpf so that we don’t have to use hacks like API hooking and so that all WPF applications can benefit.
Update: We’ve opened PR 6254 to fix this problem in WPF.
(The code mentioned in this post was primarily written in 2017, extending code we first wrote in 2015 to integrate HarfBuzz with DWrite, so structuring it to be contributed to an open source project was not a concern at the time.)
Posted by Bradley Grainger on June 03, 2019