Local Functions and Allocations

C# 7 introduces local functions. By default, the compiler generates very efficient code for local functions, even if a parameter of the enclosing method is captured.

public int Function(int value)
{
    // Imagine this is a complicated implementation.
    // Note: this captures 'value' from the outer function.
    int Impl() => value + value;

    // perform argument validation (and early-out) here, then delegate to implementation
    return value == 0 ? 0 : Impl();
}

The decompiled C# compiler output is:

[CompilerGenerated]
[StructLayout(LayoutKind.Auto)]
private struct <>c__DisplayClass0_0
{
    public int value;
}

public int Function(int value)
{
    LocalFunctions.<>c__DisplayClass0_0 <>c__DisplayClass0_;
    <>c__DisplayClass0_.value = value;
    if (<>c__DisplayClass0_.value != 0)
        return LocalFunctions.<Function>g__Impl0_0(ref <>c__DisplayClass0_);
    return 0;
}

[CompilerGenerated]
internal static int <Function>g__Impl0_0(ref LocalFunctions.<>c__DisplayClass0_0 ptr)
{
    return ptr.value + ptr.value;
}

The struct lives on the stack, so it incurs no heap allocations, and the JIT compiler can inline the local function, so it’s extremely efficient at runtime.

To hide the implementation in C# 6, you would have had to use a local Func<int> delegate:

public int Function(int value)
{
    Func<int> Impl = () => value + value;

    return value == 0 ? 0 : Impl();
}

This would allocate an instance of a compiler-generated class and an instance of Func<int> whether or not Impl was actually invoked. The C# 7 code is much better. (Read more about local functions versus lambdas.)

However, there is an edge case in C# 7. If your local function captures a variable and must itself be implemented with a compiler-generated class (two examples I’ve found are iterator methods and async methods), then the captured variables are hoisted into a class instead of a struct. That class will always be allocated when the outer function is invoked, whether the local function is called or not.

public IEnumerable<int> Function(int value)
{
    IEnumerable<int> Impl()
    {
        yield return value;
    }

    return value == 0 ? Enumerable.Empty<int>() : Impl();
}

The decompiled C# compiler output:

[CompilerGenerated]
private sealed class <>c__DisplayClass0_0
{
    // iterator state machine elided
}

public IEnumerable<int> Function(int value)
{
    LocalFunctions.<>c__DisplayClass0_0 <>c__DisplayClass0_ = new LocalFunctions.<>c__DisplayClass0_0();
    <>c__DisplayClass0_.value = value;
    if (<>c__DisplayClass0_.value != 0)
    {
        return <>c__DisplayClass0_.<Function>g__Impl0();
    }
    return Enumerable.Empty<int>();
}

If the “early out” condition is taken more than 90% of the time, and if this function is called frequently, you may wish to avoid the overhead of the unnecessary allocations of the compiler-generated class. Fortunately, there is a workaround: instead of implicitly capturing the outer function’s parameter, pass it to the local function as an explicit parameter (under a different name, because a C# 7 local function parameter cannot reuse the name of an outer parameter):

public IEnumerable<int> Function(int value)
{
    IEnumerable<int> Impl(int value_)
    {
        yield return value_;
    }

    return value == 0 ? Enumerable.Empty<int>() : Impl(value);
}

The decompiled C# compiler output:

[CompilerGenerated]
private sealed class <<Function>g__Impl0_0>d : IEnumerable<int>, IEnumerable, IEnumerator<int>, IDisposable, IEnumerator
{
    // iterator state machine elided
}

public IEnumerable<int> Function(int value)
{
    if (value != 0)
        return LocalFunctions.<Function>g__Impl0_0(value);
    return Enumerable.Empty<int>();
}

[CompilerGenerated, IteratorStateMachine(typeof(LocalFunctions.<<Function>g__Impl0_0>d))]
internal static IEnumerable<int> <Function>g__Impl0_0(int value_)
{
    LocalFunctions.<<Function>g__Impl0_0>d expr_07 = new LocalFunctions.<<Function>g__Impl0_0>d(-2);
    expr_07.<>3__value_ = value_;
    return expr_07;
}

Now the “happy path” has zero allocations and the compiler-generated class is only instantiated if the local function is actually called.
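If you want a quick sanity check before opening a decompiler, the two variants above can be compared with a rough allocation counter. This is only a sketch, not a proper benchmark: the names AllocationCheck and MeasureEarlyOut are made up for illustration, and it assumes a runtime that provides GC.GetAllocatedBytesForCurrentThread. The exact byte counts depend on the runtime and build configuration, so treat them as indicative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical harness for illustration; the names are not from any library.
public static class AllocationCheck
{
    // Captures 'value' implicitly, as in the first iterator example.
    public static IEnumerable<int> Capturing(int value)
    {
        IEnumerable<int> Impl()
        {
            yield return value;
        }

        return value == 0 ? Enumerable.Empty<int>() : Impl();
    }

    // Passes 'value' as an explicit parameter, as in the workaround.
    public static IEnumerable<int> Aliased(int value)
    {
        IEnumerable<int> Impl(int value_)
        {
            yield return value_;
        }

        return value == 0 ? Enumerable.Empty<int>() : Impl(value);
    }

    // Returns the bytes allocated on the current thread by 'iterations'
    // calls that all take the early-out path (value == 0).
    public static long MeasureEarlyOut(Func<int, IEnumerable<int>> function, int iterations)
    {
        function(0); // warm up so JIT and one-time initialization are not measured

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
            function(0);
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        Console.WriteLine($"capturing: {MeasureEarlyOut(Capturing, 100_000)} bytes");
        Console.WriteLine($"aliased:   {MeasureEarlyOut(Aliased, 100_000)} bytes");
    }
}
```

I would expect the capturing variant to report noticeably more allocated bytes than the aliased one, but the decompiled output remains the authoritative check.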

I wouldn’t recommend doing this by default, but if profiling indicates that extra allocations are a problem, you may want to perform this refactoring to avoid them. I strongly recommend checking the compiled IL to ensure that you’ve solved it. If your outer function has many parameters or locals and (due to a typo) you inadvertently capture just one of them, then the optimization won’t kick in.
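If you can use C# 8.0 or later (newer than the C# 7 compiler discussed in this post), one way to guard against an accidental capture is to mark the local function static, which turns any capture of a local, a parameter, or this into a compile-time error. A minimal sketch, with an illustrative class name:

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative class name; requires C# 8.0 or later for 'static' local functions.
public static class StaticLocalFunctionExample
{
    public static IEnumerable<int> Function(int value)
    {
        // 'static' forbids capturing locals, parameters, or 'this'
        // from the enclosing method.
        static IEnumerable<int> Impl(int value_)
        {
            yield return value_;
            // yield return value; // would not compile: 'value' cannot be captured here
        }

        return value == 0 ? Enumerable.Empty<int>() : Impl(value);
    }
}
```

This doesn’t replace checking the compiled output, but it does catch the typo case at build time.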

And note that in the typical “argument validation” use case for local functions, there’s no point in doing this because the outer function should call the local function 100% of the time. (The only reason it wouldn’t is if a boneheaded exception is thrown because the function was called incorrectly.) This refactoring is only useful if the local function ends up being called extremely infrequently.

I encountered a “real world” example of this issue in MySqlConnector. When I refactored the code to use local functions, I noticed an increase in allocations (and a decrease in performance). The solution was to avoid implicitly capturing local variables in the local functions.

In that specific scenario, the MySQL client library was reading bytes from a TCP socket. Most of the time, the bytes have already arrived and are in an OS buffer. Thus, we can immediately (and synchronously) return a ValueTask<int> containing the number of bytes copied into the caller’s buffer. Infrequently, we have to asynchronously wait on network I/O. Only then do we want the overhead of allocating the compiler-generated async state machine in order to return a (wrapped) Task<int> to the caller.
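The shape of that pattern can be sketched as follows. This is not the actual MySqlConnector code: BufferedByteReader, its fields, and its methods are hypothetical names invented for illustration, and it assumes a target framework where ValueTask<int> is available. The point is only that the async local function receives everything it needs as parameters, so nothing is captured on the synchronous fast path.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical reader for illustration; not MySqlConnector's implementation.
public sealed class BufferedByteReader
{
    private readonly Stream m_stream;
    private readonly byte[] m_buffer = new byte[4096];
    private int m_offset;
    private int m_count;

    public BufferedByteReader(Stream stream) => m_stream = stream;

    public ValueTask<int> ReadBytesAsync(byte[] destination, int offset, int count)
    {
        // Fast path: bytes are already buffered, so complete synchronously.
        if (m_count > 0)
            return new ValueTask<int>(CopyBuffered(destination, offset, count));

        // Slow path: only now is the async local function invoked.
        return new ValueTask<int>(FillAndCopyAsync(this, destination, offset, count));

        // Everything is passed explicitly; nothing is captured from the outer method.
        async Task<int> FillAndCopyAsync(BufferedByteReader reader, byte[] destination_, int offset_, int count_)
        {
            reader.m_offset = 0;
            reader.m_count = await reader.m_stream.ReadAsync(reader.m_buffer, 0, reader.m_buffer.Length).ConfigureAwait(false);
            return reader.CopyBuffered(destination_, offset_, count_);
        }
    }

    // Copies up to 'count' buffered bytes into 'destination' and returns the number copied.
    private int CopyBuffered(byte[] destination, int offset, int count)
    {
        int bytesToCopy = Math.Min(count, m_count);
        Buffer.BlockCopy(m_buffer, m_offset, destination, offset, bytesToCopy);
        m_offset += bytesToCopy;
        m_count -= bytesToCopy;
        return bytesToCopy;
    }
}
```

A caller simply awaits reader.ReadBytesAsync(buffer, 0, buffer.Length); when data is already buffered, the returned ValueTask<int> is already complete and no Task<int> is created.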

Posted by Bradley Grainger on August 02, 2017