Intro ¶
Having gained some recent experience with F#, I realised that some functional programming principles can also be applied successfully in C#. One could go all in and adopt a library like language-ext. However, I would like to focus on a way of writing code rather than on any library. The idea I would like to introduce in this article is pure functions.
Pure has no side effects ¶
In a non-functional mindset there is a common misconception that side effects are something unwanted in good code. I guess it comes from the medical definition of side effects[1], where any effect other than healing is indeed undesirable. In the functional mindset, however, we define a side effect as a change to any external state, where external means not exclusively owned by the function. Here are a few examples of side effects:
- saving data to db
- throwing an exception
- raising an event
- posting data to an API
- logging
- setting input argument’s property/field
- setting parent object’s property/field
Knowing that pure functions must not have side effects, the list above might suggest they are pretty much useless. The reality is slightly different, though.
Pure is deterministic ¶
A pure function must be deterministic, i.e. for a given set of input parameters it must always return the same output. This is how most basic mathematical functions work, e.g. $f(x) = x + 1$ or $f(x) = \sin(x)$. It follows that functions with a random component are not pure. It also implies that pure functions may, in theory, be treated as lookup tables[2].
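To make the distinction concrete, here is a minimal sketch. The names Determinism, AddOne, AddNoise and SecondsSinceMidnight are made up for illustration. The first function depends only on its argument, while the other two read hidden inputs: a random generator and the system clock.

```csharp
using System;

public static class Determinism
{
    // Hypothetical shared generator: hidden state the impure function reads and advances.
    private static readonly Random Rng = new Random();

    // Pure: the result depends only on x.
    public static int AddOne(int x) => x + 1;

    // Impure: the same x can give a different result on every call.
    public static int AddNoise(int x) => x + Rng.Next(0, 10);

    // Impure: depends on the system clock, an input that is not in the signature.
    public static double SecondsSinceMidnight() => DateTime.Now.TimeOfDay.TotalSeconds;
}
```

Calling AddOne(41) gives 42 every time, whereas two calls to AddNoise(41) may well disagree.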
Consequences of purity ¶
Immutable input ¶
Input arguments of a pure function must be immutable. Why is that? Let’s have a look at a (somewhat contrived) counterexample.
An impure function ¶
// input class
class MutableInput
{
    // public, so that code outside the class can read and write it
    public int MagicNumber { get; set; }
}
int Impure(MutableInput input)
{
    input.MagicNumber++; // side effect: mutates the caller's object
    return input.MagicNumber - 42;
}
var input = new MutableInput { MagicNumber = 42 };
// input.MagicNumber is 42
var output = Impure(input);
// output is 1. However, the function has also affected an external state
// input.MagicNumber is 43 now
As you can see above, mutable input can lead to side effects, which render the function impure. However, we can easily prevent this by making the input immutable. A fix could look like this:
Making class immutable ¶
public class NoLongerMutableInput
{
    public int MagicNumber { get; }
    public NoLongerMutableInput(int magicNumber)
    {
        MagicNumber = magicNumber;
    }
}
Now the compiler doesn’t let us increment MagicNumber. Another fix could look like this:
Making the function pure ¶
int Pure(int magicNumber)
{
    magicNumber++; // no side effect here: only the local copy changes
    return magicNumber - 42;
}
var input = 42;
var output = Pure(input);
// output is 1 and the input is still 42
Changing the input to int means the function can only modify a copy of the input. No external state gets changed in either of the fixes. C# comes with a whole bunch of immutable types: strings, simple value types like int or decimal, enums, readonly structs, and others. You can also create immutable classes[3] by making sure all non-private properties are get-only, all fields are readonly, and both properties and fields are of immutable types themselves. Sometimes this is not trivial.
What about the output? Once a function returns, the output becomes available to the outer scope, but the function itself has already lost its ability to affect the output value. So the output does not need to be immutable, although if it is, it can be fed into another pure function with no additional work.
What about no output (void in C#)?
Pure function with no output ¶
void Useless(int magicNumber)
{
    var moreMagic = magicNumber + 1; // computed and then discarded
}
Yes, it is possible to declare one. However, from a logical point of view this function doesn’t do anything. That’s why, if we see a function with a void return and immutable input in real code, we can be fairly sure it is impure and exists only to cause side effects (in most cases; otherwise we have discovered redundant code and can delete it).
Pure all the way down ¶
Pure functions can call other functions only if they are pure too. Calling an impure function renders an otherwise pure function impure.
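A small sketch of how impurity spreads up the call chain. The Pricing class and its methods are hypothetical names chosen for this example. FinalPrice stays pure because it only calls another pure function, while FinalPriceWithLogging becomes impure the moment it writes to the console.

```csharp
using System;

public static class Pricing
{
    // Pure: arithmetic on its arguments only.
    public static decimal ApplyDiscount(decimal price, decimal discountRate) =>
        price * (1m - discountRate);

    // Pure: it only calls another pure function.
    public static decimal FinalPrice(decimal price, decimal discountRate, decimal taxRate) =>
        ApplyDiscount(price, discountRate) * (1m + taxRate);

    // Impure: writing to the console is a side effect, so the whole function
    // is impure even though the calculation itself has not changed.
    public static decimal FinalPriceWithLogging(decimal price, decimal discountRate, decimal taxRate)
    {
        var result = FinalPrice(price, discountRate, taxRate);
        Console.WriteLine($"Final price: {result}");
        return result;
    }
}
```

Any caller of FinalPriceWithLogging inherits the console write, which is why purity has to hold all the way down.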
Pure is stateless ¶
Pure functions are stateless[4] because any state would be external to them, and any change to it would violate the no-side-effects principle. This also suggests that functions using yield or async/await are impure, as the compiler rewrites them into state machines that carry state between resumptions.
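As an illustration, here is a sketch with a hypothetical Counters class. The parameterless NextId keeps its state in a static field, so its result depends on the call history. The overload that takes the previous value receives the state as input and returns the new state as output, which keeps it pure.

```csharp
public static class Counters
{
    // Hypothetical state that lives outside the function.
    private static int _calls;

    // Impure: every call changes _calls, and the result depends on earlier calls.
    public static int NextId() => ++_calls;

    // Pure: the state is passed in and the new state is returned; nothing is stored.
    public static int NextId(int previousId) => previousId + 1;
}
```

The pure overload pushes the responsibility for holding the state onto the caller, which is exactly where we want it.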
Pure doesn’t throw exceptions ¶
Yes, throwing exceptions is a side effect. Pure functions should not use exceptions to indicate an invalid result. Instead, they can extend the output type to include invalid values[5]. An alternative approach is to limit the input to the set of valid values, making the invalid ones unrepresentable.
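The first approach could look like the sketch below. The AgeParser class and the 0 to 150 range are assumptions made for the example. ParseOrThrow reports invalid input through exceptions, while Parse extends the output type from int to int? and uses null to mean "not a valid age".

```csharp
using System;
using System.Globalization;

public static class AgeParser
{
    // Reports invalid input by throwing: the signature hides that possibility.
    public static int ParseOrThrow(string text)
    {
        var age = int.Parse(text, CultureInfo.InvariantCulture);
        if (age < 0 || age > 150) throw new ArgumentOutOfRangeException(nameof(text));
        return age;
    }

    // Extended output type: null stands for "no valid result", nothing is thrown.
    // The invariant culture keeps the result independent of machine settings.
    public static int? Parse(string text) =>
        (int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out var age)
            && age >= 0 && age <= 150)
            ? age
            : (int?)null;
}
```

With the second version the caller is forced by the type to consider the invalid case before using the value.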
Pure benefits ¶
Pure functions beat impure ones in a number of scenarios.
Parallelism ¶
Having no state also implies there is no shared state. This means we can easily run a pure function in parallel without worrying about the intricacies of multi-threaded programming (like thread-safe read/write access to shared state). Given an environment capable of running, let’s say, 8 “copies” of our function at a time, we could split the input into 8 parts and theoretically[6] get our results 8 times faster than in the serial flow.
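One way to try this in C# is PLINQ. The sketch below assumes a hypothetical, deliberately slow pure function SumOfDivisors. Because it touches no shared state, switching from the serial to the parallel flow is a matter of adding AsParallel, with AsOrdered keeping the results in input order.

```csharp
using System.Linq;

public static class ParallelDemo
{
    // Pure: no shared state, so concurrent calls cannot interfere with each other.
    public static long SumOfDivisors(int n) =>
        n <= 0 ? 0L : Enumerable.Range(1, n).Where(d => n % d == 0).Sum(d => (long)d);

    // Serial flow.
    public static long[] RunSerial(int[] inputs) =>
        inputs.Select(n => SumOfDivisors(n)).ToArray();

    // Parallel flow; AsOrdered preserves the order of the inputs in the output.
    public static long[] RunParallel(int[] inputs) =>
        inputs.AsParallel().AsOrdered().Select(n => SumOfDivisors(n)).ToArray();
}
```

Both methods return the same array for the same input; whether the parallel one is actually faster depends on the workload and should be measured.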
Testability ¶
Pure functions usually have no dependencies. No dependencies = no mocking = less maintenance = profit. We can also parallelise our tests more easily.
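A sketch of what such a test might look like without any test framework. Shipping.Cost, the 50 threshold and the 4.99 fee are all invented for the example, and Expect is a tiny hypothetical helper standing in for a real assertion library.

```csharp
using System;

public static class Shipping
{
    // Pure function under test: free shipping at or above a threshold.
    public static decimal Cost(decimal orderTotal) =>
        orderTotal >= 50m ? 0m : 4.99m;
}

public static class ShippingTests
{
    // No mocks, no setup, no teardown: call the function and compare.
    public static void FreeShippingStartsAtThreshold()
    {
        Expect(Shipping.Cost(49.99m) == 4.99m, "below threshold");
        Expect(Shipping.Cost(50m) == 0m, "at threshold");
    }

    // Hypothetical assertion helper.
    private static void Expect(bool condition, string name)
    {
        if (!condition) throw new InvalidOperationException("Failed: " + name);
    }
}
```

The whole test is input and expected output, so there is nothing to stub and nothing to clean up afterwards.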
Comprehension ¶
Pure functions can be complex too. However, without side effects they’re easier to follow as we can focus on the “here and now” of the function. Also we can be sure the function changes no external state in the middle of execution.
Pure issues ¶
Pure functions have downsides too. The most common and problematic one is the cost of immutable input. Immutable data is often copied when it becomes a function argument (compare with passing arguments by value), causing excessive memory allocation. C# 7 introduced several improvements to address the issue (in version 7.2). The most important are readonly structs, the in parameter modifier, and ref readonly locals and returns. They are a must in high-performance scenarios.
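Here is a brief sketch of two of these features working together, assuming C# 7.2 or later. Point3 and VectorMath are hypothetical names. The readonly struct guarantees immutability, and the in modifier passes it by read-only reference instead of copying it.

```csharp
// readonly struct: the compiler rejects any attempt to modify it after construction.
public readonly struct Point3
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }

    public Point3(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }
}

public static class VectorMath
{
    // 'in' passes each struct by read-only reference, so the call does not copy it.
    public static double Dot(in Point3 a, in Point3 b) =>
        a.X * b.X + a.Y * b.Y + a.Z * b.Z;
}
```

For a struct this small the saving is negligible; the combination starts to matter for larger structs in hot paths.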
Applied pure ¶
There are two common scenarios where pure functions may be useful.
Calculations ¶
These are simply mathematical equations, mostly used in science and science-heavy industries like finance, marketing, and the like.
Business rules ¶
They’re similar to maths formulas and it’s really hard to imagine a business without them. Unlike typical maths formulas business rules usually operate on non-numerical types and are governed by boolean logic.
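For instance, a delivery rule could be written as the pure function sketched below. The rule itself, the CustomerTier enum and the DeliveryRules class are invented for illustration. It takes non-numerical input and combines it with boolean logic.

```csharp
public enum CustomerTier { Standard, Gold }

public static class DeliveryRules
{
    // Pure: the decision depends only on the arguments.
    // Hypothetical rule: no free delivery for hazardous goods; otherwise
    // Gold customers always qualify, and others qualify from 3 items up.
    public static bool IsEligibleForFreeDelivery(
        CustomerTier tier, int itemCount, bool containsHazardousGoods) =>
        !containsHazardousGoods && (tier == CustomerTier.Gold || itemCount >= 3);
}
```

A Gold customer with a single safe item qualifies, while a Standard customer with two items does not.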
What to do with side effects? ¶
This is a sane question. If we’re not supposed to have side effects in pure functions, how can we make our program do anything? Well, the truth is we can never get rid of them. What we can do, however, is push any effects out of the business logic towards the boundaries of the program. You can think of it as splitting the code into two separate concerns: one acts, and it depends on the other, which decides what to do. The deciding part is complex and pure, whereas the acting part is the opposite. This way we can get the best of both worlds.
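A minimal sketch of this split, with every name (Reminder, Decisions, Shell) made up for the example. The deciding part returns plain data describing what should happen; the acting part reads the clock and performs the effects through a delegate supplied by the caller.

```csharp
using System;
using System.Collections.Generic;

// A decision expressed as plain immutable data (hypothetical type).
public sealed class Reminder
{
    public string Email { get; }
    public string Message { get; }

    public Reminder(string email, string message)
    {
        Email = email;
        Message = message;
    }
}

public static class Decisions
{
    // Pure: decides WHAT should happen and performs nothing.
    // The current date is an argument, so the result is deterministic.
    public static IReadOnlyList<Reminder> RemindersFor(
        IReadOnlyList<(string Email, DateTime DueDate)> invoices, DateTime today)
    {
        var reminders = new List<Reminder>(); // local, never shared
        foreach (var invoice in invoices)
        {
            if (invoice.DueDate < today)
                reminders.Add(new Reminder(invoice.Email, "Your invoice is overdue."));
        }
        return reminders;
    }
}

public static class Shell
{
    // Impure boundary: reads the clock and carries out the effects.
    public static void Run(
        IReadOnlyList<(string Email, DateTime DueDate)> invoices, Action<Reminder> send)
    {
        foreach (var reminder in Decisions.RemindersFor(invoices, DateTime.Today))
            send(reminder);
    }
}
```

All the logic worth testing sits in Decisions.RemindersFor, while Shell.Run is thin enough to verify by inspection.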
Wrapping up ¶
We have gone through the most important aspects of pure functions. As you’ve seen they can be easily used in C# and they come with a whole lot of benefits. I hope I’ve managed to give you a clear impression of what they are and when to use them.
[2] In theory, pure functions could be coded as lookup tables with inputs as keys and outputs as values. This would obviously require infinite memory. In practical applications, though, we would hardly ever use the whole image (the set of all possible outputs) of a function. If we add lazy evaluation into the mix and cache a key/value pair only once it has been calculated, we end up with a technique called memoization.
[3] Speaking of classes, I mean classes containing data only (unlike in the object-oriented world, where classes usually encapsulate data + behaviour). This implies that classes used as input to pure functions shouldn’t have any impure methods. Keep in mind we’re leaving the typical OOP principles here in favour of functional programming. That’s why you may (and you will) encounter many principles that contradict the OOP ones.
[4] This is funny because memoized versions of pure functions do have state. It’s the cache holding previous results so they can be retrieved when the function is called with the same parameters again. So technically there is indeed shared state between function executions. Also note that caching itself is a side effect. So although memoization can only be applied to pure functions, memoized functions are no longer pure (although still deterministic).
[5] For example, System.Double has NaN to indicate that the result of an operation is not a valid number.
[6] I said theoretically because parallelism usually comes with a certain initial performance cost. The decision to make things parallel or not should always be made after measuring performance.