How we used Refined to improve type safety and error reporting in Scala
Introduction
A significant part of one’s journey with Scala is figuring out how to make its sophisticated type system work for you, not against you. Traits, polymorphism, type bounds, variance, typeclasses: all of these serve a single purpose, letting developers encode more information into their code so as to shift the responsibility for correctness from the developer to the compiler.
Encoding more information into your types brings you closer and closer to the famous sentence you will often hear in Haskell circles: “If it compiles, it works”. Type-Oriented Programming is the generalization of this idea: provided you put enough care into defining your types, writing logic becomes a matter of writing type signatures, such as String => Option[Int]. You then only have to “fill in the blanks” by writing the actual logic that satisfies those type signatures.
As well as taking some cognitive load off the developers, it also improves your tests: they no longer need to verify the coherence of your output, only the actual business logic. Your code and your tests no longer need to be as defensive, which makes them more terse and expressive.
Today, we are going to talk about a way to refine our types to make them more expressive and increase confidence in our code, using the aptly named refined library.
Quick tour
Here is a quick overview of the library:
First, we define a positive integer. Validation passes, and we get back our 10 with a little added bonus: its type now reflects the fact that it is positive.
Then, we try refining a string to a non-empty one. We get a compilation error, because the string is actually empty.
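The two snippets look roughly like this (a minimal sketch, assuming refined’s auto module for compile-time refinement of literals):

```scala
import eu.timepit.refined.api.Refined
import eu.timepit.refined.auto._
import eu.timepit.refined.collection.NonEmpty
import eu.timepit.refined.numeric.Positive

// The literal 10 is checked at compile time; its type now records the predicate.
val ten: Int Refined Positive = 10

// This, on the other hand, does not compile: NonEmpty fails for the empty string.
// val oops: String Refined NonEmpty = ""
```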
At this point you’re probably thinking “Okay that’s cool, but that’s only useful for hardcoded values, which is to say it is useless”. And you would be right. But this is where it becomes interesting:
Using the refineV function, we can refine arbitrary values and get an Either back:
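A sketch of what this looks like (parsePositive is a hypothetical helper written for this example, not part of the library):

```scala
import eu.timepit.refined.api.Refined
import eu.timepit.refined.numeric.Positive
import eu.timepit.refined.refineV

// Refinement of a runtime value: the predicate check happens at runtime,
// and failure is reported as a Left containing an error message.
def parsePositive(i: Int): Either[String, Int Refined Positive] =
  refineV[Positive](i)

parsePositive(10) // Right(10)
parsePositive(-5) // Left(...), with a message describing the failed predicate
```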
To make our code even more terse, we can reach for ad-hoc methods provided by the library:
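For instance, the predefined refined types shipped with the library come with companion helpers such as from (a sketch; the exact set of predefined types depends on the refined version):

```scala
import eu.timepit.refined.types.numeric.PosInt
import eu.timepit.refined.types.string.NonEmptyString

// Each predefined type bundles the base type, the predicate, and helper methods.
PosInt.from(10)         // Right(10)
NonEmptyString.from("") // Left(...)
```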
Later in this article, we will see how to compose multiple refinements in order to perform the validation of a complex data structure, while not discarding any error message in the event that multiple predicates fail.
Our use case
The use case we decided to use refined for is very straightforward, and is one most developers will face at some point: we get data from upstream (in our case, from a Kafka topic), indirectly provided by a source we cannot trust (in our case, the front-end).
We do not want our refinement process to halt as soon as the first error is encountered. If a payload is deemed invalid for multiple reasons, we want to know all these reasons so that we can act upon them swiftly and fix the upstream issues.
For this reason, a simple monadic composition of a chain of Eithers does not fit our use case. In other words, a “railway-oriented” validation that stops at the first failing predicate would not be sufficient.
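To see why, consider a for-comprehension over Either: even when several checks fail, only the first Left survives (nonEmpty and positive are hypothetical checks, written here in plain Scala):

```scala
def nonEmpty(s: String): Either[String, String] =
  Either.cond(s.nonEmpty, s, "name must not be empty")

def positive(i: Int): Either[String, Int] =
  Either.cond(i > 0, i, "age must be positive")

// Both checks fail, but the comprehension short-circuits at the first Left,
// so the second error is never even computed.
val result = for {
  name <- nonEmpty("")
  age  <- positive(-1)
} yield (name, age)
// result == Left("name must not be empty")
```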
How we use refined
Let’s first address the problem of accumulating parsing/validation errors. We do not want to bind validation steps one after the other in a monadic fashion, because it would mean only ever returning the first error message. Instead, we want all these steps to be, conceptually speaking, on the same level. In fact, whether they are executed one after the other or in parallel should only be an implementation detail. We should run them all, then aggregate their results into either the desired structure if everything went well, or a list of errors otherwise.
If you have been using functional programming for long enough, your instincts are likely tingling at this point and you are getting ready to reach for an Applicative.
That is exactly what we did with ValidatedNec from the cats library (the reason to use Nec instead of Nel has to do with the time complexity of “append” operations, and is covered in more detail in the cats documentation).
Luckily for us, refined provides a refined-cats extension that allows the validation steps to return ValidatedNec[String, A] instead of Either[String, A]:
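A sketch, assuming the validateNec syntax provided by the refined-cats module (the exact import path may vary between versions):

```scala
import cats.data.ValidatedNec
import eu.timepit.refined.api.Refined
import eu.timepit.refined.cats.syntax._
import eu.timepit.refined.numeric.Positive

// Same refinement as refineV, but the result is a ValidatedNec,
// which accumulates errors when combined applicatively.
val ok: ValidatedNec[String, Int Refined Positive] = 10.validateNec[Positive]
val ko: ValidatedNec[String, Int Refined Positive] = (-5).validateNec[Positive]
// ok is Valid(10); ko is Invalid, carrying the error in a NonEmptyChain
```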
Let’s now reach for our familiar mapN from cats, and combine our validation steps:
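The result looked roughly like the following (field names and payload shape are illustrative, not our actual production model, and validateNec is assumed from refined-cats):

```scala
import cats.data.ValidatedNec
import cats.syntax.all._
import eu.timepit.refined.api.Refined
import eu.timepit.refined.boolean.Or
import eu.timepit.refined.cats.syntax._
import eu.timepit.refined.collection.{Exists, NonEmpty}
import eu.timepit.refined.numeric.Positive
import eu.timepit.refined.refineV
import eu.timepit.refined.string.{IPv4, IPv6}

// Raw, untrusted payload as it arrives from Kafka.
final case class Payload(userId: Int, userName: String, ipAddresses: List[String])

// Refined counterpart: each field’s type now carries its predicate.
final case class SafePayload(
  userId:      Int Refined Positive,
  userName:    String Refined NonEmpty,
  ipAddresses: List[String] Refined Exists[IPv4 Or IPv6]
)

def refinePayload(raw: Payload): ValidatedNec[String, SafePayload] =
  (
    raw.userId.validateNec[Positive],
    raw.userName.validateNec[NonEmpty],
    // The composite predicate needs an explicit refineV call.
    refineV[Exists[IPv4 Or IPv6]](raw.ipAddresses).toValidatedNec
  ).mapN(SafePayload.apply)
```

If several fields are invalid, mapN accumulates every error into the resulting NonEmptyChain instead of stopping at the first one.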
Here, we can see a refined description of our payload (SafePayload), and the method responsible for turning a raw Payload into a refined one.
Our method refinePayload does exactly what we need: it applies all the predicates to the input data, not only to validate it, but also to return more specific types that we will be able to operate on more safely going forward. It also makes our type signatures more explicit. And since we are using an Applicative, all the refinement steps will always run, and in case some of them fail, their errors will be accumulated in a NonEmptyChain.
The ipAddresses field showcases how we can compose multiple predicates to express the properties our data must satisfy. Here, ipAddresses is a collection that must contain at least one string that is either an IPv4 or an IPv6 address. It is worth noting that due to the complex nature of this type, we had to reach for our friend refineV again.
Conclusion
As we have seen, refined types are of tremendous value when it comes to increasing confidence in our code, often doing away with separate validation steps by encoding predicates into the types themselves. They provide us with a way of ensuring that data coming into our codebase, and flowing through it, satisfies said predicates, at the cost of very little ceremony.
Refining types during parsing not only ensures that we cannot forget about the validation step, but it also gives us more precise, constrained types to use all throughout our code. If you are interested in reading more about this idea, I cannot recommend this article enough: Parse, don’t validate.
We should keep in mind, however, that more is not always better when it comes to typing. Sometimes it is better to keep a type generic, or to generalize it back to something simpler (for example, unwrapping a List[Int] Refined Xor[MinSize[10], Exists[Positive]] back into a plain List[Int]) when the downstream logic does not actually need that type to be refined.
Otherwise, your logic becomes tightly knit with what you are doing at the moment, and you lose the reusability that is so dear to functional programmers.
A rule of thumb is: always encode what your logic needs, not just everything you can. (And don’t be afraid to loosen constraints down the line, but that’s not quite as catchy).