How we used Refined to improve type safety and error reporting in Scala

Introduction

A significant part of one’s journey with Scala is figuring out how to make its sophisticated type system work for you, not against you. Traits, polymorphism, type bounds, variance, typeclasses: all of these serve one purpose, which is to let developers encode more information into their code, shifting the responsibility for its correctness from the developers to the compiler.

Encoding more information into your types brings you closer and closer to the famous sentence you will often hear in Haskell circles: “If it compiles, it works”. Type-Oriented Programming is the generalization of this idea: provided you put enough care into defining your types, writing logic becomes a matter of writing type signatures, such as String => Option[Int]. You then only have to “fill in the blanks” by writing the actual logic that satisfies your type signatures.
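As a small illustration of "filling in the blanks", here is a sketch that uses only the standard library. The signature String => Option[Int] comes from the paragraph above; the name parsePort, the port range and the use of toIntOption (available from Scala 2.13) are assumptions made for the example.

```scala
object TypeDrivenSketch {
  // The signature alone already says: "this may fail, and the caller must handle it".
  // parsePort is a hypothetical name; the 0..65535 range is an illustrative assumption.
  def parsePort(raw: String): Option[Int] =
    raw.trim.toIntOption.filter(port => port >= 0 && port <= 65535)
}
```

The body is almost dictated by the signature: there are only so many ways to go from a String to an Option[Int].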

As well as taking some cognitive load off developers, it also improves your tests: they no longer need to verify the coherence of your output, only the actual business logic. Your code and your tests no longer need to be as defensive, which makes them more terse and expressive.

Today, we are going to talk about a way to refine our types to make them more expressive and increase confidence in our code, using the aptly named refined library.

Quick tour

Here is a quick overview of the basics of this library:

import eu.timepit.refined._
import eu.timepit.refined.api.Refined
import eu.timepit.refined.auto._
import eu.timepit.refined.numeric._
import eu.timepit.refined.collection._
scala> 10: Int Refined Positive
val res0: Refined[Int,Positive] = 10
scala> "": String Refined NonEmpty
^
error: Predicate isEmpty() did not fail.

First, we define a positive integer. Validation passes, and we get back our 10 with a little added bonus: its type now reflects the fact that it is positive.

Then, we try refining a string to a non-empty one. We get a compilation error, because the string is actually empty.

At this point you’re probably thinking “Okay that’s cool, but that’s only useful for hardcoded values, which is to say it is useless”. And you would be right. But this is where it becomes interesting:

// same imports as before, omitted for conciseness
scala> val foo = "bar"
scala> val baz = ""
scala> refineV[NonEmpty](foo)
val res0: Either[String,Refined[String,NonEmpty]] = Right(bar)
scala> refineV[NonEmpty](baz)
val res1: Either[String,Refined[String,NonEmpty]] = Left(Predicate isEmpty() did not fail.)

Using the refineV function, we can refine arbitrary values, and get an Either back.
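A caller then has to deal with both branches of that Either. The sketch below shows one way to do it; the names Username, greeting and greet are hypothetical and only serve the illustration, while refineV, Refined, NonEmpty and .value come from the library.

```scala
import eu.timepit.refined.api.Refined
import eu.timepit.refined.collection.NonEmpty
import eu.timepit.refined.refineV

object RefineVSketch {
  // Hypothetical alias for a non-empty string
  type Username = String Refined NonEmpty

  // This function can only be called with a value that already passed validation
  def greeting(name: Username): String = s"Hello, ${name.value}!"

  // Refine the raw input at the boundary, then handle both outcomes
  def greet(raw: String): String =
    refineV[NonEmpty](raw) match {
      case Right(name) => greeting(name)
      case Left(error) => s"Invalid username: $error"
    }
}
```

Only greet has to care about invalid input; greeting can rely on its argument type.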

To make our code even more terse, we can reach for ad-hoc methods provided by the library:

import eu.timepit.refined.types.numeric.NonNegInt
scala> NonNegInt.from(3)
res0: Either[String,NonNegInt] = Right(3)

Later in this article, we will see how to compose multiple refinements in order to perform the validation of a complex data structure, while not discarding any error message in the event that multiple predicates fail.

Our use case

The use case we decided to use refined for is very straightforward, and is something any developer will often face: we get data from upstream (in our case, from a Kafka topic), indirectly provided by a source we cannot trust (in our case, the front-end).

We do not want our refinement process to halt as soon as the first error is encountered. If a payload is deemed invalid for multiple reasons, we want to know all these reasons so that we can act upon them swiftly and fix the upstream issues.

For this reason, a simple monadic composition of a chain of Either values does not fit our use case. In other words, a “railway-oriented” validation that stops at the first failing predicate would not be sufficient.
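To make the limitation concrete, here is a minimal, dependency-free sketch of fail-fast composition. The validators nonNegative and nonEmpty are hand-written stand-ins (assumptions for this example, not part of refined).

```scala
object FailFastSketch {
  // Hypothetical validators, written by hand to keep the example dependency-free
  def nonNegative(n: Int): Either[String, Int] =
    if (n >= 0) Right(n) else Left(s"$n is negative")

  def nonEmpty(s: String): Either[String, String] =
    if (s.nonEmpty) Right(s) else Left("string is empty")

  // Monadic composition: the second step only runs if the first one succeeded
  def validate(projectId: Int, userId: String): Either[String, (Int, String)] =
    for {
      id   <- nonNegative(projectId)
      user <- nonEmpty(userId)
    } yield (id, user)
}

// Both inputs are invalid, yet only the first error is reported:
// FailFastSketch.validate(-1, "") == Left("-1 is negative")
```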

How we use refined

Let’s first address the problem of accumulating parsing/validation errors. We do not want to bind validation steps one after the other in a monadic fashion, because it would mean only ever returning the first error message. Instead, we want all these steps to be, conceptually speaking, on the same level. In fact, whether they are executed one after the other or in parallel should only be an implementation detail. We should run them all, then aggregate their results into either the desired structure if everything went well, or a list of errors otherwise.

If you have been using functional programming for long enough, your instincts are likely tingling at this point and you are getting ready to reach for an Applicative.

That is exactly what we did with ValidatedNec from the cats library (the reason to use Nec instead of Nel has to do with the time complexity of “append” operations, and is covered in more detail in the cats documentation).
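Before bringing refined into the picture, here is a sketch of what applicative accumulation looks like with cats alone. The validators are hand-written and hypothetical; validNec, invalidNec and mapN are cats syntax. On Scala 2.12 this assumes the -Ypartial-unification compiler flag.

```scala
import cats.data.ValidatedNec
import cats.syntax.all._

object AccumulatingSketch {
  // Hypothetical hand-written validators returning ValidatedNec
  def nonNegative(n: Int): ValidatedNec[String, Int] =
    if (n >= 0) n.validNec[String] else s"$n is negative".invalidNec[Int]

  def nonEmpty(s: String): ValidatedNec[String, String] =
    if (s.nonEmpty) s.validNec[String] else "string is empty".invalidNec[String]

  // Applicative composition: both steps always run, errors are concatenated
  def validate(projectId: Int, userId: String): ValidatedNec[String, (Int, String)] =
    (nonNegative(projectId), nonEmpty(userId)).mapN((id, user) => (id, user))

  // Flatten the outcome into a plain list of error messages (empty when valid)
  def errorsOf(projectId: Int, userId: String): List[String] =
    validate(projectId, userId).fold(_.toChain.toList, _ => Nil)
}

// AccumulatingSketch.errorsOf(-1, "") == List("-1 is negative", "string is empty")
```

With both inputs invalid, both messages are reported, in the order the steps appear in the tuple.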

Luckily for us, refined provides a refined-cats extension that allows the validation steps to return ValidatedNec[String, A] instead of Either[String, A]:

import cats.data.ValidatedNec
import eu.timepit.refined.types.numeric.NonNegInt
import eu.timepit.refined.cats.syntax._
scala> NonNegInt.validateNec(3)
res0: Validated[NonEmptyChain[String],NonNegInt] = Valid(3)

Let’s now reach for our familiar mapN from cats, and combine our validation steps:

import cats.data.ValidatedNec
import cats.implicits._
import eu.timepit.refined.cats.syntax._
import eu.timepit.refined.refineV
import eu.timepit.refined.api.Refined
import eu.timepit.refined.boolean.Or
import eu.timepit.refined.string.{IPv4, IPv6}
import eu.timepit.refined.numeric.NonNegative
import eu.timepit.refined.collection.{Exists, NonEmpty}
import eu.timepit.refined.types.numeric.NonNegInt
import eu.timepit.refined.types.string.NonEmptyString
final case class Payload(projectId: Int, userId: String, ipAddresses: List[String])
// Refined counterpart of Payload: each field's type carries its predicate
final case class SafePayload(
  projectId: Int Refined NonNegative,
  userId: String Refined NonEmpty,
  ipAddresses: List[String] Refined Exists[Or[IPv4, IPv6]],
)
// Every step runs; failures are accumulated rather than short-circuited
def refinePayload(payload: Payload): ValidatedNec[String, SafePayload] =
  (
    NonNegInt.validateNec(payload.projectId),
    NonEmptyString.validateNec(payload.userId),
    refineV[Exists[Or[IPv4, IPv6]]](payload.ipAddresses).toValidatedNec,
  ).mapN(SafePayload.apply)

Here, we can see a refined description of our payload (SafePayload), and the method responsible for turning a raw Payload into a refined one.

Our method refinePayload does exactly what we need: it applies all the predicates to the input data so as to validate it, but also to return more specific types that we will be able to operate on more safely going forward. It will also make our type signatures more explicit. And since we are using an Applicative, all the refinement steps will always run, and in case some of them fail, their errors will be accumulated in a NonEmptyChain.

The ipAddresses field showcases how we can compose multiple predicates to express the properties our data must satisfy. Here, ipAddresses is a collection that must contain at least one string that is either an IPv4 or an IPv6 address.

It is worth noting that due to the complex nature of this type, we had to reach for our friend refineV again.

Conclusion

As we have seen, refined types are of tremendous value when it comes to increasing confidence in our code, and can often do away with separate validation steps by encoding predicates into the types themselves. They provide us with a way of ensuring that data coming into our codebase and flowing through it satisfies said predicates, at the cost of very little ceremony.

Refining types during parsing not only ensures that we cannot forget about the validation step, but also gives us more precise, constrained types to use throughout our code. If you are interested in reading more about this idea, I cannot recommend this article enough: Parse, don’t validate.

We should keep in mind, however, that more is not always better when it comes to typing. Sometimes, it is better to keep a type generic or to generalize it back to something simpler (for example, unwrapping a List[Int] Refined Xor[MinSize[10], Exists[Positive]] back into a plain List[Int]) when the downstream logic does not actually need that type to be refined.

Otherwise, your logic becomes tightly knit with what you are doing at the moment, and you lose the reusability that is so dear to functional programmers.
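Loosening a refined type is cheap: a Refined value exposes the underlying value through .value. The sketch below uses a simpler predicate (NonEmpty) than the one mentioned above, and the helper total is a hypothetical piece of downstream logic that has no need for the refinement.

```scala
import eu.timepit.refined.api.Refined
import eu.timepit.refined.collection.NonEmpty
import eu.timepit.refined.refineV

object UnwrapSketch {
  // Hypothetical downstream logic: works on any List[Int], refined or not
  def total(xs: List[Int]): Int = xs.sum

  // Refine at the boundary...
  val refined: Either[String, List[Int] Refined NonEmpty] =
    refineV[NonEmpty](List(1, 2, 3))

  // ...and unwrap with .value where the predicate no longer matters
  val result: Either[String, Int] = refined.map(r => total(r.value))
}

// UnwrapSketch.result == Right(6)
```

Keeping total on a plain List[Int] means it stays reusable for callers that never went through the refinement step.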

A rule of thumb is: always encode what your logic needs, not just everything you can. (And don’t be afraid to loosen constraints down the line, but that’s not quite as catchy).