Memoization in Groovy

Memoization is a form of caching.  Judging by the article on Wikipedia and other sites, memoization is very useful for recursive calculations, though it can be used anywhere you need to store previously calculated data based on specific input.  It’s important to note that memoization should be used differently than caching (in the context of Spring Cache, for example).  Caching generally implies that that data will eventually expire or change in the future, whereas memoization is geared towards things that will always be true.  5! will always be 120, whereas the result of a call to a remote system to get user information based on a userId may change.  Groovy has had closure memoization since version 1.8.  Method memoization was recently added in version 2.2.  Looking at the source, it appears that method memoization uses the existing memoization capability of closures.  The @Memoized annotation just wraps the method call in a memoized closure.

Here is a basic example using closures:

def myClosure = { Integer x ->
    println "My Closure argument $x"
}.memoize()

myClosure 3
myClosure 4
myClosure 3
myClosure 4

And the resulting output:

My Closure argument 3
My Closure argument 4

You can see that the code inside the closure was executed only once for each distinct input parameter.

More people will be interested in method memoization.  A common recursive calculation is the factorial of a number.  If you’ve already calculated 4!, calculating 5! should only involve one multiplication operation if you have your method memoized since 5! == 4! * 5.

Here’s an example:

@Memoized
static Long factorial(Long f) {
  println "Factorial for $f"
  if (f == 0 || f == 1) {
       return 1
   }
   factorial(f - 1) * f
}

Now to run this code:

println MyMath.factorial(4)
println MyMath.factorial(3)
println MyMath.factorial(5)

And the output:

Factorial for 4
Factorial for 3
Factorial for 2
Factorial for 1
24
6
Factorial for 5
120

Most of the work was done when calculating 4! so that when calculating 3!, nothing extra had to be done.  When calculating 5!, only one extra operation had to be performed.

Fibonacci sequences exemplify this even more.  Here is the method I wrote:

static Long fib(Long f) {
   println "Fibonacci for $f"
   if (f <= 1) {
       return f
   }
   fib(f - 1) + fib(f - 2)
}

The non-memoized output is as follows for fib(5):

Fibonacci for 5
Fibonacci for 4
Fibonacci for 3
Fibonacci for 2
Fibonacci for 1
Fibonacci for 0
Fibonacci for 1
Fibonacci for 2
Fibonacci for 1
Fibonacci for 0
Fibonacci for 3
Fibonacci for 2
Fibonacci for 1
Fibonacci for 0
Fibonacci for 1

Notice the many repeated calls.  Adding the @Memoized annotation produces this output:

Fibonacci for 5
Fibonacci for 4
Fibonacci for 3
Fibonacci for 2
Fibonacci for 1
Fibonacci for 0

A lot of computational savings here!

The @Memoized annotation also has a way of limiting the size of the cache by adding in the attribute “maxCacheSize”.  The eviction strategy is based on a last recently used algorithm.  Here is an example:

@Memoized(maxCacheSize = 3)
static Integer myCalc(Integer input) {
   println "myCalc for $input"
   return input
}

Here is a sample execution:

println MyMath.myCalc(1) //1 is added to cache
println MyMath.myCalc(2) //2 is added to cache
println MyMath.myCalc(3) //3 is added to cache
println MyMath.myCalc(1) //1 is read from the cache
println MyMath.myCalc(4) //4 is added, 2 is removed (it’s the last recently used value)
println MyMath.myCalc(1) //1 is read from cache
println MyMath.myCalc(1) //1 is read from cache
println MyMath.myCalc(2) //2 is added, 3 is removed
println MyMath.myCalc(5) //5 is added, 4 is removed
println MyMath.myCalc(1) //1 is read from the cache.  1, 2, and 5 remain in the cache

The resulting output:

myCalc for 1
1
myCalc for 2
2
myCalc for 3
3
1
myCalc for 4
4
1
1
myCalc for 2
2
myCalc for 5
5
1

Setting the maximum cache size is useful if you need to limit the amount of memory used, but could also be useful if you know that the further you get along in a calculation, you won’t need previously calculated values after you’ve reached a certain point.

There is also a “protectedCacheSize” attribute that acts as a “minimum cache size” value.  The cache will hold on to a minimum number of values and keep them from being garbage collected.  Closure memoization has similar ways to size your cache if needed.

Use memoization on:

  1. Calculations that are slow (slower than looking up a value in a map).
  2. Methods that only depend on the input to calculate a value.
  3. Methods that do not have side effects (i.e. update a database, write to a file, etc.).
  4. Calculations that ALWAYS return the same value based on a given input.

Use other caching techniques when:

  1. Data may change in the future (i.e. a person’s age, odds the Vikings will win the Super Bowl, etc.).
  2. More control of the cache is needed (i.e. spooling to disk, eviction strategy, etc.)

One thought on “Memoization in Groovy

  1. Mikhail says:

    I like this post, especially the idea to use memoization in recursion.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

*