
Memoization

At a previous job, my organization used an architecture that relied heavily on memoization. Because of how we used it, memoization crippled our ability to make quick changes in a modular fashion. This wiki aims to describe the properties of memoization and what ultimately failed for us.

Definition

[Diagram: simple-memoization]

What do I mean by “memoization”? Wikipedia defines it here. To summarize, take a look at the diagram above. It’s a call graph (all arrows point in the direction of the request/call). It shows some unknown caller calling function A. A calls two other functions, B and C, with inputs B’ and C’, respectively. From there, B and C each call some arbitrary network of functions. The networks can be completely separate or overlap in some way… it doesn’t matter. At some point, each of those networks calls function Z separately but with the same input Z’. Memoization is the property that function Z runs only once. If, in the future, another function calls Z within the same operational context with the same input, Z will return the response it previously calculated (or is currently in the process of calculating). (“Operational context” means the conditions under which the same input-to-output mapping/memory is expected to apply.)
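To make this concrete, here is a minimal Java sketch of input-keyed memoization within an operational context. The names (`MemoizedZ`, `call`) and the `String` input/output types are illustrative assumptions, not any particular framework’s API; the point is that the cache is keyed by the input, and in-flight calls are shared:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// A minimal sketch of per-operational-context memoization. The cache lives
// inside the context, so a given input only ever runs Z once there.
public class MemoizedZ {
    // Keyed by input Z'; the value is the (possibly still-running) call.
    private final Map<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, String> z; // the underlying function Z

    public MemoizedZ(Function<String, String> z) {
        this.z = z;
    }

    // Callers anywhere in B's or C's networks call this instead of Z directly.
    // computeIfAbsent creates at most one future per distinct input, and
    // callers that arrive mid-computation share the in-flight result.
    public CompletableFuture<String> call(String input) {
        return cache.computeIfAbsent(input,
                in -> CompletableFuture.supplyAsync(() -> z.apply(in)));
    }
}
```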

Memoization Motivations

Memoization is used primarily for four reasons:

  1. Cost – a function call may be expensive: high CPU, high latency, or really any large resource usage. Memoization helps reduce this cost by only performing the function at most once within the same operational context.
  2. Modularity – different code locations may want to be kept as isolated as possible. In the diagram above, imagine that each caller to Z wants to call Z arbitrarily, without negotiating with any of the other callers. Memoization helps allow this, so long as each caller ends up making the same call when expected.
  3. Latency – similar to the last point: by avoiding negotiation, by starting the call as quickly as possible, and by making use of any previous calls, each caller tends to get its result as quickly as possible.
  4. Data consistency – data can change over time. For a given operational context, a system may desire that the same snapshot-in-time of the data is used. Memoization can help achieve this goal by avoiding multiple lookups or computations of that data.

Problems

I’m going to argue that memoization can go wrong when there’s an over-reliance on it working and when factors play together to make it more likely to fail. In the discussion that follows, I’m going to assume that the memoization framework itself works perfectly (i.e. given the same input, the function will only run once) and instead focus on cases where the construction of the input itself can diverge.

When it’s critical that a function perform its logic only once in a given operational context, it’s equally critical that every caller agree on what the input should be. Each caller that uses a different input causes the function to perform its logic another time. A good example is making an external service call. Different code locations may make the same service call and rely on memoization to ensure the external service is called only once. What happens when things go wrong? For the callers, multiple calls might mean increased latency and data inconsistencies. For the called service, multiple calls might overwhelm it: each distinct input acts as a multiplication factor on service calls: two distinct inputs could mean 2 times the calls, three could mean 3 times, and so on.
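Here is a hypothetical illustration of that divergence in Java (the `orderId`/`locale` field names and the map-shaped input are invented for the example): both callers “mean” the same lookup, but because one of them builds the input with an extra field, an input-keyed memo cache misses and the expensive call runs twice.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Two callers intend "the same" service call but construct the input
// differently, silently defeating input-keyed memoization.
public class DivergingInputs {
    static final AtomicInteger executions = new AtomicInteger();
    static final Map<Map<String, String>, String> memo = new ConcurrentHashMap<>();

    static String callService(Map<String, String> input) {
        return memo.computeIfAbsent(input, in -> {
            executions.incrementAndGet(); // simulate the expensive external call
            return "order-details-for-" + in.get("orderId");
        });
    }

    public static void main(String[] args) {
        callService(Map.of("orderId", "123"));                    // caller in B's network
        callService(Map.of("orderId", "123", "locale", "en-US")); // caller in C's network
        System.out.println(executions.get()); // prints 2, not the expected 1
    }
}
```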

What are the factors that could cause the inputs to be different in the same operational context?

(An interesting thing to note here is that modularity is both one of the reasons that memoization is used and one of the reasons that it can fail.)

Mitigations

What are some of the different ways we can mitigate these problems?

Simple, Obvious

A simple, obvious step would be to focus on each factor and reduce its individual impact.

Call Beforehand/No Memoization

An extreme approach would be to abandon memoization entirely and call the function beforehand. This diagram illustrates this approach:

[Diagram: no-memoization]

Here, A calls Z and then passes the result along to B and C, which pass it along to their network. Whoever needs the result can use it. Now there’s only a single place where Z is called.
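A sketch of what that looks like in Java (names and types invented for illustration): the only call site for Z is inside A, and everything downstream receives the result as a plain argument.

```java
// The call-beforehand approach: A resolves Z once and hands the result to B
// and C explicitly; nothing downstream ever calls Z itself.
public class CallBeforehand {
    static String z(String zInput) { /* expensive work */ return "result-for-" + zInput; }

    static String b(String bInput, String zResult) { return "B(" + bInput + ", " + zResult + ")"; }

    static String c(String cInput, String zResult) { return "C(" + cInput + ", " + zResult + ")"; }

    static String a(String bInput, String cInput, String zInput) {
        String zResult = z(zInput); // the single place where Z is called
        return b(bInput, zResult) + " + " + c(cInput, zResult);
    }

    public static void main(String[] args) {
        System.out.println(a("B'", "C'", "Z'"));
    }
}
```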

Proxy

Another mitigation that addresses multiple problems is to introduce proxies in front of the function call. Here’s a diagram explaining how this would work:

[Diagram: proxy-memoization]

Whereas the B and C networks were previously calling Z directly, they’re now calling the proxy P. The hope is that even if P is called with different inputs Z’ and Z’’, P will somehow be able to massage them into a single, common input for Z.

An example where this setup works well is when there are known upcoming changes to the input of function Z, say adding a new field. P can continue calling Z without that new field. In any call with the new field added, P will strip the field before passing it along to Z. In this way, P allows callers to add the field to their calls at their own discretion. Only when P is actually sure that all callers have been updated to add the new field will it be modified to pass the field along in its call to Z as well.
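A sketch of such a proxy in Java, assuming the known change is a new (hypothetical) `locale` field: P normalizes every call so that memoization behind it still sees a single input.

```java
import java.util.HashMap;
import java.util.Map;

// Proxy P absorbs a known input change so all callers of Z stay in agreement.
public class ProxyP {
    static String z(Map<String, String> input) { /* expensive work */ return "result-for-" + input; }

    // All former callers of Z call P instead.
    static String p(Map<String, String> input) {
        Map<String, String> normalized = new HashMap<>(input);
        // Strip the new field until every caller has been updated to send it;
        // only then would P be modified to pass it through to Z.
        normalized.remove("locale");
        return z(normalized);
    }

    public static void main(String[] args) {
        // An old-style caller and an updated caller now produce the same input for Z.
        System.out.println(p(Map.of("orderId", "123")));
        System.out.println(p(Map.of("orderId", "123", "locale", "en-US")));
    }
}
```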

Getting Rid of Input with Potentials

Another extreme approach, but still using memoization, is to get rid of the input entirely. This diagram illustrates this approach:

[Diagram: no-input-memoization]

Here, A calls a new function P with input Z’. P returns a new function Z* which, when invoked, will call Z with Z’. The best word I’ve found to describe Z* is a “potential”, in that it potentially calls Z. Z* is passed down to B and C and eventually to B’s and C’s networks. Those networks can now invoke Z* with no arguments. Z*, with memoization enabled, will only ever run its logic once.
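A minimal Java sketch of a potential (class and method names invented; a real version would also need error handling): A binds Z to Z’ once, and the resulting no-argument Z* memoizes its own result.

```java
import java.util.concurrent.CompletableFuture;

// Z* as an object: the input Z' is captured at creation, and the result is
// memoized, so downstream nodes can invoke it freely with no arguments.
public class PotentialZ {
    private final String zInput;            // Z', captured when A creates Z*
    private CompletableFuture<String> memo; // the at-most-once result of Z

    PotentialZ(String zInput) { this.zInput = zInput; }

    static String z(String input) { /* expensive work */ return "result-for-" + input; }

    // Z runs at most once no matter how many network nodes call this;
    // later callers share the in-flight or completed result.
    synchronized CompletableFuture<String> invoke() {
        if (memo == null) {
            memo = CompletableFuture.supplyAsync(() -> z(zInput));
        }
        return memo;
    }
}
```

A would construct something like `new PotentialZ("Z'")` and pass that object down in place of Z’ itself; whether and when to call `invoke()` stays entirely with each downstream node.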

Using potentials is very similar to the previous approach, with one key difference: potentials keep the choice of when (and under what conditions) to call Z in the hands of Z’s individual callers rather than centralizing it.

Have a Centralized Executor Execute the Dependency Graph

This is another extreme approach. Imagine the original diagram above as a dependency graph rather than a call graph. Then, imagine a centralized executor executing that graph. Clients are only responsible for defining the nodes and the wiring between them. The executor is in charge of executing the graph, including memoization. Using a centralized executor ensures that the passing of arguments is controlled by a single actor.

For example, imagine that a client requests node A. The executor realizes B and C must be executed first… but before that, B’s and C’s networks must be executed first… but before that, Z must be executed first. So, the executor calls Z and then uses the result to execute the necessary nodes in B’s and C’s networks. Then, the executor executes nodes B and C with the results. Finally, the executor executes node A.
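Here is a toy, single-threaded sketch of that idea in Java (invented names, not any real framework’s API): clients define nodes and wiring, and the executor owns both the traversal and the memoization.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// A toy centralized executor for an acyclic dependency graph. Because the
// executor alone passes results between nodes, shared dependencies like Z
// execute exactly once per graph run.
public class GraphExecutor {
    // A node computes its value from the ordered results of its dependencies.
    record Node(List<String> dependencies, Function<List<Object>, Object> logic) {}

    private final Map<String, Node> graph = new HashMap<>();
    private final Map<String, Object> memo = new HashMap<>();

    void define(String name, List<String> deps, Function<List<Object>, Object> logic) {
        graph.put(name, new Node(deps, logic));
    }

    // Depth-first execution; the memo map guarantees each node runs at most once.
    Object execute(String name) {
        if (memo.containsKey(name)) {
            return memo.get(name);
        }
        Node node = graph.get(name);
        List<Object> depResults = node.dependencies().stream().map(this::execute).toList();
        Object result = node.logic().apply(depResults);
        memo.put(name, result);
        return result;
    }

    public static void main(String[] args) {
        GraphExecutor executor = new GraphExecutor();
        executor.define("Z", List.of(), deps -> "z-result");
        executor.define("B", List.of("Z"), deps -> "B(" + deps.get(0) + ")");
        executor.define("C", List.of("Z"), deps -> "C(" + deps.get(0) + ")");
        executor.define("A", List.of("B", "C"), deps -> deps.get(0) + " + " + deps.get(1));
        System.out.println(executor.execute("A")); // Z executed exactly once
    }
}
```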

This approach has similar pros and cons to those of potentials. I’ll only call out the differences here:

Common Theme

A common theme in almost all the mitigations I’ve looked at is that there is a loss in modularity/independence and an increase in ownership. Some central team must be responsible for deciding what it means to call the expensive function (e.g. who owns the proxy in the proxy mitigation or the potential definition in the potential mitigation). The only thing that differs is the extent to which they take ownership.

Anecdote: Architecture at Previous Job

A good example of how memoization can go wrong is the architecture at my previous job.

[Diagram: old-architecture]

This architecture was based around something I’ll call “modules”. For example, my team might own a module that retrieved order information based on an order id. Now, my team owned a large web page, and given the size of that system, we wanted to encourage external teams to take ownership of their own modules. This ownership included the ability to manage deployments of their modules separately from ours.

For example, if an external team wanted to add a new widget to the web page, they would create and own their own module. This external team’s module would read the top-level request, call our order module to retrieve order information, and then use that order information to create their widget. So, there might be n externally-owned modules from m external teams calling our order module to collaboratively create the web page that we owned. That order module was expensive, too. We could only make one call to it per top-level request, so every module implicitly had to cooperate to make sure memoization worked.

This didn’t work well. Let’s take a look at all of the factors listed above to see why:

All factors considered, the architecture’s reliance on memoization, combined with the properties described above, made it extremely fragile and incapable of meeting the goals laid out for it. There are lots of good lessons here about how the lure of memoization can end up hurting you if applied naively.