Pattern matching in Java with the Visitor pattern

February 11, 2015
I once read an essay (I cannot find it now) arguing that learning a more advanced programming language can improve your coding even in a language that lacks its features. Those features are really ways of thinking, and there’s no reason you cannot think in terms of more powerful abstractions just because you have to express them in a more limited language. The phrase that stuck with me was that you were “writing into” that language, rather than writing in the language as it’s meant to be used. At Wealthfront, while the majority of our backend code is Java, we use a variety of techniques that originate in functional languages. We’ve written before about our Option type. This article is about pattern matching in Java. I’m going to take a digression to explain what pattern matching is and why it’s so fantastic; if you like, you can skip ahead to the Java examples.

Inspiration from post-Java languages

Pattern matching is a feature common in modern functional languages that provides a construct similar to switch, but one that branches on the type of its argument. For example, take a base class with two subclasses, where we want to write logic that handles the two subclasses differently. An example might be a payment record that varies in type according to the method of payment (e.g. Bitcoin vs. credit card), or an owner that varies depending on whether it represents a single user or a group. This is useful for any class hierarchy representing data that has a base set of fields, with subclasses that may carry additional data. The visitor pattern is an old idea in object-oriented design, but seeing pattern matching in languages like Haskell and Scala makes it easier to see why it’s useful.

Scala pattern matching

Scala supports a match operator that concisely expresses the idea of switching on the type of the object.
object patterns {
  abstract class Foo
  class Bar(val bar: Int) extends Foo
  class Baz(val baz: String) extends Foo

  def handle(f: Foo) =
    f match {
      case b: Bar => b.bar
      case b: Baz => b.baz
    }
  
  handle(new Bar(42))                       //> res0: Any = 42
  handle(new Baz("Luhrmann"))               //> res1: Any = Luhrmann
}
What does this do? First, we have an abstract class Foo with two subtypes, Bar and Baz, which have different parameters: Bar stores an Int, Baz a String. Then we have a handle method that uses match to extract the appropriate field from either of these. Each case could contain whatever logic you need, and the type of b in each case is the specific subclass, not just Foo. (Because one branch returns an Int and the other a String, Scala infers Any as the result type.)

Swift enums

Swift offers the same functionality through enums with associated values, which behave more like a family of related types than a list of constant values.
enum Foo {
    case Bar(bar: Int)
    case Baz(baz: String)
}

func handle(_ f: Foo) -> Any {
    switch f {
    case .Bar(let bar):
        return bar
    case .Baz(let baz):
        return baz
    }
}

print(handle(.Bar(bar: 42)))          // 42
print(handle(.Baz(baz: "Luhrmann")))  // Luhrmann
The syntax differs, but the result is the same. The let keyword is a helpful reminder of what’s going on here: in each case, a new constant is bound to the associated value stored in f, with that case’s specific type (Int or String).

Simpler Java solutions

instanceof

One simple solution that comes to mind is to use instanceof.
abstract class BadFoo {

  static class Bar extends BadFoo {
    int bar;

    private Bar(int bar) {
      this.bar = bar;
    }

  }

  static class Baz extends BadFoo {
    String baz;

    private Baz(String baz) {
      this.baz = baz;
    }

  }

  static void handle(BadFoo f) {
    if (f instanceof Bar) {
      Bar b = (Bar) f;
      System.out.println("I have " + b.bar + " bars");
    } else if (f instanceof Baz) {
      Baz b = (Baz) f;
      System.out.println("I have the " + b.baz + " baz");
    }
  }

  public static void main(String[] args) {
    handle(new Bar(42));
    handle(new Baz("Luhrmann"));
  }
}
There are a few problems with this.
  1. Hibernate can hand back a proxy object in place of the real entity when it loads lazily. The proxy subclasses the declared base type and intercepts method calls to make sure the right code is called, but it is not an instance of the derived type, so the instanceof checks fail.
  2. Correctness is not enforced by the compiler. It’s perfectly valid Java to write if (f instanceof Bar) { Baz b = (Baz) f; }, but that branch throws a ClassCastException every time it runs.
  3. There is no way to ensure completeness. Nothing in the existing code makes sure that it gets updated when someone adds a new subtype Qux.
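The first two problems are easy to reproduce without Hibernate. The sketch below is a minimal illustration, assuming a hand-written HypotheticalBadFooProxy class as a stand-in for a generated proxy (it is not Hibernate’s actual proxy mechanism): like a lazy-loading proxy, it extends only the base type, so neither instanceof branch matches. The wrongCast method shows the second problem: the compiler accepts a cast that can never succeed.

```java
abstract class BadFoo {
  static class Bar extends BadFoo {
    final int bar;
    Bar(int bar) { this.bar = bar; }
  }

  static class Baz extends BadFoo {
    final String baz;
    Baz(String baz) { this.baz = baz; }
  }
}

// Hypothetical stand-in for an ORM proxy: it extends only the base type
// and holds the real object, as a lazy-loading proxy would.
class HypotheticalBadFooProxy extends BadFoo {
  final BadFoo target;
  HypotheticalBadFooProxy(BadFoo target) { this.target = target; }
}

class InstanceofPitfalls {
  static String describe(BadFoo f) {
    if (f instanceof BadFoo.Bar) {
      return "I have " + ((BadFoo.Bar) f).bar + " bars";
    } else if (f instanceof BadFoo.Baz) {
      return "I have the " + ((BadFoo.Baz) f).baz + " baz";
    }
    return "unhandled"; // the proxy silently ends up here
  }

  // Compiles, because Bar and Baz share a supertype,
  // but throws ClassCastException whenever f is a Bar.
  static int wrongCast(BadFoo f) {
    if (f instanceof BadFoo.Bar) {
      BadFoo.Baz b = (BadFoo.Baz) f;
      return b.baz.length();
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(describe(new BadFoo.Bar(42)));                              // I have 42 bars
    System.out.println(describe(new HypotheticalBadFooProxy(new BadFoo.Bar(42)))); // unhandled
    try {
      wrongCast(new BadFoo.Bar(42));
    } catch (ClassCastException e) {
      System.out.println("ClassCastException");
    }
  }
}
```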

Moving logic into the class

Another solution is to embed this logic in the class, like OOP says we should.

abstract class OoFoo {
  abstract void handle();

  static class Bar extends OoFoo {
    private int bar;

    private Bar(int bar) {
      this.bar = bar;
    }

    @Override
    void handle() {
      System.out.println("I have " + bar + " bars");
    }
  }

  static class Baz extends OoFoo {
    private String baz;

    private Baz(String baz) {
      this.baz = baz;
    }

    @Override
    void handle() {
      System.out.println("I have the " + baz + " baz");
    }
  }

  public static void main(String[] args) {
    new Bar(42).handle();
    new Baz("Luhrmann").handle();
  }
}
This works fine when it’s one method, or a few, but what happens when it grows to dozens? Your data objects become the home of all your business logic. If that’s your style, that’s okay and you have a lot of company, but I find it difficult to think about things this way. For example, if I want to verify the identity of owners of individual, joint, and custodial accounts, I could put a “verify identity” method in the AccountOwner type, but I’d prefer to create a single IdentityVerifier class that encapsulates all the business logic about verifying identity. The visitor pattern fits a model where data objects are simple and business logic is primarily implemented in various processor or “noun-verber” classes.

Another issue with business logic in the data class is that it makes the logic harder to mock for testing. With a processor interface, it’s easy to mock it and return whatever data you want. With business logic in the class, you need to either set up the data so that it actually satisfies all those rules, or override the methods to return what you want. That makes it harder than it should be to write a test saying something like “accounts whose owners can be identified may be opened immediately”.
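Here is a minimal sketch of that test. The names AccountOwner, IdentityVerifier, and AccountOpener are hypothetical and illustrative, not Wealthfront’s actual classes. Because the identity rules sit behind an interface, a test can pass in a lambda that always (or never) identifies the owner, without building data that satisfies the real rules.

```java
// Hypothetical data object: just data, no business logic.
final class AccountOwner {
  final String name;
  AccountOwner(String name) { this.name = name; }
}

// Hypothetical processor interface; the real identity rules would live
// in an implementation of it, not on AccountOwner.
interface IdentityVerifier {
  boolean canIdentify(AccountOwner owner);
}

final class AccountOpener {
  private final IdentityVerifier verifier;

  AccountOpener(IdentityVerifier verifier) { this.verifier = verifier; }

  // "Accounts whose owners can be identified may be opened immediately."
  String open(AccountOwner owner) {
    return verifier.canIdentify(owner) ? "OPEN" : "PENDING_REVIEW";
  }
}

class AccountOpenerDemo {
  public static void main(String[] args) {
    AccountOwner owner = new AccountOwner("Ada");
    // Stub verifiers replace all of the real identity logic in a test.
    System.out.println(new AccountOpener(o -> true).open(owner));  // OPEN
    System.out.println(new AccountOpener(o -> false).open(owner)); // PENDING_REVIEW
  }
}
```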

The visitor pattern in Java

Basic Visitor

The basic visitor pattern in Java consists of the following:
  • An abstract base class with an abstract method match or visit taking a parameterized Visitor.
  • A parameterized Visitor interface with a case method for each subclass.
  • Subclasses of the base class that each implement match by calling the appropriate method of the visitor.
  • Application code that creates an anonymous instance of the visitor implementing whatever behavior is desired for each case.
abstract class Foo {
  abstract <T> T match(Visitor<T> visitor);

  interface Visitor<T> {
    T caseBar(Bar b);
    T caseBaz(Baz b);
  }

  static class Bar extends Foo {
    final int bar;

    private Bar(int bar) {
      this.bar = bar;
    }
    
    @Override
    <T> T match(Visitor<T> visitor) {
      return visitor.caseBar(this);
    }
  }

  static class Baz extends Foo {
    final String baz;

    private Baz(String baz) {
      this.baz = baz;
    }
    
    @Override
    <T> T match(Visitor<T> visitor) {
      return visitor.caseBaz(this);
    }
  }

  static void handle(Foo f) {
    System.out.println(f.match(new Visitor<String>() {
      @Override
      public String caseBar(Bar b) {
        return "I have " + b.bar + " bars";
      }
      @Override
      public String caseBaz(Baz b) {
        return "I have the " + b.baz + " baz";
      }
    }));
  }

  public static void main(String[] args) {
    handle(new Bar(42));
    handle(new Baz("Luhrmann"));
  }
}
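Because the visitor dispatches through an ordinary method call, it should also sidestep the Hibernate problem with instanceof: a proxy that forwards method calls to the real object forwards match too, so the visitor receives the real subclass. The sketch below assumes a hand-written HypotheticalFooProxy as a stand-in for a generated proxy, and restates a trimmed Foo so it compiles on its own.

```java
abstract class Foo {
  abstract <T> T match(Visitor<T> visitor);

  interface Visitor<T> {
    T caseBar(Bar b);
    T caseBaz(Baz b);
  }

  static class Bar extends Foo {
    final int bar;
    Bar(int bar) { this.bar = bar; }
    @Override <T> T match(Visitor<T> visitor) { return visitor.caseBar(this); }
  }

  static class Baz extends Foo {
    final String baz;
    Baz(String baz) { this.baz = baz; }
    @Override <T> T match(Visitor<T> visitor) { return visitor.caseBaz(this); }
  }
}

// Hypothetical proxy: extends only the base type and forwards calls.
class HypotheticalFooProxy extends Foo {
  private final Foo target;
  HypotheticalFooProxy(Foo target) { this.target = target; }
  @Override <T> T match(Visitor<T> visitor) { return target.match(visitor); }
}

class ProxyDispatch {
  static String describe(Foo f) {
    return f.match(new Foo.Visitor<String>() {
      @Override public String caseBar(Foo.Bar b) { return "I have " + b.bar + " bars"; }
      @Override public String caseBaz(Foo.Baz b) { return "I have the " + b.baz + " baz"; }
    });
  }

  public static void main(String[] args) {
    // Both lines print "I have 42 bars": the visitor sees the real Bar.
    System.out.println(describe(new Foo.Bar(42)));
    System.out.println(describe(new HypotheticalFooProxy(new Foo.Bar(42))));
  }
}
```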

Default visitor

It’s sometimes useful to have special logic for some of the subclasses, and a default value for others. This can make the code more readable because it removes boilerplate which isn’t part of what the code is trying to accomplish. You can do this with an implementation of the interface that supplies a default value for anything not overridden.
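// Uses Foo, Foo.Bar and Foo.Baz from the basic visitor example.
class DefaultFooVisitor<T> implements Foo.Visitor<T> {
  private final T defaultValue;

  // protected, so anonymous subclasses elsewhere in the codebase can use it
  protected DefaultFooVisitor(T defaultValue) {
    this.defaultValue = defaultValue;
  }

  @Override
  public T caseBar(Foo.Bar b) {
    return defaultValue;
  }

  @Override
  public T caseBaz(Foo.Baz b) {
    return defaultValue;
  }

  // Only Bar needs special handling; every other case returns 0.
  static int countBars(Foo f) {
    return f.match(new DefaultFooVisitor<Integer>(0) {
      @Override
      public Integer caseBar(Foo.Bar b) {
        return b.bar;
      }
    });
  }
}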
The downside of this pattern is that default visitors can be overlooked when a new case is added, because call sites that extend them keep compiling. One way to handle this in practice: while adding the new case, make the default visitor abstract and leave the new case unimplemented, review all the code that breaks, and once you are satisfied that the behavior is correct, add the default implementation for the new case.
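A sketch of that migration step, assuming a hypothetical new case Qux that also carries a bar count, so the old default of 0 would have been silently wrong for it. While caseQux is left unimplemented in the abstract DefaultFooVisitor, any anonymous subclass that doesn’t handle Qux fails to compile; countBars below has already been fixed after review. Once every call site has been checked, a default caseQux can be added back.

```java
abstract class Foo {
  abstract <T> T match(Visitor<T> v);

  interface Visitor<T> {
    T caseBar(Bar b);
    T caseBaz(Baz b);
    T caseQux(Qux q); // hypothetical case being added
  }

  static class Bar extends Foo {
    final int bar;
    Bar(int bar) { this.bar = bar; }
    @Override <T> T match(Visitor<T> v) { return v.caseBar(this); }
  }

  static class Baz extends Foo {
    final String baz;
    Baz(String baz) { this.baz = baz; }
    @Override <T> T match(Visitor<T> v) { return v.caseBaz(this); }
  }

  static class Qux extends Foo {
    final int bars;
    Qux(int bars) { this.bars = bars; }
    @Override <T> T match(Visitor<T> v) { return v.caseQux(this); }
  }
}

// Migration step: abstract, with caseQux deliberately left unimplemented.
abstract class DefaultFooVisitor<T> implements Foo.Visitor<T> {
  private final T defaultValue;

  protected DefaultFooVisitor(T defaultValue) { this.defaultValue = defaultValue; }

  @Override public T caseBar(Foo.Bar b) { return defaultValue; }
  @Override public T caseBaz(Foo.Baz b) { return defaultValue; }
}

class BarCounter {
  static int countBars(Foo f) {
    return f.match(new DefaultFooVisitor<Integer>(0) {
      @Override public Integer caseBar(Foo.Bar b) { return b.bar; }

      // Added after the compiler flagged this call site: Qux carries bars too.
      @Override public Integer caseQux(Foo.Qux q) { return q.bars; }
    });
  }

  public static void main(String[] args) {
    System.out.println(countBars(new Foo.Bar(42)));         // 42
    System.out.println(countBars(new Foo.Baz("Luhrmann"))); // 0
    System.out.println(countBars(new Foo.Qux(7)));          // 7
  }
}
```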

Void or Unit return values

We generally define our visitors as being parameterized by a return type, but sometimes no return value is needed. At Wealthfront we have a Unit type with a singleton Unit.unit value to represent a return value that isn’t meaningful, but java.lang.Void is also used.
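import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class PartitionFoos {
  // Sorts Foos into Bars and Bazes; the visitor is run only for its
  // side effects, so its return type is Void.
  void doStuff(Collection<Foo> foos) {
    final List<Foo.Bar> bars = new ArrayList<>();
    final List<Foo.Baz> bazes = new ArrayList<>();
    for (Foo f : foos) {
      f.match(new Foo.Visitor<Void>() {

        @Override
        public Void caseBar(Foo.Bar b) {
          bars.add(b);
          return null; // Void has no instances, so null is the only value
        }

        @Override
        public Void caseBaz(Foo.Baz b) {
          bazes.add(b);
          return null;
        }

      });
    }
  }

}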
I’ve used Void in this example to avoid introducing a custom Unit type, but I feel compelled to link to a discussion of why this is not ideal from a functional perspective: void vs. unit.
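For comparison, here is the same partitioning with a Unit type like the one described above. The Unit class is an assumption about its shape (a private constructor and a single Unit.unit instance), not Wealthfront’s actual code, and Foo is restated in trimmed form so the sketch compiles on its own. The practical difference from Void is that every case returns a real value instead of null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Assumed shape of a Unit type: one shared instance, no state.
final class Unit {
  static final Unit unit = new Unit();
  private Unit() {}
}

abstract class Foo {
  abstract <T> T match(Visitor<T> visitor);

  interface Visitor<T> {
    T caseBar(Bar b);
    T caseBaz(Baz b);
  }

  static class Bar extends Foo {
    final int bar;
    Bar(int bar) { this.bar = bar; }
    @Override <T> T match(Visitor<T> visitor) { return visitor.caseBar(this); }
  }

  static class Baz extends Foo {
    final String baz;
    Baz(String baz) { this.baz = baz; }
    @Override <T> T match(Visitor<T> visitor) { return visitor.caseBaz(this); }
  }
}

class PartitionFoos {
  final List<Foo.Bar> bars = new ArrayList<>();
  final List<Foo.Baz> bazes = new ArrayList<>();

  void partition(Collection<Foo> foos) {
    for (Foo f : foos) {
      // The Unit result is ignored; only the side effects matter.
      f.match(new Foo.Visitor<Unit>() {
        @Override public Unit caseBar(Foo.Bar b) { bars.add(b); return Unit.unit; }
        @Override public Unit caseBaz(Foo.Baz b) { bazes.add(b); return Unit.unit; }
      });
    }
  }

  public static void main(String[] args) {
    PartitionFoos p = new PartitionFoos();
    p.partition(Arrays.<Foo>asList(new Foo.Bar(1), new Foo.Baz("Luhrmann"), new Foo.Bar(2)));
    System.out.println(p.bars.size() + " bars, " + p.bazes.size() + " baz"); // 2 bars, 1 baz
  }
}
```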

Destructuring pattern matching

These cover most of what you are likely to need, and probably 90% of our use of the visitor pattern, but there’s one more technique worth mentioning. My Scala example above doesn’t show the full power of pattern matching, because it only matches on the type. With case classes, or with a custom unapply method, I can match not just on the types of objects but on details of their internal structure. For example, using types similar to the ones before, here’s a version that treats any Bar above 10 as “many” (capped at 10 in the Scala version).
object patterns {
  abstract class Foo
  case class Bar(val bar: Int) extends Foo
  case class Baz(val baz: String) extends Foo

  def handle(f: Foo) =
    f match {
      case Bar(i) if (i > 10) => 10
      case Bar(i) => i
      case Baz(s) => s
    }
  
  handle(new Bar(42))                             //> res0: Any = 10
  handle(new Bar(3))                              //> res1: Any = 3
  handle(new Baz("Luhrmann"))                     //> res2: Any = Luhrmann
}
Since this is a language feature in Scala, it’s flexible and easy to use. You can simulate the same behavior in Java, but you need to encode the cases that are allowed into the visitor itself.
abstract class Foo {
  abstract <T> T match(Visitor<T> visitor);

  interface Visitor<T> {
    T caseManyBar();
    T caseBar(int i);
    T caseBaz(String s);
  }

  static class Bar extends Foo {
    private int bar;

    private Bar(int bar) {
      this.bar = bar;
    }
    
    @Override
    <T> T match(Visitor<T> visitor) {
      return bar > 10 ? visitor.caseManyBar() : visitor.caseBar(bar);
    }
  }

  static class Baz extends Foo {
    private String baz;

    private Baz(String baz) {
      this.baz = baz;
    }
    
    @Override
    <T> T match(Visitor<T> visitor) {
      return visitor.caseBaz(baz);
    }
  }

  static void handle(Foo f) {
    System.out.println(f.match(new Visitor<String>() {
      @Override
      public String caseBar(int i) {
        return "I have " + i + " bars";
      }
      @Override
      public String caseManyBar() {
        return "I have many bars";
      }
      @Override
      public String caseBaz(String s) {
        return "I have the " + s + " baz";
      }
    }));
  }

  public static void main(String[] args) {
    handle(new Bar(42));
    handle(new Bar(3));
    handle(new Baz("Luhrmann"));
  }
}
By one measure, 16 vs. 58 lines is a big difference, but you could also argue that 42 lines of additional boilerplate is worth it to simulate this powerful functionality in Java. Destructuring pattern matching is most useful for value types, that is, objects that just represent collections of data with no other identity attached. For entity types, which are defined by their identity regardless of the values they currently hold, it’s better to use the basic visitor.
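A type that holds either a value or an error is a typical value type, so here is a minimal destructuring visitor for one. Result, success, failure, caseSuccess, and caseFailure are hypothetical names chosen for illustration. The visitor receives the unpacked value or error message rather than the subclass, and because the subclasses and constructor are private, callers can only inspect a Result through match.

```java
abstract class Result<V> {
  // Private constructor: only the nested subclasses below can extend Result.
  private Result() {}

  abstract <T> T match(Visitor<V, T> visitor);

  interface Visitor<V, T> {
    T caseSuccess(V value);      // destructured: the value, not the Result
    T caseFailure(String error); // destructured: the error message
  }

  static <V> Result<V> success(V value) { return new Success<V>(value); }

  static <V> Result<V> failure(String error) { return new Failure<V>(error); }

  private static final class Success<V> extends Result<V> {
    private final V value;
    Success(V value) { this.value = value; }
    @Override <T> T match(Visitor<V, T> visitor) { return visitor.caseSuccess(value); }
  }

  private static final class Failure<V> extends Result<V> {
    private final String error;
    Failure(String error) { this.error = error; }
    @Override <T> T match(Visitor<V, T> visitor) { return visitor.caseFailure(error); }
  }
}

class ResultDemo {
  static String render(Result<Integer> r) {
    return r.match(new Result.Visitor<Integer, String>() {
      @Override public String caseSuccess(Integer value) { return "got " + value; }
      @Override public String caseFailure(String error) { return "failed: " + error; }
    });
  }

  public static void main(String[] args) {
    System.out.println(render(Result.success(42)));                 // got 42
    System.out.println(render(Result.<Integer>failure("timeout"))); // failed: timeout
  }
}
```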

Is this really pattern matching, and how useful is it?

Some might object that this isn’t “really” pattern matching, and I would agree. Pattern matching is a language-level feature that lets you operate on subclasses in a type-safe way (among other things). The type-safe visitor pattern lets you do the same even without language support for pattern matching. As to its utility, we use it extensively at Wealthfront, and once people become familiar with it, it’s great. Pretty much every polymorphic entity has a visitor, which makes it safe for us to add new subtypes, since the compiler points us to every place where the new case needs to be handled. Visitors on value types, especially destructuring visitors, are much less common; we use them in a few places, for things like Result objects that represent either a result or an error. Give it a try the next time you run into a ClassCastException in your code.