Tuesday, June 18, 2013

Implementing an external service lookup cache

We have a form in our web application that takes an address. The field is an auto-complete field: as the user types their address, the form shows matching addresses. Our organization has a web service for address lookup. It takes a string (a partial address) and returns the addresses that start with that string, up to a maximum of 1000. I wanted to cache these lookups, and it turned out to be an interesting little problem. How do we cache the addresses and, more importantly, how do we decide whether to hit the service or the cache?

An initial thought was that we could store all addresses from the service in a HashSet to avoid duplicates. We could check if there are any items in the cache that start with the search string and if so, we can return those items. Here's what this lookup class might look like.

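using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;

public class AddressLookup
{
    public static AddressLookup Instance { get; private set; }

    // Private lock object; locking on "this" would let outside code take the same lock.
    private readonly object _sync = new object();

    // Every address returned by the service so far, kept in the ASP.NET cache.
    private ISet<AddressType> Addresses
    {
        get
        {
            lock (_sync)
            {
                // Read the entry once: the cache may evict it between a null check and a second read.
                var addresses = HttpRuntime.Cache["Addresses"] as ISet<AddressType>;
                if (addresses == null)
                {
                    addresses = new HashSet<AddressType>();
                    HttpRuntime.Cache["Addresses"] = addresses;
                }
                return addresses;
            }
        }
    }

    static AddressLookup()
    {
        Instance = new AddressLookup();
    }

    public IEnumerable<AddressType> GetByPartialAddress(string searchAddress)
    {
        lock (_sync)
        {
            searchAddress = searchAddress.ToUpperInvariant();
            var cache = Addresses;

            // Serve from the cache if any cached address starts with the search string.
            var cachedAddresses = cache.Where(address => address.FormattedAddress.StartsWith(searchAddress, StringComparison.InvariantCultureIgnoreCase)).ToArray();
            if (cachedAddresses.Length > 0)
            {
                return cachedAddresses;
            }

            // Nothing matched: call the web service and remember what it returned.
            var addresses = new AddressManager().RetrieveByPartialAddress(searchAddress);
            cache.UnionWith(addresses);
            return addresses;
        }
    }
}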

This idea works, but consider the worst case: we traverse a large collection, find no match, and then call the web service anyway. There is also the problem of an incomplete cache, because the web service only returns the top 1000 results. To illustrate the second problem, consider a web service that only returns the top 2 results.

Total data set: 1, 12, 122
Uncached results for search string "1": 1, 12
Uncached results for search string "12": 12, 122
Cached results for search string "12" after searching string "1": 12
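
The rows above can be reproduced with a small sketch. Everything in it is a hypothetical stand-in rather than the real code: QueryService plays the role of the web service with a limit of 2, and NaiveLookup is the first listing reduced to plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TruncatedCacheDemo
{
    // Hypothetical total data set behind the service.
    private static readonly string[] AllAddresses = { "1", "12", "122" };

    // The naive cache: every address seen so far, with no record of which searches were made.
    private static readonly HashSet<string> Cache = new HashSet<string>();

    // Stand-in for the web service: prefix match, top 2 results only.
    public static string[] QueryService(string prefix)
    {
        return AllAddresses
            .Where(a => a.StartsWith(prefix, StringComparison.Ordinal))
            .Take(2)
            .ToArray();
    }

    // Same decision rule as the first listing: any cached match counts as a hit.
    public static string[] NaiveLookup(string prefix)
    {
        var cached = Cache
            .Where(a => a.StartsWith(prefix, StringComparison.Ordinal))
            .ToArray();
        if (cached.Length > 0)
        {
            return cached;
        }

        var fresh = QueryService(prefix);
        Cache.UnionWith(fresh);
        return fresh;
    }
}
```

Calling NaiveLookup("1") and then NaiveLookup("12") returns only 12 for the second call, while QueryService("12") returns 12 and 122. The address 122 never reaches the cache, because the cached 12 is enough to count as a hit.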

To solve this, I decided to cache the searches in addition to the address list. This addresses both issues. A cache miss is now detected with a single hash lookup on the search string instead of a full scan of the address list, and a cache hit returns at least everything the service returned for that exact search. Below is the implementation:

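using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;

public class AddressLookup
{
    public static AddressLookup Instance { get; private set; }

    // Private lock object; locking on "this" would let outside code take the same lock.
    private readonly object _sync = new object();

    // Search strings whose results have already been fetched from the service.
    private ISet<string> Searches
    {
        get
        {
            lock (_sync)
            {
                // Read the entry once: the cache may evict it between a null check and a second read.
                var searches = HttpRuntime.Cache["AddressSearches"] as ISet<string>;
                if (searches == null)
                {
                    searches = new HashSet<string>();
                    HttpRuntime.Cache["AddressSearches"] = searches;
                }
                return searches;
            }
        }
    }

    // Every address returned by the service so far.
    private ISet<AddressType> Addresses
    {
        get
        {
            lock (_sync)
            {
                var addresses = HttpRuntime.Cache["Addresses"] as ISet<AddressType>;
                if (addresses == null)
                {
                    addresses = new HashSet<AddressType>();
                    HttpRuntime.Cache["Addresses"] = addresses;
                    // The two entries can be evicted independently. Once the addresses are gone,
                    // the recorded searches no longer describe the cache, so drop them as well.
                    HttpRuntime.Cache.Remove("AddressSearches");
                }
                return addresses;
            }
        }
    }

    static AddressLookup()
    {
        Instance = new AddressLookup();
    }

    public IEnumerable<AddressType> GetByPartialAddress(string searchAddress)
    {
        lock (_sync)
        {
            searchAddress = searchAddress.ToUpperInvariant();

            // Read Addresses first, because recreating it clears the recorded searches.
            var cache = Addresses;
            var searches = Searches;

            if (searches.Contains(searchAddress))
            {
                // ToArray runs the query while the lock is held; a lazy query would be
                // enumerated later, possibly while another thread modifies the set.
                return cache.Where(address => address.FormattedAddress.StartsWith(searchAddress, StringComparison.InvariantCultureIgnoreCase)).ToArray();
            }

            var addresses = new AddressManager().RetrieveByPartialAddress(searchAddress);
            cache.UnionWith(addresses);
            // Record the search only after the service call has succeeded.
            searches.Add(searchAddress);
            return addresses;
        }
    }
}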

The next improvement would be a rolling cache, one that only keeps the n most recent searches. I haven't taken the implementation that far, but one approach would be to store the results per search in a dictionary rather than keeping a global address list. Let me know how you would implement a rolling cache, or if you have suggestions for improving my implementation.
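
As a starting point for that discussion, here is a minimal sketch of the dictionary idea, under a few assumptions of mine. The class name RollingSearchCache, the capacity parameter and the fetch delegate (standing in for the call to the address service) are all hypothetical, "most recent" is taken to mean most recently used, and the code uses newer C# syntax such as nameof and out variables.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: keeps the results of the n most recently used searches.
public class RollingSearchCache<TResult>
{
    // One cached search together with the results it produced.
    private sealed class Entry
    {
        public string Search;
        public IReadOnlyList<TResult> Results;
    }

    private readonly int _capacity;
    private readonly Func<string, IReadOnlyList<TResult>> _fetch;
    private readonly object _sync = new object();

    // Search string -> its node in the recency list. Keys ignore case,
    // mirroring the upper-casing done in the listings above.
    private readonly Dictionary<string, LinkedListNode<Entry>> _entries =
        new Dictionary<string, LinkedListNode<Entry>>(StringComparer.OrdinalIgnoreCase);

    // Most recently used search at the front, least recently used at the back.
    private readonly LinkedList<Entry> _recency = new LinkedList<Entry>();

    public RollingSearchCache(int capacity, Func<string, IReadOnlyList<TResult>> fetch)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        if (fetch == null) throw new ArgumentNullException(nameof(fetch));
        _capacity = capacity;
        _fetch = fetch;
    }

    public IReadOnlyList<TResult> Get(string search)
    {
        lock (_sync)
        {
            LinkedListNode<Entry> node;
            if (_entries.TryGetValue(search, out node))
            {
                // Hit: move the search to the front so it is evicted last.
                _recency.Remove(node);
                _recency.AddFirst(node);
                return node.Value.Results;
            }

            // Miss: fetch while holding the lock (as the listings above do), then store.
            var results = _fetch(search);
            _entries[search] = _recency.AddFirst(new Entry { Search = search, Results = results });

            // Over capacity: drop the least recently used search.
            if (_entries.Count > _capacity)
            {
                var oldest = _recency.Last;
                _recency.RemoveLast();
                _entries.Remove(oldest.Value.Search);
            }
            return results;
        }
    }
}
```

Because each search keeps its own result list, a hit returns exactly what the service returned for that string, and evicting a search removes its results with it. The trade-off is that addresses shared by several searches are stored more than once.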