Maintaining the HTML cache clearer

The HTML cache clearer does exactly what its name implies: it clears the Sitecore HTML caches.  More specifically, it clears Sitecore’s HTML caches upon observing a publish end event.  More specifically still, it clears the HTML cache of all websites that are registered with the HTML cache clearer upon observing a publish end event.  There are quite a few posts around about how performance of the HTML cache can be improved (*cough* John West *cough*), but one thing that people don’t seem to mention often is how annoying it is to maintain the configuration correctly.  This post outlines the method that I prefer to use, and it works pretty well for me so hopefully someone else will find it useful too.

In a nutshell, the HTML cache is analogous to the ASP.NET output cache when used on individual controls (indeed, in some places Sitecore refers to it as the ‘output cache’, but I prefer to use ‘HTML cache’ to avoid confusion).  In Sitecore terms, the HTML cache takes the rendered HTML of a particular rendering and stores it in memory until it is next required.  The next time that rendering is needed on a page, Sitecore will use the cached HTML rather than instantiating the rendering.  On top of this, Sitecore also understands various “vary by” directives to grant better control over how your HTML gets cached.  For more information, I’d suggest referring to the Sitecore documentation – this cache has been around for a very long time, and it’s pretty well documented.  The key thing to understand is that, unlike the ASP.NET output cache, HTML fragments do not expire by default (though this can be configured in a variety of ways).
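For context, the HTML cache is switched on and sized per site through attributes on the site definition.  The fragment below is only a sketch of an include file that sets them for the default site; the 50MB figure is an arbitrary illustration rather than a recommendation, and individual renderings still have to be marked as cacheable before anything is stored.

```xml
<?xml version="1.0"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:set="http://www.sitecore.net/xmlconfig/set/">
  <sitecore>
    <sites>
      <!-- Enable the HTML cache for the default site and cap its size (illustrative value) -->
      <site name="website" set:cacheHtml="true" set:htmlCacheSize="50MB" />
    </sites>
  </sitecore>
</configuration>
```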

The HTML cache clearer is managed by the following snippet of configuration in the <events> section of Sitecore’s configuration:

<handler type="Sitecore.Publishing.HtmlCacheClearer, Sitecore.Kernel" method="ClearCache">
  <sites hint="list">
    <site>website</site>
  </sites>
</handler>

As you can see, it contains a single entry for website by default – this is the name of the default site that Sitecore ships with.  When invoked, the handler iterates over the registered websites and clears the HTML cache of each one.  Most people’s initial response to hearing this is one of horror: “What’s the point of having a cache if you just nuke it periodically?”  The problem is that it’s not quite that straightforward.  There isn’t a 1:1 mapping between an item and the HTML generated from it: a single rendering can draw on many items, and a single item can feed many renderings.  Rather than try to track the items that go into each chunk of HTML, Sitecore pragmatically opts to clear the caches.  If you want to look further into changing this yourself – feel free, I know a lot of people would be interested, but it’s not quite as simple as it sounds.

So, the above XML is irritating to maintain for three reasons:

  1. The above XML actually appears twice in the configuration: once in publish:end, and once in publish:end:remote.
  2. You can’t add extra <site> entries to the list using configuration include files (well, you can, but only after pretty much replacing the whole declaration).
  3. Each time you add a website, you have to remember to add a corresponding entry to ensure the caches get cleared.
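To make that concrete, here is roughly what the stock configuration ends up looking like once a second site has been added.  The site name secondsite is hypothetical and only there for illustration; the point is that the same list has to be kept in step in both events.

```xml
<events>
  <event name="publish:end">
    <handler type="Sitecore.Publishing.HtmlCacheClearer, Sitecore.Kernel" method="ClearCache">
      <sites hint="list">
        <site>website</site>
        <!-- hypothetical second site: must be added here... -->
        <site>secondsite</site>
      </sites>
    </handler>
  </event>
  <event name="publish:end:remote">
    <handler type="Sitecore.Publishing.HtmlCacheClearer, Sitecore.Kernel" method="ClearCache">
      <sites hint="list">
        <site>website</site>
        <!-- ...and again here, or remote instances keep serving stale HTML -->
        <site>secondsite</site>
      </sites>
    </handler>
  </event>
</events>
```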

The first irks me because you’re forced to repeat yourself (Don’t Repeat Yourself), but it’s only really a nuisance because it has to be constantly maintained (point 3).  Indeed, the repetition is fairly common when dealing with local and remote events.  The second is a pain because many developers – and I include myself in this – can be somewhat lazy at times.  If you make the “right way” hard enough, people are going to start taking the path of least resistance and start changing the web.config directly.  The third is the worst because it means that you have to keep re-visiting this obscure bit of configuration every time you add a new site.  Sooner or later, someone will forget to update it.

Solving the maintenance problem is actually rather easy.  Taking away the need to constantly alter the HtmlCacheClearer configuration upon adding a new site solves the latter two points, and makes the first bearable.  If you have a solution with any reasonable number of websites, you’re probably already using site property inheritance.  If you aren’t, then you should check out one of my previous posts about it.  The idea is to mark every website whose caches should be cleared with a custom property (I use one called contentManaged), so that we can identify those sites at start-up and register them automatically (incidentally, having such a property is useful in so many other circumstances – I’d always recommend having it).  Assuming you’re using the default site provider (effectively, assuming new websites can’t pop up at runtime), a class derived from HtmlCacheClearer with the following constructor works:

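using System;
using Sitecore.Configuration;
using Sitecore.Diagnostics;
using Sitecore.Publishing;

// Derives from the stock clearer so that Sites and ClearCache are inherited.
public class CustomHtmlCacheClearer : HtmlCacheClearer
{
    // Registers every configured site whose custom property matches the expected value.
    public CustomHtmlCacheClearer(string propertyName, string propertyValue)
    {
        foreach (var siteInfo in Factory.GetSiteInfoList())
        {
            if (string.Equals(propertyValue, siteInfo.Properties[propertyName], StringComparison.Ordinal))
            {
                Log.Debug("Registered site '" + siteInfo.Name + "' with HtmlCacheClearer.", this);
                Sites.Add(siteInfo.Name);
            }
        }
    }
}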

Then you create an include file a bit like the one below:

<?xml version="1.0"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:set="http://www.sitecore.net/xmlconfig/set/">
  <sitecore>
    <events>
      <event name="publish:end">
        <handler type="Sitecore.Publishing.HtmlCacheClearer, Sitecore.Kernel">
          <patch:delete />
        </handler>
        <handler type="MyAssembly.CustomHtmlCacheClearer, MyAssembly" method="ClearCache">
          <param desc="propertyName">contentManaged</param>
          <param desc="propertyValue">true</param>
        </handler>
      </event>
      <event name="publish:end:remote">
        <handler type="Sitecore.Publishing.HtmlCacheClearer, Sitecore.Kernel">
          <patch:delete />
        </handler>
        <handler type="MyAssembly.CustomHtmlCacheClearer, MyAssembly" method="ClearCache">
          <param desc="propertyName">contentManaged</param>
          <param desc="propertyValue">true</param>
        </handler>
      </event>
    </events>

    <sites>
      <site name="website" set:contentManaged="true" />
    </sites>
  </sitecore>
</configuration>

You should now have an HtmlCacheClearer that consistently clears the HTML caches of every registered website in Sitecore whose contentManaged property is equal to true.  And if you do use site property inheritance, then any new websites you add will inherit it automatically.
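As a rough sketch of that last point, a new site that inherits from website might be declared as below.  The site name secondsite and its host name are made up for illustration, and I’m assuming the inherits attribute carries the custom property across in the same way as the built-in ones, so it’s worth confirming on your version that the new site shows up in the debug log at start-up.

```xml
<?xml version="1.0"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <sites>
      <!-- Hypothetical new site: no contentManaged attribute of its own, it is inherited from 'website' -->
      <site name="secondsite" inherits="website" hostName="second.example.com" patch:before="site[@name='website']" />
    </sites>
  </sitecore>
</configuration>
```

With that in place, there should be nothing to add to either event handler when the site is introduced.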

For anyone who’s really against having the repeated handler definitions, it is possible to trim the event registrations down further to reduce duplication, but I’m planning to cover that in a future post.

3 thoughts on “Maintaining the HTML cache clearer”

  1. Thanks for sharing – I’ve decided to replace the _sites initialization with SiteManager.GetSites().Where(x => x.Properties["contentManaged"] == "true"), which seems to be the most compact and readable for me. That does away with your property Name/Value definition in the config and is even more DRY and compact – though the convention for the attribute name and value becomes less obvious.

    1. Hi Jan, thanks for the feedback!
      I agree that your method is definitely neater and more compact, but one thing I like to try and show in my posts is the various ways that you can use Sitecore configuration to make things a bit more flexible and that’s why I went down the route of using the constructor parameters. I’m actually planning a post on how you can use the ref directive in your configuration to re-use config fragments.
