RLP's Computing Blog

Making a "Recent Changes" Sidebar In Hakyll

A while back (Nov 2017; current writing is Dec 2020) I was trying to figure out how to do a Recent Changes sidebar in Hakyll (a Haskell-based static site generator), and it turns out that there are some gotchas. In particular, a Recent Changes list needs to render at least some of each page that it wants to talk about, and if you do it the naive way you end up with circular dependencies; each page depends on every other page, including itself, since it renders every page to make the recency list.

Note that generating a recency page is easy, and there are a number of tutorials on it; it’s generating a recency sidebar that’s hard.
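To make the circularity concrete, here is a toy model in plain Haskell; it does not use Hakyll at all. The `Graph` type, `dependsOnItself`, and the post names are all made up for illustration, and the "two-pass" graph is only a rough picture of the compile-twice approach discussed below, not a description of how Hakyll actually tracks dependencies.

```haskell
import Data.Maybe (fromMaybe)

-- | A dependency graph: each node is listed with the nodes it depends on.
type Graph = [(String, [String])]

-- | True if following dependency edges from `start` can lead back to `start`.
dependsOnItself :: Graph -> String -> Bool
dependsOnItself graph start = go [] (neighbours start)
  where
    neighbours n = fromMaybe [] (lookup n graph)

    go _ [] = False
    go seen (n : rest)
        | n == start    = True
        | n `elem` seen = go seen rest
        | otherwise     = go (n : seen) (neighbours n ++ rest)

-- | Naive sidebar: every rendered post needs every rendered post,
-- including itself.
naive :: Graph
naive = [ (p, posts) | p <- posts ]
  where
    posts = ["a", "b", "c"]

-- | Two-pass sidebar: each full page needs only the bare "recents"
-- renders, and those need nothing else.
twoPass :: Graph
twoPass =
    [ (p, map (++ " (recents)") posts) | p <- posts ]
        ++ [ (p ++ " (recents)", []) | p <- posts ]
  where
    posts = ["a", "b", "c"]
```

In this model `dependsOnItself naive "a"` is `True`, while `dependsOnItself twoPass "a"` is `False`: splitting each post into a bare render and a full render is what removes the cycle.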

My earlier post about this references a couple of places where people have discussed the problem, and it has come up in several other posts as well.

The thing that people mostly refer to, though, is a post from the Hakyll author in which he lays out two basic approaches:

  1. Compile the items twice.
  2. Get only the metadata for the recency items.

For some reason I (and others, see my Stack Overflow post) had trouble with #2, so I ended up with #1, like so:

    match "posts/**.md" $ version "recents" $ do
        route $ (gsubRoute "posts/" (const "")) `composeRoutes` setExtension "html"
        compile $ do
            pandocCompilerWithTransform hblogPandocReaderOptions hblogPandocWriterOptions (titleFixer titles)
                >>= loadAndApplyTemplate "templates/post.html"    (postCtx allTags allCategories gitTimes)
                >>= relativizeUrls

    match "posts/**.md" $ do
        route $ (gsubRoute "posts/" (const "")) `composeRoutes` setExtension "html"
        compile $ do
            myId <- getUnderlying
            -- Load the posts we need for the Recent Changes list;
            -- see the 'version "recents"' explanation above.
            recents <- (selectRecents myId) =<< (myRecentFirst gitTimes) =<< loadAll ("posts/**.md" .&&. hasVersion "recents")
            let postsContext = postCtx allTags allCategories gitTimes `mappend`
                               -- Distinguish things like archive.html from regular posts
                               constField "article" "yes"            `mappend`
                               -- Show recent posts
                               listField "recents" (postCtx allTags allCategories gitTimes) (return recents)

            pandocCompilerWithTransform hblogPandocReaderOptions hblogPandocWriterOptions (titleFixer titles)
                >>= loadAndApplyTemplate "templates/post.html"    postsContext
                >>= loadAndApplyTemplate "templates/default.html" postsContext
                >>= relativizeUrls

However, it turns out that there are some surprise problems waiting in the wings with this approach, because of some subtle details of how it works, specifically:

  1. It compiles everything twice, as discussed.
  2. The routes for both compilations are exactly the same; this is important because you need the recency list to point to the same URLs that the final compiled pages end up at, so like if your post is at posts/foo.html, having the recency list point to recents/foo.html is silly and unhelpful.
  3. The recency compilation is in a limited context with a different template set. In particular, in my case the “default.html” template has all the visual prettiness (as well as the reference to the recents template variable, which is what actually gets us the recents sidebar), so the things compiled for the recency list are very bare looking (and, obviously, don’t have the recency sidebar, since if they did we’d be right back to the circular dependency).

This means that this solution implicitly relies on Hakyll’s automatic dependency ordering: the regular posts must depend on their own recency posts, so that the regular posts/foo.html is laid down after the recency version of posts/foo.html, otherwise you end up with the bare version. This is the key insight: Hakyll does, in fact, write out both versions; this setup working at all relies on the order in which it does so.

Having realized that, it already feels a lot more hacky than it did at first, huh?
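The "whoever writes last wins" behaviour can be sketched in plain Haskell. This is a toy model only: the `Write` type, `replayWrites`, and the two orderings are invented for illustration and say nothing about how Hakyll's writer is implemented. The only assumption carried over from the setup above is that both versions are routed to the same file.

```haskell
import Data.List (foldl')

-- | One compiled item being written out: which version of the post it is
-- (Nothing = the regular one), where it is routed, and what it contains.
data Write = Write
    { writeVersion :: Maybe String
    , writeRoute   :: FilePath
    , writeBody    :: String
    } deriving (Show, Eq)

-- | Replay a sequence of writes in order; a later write to a route
-- replaces an earlier one, as with files on disk.
replayWrites :: [Write] -> [(FilePath, String)]
replayWrites = foldl' step []
  where
    step files w =
        (writeRoute w, writeBody w)
            : filter ((/= writeRoute w) . fst) files

-- | The order the setup relies on: bare "recents" render first.
goodOrder :: [Write]
goodOrder =
    [ Write (Just "recents") "foo.html" "bare"
    , Write Nothing          "foo.html" "full"
    ]

-- | The same two writes, the other way round.
badOrder :: [Write]
badOrder = reverse goodOrder
```

Here `replayWrites goodOrder` leaves `foo.html` holding `"full"`, and `replayWrites badOrder` leaves it holding `"bare"`; nothing but ordering separates a working site from a broken one.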

Anyway, this worked fine, but my blog also has categories, i.e. sub-sites (honestly, if I had it to do over again I might have just made them entirely separate Hakyll instances, but whatever). I was having a problem where every category was seeing every other category in its recency list; this is awkward when the main reason I have categories is so that my super-TMI personal posts are entirely distinct from my computing posts. So I did this:

    match "posts/**.md" $ version "recents" $ do
        route $ (gsubRoute "posts/" (const "")) `composeRoutes` setExtension "html"
        compile $ do
            pandocCompilerWithTransform hblogPandocReaderOptions hblogPandocWriterOptions (titleFixer titles)
                >>= loadAndApplyTemplate "templates/post.html"    (postCtx allTags allCategories gitTimes)
                >>= relativizeUrls

    match "posts/**.md" $ do
        route $ (gsubRoute "posts/" (const "")) `composeRoutes` setExtension "html"
        compile $ do
            myId <- getUnderlying
            categs <- myGetCategory myId
            -- Load the posts we need for the Recent Changes list;
            -- see the 'version "recents"' explanation above.
            recents <- (selectRecents myId) =<< (myRecentFirst gitTimes) =<< loadAll ((fromGlob $ "posts/" ++ (mconcat categs) ++ "/**") .&&. hasVersion "recents")
            let postsContext = postCtx allTags allCategories gitTimes `mappend`
                               -- Distinguish things like archive.html from regular posts
                               constField "article" "yes"            `mappend`
                               -- Show recent posts
                               listField "recents" (postCtx allTags allCategories gitTimes) (return $ take 3 recents)

            pandocCompilerWithTransform hblogPandocReaderOptions hblogPandocWriterOptions (titleFixer titles)
                >>= loadAndApplyTemplate "templates/post.html"    postsContext
                >>= loadAndApplyTemplate "templates/default.html" postsContext
                >>= relativizeUrls

The relevant bit of the diff:

         route $ (gsubRoute "posts/" (const "")) `composeRoutes` setExtension "html"
         compile $ do
             myId <- getUnderlying
+            categs <- myGetCategory myId
             -- Load the posts we need for the Recent Changes list;
             -- see the 'version "recents"' explanation above.
-            recents <- (selectRecents myId) =<< (myRecentFirst gitTimes) =<< loadAll ("posts/**.md" .&&. hasVersion "recents")
+            recents <- (selectRecents myId) =<< (myRecentFirst gitTimes) =<< loadAll ((fromGlob $ "posts/" ++ (mconcat categs) ++ "/**") .&&. hasVersion "recents")
             let postsContext = postCtx allTags allCategories gitTimes `mappend`
                                -- Distinguish things like archive.html from regular posts
                                constField "article" "yes"            `mappend`

Seems super innocuous and reasonable, right?

Now, I never figured out exactly what happened here, honestly, but somehow this changed the shape of the dependency tree, with the result that the pages with no category (of which I have only a few) ended up with the recency version of the page (the bare, boring one) being the thing that got written out last.

It is worth noting that you can actually see this occur in the output:

  updated posts/index.md
  updated posts/meta/index.md
  updated posts/index.md (recents)
  updated posts/meta/index.md (recents)

That “(recents)” bit means “here’s where I compiled this thing with the version set to ‘recents’”, and the fact that those come second is bad. I didn’t realize that the output had been telling me about the problem the whole time until way, way late in the process.
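Since the build output already carries the ordering, a small checker can flag the bad case. The sketch below is plain Haskell with no Hakyll dependency, and it assumes log lines have exactly the shape shown above (`updated <path>` with an optional ` (recents)` suffix); the names `parseUpdated` and `clobbered` are made up for this example.

```haskell
import Data.List (isSuffixOf, stripPrefix)
import Data.Maybe (mapMaybe)

-- | Parse one log line of the shape shown above into
-- (path, True if it is the "recents" version). Other lines give Nothing.
parseUpdated :: String -> Maybe (String, Bool)
parseUpdated line = do
    rest <- stripPrefix "updated " (dropWhile (== ' ') line)
    let suffix = " (recents)"
    pure $
        if suffix `isSuffixOf` rest
            then (take (length rest - length suffix) rest, True)
            else (rest, False)

-- | Paths whose "recents" version appears after the regular version,
-- i.e. the pages that will end up as the bare render.
clobbered :: [String] -> [String]
clobbered logLines =
    [ path
    | (i, (path, False)) <- indexed
    , any (\(j, (p, isRecents)) -> isRecents && p == path && j > i) indexed
    ]
  where
    indexed = zip [0 :: Int ..] (mapMaybe parseUpdated logLines)
```

Fed the four lines above, `clobbered` returns both `posts/index.md` and `posts/meta/index.md`; with the "(recents)" lines first it returns an empty list.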

I played around with many ways of handling this, but the thing that made me realize that I was dealing with hacky bullshit, and led to me figuring out the explanation I gave above, was when I discovered that adding these lines right after the “categs <-” line above made the problem go away:

            loads1 <- loadAll ("posts/computing/**" .&&. hasVersion "recents")
            traceM ("loads1: " ++ show (loads1 :: [Item String]))
            loads2 <- loadAll ("posts/**" .&&. hasVersion "recents")
            traceM ("loads2: " ++ show (loads2 :: [Item String]))

This, apparently, rejiggered the dependency tree for the non-category posts in whatever way was required to make their non-recency versions get written out last, because suddenly everything worked. Once I realized that simply adding or removing those lines would fix or restore the problem, I realized it had to be the dependency tree.

And also, EEEEWWWWWW.
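One detail in the category change may be worth checking, though this is a guess and nothing above confirms it. The glob is built with `mconcat categs`, so if `myGetCategory` returns an empty list for an uncategorized page (an assumption, suggested by the fact that only those pages broke), the pattern collapses to `posts//**`. The helper name `categoryGlob` below is made up; it just repeats the string concatenation from the rule in plain Haskell, with no Hakyll involved:

```haskell
-- | Build the glob string the same way the compile rule above does.
-- Note that several categories would simply be concatenated together.
categoryGlob :: [String] -> String
categoryGlob categs = "posts/" ++ mconcat categs ++ "/**"

-- | The glob for a post in the "computing" category.
computingGlob :: String
computingGlob = categoryGlob ["computing"]   -- "posts/computing/**"

-- | The glob for a post with no category at all.
uncategorizedGlob :: String
uncategorizedGlob = categoryGlob []          -- "posts//**"
```

If Hakyll's glob matching treats `posts//**` as matching nothing (again an assumption), an uncategorized page would load no "recents" items at all, and so would no longer depend on its own "recents" version. That would be consistent with the extra `loadAll` calls making the problem disappear.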

I spent far more time than I should have figuring this out, but the key was this mailing list post, in which the author mentions that Danny Su’s blog totally has a recency list and it works great.

It turns out that Danny Su went the other way, the metadata way that I couldn’t get to work; you can check out his source, but the relevant bit is super, super simple:

recentPosts :: Compiler [Item String]
recentPosts = do
    identifiers <- getMatches "posts/*"
    return [Item identifier "" | identifier <- identifiers]

I mean, clearly not that simple because I couldn’t figure it out, but anyway.

What that does is use getMatches, which pulls the identifiers for everything we care about (which is basically everything) out of a store of metadata that Hakyll keeps for all the source items (where does that live? I have no idea). It then builds Item values out of those identifiers by simply setting their content to the empty string. These items then get wrapped up in a Compiler through magic that I frankly don’t understand, but it’s enough to let the templates retrieve the metadata, including the post’s title, which is all we actually need; we don’t need the body at all.
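Why an empty body is enough can be shown with a toy model. As far as the Hakyll documentation describes it, the metadata-backed fields in a context (the title, for instance) are looked up from the item's identifier rather than from its body. The sketch below imitates that in plain Haskell: the `Item` type here is a simplified stand-in for Hakyll's, and `MetadataStore`, `metadataFieldOf`, `bodylessItems`, and `exampleStore` are all invented for illustration.

```haskell
import Data.List (isPrefixOf)

-- | Simplified stand-in for Hakyll's Item: an identifier plus a body.
data Item a = Item
    { itemIdentifier :: FilePath
    , itemBody       :: a
    } deriving (Show, Eq)

-- | Stand-in for the metadata attached to each source file.
type MetadataStore = [(FilePath, [(String, String)])]

-- | Look a field up by the item's identifier; the body is never inspected.
metadataFieldOf :: MetadataStore -> String -> Item a -> Maybe String
metadataFieldOf store key item =
    lookup (itemIdentifier item) store >>= lookup key

-- | Body-less items for every identifier under a directory prefix
-- (a crude stand-in for getMatches with a glob pattern).
bodylessItems :: MetadataStore -> FilePath -> [Item String]
bodylessItems store prefix =
    [ Item ident "" | (ident, _) <- store, prefix `isPrefixOf` ident ]

-- | Two made-up posts with only a title in their metadata.
exampleStore :: MetadataStore
exampleStore =
    [ ("posts/computing/foo.md", [("title", "Foo")])
    , ("posts/career/bar.md",    [("title", "Bar")])
    ]
```

In this model, `map (metadataFieldOf exampleStore "title") (bodylessItems exampleStore "posts/computing/")` gives `[Just "Foo"]` even though every item's body is the empty string, and no post ever has to be compiled to produce the list.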

The relevant bits of my code ended up like so:

hblogMain :: IO ()
hblogMain = hakyll $ do
[snip]
    match "posts/**.md" $ do
        route $ (gsubRoute "posts/" (const "")) `composeRoutes` setExtension "html"
        compile $ do
            myId <- getUnderlying
            categs <- myGetCategory myId

            -- Normally we find recents from the current category,
            -- but for the meta pages we use computing *and* career
            let pattern = if null categs then
                    (fromGlob "posts/computing/**.md") .||. (fromGlob "posts/career/**.md")
                else
                    (fromGlob $ "posts/" ++ (head categs) ++ "/**.md")

            -- Load the posts we need for the Recent Changes list
            recents <- (selectNotSelf myId) =<< (myRecentFirst gitTimes) =<< myGetIdentifiers pattern

            let postsContext = postCtx allTags allCategories gitTimes `mappend`
                               -- Distinguish things like archive.html from regular posts
                               constField "article" "yes"            `mappend`
                               -- Show recent posts
                               listField "recents" (postCtx allTags allCategories gitTimes) (return $ take 3 recents)

            pandocCompilerWithTransform hblogPandocReaderOptions hblogPandocFinalWriterOptions (titleFixer titles)
                >>= loadAndApplyTemplate "templates/post.html"    postsContext
                >>= loadAndApplyTemplate "templates/default.html" postsContext
                >>= relativizeUrls

-- Given a pattern, searches the already-loaded metadata with
-- getMatches for items matching that pattern and returns them (with
-- no body)
--
-- Mostly stolen from https://github.com/dannysu/hakyll-blog/blob/321532e82d6e847f45c93f58f83b6b354be6da1a/src/HakyllHelper.hs
myGetIdentifiers :: Pattern -> Compiler [Item String]
myGetIdentifiers pattern = do
    identifiers <- getMatches pattern
    return [Item identifier "" | identifier <- identifiers]

FWIW, all my source is at https://github.com/rlpowell/hblog , including the (far more complex) handling of recency based on Git timestamps.