(Here's the other old post that was lurking about in drafts.)
We had a rather tricky issue to fix these past couple of days. We had implemented generating HTML snapshots using PhantomJS, which was working fairly splendidly. The only problem was that every snapshot we generated contained a duplicate header and footer. I should also clarify that the duplicates only appeared when we ran the process inside of the client's servlet code. When we spiked out a simple test application and pointed it at the client's pages, no duplicate header or footer showed up.
My initial suspicion was that the way we had implemented snapshotting was to blame. We based our solution on one given in the documentation on the Google Webmaster portal. We created a Java servlet filter and added it to the servlet filter chain. Every time a page request came through the Spring router, our servlet filter would check whether the URI query string contained the argument "_escaped_fragment_". If it did, we would generate our snapshot and return it as the response to that request. My assumption was that some other filter was interfering with this process and causing something to run a second time.
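To make the check concrete, here is a minimal sketch of the query-string test in plain Java. It is not our actual filter code: the class and method names are made up for illustration, and in a real servlet filter the string would come from the request's query string rather than being passed in directly.

```java
import java.util.Arrays;

// Hypothetical helper, not the client's code: decides whether a request's
// query string asks for an HTML snapshot.
final class EscapedFragmentCheck {
    static final String PARAM = "_escaped_fragment_";

    // Returns true when the raw query string has a parameter named
    // _escaped_fragment_, with or without a value.
    static boolean isSnapshotRequest(String queryString) {
        if (queryString == null || queryString.isEmpty()) {
            return false;
        }
        return Arrays.stream(queryString.split("&"))
                // Keep only the parameter name, i.e. the part before the first '='.
                .map(pair -> pair.split("=", 2)[0])
                .anyMatch(PARAM::equals);
    }
}
```

Matching on the parameter name, rather than doing a plain substring search, avoids treating something like "q=_escaped_fragment_" as a snapshot request.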
We dug into the servlet filters, trying to selectively disable them one at a time until we fixed the problem. Unfortunately our theory ended up bearing no fruit. Near as we could tell, disabling any of the filters caused far more dramatic problems than the one we were trying to solve.
It turned out that the issue was a bit more subtle than we had initially suspected. Our client primarily uses Spring for their web application. However, they also use another framework called Sitemesh. Both frameworks provide some overlapping functionality, and that overlap was making itself known when we performed our HTML Snapshots. In particular, Sitemesh was responsible for adding some static content to each page request, specifically (as you might have guessed) the header and footer.
Basically, this was happening when we requested an HTML snapshot of a given page:
- Our servlet filter requests an HTML snapshot of that page
- Spring and Sitemesh generate that page
- The HTML Snapshot is completed, and returned by our servlet filter
- The Sitemesh filter engages, and decorates our snapshot with a second header and footer
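The sequence above can be modelled with a toy example. This is only a sketch of the idea: the decorate method stands in for Sitemesh, the placeholder tags stand in for the real header and footer, and none of the names come from the client's code or from Sitemesh itself.

```java
// Toy model of the double decoration; all names here are hypothetical.
final class DoubleDecorationDemo {
    // Stand-in for the decorator: wraps a body in a header and footer.
    static String decorate(String body) {
        return "<header/>" + body + "<footer/>";
    }

    // Counts non-overlapping occurrences of needle in haystack.
    static int count(String haystack, String needle) {
        int hits = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            hits++;
            from += needle.length();
        }
        return hits;
    }

    // The page the snapshot tool loads is already decorated. If the
    // snapshot response is then decorated again, the header and footer
    // appear twice; if the request is excluded, they appear once.
    static String snapshotResponse(String content, boolean excludedFromDecoration) {
        String snapshot = decorate(content); // what the browser rendered
        return excludedFromDecoration ? snapshot : decorate(snapshot);
    }
}
```

In this model, count(snapshotResponse("<main/>", false), "<header/>") is 2, while passing true for the second argument brings it back down to 1.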
Once we discovered what Sitemesh was doing, we simply had to figure out how to fix it. That turned out to be fairly painless. Sitemesh is configured by a file called decorators.xml. In that file you can add exclusion patterns so that Sitemesh will not decorate any request matching one of them. We wanted Sitemesh to continue working the same way it always had; we just didn't want it to interfere with our HTML snapshots. We were able to solve that by adding an exclusion pattern that matches any request with _escaped_fragment_ in it. So that section of the file ended up looking like this:
<excludes>
    <pattern>/exclude.jsp</pattern>
    <pattern>/exclude/*</pattern>
    <pattern>*_escaped_fragment_*</pattern>
</excludes>
Definitely a tricky issue to figure out, but I'm glad we ran into it. It expanded my understanding of the client's architecture quite a bit.