Justin, I've read through this thread carefully and have started working my way through your presentation here:
http://assets.madcapsoftware.com/webina ... vesSEO.pdf. However, before we go down the path of switching from tri-pane to top navigation, I want to be sure we’ve tried everything we can to improve SEO with our current output.
I generated the Sitemap.xml file for our HTML5 KB using Flare. It covers every topic (it also includes extraneous content, such as thumbnail .png files, that I’d just as soon leave out), yet many of our .htm pages don’t seem to be getting crawled even though they’re in the sitemap. Our site URL is
http://bigdatakb.syncsort.com/. If I run a Flare search within the site for “dmexpress machine nodename”, for example, the first hit is the matching article, but a Google search on the same string returns nothing. On the other hand, a Google search on “dmexpress platform compatibility” returns the article in our KB on that topic as the first hit. I can’t make heads or tails of why one topic is found and another is not.
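If the thumbnail entries are worth stripping, one option is to post-process Sitemap.xml after each build. The Python sketch below assumes the file follows the standard sitemaps.org schema and drops any <url> entry whose <loc> ends in .png. The names filter_sitemap and excluded_suffixes are hypothetical, not a Flare feature, and the sample URLs are made up.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol; assumed to match the Flare output.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def filter_sitemap(xml_text, excluded_suffixes=(".png",)):
    """Return the sitemap as a string, minus <url> entries whose <loc> ends in an excluded suffix."""
    # Serialize the sitemap namespace as the default xmlns instead of an "ns0:" prefix.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(xml_text)
    ns = {"sm": SITEMAP_NS}
    # Iterate over a copy of the child list so removals don't disturb the loop.
    for url in list(root.findall("sm:url", ns)):
        loc = url.find("sm:loc", ns)
        if loc is not None and loc.text and loc.text.strip().lower().endswith(excluded_suffixes):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical two-entry sitemap; real topic and image paths will differ.
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://bigdatakb.syncsort.com/Content/Topics/overview.htm</loc></url>"
        "<url><loc>http://bigdatakb.syncsort.com/Content/Resources/Images/thumb.png</loc></url>"
        "</urlset>"
    )
    print(filter_sitemap(sample))
```

Because Flare generates a new Sitemap.xml when it builds the target, the filter would presumably need to be rerun before each publish.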
One thing to be aware of, which may or may not be complicating things further, is that we enabled HTML5 server-based output so that our “attached files”, such as PDF and text files, can also be searched. This was harder to set up than we expected and didn’t quite work as advertised without some fiddling: we had to manually copy those non-XHTML files from the Content\Resources\Attachments folder to the AutoSearch folder before a Flare search would find them. I disallowed the AutoSearch folder in Robots.txt so that Google doesn’t crawl the same content in two different locations. Interestingly, those PDF files are more likely to turn up in a Google search than the .htm topics that link to them. For example, I created a Google custom search (
https://cse.google.com/cse/publicurl?cx ... otdtpa00f8) limited to just our site, and if I search on “implementation best practices”, I get the PDF attached to the topic with those words in its title, but the topic itself does not show up.
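To double-check that the Robots.txt rules block the AutoSearch copies without also blocking the originals, Python's standard urllib.robotparser can evaluate the file offline. This is a sketch: the robots.txt text is a hypothetical reconstruction from the description above (the root-level Sitemap.xml location and the file names are assumptions), and robotparser does plain prefix matching, so it only approximates Google's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt modeled on the post: AutoSearch is disallowed and the
# file advertises the sitemap. The root-level Sitemap.xml location is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /AutoSearch/

Sitemap: http://bigdatakb.syncsort.com/Sitemap.xml
"""


def build_parser(robots_text):
    """Parse robots.txt text that is already in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical file names; only the folder names come from the post.
for url in (
    "http://bigdatakb.syncsort.com/AutoSearch/best-practices.pdf",
    "http://bigdatakb.syncsort.com/Content/Resources/Attachments/best-practices.pdf",
):
    print(url, "allowed" if parser.can_fetch("Googlebot", url) else "blocked")

# site_maps() (Python 3.8+) lists the Sitemap: lines the parser picked up.
print(parser.site_maps())
```

Robots.txt paths are case-sensitive, so the Disallow line has to match the capitalization of the AutoSearch folder as it appears in the published URLs.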
I uploaded a Robots.txt file to our site just this past Monday, and it does point to the Sitemap.xml file, but perhaps the site hasn’t been crawled since then. Based on your input, I pointed an external sitemap generator at our site URL to see what it would find, and it produced a sitemap with exactly two entries: one for the site URL and one for the Default.htm file. Perhaps that’s a hint that there is a crawling issue. I’m not sure what is supposed to be in the root directory to tell a sitemap generator which subdirectories to crawl, but clearly it wasn’t seeing any of our content. One suggestion was to create a topic containing a TOC proxy and then somehow get Google to crawl that topic, but this feels like too much of a hack to me. I’m no more sure how to get Google to crawl that topic than how to get it to find my sitemap, and the topic itself is ugly and extraneous, since I see no clear way to style it without affecting the style of the actual TOC in the navigation pane.
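The two-entry result is consistent with a generator that only follows plain <a href> links in the HTML it downloads. As a rough illustration, a link collector built on Python's html.parser finds a single link in a shell page whose TOC is injected by script and whose topic sits in an iframe. The markup below is a hypothetical, simplified page, not Flare's actual Default.htm, and Googlebot is generally reported to execute some JavaScript, so this models the simple generator rather than Google.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute href values from <a> tags, as a simple link-following crawler would."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only <a href> counts; iframe src and script-built links are ignored.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def collect_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical shell page: the TOC is empty until JavaScript fills it in.
SAMPLE_DEFAULT_HTM = """
<html><body>
  <a href="Default.htm">Home</a>
  <div id="navigation"></div>  <!-- TOC filled in by JavaScript at runtime -->
  <iframe name="topic" src="Content/Topics/overview.htm"></iframe>
  <script src="Resources/Scripts/toc.js"></script>
</body></html>
"""

print(collect_links(SAMPLE_DEFAULT_HTM, "http://bigdatakb.syncsort.com/"))
```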
I’m trying to understand why top navigation is less problematic for SEO, given that Google does seem able to see some iframe topics. One thing I don’t like about top navigation is that it gives users a less clear view of where they are in the topic hierarchy. Even with breadcrumbs, you see only the direct path to where you are, not what surrounds it, so you still have no context. I found one site that has the left-pane navigation I prefer, but without iframes. Check out
http://www.cloudera.com/documentation/m ... ction.html. I don’t know whether they used Flare or some other tool to build their doc site, but if we could achieve that with Flare, that would be ideal.
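One plausible factor in the tri-pane versus top-navigation difference, offered as an assumption rather than a diagnosis: if the tri-pane output addresses topics as Default.htm#<topic path>, the part after # is never sent to the web server, and search engines generally treat such addresses as the same page. The Python sketch below uses hypothetical URLs (the #-based shape is assumed, not taken from the post) and shows two deep links collapsing to one request URL.

```python
from urllib.parse import urldefrag

# Hypothetical tri-pane deep links; the "Default.htm#<topic path>" shape is an assumption.
deep_links = [
    "http://bigdatakb.syncsort.com/Default.htm#Topics/machine-nodename.htm",
    "http://bigdatakb.syncsort.com/Default.htm#Topics/platform-compatibility.htm",
]


def url_seen_by_server(url):
    """Return the URL without its #fragment, i.e. the part actually sent in an HTTP request."""
    return urldefrag(url).url


distinct = {url_seen_by_server(u) for u in deep_links}
print(distinct)  # {'http://bigdatakb.syncsort.com/Default.htm'}
```

A layout that gives each topic its own path, with no fragment, keeps every topic at a distinct URL regardless of how the navigation pane is drawn.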
In the meantime, I’d still love to know whether the combination of the Flare-generated Sitemap.xml and the newly uploaded Robots.txt file should trigger Google to crawl all the pages on our site and yield better search results, or whether there is more we need to do. Since you can’t tell when or whether your site has been fully crawled, it’s hard to know whether the changes you’ve made haven’t made a difference or simply haven’t been “tested” yet.
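The Sitemap.xml and Robots.txt pair can at least be checked locally for contradictions, even though that says nothing about when Google will actually crawl. The sketch below, again assuming the standard sitemaps.org namespace, lists any sitemap URL that the robots.txt rules would block (for example, if an AutoSearch copy ever ended up in the sitemap). The helper name blocked_sitemap_urls is hypothetical, and Python's robotparser only approximates Google's matching rules.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace from the sitemaps.org protocol; assumed to match the Flare output.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(sitemap_xml, robots_text, user_agent="Googlebot"):
    """Return sitemap <loc> URLs that the given robots.txt would block for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    root = ET.fromstring(sitemap_xml)
    locs = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]
    return [url for url in locs if not parser.can_fetch(user_agent, url)]
```

An empty list means only that the two files agree; it does not mean the listed pages have been crawled or indexed.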
Thanks for any and all input.