Aug 12, 2010

Obviously search is busted on MDC, and has been for almost two weeks now. We’re tantalizingly close to having it fixed. The problem started as a minor glitch, compounded by a communication error, multiplied by my impatience, then polished off with a larger technical problem.

The story of what’s happened is, I think, an interesting one.

About two weeks ago, search abruptly stopped working, reporting an error along the lines of “The search query you entered contains characters which need to be escaped.”

We’d seen this error before, as a result of a strange communication issue with the database server. So I asked IT about it, and they said there didn’t seem to be a problem. Instead of being patient and waiting to see if the problem cleared up on its own, or to see if IT found a problem and fixed it, I decided to use the MindTouch control panel to rebuild our search index.

This… was not the right decision.

You see, the Lucene-based indexing tool used by MindTouch is extremely resource-intensive, and pretty much uses every drop of the server’s capacity. And if there’s any load on the machine other than that, the indexer tends to abort abruptly. This would be the “larger technical problem” I previously referred to.

So the indexer deleted the current site index and started rebuilding it, only to fail. It was at about this time that IT let me know that they had found a minor problem (I don’t know what it was) that was probably responsible for the original search error.

But, by this point, the damage was done, and the MDC index was gone.

Since then, I’ve been working with IT and MindTouch to get the index rebuilt. We removed one of the three servers hosting MDC from the pool and ran the indexer on it to build the index, which did complete successfully. In theory, sometime today that index will get copied to the other two hosts, and then all three will be back online. I’m waiting for IT to get that done now.

After that, the index will need to be refreshed, but that will happen automatically and, as a much smaller job than a full index rebuild, it should go without a hitch (it has in the past).
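
The failure mode in this story (the old index is deleted before the new one exists) is not specific to MindTouch. As a minimal sketch of the alternative, assuming an index that lives in a single directory and a hypothetical `build_index` function standing in for the real indexer, one could build into a scratch directory and swap it in only on success. This is illustrative Python, not MindTouch's actual code or API:

```python
import os
import shutil
import tempfile


def rebuild_index_safely(index_dir, build_index):
    """Build a fresh index next to index_dir and swap it in only on success.

    build_index is a hypothetical callable (an assumption for this sketch)
    that writes a complete index into the directory it is given and raises
    an exception if the build fails.
    """
    index_dir = os.path.abspath(index_dir)
    parent = os.path.dirname(index_dir)

    # Build in a sibling directory so the final rename stays on one filesystem.
    scratch = tempfile.mkdtemp(prefix="index-build-", dir=parent)
    try:
        build_index(scratch)
    except Exception:
        # The build failed: discard the partial index and leave the old one live.
        shutil.rmtree(scratch, ignore_errors=True)
        raise

    # The build succeeded: move the old index aside, then promote the new one.
    backup = index_dir + ".old"
    shutil.rmtree(backup, ignore_errors=True)
    if os.path.exists(index_dir):
        os.rename(index_dir, backup)
    os.rename(scratch, index_dir)
    shutil.rmtree(backup, ignore_errors=True)
```

With this shape, an indexer that aborts halfway costs only the scratch directory; the live index is untouched. The swap is two renames rather than one, so there is a brief window in which `index_dir` does not exist, and a real deployment would need to account for that as well as for the extra disk space.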

 Posted by at 8:46 AM

  4 Responses to “Why MDC search is broken”

  1. Sort of too bad it deletes the old index first, instead of building a new one and then only swapping them once it’s successful!

  2. Is the built-in search better than a domain-restricted google/bing/whatever search anyway?

  3. @Dao,

    The new adaptive search that ships with MindTouch 2010 is likely better than the Bing/Google search once it’s trained. It learns from user behavior within the MindTouch site, and soon site admins will be able to manipulate and tune search results.


    No doubt. Keep in mind that in most cases where MindTouch is deployed at MDN scale, the search is moved off to its own hardware resources and, of course, it could be deployed the way you suggest in either setup.

  4. Dão, if you use Google, you also tend to end up with a lot of redundant results due to (a) its returning similar content from multiple localizations, and (b) Deki’s disregard for appropriate HTTP status codes, so pages with a few redirects can show up several times, even though they’re the same content. (a) can usually be remedied by appending the two-letter language code, e.g., query words