Apr 23 2010

Obviously, MDC is a mess right now. We’re making headway on understanding the problem — or, at least, understanding what we need to understand.

For some reason, a specific process on the two servers that host MDC is racing out of control; within seconds of startup, it’s chewing up essentially every single cycle of processor time. We don’t yet know why.

We have a plan of attack that we hope will tell us why this is happening. As it happens, the plan may also alleviate the problem somewhat while we continue to analyze the situation.

There are three key problems at hand, all happening at once:

  • We’re running MindTouch 9.08.3 at the moment, and need to upgrade to 9.12.2. That version offers performance and stability improvements.
  • We had hoped that upgrading to MindTouch 9.12.2 would resolve our problems; however, it appears that we have some sort of configuration problem that not only causes our problems on 9.08.3 but is somehow exacerbated by the upgrade to 9.12.2, resulting in a completely unusable system post-upgrade.
  • So we need to figure out what this configuration issue is and resolve it, then do the upgrade again; in theory, this should resolve our problems fairly well.

This is all easier said than done. As far as we — and the folks at MindTouch — can tell, our configuration is generally okay. There are some settings we previously did not have correct, but even after fixing them, performance is still woefully unacceptable.

So the next step is to set up a third machine in addition to the two we currently have running MDC. It will have some additional profiling tools installed, and some percentage of our traffic will be directed to that machine for analysis.
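To give a rough idea of what "some percentage of our traffic" could look like, here is a minimal sketch of weighted traffic splitting. It is an illustration only, not our actual setup: the server names, the 45/45/10 split, and the helper functions are all made up for the example, and in practice this would be done by a load balancer rather than by application code.

```python
import random

# Hypothetical backend names and weights; the real split is not decided yet.
BACKENDS = {
    "mdc-1": 45,
    "mdc-2": 45,
    "mdc-profiling": 10,  # the third machine with profiling tools
}

def pick_backend(weights, rng=random):
    """Pick one backend name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

def simulate(weights, requests, seed=0):
    """Route `requests` simulated requests and count how many each backend gets."""
    rng = random.Random(seed)
    counts = {name: 0 for name in weights}
    for _ in range(requests):
        counts[pick_backend(weights, rng)] += 1
    return counts

if __name__ == "__main__":
    for name, count in simulate(BACKENDS, 10_000).items():
        print(f"{name}: {count}")
```

With weights like these, roughly one request in ten would land on the profiling machine, which should be enough to observe the runaway process under real load without putting most visitors on an experimental box.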

Some of the details of this plan are subject to change, and I’m intentionally being a little vague since we’re still sorting out the specifics and I’m not entirely clear on some of the details yet. We’re having more meetings over the next couple of days to finalize the plan and get things rolling.

 Posted by at 12:10 AM

  11 Responses to “Stating the obvious”

  1. When did this start?

  2. We’ve had performance and reliability problems for a great many months, but they unfortunately got worse after we tried to upgrade early last week. We reverted back to 9.08.3 a day later, and now it’s working but very slow, worse than before the attempted upgrade.

  3. That is why I’m confused, that reverting to the previous version did not revert to the previous performance. Strange! Thanks for looking into it.

  4. Yeah, that’s pretty weird. It’s better than it was after the upgrade attempt, in that the site isn’t constantly crashing outright, but performance didn’t get better either.

    I just had a meeting with our IT guys and MindTouch support and engineering people, and we’ve got our plan of attack pretty well firmed up. There will be some intense testing work next week.

  5. “the site isn’t constantly crashing outright”. It is. Most Europeans often get a blank page when they try to reach devmo.

  6. @Daniel: I think he meant that the server isn’t crashing, and what you just posted on your blog agrees with that. If it had crashed, then you would, of course, get a connection failure every time. If you sometimes get blank pages and sometimes delayed responses, then obviously it’s not “crashed outright”, it’s just working too slowly to be usable (not that this is a good thing).

  7. “Most Europeans often get a blank page when they try to reach devmo.” Exactly the same problem here.

    Is SSL really needed for reading documentation?

  8. Same here as Daniel but I’m in U.S. on the east coast.

  9. Correct. The server itself is not crashing. The site is just taking so long to respond that often you don’t get anything at all. That’s not the same thing.

    The HTTPS thing was added to secure user passwords and email addresses during login. The wiki software unfortunately doesn’t support only using HTTPS for authentication; it’s all or nothing. So we chose “all.”

    HTTPS is not the real problem here.

  10. HTTPS is a problem here since it prevents intermediaries from caching the responses, which would help with the current issues.

    On a somewhat related topic, will the new MindTouch release allow bulk export of MDC?

  11. Being able to switch to using HTTPS only for authentication is high on our wish list. I hope we’ll have that ability soon, but I don’t know of any timeline on that. Right now all my attention is focused on the matter at hand, which is fixing the problems the server is having with the process run amok.