Obviously, MDC is a mess right now. We’re making headway on understanding the problem — or, at least, understanding what we need to understand.
For some reason, a specific process on the two servers that host MDC is racing out of control; within seconds of startup, it’s chewing up essentially every single cycle of processor time. We don’t yet know why.
We have a plan of attack that should help us learn why this is happening. As it happens, the plan may also alleviate the problem somewhat while we continue to analyze the situation.
There are three key problems at hand, all happening at once:
- We’re running MindTouch 9.08.3 at the moment, and need to upgrade to 9.12.2. That version offers performance and stability improvements.
- We had hoped that upgrading to MindTouch 9.12.2 would resolve our problems; however, it appears that we have some sort of configuration problem that not only causes our problems on 9.08.3 but is somehow exacerbated by the upgrade to 9.12.2, leaving the system completely unusable after the upgrade.
- So we need to figure out what this configuration issue is and resolve it, then do the upgrade again; in theory, this should resolve our problems fairly well.
This is all easier said than done. As far as we and the folks at MindTouch can tell, our configuration is generally okay. A few settings were previously incorrect, but even after fixing them, performance is still unacceptable.
So the next step is to set up a third machine in addition to the two we currently have running MDC. It will have some additional profiling tools installed, and some percentage of our traffic will be directed to that machine for analysis.
Some of the details of this plan are subject to change, and I’m intentionally being a little vague since we’re still sorting out the specifics and I’m not entirely clear on some of the details yet. We’re having more meetings over the next couple of days to finalize the plan and get things rolling.
