- The red error messages we keep seeing on pages that use templates happen because services (or MindTouch extensions) fail to start up correctly, or crash and cannot be restarted. The root cause is an XML-reading routine that isn’t thread-safe and needs to be. MindTouch has fixed this in the 9.08.1 release, which is due to ship at the end of next week.
- A second startup problem prevents the Lucene search service and the MindTouch pubsub service from linking up correctly. As a result, the search index isn’t updated as content is added to the site (and fails to generate at all if we initiate a full rebuild of the index). MindTouch intends to fix this in their Noatak release, which is scheduled to ship in mid-November, and will try to deliver a hotfix before then if possible.
- A separate indexing problem occurs when rebuilding the index: the indexer fails because of a problem with the new thread pool introduced in the 9.08 release. MindTouch switched us back to the old thread pool during their visit last week, so this isn’t currently affecting MDC directly, but it still needs a real fix. That fix is due in the Noatak release.
- Yet another indexing problem: a regression in the 9.08 release makes a read from the index fail if it happens at the same time as a write, which results in search failures. This doesn’t hurt us very often, but it does need to be addressed. It, too, is due to be fixed in Noatak.
Next week, IT will have a testbed version of MDC set up that finally duplicates our live configuration (two hosts, etc.). This will let us do proper testing before future upgrades. We’ll use it for the first time with the 9.08.1 update, with the goal of going live with that update in the first or second week of November.
QA has a test plan for MDC, which will be used to certify each update before deployment.
The long and short of it is this: we finally have a grasp on what’s going wrong, and a plan for fixing it and testing the fixes going forward.
I’ll post again as more information comes in.