Following in the footsteps of MDN’s “Thin Pages” SEO experiment done in the autumn of 2017, we completed a study to test the effectiveness and process behind making changes to correct cases in which pages are perceived as “duplicates” by search engines. In SEO parlance, “duplicate” is a fuzzy thing. It doesn’t mean the pages are identical—this is actually pretty rare on MDN in particular—but that the pages are similar enough that they are not easily differentiated by the search engine’s crawling technology.
This can happen when two pages are relatively short but are about a similar topic, or on reference pages which are structurally and content-wise quite similar to one another. From a search engine’s standpoint, if you create articles about the
background-origin CSS attributes, you have two pages based on the same template (a CSS reference page) whose contents are easily prone to being identical. This is especially true given how common it is to start a new page by copying content from another, similar, page and making the needed changes, and sometimes even only the barest minimum changes needed.
All that means that from an SEO perspective, we actually have tons of so-called “duplicate” pages on MDN. As before, we selected a number of pages that were identified as duplicates by our SEO contractor and made appropriate changes to them per their recommendations, which we’ll get into briefly in a moment.
Once the changes were made, we waited a while, then compared the before and after analytics to see what the effects were.
The content updates
We wound up choosing nine pages to update for this experiment. We actually started with ten but one of them wound up being removed from the set after the fact when I realized it wasn’t a usable candidate. Unsurprisingly, essentially all duplicate pages were found in reference documentation. Guides and tutorials were with almost no exceptions immune to the phenomenon.
The pages were scattered through various areas of the open web reference documentation under https://developer.mozilla.org/docs/Web.
Things to fix or improve
Each page was altered as much as possible in an attempt to differentiate it from the pages which were found to be similar. Tactics to accomplish this included:
- Make sure the page is complete. If it’s a stub, write the page’s contents up completely, including all relevant information, sections, etc.
- Ensure all content on the page is accurate. Look out for places where copy-and-pasted information wasn’t corrected as well as any other errors.
- Make sure the page has appropriate in-content (as opposed to sidebar or “See also”) links to other pages on MDN. Feel free to also include links in the “See also” section, but in-content links are a must. At least the first mention of any term, API, element, property, attribute, technology, or idea should have a link to relevant documentation (or to an entry in MDN’s Glossary). Sidebar links are not generally counted when indexing web sites, so relying on them entirely for navigation will wreak utter havoc on SEO value.
- If there are places in which the content can be expanded upon in a sensible way, do so. Are there details not mentioned or not covered in as much depth as they could be? Think about the content from the perspective of the first-time reader, or a newcomer to the technology in question. Will they be able to answer all possible questions they might have by reading this page or pages that the page links to through direct, in-content links?
- What does the article assume the reader knows that they might not yet? Add the missing information. Anything that’s skimmed over, leaving out details and assuming the reader can figure it out from context needs to be fleshed out with more information.
- Ensure that all of the API’s features are covered by examples. Ideally, every attribute on an element, keyword for a CSS property’s value, parameter for a method, and so forth should be used in at least one example.
- Ensure that each example is accompanied by descriptive text. There should be an introduction explaining what the example does and what feature or features of the technology or API are demonstrated as well as any text that explains how the example works. For example, on an HTML element reference page, simply listing the properties then providing an example that only uses a subset of those properties isn’t enough. Add more examples that cover the other properties, or at least the ones that are likely to be used by anyone who isn’t a deeply advanced user.
- Do not simply add repetitive or unhelpful text. Beefing up the word count just to try to differentiate the page from other content is actually going to do more harm than good.
- It’s frequently helpful to also change the pages which are considered to be duplicate pages. By making different changes to each of the similar pages, you can create more variations than by changing one page alone.
Pages to be updated
The pages we selected to update:
In most if not all of these cases, the pages which are similar are obvious.
After making appropriate changes to the pages listed above as well as certain other pages which were similar to them, we allowed 60 days to pass. This is less than the ideal 90 days, or, better, six months, but time was short. We will check the data again in a few months to see how things change given more time.
The changes made were not always as extensive as would normally be preferred, again due to time constraints created by the one-person experimental model. When we do this work on a larger scale, it will be done by a larger community of contributors. In addition, much of this work will be done from the beginning of the writing process as we will be revising our contributor guide to incorporate the recommendations as appropriate after these experiments are complete.
Pre-experiment unvisited pages
As was the case with the “thin pages” experiment, pages which were getting literally zero—or even close to zero— traffic before the changes continued to not get much traffic. Those pages were:
For what it’s worth, we did learn from this eventually and experiments that were begun after the “thin pages” results were in no longer included pages which got no traffic during the period leading up to the start of the experiment, but this experiment had already begun running by then.
Post-experiment unvisited pages
There were also two pages which had a small amount of traffic before the changes were made but no traffic afterward. This is likely a statistical anomaly or fluctuation, so we’re discounting these pages for now:
The remaining three pages had useful data. This is a small data set but is what we have, so let’s take a look.
|Page URL||Sept. 16-Oct. 16||Nov. 18 – Dec 18|
|Clicks||Impressions||Clicks||Impressions||Clicks Chg. %||Impressions Chg. %|
For each of these three pages, the results are promising. Both click and impression counts are up for each of them, with click counts increasing by anywhere from 14% to 36% and impression counts increasing between 34% and 62% (yes, I rounded down for each of these values, since this is pretty rough data anyway). We’ll check the results again soon and see if the results changed further.
Because of certain implementation specifics of this experiment, there are obviously some uncertainties in the results:
- We didn’t allow as much time as is recommended for the changes to fully take effect in search data before measuring the results. This was due to time constraints for the experiment being performed, but, as previously mentioned, we’ll look at the data again later to double-check our results.
- The set of pages with useful results was very small, and even the original set of pages was fairly small.
- There was substantial overall site growth during this period, so it’s likely the results are affected by this. However, the size of the improvements seen here suggests that even with that in mind, the outcome was significant.
After a team review of these results, we came to some conclusions. We’ll revisit them later, of course, if we decide that a review of the data later suggests changes be made.
- The amount of improvement seen strongly suggests we should prioritize fixing duplicate pages, at least in cases where at least one of the pages which are considered duplicates of one another is getting at least a low-to-moderate amount of traffic.
- The MDN meta-documentation, in particular the writer’s guide and the guides to writing each of the types of reference content, will be updated to incorporate the recommendations into the general guidelines for contributing to MDN. Additionally, the article on MDN about writing SEO-friendly content will be updated to include this information.
- It turns out that many of the changes needed to fix “thin” pages also applies when fixing “duplicate” pages.
- We’ll re-evaluate prioritization after reviewing the latest data after more time has passed.
The most interesting thing I’m learning about SEO, I think, is that it’s really about making great content. If the content is really top-notch, SEO practically attends to itself. It’s all about having thorough, accurate content with solid internal and external links.