Dave Davies – Search Engine Land
News On Search Engines, Search Engine Optimization (SEO) & Search Engine Marketing (SEM)

SEO for holiday shoppers (Oct. 31, 2018)

Here are a few tips to add to your SEO strategy to generate short-term wins during this holiday season.

The post SEO for holiday shoppers appeared first on Search Engine Land.


Here we are at the end of October and you’re realizing your SEO is not in shape for the holidays. Whether that’s because you’re just now understanding that your SEO strategy isn’t going to yield the results you want before the holidays, or because you’ve procrastinated, you’re looking for some techniques that will generate short-term wins.

Fortunately, they exist.

A couple of years ago I wrote a similar piece on last-minute SEO tips for the holidays. That left readers with about three weeks to make use of them.

This year we’re getting a slightly earlier start, so let’s dive in with 5 things you can do right now to get started on making more money during this peak time of year.

Titles and descriptions in the SERPs

I’m going to start with the only tip that I’ll be repeating from my previous article and that’s titles and descriptions. I’m repeating it for two reasons:

    1. It’s easily the most straightforward thing you can do with tremendous impact.
    2. There have been changes in how to approach this.

Let’s consider a parent who is out looking for a video game for his kids and encounters two titles and descriptions in the SERPs:

Title: Gamer-Rated Top 10 Video Games For Christmas 2018 |
Description: Gamer Empire enlists top video game enthusiasts to rate and rank this year’s top video games to make your Christmas gift buying easier.


Title: Best Video Games | Last Guardian, Titanfall 2, Pokemon Sun & Moon, Battlefield
Description: Best video games for Christmas including Last Guardian, Titanfall 2, Pokemon Sun & Moon, Battlefield 1, Call Of Duty: Infinite Warfare, Skyrim, PS4, Xbox One

Which one am I likely to click? One tells me that I’m going to find what I’ve likely queried, the other is showing me a list of things I probably don’t recognize.

Look through the new Search Console and find the terms that your pages are ranking for and focus your titles and descriptions on improving the clickthroughs for those terms. Remember, you are not just optimizing for the person who wants what’s offered on your site, you’re optimizing for the people who would purchase it for them.

This year, we can very quickly test titles (and thankfully we have time) with Google Ads. With the expanded text ads now allowing three sets of 30 characters rather than two (and in the fall of 2016 it had just been increased from a single 25-character headline) and descriptions now increased from 80 to 90 characters, we can test versions a lot closer to what we would deploy organically.
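While reworking titles and descriptions, a quick script can flag entries likely to be truncated in the SERPs. This is my own sketch, not part of the article: the character limits below are rough approximations of Google’s pixel-based truncation, so treat them as guidelines, not hard rules.

```python
# Flag titles/descriptions likely to be truncated in the SERPs.
# These limits are character-count approximations of Google's
# pixel-based truncation -- guidelines, not hard rules.
TITLE_LIMIT = 60
DESCRIPTION_LIMIT = 160

def check_snippet(title, description):
    """Return a list of truncation warnings for a title/description pair."""
    warnings = []
    if len(title) > TITLE_LIMIT:
        warnings.append(f"Title is {len(title)} chars; may be cut near {TITLE_LIMIT}")
    if len(description) > DESCRIPTION_LIMIT:
        warnings.append(f"Description is {len(description)} chars; may be cut near {DESCRIPTION_LIMIT}")
    return warnings

# The stronger example from above fits comfortably within both limits:
ok = check_snippet(
    "Gamer-Rated Top 10 Video Games For Christmas 2018",
    "Gamer Empire enlists top video game enthusiasts to rate and rank "
    "this year's top video games to make your Christmas gift buying easier.",
)
```

Running candidate titles through a check like this before testing them in Google Ads saves burning ad spend on versions that would be truncated anyway.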

Importance of featured snippets

Featured snippets give you the opportunity to jump the queue and launch yourself into the coveted position zero for a lot of the types of queries that holiday shoppers would use (remember – for this purchaser they often don’t know what they want so many of the queries will be exploratory).

Here’s what Tech Radar pulled off for one such phrase:

In my opinion, that featured snippet is more valuable than any #1 ranking.

If we consider some of the data regarding the growth in voice search, that will be a strong influencer as well. The folks at Stone Temple Consulting (now Perficient Digital) outlined the year-over-year data in voice search just after the holidays last year in this study. At its core, it revealed a much stronger willingness of people to use voice search, especially in public (read: on their phones).

Featured snippets essentially drive voice search, but they are a bit different, so I recommend reading this piece by Brian Ussery. There have been a few changes since it was written a year ago; however, the information and process are still valid.

Updating an evergreen URL with new content

This advice pretty much works anytime, but never more than when you’re scrambling for rankings to attract visitors who aren’t necessarily buying for themselves.

Top lists of popular games/toys/books/etc. are always a winner. Staff Picks. Reviews and ratings. Guides.

Think not about what you sell or what the people who want to buy it would search for, think about who shops for that demographic, what questions they would have, how they would ask it and target that in your content.

If you do this annually, I’d recommend creating a URL something like:


Next year, when you update it, take the content from that location (if you want to archive it) and move it to something like:


And put your 2019 content at the old URL. You’re effectively creating an evergreen URL but keeping your archive. This will keep any link weight passing to the primary URL headed to your most current content.
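The yearly rotation above can be sketched as follows. The slugs are invented for illustration (the article’s actual URL examples aren’t reproduced here); the point is that the evergreen slug never changes, while last year’s content moves to a dated archive slug.

```python
# Hypothetical sketch of the evergreen-URL rotation described above.
# "site" stands in for a CMS's slug -> content mapping.

def rotate_evergreen(site, evergreen_slug, new_year, new_content):
    """Archive the current evergreen content under a dated slug, then
    publish this year's content at the unchanged evergreen URL."""
    archive_slug = f"{evergreen_slug}-{new_year - 1}"
    site[archive_slug] = site[evergreen_slug]  # keep the old year's archive
    site[evergreen_slug] = new_content         # evergreen URL stays current
    return site

site = {"top-video-games": "2018 list"}
rotate_evergreen(site, "top-video-games", 2019, "2019 list")
```

Because inbound links keep pointing at the unchanged evergreen slug, the link weight accumulates against whatever content is current.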

Rank elsewhere and format correctly

If you want to rank for terms that are too competitive for your current site strength, find strong sites that rank well within your niche and accept guest articles. But be careful to review Google’s reminder about large-scale article campaigns.

  1. Make a shortlist of 3 or 4 sites that rank well within your niche and accept guest articles.
  2. Research exactly what type of content resonated with THEIR audience. Use your favorite backlink or social measurement tool to figure out what content on their site gets shared the most.
  3. Create content ideas and outlines around what will appeal to their audience that overlaps with what you want to rank for and your knowledge base.
  4. Read their guidelines incredibly carefully and pitch in the EXACT format requested.
  5. In your pitch, be concise. Editors have a tough, time-consuming job making people like me look good. Respect that and keep your pitch to the point but thorough and showing your knowledge of both the subject matter and their audience.

Rank for during- and post-holiday queries

We have several clients in travel and one of their biggest buying seasons is not before the holidays but rather, during them. It’s when family and friends get together and our analytics tells us how it plays out in many households.

Rather than searching for a “vacation rental Portland” they’re looking for “family reunion portland” or “8 bedroom vacation rental Portland.”

The searchers are looking not for a general type of place but are searching based on the end criteria (i.e., we need x bedrooms, or we want to host y event, etc.).

Couple this with the excitement and convenience of everyone being together, place a low barrier-to-entry on the site (a low non-refundable deposit in one case) and you’re set up to win.

The reason this ties to SEO is that the terms you’ll be targeting are often less competitive. Everyone wants to rank for “vacation rental portland,” but there’s far less competition for queries around bedrooms, amenities and other specifics.

At the same time, you’ve got a bunch of folks with newly received gift cards and their searches will be very specific.

Where the parent might have looked for “best video games 2018,” the gift card holder will be searching queries like “black ops 4 price” or “black ops 4 ps4 cheap.”

The search volume isn’t what you’d see for just “black ops 4,” but the terms are far easier to target, and the strategy works just as well if you already rank for the core terms and are just expanding to capture the during- and post-holiday traffic you might have been missing out on.

Focus on maximizing your strategy

The holidays are a time to pull up your socks and focus on the things you can do that will impact your results and maximize your earning from holiday shoppers and post-holiday spenders. Next year, promise yourself you’ll get an earlier start with SEO for the 2019 holidays.


Case study: The tale of two internal link tweaks (Aug. 27, 2018)

Contributor Dave Davies shares a case study that shows how smart internal link building and targeted SEO can have a significant impact on rankings and traffic.

The post Case study: The tale of two internal link tweaks appeared first on Search Engine Land.

Back in 2016, I wrote a piece on optimizing internal linking structures.  In the article, we discussed a range of issues from PageRank and link equity flow to anchor text and more. I wrote the article after performing an audit on a large e-commerce site called Trophy Central.

Since several years have gone by and the site now has a slightly different SEO focus, the owner has graciously allowed me to share the details of that audit, the SEO optimization plan that emerged from it and the results.

There were two core recommendations I gave the owner after my research, review and audit. I’ll outline what they were, why I made them and how they were implemented by the dev team.

1. Recommendation: Navigation

At the time of the review, the site did not have drop-down navigation. Having drop-down navigation doesn’t apply to all sites but works well for many e-commerce sites and did on this one.

The principle at play here is to drive PageRank to deeper pages. There is a caveat, however: the more links you put in your navigation, the less any one of them is worth. Basically, each page has a set amount of PageRank to pass, and that PageRank is divided among the links on that page, so the fewer the links, the more weight each one passes.
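The dilution principle can be shown with a toy calculation. Real PageRank is an iterative computation with a damping factor; this only captures the intuition that the weight a page passes is split among its outbound links.

```python
# Toy illustration: the PageRank a page can pass is split among its
# outbound links, so fewer links means more weight passed per link.
# (Real PageRank is iterative and includes a damping factor.)

def weight_per_link(pagerank_to_pass, num_links):
    return pagerank_to_pass / num_links

lean_nav = weight_per_link(1.0, 10)      # 10 nav links: 0.10 passed each
bloated_nav = weight_per_link(1.0, 100)  # 100 nav links: 0.01 passed each
```

Trimming a navigation from 100 links to 10 makes each remaining link roughly ten times more valuable to the page it points at, which is why the recommendation below limits links to pages of importance.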

Pages with products and categories should limit the links they have pointing to pages of importance. With a combination of keyword research to discover search volumes and the client’s understanding of the return on investment (ROI) of product sets we ended up with:

You’ll notice that we link to the most important second-tier categories followed by the most important third tier. The structure this is most similar to from the initial article is:

This structure reduces the weight of individual products but strengthens the categories. Since that’s where the search volume was for this site, it made the most sense.

2. Recommendation: Smart link placement

When I was working on Trophy Central, the site simply linked from the homepage content area to the second-tier locations and a few products. We made the simple addition of links to sub-categories:

The idea here was to take advantage of the weight of the area in which they were placed. While most of these pages are already linked to in the top navigation, their placement in a key and highly visible location on the homepage sets them up to pass more weight.

Just as content in highly visible locations carries more weight, so do links. We wanted to push more weight to key sub-categories.

Additional areas

An area we will be running tests on in the future is reducing the number of links in the left nav on internal pages.

The navigation I’m referring to here is:

There is no need to pass link and page equity to sub-categories with low traffic volume.  I do not expect dramatic individual jumps from this adjustment but moving dozens of rankings up one position can lead to significant traffic gains.

The results

Did our simple but significant changes help Trophy Central?  Before I answer, let’s look at two questions that will help put the results in perspective:

  • Did adding a few links to key categories in a visible and “content” area on the homepage yield results?
  • Did adding a simple and easy-to-use drop-down navigation improve rankings?

The simple answer to both questions is yes!  Here are the results ordered by the change in rankings.

Here are the results ordered by the current position:

Core stats

We were monitoring 381 terms, so I organized them into groups to make the changes easy to see and share. Here’s how they changed one month after implementation:

  • Top 100. The site had 56 more top 100 terms, declined in rankings across 24 and improved across 165.
  • Top 20. The site had 34 more top 20 terms, declined in rankings for 5 and improved across 55.
  • Top 10. The site had 9 more top 10 terms, declined for 3 and improved for 13 phrases.
  • Top 3. The site had no top 3 terms in the beginning and had 2 after the changes.

This is all based on some fairly straightforward changes to the internal linking structure and that’s the point of this article.

It doesn’t always take Herculean efforts to create dramatic improvement. Sometimes simply paying attention to the core elements of search engine optimization (SEO) and site structure can pay huge dividends.

What to keep in mind

There are two elements of internal link weight and distribution to always keep in mind.

  1. When it comes to navigation links and drop-down menus, when you give to Peter you take from Paul. Each link you add dilutes the links already in place, so consider limiting your global links to pages that need the PageRank and that drive traffic. This will basically limit your navigation links to those pages people are prone to go to, which is good for users and the engines. Go figure.
  2. Links in highly visible locations are worth more than those tucked away. Google prioritizes content that’s meant to be engaged with, and links are content. Provide links to key pages in highly visible locations on your site so people and PageRank will flow.

Just these simple changes created a significant impact on rankings and traffic for Trophy Central. Thankfully it’s not because of anything magical, just smart internal link building and SEO.


How to increase your PageSpeed in WordPress (Aug. 14, 2018)

With over 59% of CMS-based websites running WordPress, optimizing them to load quickly is a good idea. Contributor Dave Davies walks through key steps and shows how to optimize a WordPress site for Google PageSpeed.

The post How to increase your PageSpeed in WordPress appeared first on Search Engine Land.

We hear a lot about PageSpeed from Google, and there’s no doubt it’s an important metric from both a usability and an SEO standpoint. Of course, there’s a lot more to the web than WordPress, but with it now powering over 59.3 percent of websites built on a CMS and Google dedicating an engineering team to work with WordPress, it deserves special attention.

Before we dive in, it’s important to clarify that in our article today we’re going to be focusing on PageSpeed,  and not page speed.

For those unfamiliar with the difference, PageSpeed is a Google metric. It’s based on a family of tools, and when we’re referring to a PageSpeed number between 0 and 100, we’re referring to the output of the PageSpeed Insights tool.

Page speed, on the other hand, generally refers to the real-world speed of a web page. And yes, it’s possible to increase one without the other, and I’ve even seen cases where improving one is at the cost of the other.

In short, we’re going to focus on the Google metric in this article as it relates to WordPress sites. Whenever you are working on one, it’s important to be measuring the other, too, so as not to shoot yourself in the foot.

One-or-the-other metrics

While I’ll be drawing on my past experiences with impacting PageSpeed and page speed, the scenario here is one I have never seen or been involved with before. I am going to run this little experiment while writing this article so I can provide screen shots and output numbers.

It is worth noting that as I write this, I do not know where we’ll end up regarding final numbers. We’re shooting for 80+ to hit the “Good” level, but that isn’t always possible. I consider anything above 70 to be reasonable, as it gives a bit of wiggle room to drop over time and stay above the 60 threshold, where we drop into the “Low” grade.

I can’t give the specific URL in this exercise, and you won’t see the starting numbers by the time you read this, but I want to stress again that I have never seen this specific scenario or anything this low before. I am going to use Search Engine Land as a placeholder in some of the screen shots, but this little experiment is being run on a different URL.

Here’s what we’re starting with:

The scores to begin with are:

  • Mobile: 57/100
  • Desktop: 0/100

And yes, I’ve checked multiple times over multiple days; the report continues to show a score of 0 for the desktop! Not good. Your goal is to get as high a score as possible, with a score of 80 as the starting point for a page to be rated “Good.”

We’re also going to look at the time the page took to load, or the speed of the page, as it were. I’ll include those numbers beneath improvement metrics as well.

It’s important to note that each tool measures differently. I’ll be basing my numbers on Dotcom Tools, though other testing tools work just as well.

The reason I use Dotcom is that it tests from multiple locations around the world, and the number I’m giving is the average.

Step 1: HTTPS

The first step kills two birds with one stone. The site has a secure certificate supplied and installed by the registrar. And they did a fine job, except that HTTP does not redirect to HTTPS, and Google has the HTTP version cached.

The first step is to get the site fully switched over to HTTPS. In our case, the site address simply hadn’t been switched to HTTPS in General Settings.

Switching the address to HTTPS created the 301 redirect, and the settings immediately jumped to:

  • Mobile: 61/100
  • Desktop: 0/100

Before we began, we had a page speed of 10.1 seconds. To give you an idea of what I was referring to above about multiple worldwide locations, from Denver it loaded in 3.5 seconds. After switching to HTTPS, the page speed bumped to 9.4 seconds.

If the site doesn’t automatically redirect, there’s a plugin called Force HTTPS that can get the job done. Or you can, if you’re comfortable with it, add the following to your .htaccess file:

RewriteEngine On
RewriteCond %{SERVER_PORT} 80
RewriteCond %{HTTP_HOST} ^(www\.)?domain\.com
RewriteRule ^(.*)$ https://www.domain.com/$1 [R=301,L]

You’ll obviously want to switch domain.com in the code to your own domain.

Step 2: Images

Anyone who’s ever tackled PageSpeed will tell you images are the most common culprit for slowing pages down. In our case, we see …

You read that right — over 15 unnecessary MiB.

The images fall into two categories of error:

  • Compressing and resizing. This means that the images are physically larger than they need to be. This happens a lot in WordPress when an image is added to the Media Library and placed on the page at a size far larger than it needs to be for the dimensions it’s occupying.
  • Compressing. Images have a ton of junk in them, and for the web, they can often be of far higher quality than needed.  Image compression deals with this. As a word of warning, if you ever use an automated image compression system try to always check and make sure the image comes out looking the way you want. It’s rare, but I’ve encountered cases where there was a noticeable degradation of the quality.
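Before compressing anything, it helps to find the worst offenders. This is my own stdlib-only sketch (not part of the article’s workflow): it walks a folder such as a WordPress uploads directory and flags image files over an arbitrary size threshold.

```python
# Walk a directory tree and flag image files over a size threshold.
# The 500 KB threshold is arbitrary -- tune it to your site.

import os

def oversized_images(root, threshold_bytes=500_000,
                     exts=(".jpg", ".jpeg", ".png", ".gif")):
    """Return [(path, size_in_bytes)] for images larger than the
    threshold, biggest first."""
    flagged = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(exts):
                path = os.path.join(dirpath, name)
                size = os.path.getsize(path)
                if size > threshold_bytes:
                    flagged.append((path, size))
    return sorted(flagged, key=lambda t: -t[1])
```

The actual compression is still done in an image editor or an online tool, as described above; this just tells you where to start.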

I generally either use my image editing tools or an online compression tool and do them manually. I’ll be using an online tool for the metrics in this article.

After optimizing the images, including one large image that went from 9.2 MB down to 175 KB with no visual impact on the page, we got the score to:

  • Mobile: 61/100
  • Desktop: 67/100

For page speed, we’re now coming in at 5.5 seconds, or about twice as fast.

Images weren’t the biggest issue on mobile for PageSpeed, but they were easily the biggest on the desktop. Now both scores are in the OK range.

Step 3: Browser caching

For those going through this process for the first time, if you see browser caching as an issue, Google is suggesting you tell your visitors how long their browsers should keep specific resources.

For example, you can send a message to the browser that images can be cached for two weeks. This way, when a visitor returns to your site within two weeks, the site loads more quickly, as many of the resources are simply being pulled from their own machine.

You can set time limits for caching of most resources, ranging from scripts and style sheets to most types of images.

There are two methods I tend to use when setting up browser caching: setting it up directly in the .htaccess file, or via the plug-in W3 Total Cache.

Directly in the .htaccess File

You can add some code to your .htaccess file when setting up browser caching, but a word of warning: If you’re not sure what a .htaccess file is, you’re probably better off going the plug-in route outlined further below.

If you decide to go old school and code .htaccess, you’ll need to access the site via either FTP, or, if you don’t have FTP access, you can install the plug-in WP File Manager, which grants access to the files.

You will be adding the following to your .htaccess file:

## Start browser caching ##

# Wrapped in IfModule so the directives are skipped (rather than causing
# a server error) if mod_expires isn't enabled.
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/jpg "access 1 month"
ExpiresByType image/jpeg "access 1 month"
ExpiresByType image/gif "access 1 month"
ExpiresByType image/png "access 1 month"
ExpiresByType text/css "access 1 month"
ExpiresByType text/html "access 1 month"
ExpiresByType application/pdf "access 1 month"
ExpiresByType text/x-javascript "access 1 month"
ExpiresByType application/x-shockwave-flash "access 1 month"
ExpiresByType image/x-icon "access 1 year"
ExpiresDefault "access 1 month"
</IfModule>

## End browser caching ##

You can adjust the access time frames as necessary. You would do this if you need resources refreshed in a shorter period of time. An example of this might be if images change periodically but retain the same filename.

Here’s how to add the code:

Resulting in:

  • Mobile: 62/100
  • Desktop: 72/100

Browser caching via .htaccess got us a real speed of 5.1 seconds.

Via W3 Total Cache

There are a few caching plug-ins, the most popular being W3 Total Cache and WP Super Cache.

I’ve found W3 Total Cache to provide better results across a wider array of tasks in most but not all scenarios. It never hurts to try both or others to maximize your results.

Once you’ve installed the plug-in, enabling browser caching is pretty much as easy as going to the general settings, ticking a box and clicking “Save all settings.”

Enabling browser caching via the plug-in produced the same PageSpeed scores, and the real page speed was also unchanged.

Step 4: Reduce server response time

Often, we can hit scenarios where we’re being told to reduce the time it takes for the server to respond. You might worry you need to upgrade your hosting environment, but this is very often unnecessary.

One of the main issues that slows down a server is all the messy back-and-forths between the PHP files and the database. Thankfully, W3 Total Cache offers a solution in the form of page caching. In fact, this can speed things up even if you’re not getting the server response warning.

With page caching, we are essentially creating a static copy of a page rather than requiring the server to generate the page on each visit. This takes a significant load off the server. In the case we’re addressing here, we had the server response issue, with Google reporting a 0.6 second time to respond and Dotcom Tools reporting the first-byte time of 573 milliseconds.

I turned on page caching:

And suddenly we were at:

  • Mobile: 70/100
  • Desktop: 74/100

The first-byte time dropped to 75 ms.  It’s worth noting there are customization options for this feature in the Page Cache settings. You can select the pages that are and aren’t cached there — among other things.

IMPORTANT: Remember that you’re creating cached pages, which means they don’t change. When you update a page, W3 Total Cache is configured to clear the cache for that page and rebuild it. However, more global changes like menus, widgets and so on can be updated without the cache clearing. If you make a change and don’t see the update live, simply click any of the “purge cache” or “empty cache” buttons in the plug-in area and you’ll be set.

Step 5: Minification

If you’ve ever peeked at the files making up your web page, you’ll see most have multiple lines and empty spaces. Each of these adds bytes to files. Removing these bytes is referred to as minification.

The three core types of minification that apply to WordPress sites are:

  1. HTML. Code of the actual pages themselves.
  2. CSS. Code within your style sheets.
  3. JavaScript. The code within your various scripts.
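As a naive illustration of where minification’s savings come from, here is a toy CSS minifier of my own. Real minifiers, such as those bundled with W3 Total Cache or Autoptimize, are far more sophisticated and also handle JavaScript safely; this only shows the idea of stripping comments and whitespace.

```python
# Naive CSS minifier: drop comments, collapse whitespace, and trim
# spaces around punctuation. For illustration only -- use a real
# minifier in production.

import re

def minify_css(css):
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop /* comments */
    css = re.sub(r"\s+", " ", css)                   # collapse whitespace
    css = re.sub(r"\s*([{}:;,])\s*", r"\1", css)     # trim around punctuation
    return css.strip()

before = """
/* main heading */
h1 {
    color: #333;
    margin: 0;
}
"""
after = minify_css(before)  # 'h1{color:#333;margin:0;}'
```

Every line break and indent removed is a byte the browser no longer has to download, which is all minification really is.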

IMPORTANT: Whenever you minify files, especially the scripts, it’s incredibly important to visit the pages of your site that rely on them to ensure they continue to function properly.

The first method you can use is to download the minified files themselves from Google:

It includes the images, but interestingly, I don’t find it does as good a job as the methods referenced above. You can download the minified version of the JavaScript and CSS here, but a problem can pop up if you update the plug-ins that created the scripts. You’ll have to do it all again.

Along the same lines, you can use online minification tools.

Just keep in mind that if the plug-in updates, and that update had anything to do with the script or styles, you’ll have to exclude the references that call the original files in the code. This can be annoying.

The alternative is to once more return to W3 Total Cache, which includes the function in the general settings (though you’ll need to head into the advanced settings here as well). You’ll find them at:

I highly recommend minifying them one at a time and testing the site between each. If you find things break, you can head over to the minify settings and test the exclusion of specific scripts and style sheets:

You can also just exclude pages if you find it causes problems with a specific page like the contact page or a slider. Can you tell where I’ve found the biggest issues?

Most of the time this will work, but occasionally, you’ll find that it doesn’t (as it didn’t in the scenario we’re currently working on, but it’s a good first step).  If it doesn’t improve things, I recommend the plugin Autoptimize to accomplish the same task.

With this plugin our scores are now:

  • Mobile: 70/100
  • Desktop: 75/100

This is one of the scenarios where we saw an improvement in PageSpeed with no improvement in real site speed.

And that’s it

You may find,  as we have here, there are issues you can’t fix. Google isn’t giving us a 100 percent, and here’s why:

  • Optimize images. They’re as small as or smaller than those Google itself provides, though I used the tools above. Any further compression results in the images looking degraded.
  • Eliminate render-blocking JavaScript and CSS in above-the-fold content. The only remaining issue here was a style sheet that, when moved out of the render path, displayed the page quite poorly for about a second before the styles were applied. I wanted to be realistic in the numbers I was giving, and I would not move it on my site unless I was stuck with speeds well into the “Poor” category. Always put users before engines.
  • Leverage browser caching. We have leveraged browser caching, but unfortunately, that only applies to scripts pulled in from our own sites. We can’t leverage browser caching for external scripts, such as those from Facebook or Google, as was the case here.

Our final real-world speed at the end is 3.0 seconds and better in most of North America, with the lowest coming in at 2.2. To speed this further, we’d need to look at cleaning our WordPress code, choosing a faster host and/or deploying a CDN.

But that’s another story for another article.


Predicting the value of a search engine ranking signal (July 5, 2018)

Contributor Dave Davies deconstructs a new Google patent that covers how machine learning can predict a ranking signal value when the value is unknown.

The post Predicting the value of a search engine ranking signal appeared first on Search Engine Land.

Google was recently granted a patent with a wide range of practical applications. The patent covers how, with machine learning, they can predict a ranking signal value when the value is unknown.

Given the vast amount of content on the internet and more coming daily, Google needs to find a way to assign value to pages even if they have not been crawled and indexed. How can a page be ranked if Google hasn’t crawled it? How can Google use a new piece of content that doesn’t have any inbound links?

The methods in this patent address how the Google algorithm may address and calculate unknown factors and use them to determine where a page ranks.

We’ll discuss the possible implementations Google may be using and a couple of the problems it solves for search engine optimization specialists (SEOs). But before we start, I feel obliged to offer my standard disclaimer.

Just because something is patented, it does not mean it is incorporated into an algorithm. We need to weigh the probabilities that the patent, or parts of it, are being used with what we see around us and what makes sense. If nothing else, it gives us a glimpse into what Google is working on.

Given the topic and methods outlined in this patent, I would say it’s highly likely that at least some iteration is in use and likely to be expanded on as machine learning systems evolve.

Patent 20180157758

Let’s begin by digging into the nuts and bolts. If you’re interested in the source, you can find the full patent here, but I’ll be covering the applications from the patent, what they mean and how they can be used.

Let’s begin with an image from the patent that won’t make sense now but will assist in the explanations to come:

Take a look at items 150 and 160 in the image above. These two factors are important and that’s what we’ll be talking about, since machine learning is used to solve significant search issues SEOs have complained about for years.

The problem

While the system we’ll be discussing has a variety of applications, the patent outlines one core issue in section 0008:

The search system can update a search engine index that indexes resources with the generated values of the search engine ranking signals for the resources and the generated values can then be used by the search engine in ranking the resources. Thus, the completeness of the search engine index and, in turn, the accuracy and the efficiency of the search engine can be improved.

Basically, they have identified a significant problem: In the absence of a known ranking signal value, there isn’t a way to rank content, even if the content is best suited for a specific query.

When there are no links

Let’s consider the following simplistic calculation for links to a new piece of content:

Number of links (signal a) = unknown or unavailable
Relevance of content to “blue widgets” (signal b) = 9.8/10
Domain value passed / Internal PageRank (signal c) = 9.2/10

Based on the calculation, we know the relevance of the page, and we know the strength the domain is passing to the page; but without knowing the number of links or their weight, how can Google properly rank the page? How can Google rank any page if they don’t know how many or what type of inbound links a page has? Any formula or algorithm that uses link count as a multiplier will zero out.
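The zero-out problem can be made concrete with a toy score. The multiplicative combination below is hypothetical; the patent does not disclose how Google actually combines signals, only that an unknown signal breaks the calculation.

```python
# Toy illustration: if signals are combined multiplicatively and one is
# unknown (treated as zero), the whole score zeroes out no matter how
# strong the other signals are. The formula is hypothetical.

def rank_score(link_signal, relevance, domain_value):
    return link_signal * relevance * domain_value

new_page = rank_score(0, 9.8, 9.2)     # no known links: score is 0
predicted = rank_score(4.5, 9.8, 9.2)  # with a predicted link signal
```

With a predicted value substituted for the unknown signal, the page can be ranked at all, which is precisely the gap the patent addresses.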

With an unknown signal value, no calculation can ever be correct, and Google won’t be able to produce the best results. As SEOs, we have a similar problem: You can’t rank without links, and it’s hard to get links for content that doesn’t rank, even with the best content for the query.

The methods in this patent give the algorithm the capability of predicting a value until it is confirmed. This prediction factor might be the most exciting aspect, as it facilitates rapid testing and accelerates the deployment of machine-learned corrections.

While a variety of permutations are discussed in the patent, at its core it comes down to training a machine learning system to generate a likely value for a ranking signal when there isn’t one.

A tale of two indexes

The method outlined in the patent requires two indexes, and these should not be confused with the search index we use every day. While the intent may be to eventually apply the technique to the general index, Google would first use two closed indexes, separate from the general search index.

For illustration purposes, we’ll call them index A and index B.

For index A, the ranking signal values are known and are used to train the algorithm from a reliable starting point. The algorithm is also given the pages themselves and their backlinks. Once it has been trained to understand how a web page is structured and how related elements like backlinks behave, values are assigned, and signal values are then applied to the second index.

In index B, the signal values are known but are withheld from the machine learning system. Index B trains itself by learning, based on the information from index A, where it weights a factor correctly and where it does not.

It’s in the second index that things become more interesting, because the algorithm also considers additional queries that may apply to the ranking signals. When the algorithm in index B tries to predict a single result, it will probably always be off a bit; when predicting many results, the predictions become more accurate. Because of the “wisdom of the crowd” phenomenon, index B is allowed to self-correct (that’s the machine learning element at play), and it does so by incorporating the additional queries and what it has learned.

If the system in index B can determine a signal value for a number of related queries, this may assist in generating the unknown value for the initial query.
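A crude way to picture that last idea, under stated assumptions: the real system trains on index A and self-corrects on index B, but here we fake the "wisdom of the crowd" step with a trivial predictor that averages known signal values across related queries. The data and function names are illustrative, not Google's.

```python
def predict_signal(unknown_query, known_signals, related):
    """Estimate a missing signal value from related queries with known values."""
    values = [known_signals[q]
              for q in related.get(unknown_query, [])
              if q in known_signals]
    if not values:
        return None  # nothing to learn from
    return sum(values) / len(values)

# Known signal values for queries already in the trained index.
known_signals = {"blue widgets": 7.0, "buy blue widgets": 8.0, "widget reviews": 6.0}

# Which queries the system considers related to the one missing a value.
related = {"best blue widgets": ["blue widgets", "buy blue widgets", "widget reviews"]}

print(predict_signal("best blue widgets", known_signals, related))  # 7.0
```

A single related query would give a noisy estimate; averaging over many is what makes the prediction usable, which mirrors the patent's point about many related queries.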

Why is this important?

It’s always valuable to understand how search engines work, but more directly, it’s valuable to understand the system that will enable new sites and new resources to rank quickly.

The two-index system described above has encoders and decoders. The encoders visit a web page and create an encoded representation. While I obviously am not privy to exactly what this would look like on the back end, based on the multiple references to entities in the patent, it’s likely a mapping of the entities within the page and known relationships to other entities in the index or in other resources.

Google has been granted a patent that lets them rank new resources (pages) using likely ranking signals. This same patent will also facilitate the creation of new signals by other engineers or machine learning systems and allow the overall algorithm to rank pages that haven’t yet been assigned a value.

New content or resources can be assigned values based on links, user behavior metrics and content quality they are likely to get. Or basically, they’ve found a way to predict the search future.

Even more groundbreaking, however, is the fact that the system offers a method to give machine learning systems the ability to generate signals on their own. Humans no longer have to tell the algorithm what is important: Machine learning teaches the algorithm to find, identify and assign a value to signals.

How you can use this patent

While there is little you can directly do to influence machine learning, you can indirectly make a difference by continuing to produce great content and promoting the development of good links.

Look at the content on your site and figure out which types of content generate traffic and links, as these are metrics Google can measure through its analytics and Search Console tools. IMO, these are signals a machine learning system would use.

If your current content is ranking well, generating links, clicks and shares, new content may be predicted to do the same.

Review your analytics and backlinks and make note of what you’re doing right, and let that inspire future content and link-building efforts. Conversely, take note of what didn’t go well. Just as the algorithm takes note of successes, it also takes note of failures. If the trend on your site is positive, you will likely be rewarded; if it’s negative, the opposite may be true.

And if you don’t rank quickly, especially for time-sensitive content, you likely won’t get the signals you need to rank the next piece, either.

The post Predicting the value of a search engine ranking signal appeared first on Search Engine Land.

What do Google and a toddler have in common? Both need to learn good listening skills. /google-is-teaching-itself-to-ask-the-right-questions-299379 Fri, 01 Jun 2018 15:55:00 +0000 /?p=299379 Contributor and patent explorer Dave Davies reviews a recently-presented paper that suggests Google is grouping entities and using their relationships to listen for better answers to multipart questions.

The post What do Google and a toddler have in common? Both need to learn good listening skills. appeared first on Search Engine Land.


At the Sixth International Conference on Learning Representations, Jannis Bulian and Neil Houlsby, researchers at Google AI, presented a paper that shed light on new methods they’re testing to improve search results.

While publishing a paper certainly doesn’t mean the methods are being used, or even will be, it likely increases the odds when the results are highly successful. And when those methods also combine with other actions Google is taking, one can be almost certain.

I believe this is happening, and the changes are significant for search engine optimization specialists (SEOs) and content creators.

So, what’s going on?

Let’s start with the basics and look topically at what’s being discussed.

A picture is said to be worth a thousand words, so let’s start with the primary image from the paper.

This image is definitely not worth a thousand words. In fact, without the words, you’re probably pretty lost. You probably picture a search system looking more like this:

In the most basic form, a search system is:

  • A user asks a question.
  • The search algorithm interprets the question.
  • The algorithm(s) are applied to the indexed data, and they provide an answer.

What we see in the first image, which illustrates the methods discussed in the paper, is very different.

In the middle stage, we see two parts: the Reformulate and the Aggregate. Basically, what’s happening in this new process is:

  • User asks a question to the “Reformulate” portion of the active question-answering (AQA) agent.
  • The “Reformulate” stage takes this question and, using various methods discussed below, creates a series of new questions.
  • Each of these questions is sent to the “Environment” (loosely, the core algorithm as we think of it today) for an answer.
  • An answer for each generated query is provided back to the AQA at the “Aggregate” stage.
  • A winning answer is selected and provided to the user.
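
The five steps above can be sketched as a tiny pipeline. Everything here is a stand-in: the reformulations, the canned "environment" lookup and the confidence scores are invented for illustration; the paper's agent learns these behaviors with reinforcement learning rather than hard-coding them.

```python
def reformulate(question):
    # Generate naive variants (the real AQA agent produces learned rewrites).
    return [question, question.lower(), "what is " + question.lower()]

def environment(query, answers):
    # Stand-in for the core QA system: return a canned (answer, confidence) pair.
    return answers.get(query, ("no answer", 0.0))

def aggregate(candidates):
    # Pick the answer the environment was most confident about.
    return max(candidates, key=lambda pair: pair[1])[0]

# Fake environment responses, keyed by query string.
answers = {
    "Capital of France": ("paris?", 0.4),
    "what is capital of france": ("Paris", 0.9),
}

variants = reformulate("Capital of France")
winner = aggregate(environment(v, answers) for v in variants)
print(winner)  # Paris
```

The interesting part is that the original question is never answered directly; a rewritten variant produces the winning answer, which is exactly the behavior the paper describes.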

Seems pretty straightforward, right? The only real difference here is the generation of multiple questions and a system figuring out which is the best, then providing that to the user.

Heck, one might argue that this is what goes on already with algorithms assessing a number of sites and working together to figure out the best match for a query. A slight twist, but nothing revolutionary, right?

Wrong. There’s a lot more to this paper and the method than just this image. So let’s push forward. It’s time to add some…

Machine learning

Where the REAL power of this method comes in is in the application of machine learning. Here are the questions we need to ask about our initial breakdown:

How does the system select from the various questions asked?

Which question has produced the best answer?

This is where it gets very interesting and the results, fascinating.

In their testing, Bulian and Houlsby began with a set of “Jeopardy!”-like questions (which, if you watch the show, you know are really answers).

They did this to mimic scenarios where the human mind is required to extrapolate a right or wrong response.

If you’re not familiar with the game show “Jeopardy!,” here’s a quick clip to help you understand the “question/answer” concept:

From the paper:

In the face of complex information needs, humans overcome uncertainty by reformulating questions, issuing multiple searches, and aggregating responses. Inspired by humans’ ability to ask the right questions, we present an agent that learns to carry out this process for the user.

Here is one of the “Jeopardy!” questions/answers posed to the algorithm. We can see how the question can be turned into a query string:

Travel doesn’t seem to be an issue for this sorcerer and onetime surgeon; astral projection and teleportation are no problem.

Not an easy question to answer, given it requires collecting various pieces of data and also interpreting the format and context of often cryptic questions themselves. In fact, without people posting “Jeopardy!”- like questions, I don’t think Google’s current algorithms would be able to return the right results, which is exactly the problem they were seeking to address.

Bulian and Houlsby fed their algorithm “Jeopardy!”-like questions and scored each response simply as right or wrong. The algorithm was never told why an answer was right or wrong, so it had no other information to process.

Because of this lack of feedback, the algorithm could only learn from whether it got a correct answer: learning in a black box, much like the real world.

Where did they get the questions?

Where did the questions used in the test come from? They were fed in as the “user” input to the Reformulate stage. Once a question was entered, the process:

  • Removed stop words from the query.
  • Converted the query to lowercase.
  • Added wh-phrases (who, what, where, when, why).
  • Added paraphrasing possibilities.

For paraphrasing, the system uses the United Nations Parallel Corpus, which is basically a dataset of over 11 million phrases fully aligned across six languages. From it, they produced various English-to-English paraphrasers that would rewrite the query while maintaining its context.
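The preprocessing steps listed above can be roughed out as follows. The stop-word list and wh-phrase handling are simplified placeholders, and the paraphrasing step via the UN Parallel Corpus is omitted entirely; this is only meant to show the shape of the query expansion.

```python
# Simplified placeholders, not the paper's actual resources.
STOP_WORDS = {"the", "a", "an", "of", "for", "is"}
WH_WORDS = ["who", "what", "where", "when", "why"]

def preprocess(query):
    """Lowercase, strip stop words, then prepend each wh-phrase as a variant."""
    tokens = [t for t in query.lower().split() if t not in STOP_WORDS]
    base = " ".join(tokens)
    return [base] + [f"{wh} {base}" for wh in WH_WORDS]

for variant in preprocess("The tallest building in the world"):
    print(variant)
```

Each input question fans out into several candidate queries, and it's those candidates, not the original, that get sent to the environment for answers.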


So here’s where this all landed us:

After training the systems, the results were pretty spectacular. The system they developed and trained beat all variants and improved performance dramatically. In fact, the only system that did better was a human.

Here is a small sample of the types of queries that ended up being generated:

What they have developed is a system which can accurately understand complex and convoluted questions and, with training, produce the correct answer with a surprising degree of accuracy.

So what, Dave? What does this get me?

You might be asking why this matters. After all, there are constant evolutions in search and constant improvements. Why would this be any different?

The biggest difference is what it means for search results. Google also recently published a paper for the ICLR Conference that suggested Google can produce its own content based on data provided by other content producers. 

We all know that just because a paper is written, it doesn’t mean a search engine is actually implementing the concept, but let’s pause a minute for the following scenario:

  1. Google has the capabilities of providing its own content, and that content is well-written.
  2. Google has a high confidence in its capabilities of determining the right answer. In fact, by tweaking its capabilities, it may surpass humans.
  3. There are multiple examples of Google working to keep users on its site and clicking on its search results with layout and content changes.

With this all stacked up, we need to ask:

  • Will this impact search results? (It probably will.)
  • Will it hinder a webmaster’s content production efforts?
  • Will it restrict the exposure of our content to a greater public?

Again, just because a paper is published, it does not mean the contents will be implemented; but Google is gaining the capability of understanding complex nuances in language in a way that may surpass humans. Google is also interested in keeping users on Google properties because, at the end of the day, it is a publishing company first and foremost.

What can you do?

You do the same thing you’ve always done: market your website.

Whether you are optimizing to be in the top 10 of the organic results or optimizing for voice search or virtual reality, the same number of blue widgets is being sold. You just need to adapt, since search engine result pages (SERPs) change quickly.

The methods we’re seeing used here raise an important subject everyone interested in search engine optimization (SEO) should be paying close attention to, and that’s the use of entities.

If you look at the query sets above that were generated by the systems Bulian and Houlsby created, you’ll notice that in general, the closer they are to accurately understanding the relationship between entities, the better the answer.

The specific wording is irrelevant, in fact. Fully deployed, the system would not be required to use words you or I understand. Thankfully, the examples let us see that success comes from grouping entities and their relationships in a way that makes an answer based on those relationships more reliable.

If you’re just getting your feet wet in understanding entities, there’s a piece here that introduces the concept and covers the ins and outs. I guarantee that you’ll quickly see how they relate, and you need to focus on this area as we head into the next generation of search.


Searcher intent: The secret ingredient behind successful content development /searcher-intent-the-secret-ingredient-behind-successful-content-development-297222 Wed, 02 May 2018 14:06:00 +0000 /?p=297222 Contributor Dave Davies takes the guesswork out of determining what type of content will resonate with an audience by creating Excel formulas to help determine what a searcher may be looking for.

The post Searcher intent: The secret ingredient behind successful content development appeared first on Search Engine Land.

Google’s goal is to satisfy a searcher’s intent. When a user finds what they’re looking for after clicking on an organic search result, that’s a success.

Sounds easy enough, but things get complicated when there are multiple results that may fulfill the primary intent of a given query.

What is “the primary intent of a given query?” Let’s look at the search phrase “Real estate in Miami” to help answer the question.

Primary intent

Someone searching the term “real estate in Miami” is probably looking to either buy or sell a property. This is the primary intent of the search phrase. We could search virtually any site that accesses a Multiple Listing Service (MLS) in the United States and find results for the phrase.

Algorithms use math, and math (in this context) relies on probability. When Google determines which results to rank highest, it looks to maximize the probability that the searcher will leave satisfied. When many indexed sites meet a primary intent, the algorithm needs to look at secondary intents to see what other information the searcher may be looking for.

Secondary intents increase the probability a site will meet a searcher’s needs. Here’s a very simple example:

  • Assume 90 percent of the searchers were looking to buy or sell real estate.
  • Assume that 10 percent of the searchers were looking for information on the real estate market in Miami.

Most, if not all, sites will fulfill the 90 percent intent, but only those sites with information on the Miami market itself, on top of the listings, would fulfill 100 percent of users’ intents.

In the absence of such a site, Google must provide listings that fulfill different intents, knowing any given searcher may potentially click the result that fulfills the wrong intent and be disappointed.

This happens when it’s the only option Google has, or for diversity, but when a single resource fulfills multiple intents and thus increases the probability they will satisfy the user, that site is more likely to rank.
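Using the toy numbers from the example above (90 percent of searchers want listings, 10 percent want market information; the split is this article's illustration, not real data), a site's chance of satisfying a random searcher is just the sum of the intent shares it covers:

```python
# Share of searchers holding each intent, per the example above.
INTENT_SHARES = {"listings": 0.90, "market_info": 0.10}

def satisfaction_probability(intents_covered):
    """Probability a random searcher's intent is met by the content we have."""
    return sum(INTENT_SHARES[i] for i in intents_covered)

print(satisfaction_probability({"listings"}))                 # 0.9
print(satisfaction_probability({"listings", "market_info"}))  # 1.0
```

This is the whole argument in miniature: adding content for a secondary intent raises the probability of satisfying any given searcher, which is what the ranking algorithm is trying to maximize.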

There is also supplemental data that doesn’t match the primary or secondary intents but does match supplemental needs. When searchers have fulfilled their primary intent (found properties of interest, read the market research and so on), it’s logical that they may have a next step.

For example, when searching for real estate in Miami, it’s logical to want to know what schools are in the area, the crime rate, property values and more. Based on its historical data, Google knows what query a user will probably search next.

If your website contains the information of the next request, there is a high probability that your site will fulfill that searcher’s intent.

You need to be a fortune teller

This leaves you, me and everyone else with the problem of determining exactly what “user intent” means.

This is especially annoying if you like to work with hard numbers. How do you put a hard number on something like meeting user intent? With complex machine learning algorithms, a massive data center and a sneak peek at Google’s algorithms, it might be possible, but let’s assume for a second we don’t have the time, skill or resources for that.

I had to come up with a way to find the holes in the canvas we’re painting to satisfy user intent, and to determine where those holes were acceptable and where they were not.

Step 1: Building a keyword list

The first step is going to be building a keyword list. Chances are you’ve done this before, but this time we’re going to be expanding our search beyond those conversion phrases we often focus on.

The conversion phrases you probably already have in your lists are likely (though not necessarily) focused on primary intent terms. If you are a realtor in Miami, terms like “Miami real estate” are on your list.

This time, we’re going to take a different approach. We’re going to look for the secondary and supplemental intent terms, which can be a tedious task but incredibly valuable.

The first thing you’ll do is head over to your favorite keyword research tools. Each has its pros and cons, but thankfully, for our purposes, it doesn’t really matter which you use. We’re not looking for traffic estimates; we’re looking for content ideas, so specific search volumes matter less than understanding the ratios.

I’m going to assume everyone in the audience either has access to Google’s Keyword Planner or can convert the process to the tool you use. Now it’s time to build your list.

How broad you want to go on your keyword research will vary by your niche, but I generally suggest going as broad as possible. In this instance, I would start with the single word “Miami” and get the results:

Add all the keywords to a plan, making sure to choose “exact” as the match type. Doing so does not actually add all the top queries that include “Miami”; Google reduces the queries to what they deem likely, so they don’t include everything.

To supplement this list and ensure you are getting all the most important terms, you’ll also want to query “Miami real estate,” “Miami homes,” “Miami neighborhoods,” “Miami mortgages” and pretty much anything else you can think of that your target market might look for related to primary and secondary intents.

Download your list and delete all the columns other than the Keyword and Average Monthly Searches (exact match only). Now the real work begins.

Step 2: Time to classify

The next step is to classify your keywords.

Group your terms together by their intent. The more granular you get, the better.

You can use whatever system you like to classify your keywords. To keep things simple, I tend to add a column to my spreadsheet and use an alphanumeric key.

If it’s a short list with a few classes, I’ll use numbers. If it’s a larger list, I’ll go with letters or a combination depending mostly on keyboard position.

It’s important to also keep a notepad doc or similar log of what your classifications mean. It’s only worth classifying items that might conceivably be of interest to a searcher prior to, during or shortly after the conversion cycle.

I kept the categorization a little more basic here. For example, you may notice I lumped all events into one class, covering everything from “what to do” queries to places to go. A broad approach like this can work well as a first round, followed by a second round of the same research for each section if it’s going to be a substantial build-out. I won’t be taking it that far here, but it’s worth noting.

At the end of this stage, you’ll have something that looks like:

Step 3: Numbering

The next step is to put some numbers on the various types of content.

You’ve determined that all the queries not classified “not applicable” (NA) could be involved at some point in the searcher’s journey. Using what we have, we can see how each of these areas can help improve the probability of our site meeting a user’s intent.

1. First, delete all the rows for keywords that have zero relevance.

2. Order by class and then add the key and a probability field like:

3. The next step is to find the total volume of queries that relate in some way to a conversion using the SUM feature:

4. And finally, you’ll use the SUM again to find the percentage impact a specific grouping of content has on intent (with a big BIG caveat we’ll get to shortly). In the probability field for each key entry, you’ll simply add the formula: =SUM(B2:B43)*100/B144

Adjust the bolded B2:B43 to reflect the class rows (e.g., rows 2 through 43 in my spreadsheet held the values for class 1, real estate). This part of the formula adds these cells together. Replace the bolded B144 with the cell you created in Step 3 containing the total search volume of all queries.

Altogether, this formula creates the percentage of all relevant searches that each class represents. In our case, we get:

In the last step, we need to adjust the weight because some classes are relevant but not equal.

A user looking for mortgage information is obviously more likely to be interested in real estate than one looking for weather information. So, we need to adjust the numbers based on their impact on the specific users we’re targeting. To do this, we need to add three final columns. Here’s what the final product will look like:

The first column we will add is G, which I have titled “Likely.”  It uses a scale of 100.

I have graded what I believe is the likelihood a searcher of that class of terms would be related to my users.

In my example, I believe 2 percent of users searching for schools are interested in real estate.

In column H, I have added a “Working” column, which is an adjusted value. I need to know what the probability column values would be when I take into account the multiplier from the “Likely” column.

You will notice the “real estate” value is multiplied by 100 in the “Working” column, as it had a “Likely” multiplier of 100. “Schools” only doubled, as it had a multiplier of 2.

This is done by adding the following formula into cell H3:  =F3*G3

If you select the bottom-right corner of the top cell and drag it down (or double-click it), the formula will copy down through the rows you have a “Probability” value for.

Once that’s done, use the SUM formula to total all the adjusted values, which (in my case) came to ~434.38.

And now, the final step is to add the following formula into I3 in what I’ve called my “Adj Weight” column:  =F3*G3/$H$19*100

Drag it down or double-click, and you will need to adjust the bold in the formula to reference the SUM cell in column H.

Notice I have placed a dollar sign ($) before the column and row references. When you drag a formula cell to copy it to rows below, the references change: what was a reference to cell F3 would adjust to cell F4. Placing a dollar sign ($) before each reference stops this from happening, in this case locking the reference to cell H19 instead of letting it change to H20, H21 and so on.

This formula gives us the percentage weight of each class after our “Likely” value is factored in.
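For anyone who prefers to see the math outside a spreadsheet, the whole pipeline above can be replicated in a few lines of Python. The classes, volumes and “Likely” multipliers below are made-up sample data, not the article's actual sheet; the three computed columns mirror the Probability, Working and Adj Weight formulas.

```python
classes = {
    # class: (total exact-match search volume, "Likely" multiplier on a 0-100 scale)
    "real estate": (40000, 100),
    "schools":     (25000, 2),
    "mortgages":   (8000, 60),
}

total_volume = sum(vol for vol, _ in classes.values())

# "Probability": each class's share of all relevant searches (=SUM(range)*100/total).
probability = {c: vol * 100 / total_volume for c, (vol, _) in classes.items()}

# "Working": probability scaled by the Likely multiplier (=F3*G3).
working = {c: probability[c] * likely for c, (_, likely) in classes.items()}

# "Adj Weight": each working value as a share of the working total (=F3*G3/$H$19*100).
working_total = sum(working.values())
adj_weight = {c: working[c] / working_total * 100 for c in classes}

for c in classes:
    print(f"{c}: probability {probability[c]:.2f}%, adjusted weight {adj_weight[c]:.2f}%")
```

Note how the "schools" class, despite a large raw search volume, shrinks to a small adjusted weight once its low Likely multiplier is applied; that is exactly the correction the final spreadsheet columns exist to make.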

Now we have a roadmap

Now we have a roadmap for content development, and we have assigned a probability to the various content classes.

As with almost all data, this needs to be viewed with a critical eye, but it gives us a very good starting point as to where our opportunities lie and where we should be looking to expand our content to increase the probability of meeting the searcher’s intent.

If the only content that appeared on the site was real estate listings, you would have a 54.58 percent probability of meeting the searcher’s intent (when we include secondary and supplemental intents).

If we add content related to schools in the area, we will be adding 13.46 percent probability to our example.

One major consideration to keep in mind is that it all relies on your choosing the right “Likely” values.

Choose incorrectly and you will skew to the wrong types of content. Before launching into valuing classes of content, I generally review the content of the top 10 sites for my primary terms AND also view the top five or 10 sites for content ranking for that class.

This will assist in confirming that you are correct in your valuation; if some or more of your ranking competitors have this content it increases the odds that it is indeed valuable and you’ve assigned correctly.

Reviewing the sites that rank for content using specific keywords will also assist in confirming your interpretation of what the query means or what Google believes the users are looking for is correct.


This is how you put numbers on your content classes to get an idea what types of content will increase the probability that you will meet your searcher’s intent.

In this fast-changing and crucial area of search engine optimization (SEO), this method can help you keep focused on what’s most likely to move the ranking needle.


Google patent on related entities and what it means for SEO /google-patent-on-related-entities-and-what-it-means-for-seo-295727 Thu, 05 Apr 2018 17:00:00 +0000 /?p=295727 Contributor Dave Davies pulls key points from a newly awarded Google patent on related entities and points out the ranking benefits of strengthening entity associations in your SEO and link-building efforts.

The post Google patent on related entities and what it means for SEO appeared first on Search Engine Land.

I read a lot of patents, many of which may or may not apply to search engine optimization (SEO) or be used by Google at all.

But that’s not the case with the recently granted Google patent “Related entities.” I believe this patent is being applied and it gives us significant insight into how Google identifies entities and the related entities people are searching for.

Let’s look at some details I think are interesting and get a general understanding of the patent and its intent. Understanding how Google associates entities will help us grasp and use the connections to SEO.

Related entities

Let’s start with understanding related entities, especially in the context of Google patent US 20180046717A1.  

If you search on the phrase “presidents of the united states,” this is what you may see:

The presidents shown are “related entities,” listed because the general phrase “presidents of the united states” was searched. Different people are shown, but all share a common denominator: having been President of the United States.

How does Google know to show these particular people when a general phrase is queried? That is what the patent explains. It essentially discusses how these related entities are selected and how they are displayed.

Let’s look at another example. If we click the image of Donald Trump on the page, we are taken to a query for his name that appears as:

When I search his name without previously searching for anything President-related (and being logged out), this is what I see:

We can see the breadcrumb navigation at the top of the results which started appearing in February of 2018, but in addition, we see the context carrying forward.

When we searched for presidents, a carousel of presidents in chronological order was presented, and when we click an image, the context is carried with it, something that does not occur when we search a president in isolation.

So, what does this mean, and what does it have to do with the patent? Let’s begin by digging into a few core areas, and I will highlight the key points.

Entity database

One of my favorite takeaways is the idea there is an actual entity database.

Essentially, this is a separate database which is only tasked with understanding the various entities on the internet, what attributes they have and how they are interconnected.

For our purposes here, we need to remember that an entity is not simply a person, place or thing but also its characteristics.

These characteristics are connected by relationships. In the patent, entities are referred to as “nodes” and the relationships as “edges.” Some of the clearly prominent entities and relationships involved with Barack Obama are:

  • Has name Barack Obama.
  • Has position President of the United States.
  • Has birthplace Honolulu, Hawaii.
  • Has spouse Michelle Obama.
  • Has net worth $12.1 million.

And so forth.
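One way to picture that nodes-and-edges structure is a small adjacency map: entities as nodes, relationships as labeled edges. The schema and relationship names here are illustrative, not the patent's actual representation.

```python
# Entity graph: node -> list of (relationship label, target entity) edges.
entity_graph = {
    "Barack Obama": [
        ("has_position", "President of the United States"),
        ("has_birthplace", "Honolulu, Hawaii"),
        ("has_spouse", "Michelle Obama"),
    ],
    "Michelle Obama": [
        ("has_spouse", "Barack Obama"),
    ],
}

def related_entities(entity, relation):
    """Follow edges with a given label from one node."""
    return [target for rel, target in entity_graph.get(entity, []) if rel == relation]

print(related_entities("Barack Obama", "has_spouse"))  # ['Michelle Obama']
```

Answering “who is Barack Obama’s spouse?” then becomes a graph lookup rather than a text search, which is the practical payoff of storing entities this way.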

According to this common logic and other patents, there is a separate database outside of the general search index.

I believe this is important, and we’ll get back to it after looking at relatedness.

Determining relatedness

The patent touches on the important subject of determining relatedness.

We discussed how relatedness applies to other areas when optimizing for voice search. There are a few key ways that Google determines the relatedness of entities, but one key mechanism that comes up repeatedly is the co-occurrence of the entities in the same resources.

In our example above, this would mean the various presidents would appear on the same page often, thus indicating to Google they are related.

Alternatively, entities can be related through a shared attribute, regardless of how often they appear on the same page together. Even if President Jimmy Carter never appeared on the same page as Donald Trump, they would be associated through the phrase “president of the united states,” because each man is connected to that phrase.

This is an incredibly important idea for content marketing and general SEO outside of the patent we’re discussing.
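A crude version of the co-occurrence signal described above: count how often two entities appear in the same resource. The documents here are toy stand-ins for pages in the index.

```python
from itertools import combinations
from collections import Counter

# Each set stands in for the entities detected on one page.
documents = [
    {"Donald Trump", "Barack Obama", "White House"},
    {"Donald Trump", "Barack Obama"},
    {"Barack Obama", "Honolulu"},
]

# Count every unordered pair of entities that shares a document.
co_occurrence = Counter()
for doc in documents:
    for pair in combinations(sorted(doc), 2):
        co_occurrence[pair] += 1

# The more often a pair co-occurs, the stronger the inferred relatedness.
print(co_occurrence[("Barack Obama", "Donald Trump")])  # 2
```

For content marketing, the takeaway is the same as in the prose: getting your brand mentioned on the same pages as the entities you want to be associated with is itself a signal, link or no link.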

Determining priority

An area of the patent that applies less to general SEO but is still worth discussing here is that Google needs a mechanism for determining which entities and relationships are most important.

Currently, Donald J. Trump is President of the United States, but he’s also a businessman and could be connected to that entity by the relationship “has/had job.” And yet, when searching his name, we see results for him as president and not a businessman.

Here’s another example: Ronald Reagan was an actor for far longer than he was a politician or president. And yet, when we search his name, his presidential information is returned first:

Why was either man not shown as a businessman or actor when only their names were searched?

One of the key mechanisms Google uses to determine which entity and relationship are most important is freshness (how recent the co-occurrences we discussed above are), along with the click-through rate on related queries and what users type in after a query.

Basically, if people type in “president of the united states” more often than “business person” or “actor,” the importance of that relationship is increased.
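
A toy sketch of that prioritization (my own illustration; the relationship names and counts are hypothetical) might rank an entity's relationships by refinement-query volume, optionally boosted for freshness:

```python
def rank_relationships(query_counts, freshness_boost=None):
    """Order an entity's relationships by refinement-query volume,
    optionally multiplied by a per-relationship freshness boost."""
    freshness_boost = freshness_boost or {}
    return sorted(query_counts,
                  key=lambda rel: query_counts[rel] * freshness_boost.get(rel, 1.0),
                  reverse=True)

# Hypothetical refinement counts after searches for "Donald Trump"
counts = {"president of the united states": 9000, "business person": 1200, "actor": 50}
ranked = rank_relationships(counts, freshness_boost={"president of the united states": 1.5})
# "president of the united states" comes out on top
```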

Overarching factor

Authoritative sites, especially those related to a specific subject matter, are given a higher priority in determining the relationships between entities.

For example, a Wikipedia page on Ronald Reagan that discusses his role as president would be considered authoritative and strengthen the relationship between his name and the term “president.”

If we were talking about technical SEO, Search Engine Land would be considered an authority since it is associated with the process and a flagship publication in the SEO industry.

Think of it as PageRank for entities, even though there’s no green bar to tell you when you’re on the right track.

Now, let’s look at the question, “What does this mean for SEO?”

The core of the patent

A lot of what’s in the patent applies to general SEO, and not just by displaying related options within search results.

The idea of an entity database separate from the general search system, reinforced in yet another Google patent, strikes a chord with me. You can think of it as a database that maps all the links across the web to pass PageRank, only more powerful.

Instead of simply keeping a record of all the links and anchors from around the web, it’s taking things one step further and includes an understanding of the relationships between entities.

If you operate a hotel in New York City, and that hotel name is frequently referenced on pages with the entity “hotel,” the relationship between the brand and the word “hotel” will be strengthened.

Further, if the hotel also exists on pages optimized for “New York City,” that entity relationship will be reinforced whether there is an active link or not. Even if topically unrelated pages use the phrase “New York City” and the name of the hotel, the relevance score goes up.

Interestingly, being included on a page with other brands that are already strongly related to New York hotels would aid your efforts as well, essentially piggybacking on the relatedness of their brands and passing it along to yours.

And unlike PageRank, which is divided among the links on a page, I have read nothing about diminishing returns for entities. That isn't to say they don't exist; it's worth considering.

Competing brands

Continuing with my hotel example, having the word “hotel” on a page alongside competing brands would, by my logic, still help boost the strength of the relationship for “hotels.”

But if the page is also about dining and activities in New York, the relationship may soften.

I know of no information indicating whether entity association is binary (present or not) or relative, i.e., whether the more entities a page references, the less any one of them is valued. The latter would make sense, and if that is the case, then pages with a focus would logically reinforce a specific entity association more than general pages.

We do know the patents suggest that proximity to an entity is a signal, so the closer two terms appear on a page, the stronger the relationship association is.
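
A minimal sketch of such a proximity signal (my own construction, not taken from the patents) could score a pair of terms by the inverse of the smallest token distance between them:

```python
def proximity_score(text, term_a, term_b):
    """Score a pair of terms by 1 / (smallest token distance between them)."""
    tokens = text.lower().split()
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    if not pos_a or not pos_b:
        return 0.0  # one of the terms never appears
    closest = min(abs(a - b) for a in pos_a for b in pos_b)
    return 1.0 / closest if closest else 0.0

near = proximity_score("our boutique hotel in manhattan", "hotel", "manhattan")
far = proximity_score("hotel news and many other stories about manhattan",
                      "hotel", "manhattan")
# near (distance 2) scores higher than far (distance 7)
```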

As with PageRank, authority matters. Unlike PageRank, the link itself doesn't, and if there is a link, whether it uses a nofollow attribute would be irrelevant.

Now, to be clear, I am referring to entity relationship building and not PageRank. PageRank and links are still powerful signals, but they are not what we’re talking about here. I’m not telling you to ignore link building or that nofollow links are as powerful as followed links, but for what we’re covering here, nofollow would not play a role.

Wikipedia uses nofollow attributes on its outbound links, yet those links pass a powerful signal.

Some might even argue that sites using nofollow links still pass high value, provided the content and structure are presented in a way that lets the entities be easily associated.


This patent gives us some idea of how to strengthen the association of our site or brand with specific terms and entities.

The idea that we can push our rankings forward through entity associations, and not just links, is incredibly powerful and versatile. Links have served this function well, but Google has a lot to gain by moving toward the entity model, both for weighting and for a variety of other internal needs.

Again, I am not suggesting you abandon your link building. Do this in addition to building links, or even better, focus your link-building efforts on ways that can accomplish both tasks at once.

If nothing else, you’ll be forcing yourself to pursue links on pages with a strong topical or geographic relevance to the attributes you want to be associated with.

Think about it this way: Even if this patent is rubbish, you’ll still be doing smart marketing.

The post Google patent on related entities and what it means for SEO appeared first on Search Engine Land.

Optimize for voice search by keeping it short and to the point /optimize-for-voice-search-by-keeping-it-short-and-to-the-point-293470 Tue, 13 Mar 2018 15:54:38 +0000 /?p=293470 Contributor Dave Davies explains the many layers and aspects of Google Voice Search and how to optimize your content for it.

The post Optimize for voice search by keeping it short and to the point appeared first on Search Engine Land.


OK, Google … how do I optimize for voice search?

Ask that question and you’ll discover even Google doesn’t know but is trying to learn.

For those of us in the search engine optimization (SEO) field who want to stay up to date, waiting for Google to figure it out isn’t much help. We need to know what’s going on, and we need to know it before our competitors get the jump on us.

Who uses voice search?

Before we dive into the approaches we need to take to optimize for voice search, let’s take time to gain an understanding of who is using it.

Our friends over at Stone Temple Consulting published their findings after surveying 1,000 people on their use of voice commands. Here are some highlights:

  • People are becoming more comfortable using voice search in public.
  • The 35-to-44 age group is the largest segment using voice search.
  • The 25-to-34 age group is most comfortable using voice search in public.
  • The heaviest users of voice search have an income above $50,000 per year.

Add to this the Gartner research that predicts 75 percent of US homes will have a smart speaker by 2020:

It appears we will have a deep saturation of a technology with strong buying power in the near future.

You may be thinking, “Yes, Dave, we know voice search is important, and we know who is searching using voice, but what can we do to get our content in front of it all?”

Excellent question. Let’s take a look.

Voice search ranking factor

Clearly, the environment is changing rapidly, and it is difficult to predict specifically how users will interact with their devices using voice.

The winners in the voice space will be those who pay close attention to the various devices that launch and how they are used.

Understanding the new device capabilities and who is using them is step one.

Recently, Greg Sterling covered a study done by Backlinko on voice search ranking factors.

The study is based on 10,000 Google Home search results and is close to what I’ve experimented with on my own device on a much smaller scale.

In the findings, they note some results may reflect causation, while others may be coincidence or correlation. Understanding which is at play is crucial to understanding what Google is looking at.

There are several key takeaways from the Backlinko study I feel are important to note:

  • Answers are 29 words on average. When you’re structuring the data you want to become a voice “answer,” make sure it’s short and to the point. This means formatting the page so an answer can be easily drawn from it and understood to be a complete answer to the question.

For example, ask Google what the Pythagorean theorem is and you’ll hear this 25-word reply:

  • The average writing level of a result was targeted to the ninth-grade reading level, so keep it simple.
  • Presently, voice search results seem to serve a more generic audience. I don’t expect this to last long; ranking for the present requires writing to the masses.
  • Google may eventually cater the reading level to the individual searching and implied education level of the query.
  • The average word count of pages used to draw voice search results was 2,312 words. This suggests Google wants to draw results from authoritative pages.
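
As a rough self-check against those first two findings, the answer length and an approximate Flesch-Kincaid grade level can be computed for any candidate answer; the syllable counter below is a crude heuristic, and the sample answer is my own paraphrase, not Google's actual reply:

```python
import re

def count_syllables(word):
    """Crude vowel-group heuristic; fine for a rough readability check."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def answer_stats(text):
    """Word count and approximate Flesch-Kincaid grade level for one answer."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    grade = 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59
    return len(words), round(grade, 1)

word_count, grade = answer_stats(
    "The Pythagorean theorem states that the square of the hypotenuse "
    "equals the sum of the squares of the other two sides."
)
# A concise, easily spoken answer: 21 words
```

Running candidate answers through a check like this makes it easy to spot passages that are too long or too dense to be read aloud.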

With each page we create, we need to keep in mind the entity we are discussing and the intent(s) we need to satisfy when trying to optimize for voice and general search.


An entity is basically a noun connected by relationships.

If answering the question, “who is Dave Davies,” Google needs to search their database of entities for the various Dave Davieses and determine the one most likely to satisfy the searcher’s intent. They will then compare that with the other entities related to it to determine its various traits.

When someone searches for Dave Davies, Google usually assumes they are looking for Dave Davies of The Kinks and not the author of this article.

I will get to why in a minute. Let’s look briefly at how Google connects the various entities around the musician Dave Davies.

A very small connection structure to illustrate might look something like:

What we are seeing here are the entities (referenced in patents as nodes) and how they are connected.

So, for example, the entity “Dave Davies” is connected to the entity “Ray Davies” by the relationship “Has Brother.”

He would also be connected to the entity “February 3, 1947” by the relationship “Has Birthday” and the entity “Kinks” by the relationship “Has Band.”

Other people in the band will also share this entity point with Dave, enabling them to all appear for a query such as:

OK Google, who was in the Kinks

to which Google will reply:

The band members of the Kinks include Ray Davies, Dave Davies, Mick Avory and others.
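
The node-and-relationship structure described above can be sketched as a set of (subject, relationship, object) triples; the entities and relationships below are the ones from this example, while the helper functions are my own:

```python
# (subject, relationship, object) triples from the example above
TRIPLES = [
    ("Dave Davies", "Has Brother", "Ray Davies"),
    ("Dave Davies", "Has Birthday", "February 3, 1947"),
    ("Dave Davies", "Has Band", "Kinks"),
    ("Ray Davies", "Has Band", "Kinks"),
    ("Mick Avory", "Has Band", "Kinks"),
]

def objects_of(subject, relationship):
    """Follow a relationship edge out of an entity node."""
    return [o for s, r, o in TRIPLES if s == subject and r == relationship]

def subjects_with(relationship, obj):
    """Reverse lookup: which entities connect to this one by this relationship?"""
    return [s for s, r, o in TRIPLES if r == relationship and o == obj]

members = subjects_with("Has Band", "Kinks")        # "who was in the Kinks?"
brother = objects_of("Dave Davies", "Has Brother")  # the follow-up question
```

Because every band member shares the “Kinks” node, one reverse lookup answers the band-membership question, and a forward lookup answers the follow-up about the brother.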

These connections further illustrate the understanding Google applies to entities and their importance: they allow Google to respond to multiple questions without explicit direction and to weigh the prominence of specific entities to determine which to rank.

For example, Dave Davies of The Kinks is a more prominent entity than Dave Davies the SEO, so if I ask “who is Dave Davies,” it will reference the Wikipedia page of the Kinks guitarist.

Understanding the entity relationships and how they’re referenced on the web helps Google determine this, but it’s also why we can follow up with the question, “OK Google, who is Dave Davies’ brother,” and “Ray Davies” is given as the answer.

This is what will provide us the blueprint for creating the content that will rank in voice search. Understanding how entities relate to each other and giving concise and easily digested information on as many related topics as possible will ensure that Google sees us as the authoritative answer.

And not just for the first questions but also supplemental questions, thus increasing the probability our content will satisfy the user intent.

Circling back

This explains why the Backlinko study found longer content tended to rank better. A longer piece of content (or a grouping of pages, well-connected/linked and covering the same subject) is not just more likely to answer the user intent and potential follow-up questions but also eliminates any possibility that the entity selection is incorrect.

Let’s consider my own bio here on Search Engine Land. Why does Google not accidentally select this bio when answering the query, “who is Dave Davies?”

The bio is on a strong site and is tied to entity relationships such as my position, website and Twitter profile. That is a lot of information about me, so why not select it?

Wikipedia has enough content on the Dave Davies from the Kinks page and enough supporting entity data to confirm he is the correct Dave Davies.


What we see here is that covering as many related entities and questions as possible in our content is critical to ranking well for voice search. It extends beyond voice, obviously, but because voice search returns nothing beyond position zero, it is far more heavily impacted.

Earlier, I mentioned Google determines which entity the user is likely to be referencing when there are multiples to select from.

In the end, it comes down to intent, and Google determines intent based on a combination of related factors from previous queries.

If I simply ask “OK Google, who is his brother” without first asking about Dave Davies, Google will not be able to reply. When there is no such context, Google uses a system of metrics related to authority and relevance to determine which entity would win in a generic environment.

While not all patents are used, some iteration of their patent “Ranking Search Results Based On Entity Metrics” probably is. According to the patent, Google uses the following four metrics to determine which entity is strongest:

  • Relatedness. As Google sees relationships or entities appear relatedly on the web (e.g., “Dave Davies” and “Ray Davies”), they will connect these entities.
  • Notability. This relates to notability in the field. Basically, it takes into account the popularity of the entity in question and also the popularity of the field as a whole. The music industry is a bit more notable than the SEO industry, and the Kinks are listed as one of the most influential bands of all time.
  • Contribution. Google will weight entities by reviews, fame rankings and similar information. Some may suggest Dave Davies of the Kinks is a little more famous than I am.
  • Prizes. More weight will be added to an entity or aspect of that entity based on prizes and awards. This isn’t referring to a lotto but rather something like a Grammy. Had I won a Nobel Prize for SEO, I might have been selected.
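
The patent names these four metrics but, as far as I know, not how they are combined; a weighted sum is one plausible sketch (the equal default weights and the 0-to-1 metric values below are purely illustrative):

```python
def entity_score(metrics, weights=None):
    """Combine the four metrics into one score via a weighted sum.

    The combination method and the equal default weights are assumptions;
    the patent names the metrics, not the formula."""
    weights = weights or {"relatedness": 1.0, "notability": 1.0,
                          "contribution": 1.0, "prizes": 1.0}
    return sum(weights[k] * metrics.get(k, 0.0) for k in weights)

# Hypothetical 0-to-1 metric values for the two Dave Davieses
musician = entity_score({"relatedness": 0.9, "notability": 0.95,
                         "contribution": 0.9, "prizes": 0.6})
seo = entity_score({"relatedness": 0.5, "notability": 0.3,
                    "contribution": 0.2, "prizes": 0.0})
# The musician wins the generic query
```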

There is more to determining the generic intent reply than a single patent, but this gives us a very good idea how it’s calculated.

The next step in ranking on voice search is to isolate which entities will have these metrics and cover them by writing targeted content well.

Cover the core answer, but also consider all the various entities connected to that answer to reinforce that you’re referring to the same entity and also have the authority and information to give the best answer.

Bottom line

If you want to rank in voice search, you need three things:

  • A strong domain.
  • Strong content.
  • Content divided into logical and easily digested segments.

Out of the three, I feel that easily digested content and domain strength are the most influential elements.

Of course, getting a site up to par with Wikipedia is a massive undertaking, but I suspect we will see this weighting drop in importance as Google gains confidence in its capabilities to actually determine quality content and context.


Looking back at 2018 in search: A time traveler’s year in review /2018-year-review-289384 Wed, 10 Jan 2018 18:53:01 +0000 /?p=289384 What does 2018 have in store for search marketers? Columnist (and time-traveler) Dave Davies pays a visit from the future to share what this year's major search developments will be.

The post Looking back at 2018 in search: A time traveler’s year in review appeared first on Search Engine Land.


Greetings from the future! I’m writing to you from January 2019. Since search is such a dynamic space, with every year bringing unexpected developments, I thought it would be helpful to use my knowledge as a denizen of the future to give you a glimpse into what’s to come in 2018. So for you, this is a look forward — but for me, it’s a year in review. And let me warn you, you’d best buckle up!

(Let’s get the obvious out of the way first: Sorry folks, I can’t tell you which cryptocurrencies take off, as I promised some guy named Doc Brown I wouldn’t, but I can say that AI-investment programs sure do a number on it.)

The top stories in search in 2018

The big question for Search Engine Land readers, of course, is, What the heck will happen in search in 2018? Obviously, I can’t cover everything, but here are the top stories that will make the news for you this year.

1. The mobile-first index rolls out

Mobile First Indexing

As we were all promised, the mobile-first index rolled out — and it did not go smoothly. In fact, after some limited testing in which they rolled it out and pulled it back a couple of times, they finally just tore off the Band-Aid and decided to sort out the remaining problems while folks scrambled to figure out why their rankings were fluctuating so much.

Despite Google’s best efforts, it hit major issues with mobile-specific sites that had structures that were different from the desktop versions. Many sites experienced a huge loss in search visibility, particularly for long-tail queries, as Google struggled to find the less-linked-to mobile counterparts of previously high-performing desktop pages. Some pages were being dropped from the index entirely.

When asked about it, Google reps blamed developers and SEOs for not crawling their sites with mobile crawlers. SEOs, on the other hand, pointed out that they were doing what Google asked (building sites for users based on their devices) and in many cases were impacted negatively by that decision.

Responsive sites were less impacted by the mobile-first index, as their mobile and desktop content were already one and the same, but the ranking fluctuations certainly affected them indirectly.

Google did ultimately iron it all out, but it was not a fun couple of months for many, and even the folks who gained ground weren’t happy when those gains were rolled back.

2. Voice-first devices continue to grow

Voice-First Devices

As predicted, voice-first devices grew in popularity, as did the use of personal assistants. Homes and Echos filled twice as many nooks and crannies by the holidays in 2018 vs. 2017, with Google and Amazon both literally giving them away to critical user bases (Prime members for Amazon and Pixel phone users for Google).

The downside of this for the search community was that the adoption of the technology was happening faster than the optimization strategies for it were evolving. Yes, folks knew to use structured data to boost their odds of being selected as the answer for some voice queries. But Google was relying more on early-generation machine learning algorithms to glean answers from sites, leaving SEOs scrambling to figure out how to optimize for voice — and how to use that voice interaction to somehow move someone towards a conversion. To make matters worse, due to the infancy of all the related tech, voice search results were changing incredibly rapidly throughout the year.

Warning! By Q3, you’ll be sick of hearing about the changing ways to “rank” on voice-first devices, and you’ll be resigned to simply waiting until Google is sophisticated enough and has enough experience to provide a consistent set of results based on a consistent set of principles.

3. Machine learning takes off

Machine Learning

Equally unsurprising was the growth in machine learning, specifically its application toward guiding purchasing decisions via search.

Unfortunately, this proved a pro and a con for Google. On one hand, by the time Black Friday was upon us, Google’s systems had grown quite adept at not just understanding our own behaviors but also connecting our profiles with those around us. This proved incredibly effective — Google was able to steer consumers away from Amazon and toward its own advertisers by delivering gift ideas when the receiver was known to be away from the purchaser, and the suggested ideas were often spot-on.

On the downside, stores saw a lot of returned items post-holidays, as Google didn’t properly coordinate the ads — often, the same gift suggestion was sent to multiple people, who then purchased the same product for the same person. Unfortunately, Google only knows what Google knows, so when users left the search engine to make the suggested purchase on a non-Google property, Google didn’t register it and offered the same suggestion to others.

Perhaps more interesting in 2018 was the growth in counter-search algorithms. That is, black-hat SEOs began developing their own AI and machine learning systems to exploit similar holes in Google’s own system. Basically, where once black-hat SEOs pitted their own tactics against engineers to find the holes, in 2018 we saw the emergence of black-hat SEO as a battle to develop machine learning systems that isolate and exploit the holes left open by other machines.

I suspect we’re going to see a lot more of this as we head into 2019 and beyond, though I don’t buy the rumors that Amazon is developing their own exploits; I suspect they’re more focused on developing their own systems for product suggestion and conversion improvement.

4. Backlinks lose ground as a ranking signal


Saving the best for last, probably the most interesting shift in SEO we saw in 2018 was the growing number of reports showing rankings skewing away from pure links. To be sure, links are definitely still a strong signal in 2019, but the types of links and how (or if) they are weighted has produced a scenario where, for many query types, links don't even factor into the top five most important ranking factors. Even when links are among the top factors, how they are weighted has become very difficult to isolate.

Interestingly, there were query types for which it appeared that links typically considered “low-quality” — such as forum comments — carried weight over a link in Forbes. And there were instances in which links to a specific page seemed irrelevant, but overall, links to a domain were huge.

Reportedly, this was an extension of the RankBrain algorithm into links and other signals. After a tumultuous integration and testing phase, it does appear to produce more relevant results. Google spokesperson Danny Sullivan did confirm that links will likely be a factor for the foreseeable future, but their weight would be skewed in a variety of ways dependent on various contextual factors for a given query, such as searcher intent and industry vertical.

What I would have done differently

Given the opportunity to redo 2018, there's one thing I'd do differently: I'd stop trying to follow Google's progress on a case-by-case basis. Following news and developments within the search industry is important for staying informed, but a trap I fell into throughout 2018 was constantly trying to adjust strategies based on each major update Google pushed.

Basically, with things moving so rapidly, it became increasingly important to give each adjustment time to settle. For example, Google’s initial mobile-first push in 2018 proved to require a lot of tweaking post-rollout, which meant that initial analyses of the mobile-first index’s impact were obsolete by the time anyone had time to fully process them. Such is the state of SEO in a world of such rapid adjustment!


Visualizing your site structure in advance of a major change /visualizing-site-structure-advance-major-change-286510 Wed, 13 Dec 2017 18:46:40 +0000 /?p=286510 Making big changes to your website structure? Columnist Dave Davies shares a data visualization method that can help you predict what effect your proposed site structure changes will have on SEO performance.

The post Visualizing your site structure in advance of a major change appeared first on Search Engine Land.


In our last article, we looked at some interesting ways to visualize your website structure to illuminate how external links and PageRank flow through it. This time, we’re going to use the same tools, but we’re going to look instead at how a major site structure change might impact your site.

Search engine crawlers can determine which pages on your site are the most important, based, in part, on how your internal links are structured and organized. Pages that have a lot of internal links pointing to them — including links from the site’s navigation — are generally considered to be your most important pages. Though these are not always your highest-ranking pages, high internal PageRank often correlates with better search engine visibility.

Note: I use the phrase “internal PageRank,” coined by Paul Shapiro, to refer to the relative importance of each page within a single website based on that site’s internal linking structure. This term may be used interchangeably with “page weight.”

The technique I’ll outline below can be used to consider how internal PageRank will be impacted by the addition of new sections, major changes to global site navigation (as we’ll see below) and most major changes to site structure or internal linking.

Understanding how any major change to a site could potentially impact its search visibility is paramount to determining the risk vs. reward of its implementation. This is one of the techniques I’ve found most helpful in such situations, as it provides numbers we can reference to understand if (and how) page weight will be impacted by a structural adjustment.

In the example below, we’re going to assume you have access to a staging server, and that on that server you will host a copy of your site with the considered adjustments. In the absence of such a server, you can edit the spreadsheets manually to reflect the changes being considered. (However, to save time, it’s probably worth setting up a secondary hosting account for the tests and development.)

It’s worth noting that on the staging server, one need only mimic the structure and not the final design or content. Example: For a site that I’m working on, I considered removing a block of links in a drop-down from the global site navigation and replacing that block of links with a single text link. That link would go to a page containing the links that were previously in the drop-down menu.

When I implemented this site structure change on the staging server, I didn’t worry about whether any of this looked good — I simply created a new page with a big list of text links, removed all the links from the navigation drop-down, and replaced the drop-down with a single link to the new page.

I would never put this live, obviously — but my changes on the staging server mimic the site structure change being considered, giving me insight into what will happen to the internal PageRank distribution (as we’ll see below). I’ll leave it to the designers to make it look good.

For this process, we’re going to need three tools:

  1. Screaming Frog — The free version will do if your site is under 500 pages or you just want a rough idea of what the changes will mean.
  2. Gephi — A free, powerful data visualization tool.
  3. Google Analytics

So, let’s dive in…

Collecting your data

I don’t want to be redundant, so I’ll spare you re-reading about how to crawl and export your site data using Screaming Frog. If you missed the last piece, which explains this process in detail, you can find it here.

Once the crawl is complete and you have your site data, simply export the relevant data as follows:

Bulk Export > Response Codes > Success (2xx) Inlinks

You will do this for both your live site and your staging site (the one with the adjusted structure). Once you have downloaded both structures, you’ll need to format them for Gephi. All that Gephi needs to create a visualization is an understanding of your site pages (“nodes”) and the links between them (“edges”).

Note: Before we ready the data, I recommend doing a Find & Replace in the staging CSV file and replacing your staging server domain/IP with that of your actual site. This will make it easier to use and understand in future steps.

As Gephi doesn’t need a lot of the data from the Screaming Frog export, we’ll want to strip out what’s not necessary from these CSV files by doing the following:

  • Delete the first row containing “Success (2xx) Inlinks.”
  • Rename the “Destination” column “Target.”
  • Delete all other columns besides “Source” and “Target.” (Note: Before deleting it, you may want to do a quick Sort by the Type column and remove anything that isn’t labeled as “AHREF” — CSS, JS, IMG and so on — to avoid contaminating your visualization.)
  • Save the edited file. You can name it whatever you’d like. I tend to use domain-live.csv and domain-staging.csv.
Edges and Nodes spreadsheet
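
Those cleanup steps can also be scripted rather than done by hand in a spreadsheet; here is a minimal sketch of the same transformation (column names as in the Screaming Frog export, sample rows invented):

```python
def clean_inlinks(rows):
    """Reduce a Screaming Frog inlinks export to Gephi's Source/Target format.

    Expects a header row followed by data rows (i.e., with the leading
    "Success (2xx) Inlinks" banner row already removed)."""
    header = rows[0]
    type_i = header.index("Type")
    source_i = header.index("Source")
    dest_i = header.index("Destination")
    out = [["Source", "Target"]]
    for row in rows[1:]:
        if row[type_i] == "AHREF":  # drop CSS, JS, IMG and similar rows
            out.append([row[source_i], row[dest_i]])
    return out

# Invented sample rows in the export's column layout
rows = [
    ["Type", "Source", "Destination", "Alt Text"],
    ["AHREF", "https://example.com/", "https://example.com/about", ""],
    ["IMG", "https://example.com/", "https://example.com/logo.png", "logo"],
]
edges = clean_inlinks(rows)  # keeps only the AHREF row
```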

The third set of data we’ll want to have is an Export of our organic landing pages from Google Analytics. You can use different metrics, but I’ve found it extremely helpful to have a visual of which pages are most responsible for my organic traffic when considering the impact of a structural change on page weight. Essentially, if you find that a page responsible for a good deal of your traffic will suffer a reduction in internal PageRank, you will want to know this and adjust accordingly.

To get this information into the graph, simply log into Google Analytics, and in the left-hand navigation under “Behavior,” go to “Site Content” and select “Landing Pages.” In your segments at the top of the page, remove “All Users” and replace it with “Organic Traffic.” This will restrict your landing page data to only your organic visitors.

Expand the data to include as many rows as you’d like (up to 5,000) and then Export your data to a CSV, which will give you something like:

Remove the first six rows so your heading row begins with the “Landing Page” label. Then, scroll to the bottom and remove the accumulated totals (the last row below the pages), as well as the “Day Index” and “Sessions” data.

Note that you’ll need the Landing Page URLs in this spreadsheet to be in the same format as the Source URLs in your Screaming Frog CSV files. In the example shown above, the URLs in the Landing Page column are missing the protocol (https) and subdomain (www), so I would need to use a Find & Replace to add this information.

Now we’re ready to go.

Getting a visualization of your current site

The first step is getting your current site page map uploaded — that is, letting Gephi know which pages you have and what they link to.

To begin, open Gephi and go to File > Import Spreadsheet.  You’ll select the live site Screaming Frog export (in my case, yoursite-live.csv) and make sure the “As table:” drop-down is set to “Edges table.”

Importing to Gephi

On the next screen, make sure you’ve checked “Create missing nodes,” which will tell Gephi to create nodes (read: pages) for the “Edges table” (read: link map) that you’ve entered. And now you’ve got your graph. Isn’t it helpful?


OK, not really — but it will be. The next step is to get that Google Analytics data in there. So let’s head over to the Data Laboratory (among the top buttons) and do that.

First, we need to export our page data. When you’re in the Data Laboratory, make sure you’re looking at the Nodes data and Export it.

Data Laboratory export

When you open the CSV, it should have the following columns:

  • Id (which contains your page URLs)
  • Label
  • Timeset

You’ll add a fourth column with the data you want to pull in from Google Analytics, which in our case will be “Sessions.” You’ll need to temporarily add a second sheet to the CSV and name it “analytics,” where you’ll copy the data from your analytics export earlier (essentially just moving it into this Workbook).

Now, what we want to do is fill the Sessions column with the actual session data from analytics. To do this, we need a formula that will look through the node Ids in sheet one and look for the corresponding landing page URL in sheet two; when it finds it, it should insert the organic traffic sessions for that page into the Sessions column where appropriate.

Probably my most-used Excel formula does the trick here. In the top cell of the “Sessions” column you created, enter the following (the bolded numbers will change based on the number of rows of data in your analytics export).


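For readers who would rather script this step than use a spreadsheet formula, the lookup is easy to reproduce in Python. The column contents and URLs below are illustrative; pages with no organic traffic default to 0 sessions:

```python
# Map each node Id (a page URL) to its organic session count from the
# analytics export. URLs absent from analytics get 0 sessions -- these
# are your zero-traffic pages.
analytics = {
    "https://example.com/products": 1250,
    "https://example.com/blog/guide": 430,
}

node_ids = [
    "https://example.com/",
    "https://example.com/products",
    "https://example.com/blog/guide",
]

sessions = [analytics.get(url, 0) for url in node_ids]
print(sessions)  # [0, 1250, 430]
```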
Once completed, you’ll want to copy the Sessions column and use the “Paste Values” command, which will switch the cells from containing a formula to containing a value.

All that’s left now is to re-import the new sheet back into Gephi. Save the spreadsheet as something like data-laboratory-export.csv (or just nodes.csv if you prefer). Using the Import feature in the Data Laboratory, you can re-import the file, which now includes the session data.

Now, let’s switch from the Data Laboratory tab back to the Overview tab. Presently, it looks virtually identical to how it did previously — but that’s about to change. First, let’s apply some internal PageRank. Fortunately, a PageRank feature is built right into Gephi, based on the calculations in the initial Google patents. It’s not perfect, but it’s pretty good for giving you an idea of what your internal page weight flow is doing.

To accomplish this, simply click the “Run” button beside “PageRank” in the right-hand panel. You can leave all the defaults as they are.
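If you’d like to sanity-check Gephi’s numbers, the underlying calculation is straightforward to reproduce. Here’s a minimal power-iteration sketch of PageRank using Gephi’s default damping factor of 0.85 (the three-page graph is illustrative, not your site):

```python
# Minimal PageRank via power iteration: each page starts with equal rank,
# then on every pass distributes its rank across its outbound links,
# discounted by the damping factor (0.85, Gephi's default).
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = rank[page] / len(targets)
                for target in targets:
                    new[target] += damping * share
        rank = new
    return rank

links = {
    "home": ["products", "blog"],
    "products": ["home"],
    "blog": ["home", "products"],
}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # "home" collects the most internal weight
```

The ranks sum to 1 across the site, which is why a single page’s share (like the 0.016629 we’ll see below) reads as a small decimal.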

PageRank in Gephi

The next thing you’ll want to do is color-code the nodes (which represent your site pages) based on the number of sessions and size them based on their PageRank. To do this, simply select the color palette for the nodes under the “Appearance” pane to the upper left. Select sessions from the drop-down and choose a palette you like. Once you’ve chosen your settings, click “Apply.”

Apply color in Gephi

Next, we’ll do the same for PageRank, except we’ll be adjusting size rather than color. Select the sizing tool, choose PageRank from the drop-down, and select the maximum and minimum sizes (this will be a relative sizing based on page weight). I generally start with 10 and 30, respectively, but you might want to play around with them. Once you’ve chosen your desired settings, click “Apply.”

Adjust node size by PageRank in Gephi

The final step of the visualization is to select a layout in the bottom left panel. I like “Force Atlas” for this purpose, but feel free to try them all out. This gives us a picture that looks something like the following:

Live site visual from Gephi.

You can easily reference which pages have no organic traffic and which have the most based on their color — and by right-clicking them, you can view them directly in the Data Laboratory to get their internal PageRank. (In this instance, we can see that one of the highest-traffic pages is a product page with a PageRank of 0.016629.) We can also see that our most-trafficked pages tend to cluster toward the center, meaning they’re heavily linked within the site.

Now, let’s see what happens with the new structure. You’ll want to go through the same steps above, but with the Screaming Frog export from the staging server (in my case, domain-staging.csv). I’m not going to make you read through all the same steps again, but here’s what the final result looks like:

Visual representation of staging site in Gephi

We can see that there are a lot more outliers in this version (pages whose internal links have generally been significantly reduced). We can investigate which pages those are by right-clicking them and viewing them in the Data Laboratory, which will help us locate possible unexpected problems.

We also have the opportunity to see what happened to that high-traffic product page mentioned above. In this case, under the new structure, its internal PageRank shifted to 0.02171 — in other words, it got stronger.

There are two things that may have caused this internal PageRank increase: an increase in the number of links to the page, or a drop in the number of links to other pages.

At its core, a page can be considered as having 100 percent of its PageRank to pass along. Notwithstanding considerations like Google’s damping of PageRank with each link or weighting by a link’s position on the page, PageRank flows to other pages via links, and that “link juice” is split among them. So, if there are 10 links on a page, each gets 10 percent. If you drop the total number of links to five, each gets 20 percent.
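The arithmetic above can be stated in two lines. This deliberately uses the simplified model from the paragraph — no damping, no positional weighting — just an equal split of a page’s weight across its outbound links:

```python
# Simplified link-dilution model: each outbound link on a page receives
# an equal share of that page's PageRank (damping ignored for clarity).
def share_per_link(page_rank: float, outbound_links: int) -> float:
    return page_rank / outbound_links

print(share_per_link(1.0, 10))  # 0.1 -> each of 10 links gets 10 percent
print(share_per_link(1.0, 5))   # 0.2 -> cut to 5 links, each gets 20 percent
```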

Again, this is a fairly simplified explanation, but these increases (or decreases) are what we want to measure to understand how a proposed site structure change will impact the internal PageRank of our most valuable organic pages.

Over in the Data Laboratory, we can also order pages by their PageRank and compare results (or just see how our current structure is working out).

PageRank in the Data Laboratory in Gephi.


This is just the tip of the iceberg. We can substitute rankings for organic sessions in the page-based data we import (or go crazy and include both). With this data, we can judge what might happen to the PageRank of ranking (or up-and-coming) pages in a site structure shift. Or what about factoring in incoming link weight, as we did in the last article, to see how its flow is impacted?

While no tool or technique can give you 100 percent assurance that a structural change will always go as planned, this technique assists in catching many unexpected issues. (Remember: Look to those outliers!)

This exercise can also help surface unexpected opportunities by isolating pages that will gain page weight as a result of a proposed site structure change. You may wish to (re)optimize these pages before your site structure change goes live so you can improve their chances of getting a rankings boost.

The post Visualizing your site structure in advance of a major change appeared first on Search Engine Land.