Max Cyrek – Search Engine Land
News On Search Engines, Search Engine Optimization (SEO) & Search Engine Marketing (SEM)

Page load time and crawl budget rank will be the most important SEO indicators in 2020
/page-load-time-and-crawl-budget-rank-will-be-the-most-important-seo-indicators-in-2020-326847
Mon, 23 Dec 2019 14:10:00 +0000

Based on my own testing, PLT and CBR are the technical aspects I believe will determine website success, or failure, in the new year.

The post Page load time and crawl budget rank will be the most important SEO indicators in 2020 appeared first on Search Engine Land.

Google can impose its own rules on website owners, both in terms of content and transparency of information and in terms of technical quality. Because of this, the technical aspect I pay the most attention to now, and will continue to next year, is website speed across several different loading times, which I collectively call PLT (Page Load Time).

Time to first byte (TTFB) is the server response time, measured from sending the request until the first byte of the response is received. It shows how a website performs from the server’s perspective (database connections, information processing, the data caching system, as well as DNS server performance). How do you check TTFB? The easiest way is to use one of the following tools:

  • Developer tools in the Chrome browser
  • WebPageTest
  • Byte Check
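If you prefer to script the check, here is a minimal sketch using only the Python standard library. The function names are my own, and the label mapping follows the thresholds discussed in the interpretation that follows:

```python
import http.client
import time

def measure_ttfb(host, path="/"):
    """Rough client-side TTFB: seconds from sending the request
    until the first byte of the response arrives. Network latency
    is included, so run it from several locations."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    start = time.perf_counter()
    conn.request("GET", path)
    response = conn.getresponse()
    response.read(1)  # blocks until the first byte is available
    ttfb = time.perf_counter() - start
    conn.close()
    return ttfb

def classify_ttfb(seconds):
    """Map a TTFB reading onto the thresholds used in this article."""
    if seconds < 0.1:
        return "impressive"
    if seconds <= 0.2:
        return "within Google's recommendation"
    if seconds <= 0.5:
        return "acceptable"
    return "likely server-side problem"
```

For example, `classify_ttfb(measure_ttfb('example.com'))` gives a quick verdict for a single request.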

Interpreting results

A TTFB below 100ms is an impressive result. Google’s recommendation is that TTFB should not exceed 200ms. It is commonly accepted that the server response time to the first byte should not exceed 0.5s; above this value, there are likely problems on the server, and correcting them will improve the indexation of the website.

Improving TTFB

1. Analyze the website and improve the fragments of code responsible for resource-consuming database queries (e.g., multi-level joins) or for processor-heavy work (e.g., generating complex tree data structures on the fly, such as a category tree, or preparing thumbnail images before rendering the view, without any caching mechanism).

2. Use a Content Delivery Network (CDN). This is a network of servers scattered around the world which serves content such as CSS, JS files and photos from the servers located closest to the person viewing a given website. Thanks to a CDN, resources are not queued, as they are on classic servers, but downloaded almost in parallel. Implementing a CDN can reduce TTFB by up to 50%.

3. If you use shared hosting, consider migrating to a VPS server with guaranteed resources such as memory or processor power, or a dedicated server. This ensures only you can influence the operation of a machine (or a virtual machine in the case of VPS). If something works slowly, the problems may be on your side, not necessarily the server.

4. Think about implementing caching systems. In the case of WordPress, you have many plugins to choose from, the implementation of which is not problematic, and the effects will be immediate. WP Super Cache and W3 Total Cache are the plugins I use most often. If you use dedicated solutions, consider Redis, Memcache or APC implementations that allow you to dump data to files or store them in RAM, which can increase the efficiency.
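The cache-aside pattern behind those plugins and Redis/Memcache setups can be sketched as follows; a plain dict stands in for the cache backend, and all names are illustrative:

```python
import time

store = {}  # a plain dict stands in for Redis/Memcached here

def cached(key, ttl, compute):
    """Cache-aside: return the cached value while it is fresh,
    otherwise recompute it and store it with an expiry time."""
    entry = store.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.monotonic() < expires_at:
            return value  # cache hit
    value = compute()  # cache miss: do the expensive work
    store[key] = (value, time.monotonic() + ttl)
    return value

calls = 0
def expensive_query():
    """Stands in for a slow database query or template render."""
    global calls
    calls += 1
    return "category tree"

cached("categories", ttl=60, compute=expensive_query)  # computed
cached("categories", ttl=60, compute=expensive_query)  # served from cache
```

With a real backend you would swap the dict for a `redis.Redis()` client and use its get/set calls instead.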

5. Enable the HTTP/2 protocol or, if your server already supports it, HTTP/3. The speed gains are impressive.

DOM processing time

DOM processing time is the time needed to download all of the HTML code. The leaner the code, the fewer resources are needed to load it, and the fewer resources the search engine needs to store the website in its index, which improves speed and user satisfaction.

I am a fan of reducing the volume of HTML by eliminating redundant markup and moving the generation of displayed elements from HTML to CSS. For example, I use the :before and :after pseudo-elements, and I remove SVG images from the HTML (those stored inline inside <svg> </svg> tags).
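For illustration, a decorative element can be moved out of the markup and into CSS; the class names here are purely hypothetical:

```html
<!-- Heavier: the decoration is an extra element in the HTML -->
<h2 class="title"><span class="decor">★</span> Category</h2>

<!-- Lighter: the HTML shrinks and CSS generates the decoration -->
<h2 class="title">Category</h2>
<style>
  .title::before { content: "★ "; }
</style>
```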

Page rendering time

Page rendering time of a website is affected by downloading graphic resources, as well as downloading and executing JS code.

Minification and compression of resources are the basic actions that speed up the rendering of a website. Other helpful steps include asynchronous image loading, HTML minification, and migrating JavaScript out of the HTML (where function bodies are embedded directly in the page) into external JavaScript files loaded asynchronously as needed. It is good practice to load only the JavaScript and CSS needed on the current sub-page. For instance, if a user is on a product page, the browser does not have to load the JavaScript that will only be used in the basket or in the logged-in user’s panel.

The more resources that need to be loaded, the more time Google Bot must spend downloading information about the content of the website. If we assume that each website gets a maximum number, or maximum duration, of Google Bot visits that end with content being indexed, then the heavier each page is, the fewer pages we will be able to send to the search engine index in that time.

Crawl Budget Rank

The final issue requires more attention. The crawl budget significantly influences the way Google Bot indexes content on a website. To assess how transparent a website’s structure is from the crawl budget’s perspective, I use a concept I call CBR (Crawl Budget Rank).

If Google Bot finds duplicate versions of the same content on a website, our CBR decreases. We can see this in two ways:

1. Google Search Console

By analyzing and assessing problems related to page indexing in the Google Search Console, we will be able to observe increasing problems in the Status > Excluded tab, in sections such as:

  • Crawled – currently not indexed
  • Page with redirect
  • Duplicate, Google chose different canonical than user
  • Duplicate without user-selected canonical

2. Access Log

This is the best source of information about how Google Bot crawls our website. On the basis of the log data, we can understand the website’s structure to identify weak spots in architecture created by internal links and navigation elements.
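A minimal sketch of such log analysis, assuming the common Apache/Nginx “combined” log format; the sample lines are made up, and real logs should also be verified by reverse DNS, since anyone can fake the Googlebot user agent:

```python
import re
from collections import Counter

# Matches the request, status and user-agent fields of a combined log line
LINE_RE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) .*?"(?P<ua>[^"]*)"$'
)

def googlebot_hits(lines):
    """Count Googlebot requests per path from access-log lines."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and "Googlebot" in m.group("ua"):
            hits[m.group("path")] += 1
    return hits

sample = [
    '66.249.66.1 - - [23/Dec/2019:10:00:00 +0000] "GET /page1.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [23/Dec/2019:10:00:05 +0000] "GET /page1.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.7 - - [23/Dec/2019:10:00:09 +0000] "GET /page2.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
```

Sorting the resulting counts reveals which parts of the structure eat the crawl budget.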

The most common programming errors affecting indexation problems include:

1. Poorly created data filtering and sorting mechanisms, resulting in the creation of thousands of duplicate sub-pages

2. “Quick view” links which, for the user, show a pop-up with data on a layer, but for the crawler create a separate page with duplicate product information.

3. Paging that never ends.

4. Links on a website that redirect to resources at a new URL.

5. Blocking access for robots to often repetitive resources.

6. Typical 404 errors.

Our CBR decreases as the “mess” on our website increases, which means Google Bot becomes less willing to visit our website (lower frequency), indexes less and less content, and, when it misinterprets which version of a resource is the right one, removes pages that were previously in the search engine index.

The classic crawl budget concept gives us an idea of how many pages Google Bot crawls on average per day (according to log files) compared to total pages on site. Here are two scenarios:

1. Your site has 1,000 pages and Google Bot crawls 200 of them every day. What does it tell you? Is it a negative or positive result? 

2. Your site has 1,000 pages and Google Bot crawls 1,000 pages. Should you be happy or worried?

Without extending the crawl budget concept with additional quality metrics, the information isn’t as helpful as it could be. The second case may be a well-optimized page, or it may signal a huge problem. Suppose Google Bot crawls only 50 of the pages you want crawled, and the remaining 950 are junk, duplicated or thin-content pages. Then we have a problem.

I have worked to define a Crawl Budget Rank metric. It behaves like Page Rank: just as a higher Page Rank means more powerful outgoing links, a higher CBR means fewer crawl problems.

The CBR numerical interpretation can be the following:

IS – the number of indexed pages submitted in the sitemap (indexed sitemap)

NIS – the number of pages submitted in the sitemap (non-indexed sitemap)

IPOS – the number of indexed pages not included in the sitemap (indexed pages outside sitemap)

SNI – the number of pages crawled but not yet indexed

The first part of the equation describes the state of a website in terms of what we want the search engine to index (pages in the sitemap are assumed to be the ones we want indexed) versus the reality, namely what Google Bot reached and indexed even if we did not want it to. Ideally, IS = NIS and IPOS = 0.

In the second part of the equation, we look at the number of pages Google Bot has reached versus the actual indexing coverage. As above, under ideal conditions, SNI = 0.

The resulting value multiplied by 10 will give us a number greater than zero and less than 10. The closer the result is to 0, the more we should work on CBR.
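One way to turn these definitions into a number with the properties described (10 in the ideal case of IS = NIS, IPOS = 0 and SNI = 0, falling toward 0 as the mess grows) is the following sketch; the exact weighting is my assumption:

```python
def crawl_budget_rank(IS, NIS, IPOS, SNI):
    """Illustrative CBR score in the range 0-10.

    IS   - indexed pages submitted in the sitemap
    NIS  - pages submitted in the sitemap
    IPOS - indexed pages outside the sitemap
    SNI  - pages crawled but not yet indexed

    The formula is an assumption chosen only to satisfy the
    constraints in the text: 10 when IS == NIS, IPOS == 0 and
    SNI == 0, approaching 0 as duplication and waste grow.
    """
    sitemap_part = IS / (NIS + IPOS)   # ideally 1: IS == NIS, IPOS == 0
    coverage_part = IS / (IS + SNI)    # ideally 1: SNI == 0
    return 10 * sitemap_part * coverage_part

crawl_budget_rank(IS=1000, NIS=1000, IPOS=0, SNI=0)  # -> 10.0
```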

This is only my own interpretation, based on the analysis of projects I have dealt with this past year. The more I manage to improve this factor (increase CBR), the more a website’s visibility, positions and, ultimately, traffic improve.

If we assume that CBR is one of the ranking factors affecting the overall ranking of the domain, I would set it as the most important on-site factor, immediately after the off-site Page Rank. What are unique descriptions, optimized for keywords selected for their popularity, worth if Google Bot never gets the opportunity to enter that information into the search engine index?

User first content

We are witnessing another major revolution in reading and interpreting queries and content on websites. Historically, such ground-breaking changes include:

  1. Quantity standards – 1,000 characters with spaces and three money keywords in the content. Up to a certain moment, it was a guarantee of success, one day it simply ceased to matter.
  2. Thin content – traffic built on tags packed with keywords. Overnight, this strategy stopped working, as did artificially generated low-quality content (text mixers).
  3. Duplicate content – the Google Bot has learned (more or less well) which text indexed in the search engine is original (created first), and which is a copy. As a consequence, Panda (Google algorithm) was created. Every few months it filtered and flagged low-quality websites and reduced their ranking, as well as search engine positions. Currently, it works in “live” mode.
  4. RankBrain – an algorithm that, using machine learning, interprets the queries of search engine users with less emphasis on keywords and more on query context (including query history), and displays more context-specific results.
  5. E-A-T – elimination of content that is misleading or likely to be misleading due to the low authority of the author of the content, and thus the domains. This particularly affected the medical and financial industry. Any articles not created by experts, yet concerning the above spheres of life, can cause a lot of damage. Hence the fight of Google with domains containing poor content and quality.

Creating content for specific keywords is losing importance. Long articles packed with sales phrases lose to light, narrowly themed articles whose content matches the user’s intent and the search context.


BERT (Bidirectional Encoder Representations from Transformers) is an algorithm that tries to understand and interpret a query at the level of a user’s needs and intentions. For example, the query – How long can you stay in the US without a valid visa? – can return pages with information on visa lengths depending on the country of origin (e.g., for searches from Europe), pages about what awaits a person whose visa has expired, or pages describing how to legalize one’s stay in the US.

Is it possible to create perfect content? The answer is simple – no. However, we can improve our content.

In the process of improving content so that it is better tailored, we can use tools such as Ahrefs (to build content inspiration from competition analysis), Semstorm (to build and test long-tail queries, including searches in the form of questions) and SurferSEO (for comparative analysis of our website’s content against competing pages in the SERP), which has recently become one of my favorite tools.

In the latter, we can carry out comparative analysis at the level of words, compound phrases and HTML tags (e.g., paragraphs, bolds and headers), pulling out the common “good practices” found on competitors’ pages that attract search engine traffic.

This is partly an artificial optimization of content, but in many cases, I was able to successfully increase the traffic on the websites with content I modified using the data collected by the above tools.


As I always highlight, there is no single way to deal with SEO. Testing is what demonstrates whether a strategy, whether for building a website or for its content, will prove to be good.

Under the Christmas tree and on the occasion of the New Year, I wish you high positions, converting traffic and continuous growth!


Ready, Set, Go! Googlebot Race
/ready-set-go-googlebot-race-314894
Tue, 09 Apr 2019 12:00:48 +0000

The test demonstrated Googlebot ignores rel=next and rel=prev tags; but it is worth taking a closer look at pages containing infinite scroll in future.

The post Ready, Set, Go! Googlebot Race appeared first on Search Engine Land.

The Googlebot Race is an unusual tournament watched daily with engagement by over 1.8 billion websites. The tournament consists of many competitions commonly referred to as “ranking factors.” Every year, somebody tries to describe as many of them as possible, but nobody really knows what they are all about and how many there are. Nobody but Googlebot. It is he who daily traverses petabytes of data, forcing webmasters to compete on the weirdest fields, to choose the best ones. Or that is what he thinks.

The 1,000 meters run (with steeplechase) – we are checking indexation speed. For this competition, I presented five similar data structures. Each of them had 1000 subpages with unique content and additional navigation pages (e.g. other subpages or categories). Below you can see the results for four running tracks.

This data structure was very poor: 1,000 links on a single page leading to subpages with unique content (so 1,000 internal links). All SEO experts (including me…) repeat it like a mantra: no more than 100 internal links per page, or Google will not manage to crawl such an extensive page, will simply ignore some of the links and will not index them. I decided to see if it was true.

This is an average running track: another 100 subpages (each with visible links to a few previous pages, a few following pages, the first one and the last one). On each subpage, 10 internal links to pages with content. The first page carries the meta robots tag index/follow, the rest noindex/follow.

I wanted to introduce a little confusion, so I created a silo structure on the website, dividing it into 50 categories. Each category contained 20 links to content pages, split across two pages.

The next running track is the dark horse of this tournament. No normal pagination at all. Instead, pagination is defined solely by rel=”next” and rel=”prev” link headers, telling Googlebot which page to go to next.

Running track number five is similar to number two. The difference is that I got rid of noindex/follow and I set canonical tags for all subpages to the first page.

And they’re off…

hits – total number of Googlebot visits

indexed – number of indexed pages

I must admit that I was disappointed by the results. I was very much hoping to demonstrate that the silo structure would speed up the crawling and the indexation of the site. Unfortunately, it did not happen. This kind of structure is the one that I usually recommend and implement on websites that I administer, mainly because of the possibilities that it gives for internal linking. Sadly, with a larger amount of information, it does not go hand in hand with indexation speed.

Nevertheless, to my surprise, Googlebot easily dealt with reading 1,000 internal links, visiting them over 30 days and indexing the majority, even though it is commonly believed that the number of internal links should be kept to 100 per page. This means that if we want to speed indexation up, we can create HTML sitemaps even with such a large number of links.

At the same time, classic pagination with noindex/follow clearly loses to pagination using index/follow and rel=canonical pointing to the first page. In the latter case, Googlebot was expected not to index the individual paginated subpages. Nevertheless, out of 100 paginated subpages, it indexed five despite the canonical tag to page one, which shows again (I wrote about it here) that setting canonical tags does not guarantee a page will stay out of the index, with the resulting mess in the search engine’s index.

In the case of the above-described test, the last construction is the most effective one for the number of pages indexed. If we introduced a new notion, Index Rate, defined as the ratio of the number of Googlebot visits to the number of pages indexed within, e.g., 30 days, then the best IR in our test would be 3.89 (running track 5) and the worst 6.46 (running track 2). This number stands for the average number of Googlebot visits a page needs in order to be indexed (and kept in the index). To refine IR further, it would be worth verifying indexation daily for each specific URL; then it would make even more sense.
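The IR calculation itself is simple division; the hit counts below are hypothetical, chosen only to reproduce the quoted values:

```python
def index_rate(hits, indexed):
    """Average number of Googlebot visits needed per indexed page
    (and to keep it in the index) over the observation window."""
    return round(hits / indexed, 2)

# Hypothetical figures that reproduce the quoted IR values:
index_rate(hits=389, indexed=100)  # -> 3.89, like running track 5
index_rate(hits=646, indexed=100)  # -> 6.46, like running track 2
```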

One of the key conclusions from this experiment (apparent within a few days of its start) was demonstrating that Googlebot ignores rel=next and rel=prev tags. Unfortunately, I was late publishing these results (waiting for more data), and on March 21 John Mueller announced to the world that, indeed, these tags are not used by Googlebot. I am just wondering whether the fact that I am typing this article in Google Docs has anything to do with it (#conspiracytheory).

It is worth taking a look at pages with infinite scroll, where content is loaded dynamically as the user scrolls down and navigation is based on rel=prev and rel=next. If there is no other navigation, such as regular pagination hidden in CSS (invisible to the user but visible to Googlebot), we can be sure that Googlebot’s access to newly loaded content (products, articles, photos) will be hindered.


Learn how to manage product unavailability without hurting your SEO
/learn-how-to-manage-product-unavailability-without-hurting-your-seo-311447
Fri, 01 Feb 2019 17:42:29 +0000

When your e-commerce site has out-of-stock products, here are strategies to manage your indexed pages and turn the customer experience into a sale.

The post Learn how to manage product unavailability without hurting your SEO appeared first on Search Engine Land.

An everyday problem in online shops, which has given a headache to many entrepreneurs, is the unavailability of products. From a report prepared by Daniel Corsten and Thomas Gruen [pdf] we find out that their research conducted in seven countries has shown that every fifth product offered online is unavailable. What does it mean? Not only the lost chances to sell, but also the losses related to marketing investment which might never monetize.

When a product is unavailable, the customer can react in one of two ways:

  1. Exit the shop and find another shop which offers the same product, switch the sales channel (e.g., look for the product in a brick and mortar store), or simply change their mind and not buy (if they were not sufficiently motivated to do it).
  2. Stay at the shop and wait until the product is available, look for a different product of the same brand which meets their expectations, or choose a product of a different brand.

The economic goal of every e-commerce site is to sell. In this article, we will look at the ways of dealing with product unavailability without hurting the business, and how to turn product unavailability into sales.

Find out how much you are losing

Each industry is different and each online shop has its own rules. To know the scale of the problem, you need the numbers, so put Google Analytics to work. Along with the product page view, you can send information about the product (name and product code) indicating whether it is available. If you want more precise information, send it once per session by saving a cookie that records whether the unavailability information has already been sent to Google Analytics. If you do not know how to do this, I recommend reading the article by Adam Greco, who describes it in detail.

Once you implement the above mechanism, the products report will show you how many chances to sell you are losing. If you take the price of the missing product and multiply it by the average conversion rate, you will find out how much you are not going to earn.
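That estimate is a one-liner; all the numbers below are hypothetical:

```python
def lost_revenue(unavailable_views, price, conversion_rate):
    """Rough revenue lost on an out-of-stock product: views of the
    unavailable product page times price times average conversion rate."""
    return unavailable_views * price * conversion_rate

# Hypothetical: 500 views of an unavailable $40 product at a 2%
# average conversion rate -> about $400 of missed revenue
lost_revenue(500, 40.0, 0.02)
```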

At the same time, you will find out how many visitors visited an unavailable product. You will see whether they choose a different product, wait until their desired product is available (e.g., by activating the availability alert) or leave the store without purchasing anything.

Let us say you have 10,000 products, of which, statistically, 20 percent can be permanently or temporarily unavailable. Let us consider a few extreme cases and answer the question of how to deal with supply shortages in online shops.

No incoming traffic, no internal traffic

You have a product which does not kindle any interest. Perhaps it is the right moment to reconsider the offer? There is little point in keeping products in the shop which are not popular; even when they are available, they will not convert.

You might say that keeping them costs nothing and has nothing to do with SEO? Well, it does.

One of Google’s main goals is to provide the user with good quality content in search results. If you can help them to fulfill this mission, you will be awarded better visibility and more traffic.

A metric influencing the kind of information stored in a search engine’s index is the crawl budget. The amount of content indexed by the search engine, and how current it is, significantly influence the service’s visibility in search results. Even though crawl budget is not considered to be a ranking signal, it is hard to imagine traffic on a website if the subpages are not being indexed and the information in search results is not current.

When you keep in the search engine’s index, say, 2,000 pages which have no value (one possible conclusion if they do not generate traffic) and are not useful to users (even those who have already reached the shop another way), it is worth considering removing such products both from the search engine’s index and from the shop itself. This will improve the speed with which Google Bot crawls the website. How to do it?

  1. If there is an equivalent product that generates traffic and conversions, create a 301 redirect to the similar product.
  2. If there are no equivalent products, redirect the removed subpage to the category the product was in.
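Those two rules can be sketched as a simple lookup; the URLs and mappings are hypothetical:

```python
# Hypothetical catalogue data
EQUIVALENTS = {
    "/product/blue-kettle": "/product/blue-kettle-v2",
}
CATEGORY_OF = {
    "/product/blue-kettle": "/category/kettles",
    "/product/old-toaster": "/category/toasters",
}

def redirect_target(removed_path):
    """301 target for a removed product page: the equivalent product
    if one exists, otherwise the product's category page."""
    if removed_path in EQUIVALENTS:
        return EQUIVALENTS[removed_path]
    return CATEGORY_OF[removed_path]
```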

There is no organic traffic, but the pages of the products are still being visited

The situation gets complicated here. You have unavailable products which are not on the subpages where users’ visits began but the users are reaching them anyway. It is likely that when those products were available, the users were buying them.

For an e-commerce site with about 8,000 subpages I work with, I excluded unavailable products from category pages and gave them the noindex, nofollow attributes because there were no available equivalents to them in the shop. The unavailable products could be found only in the internal search results. By doing this:

  • the size of the site has been temporarily reduced, so its power is distributed over fewer subpages
  • the number of subpages that Google Bot needs to index has been limited
  • most users do not like to come across unavailable products, so removing such items from category pages spares them that frustration

The measure described above changed the service’s structure and influenced its SEO visibility for category pages, which brought a slight increase in traffic over three months. After stock was replenished, the product subpages were restored and indexed again in all sections of the shop, and visibility and traffic returned to their former levels.

Pages converting with external traffic

In this case, wrong decisions can cost not only conversions but also traffic. With products that are highly visible in search results, we cannot afford to remove them and redirect to a subpage of a similar product. Even though this could be favorable from an SEO point of view, because the site would be smaller, redirecting the traffic (even with temporary 302 redirects) to a similar subpage would confuse the user.

If the removed product is not redirected and the server responds with a 404 error (page not found), the result will be:

  • the user leaving the website
  • loss of power from external links, leading to pages of the products (if such exist)

Moreover, if there is no redirection, the “product unavailable” message will usually make users go back to the search results, which will show up as a higher bounce rate.

Here, before taking further steps, it is necessary to identify the reason why the product is unavailable.

Temporary product unavailability

Even if the approximate date of the product’s availability is known and it falls within one to four weeks, it is worth trying to solve the problem of temporary unavailability in a marketing way. I mean things like a “clock” counting down to the delivery day, with the option of buying the item before it is back in stock. It resembles a pre-sale. If you offer a discount for the prolonged waiting time, many of those who do not need the product immediately will surely use the opportunity to buy it at a lower price.

Seasonal products and categories

Here you should think not only about individual products but also about entire categories. Removing a category like “Christmas Gifts” or “Black Friday Deals” which convert once per year would be a mistake.

On the other hand, changing such product pages into information hubs makes sense. It allows you to continually build visibility for these categories in organic results, and then swap the page’s content for selling content when the right time comes. At the same time, disconnecting the page from the menu structure will not hurt.

What about seasonal products? If you are not a manufacturer of Christmas decorations, you do not need to store glass balls or other decorations all year round. You cannot even be sure that the same products will be available a year later.

Removing or hiding such products and redirecting their URLs to the category page, rather than leaving a 404 message, will be the best solution.

Products discontinued and with unknown delivery date

Forward the clients (but do not send them directly with 301 redirection) to similar products in order to finalize the purchase. A product which is unavailable in the store should not be a blind alley on the path to purchase. Take advantage of the traffic that already exists.

Under no circumstances should you remove such product subpages or redirect them, not even with a 301 redirect.

The solution I recommend is to present an alternative, similar product on the product card of the unavailable one, clearly enough for the customer to proceed and add the alternative product to the basket.

In this way you will be able to enjoy the traffic and conversions from a product that has been discontinued.

This strategy can also be used to generate traffic from products that have never been offered in your shop because of an importer’s aggressive distribution policy. In such cases, the competition in organic search results is usually small, which translates into potentially big traffic at low SEO expense.


Here’s what happened when I followed Googlebot for 3 months
/heres-what-happened-when-i-followed-googlebot-for-3-months-308674
Wed, 28 Nov 2018 12:40:23 +0000

This experiment uncovered no direct way to bypass the First Link Counts Rule with modified links but it was possible to build a structure using Javascript links.

The post Here’s what happened when I followed Googlebot for 3 months appeared first on Search Engine Land.


On internet forums and content-related Facebook groups, discussions often break out about how Googlebot works – which we shall tenderly call GB here – and what it can and cannot see, what kind of links it visits and how it influences SEO.

In this article, I will present the results of my three-month-long experiment.

Almost daily for the past three months, GB has been visiting me like a friend dropping by for a beer.

Sometimes it was alone:

[02/09/2018 18:29:49]: /page1.html Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

[02/09/2018 19:45:23]: /page5.html Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

[02/09/2018 21:01:10]: /page3.html Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

[02/09/2018 21:01:11]: /page2.html Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

[02/09/2018 23:32:45]: /page6.html Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Sometimes it brought its buddies along:

[16/09/2018 19:16:56]: /page1.html Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Search Console) Chrome/41.0.2272.118 Safari/537.36

[16/09/2018 19:26:08]: /image.jpg Googlebot-Image/1.0

[27/08/2018 23:37:54]: /page2.html Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

And we had lots of fun playing different games:

Catch: I observed how GB loves to run redirections 301 and crawl images, and run from canonicals.

Hide-and-seek: Googlebot was hiding in the hidden content (which, as its parents claim, it does not tolerate and avoids)

Survival: I prepared traps and waited for it to spring them.

Obstacles: I placed obstacles with various levels of difficulty to see how my little friend would deal with them.

As you can probably tell, I was not disappointed. We had tons of fun and we became good friends. I believe our friendship has a bright future.

But let’s get to the point!

I built a website with substantive-looking content about an interstellar travel agency offering flights to yet-undiscovered planets in our galaxy and beyond.

The content appeared substantive when in fact it was a load of nonsense.

The structure of the experimental website looked like this:

Experimental website structure

I provided unique content and made sure that every anchor/title/alt, as well as other coefficients, was globally unique (fake words). To make things easier for the reader, I will not use names like anchor cutroicano matestito in the description, but instead refer to them as anchor1, etc.

I suggest that you keep the above map open in a separate window as you read this article.

Part 1: First link counts

One of the things that I wanted to test in this SEO experiment was the First Link Counts Rule – whether it can be omitted and how it influences optimization.

The First Link Counts Rule says that on a page, Google Bot sees only the first link to a subpage. If you have two links to the same subpage on one page, the second one will be ignored, according to this rule. Google Bot will ignore the anchor in the second and in every consecutive link while calculating the page’s rank.
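The rule is easy to model: when collecting anchors per target URL, only the first one encountered is kept. A minimal sketch of that behavior (my own illustration of the rule, not Google's actual code):

```javascript
// Model of the First Link Counts Rule: for each target URL, only the anchor
// of the first link found on the page is credited; anchors of later links
// to the same target are ignored.
function countedAnchors(links) {
  const seen = new Map(); // target URL -> first anchor text seen
  for (const { href, anchor } of links) {
    if (!seen.has(href)) seen.set(href, anchor);
  }
  return seen;
}

const links = [
  { href: 'page2.html', anchor: 'anchor1' }, // first link: counts
  { href: 'page2.html', anchor: 'anchor2' }, // second link: ignored
  { href: 'page4.html', anchor: 'anchor6' },
];
console.log(countedAnchors(links).get('page2.html')); // → "anchor1"
```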

It is a problem widely overlooked by many specialists, one especially present in online shops, where navigation menus significantly distort the website’s structure.

In most stores, we have a static drop-down menu (visible in the page’s source), which gives, for example, four links to main categories and 25 hidden links to subcategories. While mapping a page’s structure, GB sees all the links (on each page with a menu), which results in all the pages being treated as equally important during the mapping, with their power (juice) distributed evenly, roughly like this:

The most common, but in my opinion incorrect, page structure.

The above example cannot be called a proper structure, because all the categories are linked from every page that contains the menu. Therefore, the home page and all the categories and subcategories have an equal number of incoming links, and the power of the entire web service flows through them with equal force. Hence, the power of the home page (which is usually the source of most of the power, due to the number of incoming links) is divided into 24 categories and subcategories, so each of them receives only about 4 percent of the power of the homepage.

How the structure should look:

If you need to quickly test the structure of your page and crawl it like Google does, Screaming Frog is a helpful tool.

In this example, the power of the homepage is divided into four and each of the categories receives 25 percent of the homepage’s power and distributes part of it to the subcategories. This solution also provides a better chance of internal linking. For instance, when you write an article on the shop’s blog and want to link to one of the subcategories, GB will notice the link while crawling the website. In the first case, it will not do it because of the First Link Counts Rule. If the link to a subcategory was in the website’s menu, then the one in the article will be ignored.
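The arithmetic behind the two structures can be sketched as plain even division (illustrative only, not a real PageRank computation; the subcategory count per category is an assumption):

```javascript
// Back-of-the-envelope model of how homepage "juice" spreads in the two
// structures discussed above. Plain even division – illustrative only.
function evenShare(power, outgoingLinks) {
  return power / outgoingLinks;
}

// Flat menu: the homepage links to 24 categories and subcategories at once.
console.log(evenShare(100, 24).toFixed(1)); // → "4.2" percent each

// Hierarchical: the homepage links to 4 main categories only...
const perCategory = evenShare(100, 4); // 25 percent each
// ...and each category passes its share on to, say, 5 subcategories
// (the subcategory count here is an assumption for illustration).
const perSubcategory = evenShare(perCategory, 5); // 5 percent each
console.log(perCategory, perSubcategory); // → 25 5
```

In the hierarchical variant, each main category ends up far stronger than any page in the flat variant, and subcategories still receive a meaningful share instead of competing with everything else.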

I started this SEO experiment with the following actions:

  • First, on the page1.html, I included a link to a subpage page2.html as a classic dofollow link with an anchor: anchor1.
  • Next, in the text on the same page, I included slightly modified references to verify whether GB would be eager to crawl them.

To this end, I tested the following solutions:

  • To the web service’s homepage, I assigned one external dofollow link for a phrase with a URL anchor (so any external linking of the homepage and the subpages for given phrases was out of the question) – it sped up the indexing of the service.
  • I waited for page2.html to start ranking for the phrase from the first dofollow link (anchor1) coming from page1.html. This fake phrase, like every other one I tested, could not be found on the target page. I assumed that if the other links worked, then page2.html would also rank in the search results for the phrases from those links. It took around 45 days. And then I was able to draw the first important conclusion.

Even a website where a keyword is neither in the content nor in the meta title, but is linked with a researched anchor, can easily rank higher in the search results than a website which contains this word but is not linked with the keyword.

Moreover, the homepage (page1.html), which contained the researched phrase and was the strongest page in the web service (linked from 78 percent of the subpages), still ranked lower for the researched phrase than the subpage (page2.html) linked with it.

Below, I present four types of links I have tested, all of which come after the first dofollow link leading to page2.html.

Link to a website with an anchor

<a href="page2.html#testhash">anchor2</a>

The first of the additional links placed in the code after the dofollow link was a link with an anchor (a hash fragment). I wanted to see whether GB would follow the link and also index page2.html under the phrase anchor2, despite the link leading to the same page (page2.html) with the URL changed to page2.html#testhash.

Unfortunately, GB never wanted to remember that connection and it did not direct the power to the subpage page2.html for that phrase. As a result, in the search results for the phrase anchor2 on the day of writing this article, there is only the subpage page1.html, where the word can be found in the link’s anchor. While Googling the phrase testhash, our domain does not rank either.
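This outcome is consistent with how URL fragments are defined: the part after `#` identifies a position within the same document and is never sent to the server. A quick check with the standard `URL` API:

```javascript
// A fragment (#testhash) does not change which document a URL points to;
// it is client-side only and is not part of the request sent to the server.
const plain = new URL('https://example.com/page2.html');
const withHash = new URL('https://example.com/page2.html#testhash');

console.log(withHash.hash);                        // → "#testhash"
console.log(withHash.pathname === plain.pathname); // → true: same document
```

(The domain here is a placeholder; the point is that both URLs resolve to the same path, so from the server's and, apparently, GB's perspective, the fragment link adds nothing new.)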

Link to a website with a parameter

<a href="page2.html?parameter1">anchor3</a>

Initially, GB was interested in this funny part of the URL just after the question mark, as well as the anchor inside the anchor3 link.

Intrigued, GB tried to figure out what I meant. It thought, “Is it a riddle?” To avoid duplicate content being indexed under other URLs, the canonical tag on page2.html pointed at itself. In total, the logs registered eight crawls of this address, but the conclusions were rather sad:

  • After 2 weeks, the frequency of GB’s visits decreased significantly until it eventually left and never crawled that link again.
  • page2.html was not indexed under the phrase anchor3, nor under the URL with parameter1. According to Search Console, this link does not exist (it is not counted among incoming links), yet at the same time the phrase anchor3 is listed as an anchor phrase.

Link to a website from a redirection

I wanted to force GB to crawl my website more. Every couple of days, it entered page1.html via the dofollow link with the anchor anchor4, leading to page3.html, which redirects with a 301 code to page2.html. Unfortunately, as in the case of the page with a parameter, after 45 days page2.html was still not ranking in the search results for the anchor4 phrase that appeared in the redirected link on page1.html.

However, in Google Search Console, in the Anchor Texts section, anchor4 is visible and indexed. This could indicate that, after a while, the redirection will begin to function as expected, so that page2.html will rank in the search results for anchor4 despite being the second link to the same target page within the same website.
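For reference, the redirect in this test is ordinary server-side routing. A hypothetical sketch of the mapping a crawler would see (my own illustration, not the experiment's actual server configuration):

```javascript
// Hypothetical routing table: page3.html permanently redirects to page2.html,
// as in the experiment described above.
const redirects = { '/page3.html': '/page2.html' };

// Returns the response a crawler such as GB would receive for a given path.
function respond(path) {
  if (redirects[path]) {
    return { status: 301, location: redirects[path] };
  }
  return { status: 200, location: null };
}

console.log(respond('/page3.html')); // → { status: 301, location: '/page2.html' }
console.log(respond('/page2.html')); // → { status: 200, location: null }
```

The open question in the experiment is not whether GB follows the 301 (it does) but whether, and when, the anchor of the redirected link is credited to the final target.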

Link to a page using canonical tag

On page1.html, I placed a reference to page5.html (follow link) with an anchor anchor5. At the same time, on page5.html there was unique content, and in its head, there was a canonical tag to page2.html.

<link rel="canonical" href="page2.html" />

This test gave the following results:

  1. The power of the link with the anchor5 phrase, pointing to page5.html (which declared page2.html as canonical), was not transferred to the target page (just like in the other cases).
  2. page5.html was indexed despite the canonical tag.
  3. page5.html did not rank in the search results for anchor5.
  4. page5.html ranked on the phrases used in the page’s text, which indicated that GB totally ignored the canonical tags.

I would venture the claim that using rel=canonical to prevent the indexing of some content (e.g. when filtering) simply does not work.

Part 2: Crawl budget

While designing the SEO strategy, I wanted to make GB dance to my tune, not the other way around. To this end, I verified the SEO processes at the level of the server logs (access logs and error logs), which gave me a huge advantage. Thanks to that, I knew GB’s every movement and how it reacted to the changes I introduced within the SEO campaign (website restructuring, turning the internal linking system upside down, the way of displaying information).

One of my tasks during the SEO campaign was to rebuild the website in a way that would make GB visit only those URLs that it could index and that we wanted it to index. In a nutshell: only the pages that are important to us from the SEO point of view should be in Google’s index. On the other hand, GB should only crawl the pages that we want indexed by Google, which is not obvious to everyone – for example, when an online shop implements filtering by color, size and price by manipulating URL parameters, e.g. /category?color=red&size=m&price=100-200.

It may turn out that a solution which allows GB to crawl such dynamic URLs makes it devote its time to scouring (and possibly indexing) them instead of crawling the pages that matter.

Such dynamically created URLs are not only useless but potentially harmful to SEO because they can be mistaken for thin content, which will result in the drop of website rankings.
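The scale of the problem is easy to underestimate. A quick sketch of how filter parameters multiply into crawlable URLs (the counts are illustrative, not from the experiment):

```javascript
// Every combination of filter values yields a distinct crawlable URL.
// Illustrative counts for a single category page:
const colors = 10;
const sizes = 5;
const priceRanges = 4;

// Each filter can also be left unset, hence (n + 1) choices per filter;
// subtracting 1 excludes the unfiltered base URL itself.
const extraUrls = (colors + 1) * (sizes + 1) * (priceRanges + 1) - 1;
console.log(extraUrls); // → 329 parameter variants of one category page
```

Multiply that by the number of categories and the crawl budget spent on near-duplicate, thin pages quickly dwarfs the budget spent on the pages you actually want indexed.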

Within this experiment, I also wanted to check some methods of structuring a website without using rel=”nofollow”, blocking GB in the robots.txt file, or placing part of the HTML code in frames invisible to the bot (blocked iframe).

I tested three kinds of JavaScript links.

JavaScript link with an onclick event

A simple link built with JavaScript:

<a href="javascript:void(0)" onclick="window.location.href='page4.html'">anchor6</a>

GB easily moved on to the subpage page4.html and indexed the entire page. The subpage does not rank in the search results for the anchor6 phrase, and this phrase cannot be found in the Anchor Texts section in Google Search Console. The conclusion is that the link did not transfer the juice.

To summarize:

  • A classic JavaScript link allows Google to crawl the website and index the pages it comes upon.
  • It does not transfer juice – it is neutral.

JavaScript link with an internal function

I decided to up the game but, to my surprise, GB overcame the obstacle less than two hours after the link was published.

<a href="javascript:void(0)" class="js-link" data-url="page9.html">anchor7</a>

To make this link work, I used an external function that read the URL from the data-url attribute and redirected – only the user, as I hoped – to the target page9.html. As in the earlier case, page9.html was fully indexed.

What is interesting is that despite the lack of other incoming links, page9.html was the third most frequently visited page by GB in the entire web service, right after page1.html and page2.html.

I had used this method before for structuring web services. However, as we can see, it does not work anymore. In SEO nothing lives forever, apart from the Yellow Pages.

JavaScript link with coding

Still, I would not give up, and I decided there must be a way to effectively shut the door in GB’s face. So I constructed a simple function that encoded the data with base64, and the reference looked like this:

<a href="javascript:void(0)" class="js-link" data-url="cGFnZTEwLmh0bWw=">anchor8</a>

As a result, GB was unable to execute JavaScript that would both decode the content of the data-url attribute and redirect. And there it was! We had a way to structure a web service without using rel=nofollow to prevent bots from crawling wherever they like. This way, we do not waste our crawl budget, which is especially important in the case of big web services, and GB finally dances to our tune. Whether the function was introduced on the same page in the head section or in an external JS file, there was no trace of the bot in either the server logs or Search Console.
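For completeness, a sketch of the decoding such a handler performs. The click handler itself is my reconstruction of the pattern, not the article's exact code; in the browser `atob()` does the base64 decoding, and `Buffer` stands in for it here so the snippet runs under Node.js:

```javascript
// Decode the base64-encoded data-url value back into a path. In the browser
// the handler would call atob(); Buffer is the Node.js equivalent.
function decodeTarget(encoded) {
  return Buffer.from(encoded, 'base64').toString('utf8');
}

console.log(decodeTarget('cGFnZTEwLmh0bWw=')); // → "page10.html"

// Browser wiring (sketch): decode on click, then navigate, so the target
// URL never appears verbatim in the HTML source.
// el.addEventListener('click', () => {
//   window.location.href = atob(el.dataset.url);
// });
```

Since the target URL only ever exists as an opaque string until the click event fires, a crawler that does not execute this specific decode step has nothing to follow.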

Part 3: Hidden content

In the final test, I wanted to check whether the content in, for example, hidden tabs would be considered and indexed by GB or whether Google rendered such a page and ignored the hidden text, as some specialists have been claiming.

I wanted to either confirm or dismiss this claim. To do that, I placed a wall of text of over 2,000 characters on page12.html, hid a block of about 20 percent of it (400 characters) with CSS, and added a show more button. Within the hidden text there was a link to page13.html with the anchor anchor9.

There is no doubt that a bot can render a page; we can observe it in both Google Search Console and Google PageSpeed Insights. Nevertheless, my tests revealed that the block of text displayed after clicking the show more button was fully indexed. The phrases hidden in the text ranked in the search results, and GB followed the links hidden in the text. Moreover, the anchors of the links from the hidden block of text were visible in Google Search Console in the Anchor Text section, and page13.html also began to rank in the search results for the keyword anchor9.

This is crucial for online shops, where content is often placed in hidden tabs. Now we are sure that GB sees the content in hidden tabs, indexes it, and transfers the juice from the links that are hidden there.

The most important conclusion I draw from this experiment is that I have not found a direct way to bypass the First Link Counts Rule using modified links (links with a parameter, 301 redirects, canonicals, anchor links). At the same time, it is possible to build a website’s structure using JavaScript links, which frees us from the restrictions of the First Link Counts Rule. Moreover, Google Bot can see and index content hidden in tabs, and it follows the links hidden there.

The post Here’s what happened when I followed Googlebot for 3 months appeared first on Search Engine Land.