Is Rankar Academy completely free?

Yes. Every lesson, every topic, and every certificate is permanently free. No credit card, no trial expiry. All 10 Rankar tools have free plans that cover every exercise.

How long does one lesson take?

Each lesson takes 12 to 18 minutes to read, plus 5 to 10 minutes to complete the hands-on task in the Rankar tool.

Do I need prior experience?

No experience needed. The program starts from absolute zero — Lesson 1 explains what SEO is and assumes you know nothing.

Do the certificates work on LinkedIn?

Yes. Every certificate has a unique ID you can verify at rankar.ai/verify and add to LinkedIn under Licences and Certifications.

1. Does robots.txt prevent pages from appearing in Google?

No. Robots.txt only blocks crawling. Google may still index a URL if it discovers the page through external or internal links.

2. Where is the robots.txt file located?

It should be located at the root of your domain, such as https://yourdomain.com/robots.txt.

3. Should every website have a robots.txt file?

Yes. Even a simple robots.txt file helps guide search engines, specify your sitemap location, and manage crawl efficiency.


Robots.txt — Use It Without Accidentally Blocking Yourself

Learn how to use robots.txt correctly, manage crawl budget, avoid indexing mistakes, and prevent accidentally blocking key pages

What robots.txt is — and what it does not do

Robots.txt is one of the simplest files on a website, yet it has the power to dramatically affect your SEO performance. A single incorrect directive can prevent search engines from crawling important pages, waste valuable crawl budget, or even block an entire website from being discovered.

For beginners, robots.txt often seems like a tool for hiding pages from Google. In reality, its purpose is much more specific: it tells search engine crawlers which parts of your website they can and cannot crawl. Understanding how it works—and what it cannot do—is essential if you want to avoid costly SEO mistakes.

This guide explains robots.txt fundamentals, common errors, best practices, and how to use it safely without accidentally damaging your rankings.

What Is Robots.txt?

A robots.txt file is a plain text document located at:

https://yourdomain.com/robots.txt

When a search engine crawler visits your website, one of the first files it requests is robots.txt. The crawler reads the instructions inside and decides which URLs it should crawl or avoid.

A basic example looks like this:

User-agent: *
Disallow: /admin/
Sitemap: https://yourdomain.com/sitemap.xml

In this example:

User-agent: * applies the rule to all crawlers
Disallow: /admin/ blocks crawling of the admin section
Sitemap: points search engines to the XML sitemap

Robots.txt acts as a set of crawling instructions rather than a security mechanism. Anyone can view the file simply by visiting its URL, which means sensitive information should never rely on robots.txt for protection.
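As a rough illustration, Python's standard-library urllib.robotparser can read the basic example above and answer the same question a crawler asks: may I fetch this URL? The URLs below are placeholders, and this parser uses plain prefix matching, so treat it as an approximation of how a search engine reads the file rather than a copy of Googlebot's behaviour.

```python
from urllib.robotparser import RobotFileParser

# The basic example file, as a crawler would receive it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URLs, used only for illustration.
admin_allowed = parser.can_fetch("*", "https://yourdomain.com/admin/settings")
blog_allowed = parser.can_fetch("*", "https://yourdomain.com/blog/first-post")
sitemaps = parser.site_maps()  # list of Sitemap URLs, or None if none are declared

print(admin_allowed, blog_allowed, sitemaps)
```

Run against this file, the check should report the admin URL as blocked and the blog URL as crawlable, and it should list the single sitemap URL.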

Crawling vs. Indexing: The Critical Difference

One of the biggest SEO misunderstandings is assuming robots.txt controls indexing.

It doesn't.

Robots.txt controls whether Google can crawl a page.

Indexing determines whether Google can store and display that page in search results.

This distinction is extremely important.

Example

Imagine you block this page:

Disallow: /private-page/

Googlebot cannot crawl the page.

However, if another website links to that URL, Google may still discover and index the URL itself, even without crawling the content.

The search result may appear as:

"No information is available for this page."

This happens because Google knows the URL exists but cannot access its contents.

If You Want a Page Removed from Search Results

Use a noindex meta tag in the page's HTML head:

<meta name="robots" content="noindex">

The page must remain crawlable for Google to see the noindex directive.

Simple Rule

Robots.txt = controls crawling
Noindex = controls indexing

Confusing these two concepts causes many SEO problems.
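The interaction between the two controls can be sketched in a few lines of Python. This is a minimal sketch, assuming the page uses the standard robots meta tag; the helper names (has_noindex, noindex_is_effective) and the sample page are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # The content attribute may hold several comma-separated directives.
            parts = attr_map.get("content", "").split(",")
            self.directives.extend(p.strip().lower() for p in parts if p.strip())


def has_noindex(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


def noindex_is_effective(html: str, crawlable: bool) -> bool:
    """A noindex tag only works if the crawler is allowed to fetch the page."""
    return crawlable and has_noindex(html)


# Hypothetical thank-you page carrying a noindex directive.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head><body>Thanks!</body></html>'

print(noindex_is_effective(PAGE, crawlable=True))
print(noindex_is_effective(PAGE, crawlable=False))
```

The second call models a page that is both tagged noindex and blocked in robots.txt: the tag is present, but a crawler that cannot fetch the page never reads it.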

Understanding Robots.txt Directives

The robots.txt language is intentionally simple.

User-Agent

Specifies which crawler a rule applies to.

User-agent: *

Applies to all crawlers.

User-agent: Googlebot

Applies only to Google's primary crawler.

Disallow

Blocks crawling of a specific path.

Disallow: /admin/

Prevents crawlers from accessing URLs inside the admin folder.

Examples:

Disallow: /cart/

Disallow: /checkout/

Disallow: /search/

Allow

Overrides a broader disallow rule.

Example:

Disallow: /products/
Allow: /products/featured-product/

This blocks most product URLs while allowing one specific page.
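Why does the Allow line win here? Google documents that when several rules match a URL, the most specific one (the longest path) applies, and that a tie goes to the less restrictive rule. The sketch below models that precedence for plain path prefixes only. The is_allowed helper and the rule tuples are hypothetical, and wildcards are deliberately left out.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules is a list of ("allow" | "disallow", path_prefix) pairs."""
    best_len = -1
    allowed = True  # no matching rule means the URL may be crawled
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules are ignored
        longer = len(prefix) > best_len
        tie_goes_to_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_goes_to_allow:
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


RULES = [
    ("disallow", "/products/"),
    ("allow", "/products/featured-product/"),
]

print(is_allowed("/products/featured-product/", RULES))
print(is_allowed("/products/other-item/", RULES))
```

Not every parser resolves conflicts this way. Python's own urllib.robotparser, for example, applies the first matching rule in file order, so results from generic libraries can differ from what Googlebot does.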

Sitemap

Provides the location of your XML sitemap.

Sitemap: https://yourdomain.com/sitemap.xml

Including this directive helps search engines discover important URLs faster.

What Should You Block in Robots.txt?

The goal of robots.txt is not to hide content.

Its main purpose is to prevent search engines from wasting crawl budget on low-value pages.

Good Candidates for Blocking

Admin Areas

Disallow: /admin/

Disallow: /wp-admin/

These sections provide no SEO value.

Login Pages

Disallow: /login/

Disallow: /account/

Searchers never need these pages in search results.

Internal Search Results

Disallow: /search/

Search result pages often create thousands of thin URLs that consume crawl resources.

URL Parameters

Sites with filters and sorting options frequently generate duplicate content.

Examples:

?sort=price

?filter=color

?page=2

Managing these URLs carefully can improve crawl efficiency.
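Before writing any rules for parameters, it helps to see which crawled URLs actually carry them. The following is a small sketch, assuming you already have a list of URLs from a crawl or log export; the parameter names come from the examples above, and the URLs and the low_value_params helper are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Parameter names taken from the examples above; adjust for your own site.
PARAMS_TO_REVIEW = {"sort", "filter", "page"}


def low_value_params(url: str) -> set[str]:
    """Return the reviewed parameter names present in a URL's query string."""
    query = urlsplit(url).query
    return PARAMS_TO_REVIEW & set(parse_qs(query, keep_blank_values=True))


# Placeholder URLs standing in for a crawl export.
CRAWLED_URLS = [
    "https://yourdomain.com/shoes/",
    "https://yourdomain.com/shoes/?sort=price",
    "https://yourdomain.com/shoes/?filter=color&page=2",
]

flagged = {url: low_value_params(url) for url in CRAWLED_URLS if low_value_params(url)}

for url, params in flagged.items():
    print(url, sorted(params))
```

The output is a starting point for a decision, not a rule set: some parameterised URLs (pagination, for instance) may still need to be crawled.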

Staging Environments

Development and staging versions should never be crawled.

Examples:

staging.domain.com

dev.domain.com

These environments often contain duplicate or incomplete content.

What You Should Never Block

Some pages and files should almost always remain crawlable.

CSS Files

Google renders websites similarly to a browser.

Blocking CSS prevents Google from seeing layouts correctly.

Bad example:

Disallow: /css/

JavaScript Files

Modern websites rely heavily on JavaScript.

Blocking JS can cause rendering issues and indexing problems.

Bad example:

Disallow: /js/

Important Content Pages

Never block:

Blog posts
Product pages
Category pages
Service pages
Landing pages

If a page should rank, Google must be able to crawl it.

XML Sitemaps

Always keep your sitemap accessible.

Search engines use it to discover important URLs efficiently.

The Most Dangerous Robots.txt Mistakes

1. Blocking the Entire Website

The most infamous robots.txt error:

User-agent: *
Disallow: /

This tells all crawlers:

"Do not crawl anything."

If deployed on a live website, rankings can collapse rapidly.

Many SEO disasters have started with this single line accidentally moving from staging to production.
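One way to guard against this is a check in the deployment pipeline that refuses to ship a file which blocks the homepage. The sketch below uses Python's standard urllib.robotparser; the blocks_entire_site helper and the two sample files are hypothetical, and testing only the root path is a heuristic rather than a complete validation.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt: str, user_agent: str = "*") -> bool:
    """True if the given crawler is not allowed to fetch the homepage."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


# Hypothetical file contents for a staging and a production environment.
STAGING_FILE = "User-agent: *\nDisallow: /\n"
PRODUCTION_FILE = "User-agent: *\nDisallow: /admin/\n"

for name, content in [("staging", STAGING_FILE), ("production", PRODUCTION_FILE)]:
    status = "BLOCKS EVERYTHING" if blocks_entire_site(content) else "ok"
    print(name, status)
```

In a real pipeline the function would read the file about to be deployed and exit with an error when it returns True for the production target.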

2. Blocking Assets Needed for Rendering

Google needs access to:

CSS
JavaScript
Images

Blocking these resources creates an incomplete understanding of your pages.

This can lead to:

Lower rankings
Mobile usability issues
Indexing problems

3. Using Robots.txt Instead of Noindex

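Many site owners write:

Disallow: /thank-you/

and expect the page to disappear from search results.

Instead, when search visibility rather than crawling is the issue, leave the page crawlable and add a noindex meta tag to it:

<meta name="robots" content="noindex">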

4. Leaving Old Rules After a Site Migration

Website migrations frequently create outdated robots.txt files.

Examples include:

Old directories no longer used
Legacy CMS paths
Temporary development blocks

Always review robots.txt after major site changes.

5. Blocking Pages You Actually Need

This happens surprisingly often.

Examples:

Disallow: /blog/

Disallow: /products/

Disallow: /services/

These mistakes silently prevent valuable content from being crawled and ranked.

Regular audits help catch these issues before they affect traffic.

How to Test Your Robots.txt File

Never publish robots.txt changes without testing them.

Use Google Search Console

Google Search Console provides robots.txt reporting and validation tools.

Before publishing:

Review the current file.
Check for syntax errors.
Test important URLs.
Verify that key pages remain crawlable.
Confirm blocked pages are truly intended to be blocked.
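The last three checks in this list can be approximated locally before a file goes live. Here is a minimal sketch using Python's urllib.robotparser, assuming a draft file and two hand-written URL lists; the audit helper, the draft rules, and the paths are all hypothetical, and the parser's prefix matching only approximates Googlebot.

```python
from urllib.robotparser import RobotFileParser


def audit(robots_txt, must_crawl, must_block, user_agent="Googlebot"):
    """Return (wrongly_blocked, wrongly_open) for a draft robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    wrongly_blocked = [u for u in must_crawl if not parser.can_fetch(user_agent, u)]
    wrongly_open = [u for u in must_block if parser.can_fetch(user_agent, u)]
    return wrongly_blocked, wrongly_open


# A draft file with a deliberate mistake: the blog is blocked.
DRAFT = """\
User-agent: *
Disallow: /admin/
Disallow: /blog/
"""

MUST_CRAWL = ["/", "/blog/robots-guide/", "/products/widget/"]
MUST_BLOCK = ["/admin/", "/search/"]

wrongly_blocked, wrongly_open = audit(DRAFT, MUST_CRAWL, MUST_BLOCK)
print("Blocked but should be crawlable:", wrongly_blocked)
print("Crawlable but should be blocked:", wrongly_open)
```

Keeping the two URL lists in version control alongside robots.txt turns the review into a repeatable test instead of a manual read-through.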

Use SEO Crawling Tools

Tools like:

Screaming Frog
Sitebulb
RankAIO

can identify blocked URLs and reveal crawling issues before they impact rankings.

Robots.txt Best Practices

Follow these guidelines to avoid most robots.txt problems:

Keep It Simple

Only add rules you genuinely need.

Complex robots.txt files are harder to maintain and easier to break.

Include Your Sitemap

Always specify your XML sitemap location.

Sitemap: https://yourdomain.com/sitemap.xml

Audit Regularly

Review robots.txt after:

Website redesigns
CMS migrations
Plugin changes
Development deployments

Focus on Crawl Budget

Use robots.txt primarily to:

Block low-value URLs
Improve crawl efficiency
Guide search engines toward important content

Test Before Deploying

A five-minute review can prevent months of traffic loss.

Never assume robots.txt changes are harmless.

⚠️ Critical Distinction

robots.txt controls CRAWLING. noindex meta tags control INDEXING. These are different things. A common and costly mistake: blocking a page in robots.txt and assuming it will not appear in search results. Googlebot will not crawl it — but may still index the URL if other sites link to it. Use noindex for pages that should not appear in search results.

The robots.txt syntax — what you need to know

A robots.txt file contains one or more "User-agent" blocks. Each block specifies a crawler and the rules for that crawler:

User-agent: * applies the following rules to all crawlers
User-agent: Googlebot applies only to Google's main web crawler
Disallow: /admin/ blocks crawling of any URL starting with /admin/
Disallow: / blocks crawling of the ENTIRE SITE (the most dangerous directive; never use on production)
Allow: /admin/public.html allows crawling of a specific URL within a disallowed directory
Sitemap: https://yoursite.com/sitemap.xml tells crawlers where your sitemap is located

Rules are case-sensitive for paths. Disallow: /Admin/ and Disallow: /admin/ are different rules. Wildcards (*) match any character sequence. A $ at the end of a pattern matches the end of the URL exactly.
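These matching rules can be modelled by translating a robots.txt path pattern into a regular expression. The sketch below is an illustration under that assumption, not a full parser: pattern_to_regex and matches are hypothetical helper names, and the sample patterns are examples rather than recommended rules.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start of the path, as a robots.txt rule does.
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/admin/", "/admin/users"))                   # prefix match
print(matches("/Admin/", "/admin/users"))                   # paths are case-sensitive
print(matches("/*?sort=", "/shoes/?sort=price"))            # * spans any characters
print(matches("/*.pdf$", "/files/guide.pdf"))               # $ pins the end of the URL
print(matches("/*.pdf$", "/files/guide.pdf?download=1"))    # extra characters after .pdf
```

A pattern without a trailing $ behaves as a prefix, which is why /admin/ also covers everything beneath it.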

What to disallow — and what to never block

Safe to disallow

Crawl budget savers

Admin pages (/admin/, /wp-admin/), login pages (/login, /account/login), internal search results (/search?q=), staging environments (if publicly accessible), print-only page versions (/print/), and any directory that generates hundreds of low-value parameter-based URLs.

Never disallow

Critical pages

Your CSS and JavaScript files (Google needs these to render pages properly), any page you want to rank, your sitemap location, product pages, blog posts, category pages, or any page linked externally. Blocking CSS/JS prevents Google from seeing your pages as users do — often causing rendering issues that suppress rankings.

The most dangerous robots.txt mistakes

Disallow: / blocks the entire site. Every developer who has accidentally deployed this to production has felt genuine panic as rankings disappeared. Always check robots.txt after any site deployment.
Disallowing CSS and JavaScript directories: a common legacy mistake from when bandwidth costs made blocking bot access to assets attractive. Today this prevents Google from rendering your pages correctly, which can cause significant ranking damage.
Using robots.txt instead of noindex: blocking pages in robots.txt when the real goal is to keep them out of search results. The correct tool is a noindex meta tag on the (crawlable) page.
Forgetting to update after migrations: old disallow rules from a previous site structure can silently block new content. Review the entire file after any major site change.

Testing your robots.txt in Google Search Console

Google Search Console includes a robots.txt report. Navigate to Settings → robots.txt → Open report. The report shows the robots.txt files Google has found for your site, when each was last fetched, and any warnings or syntax errors in the file. To check whether a specific URL is blocked from crawling, run it through the URL Inspection tool, which reports whether crawling is allowed. Check both before and after changing your robots.txt to verify the effect.

🎯 Your Task This Lesson

Audit your robots.txt for accidental blocks

Visit yoursite.com/robots.txt in your browser and copy the contents. Open the robots.txt report in GSC (Settings → robots.txt) and check it for errors and warnings. Then check the following URLs against your robots.txt, using the URL Inspection tool or an SEO crawler: your homepage, your most important blog post, your most important product or service page, a file in your CSS directory (e.g. under /wp-content/themes/), and a file in your JavaScript directory. Fix any unexpected blocks immediately, especially any block on CSS or JS files. Add the Sitemap: directive to your robots.txt if it is missing. Document any changes made.


✓ Lesson Complete: You Now Know

✓ What robots.txt does (it controls crawling, not indexing) and why this distinction is critical
✓ The complete robots.txt syntax: User-agent, Disallow, Allow, wildcards, Sitemap directive
✓ What is safe to disallow (admin, login, search results) and what must never be blocked (CSS, JS, important pages)
✓ The most dangerous robots.txt mistakes, including the site-wide Disallow: / disaster
✓ How to use the robots.txt tools in Google Search Console to verify your rules before and after any change
