Whenever we talk about the SEO of WordPress blogs, the WordPress Robots.txt file plays a major role in search engine ranking. It tells search engine bots which parts of our blog to crawl and index and which parts to avoid. Sometimes, though, a wrongly configured Robots.txt file can make your site disappear from search engines completely. So when you make changes to your Robots.txt file, it is important that the file is well optimized and does not block access to important parts of your blog.
There are many misunderstandings about how Robots.txt affects the indexing and non-indexing of content, and we will look into that aspect too.
SEO consists of hundreds of elements, and one of the essential ones is Robots.txt. This small text file sitting at the root of your website can seriously help optimize your website.
Most webmasters tend to avoid editing the Robots.txt file, but it is not as hard as it sounds. Anyone with basic knowledge can create and edit a Robots.txt file, and if you are new to this, this post is perfect for your needs.
If your website doesn't have a Robots.txt file, you can learn how to create one here. If your blog/website has a Robots.txt file but it is not optimized, you can follow this post and optimize it.
What is WordPress Robots.txt and why should we use it?
The Robots.txt file guides search engine robots, directing them to which parts of the site to crawl and which to avoid. When a search bot or spider comes to your site to index it, it reads the Robots.txt file first and follows its directions on which pages of your website to crawl and which to skip.
If you use WordPress, you will find the Robots.txt file in the root of your WordPress installation. For static websites, if you or your developer have created one, you will find it in your root folder. If you can't find it, simply create a new notepad file, name it robots.txt, and upload it to the root directory of your domain using FTP. Here is ShoutMeLoud's Robots.txt file; you can see its content and its location at the root of the domain:
https://www.shoutmeloud.com/robots.txt
How to make a robots.txt file?
As I mentioned earlier, Robots.txt is a plain text file. So, if you don't have this file on your website, open any text editor you like (Notepad, for example) and create a Robots.txt file with one or more records. Every record carries important information for the search engine. Example:
User-agent: googlebot
Disallow: /cgi-bin
If these lines are written in the Robots.txt file, they allow Googlebot to crawl every page of your site except the cgi-bin folder in the root directory. In other words, Googlebot won't crawl (and therefore won't index) anything in the cgi-bin folder.
By using the Disallow directive, you can restrict any search bot or spider from crawling a page or folder. Many sites disallow their archive folders or pages to avoid serving duplicate content.
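For instance, a blog that keeps its date-based archives under /archives/ (a hypothetical path; use whatever your archive URLs actually look like) could block them for all bots like this:
User-agent: *
Disallow: /archives/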
Where Can You Get the Names of Search Bots?
You can find them in your website's log files, but if you want lots of visitors from the search engines, you should allow every search bot. That means every search bot will crawl your site. You can write User-agent: * to allow every search bot. For example:
User-agent: *
Disallow: /cgi-bin
With this, every search bot will crawl your website (except the cgi-bin folder).
What You Shouldn’t do
1. Avoid unnecessary comments in the Robots.txt file.
2. Don't put a space at the beginning of any line, and don't insert stray spaces inside directives. Example:
Bad Practice:
User-agent: *
Dis allow: /support
Good Practice:
User-agent: *
Disallow: /support
3. Don't change the order of the directives.
Bad Practice:
Disallow: /support
User-agent: *
Good Practice:
User-agent: *
Disallow: /support
4. If you want to block more than one directory or page, don't list them together on one line:
Bad Practice:
User-agent: *
Disallow: /support /cgi-bin /images/
Good Practice:
User-agent: *
Disallow: /support
Disallow: /cgi-bin
Disallow: /images/
5. Use uppercase and lowercase letters carefully; paths in Robots.txt are case-sensitive. For example, if you want to block the “Download” directory but write “download” in the Robots.txt file, the rule won't match and search bots will still crawl the directory.
6. If you want search bots to crawl all pages and directories of your site, write:
User-agent: *
Disallow:
7. But if you want to block all pages and directories of your site, write:
User-agent: *
Disallow: /
After editing the Robots.txt file, upload it via any FTP software to the root (home) directory of your site.
Robots.txt for WordPress:
You can edit your WordPress Robots.txt file either by logging into the FTP account of your server or by using a plugin like Robots Meta to edit the Robots.txt file from the WordPress dashboard. There are a few things you should add to your Robots.txt file, along with your sitemap URL. Adding the sitemap URL helps search engine bots find your sitemap file and results in faster indexing of pages.
Here is a sample Robots.txt file for any domain. In the Sitemap line, replace the example URL with your own blog's URL:
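(The rules below are only an illustrative starting point, modeled on the directives WordPress itself generates by default; the exact directories you block should match your own setup, and example.com is a placeholder.)
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap.xml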
Making sure no content is affected by the new Robots.txt file
So now you have made some changes to your Robots.txt file, and it's time to check whether any of your content is impacted by the updated Robots.txt file.
You can use the 'Fetch as Google' tool in Google Search Console to see whether or not your content can be accessed given your Robots.txt file.
These steps are simple: log in to Google Search Console, select your site, go to Diagnostics, and choose Fetch as Google.
Add your site's post URLs and check whether there is any issue accessing them.
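If you prefer to test this outside Search Console, here is a minimal sketch using Python's built-in urllib.robotparser module (the domain and post URL are placeholders; swap in your own):
from urllib.robotparser import RobotFileParser

# Download and parse the live robots.txt file
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Check whether Googlebot is allowed to fetch a given post URL
print(rp.can_fetch("Googlebot", "https://example.com/sample-post/"))  # True means the URL is not blocked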
You can also check for crawl errors caused by the Robots.txt file under the Crawl Errors section of Search Console.
Under Crawl > Crawl Errors, select 'Restricted by robots.txt' and you will see which links have been denied by the Robots.txt file.
Here is an example of Robots.txt Crawl Error for ShoutMeLoud:
You can clearly see that the replytocom links have been blocked by Robots.txt, along with other links that should not be part of Google's index. FYI, the Robots.txt file is an essential element of SEO, and you can avoid many post-duplication issues by keeping your Robots.txt file updated.
Do you use WordPress Robots.txt to optimize your site? Do you wish to add more insight to your Robots.txt file? Let us know using the comment section below. Don’t forget to subscribe to our e-mail newsletter to keep receiving more SEO tips.
Here are a few other hand-picked articles for you to read next:
How To Optimize Blog for Search Engines Using Search Console
5 Mobile SEO Mistakes That Are Sabotaging Your Mobile Marketing Efforts
A DIY Guide for WordPress Blog SEO: From Beginner to Pro