What is Robots.txt?
Robots.txt is a plain text file that site owners create to instruct web robots (typically search engine crawlers) how to crawl pages on their site.
Syntax for robots.txt file
A basic robots.txt file uses two directives: User-agent and Disallow.
- User-agents are search engine robots or web crawlers, such as those run by Google, Yahoo, etc.
- Disallow is a directive that blocks the named user-agent from accessing a page or file.
- You can disallow one or many files/pages at a time.
Basic format for blocking all crawlers:
User-agent: *
Disallow: [URL string not to be crawled]
Basic format for blocking a specific crawler:
User-agent: [crawler name]
Disallow: [URL string not to be crawled]
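As a sketch, a complete robots.txt combining both formats might look like this (the directory path and crawler choice here are hypothetical examples):

```
# Block every crawler from the /tmp/ directory
User-agent: *
Disallow: /tmp/

# Block one specific crawler (here Googlebot) from the entire site
User-agent: Googlebot
Disallow: /
```

Each User-agent line starts a new group of rules, and the rules that follow it apply only to that crawler.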
How does robots.txt work?
Search engines have two main functions:
- Crawling the web to discover content
- Indexing that content so it can be served up to searchers who are looking for information
Robots.txt controls crawler access to specific areas of your site. Common uses include:
- Keeping duplicate content from showing up in SERPs
- Preventing web crawlers from requesting certain files on your site
If there are no pages on your site you need to keep crawlers away from, you may not need a robots.txt file at all.
You can also restore a user-agent's access to a page inside an otherwise blocked section by using the Allow directive:
Allow: [Blocked URL string]
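For instance, a hypothetical file could disallow a whole directory but re-allow one page inside it. (Google and most modern crawlers pick the most specific matching rule; some older parsers apply rules in file order, so listing Allow first is the safer choice.)

```
User-agent: *
Allow: /private/overview.html
Disallow: /private/
```

With these rules, /private/overview.html remains crawlable while everything else under /private/ is blocked.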
How to create and place a robots.txt file?
- Create the robots.txt file by writing the directives in a plain text editor, then save it as robots.txt.
- Place the saved file in the top-level (root) directory of your site.
- Saving the file as plain text helps web crawlers recognize and parse it easily.
How can I find robots.txt on a site?
If you want to know whether a site contains a robots.txt file, just do the following:
Type the base URL and append /robots.txt.
You will see one of three results:
1) The robots.txt file is displayed
2) An empty file is returned
3) A 404 error appears
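This lookup can also be scripted. The sketch below uses Python's standard urllib.robotparser module on an illustrative set of rules (the domain and paths are made up); for a real site you would instead point set_url() at yoursite.com/robots.txt and call read():

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, as they would appear in a site's robots.txt.
# Allow is listed first because Python's parser applies the first matching rule.
robots_lines = [
    "User-agent: *",
    "Allow: /private/overview.html",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# Ask whether a generic crawler ("*") may fetch each URL.
print(parser.can_fetch("*", "https://example.com/private/secret.html"))    # False
print(parser.can_fetch("*", "https://example.com/private/overview.html"))  # True
print(parser.can_fetch("*", "https://example.com/index.html"))             # True
```

A URL with no matching rule defaults to being crawlable, which is why the last check returns True.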
Things to keep in mind:
- The filename is case sensitive; make sure to type robots.txt and not Robots.txt.
- The file is publicly accessible, so anyone can read it. Do not rely on robots.txt alone; apply proper security to areas of your site you are trying to keep hidden.
- Write one Disallow directive per URL.
- Alongside the robots.txt file, "noindex, follow" robots meta tags should also be used on each related page you want kept out of the index.
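Such a robots meta tag goes in each page's HTML head; a minimal sketch:

```html
<!-- Tells crawlers not to index this page, but still to follow its links -->
<meta name="robots" content="noindex, follow">
```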
In conclusion, a robots.txt file allows you to block certain pages/files of your site that you don't wish search engines to crawl and index. Keep in mind, though, that it manages crawler behavior rather than providing real security for a website.