Saturday, 15 September 2012

Robots.txt files, and why they matter!

Believe it or not, one important factor in web security goes largely unacknowledged. Most people use the robots.txt file to stop search engines from indexing sensitive files. The problem is that anyone can read your robots file, so you MUST make sure that nothing important gets disclosed through it.

Many servers keep a file called "robots.txt" in their root directory to tell search engines what they are allowed to index, cache and list in search results. The file can also point to sitemaps, which specify the URLs that search engines should index.
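To make that concrete, here is a minimal sketch of how a well-behaved crawler might consult these rules, using Python's standard urllib.robotparser module. The rules, the example.com URLs and the is_allowed helper are made up for illustration. Keep in mind that following robots.txt is voluntary on the crawler's side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /privatedirectory
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the rules in robots_txt let `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # A public page is allowed, anything under the disallowed prefix is not.
    print(is_allowed(ROBOTS_TXT, "https://example.com/index.html"))
    print(is_allowed(ROBOTS_TXT, "https://example.com/privatedirectory/notes.txt"))
```

A polite crawler would run a check like this before every request. Nothing forces a visitor to do so.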

Now, there's one big flaw in this. Suppose we had a webmaster called Joe who didn't want search engines to index a private directory called "privatedirectory". He'd list it in his robots.txt file like this:

User-agent: *
Disallow:
Disallow: /privatedirectory
Have you spotted the flaw? It's blatantly obvious! What if someone simply visited the robots.txt file of any given server at any given time? That would disclose the very directory that Joe obviously wished to keep hidden.
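To show how little effort this takes, here is a small sketch that pulls every Disallow entry out of a robots.txt file. The disallowed_paths function is a hypothetical helper written for this post, and the sample text is Joe's file from above. A curious visitor would get the same information by opening the file in a browser.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow value found in a robots.txt body."""
    paths = []
    for raw in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty "Disallow:" blocks nothing, so skip it
                paths.append(value)
    return paths


# Joe's robots.txt, exactly as listed in the post.
JOES_ROBOTS = """\
User-agent: *
Disallow:
Disallow: /privatedirectory
"""

if __name__ == "__main__":
    print(disallowed_paths(JOES_ROBOTS))  # ['/privatedirectory']
```

Every path this returns is a hint about where the interesting parts of the site live.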

How do you make proper robots.txt files? Use sitemaps! Don't list any private or sensitive directories directly in the robots file, and try to use Allow rules rather than the Disallow rule shown above.
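One way to read that advice is as an allowlist: name the public paths, disallow everything else, and point to a sitemap. The sketch below generates such a file. The build_allowlist_robots function, the paths and the sitemap URL are assumptions for illustration, and you should check that the crawlers you care about honour Allow rules before relying on it.

```python
def build_allowlist_robots(public_paths: list[str], sitemap_url: str) -> str:
    """Build a robots.txt body that names only public paths.

    Allow lines come first so that parsers which stop at the first
    matching rule still see them before the catch-all Disallow.
    """
    lines = ["User-agent: *"]
    lines += [f"Allow: {path}" for path in public_paths]
    lines.append("Disallow: /")  # everything not listed above
    lines.append("")
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical public sections and sitemap location.
    print(build_allowlist_robots(
        ["/blog/", "/about.html"],
        "https://example.com/sitemap.xml",
    ))
```

The generated file never mentions the private directory, so reading it reveals only what is already public.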

Personally, back in 2009, I was talking to a few blackhats on IRC and I asked them: what is one of the stupidest things that has helped you get through layers of security? They told me that they were pentesting an ISP and had found an SQL injection that led to the disclosure of admin passwords, but were stuck at that point. They had no idea what the admin directory was, and had tried all sorts of methods to find it. In the end, they realised that its location was sitting in plain text in the robots file, which disallowed the indexing of a directory called "/a1d2m3i4n5". This is quite shocking: not only should an ISP never have SQLi in the first place, but it should also NEVER place such sensitive directories in the robots file.

This concludes my ramble on the robots file. I hope it helps you, whatever position you are in.

5 comments:

Unknown said...

He should also have password-protected his private directory using .htaccess or similar.

Shubham Shah said...

Definitely! But also, htaccess has major major flaws. Will talk about it in the next article, and how to prevent the bypassing of htaccess files.

Anonymous said...

I like your post and the way you are explaining about robots.txt. It is nice post and contain genuine data for learning how we create robots.txt file for a site.

sarah lee said...

As really needed robot.txt I am excited getting your instructs, Hope to get more:-)

Unknown said...

Home Lifestyle has a wide range of One Stop Home Essentials products suited for the Active, Busy, Mobile and City Living People, bringing the Quality of Life to a different level.
