Many servers host a file called "robots.txt" in their root directory to tell search engines what they may crawl, index, cache and list in search results. This file can also point to sitemaps, which specify the URLs that search engines should index.
Now, there's one big flaw with this. Suppose we had a webmaster called Joe who didn't want search engines to index a private directory called "privatedirectory". He would list it in his robots.txt file like this:
User-agent: *
Disallow: /privatedirectory
Have you spotted the flaw? It's glaringly obvious! What if someone simply visited the robots.txt file of any given server at any given time? The file is public by design, so doing this would disclose the very directory Joe wished to keep hidden.
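To make the flaw concrete, here is a minimal Python sketch of what a curious visitor (or an attacker's reconnaissance script) might do: fetch robots.txt and pull out every Disallow path. The names `fetch_robots_txt` and `disallowed_paths` are hypothetical. The parser is deliberately simplified because it ignores user-agent groups, and the demo parses Joe's file from a string so it runs without network access.

```python
import urllib.request


def fetch_robots_txt(base_url: str) -> str:
    """Download robots.txt from a site's root (hypothetical helper)."""
    # robots.txt sits at a fixed, well-known location and is normally
    # served to anyone without authentication.
    url = base_url.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow value in a robots.txt body."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Strip comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            # An empty Disallow blocks nothing, so it reveals nothing either.
            if value:
                paths.append(value)
    return paths


# Joe's file, parsed from a string instead of fetched over the network.
JOE_ROBOTS_TXT = """User-agent: *
Disallow: /privatedirectory
"""

print(disallowed_paths(JOE_ROBOTS_TXT))  # ['/privatedirectory']
```

Against a live site you would call `disallowed_paths(fetch_robots_txt("https://www.example.com"))`, where the domain is only a placeholder. That takes one request and no credentials, and the "hidden" directory names come back as a list.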
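How do you make a proper robots.txt file? Use sitemaps to tell search engines which URLs you do want indexed, and don't list any private or sensitive directories directly in the robots file. Where possible, use the Allow directive to list the public parts of your site and disallow everything else, rather than naming individual directories with Disallow as shown above. Bear in mind that robots.txt is only a request to well-behaved crawlers and does nothing to stop a person from visiting a URL, so anything genuinely private still needs proper access control.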
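Here is one way to follow this advice, sketched in Python. The domain, paths and sitemap URL below are hypothetical placeholders. The robots.txt body lists only the public sections with Allow, blocks everything else with `Disallow: /`, and points crawlers at a sitemap, so no private directory name appears anywhere in the file. The standard library's `urllib.robotparser` is then used to check that the rules behave as intended.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allow-list robots.txt: public sections are named explicitly,
# everything else is disallowed, and no private path is ever mentioned.
ALLOWLIST_ROBOTS_TXT = """User-agent: *
Allow: /index.html
Allow: /blog/
Disallow: /
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ALLOWLIST_ROBOTS_TXT.splitlines())

# Check a public page and the private directory against the rules.
for path in ("/blog/first-post", "/privatedirectory/"):
    url = "https://www.example.com" + path
    verdict = "allowed" if parser.can_fetch("ExampleBot", url) else "blocked"
    print(path, "->", verdict)

# Sitemap URLs declared in the file (site_maps() needs Python 3.8+).
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
```

The Allow lines are placed before `Disallow: /` because Python's parser applies the first rule that matches, whereas Google applies the most specific matching rule; this ordering gives the same result under both.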
Back in 2009, I was talking to a few blackhats on IRC and asked them: what is one of the stupidest things that has helped you get through layers of security? They told me that they had been pentesting an ISP and found an SQL injection that disclosed the admin passwords, but were stuck at that point. They had no idea where the admin directory was and had tried all sorts of methods to find it. In the end, they realised that its location was sitting in plain text in the robots file, which disallowed the indexing of a directory called "/a1d2m3i4n5". This is quite shocking: not only should an ISP never have SQLi in the first place, but it should also NEVER list such sensitive directories in its robots file.
This concludes my ramble on the robots file; I hope it helps you in whatever position you are in.