# Robots.txt file
#
# 12-2014  Change of philosophy: block the known bad guys; for everyone else, block the image
#          directories. Most of the bad guys simply ignore robots.txt anyway.
# 12-2014  I'll block the bad guys like AmazonAws, Hackers, TopHosts and spammers in our firewall.
# June 2012 Set up as a common robots.txt for all of my sites. Obviously, some of the directories
#          don't exist on all sites.
#
# From Google at: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
#   Only one group of group-member records is valid for a particular crawler. The crawler must
#   determine the correct group of records by finding the group with the most specific user-agent
#   that still matches. All other groups of records are ignored by the crawler. The user-agent is
#   case-insensitive. All non-matching text is ignored (for example, both googlebot/1.2 and
#   googlebot* are equivalent to googlebot). The order of the groups within the robots.txt file is
#   irrelevant. The start-of-group element user-agent specifies which crawler the group is valid
#   for. (See the worked example at the end of this file.)

# Name the specific bots we don't want; they'll probably ignore this anyway.
#6-2016 User-agent: msnbot-media          # Don't steal our images
#6-2016 User-agent: Googlebot-Image
User-agent: Gigabot                       # Gigabot is the name of Gigablast's robot
#6-2016 User-agent: yahoo-MMCrawler       # Don't steal our images
#6-2016 User-agent: yahoo-MMCrawler/3.x   # Don't steal our images
User-agent: ia_archiver-web.archive.org
User-agent: ia_archiver
User-agent: Yandex                        # Russian search engine
User-agent: YandexBot
User-agent: moget
User-agent: ichiro
User-agent: NaverBot
User-agent: Yeti
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
User-agent: sogou spider
User-agent: YoudaoBot
User-agent: YodaoBot
User-agent: AhrefsBot
User-agent: SISTRIX
User-agent: SEOkicks-Robot
User-agent: SEOkicks
User-agent: MJ12bot
User-agent: SearchmetricsBot
User-agent: NetSeer
User-agent: SemrushBot
User-agent: discoverybot
User-agent: BacklinkCrawler
User-agent: Ralocobot
User-agent: YandexImages
User-agent: A6-Indexer
User-agent: coccoc                        # 2-2015 Vietnamese browser
User-agent: Apache-HttpClient             # 2-2015
User-agent: Curious George                # 2-2015
User-agent: WebmasterCoffee               # 2-2015
User-agent: spbot                         # 2-2015
User-agent: WhelanLabs                    # 2-2015
User-agent: research-scanner              # 2-2015
User-agent: Runet-Research-Crawler        # 2-2015
User-agent: CorporateNewsSearchEngine     # 2-2015
User-agent: SpiderLing                    # 2-2015
User-agent: W3CLineMode                   # 2-2015 HttpClient?
User-agent: NetResearchServer             # 2-2015
User-agent: SurveyBot                     # 2-2015
User-agent: Gimme60bot
User-agent: analyticsseo
User-agent: Genieo
User-agent: CRAZYWEBCRAWLER
User-agent: Findxbot
User-agent: DomainSigmaCrawler
User-agent: aiHitBot
User-agent: ChangeDetect
User-agent: ChangeDetection
User-agent: InfoMinder
User-agent: Sogou
User-agent: Sogou web spider
User-agent: Toweyabot
User-agent: DomainAppender
User-agent: MegaIndex
User-agent: DeuSu
User-agent: GrapeshotCrawler
User-agent: Wotbox
User-agent: 8LEGS
User-agent: Domain Re-Animator Bot
User-agent: Domain Re-Animator
User-agent: Qwantify
User-agent: IstellaBot
Disallow: /

User-agent: *             # Everybody else
Disallow: /part-xref      # MCM/MHP cross reference
Disallow: /stayout        # Duh
Disallow: /pinnacle       # Nothing much here
Allow: /
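
# Worked example of the group-selection rule quoted near the top of this file (a sketch based on
# Google's documentation; the Googlebot names below are illustrative and are not directives in this
# file): if a robots.txt contained both a "User-agent: Googlebot" group and a
# "User-agent: Googlebot-News" group, a crawler identifying itself as Googlebot-News would follow
# only the Googlebot-News group, the most specific match, and would ignore the Googlebot group and
# the catch-all "User-agent: *" group.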