public class RobotsTxt
extends java.lang.Object
robots.txt file.

| Constructor and Description |
|---|
| RobotsTxt(java.io.Reader r, java.util.Set<java.lang.String> agents) |
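The constructor reads robots.txt content from a Reader and takes a set of agents. Assuming that set plays the same role as preserveAgents in parse (agents outside it are ignored), the filtering step might look like the minimal sketch below. It is hypothetical, not the class's actual code: RobotsParseSketch and Rule are invented names, agent names are assumed to be compared case-insensitively, and paths are kept as plain prefixes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the RobotsTxt implementation.
class RobotsParseSketch {
    /** One Allow or Disallow line, kept in file order. */
    record Rule(boolean allow, String pathPrefix) {}

    /**
     * Groups Allow/Disallow lines by user agent, keeping only agents named in
     * {@code agents}. Case-insensitive agent comparison is an assumption.
     */
    static Map<String, List<Rule>> parse(Reader r, Set<String> agents) throws IOException {
        Set<String> wanted = new HashSet<>();
        for (String a : agents) {
            wanted.add(a.toLowerCase(Locale.ROOT));
        }
        Map<String, List<Rule>> records = new LinkedHashMap<>();
        List<String> currentAgents = new ArrayList<>();
        boolean lastWasAgent = false;
        BufferedReader in = new BufferedReader(r);
        String line;
        while ((line = in.readLine()) != null) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // drop comments
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!lastWasAgent) {
                    currentAgents.clear(); // a new record starts
                }
                currentAgents.add(value.toLowerCase(Locale.ROOT));
                lastWasAgent = true;
                continue;
            }
            lastWasAgent = false;
            boolean isAllow = field.equals("allow");
            if ((!isAllow && !field.equals("disallow")) || value.isEmpty()) {
                continue; // skip other fields and empty paths ("Disallow:" alone disallows nothing)
            }
            for (String agent : currentAgents) {
                if (wanted.contains(agent)) {
                    records.computeIfAbsent(agent, k -> new ArrayList<>()).add(new Rule(isAllow, value));
                }
            }
        }
        return records;
    }
}
```

With agents = {"Foo", "*"}, a record for Bar is dropped during the parse, so its rules never reach any later lookup. In this sketch "*" must be in the set for the wildcard record to survive.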

| Modifier and Type | Method and Description |
|---|---|
| boolean | isAllowed(java.lang.String userAgent, java.lang.String path) Determine if a user agent is allowed for the specified path. |
| boolean | isAllowed(java.lang.String userAgent, java.lang.String path, boolean checkWildcard) Determine if a user agent is allowed for the specified path. |
| static RobotsTxt | parse(java.lang.String host, Client httpClient, java.lang.String userAgent, java.util.Set<java.lang.String> preserveAgents, Logger logger) Creates a robots.txt from the standard location (/robots.txt). |
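parse builds the URL as [host]/robots.txt, sends the request through the supplied client, and, when a logger is given, logs HTTP errors at the warn level. Because the Client and Logger types are not specified here, the minimal sketch below swaps in a hypothetical Fetcher interface and java.util.logging.Logger, whose WARNING level stands in for "warn". What parse returns after a failed request is not documented in this section; the sketch simply assumes an empty body, meaning no rules.

```java
import java.io.IOException;
import java.util.logging.Logger;

// Hypothetical sketch of the fetch step behind parse(); all names are illustrative.
class RobotsFetchSketch {
    /** Stand-in for the HTTP client: returns the response body for a URL. */
    @FunctionalInterface
    interface Fetcher {
        String get(String url) throws IOException;
    }

    /** The standard location: [host]/robots.txt. */
    static String robotsUrl(String host) {
        return host + "/robots.txt";
    }

    /**
     * Fetches robots.txt for host. On an I/O error, logs a warning if a logger
     * was supplied (it may be null) and returns an empty body, i.e. no rules.
     */
    static String fetchRobots(String host, Fetcher fetcher, Logger logger) {
        String url = robotsUrl(host);
        try {
            return fetcher.get(url);
        } catch (IOException e) {
            if (logger != null) {
                logger.warning("Could not fetch " + url + ": " + e.getMessage());
            }
            return "";
        }
    }
}
```

The returned body could then be wrapped in a java.io.StringReader and passed to the constructor along with the preserveAgents set; that hand-off is likewise an assumption.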
public static final RobotsTxt NO_ROBOTS
public RobotsTxt(java.io.Reader r,
java.util.Set<java.lang.String> agents)
throws java.io.IOException
Throws: java.io.IOException

public static RobotsTxt parse(java.lang.String host,
Client httpClient,
java.lang.String userAgent,
java.util.Set<java.lang.String> preserveAgents,
Logger logger)
Creates a robots.txt from the standard location (/robots.txt).

Parameters:
host - The hostname. The URL will be created as [host]/robots.txt.
httpClient - The HTTP client for making the request.
userAgent - The User-Agent sent with the request.
preserveAgents - The set of agents to preserve. Agents not contained in this set will be ignored during parse.
logger - A logger for errors. May be null. If specified, HTTP errors during parse will be logged at the warn level.

public final boolean isAllowed(java.lang.String userAgent,
java.lang.String path)
Determine if a user agent is allowed for the specified path.

Parameters:
userAgent - The user agent string.
path - The path.

public final boolean isAllowed(java.lang.String userAgent,
java.lang.String path,
boolean checkWildcard)
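Determine if a user agent is allowed for the specified path.

Technically, the treatment of Allow is not right (http://www.robotstxt.org/wc/norobots-rfc.html). The Allow and Disallow lines of a record should be processed as a single list, in the order they appear, with the first line that matches the path deciding the outcome. Under that rule, a later Allow can never override an earlier "Disallow: /". In practice, however, I have often found that people write rules that don't make sense under this reading, such as disallowing everything and then allowing specific paths.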
Parameters:
userAgent - The user agent string.
path - The path.
checkWildcard - Whether the wildcard (User-agent: *) record should also be checked. Passing false gives a way to know if a user agent is explicitly disallowed by name.
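For comparison with the note on Allow handling, here is a minimal sketch of the evaluation the robots.txt RFC draft describes, combined with a checkWildcard switch. It is hypothetical (RobotsMatchSketch and Rule are invented names) and deliberately applies the RFC's first-match rule, which, per that note, RobotsTxt does not. Agent lookup is simplified to an exact, case-insensitive name match and paths to plain prefixes.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of RFC-draft evaluation; not how RobotsTxt itself behaves.
class RobotsMatchSketch {
    /** One Allow or Disallow line, in the order it appeared in the record. */
    record Rule(boolean allow, String pathPrefix) {}

    /**
     * Picks the record for userAgent, falling back to the "*" record only when
     * checkWildcard is true, then returns the verdict of the first rule whose
     * path prefix matches. No record or no matching rule means "allowed".
     */
    static boolean isAllowed(Map<String, List<Rule>> records, String userAgent,
                             String path, boolean checkWildcard) {
        List<Rule> rules = records.get(userAgent.toLowerCase(Locale.ROOT));
        if (rules == null && checkWildcard) {
            rules = records.get("*");
        }
        if (rules == null) {
            return true;
        }
        for (Rule rule : rules) {
            if (path.startsWith(rule.pathPrefix())) {
                return rule.allow(); // first match wins; later lines are ignored
            }
        }
        return true;
    }
}
```

Under this rule, listing "Disallow: /" before "Allow: /public" makes /public unreachable, which is the kind of file the note says shows up in practice. With checkWildcard set to false, a false result can only come from a record that names the agent explicitly.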