- schedule(WebURL) - Method in class edu.uci.ics.crawler4j.frontier.Frontier
-
- scheduleAll(List<WebURL>) - Method in class edu.uci.ics.crawler4j.frontier.Frontier
-
- SCHEDULED_PAGES - Static variable in class edu.uci.ics.crawler4j.frontier.Counters.ReservedCounterNames
-
- scheduledPages - Variable in class edu.uci.ics.crawler4j.frontier.Frontier
-
- setAnchor(String) - Method in class edu.uci.ics.crawler4j.parser.ExtractedUrlAnchorPair
-
- setAnchor(String) - Method in class edu.uci.ics.crawler4j.url.WebURL
-
- setAuthenticationType(AuthInfo.AuthenticationType) - Method in class edu.uci.ics.crawler4j.crawler.authentication.AuthInfo
-
- setAuthInfos(List<AuthInfo>) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setBinaryContent(byte[]) - Method in class edu.uci.ics.crawler4j.parser.BinaryParseData
-
- setCacheSize(int) - Method in class edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig
-
- setCleanupDelaySeconds(int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setConnectionTimeout(int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setContentCharset(String) - Method in class edu.uci.ics.crawler4j.crawler.Page
-
- setContentData(byte[]) - Method in class edu.uci.ics.crawler4j.crawler.Page
-
- setContentEncoding(String) - Method in class edu.uci.ics.crawler4j.crawler.Page
-
- setContentType(String) - Method in class edu.uci.ics.crawler4j.crawler.Page
-
- setCrawlStorageFolder(String) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
The folder used by the crawler for storing intermediate
crawl data.
- setCustomData(Object) - Method in class edu.uci.ics.crawler4j.crawler.CrawlController
-
- setDefaultHeaders(Collection<? extends Header>) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
Set the default header collection (creating copies of the provided headers).
- setDepth(short) - Method in class edu.uci.ics.crawler4j.url.WebURL
-
- setDocid(int) - Method in class edu.uci.ics.crawler4j.url.WebURL
-
- setDocIdServer(DocIDServer) - Method in class edu.uci.ics.crawler4j.crawler.CrawlController
-
- setDomain(String) - Method in class edu.uci.ics.crawler4j.crawler.authentication.NtAuthInfo
-
- setEnabled(boolean) - Method in class edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig
-
- setEntity(HttpEntity) - Method in class edu.uci.ics.crawler4j.fetcher.PageFetchResult
-
- setFetchedUrl(String) - Method in class edu.uci.ics.crawler4j.fetcher.PageFetchResult
-
- setFetchResponseHeaders(Header[]) - Method in class edu.uci.ics.crawler4j.crawler.Page
-
- setFollowRedirects(boolean) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setFrontier(Frontier) - Method in class edu.uci.ics.crawler4j.crawler.CrawlController
-
- setHost(String) - Method in class edu.uci.ics.crawler4j.crawler.authentication.AuthInfo
-
- setHref(String) - Method in class edu.uci.ics.crawler4j.parser.ExtractedUrlAnchorPair
-
- setHtml(String) - Method in class edu.uci.ics.crawler4j.parser.BinaryParseData
-
- setHtml(String) - Method in class edu.uci.ics.crawler4j.parser.HtmlParseData
-
- setHttpMethod(FormSubmitEvent.MethodType) - Method in class edu.uci.ics.crawler4j.crawler.authentication.AuthInfo
-
- setIgnoreUADiscrimination(boolean) - Method in class edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig
-
- setIncludeBinaryContentInCrawling(boolean) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setIncludeHttpsPages(boolean) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setLanguage(String) - Method in class edu.uci.ics.crawler4j.crawler.Page
-
- setLoginTarget(String) - Method in class edu.uci.ics.crawler4j.crawler.authentication.AuthInfo
-
- setMaxConnectionsPerHost(int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setMaxDepthOfCrawling(int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
Maximum depth of crawling. For unlimited depth, set this parameter to -1.
- setMaxDownloadSize(int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setMaxOutgoingLinksToFollow(int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setMaxPagesToFetch(int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
Maximum number of pages to fetch. For an unlimited number of pages, set this parameter
to -1.
- setMaxTotalConnections(int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setMetaTags(Map<String, String>) - Method in class edu.uci.ics.crawler4j.parser.HtmlParseData
-
- setMovedToUrl(String) - Method in class edu.uci.ics.crawler4j.fetcher.PageFetchResult
-
- setOnlineTldListUpdate(boolean) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
Should the TLD list be updated automatically on each run? Alternatively,
it can be loaded from the embedded tld-names.txt resource file that was
obtained from https://publicsuffix.org/list/effective_tld_names.dat
- setOutgoingUrls(Set<WebURL>) - Method in class edu.uci.ics.crawler4j.parser.BinaryParseData
-
- setOutgoingUrls(Set<WebURL>) - Method in class edu.uci.ics.crawler4j.parser.HtmlParseData
-
- setOutgoingUrls(Set<WebURL>) - Method in interface edu.uci.ics.crawler4j.parser.ParseData
-
- setOutgoingUrls(Set<WebURL>) - Method in class edu.uci.ics.crawler4j.parser.TextParseData
-
- setPageFetcher(PageFetcher) - Method in class edu.uci.ics.crawler4j.crawler.CrawlController
-
- setParentDocid(int) - Method in class edu.uci.ics.crawler4j.url.WebURL
-
- setParentUrl(String) - Method in class edu.uci.ics.crawler4j.url.WebURL
-
- setParseData(ParseData) - Method in class edu.uci.ics.crawler4j.crawler.Page
-
- setPassword(String) - Method in class edu.uci.ics.crawler4j.crawler.authentication.AuthInfo
-
- setPasswordFormStr(String) - Method in class edu.uci.ics.crawler4j.crawler.authentication.FormAuthInfo
-
- setPath(String) - Method in class edu.uci.ics.crawler4j.url.WebURL
-
- setPolitenessDelay(int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
Politeness delay in milliseconds (delay between sending two requests to
the same host).
- setPort(int) - Method in class edu.uci.ics.crawler4j.crawler.authentication.AuthInfo
-
- setPriority(byte) - Method in class edu.uci.ics.crawler4j.url.WebURL
-
- setProcessBinaryContentInCrawling(boolean) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
Should binary content (images, audio, etc.) be processed?
- setProcessed(WebURL) - Method in class edu.uci.ics.crawler4j.frontier.Frontier
-
- setProtocol(String) - Method in class edu.uci.ics.crawler4j.crawler.authentication.AuthInfo
-
- setProxyHost(String) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setProxyPassword(String) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
If the crawler runs behind a proxy that requires a username and password for
authentication, use this parameter to specify the password.
- setProxyPort(int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setProxyUsername(String) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setRedirect(boolean) - Method in class edu.uci.ics.crawler4j.crawler.Page
-
- setRedirectedToUrl(String) - Method in class edu.uci.ics.crawler4j.crawler.Page
-
- setResponseHeaders(Header[]) - Method in class edu.uci.ics.crawler4j.fetcher.PageFetchResult
-
- setResumableCrawling(boolean) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
If this feature is enabled, a previously stopped or crashed crawl
can be resumed.
- setRobotstxtServer(RobotstxtServer) - Method in class edu.uci.ics.crawler4j.crawler.CrawlController
-
- setShutdownOnEmptyQueue(boolean) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
Should the crawler stop running when the queue is empty?
- setSocketTimeout(int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setStatusCode(int) - Method in class edu.uci.ics.crawler4j.crawler.Page
-
- setStatusCode(int) - Method in class edu.uci.ics.crawler4j.fetcher.PageFetchResult
-
- setTag(String) - Method in class edu.uci.ics.crawler4j.parser.ExtractedUrlAnchorPair
-
- setTag(String) - Method in class edu.uci.ics.crawler4j.url.WebURL
-
- setText(String) - Method in class edu.uci.ics.crawler4j.parser.HtmlParseData
-
- setTextContent(String) - Method in class edu.uci.ics.crawler4j.parser.TextParseData
-
- setThread(Thread) - Method in class edu.uci.ics.crawler4j.crawler.WebCrawler
-
- setThreadMonitoringDelaySeconds(int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setThreadShutdownDelaySeconds(int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
- setTitle(String) - Method in class edu.uci.ics.crawler4j.parser.HtmlParseData
-
- setURL(String) - Method in class edu.uci.ics.crawler4j.url.WebURL
-
- setUseOnline(boolean) - Static method in class edu.uci.ics.crawler4j.url.TLDList
-
If online is set to true, the TLD list is downloaded and refreshed;
otherwise, the copy cached in src/main/resources/tld-names.txt is used.
- setUserAgent(String) - Method in class edu.uci.ics.crawler4j.robotstxt.HostDirectives
-
Change the user agent string used to crawl after initialization.
- setUserAgentName(String) - Method in class edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig
-
- setUserAgentString(String) - Method in class edu.uci.ics.crawler4j.crawler.CrawlConfig
-
User-agent string used to identify your crawler to web
servers.
- setUsername(String) - Method in class edu.uci.ics.crawler4j.crawler.authentication.AuthInfo
-
- setUsernameFormStr(String) - Method in class edu.uci.ics.crawler4j.crawler.authentication.FormAuthInfo
-
- setValue(String, long) - Method in class edu.uci.ics.crawler4j.frontier.Counters
-
- setWebURL(WebURL) - Method in class edu.uci.ics.crawler4j.crawler.Page
-
- shouldFollowLinksIn(WebURL) - Method in class edu.uci.ics.crawler4j.crawler.WebCrawler
-
Determine whether links found at the given URL should be added to the queue for crawling.
- shouldVisit(Page, WebURL) - Method in class edu.uci.ics.crawler4j.crawler.WebCrawler
-
Classes that extend WebCrawler should override this method to tell the
crawler whether the given URL should be crawled.
- shutdown() - Method in class edu.uci.ics.crawler4j.crawler.CrawlController
-
Set the current crawling session to 'shutdown'.
- shutdown() - Method in class edu.uci.ics.crawler4j.fetcher.IdleConnectionMonitorThread
-
- shutDown() - Method in class edu.uci.ics.crawler4j.fetcher.PageFetcher
-
- shuttingDown - Variable in class edu.uci.ics.crawler4j.crawler.CrawlController
-
Whether the crawling session is set to 'shutdown'.
- sleep(int) - Static method in class edu.uci.ics.crawler4j.crawler.CrawlController
-
- SniPoolingHttpClientConnectionManager - Class in edu.uci.ics.crawler4j.fetcher
-
Class to work around the exception thrown by the SSL subsystem when the server is incorrectly
configured for SNI.
- SniPoolingHttpClientConnectionManager(Registry<ConnectionSocketFactory>) - Constructor for class edu.uci.ics.crawler4j.fetcher.SniPoolingHttpClientConnectionManager
-
- SniSSLConnectionSocketFactory - Class in edu.uci.ics.crawler4j.fetcher
-
Class to work around the exception thrown by the SSL subsystem when the server is incorrectly
configured for SNI.
- SniSSLConnectionSocketFactory(SSLContext, HostnameVerifier) - Constructor for class edu.uci.ics.crawler4j.fetcher.SniSSLConnectionSocketFactory
-
- start(Class<T>, int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlController
-
Start the crawling session and wait for it to finish.
- start(CrawlController.WebCrawlerFactory<T>, int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlController
-
Start the crawling session and wait for it to finish.
- start(CrawlController.WebCrawlerFactory<T>, int, boolean) - Method in class edu.uci.ics.crawler4j.crawler.CrawlController
-
- startElement(String, String, String, Attributes) - Method in class edu.uci.ics.crawler4j.parser.HtmlContentHandler
-
- startNonBlocking(CrawlController.WebCrawlerFactory<T>, int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlController
-
Start the crawling session and return immediately.
- startNonBlocking(Class<T>, int) - Method in class edu.uci.ics.crawler4j.crawler.CrawlController
-
Start the crawling session and return immediately.
- statisticsDB - Variable in class edu.uci.ics.crawler4j.frontier.Counters
-
- statusCode - Variable in class edu.uci.ics.crawler4j.crawler.Page
-
HTTP status code of the page.
- statusCode - Variable in class edu.uci.ics.crawler4j.fetcher.PageFetchResult
-
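The "-1 means unlimited" convention described for setMaxDepthOfCrawling(int) and setMaxPagesToFetch(int) can be illustrated with a small stand-alone check. This is a hypothetical sketch, not crawler4j code: the class CrawlLimits and its methods are invented for illustration, and it assumes seed URLs have depth 0 and that a URL is allowed when its depth does not exceed the configured maximum.

```java
/**
 * Hypothetical helper (not part of crawler4j) illustrating the
 * "-1 means unlimited" convention for crawl depth and page count.
 */
final class CrawlLimits {
    static final int UNLIMITED = -1;

    private CrawlLimits() {}

    /** Assumption: seeds are depth 0; a URL is allowed if its depth <= the maximum. */
    static boolean depthAllowed(int depth, int maxDepthOfCrawling) {
        return maxDepthOfCrawling == UNLIMITED || depth <= maxDepthOfCrawling;
    }

    /** True if another page may be fetched after fetchedSoFar pages have been fetched. */
    static boolean moreFetchesAllowed(int fetchedSoFar, int maxPagesToFetch) {
        return maxPagesToFetch == UNLIMITED || fetchedSoFar < maxPagesToFetch;
    }
}
```

Treating -1 as a sentinel keeps the setter signatures as plain ints; the trade-off is that every check must test for the sentinel before comparing.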
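The setPolitenessDelay(int) entry describes a delay, in milliseconds, between two requests to the same host. A minimal per-host sketch of that idea follows. It is hypothetical: PolitenessTracker and reserve are invented names, and crawler4j's own fetcher may enforce the delay differently (for example, globally rather than per host).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-host politeness tracker (not crawler4j's implementation).
 * Computes how long a caller must wait before its next request to a host.
 */
final class PolitenessTracker {
    private final long delayMillis;
    // Host -> time (ms) at which the most recent request to it was scheduled.
    private final Map<String, Long> lastRequestAt = new HashMap<>();

    PolitenessTracker(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /**
     * Returns the number of milliseconds to wait before contacting host at
     * time nowMillis, and records the request as sent at the end of that wait.
     */
    synchronized long reserve(String host, long nowMillis) {
        Long last = lastRequestAt.get(host);
        long sendAt = (last == null) ? nowMillis : Math.max(nowMillis, last + delayMillis);
        lastRequestAt.put(host, sendAt);
        return sendAt - nowMillis;
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside reserve, keeps the logic deterministic and easy to test; a real caller would pass System.currentTimeMillis() and then sleep for the returned duration.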
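The shouldVisit(Page, WebURL) entry says subclasses of WebCrawler decide which URLs get crawled. The sketch below shows the kind of filtering logic such an override might contain, rewritten to work on plain URL strings so it runs without crawler4j on the classpath. VisitFilter, the allowed-prefix rule and the extension list are all assumptions chosen for illustration.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical stand-alone version of shouldVisit-style filtering
 * (not crawler4j code): stay on one site and skip static/binary resources.
 */
final class VisitFilter {
    // Assumed list of file extensions to skip.
    private static final Pattern FILTERS =
            Pattern.compile(".*\\.(css|js|gif|jpe?g|png|mp3|mp4|zip|gz|pdf)$");

    private final String allowedPrefix;

    VisitFilter(String allowedPrefix) {
        this.allowedPrefix = allowedPrefix.toLowerCase(Locale.ROOT);
    }

    /** True if url starts with the allowed prefix and does not end in a filtered extension. */
    boolean shouldVisit(String url) {
        String href = url.toLowerCase(Locale.ROOT);
        return href.startsWith(allowedPrefix) && !FILTERS.matcher(href).matches();
    }
}
```

Lower-casing before matching makes the extension check case-insensitive (".PNG" is filtered too). Note that a URL with a query string after the extension, such as "photo.jpg?x=1", would not be filtered by this pattern.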
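Three entries above, shutdown(), the shuttingDown variable and setShutdownOnEmptyQueue(boolean), describe how a crawl stops. The following hypothetical worker loop (not crawler4j's implementation; QueueWorker and its methods are invented) combines a volatile "shutting down" flag with an option to stop once the queue is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical worker illustrating a shutdown flag and a
 * "stop when the queue is empty" option (not crawler4j code).
 */
final class QueueWorker {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final boolean shutdownOnEmptyQueue;
    // volatile so a shutdown() call from another thread is seen by run().
    private volatile boolean shuttingDown = false;

    QueueWorker(boolean shutdownOnEmptyQueue) {
        this.shutdownOnEmptyQueue = shutdownOnEmptyQueue;
    }

    void schedule(String url) {
        queue.add(url);
    }

    /** Marks the session as shutting down; run() exits at its next check. */
    void shutdown() {
        shuttingDown = true;
    }

    /** Processes URLs until shut down, or until the queue is empty if so configured. */
    List<String> run() {
        List<String> processed = new ArrayList<>();
        while (!shuttingDown) {
            String url;
            try {
                url = queue.poll(50, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and stop
                break;
            }
            if (url == null) {
                if (shutdownOnEmptyQueue) {
                    break; // queue drained: stop, as the option describes
                }
                continue; // otherwise keep waiting for newly scheduled URLs
            }
            processed.add(url); // stand-in for fetching and parsing the page
        }
        return processed;
    }
}
```

Polling with a timeout, rather than blocking indefinitely on take(), is what lets the loop notice both an empty queue and a shutdown request without being interrupted.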
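The start(...) entries are described as starting the crawl and waiting for it to finish, while startNonBlocking(...) starts it and returns immediately. A hypothetical sketch of that difference with plain threads follows; MiniController, waitUntilFinish and the Runnable task are assumptions for illustration, not crawler4j's signatures.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical controller contrasting a blocking and a non-blocking start
 * (not crawler4j's implementation).
 */
final class MiniController {
    private final List<Thread> threads = new ArrayList<>();

    /** Launches numberOfCrawlers threads running task and returns immediately. */
    synchronized void startNonBlocking(Runnable task, int numberOfCrawlers) {
        for (int i = 1; i <= numberOfCrawlers; i++) {
            Thread t = new Thread(task, "Crawler " + i);
            threads.add(t);
            t.start();
        }
    }

    /** Launches the threads and waits for all of them to finish. */
    void start(Runnable task, int numberOfCrawlers) {
        startNonBlocking(task, numberOfCrawlers);
        waitUntilFinish();
    }

    /** Blocks until every launched thread has terminated. */
    void waitUntilFinish() {
        List<Thread> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(threads);
        }
        for (Thread t : snapshot) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return;
            }
        }
    }
}
```

With this split, the blocking variant is simply the non-blocking one followed by a join, so callers that need to do other work during the crawl can call startNonBlocking and wait later.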
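The setOnlineTldListUpdate(boolean) and setUseOnline(boolean) entries choose between downloading the TLD list and using the bundled tld-names.txt copy. The sketch below is hypothetical (TldLoader and IoSource are invented names). It parses text in the public suffix list style, where lines starting with "//" are comments, and it assumes that a failed download falls back to the cached copy; the entries above do not state that fallback.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical source of TLD-list bytes, e.g. an HTTP download or a classpath resource. */
@FunctionalInterface
interface IoSource {
    InputStream open() throws IOException;
}

/** Hypothetical TLD-list loader (not crawler4j's TLDList). */
final class TldLoader {
    private TldLoader() {}

    /** One entry per line; blank lines and lines starting with "//" are skipped. */
    static Set<String> parse(InputStream in) throws IOException {
        Set<String> tlds = new HashSet<>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("//")) {
                    continue;
                }
                tlds.add(line);
            }
        }
        return tlds;
    }

    /** Uses the online source when useOnline is true, otherwise the cached copy. */
    static Set<String> load(boolean useOnline, IoSource online, IoSource cached) {
        if (useOnline) {
            try {
                return parse(online.open());
            } catch (IOException e) {
                // Assumption: fall back to the cached copy if the download fails.
            }
        }
        try {
            return parse(cached.open());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Abstracting the two sources behind IoSource keeps the selection logic testable without network access.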