eu.dicodeproject.analysis.examples
Class UnquotedArchiveToSequenceFile

java.lang.Object
  extended by eu.dicodeproject.analysis.examples.MailArchiveToSequenceFile
      extended by eu.dicodeproject.analysis.examples.UnquotedArchiveToSequenceFile
All Implemented Interfaces:
org.apache.hadoop.fs.PathFilter

public class UnquotedArchiveToSequenceFile
extends MailArchiveToSequenceFile

Converts mbox archives to sequence files, ignoring all quoted content to avoid text duplication.
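
This page does not say how quoted content is detected. As a rough illustration only, the sketch below assumes the common e-mail convention that quoted lines begin with ">" and drops them from a message body. The class QuoteStripSketch and the method stripQuoted are hypothetical names made up for this example; they are not part of this class or of MailArchiveToSequenceFile.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the documented API. */
final class QuoteStripSketch {

    private QuoteStripSketch() {}

    /**
     * Returns the body with every line removed whose first non-blank
     * character is '>'. Assumption: quoting follows the usual "> " prefix
     * convention. Note that mbox-escaped ">From " lines are dropped as well.
     */
    static String stripQuoted(String body) {
        List<String> kept = new ArrayList<>();
        // A limit of -1 keeps trailing empty lines so the layout is preserved.
        for (String line : body.split("\r?\n", -1)) {
            if (!line.trim().startsWith(">")) {
                kept.add(line);
            }
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String mail = "Thanks for the patch.\n"
                + "> Does this fix the bug?\n"
                + ">> original report\n"
                + "Yes, it does.";
        // Prints only the two unquoted lines.
        System.out.println(stripQuoted(mail));
    }
}
```

Dropping quoted lines this way means a reply contributes only its new text, so a long thread does not repeat the same paragraphs in every stored message.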


Constructor Summary
UnquotedArchiveToSequenceFile(org.apache.hadoop.conf.Configuration conf, String prefix, org.apache.mahout.text.ChunkedWriter writer, Charset charset)
           
 
Method Summary
 
Methods inherited from class eu.dicodeproject.analysis.examples.MailArchiveToSequenceFile
accept
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UnquotedArchiveToSequenceFile

public UnquotedArchiveToSequenceFile(org.apache.hadoop.conf.Configuration conf,
                                     String prefix,
                                     org.apache.mahout.text.ChunkedWriter writer,
                                     Charset charset)
                              throws IOException
Throws:
IOException


Copyright © 2011. All Rights Reserved.