Class Utf8Validator


  • public class Utf8Validator
    extends java.lang.Object
    Incremental UTF-8 validator. The validator runs in constant memory, keeping only minimal state between calls. Its purpose is to validate UTF-8, not to decode it. Decoding could easily be added, but we rely on Java's built-in facilities for that.

    Implements the algorithm "Flexible and Economical UTF-8 Decoder" by Bjoern Hoehrmann (http://bjoern.hoehrmann.de/utf-8/decoder/dfa/).
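
The description above leaves decoding to Java's built-in facilities. As a minimal sketch of what that might look like, the following uses the standard java.nio.charset.CharsetDecoder, configured to report malformed input rather than silently replace it. The class name StrictUtf8Decode and the method decodeStrict are hypothetical names chosen for illustration; they are not part of this library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of this library.
public class StrictUtf8Decode {

    // Decodes a complete byte array as UTF-8.
    // Throws CharacterCodingException if the input is not well-formed.
    public static String decodeStrict(byte[] data) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data))
                .toString();
    }

    public static void main(String[] args) {
        try {
            byte[] ok = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
            System.out.println(decodeStrict(ok).length()); // prints 5

            byte[] truncated = { (byte) 0xC3 }; // lead byte without its continuation byte
            decodeStrict(truncated);
        } catch (CharacterCodingException e) {
            System.out.println("malformed input"); // reached for the truncated array
        }
    }
}
```

Unlike the incremental validator, this one-shot decode expects the whole message in a single buffer, so a multi-byte sequence split across chunks would be reported as malformed.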

    • Constructor Summary

      Constructors 
      Constructor Description
      Utf8Validator()
      Create new incremental UTF-8 validator.
    • Method Summary

      Modifier and Type Method Description
      boolean isValid()
      Check whether the input validated so far ends on a complete encoded Unicode code point.
      int position()
      Get the end position of the validated portion of the stream.
      void reset()
      Reset validator state to begin validation of new UTF-8 stream.
      boolean validate(byte[] data)
      Validate a chunk of octets for UTF-8.
      boolean validate(byte[] data, int off, int len)
      Validate a chunk of octets for UTF-8.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • Utf8Validator

        public Utf8Validator()
        Create new incremental UTF-8 validator. The validator starts in the reset state and is immediately usable.
    • Method Detail

      • reset

        public void reset()
        Reset validator state to begin validation of new UTF-8 stream.
      • position

        public int position()
        Get the end position of the validated portion of the stream. When validate() returns false, indicating a UTF-8 error, this method can be used to get the exact position within the stream at which the violation was encountered.
        Returns:
        Current position within the validated stream.
      • isValid

        public boolean isValid()
        Check whether the input validated so far ends on a complete encoded Unicode code point.
        Returns:
        true if and only if the input so far ends on a complete code point.
      • validate

        public boolean validate(byte[] data,
                                int off,
                                int len)
        Validate a chunk of octets for UTF-8.
        Parameters:
        data - Buffer containing the chunk to validate.
        off - Offset within the buffer at which to continue validation.
        len - Number of octets to validate within the buffer.
        Returns:
        false as soon as a UTF-8 violation occurs, true otherwise.
      • validate

        public boolean validate(byte[] data)
        Validate a chunk of octets for UTF-8.
        Parameters:
        data - Buffer containing the chunk to validate.
        Returns:
        false as soon as a UTF-8 violation occurs, true otherwise.
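
The methods documented above are meant to be called chunk by chunk, with isValid() distinguishing "no error so far" from "ended on a complete code point". The sketch below illustrates that contract with a self-contained stand-in class. IncrementalUtf8Sketch is a hypothetical name, and its internals are a hand-written range-tracking state machine, not the table-driven DFA this class implements. It also assumes that position() reports the zero-based stream index of the offending octet; the library's exact convention may differ.

```java
// Hypothetical stand-in that mirrors the documented method names.
// It is NOT the library's implementation and does not use the Hoehrmann DFA.
public class IncrementalUtf8Sketch {
    private int remaining;   // continuation octets still expected
    private int lo, hi;      // allowed range for the next continuation octet
    private int pos;         // octets accepted so far (assumed meaning of position())
    private boolean failed;  // set once a violation has been seen

    public IncrementalUtf8Sketch() { reset(); }

    public void reset() {
        remaining = 0; lo = 0x80; hi = 0xBF; pos = 0; failed = false;
    }

    public int position() { return pos; }

    public boolean isValid() { return !failed && remaining == 0; }

    public boolean validate(byte[] data) { return validate(data, 0, data.length); }

    public boolean validate(byte[] data, int off, int len) {
        if (failed) return false;
        for (int i = off; i < off + len; i++) {
            int b = data[i] & 0xFF;
            if (remaining > 0) {
                // Inside a multi-octet sequence: octet must be in the allowed range.
                if (b < lo || b > hi) { failed = true; return false; }
                remaining--; lo = 0x80; hi = 0xBF;
            } else if (b <= 0x7F) {
                // ASCII, nothing to track.
            } else if (b >= 0xC2 && b <= 0xDF) {
                remaining = 1;
            } else if (b >= 0xE0 && b <= 0xEF) {
                remaining = 2;
                if (b == 0xE0) lo = 0xA0;  // exclude overlong forms
                if (b == 0xED) hi = 0x9F;  // exclude surrogates
            } else if (b >= 0xF0 && b <= 0xF4) {
                remaining = 3;
                if (b == 0xF0) lo = 0x90;  // exclude overlong forms
                if (b == 0xF4) hi = 0x8F;  // exclude values above U+10FFFF
            } else {
                failed = true; return false; // 0x80..0xC1 or 0xF5..0xFF as lead octet
            }
            pos++;
        }
        return true;
    }

    public static void main(String[] args) {
        IncrementalUtf8Sketch v = new IncrementalUtf8Sketch();
        // U+20AC (E2 82 AC) split across two chunks.
        byte[] first = { 'a', (byte) 0xE2, (byte) 0x82 };
        byte[] second = { (byte) 0xAC, 'b' };
        System.out.println(v.validate(first));  // true: no violation yet
        System.out.println(v.isValid());        // false: stopped mid code point
        System.out.println(v.validate(second)); // true
        System.out.println(v.isValid());        // true: ended on a code point

        v.reset();
        byte[] bad = { 'o', 'k', (byte) 0xC0, (byte) 0xAF };
        System.out.println(v.validate(bad));    // false: 0xC0 is never a valid lead octet
        System.out.println(v.position());       // 2
    }
}
```

A caller handling a stream would typically call validate() per chunk and stop on the first false, then check isValid() once the final chunk has arrived to reject a stream that ends in the middle of a code point.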