intarsys PDF library API

de.intarsys.pdf.parser
Class PDFParser

java.lang.Object
  extended by de.intarsys.pdf.parser.PDFParser
Direct Known Subclasses:
COSDocumentParser, CSContentParser

public abstract class PDFParser
extends Object

An abstract superclass for the two flavours of PDF parser, COSDocumentParser and CSContentParser.


Field Summary
static String C_WARN_ARRAYSIZE
           
static String C_WARN_ENDOBJ_MISSING
           
static String C_WARN_ENDSTREAMCORRUPT
           
static String C_WARN_ENDSTREAMEOL
           
static String C_WARN_ILLEGALHEX
           
static String C_WARN_NAMETOLONG
           
static String C_WARN_SINGLEEOL
           
static String C_WARN_SINGLEEOL_OBJ
           
static String C_WARN_SINGLESPACE
           
static String C_WARN_SINGLESPACE_OBJ
           
static String C_WARN_STREAMEOL
           
static String C_WARN_STREAMEXTERNAL
           
static String C_WARN_STREAMLENGTH
           
static String C_WARN_STRINGTOLONG
           
static String C_WARN_UNEVENHEX
           
static char CHAR_BS
           
static char CHAR_CR
           
static char CHAR_FF
           
static char CHAR_HT
           
static char CHAR_LF
           
static byte[] TOKEN_endobj
           
static byte[] TOKEN_endstream
           
static byte[] TOKEN_EOF
           
static byte[] TOKEN_false
           
static byte[] TOKEN_FDFHEADER
           
static byte[] TOKEN_ndstream
           
static byte[] TOKEN_null
           
static byte[] TOKEN_obj
           
static byte[] TOKEN_PDFHEADER
           
static byte[] TOKEN_R
           
static byte[] TOKEN_s_tream
           
static byte[] TOKEN_startxref
           
static byte[] TOKEN_stream
           
static byte[] TOKEN_trailer
           
static byte[] TOKEN_true
           
static byte[] TOKEN_xref
           
 
Constructor Summary
PDFParser()
           
 
Method Summary
 IPDFParserExceptionHandler getExceptionHandler()
           
 void handleError(COSLoadError error)
          Handle an error if an exceptionHandler is set.
 void handleWarning(COSLoadWarning warning)
          Handle a warning if an exceptionHandler is set.
static boolean isDelimiter(int i)
          Return true if i is a PDF delimiter character.
static boolean isDigit(int i)
          Return true if i is a valid digit.
static boolean isEOL(int i)
          Return true if i is a valid line terminator.
static boolean isNumberStart(int i)
          Return true if i is a valid first character for a number token.
static boolean isOctalDigit(int i)
          Return true if i is a valid octal digit.
static boolean isTokenStart(int i)
          Return true if i is a valid string token start.
static boolean isWhitespace(int i)
          Return true if i is a valid white-space character.
 Object parseElement(IRandomAccess input)
          Parse a basic element starting at the current stream position.
 STDocType parseHeader(IRandomAccess input)
          Parse the PDF header. See PDF Reference v1.4, chapter 3.4.1 Header. COSHeader ::= "%PDF-" version
 int readInteger(IRandomAccess input, boolean consumeSpaceAfter)
          Read the next integer on input. Consumes one trailing space if consumeSpaceAfter is true.
 void readSpaces(IRandomAccess input)
          Read all characters until EOF or a non-space character appears. The first non-space character is pushed back, so it is the next character read.
 byte[] readToken(IRandomAccess input)
          Read a single token.
 byte[] readToken(IRandomAccess input, List messages)
          Variant of readToken that populates the messages list with non-fatal error messages.
 void setExceptionHandler(IPDFParserExceptionHandler exceptionHandler)
           
static COSObject toCOSObject(byte[] data)
          parse the given byte array to a valid COSObject.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CHAR_CR

public static char CHAR_CR

CHAR_LF

public static char CHAR_LF

CHAR_HT

public static char CHAR_HT

CHAR_BS

public static char CHAR_BS

CHAR_FF

public static char CHAR_FF

TOKEN_PDFHEADER

public static final byte[] TOKEN_PDFHEADER

TOKEN_FDFHEADER

public static final byte[] TOKEN_FDFHEADER

TOKEN_EOF

public static final byte[] TOKEN_EOF

TOKEN_obj

public static final byte[] TOKEN_obj

TOKEN_endobj

public static final byte[] TOKEN_endobj

TOKEN_false

public static final byte[] TOKEN_false

TOKEN_true

public static final byte[] TOKEN_true

TOKEN_null

public static final byte[] TOKEN_null

TOKEN_startxref

public static final byte[] TOKEN_startxref

TOKEN_trailer

public static final byte[] TOKEN_trailer

TOKEN_xref

public static final byte[] TOKEN_xref

TOKEN_stream

public static final byte[] TOKEN_stream

TOKEN_s_tream

public static final byte[] TOKEN_s_tream

TOKEN_endstream

public static final byte[] TOKEN_endstream

TOKEN_ndstream

public static final byte[] TOKEN_ndstream

TOKEN_R

public static final byte[] TOKEN_R

C_WARN_UNEVENHEX

public static final String C_WARN_UNEVENHEX
See Also:
Constant Field Values

C_WARN_ILLEGALHEX

public static final String C_WARN_ILLEGALHEX
See Also:
Constant Field Values

C_WARN_STRINGTOLONG

public static final String C_WARN_STRINGTOLONG
See Also:
Constant Field Values

C_WARN_NAMETOLONG

public static final String C_WARN_NAMETOLONG
See Also:
Constant Field Values

C_WARN_ARRAYSIZE

public static final String C_WARN_ARRAYSIZE
See Also:
Constant Field Values

C_WARN_SINGLESPACE

public static final String C_WARN_SINGLESPACE
See Also:
Constant Field Values

C_WARN_SINGLEEOL

public static final String C_WARN_SINGLEEOL
See Also:
Constant Field Values

C_WARN_STREAMEOL

public static final String C_WARN_STREAMEOL
See Also:
Constant Field Values

C_WARN_ENDSTREAMEOL

public static final String C_WARN_ENDSTREAMEOL
See Also:
Constant Field Values

C_WARN_ENDSTREAMCORRUPT

public static final String C_WARN_ENDSTREAMCORRUPT
See Also:
Constant Field Values

C_WARN_STREAMEXTERNAL

public static final String C_WARN_STREAMEXTERNAL
See Also:
Constant Field Values

C_WARN_STREAMLENGTH

public static final String C_WARN_STREAMLENGTH
See Also:
Constant Field Values

C_WARN_SINGLESPACE_OBJ

public static final String C_WARN_SINGLESPACE_OBJ
See Also:
Constant Field Values

C_WARN_SINGLEEOL_OBJ

public static final String C_WARN_SINGLEEOL_OBJ
See Also:
Constant Field Values

C_WARN_ENDOBJ_MISSING

public static final String C_WARN_ENDOBJ_MISSING
See Also:
Constant Field Values
Constructor Detail

PDFParser

public PDFParser()
Method Detail

isDelimiter

public static final boolean isDelimiter(int i)
Return true if i is a PDF delimiter character.

See the "delimiter characters" definition in the PDF specification.

Parameters:
i - the byte value to test
Returns:
true if i is a PDF delimiter char

isDigit

public static final boolean isDigit(int i)
Return true if i is a valid digit.

Parameters:
i - the byte value to test
Returns:
true if i is a valid digit

isEOL

public static final boolean isEOL(int i)
Return true if i is a valid line terminator.

Parameters:
i - the byte value to test
Returns:
true if i is a valid line terminator

isNumberStart

public static final boolean isNumberStart(int i)
Return true if i is a valid first character for a number token.

Parameters:
i - the byte value to test
Returns:
true if i is a valid first char for a number token
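
The check can be pictured with a minimal standalone sketch. It assumes the PDF syntax rule that a numeric token may begin with a digit, a sign, or a decimal point; the class name NumberStartSketch is hypothetical and the library's own test may differ.

```java
final class NumberStartSketch {
    // Assumption: a numeric token may begin with a digit, a sign or a decimal point.
    static boolean isNumberStart(int i) {
        return (i >= '0' && i <= '9') || i == '+' || i == '-' || i == '.';
    }

    public static void main(String[] args) {
        // Classify the first byte of a few sample tokens.
        for (String token : new String[] {"42", "-7", ".5", "/Name", "true"}) {
            System.out.println(token + " -> " + isNumberStart(token.charAt(0)));
        }
    }
}
```

The argument is an int rather than a byte so that an end-of-input value such as -1 can be passed in and simply yields false.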

isOctalDigit

public static final boolean isOctalDigit(int i)
Return true if i is a valid octal digit.

Parameters:
i - the byte value to test
Returns:
true if i is a valid octal digit
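
An octal-digit test is typically needed when decoding the \ddd escape of a PDF literal string, which carries one to three octal digits. The following is a hedged sketch of that use, not the library's code; OctalEscapeSketch and decodeOctal are hypothetical names.

```java
final class OctalEscapeSketch {
    static boolean isOctalDigit(int i) {
        return i >= '0' && i <= '7';
    }

    // Decodes up to three octal digits starting at offset, the position
    // just after the backslash. Bits above the low byte are dropped.
    static int decodeOctal(byte[] data, int offset) {
        int value = 0;
        int count = 0;
        while (count < 3
                && offset + count < data.length
                && isOctalDigit(data[offset + count])) {
            value = (value << 3) | (data[offset + count] - '0');
            count++;
        }
        return value & 0xFF;
    }

    public static void main(String[] args) {
        // The escape \053 denotes the '+' character (decimal 43).
        System.out.println(decodeOctal(new byte[] {'0', '5', '3'}, 0));
    }
}
```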

isTokenStart

public static final boolean isTokenStart(int i)
Return true if i is a valid string token start.

Parameters:
i - the byte value to test
Returns:
true if i is a valid string token start

isWhitespace

public static final boolean isWhitespace(int i)
Return true if i is a valid white-space character.

See the "white-space characters" definition in the PDF specification.

Parameters:
i - the byte value to test
Returns:
true if i is a valid whitespace
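
The character classes behind isWhitespace, isDelimiter and isEOL can be sketched from the PDF Reference's lexical conventions: the white-space characters are NUL, HT, LF, FF, CR and SP, and the delimiters are ( ) < > [ ] { } / and %. The class below is a standalone illustration under those assumptions; PdfCharClassSketch is a hypothetical name and the library may implement the checks differently.

```java
final class PdfCharClassSketch {
    // White-space characters: NUL, HT, LF, FF, CR, SP.
    static boolean isWhitespace(int i) {
        return i == 0 || i == 9 || i == 10 || i == 12 || i == 13 || i == 32;
    }

    // Delimiter characters: ( ) < > [ ] { } / %
    static boolean isDelimiter(int i) {
        switch (i) {
            case '(': case ')': case '<': case '>':
            case '[': case ']': case '{': case '}':
            case '/': case '%':
                return true;
            default:
                return false;
        }
    }

    // Line terminators: CR and LF (a CR LF pair counts as one end-of-line marker).
    static boolean isEOL(int i) {
        return i == '\r' || i == '\n';
    }

    public static void main(String[] args) {
        // Print the class of every byte in a small sample.
        for (char c : "[1 0 R]\n".toCharArray()) {
            String kind = isWhitespace(c) ? "whitespace"
                    : isDelimiter(c) ? "delimiter" : "regular";
            System.out.println((int) c + " " + kind);
        }
    }
}
```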

toCOSObject

public static COSObject toCOSObject(byte[] data)
                             throws IOException,
                                    COSLoadException
parse the given byte array to a valid COSObject.

Parameters:
data - a byte array containing COS encoded objects
Returns:
a COSObject
Throws:
IOException
COSLoadException

getExceptionHandler

public IPDFParserExceptionHandler getExceptionHandler()

handleError

public void handleError(COSLoadError error)
                 throws COSLoadException
Handle an error if an exceptionHandler is set.

Parameters:
error - the error to handle
Throws:
COSLoadException

handleWarning

public void handleWarning(COSLoadWarning warning)
                   throws COSLoadException
Handle a warning if an exceptionHandler is set.

Parameters:
warning - the warning to handle
Throws:
COSLoadException
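
The "if an exceptionHandler is set" wording describes an optional-delegate pattern. The sketch below illustrates that pattern only: every type in it is a hypothetical stand-in, messages are plain strings rather than COSLoadError or COSLoadWarning, and the behaviour without a handler (warnings dropped, errors thrown as an unchecked exception) is an assumption, since this page does not document it.

```java
import java.util.ArrayList;
import java.util.List;

final class HandlerSketch {
    // Hypothetical stand-in for a handler interface; these are not the library's types.
    interface Handler {
        void warning(String message);
        void error(String message);
    }

    private Handler handler; // optional, may stay null

    void setHandler(Handler handler) {
        this.handler = handler;
    }

    // Assumption: with no handler set, a warning is silently dropped.
    void handleWarning(String message) {
        if (handler != null) {
            handler.warning(message);
        }
    }

    // Assumption: with no handler set, an error aborts parsing.
    void handleError(String message) {
        if (handler == null) {
            throw new IllegalStateException(message);
        }
        handler.error(message);
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        HandlerSketch parser = new HandlerSketch();
        // A collecting handler lets the caller decide later what to do with problems.
        parser.setHandler(new Handler() {
            public void warning(String message) { seen.add("warning: " + message); }
            public void error(String message) { seen.add("error: " + message); }
        });
        parser.handleWarning("stream length mismatch");
        parser.handleError("endobj missing");
        System.out.println(seen);
    }
}
```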

parseElement

public Object parseElement(IRandomAccess input)
                    throws IOException,
                           COSLoadException
Parse a basic element starting at the current stream position.

See PDF Reference v1.4, chapter 3.2 Objects.

COSObject ::= COSToken | COSBoolean | COSString | COSNumber | COSName | COSNull | COSArray | COSDictionary | COSStream

Returns:
the object parsed
Throws:
IOException
COSLoadException
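
The grammar above can usually be resolved by looking at the first one or two bytes of an element. The sketch below shows such a dispatch as an illustration only; ElementKindSketch and kindAt are hypothetical names, the result is a label rather than a parsed object, and a real parser has to read the whole keyword (for example to tell "true" from "trailer") and check for a following "stream" keyword before treating a dictionary as a COSStream.

```java
final class ElementKindSketch {
    // Guesses the kind of element that starts at data[pos] from its leading bytes.
    static String kindAt(byte[] data, int pos) {
        int c = data[pos] & 0xFF;
        int next = pos + 1 < data.length ? data[pos + 1] & 0xFF : -1;
        switch (c) {
            case '(':
                return "COSString";                               // literal string
            case '<':
                return next == '<' ? "COSDictionary" : "COSString"; // "<<" or hex string
            case '/':
                return "COSName";
            case '[':
                return "COSArray";
            case 't':
            case 'f':
                return "COSBoolean";                              // assuming "true" / "false"
            case 'n':
                return "COSNull";                                 // assuming "null"
            default:
                if ((c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.') {
                    return "COSNumber";
                }
                return "COSToken";
        }
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"<< /Type /Page >>", "<48656C6C6F>", "[1 2 3]", "-3.5"}) {
            System.out.println(sample + " -> " + kindAt(sample.getBytes(), 0));
        }
    }
}
```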

parseHeader

public STDocType parseHeader(IRandomAccess input)
                      throws IOException,
                             COSLoadException
Parse the PDF header. See PDF Reference v1.4, chapter 3.4.1 Header.

COSHeader ::= "%PDF-" version

Throws:
IOException
COSLoadException
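
The header rule can be illustrated with a small standalone sketch that extracts the version text following "%PDF-". It works on a byte array instead of IRandomAccess, returns a String instead of STDocType, and ignores FDF headers; HeaderSketch and parseVersion are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class HeaderSketch {
    private static final String PREFIX = "%PDF-";

    // Returns the version text after "%PDF-" (for example "1.4"),
    // or null if the data does not start with that prefix.
    static String parseVersion(byte[] data) {
        // Only the first few bytes matter; ISO-8859-1 maps every byte to one char.
        String head = new String(data, 0, Math.min(data.length, 16), StandardCharsets.ISO_8859_1);
        if (!head.startsWith(PREFIX)) {
            return null;
        }
        int end = PREFIX.length();
        while (end < head.length()
                && ((head.charAt(end) >= '0' && head.charAt(end) <= '9') || head.charAt(end) == '.')) {
            end++;
        }
        return head.substring(PREFIX.length(), end);
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n1 0 obj".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseVersion(sample));
    }
}
```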

readInteger

public int readInteger(IRandomAccess input,
                       boolean consumeSpaceAfter)
                throws IOException
Read the next integer on input. Consumes leading spaces and comments, and consumes one trailing space if consumeSpaceAfter is true.

Parameters:
input - the input to read from
consumeSpaceAfter - if true, one trailing space after the integer is consumed
Returns:
The integer read.
Throws:
IOException
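
The described behaviour can be sketched on top of java.io.PushbackInputStream, used here as a stand-in for IRandomAccess. This is an illustration under assumptions, not the library's code: ReadIntegerSketch is a hypothetical class, an optional sign is accepted, "space" is taken to mean any PDF white-space character, and overflow and missing digits are not reported.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

final class ReadIntegerSketch {
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    static int readInteger(PushbackInputStream in, boolean consumeSpaceAfter) throws IOException {
        int c = in.read();
        // Skip leading white space and '%' comments (a comment runs to the end of the line).
        while (c != -1 && (isWhitespace(c) || c == '%')) {
            if (c == '%') {
                while (c != -1 && c != '\r' && c != '\n') {
                    c = in.read();
                }
            } else {
                c = in.read();
            }
        }
        boolean negative = false;
        if (c == '+' || c == '-') {
            negative = (c == '-');
            c = in.read();
        }
        int value = 0;
        while (c >= '0' && c <= '9') {
            value = value * 10 + (c - '0');
            c = in.read();
        }
        // c is the first byte after the digits: drop it only if it is the
        // single trailing space the caller asked to consume, else push it back.
        if (c != -1 && !(consumeSpaceAfter && isWhitespace(c))) {
            in.unread(c);
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "% object header\n12 0 obj".getBytes(StandardCharsets.US_ASCII);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data));
        int objectNumber = readInteger(in, true);
        int generation = readInteger(in, true);
        System.out.println(objectNumber + " " + generation);
    }
}
```

Consuming the trailing space is convenient when two integers follow each other, as in an object header, because the next read then starts directly on the following token.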

readSpaces

public void readSpaces(IRandomAccess input)
                throws IOException
Read all characters until EOF or a non-space character appears. The first non-space character is pushed back, so it is the next character read.

Throws:
IOException
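
The push-back behaviour can be sketched with java.io.PushbackInputStream in place of IRandomAccess. ReadSpacesSketch is a hypothetical class, and treating "space" as any PDF white-space character is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

final class ReadSpacesSketch {
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    // Skips white space; the first other byte is pushed back for the next read.
    static void readSpaces(PushbackInputStream in) throws IOException {
        int c = in.read();
        while (c != -1 && isWhitespace(c)) {
            c = in.read();
        }
        if (c != -1) {
            in.unread(c);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = " \t\r\nobj".getBytes(StandardCharsets.US_ASCII);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data));
        readSpaces(in);
        System.out.println((char) in.read()); // the 'o' of "obj"
    }
}
```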

readToken

public byte[] readToken(IRandomAccess input)
                 throws IOException
Read a single token.

Returns:
the array of characters belonging to the token
Throws:
IOException
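
As a rough illustration of token reading, the sketch below treats a token as a run of regular characters, ended by white space, a delimiter, or EOF. It uses java.io.PushbackInputStream in place of IRandomAccess; ReadTokenSketch is a hypothetical class, and the skipping of leading white space and the empty result when the input starts with a delimiter are assumptions of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

final class ReadTokenSketch {
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    static boolean isDelimiter(int c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }

    static byte[] readToken(PushbackInputStream in) throws IOException {
        int c = in.read();
        // Skip leading white space.
        while (c != -1 && isWhitespace(c)) {
            c = in.read();
        }
        // Collect regular characters.
        ByteArrayOutputStream token = new ByteArrayOutputStream();
        while (c != -1 && !isWhitespace(c) && !isDelimiter(c)) {
            token.write(c);
            c = in.read();
        }
        // The terminating byte belongs to whatever follows the token.
        if (c != -1) {
            in.unread(c);
        }
        return token.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "  endobj\nxref".getBytes(StandardCharsets.US_ASCII);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data));
        System.out.println(new String(readToken(in), StandardCharsets.US_ASCII));
        System.out.println(new String(readToken(in), StandardCharsets.US_ASCII));
    }
}
```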

readToken

public byte[] readToken(IRandomAccess input,
                        List messages)
                 throws IOException
Variant of readToken that populates the messages list with non-fatal error messages.

Parameters:
input - the input to read from
messages - the list that receives non-fatal error messages
Returns:
token bytes
Throws:
IOException

setExceptionHandler

public void setExceptionHandler(IPDFParserExceptionHandler exceptionHandler)

Copyright © 2006 intarsys consulting GmbH. All Rights Reserved.