tesseract  3.03
tesseract::TessResultRenderer Class Reference

#include <renderer.h>

Inherited by tesseract::TessBoxTextRenderer, tesseract::TessHOcrRenderer, tesseract::TessPDFRenderer, tesseract::TessTextRenderer, and tesseract::TessUnlvRenderer.


Public Member Functions

virtual ~TessResultRenderer ()
void insert (TessResultRenderer *next)
TessResultRenderer * next ()
bool BeginDocument (const char *title)
bool AddImage (TessBaseAPI *api)
bool AddError (TessBaseAPI *api)
bool EndDocument ()
const char * full_typename () const
const char * file_extension () const
const char * title () const
int imagenum () const
virtual bool GetOutput (const char **data, int *data_len) const

Protected Member Functions

 TessResultRenderer (const char *type, const char *extension)
virtual bool BeginDocumentHandler ()
virtual bool AddImageHandler (TessBaseAPI *api)=0
virtual bool AddErrorHandler (TessBaseAPI *api)
virtual bool EndDocumentHandler ()
void ResetData ()
void ReserveAdditionalData (int relative_len)
void AppendString (const char *s)
void AppendData (const char *s, int len)

Detailed Description

Interface for rendering Tesseract results into a document format such as text, hOCR, or PDF. This class is abstract; concrete subclasses handle the individual formats. A renderer is injected into Tesseract through this interface when images are processed.

To keep the implementation simple in Tesseract version 3.01, the renderer holds document state that is cleared from document to document, just as the TessBaseAPI state is. The base API can therefore delegate rendering to the injected renderers, and each renderer manages the state its format needs, along with the heuristics for producing that format.

Definition at line 45 of file renderer.h.
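
The split between public entry points and protected handlers can be illustrated with a small stand-alone sketch. Nothing below is Tesseract code: FakeApi, SketchRenderer, SketchTextRenderer, and RenderTwoPages are hypothetical stand-ins, and FakeApi carries only a page's text rather than the real TessBaseAPI state.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for TessBaseAPI: it only carries the text of one page.
struct FakeApi {
  std::string page_text;
};

// Simplified stand-in for the base class. The public method is non-virtual and
// forwards to a protected virtual handler that concrete formats implement.
class SketchRenderer {
 public:
  virtual ~SketchRenderer() {}
  bool AddImage(FakeApi* api) { return AddImageHandler(api); }
  const std::string& output() const { return output_; }

 protected:
  virtual bool AddImageHandler(FakeApi* api) = 0;
  void AppendString(const char* s) { output_ += s; }

 private:
  std::string output_;
};

// A concrete format: plain text, one page per line.
class SketchTextRenderer : public SketchRenderer {
 protected:
  virtual bool AddImageHandler(FakeApi* api) {
    AppendString(api->page_text.c_str());
    AppendString("\n");
    return true;
  }
};

// Renders two pages and returns the accumulated document.
std::string RenderTwoPages() {
  SketchTextRenderer renderer;
  FakeApi first = {"page one"};
  FakeApi second = {"page two"};
  bool ok = renderer.AddImage(&first);
  ok = renderer.AddImage(&second) && ok;
  assert(ok);
  return renderer.output();
}
```

In this arrangement the caller only sees AddImage, while each format supplies its own handler.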


Constructor & Destructor Documentation

tesseract::TessResultRenderer::~TessResultRenderer ( ) [virtual]

Definition at line 32 of file renderer.cpp.

                                        {
  delete[] output_data_;
  delete next_;
}
tesseract::TessResultRenderer::TessResultRenderer ( const char *  type,
const char *  extension 
) [protected]

Called by concrete subclasses to set the renderer's type name and file extension.

Definition at line 24 of file renderer.cpp.

    : full_typename_(type), file_extension_(extension),
      title_(""), imagenum_(-1),
      output_data_(NULL),
      next_(NULL) {
  ResetData();
}

Member Function Documentation

bool tesseract::TessResultRenderer::AddError ( TessBaseAPI *  api)

Called to inform the renderer when tesseract failed on an image.

Definition at line 71 of file renderer.cpp.

                                                  {
  ++imagenum_;
  bool ok = AddErrorHandler(api);
  if (next_) {
    ok = next_->AddError(api) && ok;
  }
  return ok;
}
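
The operand order in `next_->AddError(api) && ok` matters: because the call is the left operand of `&&`, the rest of the chain runs even when an earlier renderer has already failed. A minimal sketch of that behaviour, using a hypothetical Link type rather than the real renderer class:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical chain link that records whether it was visited.
struct Link {
  bool succeeds;  // result this link reports
  bool visited;   // set once Run() has been called on this link
  Link* next;

  Link(bool succeeds_in, Link* next_in)
      : succeeds(succeeds_in), visited(false), next(next_in) {}

  // Same shape as the listing above: run this link, then the rest of the chain.
  bool Run() {
    visited = true;
    bool ok = succeeds;
    if (next) {
      // The call is the left operand, so it is evaluated even when ok is false.
      ok = next->Run() && ok;
    }
    return ok;
  }
};

// Returns true if the tail still ran although the head reported failure.
bool TailRunsAfterHeadFails() {
  Link tail(true, NULL);
  Link head(false, &tail);
  bool ok = head.Run();
  assert(head.visited);
  return !ok && tail.visited;
}
```

Writing the expression as `ok && next->Run()` instead would skip the remaining links after the first failure.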
bool tesseract::TessResultRenderer::AddErrorHandler ( TessBaseAPI *  api) [protected, virtual]

Definition at line 130 of file renderer.cpp.

                                                         {
  return true;
}

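bool tesseract::TessResultRenderer::AddImage ( TessBaseAPI *  api)

Adds the recognized text from the source image to the current document. Invalid if BeginDocument has not yet been called.

Note that this API is somewhat unusual: it takes the whole TessBaseAPI rather than just the recognized text. It is designed to fit the current TessBaseAPI implementation, where the api object holds a lot of state information that a renderer might want to include.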

Definition at line 62 of file renderer.cpp.

                                                  {
  ++imagenum_;
  bool ok = AddImageHandler(api);
  if (next_) {
    ok = next_->AddImage(api) && ok;
  }
  return ok;
}
void tesseract::TessResultRenderer::AppendData ( const char *  s,
int  len 
) [protected]

Definition at line 120 of file renderer.cpp.

                                                          {
  ReserveAdditionalData(len);
  memcpy(output_data_ + output_len_, s, len);
  output_len_ += len;
}
void tesseract::TessResultRenderer::AppendString ( const char *  s) [protected]

Definition at line 116 of file renderer.cpp.

                                                   {
  AppendData(s, strlen(s));
}
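
Since AppendString measures its input with strlen, it stops at the first NUL byte; output that may contain NUL bytes would need AppendData with an explicit length. A sketch of the difference, using a hypothetical SketchBuffer backed by std::string instead of the renderer's own buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the renderer's output buffer.
struct SketchBuffer {
  std::string bytes;
  void AppendData(const char* s, int len) { bytes.append(s, len); }
  void AppendString(const char* s) {
    AppendData(s, static_cast<int>(std::strlen(s)));
  }
};

// Appends the same 5-byte payload both ways and reports the stored lengths.
void AppendBothWays(int* string_len, int* data_len) {
  assert(string_len != NULL && data_len != NULL);
  const char payload[] = {'a', 'b', '\0', 'c', 'd'};

  SketchBuffer as_string;
  as_string.AppendString(payload);  // strlen stops at the embedded NUL

  SketchBuffer as_data;
  as_data.AppendData(payload, static_cast<int>(sizeof(payload)));  // keeps all bytes

  *string_len = static_cast<int>(as_string.bytes.size());
  *data_len = static_cast<int>(as_data.bytes.size());
}
```

With this payload the string variant stores 2 bytes and the data variant stores 5.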
bool tesseract::TessResultRenderer::BeginDocument ( const char *  title)

Starts a new document with the given title. This clears the contents of the output data.

Definition at line 50 of file renderer.cpp.

                                                        {
  ResetData();

  title_ = title;
  imagenum_ = -1;
  bool ok = BeginDocumentHandler();
  if (next_) {
    ok = next_->BeginDocument(title) && ok;
  }
  return ok;
}

bool tesseract::TessResultRenderer::BeginDocumentHandler ( ) [protected, virtual]

Reimplemented in tesseract::TessPDFRenderer, and tesseract::TessHOcrRenderer.

Definition at line 126 of file renderer.cpp.

                                              {
  return true;
}

bool tesseract::TessResultRenderer::EndDocument ( )

Finishes the document and finalizes the output data. Invalid if BeginDocument has not yet been called.

Definition at line 80 of file renderer.cpp.

                                     {
  bool ok = EndDocumentHandler();
  if (next_) {
    ok = next_->EndDocument() && ok;
  }
  return ok;
}

bool tesseract::TessResultRenderer::EndDocumentHandler ( ) [protected, virtual]

Reimplemented in tesseract::TessPDFRenderer, and tesseract::TessHOcrRenderer.

Definition at line 134 of file renderer.cpp.

                                            {
  return true;
}
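
The default document handlers return true and emit nothing, so a format only reimplements them when it needs a preamble or a trailer. The sketch below illustrates that idea with hypothetical classes (LifecycleSketch, WrappedSketch) and an invented `<doc>` wrapper; it is not the output of any real Tesseract renderer.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the base class: the public calls drive the lifecycle
// and delegate to overridable handlers.
class LifecycleSketch {
 public:
  virtual ~LifecycleSketch() {}
  bool BeginDocument(const char* title) {
    output_.clear();  // a new document starts from empty output
    title_ = title;
    return BeginDocumentHandler();
  }
  bool AddPage(const std::string& text) {
    output_ += text;
    return true;
  }
  bool EndDocument() { return EndDocumentHandler(); }
  const std::string& output() const { return output_; }

 protected:
  virtual bool BeginDocumentHandler() { return true; }  // default: emit nothing
  virtual bool EndDocumentHandler() { return true; }    // default: emit nothing
  void AppendString(const std::string& s) { output_ += s; }
  const std::string& title() const { return title_; }

 private:
  std::string title_;
  std::string output_;
};

// Hypothetical format that wraps the pages in an opening and a closing tag.
class WrappedSketch : public LifecycleSketch {
 protected:
  virtual bool BeginDocumentHandler() {
    AppendString("<doc title=\"" + title() + "\">");
    return true;
  }
  virtual bool EndDocumentHandler() {
    AppendString("</doc>");
    return true;
  }
};

// Runs one complete document through the lifecycle and returns the output.
std::string RenderWrapped() {
  WrappedSketch renderer;
  bool ok = renderer.BeginDocument("demo");
  ok = renderer.AddPage("text") && ok;
  ok = renderer.EndDocument() && ok;
  assert(ok);
  return renderer.output();
}
```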
const char* tesseract::TessResultRenderer::file_extension ( ) const [inline]

Definition at line 85 of file renderer.h.

{ return file_extension_; }
const char* tesseract::TessResultRenderer::full_typename ( ) const [inline]

Definition at line 84 of file renderer.h.

{ return full_typename_; }
bool tesseract::TessResultRenderer::GetOutput ( const char **  data,
int *  data_len 
) const [virtual]

Returns the current output from the renderer. The results are undefined if EndDocument has not yet been called. The data is owned by the renderer and remains valid only until the next call into the renderer that may modify document state (such as BeginDocument, EndDocument, or AddImage).

Definition at line 88 of file renderer.cpp.

                                                                         {
  *data = output_data_;
  *data_len = output_len_;
  return true;
}
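
Because the returned pointer refers to memory owned by the renderer, a caller that needs the data later would typically copy it first. The following sketch assumes a hypothetical OutputSketch class with the same GetOutput shape. It copies by length, since the buffer is filled with memcpy and is not guaranteed to be NUL-terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in exposing the same GetOutput shape as the listing above.
class OutputSketch {
 public:
  void Append(const char* s, int len) {
    buffer_.insert(buffer_.end(), s, s + len);
  }
  void Reset() { buffer_.clear(); }
  bool GetOutput(const char** data, int* data_len) const {
    *data = buffer_.empty() ? NULL : &buffer_[0];
    *data_len = static_cast<int>(buffer_.size());
    return true;
  }

 private:
  std::vector<char> buffer_;
};

// Copies the renderer-owned bytes into a string the caller owns.
std::string CopyOutput(const OutputSketch& renderer) {
  const char* data = NULL;
  int len = 0;
  if (!renderer.GetOutput(&data, &len) || len == 0) return std::string();
  return std::string(data, len);  // length-based copy, no NUL terminator needed
}

// Shows that the copy stays usable after the renderer's state is cleared.
std::string CopySurvivesReset() {
  OutputSketch renderer;
  renderer.Append("hello", 5);
  std::string saved = CopyOutput(renderer);
  renderer.Reset();  // stands in for a later call that modifies document state
  assert(CopyOutput(renderer).empty());
  return saved;
}
```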

int tesseract::TessResultRenderer::imagenum ( ) const [inline]

Returns the index of the last image given to AddImage or AddError (the index is incremented whether or not the image succeeded).

This is always defined. Depending on the point in the document lifecycle, it is the index of the current image, of the last image that ended, or of the last image in the completed document. Returns -1 if a document was never started or no image has been added to the current document.

Definition at line 97 of file renderer.h.

{ return imagenum_; }
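
The way the index evolves can be traced with a small counter that mirrors how imagenum_ is updated in the listings on this page. SketchCounter and IndexAfterThreePages are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical counter mirroring the imagenum_ updates shown on this page.
struct SketchCounter {
  int imagenum;
  SketchCounter() : imagenum(-1) {}        // constructor starts at -1
  void BeginDocument() { imagenum = -1; }  // each document restarts the index
  void AddImage() { ++imagenum; }
  void AddError() { ++imagenum; }          // failed images are counted too
};

// Returns the index after two successful pages and one failed page.
int IndexAfterThreePages() {
  SketchCounter counter;
  assert(counter.imagenum == -1);  // no document started yet
  counter.BeginDocument();
  counter.AddImage();  // index 0
  counter.AddError();  // index 1
  counter.AddImage();  // index 2
  return counter.imagenum;
}
```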

void tesseract::TessResultRenderer::insert ( TessResultRenderer *  next)

Definition at line 37 of file renderer.cpp.

                                                        {
  if (next == NULL) return;

  TessResultRenderer* remainder = next_;
  next_ = next;
  if (remainder) {
    while (next->next_ != NULL) {
      next = next->next_;
    }
    next->next_ = remainder;
  }
}
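
The listing splices a whole chain in directly after the current renderer and re-attaches the previous remainder at the end of the inserted chain. A sketch of the resulting order, using a hypothetical Node type with the same logic. Unlike the real class, whose destructor deletes next_, these nodes live on the stack and own nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical list node with the same splice logic as the listing above.
struct Node {
  std::string name;
  Node* next_;
  explicit Node(const std::string& n) : name(n), next_(NULL) {}

  void insert(Node* next) {
    if (next == NULL) return;
    Node* remainder = next_;
    next_ = next;
    if (remainder) {
      // Walk to the end of the inserted chain, then hang the old tail there.
      while (next->next_ != NULL) {
        next = next->next_;
      }
      next->next_ = remainder;
    }
  }
};

// Joins the node names from head to tail, separated by spaces.
std::string ChainOrder(const Node* head) {
  std::string order;
  for (const Node* n = head; n != NULL; n = n->next_) {
    if (!order.empty()) order += " ";
    order += n->name;
  }
  return order;
}

// Builds A->B and C->D, then inserts the C chain after A.
std::string SpliceExample() {
  Node a("A"), b("B"), c("C"), d("D");
  a.insert(&b);  // A B
  c.insert(&d);  // C D
  a.insert(&c);  // C D goes right after A, and B moves to the end
  assert(d.next_ == &b);
  return ChainOrder(&a);
}
```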

TessResultRenderer* tesseract::TessResultRenderer::next ( ) [inline]

Definition at line 55 of file renderer.h.

{ return next_; }
void tesseract::TessResultRenderer::ReserveAdditionalData ( int  relative_len) [protected]

Definition at line 101 of file renderer.cpp.

                                                               {
  int total = relative_len + output_len_;
  if (total <= output_alloc_)
    return;

  if (total < 2 * output_alloc_) {
    total = 2 * output_alloc_;
  }

  char* new_data = new char[total];
  memcpy(new_data, output_data_, output_len_);
  delete[] output_data_;
  output_data_ = new_data;
  output_alloc_ = total;  // record the new capacity so later calls do not reallocate needlessly
}
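
The growth policy (allocate at least double the current capacity, or the required total if that is larger) can be tried in isolation. GrowBuffer below is a hypothetical stand-alone class, and its initial capacity of 4 bytes is chosen only to keep the example small; the value of kInitialAlloc in renderer.cpp is not shown on this page.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical growable byte buffer following the same reservation policy.
class GrowBuffer {
 public:
  GrowBuffer() : data_(new char[kInitialAlloc]), alloc_(kInitialAlloc), len_(0) {}
  ~GrowBuffer() { delete[] data_; }

  void Append(const char* s, int len) {
    Reserve(len);
    std::memcpy(data_ + len_, s, len);
    len_ += len;
  }
  int length() const { return len_; }
  int capacity() const { return alloc_; }
  const char* data() const { return data_; }

 private:
  enum { kInitialAlloc = 4 };  // assumed value, for illustration only

  void Reserve(int relative_len) {
    int total = relative_len + len_;
    if (total <= alloc_) return;                 // already enough room
    if (total < 2 * alloc_) total = 2 * alloc_;  // grow by at least 2x
    char* new_data = new char[total];
    std::memcpy(new_data, data_, len_);
    delete[] data_;
    data_ = new_data;
    alloc_ = total;  // remember the enlarged capacity
  }

  GrowBuffer(const GrowBuffer&);      // copying is disabled
  void operator=(const GrowBuffer&);  // assignment is disabled

  char* data_;
  int alloc_;
  int len_;
};

// Returns the capacity after three appends totalling 9 bytes.
int CapacityAfterThreeAppends() {
  GrowBuffer buffer;
  buffer.Append("abc", 3);    // 3 bytes fit in the initial 4
  buffer.Append("defgh", 5);  // needs 8 bytes, so capacity becomes 8
  buffer.Append("i", 1);      // needs 9 bytes, so capacity doubles to 16
  assert(buffer.length() == 9);
  return buffer.capacity();
}
```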

void tesseract::TessResultRenderer::ResetData ( ) [protected]

Definition at line 94 of file renderer.cpp.

                                   {
  delete[] output_data_;
  output_data_ = new char[kInitialAlloc];
  output_alloc_ = kInitialAlloc;
  output_len_ = 0;
}
const char* tesseract::TessResultRenderer::title ( ) const [inline]

Definition at line 86 of file renderer.h.

{ return title_; }

The documentation for this class was generated from the following files:
renderer.h
renderer.cpp