Click the File #1 button to choose one PDF file, then the File #2 button to choose another (ideally very similar) PDF file, and then click the Compare button to perform the comparison. When the comparison has finished, navigate through the pairs of differing pages using the View combobox or the Previous and Next buttons. Alternatively, drag two files (either separately or together) and drop them onto DiffPDF's view panels, then click the Compare button.
When the Compare button is pressed, DiffPDF does a high-speed scan of every pair of pages (about 100 pairs of pages per second on the author's machine). To make the scan as fast as possible, DiffPDF does only a very rough check of each pair of pages, so it may identify some false positives (i.e., page pairs that are really the same). False positives are quite rare, and there are no false negatives: differences are never missed.
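DiffPDF's actual rough check isn't described here, so the following is only a minimal Python sketch of the general idea under an assumed approach: an exact comparison of each pair of pages' raw extracted text. Any difference at all, even a single extra space, flags the pair, so real differences are never missed, but whitespace-only changes show up as false positives. The page texts are assumed to have been extracted already by some PDF library; rough_differs and candidate_pairs are hypothetical names.

```python
def rough_differs(text1: str, text2: str) -> bool:
    """Assumed conservative pre-check: exact comparison of raw page text.

    Identical raw text means the pair can be skipped. Any difference
    (even whitespace or layout) flags the pair for a closer look, so
    nothing is missed, but some flagged pairs may turn out to be the same.
    """
    return text1 != text2


def candidate_pairs(pages1: list[str], pages2: list[str]) -> list[int]:
    """Return 0-based indexes of page pairs flagged by the rough check."""
    # zip() stops at the shorter document, like the default page pairing.
    return [
        i
        for i, (a, b) in enumerate(zip(pages1, pages2))
        if rough_differs(a, b)
    ]
```

With ["Hello world", "Page 2"] and ["Hello  world", "Page 2"], only index 0 is flagged, and it is a false positive, since a whitespace-tolerant comparison would treat those two pages as the same.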
The default comparison mode is Words, which does a smart text comparison word by word for each pair of pages. This mode is fairly liberal regarding whitespace and tries to ignore layout changes (within a page) insofar as possible. It also treats all hyphens (soft hyphen, minus sign, etc.) the same, that is, as a plain hyphen. This mode is best for alphabetic languages like English.
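The following is a minimal Python sketch of a word-by-word comparison in the spirit of Words mode, not DiffPDF's own code. It assumes the page text is already extracted, folds a hypothetical set of hyphen-like characters (soft hyphen, Unicode hyphen, non-breaking hyphen, minus sign) to a plain hyphen, splits on any whitespace so spacing and line breaks don't matter, and then uses Python's difflib.SequenceMatcher as the sequence matcher. The names HYPHEN_TABLE, words, and word_diffs are illustrative.

```python
import difflib

# Hypothetical choice of hyphen-like characters folded to a plain "-".
HYPHEN_TABLE = str.maketrans({c: "-" for c in "\u00ad\u2010\u2011\u2212"})


def words(text: str) -> list[str]:
    # Fold hyphen variants, then split on any run of whitespace, so
    # extra spaces and different line breaks produce the same word list.
    return text.translate(HYPHEN_TABLE).split()


def word_diffs(text1: str, text2: str) -> list[tuple[str, list[str], list[str]]]:
    """Return (tag, old_words, new_words) for every non-equal run."""
    a, b = words(text1), words(text2)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

For example, word_diffs("the cat sat", "the dog sat") reports a single ("replace", ["cat"], ["dog"]) entry, while a minus sign versus a plain hyphen, or two spaces versus a newline, reports nothing.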
The Characters comparison mode does a smart text comparison character by character for each pair of pages. This mode is liberal regarding whitespace at the ends of lines and tries to ignore layout changes (within a page) insofar as possible. It also treats all hyphens (soft hyphen, minus sign, etc.) the same, that is, as a plain hyphen. This mode is best for logographic languages like Chinese and Japanese.
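As a minimal Python sketch of a character-level comparison along these lines (an assumption-based illustration, not DiffPDF's implementation), the code below folds hyphen variants, strips whitespace at the ends of each line, and joins the lines with no separator. That way a line break in the middle of a run of Chinese or Japanese text does not count as a difference, while whitespace inside a line still does. The hyphen set and the function names are hypothetical.

```python
import difflib

# Hypothetical choice of hyphen-like characters folded to a plain "-".
HYPHEN_TABLE = str.maketrans({c: "-" for c in "\u00ad\u2010\u2011\u2212"})


def normalize_chars(text: str) -> str:
    # Strip whitespace only at the ends of each line; interior whitespace
    # is kept and still compared character by character.
    lines = [line.strip() for line in text.translate(HYPHEN_TABLE).splitlines()]
    # Join without a separator so line-break placement does not matter.
    return "".join(lines)


def char_diffs(text1: str, text2: str) -> list[tuple[str, str, str]]:
    """Return (tag, old_chars, new_chars) for every non-equal run."""
    a, b = normalize_chars(text1), normalize_chars(text2)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Here char_diffs("日本語  \nテキスト", "日本語テキスト") reports no differences, whereas char_diffs("日本語", "日本人") reports a single ("replace", "語", "人") entry.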
The Appearance comparison mode can be used to detect changes in fonts, diagrams, or any other visual aspects. This mode is absolutely strict and compares each pair of pages pixel for pixel. By default this mode shows differences using highlighting, just as the Words and Characters modes do. However, it is also possible to compare using composition modes, which can be useful for detecting very small and subtle differences that aren't immediately apparent.
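To illustrate the two ideas (strict pixel-for-pixel comparison and a composition mode), here is a minimal Python sketch. It assumes each page has already been rendered to a same-sized grid of 8-bit grayscale values (the Image alias is a hypothetical stand-in for real rendered images). The "difference" composition shown, the absolute per-pixel difference, is one common composition mode; this is not a claim about which composition modes DiffPDF itself offers.

```python
Image = list[list[int]]  # rows of 8-bit grayscale pixel values (hypothetical)


def differing_pixels(img1: Image, img2: Image) -> list[tuple[int, int]]:
    """Strict comparison: (x, y) of every pixel whose value differs."""
    if len(img1) != len(img2) or any(
        len(r1) != len(r2) for r1, r2 in zip(img1, img2)
    ):
        raise ValueError("pages must be rendered at the same size")
    return [
        (x, y)
        for y, (r1, r2) in enumerate(zip(img1, img2))
        for x, (p1, p2) in enumerate(zip(r1, r2))
        if p1 != p2
    ]


def difference_composite(img1: Image, img2: Image) -> Image:
    """'Difference'-style composition: |a - b| per pixel.

    Identical areas become 0 (black), and any change, however small,
    yields a nonzero value that can then be brightened to make it visible.
    """
    return [[abs(p1 - p2) for p1, p2 in zip(r1, r2)] for r1, r2 in zip(img1, img2)]
```

A one-level change in a single pixel is practically invisible side by side, but it is reported by differing_pixels and stands out as the only nonzero value in the composite.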
Zoning is an experimental feature designed to produce more accurate results (i.e., fewer false positives). Its main use is for pages that have tables or that mix alphabetic and logographic text, since these can cause the underlying Poppler PDF library to return the page's words out of order. Warning: using zoning for large complex pages (bigger than A4, multiple columns, tables) in Characters mode can be very slow. (The current focus for the zoning code is functionality, not efficiency.) Furthermore, in some cases zoning can increase the number of false positives: this can occur because the zoning code reorders the text that is fed to the sequence matcher, and sometimes the reordering is wrong. Getting this right is non-trivial; changing the tolerances may help.
The Tolerance/R value is the maximum distance between text (i.e., word) rectangles for the rectangles to be placed in the same zone. Lower values create more zones; higher values create fewer zones. More zones are more expensive to compute but can produce more accurate results; fewer zones may reduce false positives. The Tolerance/Y value is used to round y coordinates to a multiple of this value. For example, if Tolerance/Y is 5 and a word at position (452,137) is followed by a superscript at (468,140), both will be treated as having a y coordinate of 140.
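The two tolerances can be illustrated with a minimal Python sketch. This is not DiffPDF's TextItems code. It assumes that the "distance between rectangles" is the gap between their nearest edges, and that zones are formed transitively (if A is close to B and B is close to C, all three share a zone), using a simple union-find. For Tolerance/Y it assumes rounding up to the next multiple, which is what makes 137 and 140 both become 140 in the example above. Rect, gap, zones, and round_y are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float


def gap(a: Rect, b: Rect) -> float:
    """Distance between the nearest edges of two rectangles (0 if they touch or overlap)."""
    dx = max(0.0, max(a.x, b.x) - min(a.x + a.w, b.x + b.w))
    dy = max(0.0, max(a.y, b.y) - min(a.y + a.h, b.y + b.h))
    return math.hypot(dx, dy)


def zones(rects: list[Rect], tolerance_r: float) -> list[list[int]]:
    """Group rectangle indexes; rects within tolerance_r (transitively) share a zone."""
    parent = list(range(len(rects)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # O(n^2) pairwise check: fine for a sketch, slow for very dense pages.
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if gap(rects[i], rects[j]) <= tolerance_r:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(rects)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def round_y(y: float, tolerance_y: int) -> int:
    """Snap y up to the next multiple of tolerance_y (assumed rounding rule)."""
    if tolerance_y <= 0:
        return round(y)
    return math.ceil(y / tolerance_y) * tolerance_y
```

With words at x = 0, 12, and 100 (each 10 units wide), a Tolerance/R of 5 puts the first two words in one zone and the third in its own zone, while a Tolerance/R of 1 gives three zones. This matches the rule that lower values create more zones.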
By default DiffPDF compares every pair of pages in the two PDFs (or, if the PDFs differ in length, as many pairs of pages as there are pages in the shorter PDF). It is also possible to compare particular pages or page ranges. For example, if there are two versions of a PDF file, one with 12 pages and the other with 13 pages because an extra page was added as page 4, they can be compared by specifying two page ranges: 1-12 for the first and 1-3, 5-13 for the second. This will make DiffPDF compare pages in the pairs (1, 1), (2, 2), (3, 3), (4, 5), (5, 6), and so on, up to (12, 13).
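The pairing in this example can be reproduced with a short Python sketch: expand each range specification into a list of page numbers, then pair them positionally. The accepted syntax here (comma-separated single pages or "start-end" ranges) is an assumption based only on the example above; DiffPDF's own parser may accept other forms. parse_ranges and page_pairs are hypothetical names.

```python
def parse_ranges(spec: str) -> list[int]:
    """Expand a spec such as '1-3, 5-13' into 1-based page numbers."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(n) for n in part.split("-", 1))
            pages.extend(range(start, end + 1))  # ranges are inclusive
        else:
            pages.append(int(part))
    return pages


def page_pairs(spec1: str, spec2: str) -> list[tuple[int, int]]:
    # Pair pages positionally; zip() stops at the shorter list.
    return list(zip(parse_ranges(spec1), parse_ranges(spec2)))
```

Here page_pairs("1-12", "1-3, 5-13") yields 12 pairs, starting (1, 1), (2, 2), (3, 3), (4, 5) and ending (12, 13), so the inserted page 4 of the second file is never compared.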
The Options dialog is invoked by clicking the Options button. It supports changing the highlighting color, whether to highlight using a pen, a fill, or both, and the fill's opacity. The Square Size is used when doing Appearance mode comparisons: the smaller the size, the more fine-grained the highlighting, and the slower it is to compute. The Rule width determines the thickness of the margin rules, which indicate the vertical position of differences; setting the Rule width to 0 switches the rules off.
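One way to see why a smaller Square Size gives finer but slower highlighting is the minimal Python sketch below. It assumes (this is not taken from DiffPDF's source) that each rendered page is divided into a grid of square cells, that each cell is compared as a block, and that every cell containing any differing pixel becomes one highlight rectangle. Halving the square size roughly quadruples the number of cells to check and rectangles to paint. The Image alias and differing_squares are hypothetical.

```python
Image = list[list[int]]  # rows of grayscale pixel values (hypothetical)


def differing_squares(
    img1: Image, img2: Image, square_size: int
) -> list[tuple[int, int, int, int]]:
    """Return (x, y, width, height) for each grid square whose pixels differ.

    Both images are assumed to have the same dimensions. Squares at the
    right and bottom edges are clipped to the image size.
    """
    if square_size < 1:
        raise ValueError("square_size must be at least 1")
    height = len(img1)
    width = len(img1[0]) if height else 0
    squares = []
    for top in range(0, height, square_size):
        for left in range(0, width, square_size):
            bottom = min(top + square_size, height)
            right = min(left + square_size, width)
            # Compare the block row by row; any mismatch marks the square.
            if any(
                img1[y][left:right] != img2[y][left:right]
                for y in range(top, bottom)
            ):
                squares.append((left, top, right - left, bottom - top))
    return squares
```

For a 4x4 page with one changed pixel at (3, 3), a square size of 4 highlights the whole page, a size of 2 highlights only the bottom-right 2x2 block, and a size of 1 highlights just that pixel.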
The Controls, Actions, and Log views are in dock widgets. These can be dragged into other dock areas (in which case they reshape themselves as necessary) or dragged to float free. The Log can also be closed; to reopen it, right-click a dock area and check the Log checkbox.
Although DiffPDF is a GUI program, if it is run from a console with two PDF files listed on the command line, it will start up and immediately compare them in Words mode. To compare them in Appearance mode instead, precede the file names with -a or --appearance; to compare them in Characters mode, precede the file names with -c or --character. Run DiffPDF with --help to see all the command line options.
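If you launch DiffPDF from a script, a small helper can build the argument list from the mode flags described above. This Python sketch uses only the documented long options (--appearance and --character) and assumes the executable is called diffpdf and is on the PATH. The helper name and the mode strings are hypothetical. The resulting list can be passed to subprocess.run, which will open the GUI with the comparison already performed.

```python
# Documented mode flags; Words mode is the default and needs no flag.
MODE_FLAGS = {
    "words": [],
    "appearance": ["--appearance"],
    "characters": ["--character"],
}


def diffpdf_command(pdf1: str, pdf2: str, mode: str = "words") -> list[str]:
    """Build a DiffPDF command line (assumes the executable is named 'diffpdf')."""
    try:
        flags = MODE_FLAGS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    # The mode flag precedes the two file names.
    return ["diffpdf", *flags, pdf1, pdf2]
```

Because DiffPDF still opens a window, this is only useful for interactive convenience, not for unattended checks.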
If you're specifically looking for a command line PDF comparison tool, e.g., for automated testing, try comparepdf.
There are also debugging options. Use --debug=2 or --debug=3 to write the texts, in the order they are fed to the sequence matcher, into temporary files (e.g., /tmp/page1.txt, etc.). The text reordering is done by the TextItems::columnZoneYxOrder() method in the textitem.cpp file; suggestions for improvement are welcome! (Note that when using --debug=3, coordinates are output in y, x order.)