
Reading order — what is and isn't preserved

"Order" sounds like one thing, but PDFs carry several orderings at once. When you combine two files, only one of them — the visual page order — is reliably preserved. The others either get patched together imperfectly or dropped outright.

Visual order: the /Pages array

The simplest and strongest notion of order in a PDF is the /Pages tree, referenced from the catalog. Each node holds a /Kids array of references to Page objects (or to further /Pages nodes when the tree is nested), and walking those arrays depth-first gives the display order. Page 1 is the first leaf; page 50 is the fiftieth.

Combining always preserves this order — it's literally the operation of concatenating the input arrays. The order you see in the combiner UI maps directly to slots in the new /Pages array.
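A minimal sketch of that concatenation, using plain Python dictionaries as a stand-in for PDF objects. The `flatten_pages`, `combine`, and `page` helpers are illustrative names, not CombinePDF's actual code. Because /Pages may be nested, each input is flattened depth-first before the arrays are joined:

```python
def flatten_pages(node):
    """Return the leaf Page objects under a /Pages node in display order."""
    if node["/Type"] == "/Page":
        return [node]
    pages = []
    for kid in node["/Kids"]:          # /Kids order defines display order
        pages.extend(flatten_pages(kid))
    return pages


def combine(*page_trees):
    """Build a new flat /Pages node from the inputs, in the order given."""
    kids = []
    for tree in page_trees:
        kids.extend(flatten_pages(tree))
    return {"/Type": "/Pages", "/Kids": kids, "/Count": len(kids)}


def page(label):
    return {"/Type": "/Page", "label": label}   # "label" exists only for this demo


# Input A is a nested tree; input B is flat.
doc_a = {"/Type": "/Pages", "/Kids": [
    {"/Type": "/Pages", "/Kids": [page("A1"), page("A2")]},
    page("A3"),
]}
doc_b = {"/Type": "/Pages", "/Kids": [page("B1"), page("B2")]}

merged = combine(doc_a, doc_b)
order = [p["label"] for p in merged["/Kids"]]
print(order)   # ['A1', 'A2', 'A3', 'B1', 'B2']
```

Swapping the arguments to `combine` is the only way to change the result, which is why page order is the one ordering a combine can guarantee.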

Content stream order — rarely matters

Inside each page, drawing operations run in stream order, and later operations paint over earlier ones. Putting "draw text X at Y" before "draw a rectangle" means the rectangle covers the text wherever they overlap. This is preserved trivially because each page object is copied verbatim during a combine.
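The painting rule can be illustrated with a toy character canvas. This is only a model of "later operations win", not how a PDF renderer works; `paint` and the tuple layout are made up for the example:

```python
def paint(ops, width=6, height=3):
    """Apply fill operations in order; each is (char, x0, y0, x1, y1)."""
    canvas = [["." for _ in range(width)] for _ in range(height)]
    for char, x0, y0, x1, y1 in ops:
        for y in range(y0, y1):
            for x in range(x0, x1):
                canvas[y][x] = char    # a later op overwrites an earlier one
    return ["".join(row) for row in canvas]


text = ("T", 1, 1, 5, 2)   # a one-row strip of "text"
rect = ("#", 2, 0, 4, 3)   # a tall rectangle crossing it

text_first = paint([text, rect])
rect_first = paint([rect, text])
print(text_first)   # ['..##..', '.T##T.', '..##..']
print(rect_first)   # ['..##..', '.TTTT.', '..##..']
```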

The exception is text extraction. Many extractors infer reading order from drawing order, which is unreliable: a column-oriented PDF often draws all left-column lines first, then all right-column lines, or interleaves them by line index. Combining doesn't change this; it inherits it.
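To make the difference concrete, here is a small sketch with six text fragments drawn line by line across two columns. The `Fragment` type, the coordinates, and the fixed `column_split` are assumptions for illustration; real extractors use far more involved layout analysis.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    x: float   # left edge, in points
    y: float   # baseline, measured up from the bottom of the page


# Drawing order as a producer might emit it: lines interleaved across columns.
drawn = [
    Fragment("L1", 72, 700), Fragment("R1", 320, 700),
    Fragment("L2", 72, 686), Fragment("R2", 320, 686),
    Fragment("L3", 72, 672), Fragment("R3", 320, 672),
]


def reading_order(fragments, column_split):
    """Order fragments column by column, top to bottom within each column."""
    def key(f):
        column = 0 if f.x < column_split else 1
        return (column, -f.y)          # larger y is higher on the page
    return sorted(fragments, key=key)


stream_text = [f.text for f in drawn]
column_text = [f.text for f in reading_order(drawn, column_split=300)]
print(stream_text)   # ['L1', 'R1', 'L2', 'R2', 'L3', 'R3']
print(column_text)   # ['L1', 'L2', 'L3', 'R1', 'R2', 'R3']
```

An extractor that trusts stream order returns the first list, both before and after a combine.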

Logical order: the structure tree (tagged PDFs)

Accessible PDFs (PDF/UA, PDF/A-2a) carry a StructTreeRoot at the catalog level: a hierarchical tree of <Document>, <Sect>, <P>, <H1> ... nodes that mirror semantic structure. Screen readers walk this tree, not the visual page order.

Naïve combining flattens both inputs' trees into a single stub and effectively destroys the tagged structure. Preserving structure across a combine requires:

Walking each input's structure tree in parallel with its /Pages.
Building a new root with <Part> sections for each input.
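Remapping the links between content and tree: each structure element's /Pg page reference, plus the page-level /StructParents keys and the root's /ParentTree that resolve MCIDs (marked-content IDs) back to tree nodes. MCIDs are scoped to a single content stream, so the IDs inside the streams only need rewriting if the streams themselves are merged.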
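The steps above can be sketched on a simplified model. In the PDF spec, a page's marked content is tied back to the tree through the page's /StructParents key, which indexes the root's /ParentTree, so this sketch renumbers those keys to keep inputs from colliding. The dictionary layout and `merge_structure` are hypothetical, and a real implementation would also have to handle role maps, ID trees, and annotation-level /StructParent entries.

```python
import copy


def merge_structure(inputs):
    """Combine per-input structure trees under one root.

    Each input is a dict with:
      "struct_root":  {"S": ..., "K": [child nodes]}
      "pages":        [{"/StructParents": int}, ...]
      "parent_tree":  {StructParents key: [names of owning elements]}
    """
    new_root = {"S": "Document", "K": []}
    new_pages, new_parent_tree = [], {}
    offset = 0
    for doc in inputs:
        doc = copy.deepcopy(doc)           # leave the caller's data untouched
        # One <Part> per input keeps each document's hierarchy intact.
        new_root["K"].append({"S": "Part", "K": doc["struct_root"]["K"]})
        # Shift every page's key so keys from different inputs cannot collide.
        for page in doc["pages"]:
            page["/StructParents"] += offset
            new_pages.append(page)
        for key, owners in doc["parent_tree"].items():
            new_parent_tree[key + offset] = owners
        offset += max(doc["parent_tree"], default=-1) + 1
    return new_root, new_pages, new_parent_tree


doc_a = {
    "struct_root": {"S": "Document", "K": [{"S": "H1", "K": []}, {"S": "P", "K": []}]},
    "pages": [{"/StructParents": 0}, {"/StructParents": 1}],
    "parent_tree": {0: ["H1"], 1: ["P"]},
}
doc_b = {
    "struct_root": {"S": "Document", "K": [{"S": "P", "K": []}]},
    "pages": [{"/StructParents": 0}],
    "parent_tree": {0: ["P"]},
}

root, pages, parent_tree = merge_structure([doc_a, doc_b])
print([part["S"] for part in root["K"]])        # ['Part', 'Part']
print([p["/StructParents"] for p in pages])     # [0, 1, 2]
print(sorted(parent_tree))                      # [0, 1, 2]
```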

Most combiners — including ours — drop the structure tree. If accessibility matters, tag the combined output with a separate tool, or accept that screen readers will fall back to drawing order.

Annotation order

A page can carry an /Annots array; annotations are drawn (and Tab-key navigated) in the order they appear in it. Combining preserves per-page annotation order. The /Tabs entry (in the PDF spec a key on each page dictionary rather than on the catalog), which controls whether annotation tab order follows row-by-row, column-by-column, or structure-tree order, is dropped.

Named destinations

A PDF can declare named destinations like { "Chapter1" → page 47 view } in the catalog's /Names tree. External hyperlinks (PDF-to-PDF references via the web) often resolve to named destinations rather than page numbers. After a combine, those names from input B point to pages whose object IDs were renumbered — without rewriting, they break.

If both inputs share a name (common: "TOC", "Cover"), at least one set has to be prefixed or dropped. Most tools silently drop them on collision.
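A sketch of the alternative to dropping: re-point each destination at its page's new position, and prefix a name when it collides. Destinations are modelled here as name-to-page-index maps, which is a simplification (real destinations reference page objects and a view), and `merge_named_destinations` and the prefix scheme are assumptions for illustration.

```python
def merge_named_destinations(inputs):
    """Merge per-input {name: page_index} maps into one.

    `inputs` is a list of (prefix, page_count, destinations) tuples in
    combine order. Page indices are 0-based within each input.
    """
    merged = {}
    offset = 0
    for prefix, page_count, dests in inputs:
        for name, page_index in dests.items():
            while name in merged:               # collision: prefix, don't drop
                name = f"{prefix}:{name}"
            merged[name] = page_index + offset  # re-point at the new position
        offset += page_count
    return merged


merged = merge_named_destinations([
    ("a", 50, {"TOC": 1, "Chapter1": 46}),
    ("b", 10, {"TOC": 0, "Appendix": 4}),
])
print(merged)
# {'TOC': 1, 'Chapter1': 46, 'b:TOC': 50, 'Appendix': 54}
```

Prefixing keeps both destinations reachable, but any link inside input B that used the old name "TOC" would have to be rewritten to "b:TOC" as well, and external links to the old name still resolve to input A's entry.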

Bookmarks (outline tree)

Outlines are a separate, optional ordered tree at the catalog level. They reference pages and named destinations and have their own ordering independent of /Pages. Combining outlines is a small project: see the dedicated post on bookmarks.

What this means in practice

For most combines — assembling chapters, joining a cover page to a body, stitching scanned pages with a typed appendix — only the visual page order matters, and CombinePDF preserves it precisely. If your inputs are tagged PDFs, accessible PDFs, or have an interlocking system of bookmarks and named destinations spanning files, plan a separate post-processing step to rebuild that metadata after combining.