Java Read Pdf Example

PdfDocument is the main class in PDFOne Java. It represents a PDF document and lets you create, read, and enhance PDF documents. It offers numerous methods for rendering PDF elements such as text, images, shapes, forms, watermarks, and annotations onto documents.

To read an existing PDF document with PDFOne, create a PdfDocument object and then call its load method with the file pathname or a memory stream containing the PDF file.

ZipInputStream is a Java class that implements an input stream filter for reading files in the ZIP file format. It supports both compressed and uncompressed entries.
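As a rough illustration of reading entries with ZipInputStream, the following sketch builds a tiny archive in memory and then walks through it entry by entry. The class name ZipReadSketch, the helper methods, and the entry names are made up for this example; only the java.util.zip classes come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipReadSketch {

    // Builds a small ZIP archive in memory so the example needs no files on disk.
    public static byte[] buildSampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write("Hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("world.txt"));
            zip.write("World".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Walks the archive entry by entry and returns "name=content" for each one.
    public static List<String> readEntries(byte[] zipBytes) throws IOException {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] buffer = new byte[1024];
                int read;
                // read() returns -1 at the end of the current entry, not of the whole archive
                while ((read = zip.read(buffer)) != -1) {
                    content.write(buffer, 0, read);
                }
                String text = new String(content.toByteArray(), StandardCharsets.UTF_8);
                result.add(entry.getName() + "=" + text);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readEntries(buildSampleZip()));
    }
}
```

In real use, the ByteArrayInputStream would typically be replaced by a FileInputStream pointing at an archive on disk.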

1. Introduction

In this quick article, we'll focus on creating a PDF document from scratch using two popular libraries, iText and PdfBox.

2. Maven Dependencies

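Let's take a look at the Maven dependencies that need to be included in our project: the iText artifact (com.itextpdf:itextpdf) and the PdfBox artifact (org.apache.pdfbox:pdfbox).

The latest versions of both libraries, iText and PdfBox, can be found on Maven Central.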

One extra dependency is needed if our file has to be encrypted. The Bouncy Castle Provider package contains implementations of cryptographic algorithms and is required by both libraries.

The latest version of the Bouncy Castle Provider can be found on Maven Central.

3. Overview

Both iText and PdfBox are Java libraries used for creating and manipulating PDF files. Although the final output of the two libraries is the same, they operate in slightly different ways. Let's take a look at them.

4. Create Pdf in iText

4.1. Insert Text in Pdf

Let's have a look at how a new file containing the text “Hello World” is created.

Creating a PDF with the iText library is based on manipulating objects that implement the Element interface and adding them to a Document (in version 5.5.10 there are 45 of those implementations).

The smallest element that can be added to the document is called a Chunk, which is basically a string with an applied font.

Additionally, Chunks can be combined with other elements such as Paragraph and Section, resulting in nice-looking documents.

4.2. Inserting Image

The iText library provides an easy way to add an image to the document. We simply need to create an Image instance and add it to the Document.

4.3. Inserting Table

We might face a problem when we want to add a table to our PDF. Luckily, iText provides this functionality out of the box.

First, we need to create a PdfPTable object and pass the number of columns for our table to its constructor.

Now we can simply add a new cell by calling the addCell method on the newly created table object. iText creates a table row only once all of its cells are defined. This means that if you create a table with 3 columns and add 8 cells to it, only 2 rows with 3 cells each will be displayed.
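The row-completion rule described above comes down to integer division. The following sketch is plain Java arithmetic, not iText code, and the class and method names are made up for illustration.

```java
public class TableRowMath {

    // Number of rows that are completely filled and would therefore be displayed.
    public static int completeRows(int cells, int columns) {
        return cells / columns;
    }

    // Number of cells left over in an unfinished last row.
    public static int leftoverCells(int cells, int columns) {
        return cells % columns;
    }

    public static void main(String[] args) {
        int cells = 8;
        int columns = 3;
        System.out.println("Complete rows: " + completeRows(cells, columns));
        System.out.println("Leftover cells: " + leftoverCells(cells, columns));
    }
}
```

With 8 cells and 3 columns this gives 2 complete rows and 2 leftover cells, which matches the behavior described above: the unfinished third row is not shown.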

Let's take a look at an example.

We create a new table with 3 columns and 3 rows. We treat the first row as a table header with a changed background color and border width.

The second row is composed of three cells containing just text, with no extra formatting.

Cells can contain not only text but also images. Additionally, each cell can be formatted individually; in this example, we apply horizontal and vertical alignment adjustments.

4.4. File Encryption

To apply permissions using the iText library, we need an already created PDF document. In our example, we will use the iTextHelloWorld.pdf file generated previously.

Once we load the file using PdfReader, we need to create a PdfStamper, which is used to apply additional content to the file, such as metadata and encryption.

In our example, we encrypt the file with two passwords: the user password (“userpass”), with which a user has read-only access and cannot print the document, and the owner password (“ownerpass”), which is used as a master key giving a person full access to the PDF.

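If we want to allow the user to print the PDF, instead of 0 (the third parameter of setEncryption) we can pass PdfWriter.ALLOW_PRINTING.

Of course, we can mix different permissions by combining constants with the bitwise OR operator, for example PdfWriter.ALLOW_PRINTING | PdfWriter.ALLOW_COPY.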
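Permission values of this kind are integer bit masks, so mixing them means combining flags with bitwise OR. The sketch below shows the idea in plain Java. The constant names and the class are hypothetical stand-ins, with values chosen to follow the permission bit positions of the PDF specification; real iText code should use the library's own constants, such as PdfWriter.ALLOW_PRINTING.

```java
public class PermissionFlagsSketch {

    // Hypothetical constants for illustration only; they are not iText's constants.
    public static final int ALLOW_PRINT = 1 << 2;   // permission bit 3
    public static final int ALLOW_MODIFY = 1 << 3;  // permission bit 4
    public static final int ALLOW_COPY = 1 << 4;    // permission bit 5

    // Returns true when every bit of the given flag is set in the permission mask.
    public static boolean isAllowed(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    public static void main(String[] args) {
        // Combine two permissions into one mask with bitwise OR.
        int permissions = ALLOW_PRINT | ALLOW_COPY;

        System.out.println("print:  " + isAllowed(permissions, ALLOW_PRINT));
        System.out.println("copy:   " + isAllowed(permissions, ALLOW_COPY));
        System.out.println("modify: " + isAllowed(permissions, ALLOW_MODIFY));
    }
}
```

A mask of 0, as in the earlier example, has no bits set, so none of these checks would pass.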

Keep in mind that when we use iText to set access permissions, we also create a temporary PDF, which should be deleted; otherwise, it remains fully accessible to anybody.

5. Create Pdf in PdfBox

5.1. Insert Text in Pdf

In contrast to iText, the PdfBox library provides an API based on stream manipulation. There are no classes like Chunk or Paragraph. The PDDocument class is an in-memory PDF representation, and the user writes data to it by manipulating the PDPageContentStream class.

Let's take a look at the code example:

5.2. Inserting Image

Inserting images is straightforward.

First, we need to load a file and create a PDImageXObject, then draw it on the document (we need to provide exact x, y coordinates).

5.3. Inserting a Table

Unfortunately, PdfBox does not provide any out-of-the-box methods for creating tables. What we can do in this situation is draw the table manually: literally, draw each line until the drawing resembles the table we want.
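To make "draw each line" more concrete, the sketch below computes the line segments of a simple grid in plain Java. It contains no PdfBox calls; the class name, method name, and parameters are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TableGridSketch {

    // Returns line segments {x1, y1, x2, y2} for a grid with the given rows and columns.
    // (x, y) is the top-left corner; y decreases downward because the PDF origin is bottom-left.
    public static List<float[]> gridLines(float x, float y, float width,
                                          float rowHeight, int rows, int cols) {
        List<float[]> lines = new ArrayList<>();
        float height = rowHeight * rows;
        float colWidth = width / cols;

        // Horizontal lines: one above each row plus one closing the bottom.
        for (int r = 0; r <= rows; r++) {
            float lineY = y - r * rowHeight;
            lines.add(new float[] {x, lineY, x + width, lineY});
        }
        // Vertical lines: one left of each column plus one closing the right side.
        for (int c = 0; c <= cols; c++) {
            float lineX = x + c * colWidth;
            lines.add(new float[] {lineX, y, lineX, y - height});
        }
        return lines;
    }

    public static void main(String[] args) {
        for (float[] l : gridLines(50f, 700f, 300f, 20f, 3, 3)) {
            System.out.printf("(%.0f, %.0f) -> (%.0f, %.0f)%n", l[0], l[1], l[2], l[3]);
        }
    }
}
```

With PdfBox, each segment could then be drawn on the page's PDPageContentStream by moving to the start point, adding a line to the end point, and stroking the path.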

5.4. File Encryption

The PdfBox library makes it possible to encrypt a file and adjust its permissions for the user. Compared to iText, it does not require an already existing file, as we simply use PDDocument. PDF file permissions are handled by the AccessPermission class, where we can set whether a user will be able to modify, extract content from, or print a file.

Subsequently, we create a StandardProtectionPolicy object, which adds password-based protection to the document. We can specify two types of password: the user password, with which a person can open the file with the access permissions applied, and the owner password, which places no limitations on the file.

In our example, if a user provides the user password, the file cannot be modified or printed.

6. Conclusions

In this tutorial, we discussed ways of creating a PDF file with two popular Java libraries.

Full examples can be found in the Maven based project over on GitHub.

In the previous chapter, we have seen how to add text to an existing PDF document. In this chapter, we will discuss how to read text from an existing PDF document.

Extracting Text from an Existing PDF Document

Extracting text is one of the main features of the PDFBox library. You can extract text using the getText() method of the PDFTextStripper class, which extracts all the text from the given PDF document.

Following are the steps to extract text from an existing PDF document.

Step 1: Loading an Existing PDF Document

Load an existing PDF document using the static method load() of the PDDocument class. This method accepts a File object as a parameter. Since it is a static method, you invoke it using the class name, for example PDDocument.load(file).

Step 2: Instantiate the PDFTextStripper Class

The PDFTextStripper class provides methods to retrieve text from a PDF document, so create an instance of this class using its no-argument constructor.

Step 3: Retrieving the Text

You can retrieve the contents of the PDF document using the getText() method of the PDFTextStripper class. Pass the document object to this method as a parameter; it retrieves the text in the given document and returns it as a String object.

Step 4: Closing the Document

Finally, close the document using the close() method of the PDDocument class.

Example

Suppose we have a PDF document with some text in it.

This example demonstrates how to read text from that PDF document. Here, we create a Java program that loads a PDF document named new.pdf, which is saved in the path C:/PdfBox_Examples/. Save this code in a file named ReadingText.java.
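A minimal sketch of such a program might look like the following. It assumes a PDFBox 2.x JAR on the classpath, since the static PDDocument.load() method described in Step 1 belongs to that line of releases. The extractText helper and the optional command-line argument are additions made for this illustration.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class ReadingText {

    // Loads the PDF, extracts all of its text, and closes the document again.
    public static String extractText(File file) throws IOException {
        // Step 1: load the document. try-with-resources also covers Step 4 (close).
        try (PDDocument document = PDDocument.load(file)) {
            // Step 2: instantiate the text stripper.
            PDFTextStripper stripper = new PDFTextStripper();
            // Step 3: retrieve the text of the whole document as a String.
            return stripper.getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        // The path comes from the example above; a different file may be passed as an argument.
        String path = args.length > 0 ? args[0] : "C:/PdfBox_Examples/new.pdf";
        System.out.println(extractText(new File(path)));
    }
}
```

Using try-with-resources rather than an explicit close() call means the document is released even if text extraction throws an exception.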

Compile and execute the saved Java file from the command prompt using the javac and java commands.

Upon execution, the program retrieves the text from the given PDF document and displays it.