Node.js Project to Extract Text From PDF File Using pdf-to-text Library in Javascript Full Tutorial For Beginners

Welcome folks! In this tutorial we will see how to extract text from a PDF file using JavaScript and Node.js. For this purpose we will use the pdf-to-text library. All the source code of the application is shown below.

Get Started

To get started, install the pdf-to-text library with npm:

npm i pdf-to-text
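As far as I can tell, pdf-to-text does not parse PDF files itself but calls the pdftotext and pdfinfo command-line tools, so the examples in this tutorial may fail if those tools are missing. The sketch below is my own addition and not part of the library: it only checks whether both tools can be started. The helper name isOnPath is hypothetical.

```javascript
// check-tools.js
// Assumption: pdf-to-text relies on the external `pdftotext` and `pdfinfo`
// binaries. isOnPath is a hypothetical helper, not part of the library.
const { spawnSync } = require('child_process');

// Returns true if the command could be started at all.
function isOnPath(command) {
  const result = spawnSync(command, ['-v'], { stdio: 'ignore' });
  // result.error is set (for example ENOENT) when the executable is not found.
  return !result.error;
}

for (const tool of ['pdftotext', 'pdfinfo']) {
  console.log(`${tool}: ${isOnPath(tool) ? 'found' : 'missing'}`);
}
```

If either tool is reported as missing, installing Poppler usually provides both (on Debian or Ubuntu the package is called poppler-utils).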

After installing the library, create an index.js file and copy the following code into it.

index.js

In the first example we will retrieve information about the PDF file that we pass to the library as input.

Getting Information About PDF File

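const pdfUtil = require('pdf-to-text');

// Path to the input PDF file
const pdfPath = "sample.pdf";

// Read the PDF metadata; the callback receives an error or an info object
pdfUtil.info(pdfPath, function (err, info) {
    if (err) throw err;
    console.log(info);
});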

Run your Node.js script with the command shown below:

node index.js

You will see that the callback receives an object containing information about the PDF file passed to the library function, such as the title and author of the PDF file, its size, and its time of creation.
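Since the exact fields depend on the PDF file, it can help to list whatever the info object contains instead of assuming specific keys. Here is a minimal sketch of that idea. formatInfo is a hypothetical helper of mine, and the sample object is made up purely for illustration.

```javascript
// formatInfo is a hypothetical helper, not part of pdf-to-text.
// It turns any metadata object into aligned "key : value" lines.
function formatInfo(info) {
  const keys = Object.keys(info);
  const width = Math.max(0, ...keys.map((key) => key.length));
  return keys
    .map((key) => `${key.padEnd(width)} : ${String(info[key])}`)
    .join('\n');
}

// Made-up sample data; the real keys come from your own PDF file.
console.log(formatInfo({ title: 'Sample', pages: 3 }));
```

Inside the info callback you could then call console.log(formatInfo(info)) instead of console.log(info).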

 

Next we will extract the text of the PDF file using the library method pdfToText(), which takes the path to the PDF file and passes the text of all its pages to a callback.

Extracting Text From PDF File

const pdfUtil = require('pdf-to-text');
const pdfPath = "sample.pdf";

// Extract the text of every page; the callback receives an error or the text
pdfUtil.pdfToText(pdfPath, function (err, data) {
    if (err) throw err;
    console.log(data); // print the extracted text
});
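If you prefer promises over callbacks, the pdfToText() call above can be wrapped as sketched here. This is not part of the library: extractText is a hypothetical wrapper, and the optional { from, to } page-range argument is how I remember the library's README describing it, so please verify it against the version you install. The library object is passed in as a parameter, and a small stand-in object is used so the snippet runs without a PDF file.

```javascript
// extractText is a hypothetical wrapper, not part of pdf-to-text.
// `pdfUtil` is passed in so the function works with
// require('pdf-to-text') or with a stand-in object.
function extractText(pdfUtil, pdfPath, options) {
  return new Promise((resolve, reject) => {
    const done = (err, data) => (err ? reject(err) : resolve(data));
    if (options) {
      // Assumed option shape: { from: <page>, to: <page> }
      pdfUtil.pdfToText(pdfPath, options, done);
    } else {
      pdfUtil.pdfToText(pdfPath, done);
    }
  });
}

// Stand-in for the real library so this snippet runs on its own.
const fakePdfUtil = {
  pdfToText(path, optionsOrCallback, maybeCallback) {
    const callback = maybeCallback || optionsOrCallback;
    callback(null, `text of ${path}`);
  },
};

extractText(fakePdfUtil, 'sample.pdf', { from: 0, to: 1 })
  .then((text) => console.log(text))
  .catch((err) => console.error('Extraction failed:', err.message));
```

With the real library you would pass require('pdf-to-text') as the first argument. Rejecting the promise instead of throwing inside the callback lets the caller decide how to handle a missing or unreadable file.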

 

 

Now if you run your Node.js script again, the text content of the PDF file is printed to the console.
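Printing to the console is fine for a quick check, but you will often want to keep the text. The sketch below shows one way to save it with Node's built-in fs module. saveText is a hypothetical helper of mine, and the string passed to it is a placeholder for the data value received in the pdfToText() callback.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// saveText is a hypothetical helper, not part of pdf-to-text.
// It trims trailing whitespace on each line and writes the text as UTF-8.
function saveText(text, outputPath) {
  const cleaned = text
    .split('\n')
    .map((line) => line.trimEnd())
    .join('\n');
  fs.writeFileSync(outputPath, cleaned, 'utf8');
  return cleaned.length;
}

// Placeholder string standing in for the `data` argument of the callback.
const outputPath = path.join(os.tmpdir(), 'sample.txt');
const written = saveText('Hello PDF   \nSecond line\n', outputPath);
console.log(`Wrote ${written} characters to ${outputPath}`);
```

In your own script you would call saveText(data, 'sample.txt') inside the callback, using whatever output path suits your project.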

 

 

 
