Python 3 (Pillow + Fitz + PyMuPDF) Example Script to Extract all Images From PDF Document Full Project For Beginners


 

Welcome folks! In this blog post we will extract all the images from a PDF document in Python, using the PyMuPDF library (imported as fitz) together with Pillow. The full source code of the application is given below.

 

 

 

Get Started

 

 

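To get started, install the required libraries with pip as shown below. PyMuPDF is the package that provides the fitz module, so there is no need to install fitz separately; the package named fitz on PyPI is an unrelated project and installing it can break import fitz.

 

pip install pillow

 

pip install PyMuPDF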
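
Before running anything, it can help to confirm that the modules are actually importable in the environment you are using. The snippet below is a minimal sketch using only the standard library; missing_modules is a helper name made up for this example and is not part of PyMuPDF or Pillow. Note that it only checks that a module called fitz can be found, not which package provides it.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    # find_spec() returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# "fitz" is the import name app.py uses for PyMuPDF; "PIL" is Pillow's import name.
missing = missing_modules(["fitz", "PIL"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If a name is reported as missing, rerun the matching pip install command above.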

 

After installing these libraries in your Python project, create an app.py file and copy the following code into it.

 

app.py

 

import fitz  # PyMuPDF
import io
from PIL import Image

# path of the PDF file you want to extract images from
file = "###inputfilepath##.pdf"
# open the file
pdf_file = fitz.open(file)
# iterate over PDF pages
for page_index in range(len(pdf_file)):
    # get the page itself
    page = pdf_file[page_index]
    # get_images() is the current name of the older getImageList()
    image_list = page.get_images()
    # print the number of images found on this page (1-based, matching the file names)
    if image_list:
        print(f"[+] Found a total of {len(image_list)} images on page {page_index + 1}")
    else:
        print("[!] No images found on page", page_index + 1)
    for image_index, img in enumerate(image_list, start=1):
        # get the XREF of the image
        xref = img[0]
        # extract the image bytes (extract_image() is the current name of extractImage())
        base_image = pdf_file.extract_image(xref)
        image_bytes = base_image["image"]
        # get the image extension
        image_ext = base_image["ext"]
        # load it into PIL
        image = Image.open(io.BytesIO(image_bytes))
        # save it to local disk; passing a path lets Pillow open and close the file itself
        image.save(f"image{page_index + 1}_{image_index}.{image_ext}")
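
One thing to watch for: the same image (for example a logo) can be referenced from several pages, in which case the loop above writes it to disk once per page. Since the script already reads the XREF from img[0], a small helper can skip XREFs that were seen before. This is only a sketch; unique_xrefs is a hypothetical helper name, and it assumes each image entry is a tuple whose first item is the XREF, as in the script.

```python
def unique_xrefs(pages_images):
    """Yield (page_number, xref) for the first page on which each xref appears.

    `pages_images` is an iterable with one image list per page; every image
    entry is a tuple whose first item is the image XREF.
    """
    seen = set()
    for page_number, image_list in enumerate(pages_images, start=1):
        for img in image_list:
            xref = img[0]
            if xref not in seen:
                seen.add(xref)
                yield page_number, xref


# Stand-in data: page 1 has XREFs 5 and 7, page 2 repeats 5, page 3 has 9.
sample = [[(5, "a"), (7, "b")], [(5, "a")], [(9, "c")]]
for page_number, xref in unique_xrefs(sample):
    print(f"page {page_number}: xref {xref}")
```

In the real script you would pass the per-page image lists instead of the stand-in data and extract each yielded XREF once.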

 


 

In the snippet above, replace the input path with the path of the PDF file from which you want to extract images.

Now run the script with the command below. It extracts every image embedded in the PDF document and saves each one in the current directory.

 

python app.py
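
Editing the script every time the input file changes gets tedious, so one option is to read the PDF path and an output folder from the command line. The sketch below uses only argparse and pathlib from the standard library. The function names parse_args and output_path and the -o/--out-dir option are choices made for this example, not something the original script defines.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the PDF path and an optional output directory."""
    parser = argparse.ArgumentParser(description="Extract images from a PDF.")
    parser.add_argument("pdf", type=Path, help="path to the input PDF")
    parser.add_argument(
        "-o", "--out-dir", type=Path, default=Path("."),
        help="directory to save images in (default: current directory)",
    )
    return parser.parse_args(argv)


def output_path(out_dir, page_number, image_index, ext):
    """Build the file name used by app.py, placed inside out_dir."""
    return Path(out_dir) / f"image{page_number}_{image_index}.{ext}"


# Example with an explicit argument list instead of the real command line.
args = parse_args(["mydoc.pdf", "-o", "images"])
print(args.pdf, output_path(args.out_dir, 1, 1, "png"))
```

With something like this wired into app.py, the script could be run as python app.py mydoc.pdf -o images, where mydoc.pdf and images are placeholder names. The output directory would need to exist, or be created first, before any image is saved into it.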

 

 

 

 

 
