Question
Tuesday, September 10, 2013 3:11 PM | 1 vote
Hi All,
I've been looking around online, and have seen various solutions for extracting / finding text in a PDF using PoSH. However, one thing I don't appear to have found yet is something that can identify attached images (JPG, BMP etc) in the PDF, and extract each to a file.
Can this be done? Having extracted the image file, I then want to compare it to a "library" file, to ensure that the content is as expected.
Am I on a mission impossible here? Or is it just a case of being close, and needing the cigar handed to me? :)
Cheers
Steve
All replies (4)
Tuesday, September 10, 2013 8:15 PM ✅ Answered | 3 votes
Hi Steve,
I'm not aware of a native way to do that through PowerShell but I have done it by using pdfimages.exe that is part of the free xpdf pdftools set from http://www.foolabs.com/xpdf/download.html (scroll down to the precompiled binaries section and download the x86 Windows version):
# Extracts images from a PDF file.
# With -j, DCT (JPEG) images are saved as .jpg; other images are written as PBM/PPM.
function Extract-PDFImages($pdfPath, $imgFolder, $imgPrefix) {
    # Create the output folder if it does not exist yet
    if (!(Test-Path $imgFolder)) {
        New-Item $imgFolder -ItemType Directory | Out-Null
    }
    # pdfimages uses this root (folder + prefix) to name each output file
    $root = "$imgFolder\$imgPrefix"
    & 'C:\misc\PDFTools\bin32\pdfimages.exe' -j "$pdfPath" "$root"
}

Extract-PDFImages "c:\My.pdf" "c:\users\Administrator\Desktop\test" "img"
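For the second half of the question (checking an extracted image against a "library" file), one possible approach is to compare file hashes. This is only a rough sketch: Test-FileMatch is a hypothetical helper name and the paths in the comment are placeholders, none of which come from the posts above. It assumes PowerShell 4.0 or later for Get-FileHash, and it only detects byte-for-byte matches, so an image that was re-encoded when the PDF was built would not match its original.

# Hypothetical helper: returns $true when two files are byte-for-byte
# identical, judged by their SHA-256 hashes (needs PowerShell 4.0+).
function Test-FileMatch($extractedPath, $referencePath) {
    $extracted = Get-FileHash -LiteralPath $extractedPath -Algorithm SHA256
    $reference = Get-FileHash -LiteralPath $referencePath -Algorithm SHA256
    return ($extracted.Hash -eq $reference.Hash)
}

# Example call (placeholder paths; use whatever file names pdfimages produced):
# Test-FileMatch "c:\users\Administrator\Desktop\test\<extracted file>" "c:\library\<expected file>"

If the images do not have to be identical down to the byte, a pixel-level or perceptual comparison would be needed instead, which this sketch does not attempt.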
Wednesday, September 11, 2013 10:26 AM
Hi,
I would like to check on the current status of this issue.
Please feel free to let us know if you need further assistance.
Regards.
Vivian Wang
TechNet Community Support
Monday, October 21, 2013 3:32 PM
Hi Mike / Vivian,
Apologies for not replying sooner - project demands meant I got sidetracked. As part of that, I've found a workaround that doesn't need PoSH (using SmartBear's TestComplete / JScript).
That said, this workaround is still "in development" - and I'm not totally sure it's the solution, or as good a solution as Mike's above.
If nothing else - Mike's suggestion of using pdfimages.exe may prove VERY useful - even if in JScript, not in PoSH.
Apologies again, and thank you both very much for your time and assistance!
Jack of few trades. Master of even fewer.
Wednesday, June 7, 2017 3:50 PM
Would this same solution work on Windows 10?