Sunday, September 20, 2009

Characters Calculation of PDF Document with PHP




In this tutorial, we will convert a PDF document to a text file through PHP
and then read its contents to count the number of characters in the file.



SOLUTION




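It is quite simple to count the characters of a PDF document. To accomplish this task, I will
use the pdftotext command-line tool on Linux.
Please download and install the pdftohtml package from http://sourceforge.net/projects/pdftohtml/
Note that the pdftotext binary used below is usually provided by the Xpdf or poppler-utils package, so check that /usr/bin/pdftotext exists after installing.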


Code to run the PDF conversion and the character count.

Linux command to convert the PDF to text format.

$command = '/usr/bin/pdftotext ' . escapeshellarg($file_path); // File path must be the absolute server path.

PHP
shell_exec($command);
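
The conversion step can also be wrapped in a small helper that reports failures. This is only a sketch: the function names build_pdftotext_command and convert_pdf_to_text are hypothetical, and it assumes pdftotext is installed at /usr/bin/pdftotext and that exec() is allowed on your server.

```php
<?php
// Hypothetical helper: builds the shell command for converting one PDF.
function build_pdftotext_command($pdf_path, $binary = '/usr/bin/pdftotext')
{
    // escapeshellarg() quotes the path so spaces and shell metacharacters are safe.
    return $binary . ' ' . escapeshellarg($pdf_path);
}

// Hypothetical helper: runs the conversion and returns the path of the text file.
function convert_pdf_to_text($pdf_path, $binary = '/usr/bin/pdftotext')
{
    $output = array();
    $status = 0;
    // exec() blocks until the command has finished and gives us its exit status.
    exec(build_pdftotext_command($pdf_path, $binary) . ' 2>&1', $output, $status);
    if ($status !== 0) {
        throw new RuntimeException('pdftotext failed: ' . implode("\n", $output));
    }
    // With no output file given, pdftotext writes FILE.txt next to FILE.pdf.
    return preg_replace('/\.pdf$/i', '.txt', $pdf_path);
}
```

Checking the exit status makes a missing binary or an unreadable PDF show up as an error instead of as an empty character count.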


Complete code to upload a file to the processed folder in your root directory.

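// $filen is the name of the file input field in the upload form.
$upload_name = basename($_FILES[$filen]['name']); // strip any directory components
$file_path = $_SERVER['DOCUMENT_ROOT'] . '/processed/' . $upload_name; // absolute server path

if (move_uploaded_file($_FILES[$filen]['tmp_name'], $file_path)) {

    // pdftotext writes its output next to the PDF, with a .txt extension.
    $text_path = preg_replace('/\.pdf$/i', '.txt', $file_path);

    // shell_exec() waits for the command to finish, so no sleep() is needed.
    $output = shell_exec('/usr/bin/pdftotext ' . escapeshellarg($file_path));

    // Read the generated text file from the processed folder.
    $contents = file_get_contents($text_path);

    // Count the characters, ignoring spaces.
    $file_count = strlen(str_replace(' ', '', $contents));

}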
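
The counting step on its own can be sketched as follows. The function names count_characters and count_words are hypothetical. Unlike the strlen call in the code above, this version ignores every whitespace character (including the newlines and page breaks that pdftotext emits) and counts UTF-8 characters, not bytes. It assumes the text file is valid UTF-8.

```php
<?php
// Hypothetical helper: counts the non-whitespace characters in a UTF-8 string.
function count_characters($text)
{
    // Each match is one non-whitespace character; the /u flag makes the
    // pattern work on UTF-8 characters instead of single bytes.
    $count = preg_match_all('/\S/u', $text);
    // preg_match_all() returns false on error, for example on invalid UTF-8.
    return $count === false ? 0 : $count;
}

// Hypothetical helper: counts the whitespace-separated words in a UTF-8 string.
function count_words($text)
{
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $words === false ? 0 : count($words);
}
```

For example, count_characters("héllo wörld\n") gives 10 and count_words("héllo wörld\n") gives 2.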

Troubleshooting

1. The shell_exec function will not execute if you don't have permission to run shell
commands or if PHP is running in safe mode.

2. pdftotext generates a text file with the same name, in the same directory where you
placed the PDF file. If the text file is not created in that directory even though your
program runs, look for it in the root directory. If you find it there, you have to correct
your file path.

3. The characters cannot be counted or the file cannot be uploaded. The web server must be
able to write to the processed folder; changing its permissions to 777 allows this.
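
The three points above can be checked from PHP before running the conversion. This is a rough sketch with a hypothetical function name, check_environment; the folder and binary paths you pass in are assumptions about your own server. Safe mode itself was removed in PHP 5.4.0, so on newer versions the disable_functions setting is the more likely cause of point 1.

```php
<?php
// Hypothetical helper: returns a list of problems that would stop the script.
function check_environment($upload_dir, $binary = '/usr/bin/pdftotext')
{
    $problems = array();

    // Point 1: shell_exec may be missing or listed in disable_functions.
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    if (!function_exists('shell_exec') || in_array('shell_exec', $disabled, true)) {
        $problems[] = 'shell_exec is not available';
    }

    // The converter itself must exist and be executable by the web server user.
    if (!is_executable($binary)) {
        $problems[] = 'Cannot execute ' . $binary;
    }

    // Points 2 and 3: the folder must exist and be writable.
    if (!is_dir($upload_dir) || !is_writable($upload_dir)) {
        $problems[] = 'Cannot write to ' . $upload_dir;
    }

    return $problems;
}
```

An empty array means none of these checks failed, for example when calling check_environment($_SERVER['DOCUMENT_ROOT'] . '/processed').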

If you have further questions about this post, kindly post your comments.


2 comments:

  1. Thanks. For converting PDF to HTML on Windows, I am using the anybizsoft PDF to HTML converter. It supports the conversion of encrypted PDFs.

  2. The code works on both Linux and Windows. It requires PHP. You can use this code on a proofreading website to count the number of characters or words and provide a quote instantly.


All rights are reserved. Nobleatom.com
Software Development Services.
Contact me: khubabmazhar596@hotmail.com
