Thursday, March 8, 2007

A Web Form Based PDF Merger

Task: Provide a web-based application that allows users to merge multiple PDF files. The application runs on the Grails framework.


Analysis: This problem can be split into three sub-tasks and solved in three steps: upload multiple PDF files, retrieve the submitted files, and merge them.

Step 1: Upload Multiple PDF Files


Uploading and processing multiple files from a web form is not a trivial task, because a file input element allows uploading only one file at a time. Inspired by StickBlog's excellent post, Upload multiple files with a single file element, I decided to use JavaScript instead of an applet to achieve this goal, and I modified StickBlog's code to suit my needs. The user interface, pdfmerger.gsp, has the following elements:


<g:form action="merge" method="post" enctype="multipart/form-data">

<input id='myfile' type='file' name='' onChange="addElement()"></input>
<input type="submit" value="Submit">

<br>Files list (Please note the maximum number of uploaded files is 5):
<!-- This is where the output will appear -->
<div id="filesList"></div>
</g:form>


The onChange event of the file input is captured. Each time a file is selected, the JavaScript function addElement in script.js is invoked. It creates a new <div> element:


var new_row = document.createElement('div');


Three elements are then appended to it: a text input box, a button, and a file input box. The text input box only displays the file name; another element could serve this purpose too:


var new_row_input = document.createElement('input');
new_row_input.type = 'text';
new_row_input.name = "ins_" + (childs.length + 1);
new_row_input.value = element.value;


The button removes the corresponding file from the list when it is clicked:


var new_row_button = document.createElement('input');

new_row_button.type = 'button';
new_row_button.onclick = function (){
...
...
}


The file input box holds the selected file and is what gets submitted to the server, so we make it invisible:


var new_row_file_input = document.createElement('input');
new_row_file_input.setAttribute('name', 'file_' + count);
new_row_file_input.setAttribute('id', 'file_' + count);
new_row_file_input.value = element.value;
new_row_file_input.style.opacity = 0;


Finally, the newly created <div> element is appended to the <div> with id "filesList" in pdfmerger.gsp:


new_row.appendChild(new_row_input);
new_row.appendChild( new_row_button );
new_row.appendChild(new_row_file_input);

target_list.appendChild (new_row);
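

Browsers generally do not let a script assign a non-empty value to a file input, so copying element.value into a newly created input may not carry the selected file with it. A minimal sketch of one workaround: move the filled input itself into the new row and put a fresh, empty input in its place. The names moveFileIntoRow, doc, picker and targetList are illustrative and are not taken from script.js; the document object is passed in only to make the function easy to test.

```javascript
// Hypothetical helper, not part of the original script.js.
// doc:        the DOM document
// picker:     the visible <input type="file"> that just received a file
// targetList: the container that lists the selected files
// index:      number used to build the submitted field name (file_0, file_1, ...)
function moveFileIntoRow(doc, picker, targetList, index) {
  var row = doc.createElement('div');

  // Read-only text box that only displays the chosen file name.
  var label = doc.createElement('input');
  label.type = 'text';
  label.readOnly = true;
  label.value = picker.value;

  // Button that removes the row, and with it the file input the row holds.
  var remove = doc.createElement('input');
  remove.type = 'button';
  remove.value = 'Delete';
  remove.onclick = function () {
    targetList.removeChild(row);
  };

  // Put a fresh, empty picker where the filled one was.
  var fresh = doc.createElement('input');
  fresh.type = 'file';
  fresh.id = picker.id;
  fresh.onchange = picker.onchange;
  picker.parentNode.replaceChild(fresh, picker);

  // The filled input keeps its file; rename and hide it, then store it in the row.
  picker.id = 'file_' + index;
  picker.name = 'file_' + index;
  picker.onchange = null;
  picker.style.display = 'none';

  row.appendChild(label);
  row.appendChild(remove);
  row.appendChild(picker);
  targetList.appendChild(row);
  return fresh;
}
```

Because the hidden input stays inside the form, it is still submitted with the other fields, and deleting the row removes the file from the submission.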


The complete JavaScript can be found here.


Step 2: Retrieve Submitted Files

Submitted files are retrieved and processed on the server side in a controller called PdfmergerController.java. In Grails, retrieving files is very easy thanks to Spring's built-in multipart file support. We want to retrieve all submitted files and store them in an ArrayList for further processing.


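// Collect the stream of every non-empty upload named file_0 .. file_(max_num - 1)
for (i in 0..<max_num) {
    def file_name = "file_" + i
    def f = request.getFile(file_name)

    if (f != null && !f.isEmpty()) {
        // println "file content type " + f.getContentType()
        // getInputStream() is declared to return InputStream, not always a FileInputStream
        InputStream ins = f.getInputStream()
        pdfs.add(ins)
    }
}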
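

The loop above only looks for parts named file_0 through file_(max_num - 1), so the client has to number its hidden inputs inside that range and stop adding rows once the limit is reached. A small sketch of that bookkeeping on the JavaScript side; nextFieldName and MAX_FILES are hypothetical names, and the value 5 is taken from the note in the form, assuming it matches max_num in the controller.

```javascript
// Hypothetical helper, not part of the original script.js.
var MAX_FILES = 5; // assumed to match max_num in the controller

// usedNames: array of field names already present in the form,
// for example ['file_0', 'file_2'].
// Returns the lowest free name in file_0 .. file_(MAX_FILES - 1),
// or null when every slot is taken.
function nextFieldName(usedNames) {
  for (var i = 0; i < MAX_FILES; i++) {
    var candidate = 'file_' + i;
    if (usedNames.indexOf(candidate) === -1) {
      return candidate;
    }
  }
  return null;
}
```

Reusing the lowest free number, rather than a counter that only grows, means that deleting a row and adding another cannot push a field name past the range the controller checks.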


Step 3: Merge Multiple PDFs

Once we have all submitted files in the list pdfs, we are ready to merge the PDFs. I adopted the source code of the method "concatPDFs" from Abhi's post, and it works well with my system. The source code of PdfmergerController.java can be found here. We added a beforeInterceptor to validate the content type of the files before the request is processed further.

Commentary

1. If server-side processing is done in a J2EE environment, a third-party library is needed, since Java Servlets and JSP do not have a built-in mechanism to handle web form based file uploading. I used the Apache Jakarta Commons FileUpload package. It is open source and can be downloaded from the Apache Jakarta Commons FileUpload project. Another package, from the Apache Jakarta Commons IO project, is also needed internally by the FileUpload package.

2. In step 3, we check the file content type on the server side. However, it is also a good idea to validate the content type on the client side when a file is selected.

3. I tested the sample application only with IE and Firefox, and with no other browsers.
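
The client-side check suggested in item 2 can be as simple as looking at the file name the file input reports. A minimal sketch, assuming a .pdf extension is an acceptable stand-in for the content type; looksLikePdf is a hypothetical name and is not part of script.js. The server-side check is still needed, because a file name says nothing certain about the content.

```javascript
// Hypothetical helper, not part of the original script.js.
// path: the value of a file input; it may be a bare file name or a
// full path with either kind of slash, depending on the browser.
function looksLikePdf(path) {
  if (typeof path !== 'string') {
    return false;
  }
  // Keep only the part after the last slash or backslash.
  var name = path.replace(/^.*[\\\/]/, '');
  // Require at least one character before a case-insensitive ".pdf".
  return /.\.pdf$/i.test(name);
}
```

A function like this could be called at the start of addElement, so that a file with another extension is rejected before a row is added to the list.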

References

1. Upload multiple files with a single file element (StickBlog)
2. Merge PDF files with iText (Abhi on Java)
3. Grails online tutorial
4. Tutorial: iText By Example
5. Apache Jakarta Commons FileUpload project
6. Apache Jakarta Commons IO project





4 comments:

Graeme Rocher said...

Interesting post, I can't seem to access your Google doc examples though. Have you set permissions correctly so that your docs are public?

Xiaoyun Tang said...

Thanks, Graeme. I had fixed it.

Anonymous said...

Nice! Would be great to have "multiple file upload" available for integration more directly, fx as a plugin, by scaffolding or something similar in Grails... Cheers!

Anonymous said...

Great post. Just found one small change needed to the .js file if you're working off the sample code.

var target_list = document.getElementById( 'fileList')

var target_list = document.getElementById( 'files_list')