Stanislav Oaserele: Merge .docx files in Java using docx4j

Monday, July 4, 2011

This code is used in one of the projects I worked on. The first source file (stream) is used as the master document, so all styles defined in it (including headers and footers) are applied to the subdocuments.

public class DocxService {
  private static final String CONTENT_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

  public InputStream mergeDocx(final List<InputStream> streams) throws Docx4JException, IOException {

    WordprocessingMLPackage target = null;
    final File generated = File.createTempFile("generated", ".docx");

    int chunkId = 0;
    Iterator<InputStream> it = streams.iterator();
    while (it.hasNext()) {
      InputStream is = it.next();
      if (is != null) {
        if (target == null) {
          // Copy first (master) document
          try (OutputStream os = new FileOutputStream(generated)) {
            os.write(IOUtils.toByteArray(is));
          }

          target = WordprocessingMLPackage.load(generated);
        } else {
          // Attach the others (Alternative input parts)
          insertDocx(target.getMainDocumentPart(), IOUtils.toByteArray(is), chunkId++);
        }
      }
    }

    if (target != null) {
      target.save(generated);
      return new FileInputStream(generated);
    } else {
      return null;
    }
  }

  private static void insertDocx(MainDocumentPart main, byte[] bytes, int chunkId) {
    try {
      AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(new PartName("/part" + chunkId + ".docx"));
      afiPart.setContentType(new ContentType(CONTENT_TYPE));
      afiPart.setBinaryData(bytes);
      Relationship altChunkRel = main.addTargetPart(afiPart);

      CTAltChunk chunk = Context.getWmlObjectFactory().createCTAltChunk();
      chunk.setId(altChunkRel.getId());

      main.addObject(chunk);
    } catch (Exception e) {
      throw new IllegalStateException("Could not attach chunk " + chunkId, e);
    }
  }
}
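
mergeDocx hands back an InputStream over a temporary file rather than writing to a location you choose, so the caller still has to copy the result somewhere. A minimal sketch of that step using only the JDK follows; MergedDocxWriter and writeTo are hypothetical names that are not part of the original code, and the demo uses stand-in bytes in place of a real merged document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper (not from the original post) that persists a merged stream. */
final class MergedDocxWriter {

  /**
   * Copies the merged-document stream to the given path, replacing any existing
   * file, and closes the stream afterwards. Returns the number of bytes written.
   */
  static long writeTo(InputStream merged, Path destination) throws IOException {
    if (merged == null) {
      // mergeDocx returns null when the list contained no usable stream
      throw new IllegalArgumentException("nothing to write: merged stream is null");
    }
    try (InputStream in = merged) {
      return Files.copy(in, destination, StandardCopyOption.REPLACE_EXISTING);
    }
  }

  public static void main(String[] args) throws IOException {
    // Stand-in bytes; in real use this would be the stream returned by mergeDocx(...)
    InputStream merged = new ByteArrayInputStream("demo".getBytes(StandardCharsets.UTF_8));
    Path out = Files.createTempFile("merged", ".docx");
    long written = writeTo(merged, out);
    System.out.println("Wrote " + written + " bytes to " + out);
    Files.delete(out);
  }
}
```

With the service above, the call would look something like writeTo(new DocxService().mergeDocx(streams), Paths.get("merged.docx")), where the output path is whatever location you need.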

Note: The generated file can be opened only in Microsoft Office 2007 or newer (Win/Mac). OpenOffice/LibreOffice render only the master document, ignoring the attached ones.
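
When a viewer shows only the master document, it can help to check whether the other documents were embedded at all. A .docx file is a zip archive, and insertDocx names each attached part /part<N>.docx, so one rough check is to list the archive entries. The sketch below uses only the JDK and assumes those parts show up as zip entries named part0.docx, part1.docx, and so on (the exact entry names written by docx4j are an assumption here). DocxChunkLister, listChunkParts, and fakePackage are hypothetical names, and the demo runs against a stand-in archive built in memory rather than a real document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper (not from the original post) for inspecting a merged package. */
final class DocxChunkLister {

  /** Returns the names of zip entries that look like embedded chunks (part<N>.docx). */
  static List<String> listChunkParts(InputStream docx) throws IOException {
    List<String> chunks = new ArrayList<>();
    try (ZipInputStream zip = new ZipInputStream(docx)) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        // Assumption: chunk parts keep the names given to them in insertDocx
        if (entry.getName().matches("/?part\\d+\\.docx")) {
          chunks.add(entry.getName());
        }
      }
    }
    return chunks;
  }

  /** Builds a tiny stand-in archive in memory so the sketch runs without a real .docx. */
  static byte[] fakePackage(String... entryNames) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      for (String name : entryNames) {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(new byte[] {0});
        zip.closeEntry();
      }
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] pkg = fakePackage("[Content_Types].xml", "word/document.xml", "part0.docx", "part1.docx");
    System.out.println(listChunkParts(new ByteArrayInputStream(pkg)));
  }
}
```

If the list comes back empty for a real merged file, the chunks were probably never attached. If it is non-empty, the documents are in the package and the problem is more likely on the rendering side.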

21 comments:

Anonymous, August 8, 2011 at 1:11 PM
It does not work on Fedora 13. The result is an empty docx.
Unknown, August 8, 2011 at 1:28 PM
Thank you for your feedback.
I will install Fedora 13 on a VM and will try to reproduce.
Anonymous, August 8, 2011 at 3:09 PM
Please send me (barusin@wszib.edu.pl) the merged file produced by your code on Windows, along with the source files.
Anonymous, August 8, 2011 at 4:35 PM
Sorry, my mistake: it works, but OpenOffice couldn't open the resulting document. On Mac and Windows (Office 2011, 2007) it works fine.
Unknown, August 8, 2011 at 4:44 PM
No problem. Anyway, I sent you a sample project with updated sources.
I will update this post soon too :)
Anonymous, August 8, 2011 at 4:48 PM
Thanks! Really good job.
John, September 9, 2011 at 6:43 PM
Thanks for the code, but unfortunately I'm unable to get it to produce any results. I pass the mergeDocx method a list of FileInputStreams that point to the files to be concatenated. The code runs and seems to process everything, but it doesn't produce any output file, and I can't figure out how to make it do so.

Any tips? Thanks!
Unknown, September 9, 2011 at 7:09 PM
Hi, John.

I think the best tip will be a working application. You can download it from http://dl.dropbox.com/u/23122948/docx4jDemo.tar.gz
Import it as a Maven project in your favorite IDE.

If you have more questions or need some help, don't hesitate to ask, I'll be glad to help.
John, September 9, 2011 at 7:19 PM
Hi Stanislav,

I managed to find a solution to my problem. Luckily, it was as simple as replacing lines 29 and 30 with:

SaveToZipFile saver = new SaveToZipFile(target);
saver.save(outputfilepath);

Where 'outputfilepath' is a string pointing to the desired save location.

However, I'll still inspect the demo that you've provided, as I'm sure I can still learn a thing or two from it. Many thanks for your super quick response and help - brilliant!
Daniel, December 11, 2012 at 5:36 PM
Hello,
I've tried the example above with several documents. The size of the resulting document is approximately the sum of the sizes of the source documents, but only the first one is visible. Am I missing something, like page breaks?
I am using docx4j 2.9.0-SNAPSHOT.jar.
Thanks
braynner, January 15, 2013 at 3:58 PM
Hi,
In my case it only worked when I removed line 39.
Good work!
Unknown, January 17, 2013 at 4:40 PM
Hi braynner.
I'm glad that the snippet was useful for you.
Unknown, January 17, 2013 at 4:41 PM
To Anonymous:

You can download the file from this address:
https://www.dropbox.com/s/w4yrfw979zimzw0/docx4jDemo.tar.gz
Anonymous, March 12, 2015 at 11:47 AM
This is great! Thank you very much!
Christian Padovano, May 3, 2016 at 1:08 AM
Same error for me: Microsoft Office 2016 for Mac OS X says the result file is corrupted.
Is there a way to fix it?
