Monday, July 4, 2011

Merge .docx files in Java using docx4j

This code comes from one of the projects I worked on. The first source file (stream) is used as the master document, so all styles defined in it, including headers and footers, are applied to the subdocuments.
public class DocxService {
  private static final String CONTENT_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

  public InputStream mergeDocx(final List<InputStream> streams) throws Docx4JException, IOException {

    WordprocessingMLPackage target = null;
    final File generated = File.createTempFile("generated", ".docx");

    int chunkId = 0;
    Iterator<InputStream> it = streams.iterator();
    while (it.hasNext()) {
      InputStream is = it.next();
      if (is != null) {
        if (target == null) {
          // Copy first (master) document
          try (OutputStream os = new FileOutputStream(generated)) {
            os.write(IOUtils.toByteArray(is));
          }

          target = WordprocessingMLPackage.load(generated);
        } else {
          // Attach the others (Alternative input parts)
          insertDocx(target.getMainDocumentPart(), IOUtils.toByteArray(is), chunkId++);
        }
      }
    }

    if (target != null) {
      target.save(generated);
      return new FileInputStream(generated);
    } else {
      return null;
    }
  }

  private static void insertDocx(MainDocumentPart main, byte[] bytes, int chunkId) {
    try {
      AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(new PartName("/part" + chunkId + ".docx"));
      afiPart.setContentType(new ContentType(CONTENT_TYPE));
      afiPart.setBinaryData(bytes);
      Relationship altChunkRel = main.addTargetPart(afiPart);

      CTAltChunk chunk = Context.getWmlObjectFactory().createCTAltChunk();
      chunk.setId(altChunkRel.getId());

      main.addObject(chunk);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
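Since mergeDocx hands back an InputStream over a temporary file instead of writing to a location you choose, the caller has to persist the result. Here is a minimal sketch of that step using only the JDK. The class MergedDocxWriter and the method writeTo are hypothetical names for illustration; they are not part of the original code or of docx4j.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of docx4j or of the DocxService class above.
final class MergedDocxWriter {

  private MergedDocxWriter() {
  }

  /**
   * Copies the merged document stream to the given destination,
   * replacing an existing file, and closes the stream afterwards.
   * Returns the number of bytes written.
   */
  static long writeTo(InputStream merged, Path destination) throws IOException {
    if (merged == null) {
      // mergeDocx returns null when no usable input stream was supplied
      throw new IllegalArgumentException("Nothing to write: the merge produced no document");
    }
    try (InputStream in = merged) {
      return Files.copy(in, destination, StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
```

A caller could then write something like MergedDocxWriter.writeTo(new DocxService().mergeDocx(streams), Paths.get("merged.docx")), where merged.docx is only an example path.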
Note: The generated file can be opened only in Microsoft Office 2007 or newer (Win/Mac). OpenOffice/LibreOffice render only the master document, ignoring the attached ones.
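One way to check that the attached documents really made it into the generated file, even when a viewer shows only the master document, is to list the entries of the .docx, which is an ordinary zip archive. The sketch below assumes that the parts added by insertDocx are stored under entry names ending in .docx (for example part0.docx, following the "/part" + chunkId + ".docx" naming in the code above). DocxPartLister and embeddedDocxParts are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of docx4j.
final class DocxPartLister {

  private DocxPartLister() {
  }

  /**
   * Reads a .docx package as a zip archive and returns the names of all
   * entries that end in ".docx", in the order they appear in the archive.
   */
  static List<String> embeddedDocxParts(InputStream docx) throws IOException {
    List<String> parts = new ArrayList<>();
    try (ZipInputStream zip = new ZipInputStream(docx)) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        String name = entry.getName();
        if (name.toLowerCase(Locale.ROOT).endsWith(".docx")) {
          parts.add(name);
        }
      }
    }
    return parts;
  }
}
```

If the list is empty for a file merged from several sources, the attached parts were never written. If it contains one entry per attached document but only the master content is visible, the application opening the file is most likely not processing the attached parts.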

21 comments:

  1. It does not work on Fedora 13. An empty docx is produced as the result.

  2. Thank you for your feedback.
    I will install Fedora 13 on a VM and try to reproduce it.

  3. Please send me (barusin@wszib.edu.pl) the merged file produced by your code on Windows (and the source files).

  4. Sorry, my mistake: it works, but OpenOffice couldn't open the resulting document. On Mac and Windows (Office 2011, 2007) it works fine.

  5. No problem. Anyway, I sent you a sample project with updated sources.
    I will update this post soon too :)

  6. Thanks! Really good job.

  7. Thanks for the code, but unfortunately I'm unable to get it to produce any results. I pass the mergeDocx method a list of FileInputStreams that point to the files to be concatenated; the code then runs and seems to process everything, but it doesn't produce any resulting file and I can't figure out how to make it do so.

    Any tips? Thanks!

  8. Hi, John.

    I think the best tip will be a working application. You can download it from http://dl.dropbox.com/u/23122948/docx4jDemo.tar.gz
    Import it as a Maven project in your favorite IDE.

    If you have more questions or need some help, don't hesitate to ask, I'll be glad to help.

    Replies
    1. Hi Stanislaw,

      Could you please upload again the working application that was at http://dl.dropbox.com/u/23122948/docx4jDemo.tar.gz? I have problems with the code above and urgently need to resolve them.

  9. Hi Stanislaw,

    I managed to find a solution to my problem. Luckily, it was as simple as replacing lines 29 and 30 with:

    SaveToZipFile saver = new SaveToZipFile(target);
    saver.save(outputfilepath);

    Where 'outputfilepath' is a string pointing to the desired save location.

    However I'll still inspect the demo that you've provided as I'm sure I can still learn a thing or two from it. Many thanks for your super quick response and help - brilliant!

    Replies
    1. Hi John,
      Could you please send me a link to download the working demo application /docx4jDemo.tar.gz?

  10. Hello
    I've tried the example above with several documents. The size of the resulting document is approximately equal to the sum of the sizes of the source documents, but only the first one is visible. Am I missing something, like page breaks?
    I am using docx4j 2.9.0-SNAPSHOT.jar
    Thanks

    Replies
    1. Hello,
      excuse me Daniel, where did you find docx4j 2.9.0-SNAPSHOT.jar?
      thx

  11. hi,
    in my case it only worked when I removed line 39.
    good work!

    Replies
    1. Hi Braynner,
      Could you please post a link here to download a working demo application? I need it very quickly, please!

  12. Hi Braunner.
    I'm glad that the snippet was useful for you.

  13. To Anonymous:

    You can download the file from this address:
    https://www.dropbox.com/s/w4yrfw979zimzw0/docx4jDemo.tar.gz

    Replies
    1. Hi, Stanislav,
      Thank you very much for the code, but unfortunately I have a problem with the resulting file. The code runs, seems to process everything, and generates a file that concatenates 2 docx files, but when I try to open that file I get the following error: "The file is corrupt and cannot be opened. Location: Part: word/document.xml."

      Any idea? It's very urgent for me and I'd appreciate your help. Thanks for everything.

    2. I even tried to open your ZResult file, which is the resulting file, and I get the same error: "The file is corrupt and cannot be opened. Location: Part: word/document.xml."
      How can I solve this problem? Thanks again, Stanislav. I hope to hear from you soon, please.

  14. This is great! Thank you very much!

  15. Same error for me: the resulting file is corrupted, according to Microsoft 2016 for Mac OS X.
    Is there a way to fix it?
