Jeromy Anglim's Blog: Psychology and Statistics


Monday, March 15, 2010

Converting a Microsoft Word Document into a LaTeX Document

This post discusses my experience converting a large MS Word document into a LaTeX document using Word-to-LaTeX. Along the way I encountered several challenges. I thought I'd document them in case it may be of interest to others.

Overview of Options
Having a good conversion method is important when transitioning existing Word documents to LaTeX and when collaborating with others who are not familiar with LaTeX. Wilfred Hennings provides a great page that summarises options for converting documents from PC word processors to LaTeX. The page lists options and provides recommendations. Wilfred's Quick Comparison List is particularly useful.

My Experience with Word-to-LaTeX:
I started my conversion journey with Michal Kerbt's Word-to-LaTeX (Word-to-XML) Convertor. It's free software, but Michal accepts donations. I used it in its stand-alone form. It provides many options for converting documents. While it generally worked well, the sections below describe the challenges I ran into during conversion and what I did in response.

Convert *.docx to *.doc format:
Problem: The convertor did not appear to work with Word 2007 files (*.docx).
Solution: Save As Word 97 (*.doc) format.

Set up a PostScript printer:
Problem: In order to convert graphics files, a PostScript printer needs to be set up.
Solution: (Warning: I've had some problems with the EPS files generated; I'm not sure if it's related to this printer setup.)
  1. Install a PostScript printer: Adobe sets out one way to set up a virtual printer. I followed these instructions. Here is a direct link to the downloads. In short, it involves installing a PostScript printer driver with a PPD specification.
  2. Specify in Word-to-LaTeX: This printer is then specified in the Word-to-LaTeX configuration: Figures/Eq/Documents - Figures - PostScript printer.

Big complex documents:
Problem: Long and complex documents can take a while to run (e.g., 15 minutes for a 60,000 word document with many styles and tables on a 2007 laptop).
Solution: Hey. Who cares! Just let it run. It's quicker than trying to do it manually.

Security Alert Over Macros:
Problem: The program installs a macro in the Word Startup folder. My version of Word (Word 2007) disabled this by default.
Solution: It is possible to enable all macros. However, this is not particularly safe. I decided to delete the file from: "C:\Program Files\Microsoft Office\Office12\STARTUP" and just run the program through its stand-alone desktop interface.

Tidying Up
Problem: The *.tex document was not exactly what I wanted.
Solution: Several options presented themselves.
  1. Change input: I could alter the Word document. I could remove styles, remove unwanted fonts, and so on.
  2. Change process: Word-to-LaTeX presents many configuration options which I could play with.
  3. Post-process: I could apply various replacement operations on the *.tex created by Word-to-LaTeX.
The solution I adopted combined all three approaches. For example, I converted hidden text in the Word document to a particular style. This meant that it was enclosed in a command in *.tex that was easy to find and replace in post-processing.
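As a rough illustration of the post-processing step, here is a minimal Python sketch. It assumes the hidden text ends up wrapped in a command called \hiddentext{...}; that name is hypothetical (the real one depends on the Word style and on the Word-to-LaTeX configuration), and strip_command is just an illustrative helper, not part of any tool.

```python
# Hypothetical command name; the real one depends on the Word style
# and on how Word-to-LaTeX was configured.
HIDDEN_COMMAND = "hiddentext"


def strip_command(tex, command=HIDDEN_COMMAND, keep_body=False):
    r"""Remove every \command{...} from tex, coping with nested braces.

    If keep_body is True, only the wrapper is dropped and its contents stay.
    """
    opener = "\\" + command + "{"
    out = []
    pos = 0
    while True:
        start = tex.find(opener, pos)
        if start == -1:
            out.append(tex[pos:])  # no more occurrences: keep the rest
            break
        out.append(tex[pos:start])
        depth = 1
        i = start + len(opener)
        while i < len(tex) and depth > 0:
            ch = tex[i]
            if ch == "\\" and i + 1 < len(tex):
                i += 2  # skip escaped characters such as \{ and \}
                continue
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
            i += 1
        if depth != 0:
            out.append(tex[start:])  # unbalanced braces: leave untouched
            break
        if keep_body:
            out.append(tex[start + len(opener):i - 1])
        pos = i
    return "".join(out)
```

Counting braces rather than using a single regular expression means the hidden text can itself contain commands such as \textbf{...} without the match ending too early.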

Start Up Problem
Problem: When I ran Word-to-LaTeX, I obtained the following error:
Conversion started.
Fatal error: Call was rejected by callee.
   at Word.DocumentClass.Activate()
   at WordToLatex.WLConvertor.Convert()
   at WordToLatex.Bin.WLApplication.Main(String[] args)

Solution: I closed Word-to-LaTeX and Word. I then pressed Ctrl+Alt+Delete, ended any WINWORD processes that were still running, and restarted Word-to-LaTeX. As an additional point, it was sometimes necessary to close Word-to-LaTeX.
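Ending the leftover Word processes can also be scripted rather than done through Task Manager. The following is a minimal sketch, assuming a Windows machine where the standard taskkill command is available; the two function names are made up for illustration. Note that force-ending Word discards any unsaved work.

```python
import subprocess
import sys
from typing import List


def taskkill_command(image_name: str = "WINWORD.EXE") -> List[str]:
    """Build the taskkill call that force-ends every process with this image name."""
    return ["taskkill", "/F", "/IM", image_name]


def end_word_processes() -> bool:
    """Force-end leftover Word processes; True if taskkill exited with code 0."""
    if sys.platform != "win32":
        raise OSError("taskkill is only available on Windows")
    result = subprocess.run(taskkill_command(), capture_output=True, text=True)
    # A non-zero exit code usually just means there was nothing to end.
    return result.returncode == 0
```

Calling end_word_processes() before restarting Word-to-LaTeX mirrors the manual steps described above.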

Conversion Problem
Problem: I obtained the following error.
Converting document fields.
Unknown error: Object reference not set to an instance of an object.
   at WordToLatex.WLProcessFields.FieldHyperlink(Field field)
   at WordToLatex.WLProcessFields.ProcessField(Field field)
   at WordToLatex.WLProcessFields.ProcessAllFields()
   at WordToLatex.WLConvertor.ConvertInner()
   at WordToLatex.WLConvertor.Convert()

Solution:
  • Divide and conquer: Dividing a long document into smaller parts to identify which parts could be processed was one strategy. If you do this, it may be good to put the files in separate folders; otherwise image files from one subdocument may be overwritten by a later subdocument.
  • Paste into Fresh Document: Another trick that worked for me was to copy and paste the contents of the document into a new document. I'm not sure why this worked. Perhaps it worked because it removed a number of custom styles I had.
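The "separate folders" advice for divide and conquer can be sketched in Python as follows. This is only an illustration: the folder naming scheme and the stage_parts helper are assumptions, and the sub-documents themselves still have to be split out of the original by hand in Word.

```python
from pathlib import Path
import shutil


def stage_parts(part_files, work_dir):
    """Copy each sub-document into its own folder under work_dir.

    Returns the staged copies, in the same order as part_files.
    """
    work_dir = Path(work_dir)
    staged = []
    for index, part in enumerate(map(Path, part_files), start=1):
        # e.g. work_dir/part01_intro/intro.doc
        part_dir = work_dir / f"part{index:02d}_{part.stem}"
        part_dir.mkdir(parents=True, exist_ok=True)
        staged.append(Path(shutil.copy2(part, part_dir)))
    return staged
```

Each staged copy can then be converted in place, so the image files produced for one part cannot clobber those of another.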
Problems importing EPS files
Problem: I let Word-to-LaTeX convert the images to EPS. I could view these images in an editor and they had indeed been converted. However, when I added them in LaTeX, only white space was shown.
Solution: For pictures derived from R, I just created them again, this time using R's postscript driver.
Opening the image in Adobe Acrobat Professional and saving as EPS was one option for the other pictures.
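The cause of the blank images was never pinned down here. One thing that may be worth checking, purely as an assumption and not a confirmed diagnosis, is the %%BoundingBox comment in each EPS file, since LaTeX relies on it to decide how much space to reserve. The Python sketch below reads that comment; the function names and the "suspicious" rule are illustrative only.

```python
import re
from typing import Optional, Tuple

# Matches a numeric bounding box; "%%BoundingBox: (atend)" is skipped
# so that the real values in the file trailer are found instead.
_BBOX = re.compile(
    rb"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)",
    re.MULTILINE,
)


def read_bounding_box(eps_bytes: bytes) -> Optional[Tuple[int, int, int, int]]:
    """Return (llx, lly, urx, ury) from the first numeric %%BoundingBox, or None."""
    match = _BBOX.search(eps_bytes)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def looks_suspicious(bbox) -> bool:
    """Flag a missing bounding box or one with zero or negative area."""
    if bbox is None:
        return True
    llx, lly, urx, ury = bbox
    return urx <= llx or ury <= lly
```

For a file on disk, something like looks_suspicious(read_bounding_box(open("figure1.eps", "rb").read())) gives a quick first check, where figure1.eps is a placeholder name.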


10 comments:

  1. Hi, have you tried Word2Tex for OpenOffice macros?

  2. UPDATE (10th Jan 2011): The website for Word-to-LaTeX has moved to: www.wordtolatex.com
    The software allows for a trial, but it is no longer free.

  3. Just saw this post on options:
    http://www.charlietanksley.net/philtex/converting-to-latex/

  4. What about a Word document with mathematical equations?

  5. @Anonymous LaTeX is very well suited to writing mathematics. Once you learn how to write using LaTeX, writing mathematics can be like writing normal sentences. Because equations are all plain text, they are more secure. I've had the experience of converting between Word formats and having equations converted to images, which prevented subsequent editing. That said, Microsoft does have equation writing facilities. If they work for you, that's fine. I also imagine there would be collaboration situations where Word might be required.

  6. For LaTeX novices, and for people who need a fast turnaround of their document, I would recommend using a professional TeX typesetting service:
    http://www.gaussnewton.com/convert-word-to-latex-tex

    That's my 2 cents!

    Replies
    1. @anonymous, are you a customer, or do you work for the company that provides the service? It is good to know that you can pay someone to convert Word documents to LaTeX documents, but your post looks like advertising. If it is, you should disclose your affiliation.

  7. @Jeromy, I used their service about 6 months ago to convert my master's engineering thesis to LaTeX. I was kind of tired of using trial software converters to do this (tables and figures almost never work as you want them to!) and I did not know LaTeX well enough to convert my 150-odd-page thesis. Your post is very informative, and I have now started to learn the LaTeX typesetting system properly!

    Replies
    1. Thanks for the clarification. And I guess paying a fee to convert could be more time efficient, especially if you have research funds for such a task. I notice that there are similar services, such as http://www.grindeq.com/index.php?p=service

      When I first converted to LaTeX, it was for my thesis. I used it as a learning experience, but it certainly took some time.
