Jeromy Anglim's Blog: Psychology and Statistics


Thursday, September 24, 2009

Recovering a Corrupted Excel 2007 File | XLSX and XLSM Format

An Excel 2007 file that I was working on recently became corrupted. The following are some things I learnt while recovering it.

  • Several recovery options did not work: Recovery Toolbox; the Open and Repair option in Excel; searching for auto-save files in User - Application Data - Microsoft - Excel; and changing the file extension to ".xls", ".xlsm" or ".xlsx". Each time I got the error message "Excel cannot open the file ... because the file format or the file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file."
  • Excel 2007 files (e.g., ".xlsx", ".xlsm") are ZIP archives containing XML files. If you change the file extension from ".xlsx" or ".xlsm" to ".zip", you can unzip the Excel file and look at its components. JKP provides a helpful overview of the file format.
  • The "xl" folder in the extracted zip file seemed to contain most of what I needed, although not in a particularly accessible form. In particular, "sharedStrings.xml" held most of the cell text, but in a disorganised form (it stores each distinct text string once, and the worksheet files refer to those strings by index); "workbook.xml" held the worksheet names; and "tables\...xml" held the column names for my tables.
  • If you drag and drop the XML files from the "xl" folder into Excel, you can open them as a table and have a look.
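To read the sheet and table names without opening each XML file by hand, a short Python script can pull them straight out of the package. This is a minimal sketch, assuming the ZIP structure itself is still readable (as it was here) and that the parts sit at the standard paths "xl/workbook.xml" and "xl/tables/*.xml". The function names `sheet_names` and `table_columns` are hypothetical, not part of any library; only the standard-library `zipfile` and `xml.etree.ElementTree` modules are used.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by workbook.xml, the table parts and the worksheet parts
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names(archive: zipfile.ZipFile) -> list[str]:
    """Sheet names in the order xl/workbook.xml lists them."""
    root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.findall("m:sheets/m:sheet", NS)]


def table_columns(archive: zipfile.ZipFile) -> dict[str, list[str]]:
    """Map each table's display name to its column names, from xl/tables/*.xml."""
    tables = {}
    for part in archive.namelist():
        if part.startswith("xl/tables/") and part.endswith(".xml"):
            root = ET.fromstring(archive.read(part))
            columns = root.findall("m:tableColumns/m:tableColumn", NS)
            # Fall back to the part name if the table has no displayName
            tables[root.get("displayName", part)] = [c.get("name") for c in columns]
    return tables
```

For example, opening the damaged file with `zipfile.ZipFile("meta.xlsx")` in a `with` block and passing it to `sheet_names` lists the worksheet names. Because `zipfile` recognises an archive by its contents rather than its extension, the file does not need to be renamed to ".zip" first.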
With a little fiddling, I was able to recreate my Excel file from these parts.
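Rebuilding the cell contents is the fiddly part, because a worksheet part such as "xl/worksheets/sheet1.xml" stores text cells as indexes into "sharedStrings.xml" rather than as the text itself. The sketch below joins the two back up and writes one sheet to CSV. It is a hedged illustration, not the procedure used in this post: the function names are hypothetical; it assumes each cell carries its `r` reference (as Excel writes them); it skips rows that were never stored; and it keeps formula results and dates as the raw values Excel cached (dates appear as serial numbers). Working out which "sheetN.xml" belongs to which sheet name requires "xl/_rels/workbook.xml.rels", which this sketch does not read.

```python
import csv
import re
import zipfile
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def load_shared_strings(archive: zipfile.ZipFile) -> list[str]:
    """Entries of xl/sharedStrings.xml in index order (empty if the part is absent)."""
    if "xl/sharedStrings.xml" not in archive.namelist():
        return []
    root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    # A plain string is <si><t>..</t></si>; rich text is split into <r><t>..</t></r> runs
    return ["".join(t.text or "" for t in si.findall("m:t", NS) + si.findall("m:r/m:t", NS))
            for si in root.findall("m:si", NS)]


def column_index(ref: str) -> int:
    """Letters of a cell reference such as 'AB12' -> 0-based column number."""
    index = 0
    for ch in re.match(r"[A-Z]+", ref).group():
        index = index * 26 + ord(ch) - ord("A") + 1
    return index - 1


def sheet_rows(archive: zipfile.ZipFile, sheet_part: str):
    """Yield each stored <row> of a worksheet part as a list of cell values."""
    strings = load_shared_strings(archive)
    root = ET.fromstring(archive.read(sheet_part))
    for row in root.findall("m:sheetData/m:row", NS):
        values: list[str] = []
        for cell in row.findall("m:c", NS):
            ref = cell.get("r")
            col = column_index(ref) if ref else len(values)
            values.extend([""] * (col + 1 - len(values)))  # pad skipped columns
            v = cell.find("m:v", NS)
            kind = cell.get("t")
            if kind == "s" and v is not None:
                values[col] = strings[int(v.text)]  # t="s": <v> is a shared-string index
            elif kind == "inlineStr":
                values[col] = "".join(t.text or "" for t in cell.iter(f"{{{NS['m']}}}t"))
            elif v is not None:
                values[col] = v.text or ""  # numbers, booleans, cached formula results
        yield values


def sheet_to_csv(xlsx_path: str, sheet_part: str, csv_path: str) -> None:
    """Write one worksheet part (e.g. 'xl/worksheets/sheet1.xml') out as CSV."""
    with zipfile.ZipFile(xlsx_path) as archive, \
            open(csv_path, "w", newline="", encoding="utf-8") as out:
        csv.writer(out).writerows(sheet_rows(archive, sheet_part))
```

The resulting CSV can then be opened in Excel and saved as a fresh ".xlsx"; formatting, formulas and table definitions would still have to be recreated by hand.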



My file was called "meta.xlsx"; this is what it looked like after I renamed it to "meta.zip" and extracted the archive.
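Renaming and unzipping by hand works, but if part of the archive is damaged, extracting it member by member can salvage the readable parts instead of stopping at the first error. This is a minimal sketch, assuming the ZIP central directory is intact enough for `zipfile.ZipFile` to open the file; `extract_package` is a hypothetical helper name.

```python
import zipfile
import zlib


def extract_package(xlsx_path: str, dest: str) -> tuple[list[str], list[str]]:
    """Extract each part of an .xlsx/.xlsm package into dest, one at a time.

    Returns (extracted, failed) lists of part names, so one damaged part
    does not stop the others from being recovered.
    """
    extracted, failed = [], []
    # zipfile identifies the archive by its contents, so no rename to .zip is needed
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            try:
                archive.extract(name, dest)  # creates sub-folders such as xl/ as needed
                extracted.append(name)
            except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
                # CRC mismatch, bad compressed data or truncated data:
                # note the part and carry on with the rest
                failed.append(name)
    return extracted, failed
```

Calling `extract_package("meta.xlsx", "meta")` would produce a "meta" folder laid out like the extracted zip above, and the second list shows which XML parts could not be read. A part that fails partway through may leave a truncated file in the output folder.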