I once paid a lawyer to draft a legal agreement for my business. The document was fabulous and was given to me as a Microsoft® Word file so that I could easily add details each time I used it. When I received the file, I noticed that the document’s metadata (hidden properties) held the name of another law firm, not the firm that had sent me the document. Not a good look at all.
How did this happen? Simple. My document was a copy of a document from another law firm. When the copy was made, possibly using File, Save As, the original document’s metadata was carried over into my file. The lawyer may not have been aware that Word files store metadata: information that can reveal details of the author and the organisation where the file originated. My lawyer had indeed drafted the document, but the file itself had started life at another law firm, so even though it was his work, it looked as though it had originated elsewhere.
Metadata isn’t a word you hear very often in normal business conversation. In fact, you may be asking “What is it?” and “How do I find it?”
What is metadata?
The definition, as given by BusinessDictionary.com, is…
“Data that serves to provide context or additional information about other data. For example, information about the title, subject, author, typeface, enhancements and size of that data file of a document…”
Put simply, metadata is the ‘file properties’ of the document.
Sounds harmless enough. Don’t be fooled. Metadata is hidden from sight. Unless you know where to find it and how to edit or remove it, you could be endangering the privacy of your clients or team members, or even be accused of plagiarism.
Therefore, if you are going to share an electronic copy of a document with clients or another organisation, it is a good idea to review this information and, if required, remove it before sharing.
Avoiding risk – how to view the metadata of a document
Metadata is stored in most files. Essentially, what you need to look for is the ‘properties’ information for your file.
In an Adobe PDF file you will find it under File, Properties (yes… metadata copies into a PDF as well). In most files created using Microsoft® applications, the metadata can easily be found by clicking the File tab and then Info. The document’s properties are displayed in the Properties pane on the right of the window.
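If you are comfortable with a little scripting, you can also peek at a Word file’s metadata without opening Word at all. A .docx file is really a ZIP archive, and its properties are stored as XML inside it: Title and Author in docProps/core.xml, and Company, Manager and Template in docProps/app.xml. The short Python sketch below uses only the standard library to read a few of those fields. The function name read_properties and the choice of fields are my own, for illustration only, and the sketch assumes the newer .docx format; it won’t read older .doc files or PDFs.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the Office Open XML property parts.
PROPERTY_NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Fields to report from each part (label -> element path).
# Illustrative selection only: both parts hold more fields than these.
PARTS = {
    "docProps/core.xml": {
        "Title": "dc:title",
        "Author": "dc:creator",
        "Last modified by": "cp:lastModifiedBy",
    },
    "docProps/app.xml": {
        "Company": "ep:Company",
        "Manager": "ep:Manager",
        "Template": "ep:Template",
    },
}


def read_properties(path):
    """Hypothetical helper: return the non-empty properties found in a .docx."""
    found = {}
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
        for part, fields in PARTS.items():
            if part not in names:  # a file may leave out either part
                continue
            root = ET.fromstring(archive.read(part))
            for label, tag in fields.items():
                element = root.find(tag, PROPERTY_NS)
                if element is not None and element.text:
                    found[label] = element.text
    return found
```

For example, read_properties("Agreement.docx") (a made-up file name) returns a dictionary of whichever of those fields are filled in, so a stray Company or Author value from another organisation stands out straight away.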
Here’s a screen of the properties of the document I used to create this blog.
By default, you won’t see all of the file’s properties, just a shortened list; many of them remain hidden from view. To see a fuller list in the pane, click Show All Properties at the bottom of the pane.
In my opinion, the best way to check the document properties is to click the drop-down arrow next to Properties at the top of the pane and then select Advanced Properties.
This opens a dialog box with a number of tabs that you can move between to view an even fuller list of properties.
The screen below is a good example of where Save As has been used to create a new document based on an existing one.
The screen shows the properties for our ‘Office 365 User Guide’. The guide was created by my colleague Susan using a copy of an Excel manual that I had created. Susan used a copy of the manual so that she could easily reuse all of its styles, headers, footers, fonts, etc., which meant she only needed to update the content. This scenario is very much like what my lawyer had done.
What you will notice is that the file name and the document Title differ: the original document’s Title is still held in the properties. When I created the original file, I was using software licensed to one of my former businesses; this was captured and saved in that document and subsequently copied into this one through the Save As process. The author details also still show me as the creator, when in fact I only created the original document, whose content has since been stripped out and replaced with Susan’s. This document wasn’t being shared externally, so the incorrect metadata wasn’t a problem. However, if we had been sharing it externally, we would have manually updated this data to reflect the correct information for the document.
Editing or removing document properties
Many of the file’s properties can easily be edited or removed simply by editing or deleting the content from a property field. This can be done in the Properties pane or the Advanced Properties dialog box.
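If you have a lot of files to tidy up, the same idea can be used to clear properties in bulk. The Python sketch below is a minimal illustration under a few assumptions: it works on .docx files only, it writes a separate copy so your original is left untouched, and it blanks just two fields in docProps/core.xml, Author and Last Modified By. It doesn’t touch Company, Manager or anything else in docProps/app.xml, and it is no substitute for the Document Inspector. The function name blank_author_fields is hypothetical, and it is worth opening the copy in Word to check it before you share it.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"
CORE_NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Keep the usual prefixes (cp:, dc:, dcterms:) when the XML is written back,
# instead of ElementTree's generated ns0/ns1 names.
for prefix, uri in CORE_NS.items():
    ET.register_namespace(prefix, uri)

# Illustrative choice of fields to blank: Author and Last Modified By.
FIELDS_TO_BLANK = ("dc:creator", "cp:lastModifiedBy")


def blank_author_fields(src, dst):
    """Hypothetical helper: copy src to dst with the author fields emptied."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == CORE_PART:
                root = ET.fromstring(data)
                for tag in FIELDS_TO_BLANK:
                    element = root.find(tag, CORE_NS)
                    if element is not None:
                        element.text = ""
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            # Reusing the ZipInfo keeps each entry's name, order and compression method.
            zout.writestr(item, data)
```

For example, blank_author_fields("Agreement.docx", "Agreement-cleaned.docx") (made-up file names) would produce a cleaned copy and leave the original file as it was.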
However, you may like to check out the Inspect Document feature. This feature finds and reveals any hidden data in your file that could be sensitive. Once it has run the check, you can choose to remove the data.
Please note: you might like to create a copy of your file before using the Document Inspector, as it isn’t possible to restore the data once the Inspector removes it. To remind yourself which version no longer contains sensitive data, you might like to add a tag or note this in the file name, e.g. “filename” – Inspected.docx.
Using Inspect Document
- Open the file you want to inspect, click the File tab and then click Info.
- Click Check for Issues and then select Inspect Document.
- The Document Inspector dialog box will be displayed. Select the check boxes for the content you would like inspected. Note: please refer to the ‘Document Inspector options’ at the end of this post for a full list of what the Inspector finds and removes.
- Click Inspect. The results of the inspection will be displayed. Click Remove All next to the data you want removed from the document.
- The unwanted data will be removed from your document.
The screen below shows the properties for Susan’s ‘Office 365 User Guide’ after I used the Document Inspector to remove the document properties. At this point I could click into the Title, Company, Manager and Author fields and update them.
Document Inspector options
Comments, Revisions, Versions and Annotations
Removes comments, revision marks from tracked changes, document version information and ink annotations.
Document Properties and Personal Information
Document properties, including information from the Summary, Statistics and Custom tabs of the Document Properties dialog box, content type information, the user (author) name and the template name.
Task Pane Apps
If your organisation uses customised Task Pane Apps, the Inspector will locate and remove them from your document. If the Inspector finds a Task Pane App in your document and you are unsure what it is, you should speak with your IT department.
Embedded Documents
Identifies whether a document has been embedded into your document.
Macros, Forms and ActiveX Controls
Identifies macros, forms and ActiveX controls that will travel with the document.
Collapsed Headings
Identifies text that is hidden under collapsed headings.
Custom XML Data
If your organisation uses customised XML data, this could hold information that will travel with the document. If the Inspector finds XML in your document and you are unsure what it is, you should speak with your IT department.
Headers, Footers and Watermarks
Information in headers and footers, and any watermarks. In most cases we would not need to remove these.
Invisible Content
Objects that are not visible because they have been formatted as invisible, e.g. via the Selection Pane.
Hidden Text
Removes any text that has been formatted as Hidden Text via the Font dialog box. This doesn’t include text that has been hidden by other methods, e.g. white font or placed behind a picture.