Quick-and-Dirty Email Archiving

I was charged by my organization’s executive director with preserving the emails that he has been regularly sending to staff members to keep us up to date on everything that has been happening during the COVID-19 pandemic. Born-digital records continue to be an area where I need professional development (using some of that stimulus money to start working on my DAS looks better and better), so this was an interesting challenge.

First of all, this is not a workflow for systematic email archiving in a records retention/bulk fashion. All I was trying to do was get my executive director’s emails out of Office 365 and into a PDF. This is a document collection and not a real email archive.

When first trying to turn these into PDFs I just used the print dialogue in the web browser version of Outlook to print off the emails as PDFs. That worked, sort of. Only the text gets rendered, and all links have their hypertext stripped out. If possible I wanted those saved as well. The one nice thing is that it does print off the header information with the document.

I use my own laptop for the majority of my work and have been using the stock Windows email client for email. When a message is highlighted the “three dot” menu (inside the red box below) offers a “Save As” option.

Here you can save individual messages (no bulk option) as a *.eml file. The other major tool in this workflow was Microsoft Word, which is conveniently able to open *.eml files. Each file then becomes a document like any other you would open in Word. The only problem is that, once the *.eml file was open in Word, the header information did not appear in any way that I was able to find, so I ended up copying and pasting information back into the document to recreate the From:, To:, Subject: and date/time stamp. From there I used Word’s “Save as Adobe PDF” functionality to actually turn these into PDFs for long-term preservation with functional hypertext links.
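As a side note on those missing headers: an *.eml file is normally a plain-text message in the standard internet mail format, so the header fields may well be sitting in the file even though they do not show up in Word. The following is a minimal Python sketch, not something from the workflow above, that reads the four fields I was retyping by hand. It assumes the saved file contains ordinary From, To, Subject and Date headers (this has not been verified against the Windows mail client), and the function name `read_header_block` is made up for illustration.

```python
from email import policy
from email.parser import BytesParser


def read_header_block(eml_path):
    """Return the From, To, Subject and Date headers of an .eml file as text.

    Hypothetical helper: assumes the file is a standard internet mail
    message whose headers were kept when it was saved.
    """
    with open(eml_path, "rb") as handle:
        # headersonly=True stops parsing after the header section
        message = BytesParser(policy=policy.default).parse(handle, headersonly=True)

    lines = []
    for name in ("From", "To", "Subject", "Date"):
        value = message.get(name)  # None when the header is absent
        lines.append(f"{name}: {value if value is not None else '(not present)'}")
    return "\n".join(lines)
```

Calling `print(read_header_block("message.eml"))` on a saved message (the filename here is only a placeholder) would give a four-line block that could be pasted at the top of the Word document instead of being rebuilt by hand.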

So that’s one kludgey way to go about this, but there are of course any number of options here. In the course of writing this blog post I started playing around in the real Outlook Windows client as well, which again provides inconsistent results but can get us everything we are looking for. The “Print to PDF” function works well in that it nicely maintains the header information, but it again flattens out the URLs and breaks them.
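If a printed PDF with flattened links is all you have, one way to keep a record of where those links pointed is to pull the URLs out of a saved *.eml copy of the same message. This is only a rough sketch under a few assumptions: the message has an HTML body, the links are ordinary `<a href="...">` tags, and the names `LinkCollector` and `list_links` are invented for this example.

```python
from email import policy
from email.parser import BytesParser
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def list_links(eml_path):
    """Return the link targets found in the HTML body of an .eml file.

    Hypothetical helper: returns an empty list when the message has
    no HTML body at all.
    """
    with open(eml_path, "rb") as handle:
        message = BytesParser(policy=policy.default).parse(handle)

    body = message.get_body(preferencelist=("html",))
    if body is None:
        return []

    collector = LinkCollector()
    collector.feed(body.get_content())
    return collector.links
```

The resulting list could be saved alongside the PDF as a plain text file, which at least documents the link targets even when the PDF itself no longer has clickable links.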

I’ve always avoided the “Save to Adobe PDF” function in Outlook because it saves in a portfolio format, which is annoying if you simply want to save one document. However, this function actually turns out to be the key. Highlight all the messages you want to save, enter the “File” menu, and click that “Save to Adobe PDF” button to get a portfolio of all the documents.

Next you have to work in Adobe Acrobat to split the portfolio into a bunch of individual documents. Once you open the portfolio, click on the “Open Document” option with a specific email selected. This will break that particular item out into its own tab in Acrobat.

Once you have that document open in its own tab, you can open the “File” menu and select “Save As” to save the individual email as its own document. This option provides the best of all possible worlds, in that you get intact HTML markup with links that can be clicked as well as a computer-formatted header block instead of trying to create one manually. Hooray for learning!