CONTENTdm: Uploading via Tab-Delimited Text File

This post provides an overview of how to upload resources with the CONTENTdm (CDM) Project Client using a tab-delimited text file (TDTF), along with some common errors to avoid during this process. This whole blog post is written assuming a Windows environment, sorry Mac users. I also apologize that the section discussing errors doesn’t provide examples of the specific error messages, but addressing these most common problems should hopefully get you to a solution.

Uploading Using the TDTF

Importing using the TDTF is available when selecting the option to upload multiple simple digital objects:

Adding multiple items in CDM

Or while uploading complex objects, after selecting “Object List” from the list of methods to add resources:

Adding Complex Objects in CDM
Selecting the TDTF for Complex Object Upload

Generating a Tab-Delimited Text File

Considering how simple the TDTF is as a data encoding option (it’s just a .txt file), it can be surprisingly tricky to create one correctly. Internally, all of our metadata is generated in an Excel Workbook (.xlsx) file. The headings for each column correspond to the metadata application profile created in CONTENTdm ahead of time. While it is possible to simply use the “Save As” option inside Excel itself to save as a TDTF, I highly recommend avoiding this route. Through some quirk of programming, if any of your metadata fields contain quotation marks, for instance a description such as:

This portrait depicts John "The Barbarian Librarian" Dewees while sitting at the reference desk of the Local History and Genealogy department of the Toledo Lucas County Public Library on July 1, 2020.

The resulting TDTF will instead contain an extra set of quotation marks that will ultimately be uploaded to CONTENTdm and visible to the public:

This portrait depicts John ""The Barbarian Librarian"" Dewees while sitting at the reference desk of the Local History and Genealogy department of the Toledo Lucas County Public Library on July 1, 2020.

I’m assuming anyone reading this takes pride in high-quality and clean metadata, and the resulting public display will drive you to distraction as much as it has me. Once the doubled quotation marks are in the public display, they’re also very difficult to rectify, as the Search-and-Replace tool in the web administrator version of CDM isn’t effective at removing them; it will be a manual process to get rid of them. Learn from my mistakes.
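If you already have a file that went through “Save As”, the doubling can be undone before it ever reaches CDM. Here is a minimal Python sketch, assuming Excel used the usual CSV-style quoting (the whole field wrapped in quotation marks, with the inner ones doubled). The function name and the sample row are made up for illustration.

```python
import csv
import io


def unquote_excel_tdtf(text):
    """Undo CSV-style quoting in tab-delimited text saved by Excel."""
    # newline="" keeps line endings intact so the csv module handles them itself.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    # Re-join every parsed row with plain tabs and no quoting at all.
    lines = ["\t".join(row) for row in reader]
    # Windows line endings between records, no trailing blank line.
    return "\r\n".join(lines)


# Hypothetical sample of what Excel's "Save As" might produce.
excel_output = (
    'Title\tDescription\r\n'
    'Portrait\t"John ""The Barbarian Librarian"" Dewees at the desk."\r\n'
)
print(unquote_excel_tdtf(excel_output))
```

This only helps with the text file itself; it does nothing for records that are already live in a collection.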

All that being said, there is luckily a very simple and straightforward method for correctly generating a TDTF from an Excel Workbook. Simply highlight all of the cells needed, including the header row with your metadata field names, copy them, and paste them into a text editor. For instance:

Highlight and then copy / CTRL+C in Excel…
…and then Paste / CTRL+V in a Text Editor

Displayed above is Notepad++ which is my choice of text editor generally as it’s robust enough to be useful and provide extra utility over something like the Windows standard Notepad, but not so complicated that it gets overly intimidating.

Finally, the file needs to be saved in order to be imported into CDM, and this introduces an extra bit of weirdness. Ultimately we will be importing our metadata as a plain text file, but even if you choose that option during the Save dialogue in Notepad++ (the first option in the file format dropdown selector) the file will not import properly into CDM. Instead save the file without even bothering to select a format in Notepad++ (“All Types”) and instead just manually add “.txt” to the end of the filename wherever you’ve saved the file:

Manually adding .txt to the filename

Once completed, this TDTF will be ready to import into the CDM Project Client.
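For anyone whose metadata already lives in a script rather than a spreadsheet, the same kind of file can be produced directly. This is a rough sketch and not part of my workflow above: the function names and field names are hypothetical, and UTF-8 is an assumption about the encoding, so use whatever your collection expects.

```python
def build_tdtf(rows):
    """Join rows of cell values into tab-delimited text."""
    lines = []
    for row in rows:
        for cell in row:
            # A tab or line break inside a cell would corrupt the column layout.
            if any(ch in cell for ch in "\t\r\n"):
                raise ValueError(f"cell contains a tab or line break: {cell!r}")
        lines.append("\t".join(row))
    # Windows line endings between records, and no trailing blank line.
    return "\r\n".join(lines)


def save_tdtf(rows, path):
    """Write the rows to a .txt file without altering the line endings."""
    # newline="" stops Python from translating the \r\n already in the text.
    # UTF-8 is an assumption; adjust if your setup expects something else.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(build_tdtf(rows))


# Hypothetical field names; the file or directory name sits in the last column.
rows = [
    ["Title", "Description", "File Name"],
    ["Portrait", 'John "The Barbarian Librarian" Dewees', "portrait.tif"],
]
print(build_tdtf(rows))
```

Because the cells are joined with plain tabs and nothing else, quotation marks pass through untouched.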

Errors to Watch Out For

The CDM Project Client is nothing if not finicky, and there are plenty of problems to look out for when uploading using a TDTF. I’ll address these in order of complexity, starting with the easiest to fix:

Empty Lines in TDTF
The first one can be found in a prior screenshot:

Empty lines in a TDTF will always throw up an error

Typically when copying over from Excel, a blank line will be added to the bottom of the TDTF. Always make sure to delete this, as the CDM Project Client will invariably throw up an error and reject your input of new records.
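Deleting the line by hand is easy enough, but if you are processing a lot of files, a few lines of Python can do the same cleanup. A minimal sketch with a hypothetical function name; it treats a line containing only spaces or tabs as blank too.

```python
def drop_blank_lines(text):
    """Remove empty or whitespace-only lines from tab-delimited text."""
    # strip() also removes tabs, so a line made only of tabs counts as blank.
    kept = [line for line in text.splitlines() if line.strip()]
    # Re-join with Windows line endings and no trailing blank line.
    return "\r\n".join(kept)


pasted = "Title\tFile Name\r\nPortrait\tportrait.tif\r\n\r\n"
print(repr(drop_blank_lines(pasted)))
```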

Incorrect File Names
Some of my most frequent issues simply arise out of filename inaccuracies. Look for this in particular if some, but not all, of your records get rejected on import; there is a good chance that for whatever reason the filenames aren’t matching up the way that CDM expects. This tends to be most common for me when importing multiple simple digital objects, as opposed to complex objects.

Another general-purpose but related file naming point is ensuring that the last column of your metadata has either the file or directory names for whatever you are importing into the Project Client. This is covered in most of the OCLC CONTENTdm training but it bears repeating; you won’t get very far in your work if the file names are buried in a center column of your spreadsheet.
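A quick way to catch mismatches before the Project Client does is to compare the last column of the TDTF against what is actually in the folder. This is a sketch under a few assumptions: the first line is the header row, the last column holds the file or directory name, and names must match exactly. The function name is hypothetical.

```python
from pathlib import Path


def find_missing_files(tdtf_text, directory):
    """Return (line number, name) pairs whose last column matches nothing in directory."""
    # Names of everything directly inside the folder (files and subfolders).
    present = {entry.name for entry in Path(directory).iterdir()}
    missing = []
    lines = tdtf_text.splitlines()
    # Skip the header row; count lines from 1 to match a text editor.
    for line_no, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # blank lines are a separate problem
        name = line.split("\t")[-1].strip()
        if name not in present:
            missing.append((line_no, name))
    return missing
```

Calling `find_missing_files(text, folder)` returns an empty list when every name lines up; anything else is a record worth checking for typos, stray spaces, or a wrong extension.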

Line Breaks and Extra Tabs In Metadata Fields
Another common error is extra tabs and line breaks hiding inside metadata fields. For instance, if you head to one of your public CDM collections, copy a field, and paste it into Excel, it may look perfectly normal but actually be hiding a line break that needs to be removed. I.e., text copied from websites may bring its own formatting along for the ride.

This may look like a normal field and identical to the rest…
…but when looking at the field we see there are two extra line breaks that shouldn’t be there.

This issue in particular can be a very opaque one, as looking at the spreadsheet potentially won’t display any problems and staring at the wall of text in the TDTF can be very hard to parse. It may take going through cell by cell to identify the culprit.
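One way to narrow the search is to count columns. A hidden line break splits a record across two lines, each with too few columns, while an extra tab gives a line one column too many. Here is a minimal sketch, assuming the first line is the header and every record should have the same number of columns; the function name and sample fields are hypothetical.

```python
def find_misaligned_rows(tdtf_text):
    """Return (line number, column count) for lines that differ from the header."""
    lines = tdtf_text.splitlines()
    expected = len(lines[0].split("\t"))
    problems = []
    for line_no, line in enumerate(lines[1:], start=2):
        count = len(line.split("\t"))
        # Blank lines count as one column, so they get flagged as well.
        if count != expected:
            problems.append((line_no, count))
    return problems


sample = (
    "Title\tDescription\tFile Name\n"
    "Portrait\tA por\n"            # hidden line break inside Description
    "trait\tportrait.tif\n"
    "Map\tA map\tmap.tif"
)
print(find_misaligned_rows(sample))
```

The line numbers match what a text editor shows, which at least tells you which record to inspect. A stray line break at the very end of the last column won’t change the column count of its own line, so this is a first pass rather than a guarantee.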

Image File Size
This is more of a corner case, but an issue that has cropped up when attempting to add really large and high resolution digital objects such as maps and architectural drawings. The Project Client throws up an error message when it runs out of memory, meaning specifically that the asset files are too large. Troubleshooting this is a weird experience too as the CDM Project Client recommends shutting down other programs to free up RAM, but that very rarely fixes the issue for me. I don’t even have a specific file size I would recommend, though keeping individual files at 10 MB in size or less will yield better results. Also note that I say individual files; a complex object made up of 9 MB files should still work ok.

To address this I’ve created actions in Photoshop to decrease image size to a certain percentage. So occasionally what I’ve done is reduce a large file to 90% total size, try and upload, see if it works and if it still shows an Out of Memory error, reduce the new image to 90% total size again, try and upload, etc… until the file is small enough to get online.
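To find candidates for shrinking before the Project Client complains, a short script can list the files in a folder that are over the 10 MB rule of thumb. This is only a sketch: the function name is hypothetical, and the limit is my rough guideline from above, not a documented CONTENTdm threshold.

```python
from pathlib import Path

LIMIT_MB = 10  # rule of thumb from this post, not an official limit


def find_large_files(directory, limit_mb=LIMIT_MB):
    """Return (name, size in MB) for files in directory larger than limit_mb."""
    limit_bytes = limit_mb * 1024 * 1024
    large = []
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file():
            size = entry.stat().st_size
            if size > limit_bytes:
                # Report the size in MB, rounded to one decimal place.
                large.append((entry.name, round(size / (1024 * 1024), 1)))
    return large
```

It checks files one at a time and does not look inside subfolders, since the size of each individual file is what seems to matter.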

Optical Character Recognition
This one is honestly a pretty devious one. If you are importing large amounts of text in need of OCR and see an error along the lines of “The Project Client encountered an unexpected error during OCR” the first thing I would recommend checking is your OCR license limit in the CONTENTdm Project Client:

The OCR option in the CDM menus
If this reads “0 of 10000 pages remaining this month.” you’ve found your problem

Why the CDM Project Client doesn’t just tell you that your license for the month has run out, I have absolutely no idea. Instead it throws up an error that suggested (to me at least) that there was something funky going on with the image itself, not with my organizational OCR license. Running through the license this quickly can be caused, for instance, by uploading Large-Size/Low-PPI images, which drain your OCR license incredibly fast. Again, learn from my mistakes.

Hope any of this is at all useful to folks out there.