XML Parsing Errors During FTP Upload to the Internet Archive

One of the most common stumbling blocks I’ve come across when uploading ebooks to the Internet Archive via FTP has been the following XML parsing error:

Two common problems can trigger this error when you submit your GET request to have the item ingested and derived for access in the Internet Archive.

Problem One: Ampersands. I am not an expert in XML, but from what I’ve been able to read, the ampersand (“&”) is a reserved character in XML: it marks the start of an entity reference, so a bare ampersand cannot be parsed as plain text. If you have an ampersand in your metadata, simply replace it with the word “and”; problem solved. I run into this problem most frequently when uploading early 20th-century books that have an ampersand in the transcribed <publisher_original> field.
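If you would rather keep the ampersand than spell it out, standard XML lets you write it as the entity &amp; instead. Here is a minimal Python sketch of that idea, using the standard library's xml.sax.saxutils.escape. The helper name make_field and the publisher value are my own placeholders for illustration, and I have not confirmed how the Internet Archive displays escaped ampersands, so treat this as an option to test rather than a guaranteed fix.

```python
from xml.sax.saxutils import escape


def make_field(tag, value):
    """Wrap a plain-text value in an XML element, escaping &, < and >."""
    # make_field is a hypothetical helper, not part of any Internet Archive tool.
    return f"<{tag}>{escape(value)}</{tag}>"


# Placeholder value standing in for a transcribed publisher name.
print(make_field("publisher_original", "Harper & Brothers"))
# prints: <publisher_original>Harper &amp; Brothers</publisher_original>
```

An XML parser reads &amp; back as a plain “&”, so the transcription stays faithful to the title page.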

Problem Two: Incorrect XML tags. I had planned to write up this quick explainer because this is a problem I’ve run into a few times; as it happened, while creating the metadata for this upload I unintentionally introduced a real tag error that needed fixing. I managed to insert a “1” in the third opening <subject> tag without even realizing it. It took my intern and me about 20 minutes to catch the problem, as I was using this upload as a sample to show her how to upload materials to the Internet Archive via FTP, and she was the one who caught it (Thanks, Noelle!). Double-check your tags to ensure that the spelling is correct and no additional characters have been introduced.
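A quick way to catch this kind of slip before uploading is to run the metadata through an XML parser on your own machine. The Python sketch below uses the standard library's xml.etree.ElementTree for that. The function name check_well_formed, the <metadata> wrapper element, and the subject values are assumptions made for illustration; the sample only imitates the stray “1” described above and is not my actual file.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message names the problem and the line/column where it was detected.
        return str(err)
    return None


# Hypothetical metadata with a stray "1" in the third opening subject tag.
sample = """<metadata>
  <subject>History</subject>
  <subject>Travel</subject>
  <subject1>Maps</subject>
</metadata>"""

problem = check_well_formed(sample)
print(problem if problem else "No XML errors found")
```

The same check also flags a bare ampersand, since the parser rejects that as not well-formed. It only tests that the XML is well-formed; it will not tell you whether a correctly spelled tag is one the Internet Archive expects.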

If you’ve done the above, and avoided any other problems, then you should see the following and can rest easy knowing you’ve successfully contributed one more item to the cultural record.