lxml is an awesome Python tool for reading and editing XML files, and we’ve been using it extensively during the grant period to do programmatic cleanup of our legacy EAD files. To give an example of just how powerful the library is: late last week we ran a script that made tens of thousands of edits across all of our ~2800 EAD files, and it took all of 2 minutes to complete. This would have been an impossible task to complete manually, but lxml made it easy.
We want to share the love, so in this post we’ll walk through how we use the tool to make basic XML edits, and explore some of the pitfalls and caveats we’ve encountered along the way.
http://archival-integration.blogspot.com/2015/10/tools-for-programming-archivist-ead.html