Author Archives: Chris Kemp

“Draw Back the Curtain” Now Online

DTP began working with UR Hillel and Richmond’s Jewish Family Services a few years ago to digitize materials and build a project website in support of “Draw Back the Curtain,” a student-directed documentary.  At the request of JFS, we recently linked the award-winning film to the site and it is now openly available and free to watch.

From the project’s about page:

After more than 70 years of closed borders, the former Soviet Union allowed more than one million Jews to immigrate to America and Israel in the late 1980s. American activism under Operation Exodus had a large part in this change in policy and the Richmond Jewish community in the resettlement of 800 refugees.

Twenty-five years later, Jewish Family Services and the University of Richmond Hillel are creating a permanent collection of the experiences and memories of the families who immigrated and the community volunteers who welcomed them.

“Draw Back the Curtain,” a feature-length documentary film, is the culmination of three years of student driven research and interviewing of immigrants and resettlement volunteers. The larger project includes multiple museum exhibitions, and an upcoming digital archive.

This project is a great example of the library working with groups on and off campus to create something meaningful to the Richmond community and beyond.  Take some time to browse around the site, check out the exhibits, and watch the film.

Digital Toolbox: Omeka

You know that go-to tool in your toolbox that you just can’t go without? The one you always seem to use no matter what job you start? The one you preach to your friends about? The one you seem to use in unorthodox ways? The tool which, if absent, dooms a project to failure? (That’s a stretch – there’s always a way!)

In the digital collections/humanities/content world at Boatwright, Omeka has become that tool. It’s an open-source web publishing platform that is to creating content-rich cultural heritage online experiences as WordPress is to building blogs. Omeka is built using open and widely adopted frameworks, and its flexibility, large user and developer communities, and host of add-ons make it a low-barrier joy to work with. We in the library have used it for student- and faculty-driven projects, external partnerships, and for building our own thematic sites.

Our work with Omeka started some years ago with a small exhibit focused on the history of football at the University of Richmond. The launch of UR Football Comes Home was synced with the opening of the new Robins Stadium on campus.

UR Football Comes Home

BML’s For the Centuries site, released as a celebration of UR’s first century at its suburban campus, is one of our larger projects and made use of several plugins for the first time (for us at least), namely Neatline and Exhibit Builder.

For the Centuries

The Fight for Knowledge captures content produced through an ongoing series of undergraduate courses taught by Dr. Laura Browder and Dr. Patricia Herrera. The site incorporates student multimedia projects alongside archival content.

The Fight For Knowledge

The Historian’s Workshop, another ongoing project, is a collaborative faculty/student/staff project which focuses on the Congressional Papers of Watkins Moorman Abbitt, which are housed in Boatwright Library’s Special Collections. Read more about our work with Dr. Nicole Sackley and the course which launched the site on our blog here.

The Historian’s Workshop

Discovery, Technology and Publishing supports several Omeka sites while not maintaining responsibility for the content. Dr. Jeannine Keefer’s Urban Campus site, which features Neatline exhibits, is among these.

Urban Campus

Draw Back the Curtain, a collaborative project with Richmond’s Jewish Family Services, features images digitized by Discovery, Technology and Publishing.

Draw Back the Curtain

A Pilgrim’s Progress is the first complete catalog of Winsor McCay’s early 20th-century comic of the same name. Its content is maintained by Kirsten McKinney (GC ’15) while DTP maintains the site. Read more about McKinney and her work on our blog here.

A Pilgrim’s Progress

We’re building some skills and experience in working under Omeka’s hood, too. Focusing primarily on making theme-based customizations, we’ve identified new areas in which to build skills (primarily PHP coding, but also revision control, a most useful way to keep track of code changes and to recover from the inevitable failures). Our team has also streamlined the process of launching a new site on Amazon Web Services, and is investigating the ability to bring up a site in a fully automated fashion.

In the end, though, Omeka is just a tool, even if it is extremely flexible and easy to use. You need to have the skills, vision and resilience – not to mention the content – to make it suit your needs. Our team here has those traits, and we’ll be releasing more Omeka-based projects in the near future – keep watch on the library’s website and this blog for announcements!

Note: several staff members from Boatwright will be presenting on their Omeka projects at this week’s Virginia Chapter of the Association of College & Research Libraries (VLACRL) spring meeting. Under the title Omeka and More: Web Publishing, Digital Collections, and Online Exhibits, Jeannine Keefer will present on her Urban Campus site, and Crista LaPrade and Angie White will discuss our recent For the Centuries project. Many thanks to all three for representing UR and BML at the meeting!

DTP in Classroom Collaborations

We (Leigh McDonald and Chris Kemp) were given the opportunity to be involved in an undergraduate class last semester: The Historian’s Workshop, taught by Dr. Nicole Sackley. The course immersed students in the worlds of archives, digital libraries, museums and public history. The students were each placed in the roles of researcher and expert while working with one of Boatwright’s largely unprocessed archival collections, the Congressional Papers of Watkins Moorman Abbitt. Each of the eleven students was assigned a box of archival materials from the collection to work with, and Lynda Kachurek (Head of Rare Books and Special Collections) instructed them on archival processing methods. The students read and examined all of the documents in “their box,” and selected representative materials to describe and display in an online exhibit.

That’s where Discovery, Technology and Publishing came in. We digitized materials and launched an Omeka site to present them. We also presented three metadata workshops to the students, which focused on how to examine a document’s contents, effectively describe it, and upload it into Omeka. Since the students were the experts on their particular materials, there would be no one better equipped to provide in-depth descriptions of each item. Leigh and Chris randomly selected a document already digitized from those chosen by the students, worked through the Dublin Core metadata fields as examples, prepped our materials and headed into the workshops feeling prepared for anything. That randomly selected document turned out to be a much better lesson for the students and for us than we had imagined.

Letter from W. E. Skelton to W. M. Abbitt

The document above is the one we chose. It seems pretty simple on the surface – a piece of correspondence between a constituent and his congressman regarding the work of the Agricultural Extension Service agents in his district – and we suggested describing it accordingly. An attached report described a rat control campaign in the Hampton Roads area and included statistics on the rat population in the U.S. Therefore, the first Library of Congress Subject Heading we suggested was, of course, “Rats”.

During the workshop, however, Professor Sackley asked the student why he chose this particular document and his response was enlightening. Because he had gained some perspective on Congressman Abbitt and his tenure from studying the documents in his box, he read the document as a rather elegant but subtle plea regarding the need for African-American workers in his division, not just an informational letter about pest control. The line, “We are extremely limited in staff and cannot be ‘all things to all people,’” was the hint. Based on this, it was then possible to complete the metadata description more accurately by adding terms such as: African American agricultural extension workers and Virginia Polytechnic Institute’s Agricultural Extension Service. We would not necessarily have picked up on the full significance of the document without the student’s input, which illustrates the analytical skills the students brought to the table when selecting documents from the collection. We are sure that similar conversations could have occurred regarding many other documents in the collection.

Given the subject matter of this archival collection and the course readings, the students expected to find much on the topic of massive resistance to school integration in Virginia, but they discovered so much more. Students uncovered numerous interesting documents, including an exchange between Abbitt and then-Texas senator Lyndon Johnson, a letter from a high school senior regarding the statehood of Hawaii, and a pamphlet listing the names of supposed communists in Hollywood, California. All of these findings brought to life the people and the historical period, and gave the students a perspective on the times that would have been absent without access to the primary sources.

The work the students did last fall was impressive on many levels, but it only scratched the surface of what the Abbitt Papers contain. Abbitt was a congressman from 1948 to 1973, and his archival collection is made up of 285 boxes of material. Fortunately, Professor Sackley is going to teach the course once more in the fall of 2015. We in DTP are looking forward to getting back into the classroom with students once again, and learning right along with them.

See the fruits of the students’ labor at the course’s dedicated Omeka site, which is publicly available but still “in the workshop”: http://historiansworkshop.richmond.edu

Hear directly from the students at the Historian’s Workshop blog: http://blog.richmond.edu/historiansworkshop/

And read a feature news story about the course on the UR Website: http://news.richmond.edu/features/kp4/article/-/12356/the-historians-workshop-students-learn-about-archiving-and-digital-collections-in-hands-on-history-course.html

Written by Chris Kemp and Leigh McDonald

A New Exhibit: The 1914 Campus in 3D

Taking a look at the items and exhibits included in our centennial project, For the Centuries, visitors will discover that we uncovered and aggregated a wide range of materials for the site. While many of the digital objects on the site tell stories or have special significance all by themselves, other objects and data needed a bit of interpretation. Take graduate hometown data, for example: a spreadsheet of dates and places doesn’t say much, but if the locations are mapped and interactive as they are in our hometowns exhibit, patterns of student geographic distribution can easily be seen over time. This post is about another example of such interpretation – the conversion of a number of physical items into digital files, and the creation of something new.

The good folks at the Virginia Baptist Historical Society pointed us toward an undated topographic survey map of the campus area. Based on the building footprints present on the map, we believe that it dates to 1911, the year following Ralph Cram’s initial General Plan.

A portion of the campus area topographic map at the VBHS. While many of the footprints here represent buildings that were not constructed, North Court can be picked out on the left, and Ryland Hall is at the bottom center.

We quickly realized that this single item provided the foundation for something impressive, and that when combined with data from other materials we’d gathered from University Facilities and elsewhere, we’d be able to use it to generate a three-dimensional model of the 1914 campus, complete with the initial buildings. Three departments in Information Services had the expertise and ability to work together to pull this off: Discovery, Technology and Publishing (DTP) in Boatwright Memorial Library; the Center for Teaching, Learning, and Technology (CTLT); and the Digital Scholarship Lab (DSL).

Production of the model involved a variety of techniques and technologies. The topographic map, blueprints and photographs were imaged by DTP staff using a Phase One P65+ digital back. Students and staff in DTP and DSL then worked together to digitize the map’s topographic lines and render an elevation map file using ArcGIS. The blueprints and photographs provided the information needed to create three-dimensional models of the campus’ buildings using SketchUp. (Be sure to check out this post, by Justin Madron of the DSL, about the techniques used to accomplish this.) In the CTLT, the elevation map and building models were merged into a single 3D object using SketchUp Pro and printed on a 3D Systems ProJet 460Plus printer.

Several student employees contributed in important ways to this project. Stefan St. John (DSL) georectified the maps used for this project. Jackie Palmer (DTP) digitized the survey map’s topographic lines and campus features. Jackie and Lily Calaycay (DSL) worked together to model the campus buildings from data embedded in source documents. Selmira Avdic, Francisco Cuevas, Lisa Hozey and Umurcan Solak (CTLT) assisted with the 3D printing and tile finishing process.

The completed model, now on display on the second floor of Boatwright Library, depicts the campus as it was on opening day in 1914, and serves to demonstrate the relative scale of the buildings and topography of the grounds. Reproductions of contemporary photographs of each building are distributed around the model. Come by Boatwright to see the results of our collaboration.

The completed model is displayed on the second floor of Boatwright Memorial Library.

Also visit the library’s centennial celebration site, For the Centuries, at http://centuries.richmond.edu.

Photos by Angie White and Nate Ayers.

How you, dear reader, can help correct bad OCR

We have a problem that only human eyes can solve. Yours can help.

Here’s some background. In Discovery, Technology and Publishing, we use optical character recognition (OCR) software to extract text from document images in order to make them machine-readable and searchable. In simple terms, the OCR process works through a bit of binary “yes/no” logic – either something exists in a given place, or nothing does. No matter what kind of image you put into the software (color, grayscale, whatever), the application creates a temporary black and white version. That is the version to which the “yes/no” operation is applied – the resulting pixel patterns in the image are compared to “known character” patterns. Different software packages use different logic, but in the end all those “known characters” get put together and output to a text file – or something similar.
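To make that “yes/no” idea concrete, here is a toy sketch in Python. It is not how any particular OCR package works: the 5x3 letter patterns, the threshold of 128, and the function names are all assumptions made up for illustration. It only shows the two steps described above, reducing an image to black and white and then comparing the result to “known character” patterns.

```python
# Toy illustration only; real OCR engines are far more sophisticated.

# Hypothetical 5x3 "known character" patterns (1 = ink, 0 = blank paper).
TEMPLATES = {
    "S": ["111", "100", "111", "001", "111"],
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
}


def binarize(gray, threshold=128):
    """Turn a grid of 0-255 gray values into yes/no pixels (dark = ink)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def score(bits, template):
    """Count the pixels where the black and white glyph agrees with a template."""
    return sum(
        bit == int(ch)
        for row_bits, row_tpl in zip(bits, template)
        for bit, ch in zip(row_bits, row_tpl)
    )


def recognize(gray, threshold=128):
    """Return the known character whose pattern agrees with the most pixels."""
    bits = binarize(gray, threshold)
    return max(TEMPLATES, key=lambda name: score(bits, TEMPLATES[name]))


# A cleanly scanned "L": dark ink (20) on light paper (230).
clean_l = [
    [20, 230, 230],
    [20, 230, 230],
    [20, 230, 230],
    [20, 230, 230],
    [20, 20, 20],
]
best_guess = recognize(clean_l)
```

With a clean image the best match is unambiguous. If every pixel in a frame is dark, as in a badly underexposed microfilm image, the black and white version is solid ink and the toy matcher can only pick whichever pattern happens to contain the most ink, which is a small-scale version of the trouble described later in this post.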

A black and white rendering of text from a Tokyo War Crimes Trial document. Your eyes can tell what most of these words are, but trust me – a machine is going to have a rough time.

In the past we’ve done a variety of things with these files – from loading the pure text content into searchable database fields (as in a previous implementation of our America at War collection), to embedding the text within image files (the Student Research portions of the UR Scholarship Repository), and applying extensive XML markup to historical documents, enabling customized searching and manipulation of information (see our site focused on the published Proceedings of the Virginia Secession Convention). For folks who are dedicated to going paperless, there are plenty of OCR applications available for mobile devices, too.

OCR is a great tool, but the technology has limitations. Depending on the printing process that created an original document, a capital S might look a bit like the numeral 5 as a result of artifacts on the paper, a smudge of ink, or damaged type. The type of original materials we’re working with makes a difference, too: the high-resolution camera we use to digitize rare materials at Boatwright results in fantastic images, but the best camera on the planet can’t change the fact that microfilm is, well, microfilm. It’s a great format for preserving content, but a lousy medium from which to digitize. Occasionally, microfilm is all we have to work from.

Exposure problems during the microfilming process have a lasting impact on the usability of the images. Much of the text, particularly in the underexposed document to the right, is unreadable to an OCR application.

Take our Collegian collection, for example. As part of UR’s 175th anniversary about 10 years ago, the full run of the student newspaper, the Collegian, was digitized. Most of these issues existed only on seldom-used reels of microfilm rather than paper, and, as a result of the age of the papers when they were initially microfilmed, many of the resulting images were not ideal for OCR purposes. The software knew that there were characters in the images provided, but oftentimes the resulting text was way off base. If you’ve ever tried to identify long-passed family members in old, faded photographs, you have an understanding of what the OCR software is going through: you know that the person you’re looking for is there – recognizing them among the crowd is the issue. Take that one step further by attempting to identify every individual, and you’ll have an idea of the computational difficulty that the OCR process can sometimes face.

The 5th Marine Regiment in front of the US Capitol in 1919: Great-great-grandpa – where are you?

Fast-forward to 2014, and our Collegian collection is still online – in fact, among our digital collections, the Collegian regularly receives the highest volume of traffic. The difficulty with OCR remains, though we’ve recently incorporated a mechanism which allows users to correct the text output of the OCR process. The changes made to the underlying text files are reindexed and searchable immediately upon saving – talk about instant gratification.
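For the curious, the “reindexed and searchable immediately” behavior can be pictured with a very small Python sketch. This is not our vendor’s implementation, which we have not seen; the class name, the page identifier, and the sample text are all hypothetical. It simply shows a word index being updated at the moment a correction is saved.

```python
import re
from collections import defaultdict


class PageIndex:
    """Hypothetical word index in which saving a page reindexes it at once."""

    def __init__(self):
        self.text = {}                    # page id -> current text
        self.postings = defaultdict(set)  # word -> ids of pages containing it

    @staticmethod
    def _words(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def save(self, page_id, text):
        # Drop the index entries for the old text, then index the new text.
        for word in self._words(self.text.get(page_id, "")):
            self.postings[word].discard(page_id)
        self.text[page_id] = text
        for word in self._words(text):
            self.postings[word].add(page_id)

    def search(self, word):
        return sorted(self.postings.get(word.lower(), set()))


index = PageIndex()
index.save("issue-1919-p1", "5tudent 5enate rneets")  # made-up bad OCR
before = index.search("student")                      # finds nothing yet
index.save("issue-1919-p1", "Student Senate meets")   # a user's correction
after = index.search("student")                       # now finds the page
```

The point of the design is that the old, incorrect words are removed from the index in the same step that adds the corrected ones, so a search never has to wait for a separate rebuild.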

So if you’re someone who is interested in the history of the University of Richmond from the students’ perspective, I invite you to contribute a little bit of time to enhance this collection. Simply click the image below, then the “Register” link at the top of the collection home page to get started.

April Update

Stats for April 2014

  • Materials cataloged/awaiting cataloging: 4,750/381
  • Catalog records revised: 4,403
  • Page images digitized: 2,610
  • Still images digitized: 576
  • Library catalog visitors/page views: 14,443/64,971
  • Library catalog searches: 26,082
  • Digital collection visitors/page views: 3,171/10,020

Project Snapshot

UR Scholarship: 77 more theses and undergraduate honors papers were deposited into the Scholarship Repository in April. In our ongoing retrospective digitization effort, 20 more papers (894 pages) were imaged and prepped for uploading. This work will continue during the summer.

Centennial Exhibit: In big news, we have determined a title for our project. It comes from an article Dr. Boatwright wrote for the Religious Herald in 1910 shortly after the Richmond College Board of Trustees approved the purchase of land at Westhampton for the new site. (See Angie’s great post for more information about the Religious Herald.) After an eloquent description of the varied landscape and features of the area, Dr. Boatwright writes: “Amid such surroundings we plan to build for the centuries. May our twin Colleges soon crown the western heights above the river and the lake!” As tribute to Dr. Boatwright’s vision and leadership in bringing the colleges to our present location, we have titled the site “For the Centuries.” Work continues on the project as a whole.

Draw Back the Curtain: Our department has been supporting this ongoing documentary project for some time, and in April we digitized an additional 256 items, which brings the imaging portion of our work to a conclusion. In the meantime we’ve installed an Omeka instance for the project team to work with during the summer – they will be using it in parallel with the museum exhibits and documentary, bringing the stories of Jewish immigrants to Richmond from the former Soviet Union online.

March Update

Stats for March 2014

  • Materials cataloged/awaiting cataloging: 2,855/461
  • Catalog records revised: 3,858
  • Page images digitized: 2,344
  • Still images digitized: 537
  • Library catalog visitors/page views: 11,544/58,273
  • Library catalog searches: 23,967
  • Digital collection visitors/page views: 3,263/9,114

Project Snapshot

Centennial Exhibit: March saw the passing of a couple of large milestones for the project – namely the completion of the Westhampton College flipbook and the aggregation of the last bit of hometown data for the last 100 years’ worth of graduates. We’ll start work on mapping all of that geographic data in April. We had a couple of meetings with the Center for Teaching, Learning and Technology and the Digital Scholarship Lab to explore some opportunities for collaboration, and the DSL has been georeferencing several of our historic maps for use in the exhibit. The interface group has been working away on adjusting the design for the site – Andy Morton has brought some really good ideas and work to the project.

UR Scholarship: In March, 91 more theses and undergraduate papers were uploaded into the UR Scholarship Repository, and 29 more – 838 pages worth – were digitized and prepared for uploading. During the month, 115 honors papers were downloaded 759 times, and 222 master’s theses were downloaded 2,234 times.

Richmond Dispatch: We’ve been working for some time on a complete overhaul of our Daily Dispatch collection, our first big project, released back in 2007. The Dispatch collection contains the complete run of Richmond’s “newspaper of record” from November 1860 through December 1865. At nearly 24 million words, completely encoded in XML, it is a deep collection that has contributed widely to projects ranging from scholarly research to family history and genealogy. While it’s a great project, it could use an extreme makeover, and it’s in the process of getting one. We’ve completely rebuilt the guts of the system using standardized TEI data and an XML database, have implemented a new page viewing mechanism for high-resolution images, and have worked with University Communications to plan an updated, more functional user interface. While the programming work needed to finish this project off is sidelined due to other priorities, we will be picking it up again very soon.

February Update

Not-so-random stats for February 2014

  • Materials cataloged/awaiting cataloging: 6,213/762
  • Catalog records revised: 3,135
  • Page images digitized: 4,682
  • Still images digitized: 888
  • Library catalog visitors/page views: 12,387/61,273
  • Library catalog searches: 25,794
  • Digital collection visitors/page views: 3,101/8,830

Project snapshot

Centennial Exhibit: The first-pass metadata has been completed for most materials, although more materials were uncovered during a “last call” at the Virginia Baptist Historical Society. Nearly the whole department has been pitching in on the metadata push, and everyone is doing great work. During the visit to the Virginia Historical Society, Crista LaPrade and Angie White discovered some more materials, including several portraits of the first Westhampton College graduates – we’ve licensed their use for this project. Work on the Westhampton Class of 1915 flipbook is proceeding on schedule, and the final sources for student hometowns were tracked down. (See Angie’s recent post about the Westhampton College scrapbook for more information on that item.) In March, we’ll put the finishing touches on our test system, which will include the flipbook and hometown map interactive features, all of the digital objects, and a customized Omeka theme.

UR Scholarship: Following a productive January during which 61 theses were loaded into the repository and 24 more theses were digitized, February resulted in 62 uploads and 41 additional digitized works. Crista has been working with our great student employees to get this work accomplished, and more will be done in March – perhaps despite spring break. The Master’s Theses Collection continues to be the second most used collection in the repository (right behind Law Faculty Publications), with 1,663 document downloads in January.

Tokyo Trial: Sixty documents were uploaded to the Legal Tools Database during February – these included the various Judgments of the International Military Tribunal for the Far East and Proceedings in Chambers. Some technical difficulties prevented us from uploading the Summations and court papers, but we’ll continue that work in March. We also got back into working on the TEI Annotator, an application we’ve been developing with the assistance of some fantastic contractors. The TEI Annotator will play a substantial role in our development of a set of semantic services that will enrich and allow annotation/enhancement of text documents encoded according to the Text Encoding Initiative’s XML schema. More on this project will be coming soon.

Collegian: As our first foray into the wonderful world of crowdsourcing, we’ve worked with our vendor to incorporate a feature called User Text Correction into the Collegian collection. This enables registered users to correct some of the really poor OCR that often results from microfilm imaging. (Our collection includes microfilm images and OCR in all issues between November 1914 and April 2006.) User-corrected text is reindexed immediately, making every correction discoverable right away. Instant gratification. All you need to do is go to the Collegian site, register via the link in the top right-hand corner, access an article and start correcting away. Please give it a try and let us know what you think.

There is much more going on than is recorded here – this post indicates progress on a very small slice of our responsibilities. We have ongoing work involving faculty and student projects, digitization for external partnerships, projects for other library departments, as well as the tasks and maintenance that comprise our everyday work. The variety of things being done is a bit astounding to me at times, and I’m glad that we’ve assembled a capable, collaborative team that helps our organization meet its goals.

International Collaborations, and a Visit to the UN Archives

On Monday the 20th, I went to New York to visit the Archives and Records Management Section of the United Nations. I’ll write why in a moment, but first let me try to express how surprising this experience was. After meeting with several project partners at a nearby hotel restaurant to discuss and lay plans for our upcoming work, we walked a few chilly blocks to an utterly unexceptional door. We were buzzed through and confronted by a small sign, equally unremarkable and easily overlooked from outside.

This might not seem so surprising, but after having worked in a library for years, living and breathing the importance of providing information to users, I suppose I was expecting a slightly more grand or inspiring entrance…

But it was here that I and the rest of the project team, surrounded by the historic documents of the United Nations, met with the chief of the Archives Unit, Paola Casini, to discuss what I believe may be our most important contribution to both scholarship and the international community: the digitization of the United Nations War Crimes Commission documents.

For the last few years the Boatwright Memorial Library has been collaborating with the Muse Law Library to digitize the papers of the International Military Tribunal for the Far East – better known as the Tokyo War Crimes Trial. In its Special Collections, the Law Library holds a nearly complete collection of the papers of the tribunal, which were a gift from the family of David Nelson Sutton, a 1915 Richmond College graduate who served as Associate Counsel to the Prosecution during the trial. Sutton’s duties for the prosecution included questioning witnesses and presenting evidence related to the charges associated with the “Rape of Nanking” and Imperial Japan’s illegal narcotics trade.

Library staff and University of Richmond students have been scanning, extracting text, and using an XML format called TEI to encode data and description within the documents themselves. Our goal is to produce an openly accessible and fully searchable archive of the court-produced documents that leverages the strengths of XML-based documents for the purposes of presentation and manipulation. For example, specific XML tags are required within the files to normalize personal and organizational names throughout the collection, to link entire documents or portions of documents to others, and to georeference place names. Proper XML tagging, combined with the use of predefined thesauri, will allow faceted searching and potentially revealing presentations of the resulting data.
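As a rough illustration of what name normalization buys us, here is a small Python sketch. The XML fragment is invented for this post and is not taken from our files; the `ref` identifiers (`#sutton_dn`, `#nanking`) and the function name are assumptions. The `persName` and `placeName` elements and the TEI namespace are standard TEI, but our actual encoding may differ in its details.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"

# Invented sample: two different spellings of one person share a single ref.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p><persName ref="#sutton_dn">Mr. Sutton</persName> presented evidence
       concerning events at <placeName ref="#nanking">Nanking</placeName>.</p>
    <p><persName ref="#sutton_dn">David N. Sutton</persName> questioned
       the witness.</p>
  </body>
</text>"""


def index_by_ref(xml_text, tag):
    """Group the text of every <tag> element under its normalized ref."""
    root = ET.fromstring(xml_text)
    index = {}
    for element in root.iter(f"{{{TEI}}}{tag}"):
        surface_form = " ".join("".join(element.itertext()).split())
        index.setdefault(element.get("ref"), []).append(surface_form)
    return index


people = index_by_ref(SAMPLE, "persName")
places = index_by_ref(SAMPLE, "placeName")
```

Because both spellings of the name point at the same identifier, a search or facet built on `people` treats them as one person, however the name happened to be typed in the original transcript.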

As part of this work, the University of Richmond has become a partner of the International Criminal Court’s Legal Tools Database project, contributing PDF versions of our Tokyo Trial documents to that resource. The overarching goal of the Legal Tools Database is to provide free and open access to legal information necessary for the prosecution of war crimes, crimes against humanity, genocide and other crimes within the jurisdiction of the ICC. By compiling all primary legal sources related to prosecuting violations of international criminal law, developing case management applications, and providing an e-learning platform, the project will equip legal practitioners in developing nations with the tools they need to do their work. The Nuremberg and Tokyo Tribunals were vital in demonstrating to the young United Nations the need for a permanent international court, and their records are of great value to the project.

Our meetings on Monday were productive and outlined an ambitious path forward for our respective projects. At the University of Richmond, we will complete our local digitization of Tokyo Trial documents and work with the UN Archives to identify the UNWCC materials not present in the Law Library’s collection as priorities for digitization. Other project partners will work to digitize papers from various nations’ military courts, including those of the United Kingdom, Poland and, eventually, the United States. While the rest of the team went to lunch I stayed at the archives, reviewing several reels of microfilm to verify that these will all be uploaded to the Legal Tools Database and freely available for use by researchers, students and legal professionals alike.

To wrap this up, I must say how thankful I am that our work at the library and the University of Richmond can, in some small way, contribute to an important international project like the Legal Tools Database and, ultimately, to the greater good. Our work continues, and if you are interested in participating please don’t hesitate to contact us.