Each year, the Library’s Discovery, Technology and Publishing department participates in the Information Services disaster recovery exercise. We do this to prove that we can provide continuity if an unfortunate incident occurs.
Right now you are probably thinking, “Unfortunate incident? Continuity? Disaster recovery? What does all that mean, and why should I care?”
The disasters I discuss here are those that could threaten our network infrastructure, so, for simplicity’s sake, I call them “technology disasters.” Such a disaster could affect all systems that communicate on the UR network, everything from email and Banner to browsing UR’s website. A technology disaster can be very serious. It could occur as part of a larger disaster that affects the whole school physically (a tornado or hurricane, for example), or it could go totally unnoticed by the UR community until, of course, they try to do something on the network.
What does technology disaster mean?
We live and work in a connected world, and those connections depend on servers, routers, cables, cell towers, and people being available to monitor and react to issues as they occur. A technology disaster threatens those connections. Anything that unexpectedly makes any of these resources unavailable can be disastrous, depending on how long it takes to restore the connections. Student records, staff and faculty compensation, and online learning are examples of systems that could be affected by a technology disaster.
These disasters can occur in many different ways, but they generally have similar end results regardless of type. Whatever the cause (natural, accidental or man-made), time is of the essence. The examples below assume long-term issues:
Loss of power – Long-term power outages.
Loss or failure of equipment – Due, for example, to a water pipe bursting and flooding a server room, a UPS failure, or a fire destroying cables that are necessary for data to flow across campus.
Loss of location – Due to having to abandon a server area because of smoke, water or fire.
Loss of internet connectivity – Due to a long-term loss of UR’s connection to the internet.
Loss of personnel availability – Due to UR staff being unable to be physically on campus.
What does continuity mean?
Continuity involves determining which systems provide essential University services and making sure those systems are back online as soon as possible. Information Services managers made this determination in close consultation with other departments on campus. The library system is included among these essential services: students must interact with Library resources for class assignments, and faculty members rely on our services for their research and teaching.
What does the Library disaster recovery exercise involve?
The Library’s part in this exercise includes building our database and web server, and developing documentation, in a disaster recovery environment. Later, when the disaster is declared, we repeat this process and fine-tune our documentation. This time, however, since the Information Services disaster recovery exercise focuses on a long-term disaster as a worst-case scenario, we recreate the Library’s server at a secure site in a different geographic region, outside the Richmond area, thus mitigating the losses in the examples above in a matter of hours instead of days. After the exercise, we store the software and documentation off campus at a secure site. In the event of an actual disaster, that software and documentation will be transferred to the remote location, and we will have our servers up again quickly.