FW: [-empyre-] Preservation



I guess this question is addressed to the National Library participants.

A reasonable question. Given the great deal of resources required in
archiving digital objects, no-one wants to duplicate work.

That said, one answer I would give is: just try to find a complex web
site in the Internet Archive and see what you come up with - you will, I
am sure, be very frustrated. The ambition of the IA is to sweep the web
and pick up as much as it can. This is fine in a take-what-you-can-get
sort of way, and there is no doubt the IA is a magnificent resource.
But because they cannot do quality assurance work on the scale required
for the amount of material they attempt to archive, it is full of gaps
and dysfunctional web sites. And because they archive without the
express permission of the publishers, their harvesters must follow
robots.txt rules, so if the site says "don't index me" it will not be
gathered. The IA will also take down archived resources upon request.
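
To make the robots.txt point concrete, here is a minimal sketch in
Python of the check a permissionless harvester has to make before it
gathers a page. It uses the standard library's urllib.robotparser. The
agent name "example-harvester" and the example.org URL are made up for
illustration; this is not the IA's or PANDORA's actual code.

```python
from urllib.robotparser import RobotFileParser


def may_harvest(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A site that says "don't index me" to every crawler.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# A site that shuts out one (hypothetical) harvester and allows the rest.
BLOCK_ONE = """\
User-agent: example-harvester
Disallow: /

User-agent: *
Disallow:
"""

if __name__ == "__main__":
    page = "http://example.org/index.html"
    for name, rules in (("BLOCK_ALL", BLOCK_ALL), ("BLOCK_ONE", BLOCK_ONE)):
        print(name, may_harvest(rules, "example-harvester", page))
```

In both sample files the hypothetical harvester is refused, so the site
never reaches the archive unless the publisher grants permission some
other way.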

The difference with PANDORA is that, being selective, we seek permission
to archive and we also do quality assurance testing and "fixing" of
archived resources. For example, if you archive a site with RealMedia,
your harvester only gets the .ram metafile. Unless someone chases up the
publisher for delivery of the actual media files (.rm, .ra), you have
not archived the site properly. The IA does not do this sort of thing,
whereas we do. Also, we need to be able to plan for long-term access;
leaving it up to a third party that may or may not exist in 50 years'
time does not really meet the responsibilities the Library is charged
with.
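
As a rough illustration of the RealMedia case: a .ram metafile is a
small text file that lists the locations of the real media streams. The
Python sketch below reads one and returns the .rm/.ra files that still
have to be obtained from the publisher. It assumes the simple
one-URL-per-line form, with "#" lines treated as comments. The function
name and the sample URLs are invented for the example and are not part
of any PANDORA tool.

```python
from typing import List
from urllib.parse import urlparse

# Extensions of the actual media files a .ram metafile points at.
MEDIA_EXTENSIONS = (".rm", ".ra")


def media_urls_from_ram(ram_text: str) -> List[str]:
    """List the media URLs referenced by the text of a .ram metafile."""
    urls = []
    for line in ram_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        # Look only at the path, so query strings do not hide the extension.
        path = urlparse(line).path.lower()
        if path.endswith(MEDIA_EXTENSIONS):
            urls.append(line)
    return urls


# Hypothetical metafile content, as a harvester might have gathered it.
SAMPLE_RAM = """\
# interview recordings
rtsp://media.example.org/talks/interview.rm
pnm://media.example.org/talks/intro.ra?start=0
"""

if __name__ == "__main__":
    for url in media_urls_from_ram(SAMPLE_RAM):
        print("still needs to be requested from the publisher:", url)
```

A list like this is only the starting point; someone still has to ask
the publisher to deliver each file.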

That said, the Library of Congress, for example, works with the IA to
build quality collections such as those for September 11 and for various
US elections. The IA could certainly be used as an agent to do the
harvesting, but the long-term management of the files for preservation
needs to be handled elsewhere. I would also say that we, the National
Library, are in fact working cooperatively with the IA and a number of
other leading players in the web archiving business to develop
strategies, tools, procedures, etc. for advancing web archiving. We are
both members of the International Internet Preservation Consortium. See
the IIPC web site for more about this: http://www.netpreserve.org

Paul

-----Original Message-----
From: empyre-bounces@lists.cofa.unsw.edu.au
[mailto:empyre-bounces@lists.cofa.unsw.edu.au] On Behalf Of Henry
Warwick
Sent: Friday, 25 June 2004 5:48 PM
To: soft_skinned_space
Subject: [-empyre-] Preservation


Why?

http://www.archive.org/

how does it differ from what you're trying to do, and in light of
archive.org's efforts, why replicate it?

HW

_______________________________________________
empyre forum
empyre@lists.cofa.unsw.edu.au
http://www.subtle.net/empyre




