LOCKSS
("Lots of Copies Keep Stuff Safe")
A project launched in 1999 by Stanford University Libraries, and an open source peer-to-peer platform for the secure, long-term preservation of digital content and for (controlled) access to it. It was originally designed for storing electronic journals, but the libraries and other institutions that use LOCKSS now load all kinds of other digital material into it as well (e.g. dissertations, government documents, photographs, audiovisual files, web content). Any library can set up a LOCKSS Box (even on a mid-range PC) to store the digital material and register it in the distributed storage network. If a content owner gives the library permission to archive its online documents, it only has to place a permission statement and a manifest on its website. The LOCKSS Box then starts collecting the content listed in the manifest with a crawler (to a specified depth), compares it with the same content held in other Boxes (automatically repairing any damage), and provides access to authorized users through a web proxy, cache or metadata resolver if the original website becomes unavailable. The system automatically migrates obsolete formats to newer ones for display.
Thus an important part of a LOCKSS plugin is a heuristic that guesses how much content a crawl is expected to collect. If a crawl fails to collect as much as expected, or collects a lot more, an alert is generated. The ingest team then assesses the crawl in question to work out how the publisher's site has changed (how the user experience has been optimized), and tweaks the plugin to match.
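The size check described above can be sketched roughly as follows. This is a minimal illustration, not the LOCKSS plugin code: the function name `check_crawl_size`, the URL-count measure and the 25% tolerance are all assumptions made for the example.

```python
from typing import Optional


def check_crawl_size(expected_urls: int, collected_urls: int,
                     tolerance: float = 0.25) -> Optional[str]:
    """Return an alert message if a crawl deviates too far from its estimate.

    `tolerance` is a hypothetical threshold: 0.25 means the crawl may be
    up to 25% smaller or larger than expected before an alert is raised.
    """
    if expected_urls <= 0:
        raise ValueError("expected_urls must be positive")
    ratio = collected_urls / expected_urls
    if ratio < 1 - tolerance:
        return (f"ALERT: crawl collected too little "
                f"({collected_urls} of about {expected_urls} expected URLs)")
    if ratio > 1 + tolerance:
        return (f"ALERT: crawl collected too much "
                f"({collected_urls} against about {expected_urls} expected URLs)")
    return None  # within tolerance, no alert
```

An alert in either direction is only a prompt for a person to look at the crawl; the sketch does not try to decide what changed on the publisher's site.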
Ingestion
Stanford University LOCKSS Program staff analyze the target content’s URL structure, file formats and delivery mechanisms. They design, implement and update a tailored, content-specific preservation action plan that serves publishers, librarians and readers.
The publisher permits the LOCKSS system to collect, preserve and provide access to the content by putting a LOCKSS manifest page on the content’s website. The manifest page contains a LOCKSS permission statement and links to the issues (or other parts) of the content as they are published. The required manifest page is ingested and preserved with the original content, negating the need for paper contracts.
Software called a LOCKSS Plugin tells each institution’s LOCKSS Box where to find the publisher’s LOCKSS manifest page, and how far to follow the chains of web links. A LOCKSS Plugin encapsulates a publisher’s content model by listing parameters specific to each publishing platform. The LOCKSS team builds, tests and distributes plugins to LOCKSS Boxes registered with the LOCKSS Alliance.
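As a rough sketch of the first step a crawler might take with a manifest page, the snippet below checks for a permission statement and collects the links to follow. The class and function names are hypothetical, and the exact wording in `PERMISSION_TEXT` is an assumption; a real plugin encodes publisher-specific rules that are not shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed wording of the permission statement; verify against a real manifest page.
PERMISSION_TEXT = ("LOCKSS system has permission to collect, preserve, "
                   "and serve this Archival Unit")


class ManifestParser(HTMLParser):
    """Collects the visible text and the absolute link targets of a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the manifest page's URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        self.text_parts.append(data)


def read_manifest(html, base_url):
    """Return the links to crawl, or an empty list if permission is missing."""
    parser = ManifestParser(base_url)
    parser.feed(html)
    parser.close()
    # Normalize whitespace so line breaks in the HTML do not hide the statement.
    text = " ".join(" ".join(parser.text_parts).split())
    if PERMISSION_TEXT.lower() not in text.lower():
        return []  # no permission statement: collect nothing
    return parser.links
```

Returning an empty list when the statement is absent reflects the idea that the manifest page itself carries the permission, so nothing is collected without it.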
Every LOCKSS Box is at an IP address, and this IP address falls within its parent University’s IP address range. Authorized LOCKSS Boxes independently collect ‘subscribed to’ content or ‘open access’ content directly from the publisher’s website. The publisher authorizes or denies a LOCKSS Box’s access to content through their IP address access control system. Thus, all LOCKSS activity is registered on a publisher’s web logs. Publishers have access to real time statistics through their own systems.
Preservation
The LOCKSS software continually monitors the content in each LOCKSS Box to ensure that it is being properly preserved, by cooperating over the Internet with other LOCKSS boxes to compare each box’s copies of the same content using technology that won an ACM research award:
Once ingest is complete, the monitoring technology ensures that each LOCKSS Box has collected all intended content, thus preserving the authoritative version. The software monitors LOCKSS Boxes at regular intervals to determine whether any content has been damaged or lost, and can arrange for content repair from another LOCKSS Box.
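The compare-and-repair idea can be illustrated with a deliberately simplified majority vote over content hashes. This is only a sketch: the actual LOCKSS polling protocol is more elaborate than this, and the function names and the in-memory `boxes` dictionary are assumptions made for the example.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Hash one copy of the content so copies can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(boxes: dict) -> list:
    """Compare copies held by several boxes and repair the ones that disagree.

    `boxes` maps a box name to that box's copy (bytes) of the same content.
    Copies that differ from the majority are overwritten with a majority
    copy. Returns the names of the boxes that were repaired.
    """
    votes = Counter(digest(copy) for copy in boxes.values())
    winner, count = votes.most_common(1)[0]
    if count <= len(boxes) / 2:
        # Without a clear majority there is no trustworthy repair source.
        raise RuntimeError("no majority agreement; cannot repair safely")
    good_copy = next(c for c in boxes.values() if digest(c) == winner)
    repaired = []
    for name, copy in list(boxes.items()):
        if digest(copy) != winner:
            boxes[name] = good_copy  # repair from a box holding the majority copy
            repaired.append(name)
    return repaired
```

For example, with three boxes where one copy has been damaged, `audit_and_repair` returns the name of the damaged box and leaves all three holding identical content.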
The administrator of each LOCKSS Box can monitor the preservation status of the content in their Box, by looking at delivered content and the management tools available through the LOCKSS Box web administrative interface.
Delivery
An institution’s LOCKSS Box can provide readers with continual, seamless access to branded publisher content. The LOCKSS system preserves content at its original URL, critically retaining the content’s relationship to other web resources. An institution’s LOCKSS Box delivers content to authorized readers only when the publisher’s website is unavailable (subscription canceled, network traffic, publisher server down). The LOCKSS Program works to preserve, and to deliver to readers, the publisher’s original artifact, in other words – what the publisher published.
LOCKSS Boxes provide three main ways for readers to access the content they preserve: by proxying (i.e. acting like a web cache), by serving (acting like a web server) or by serving through integration with an OpenURL resolver.
Proxying: Institutions often run web proxies to allow off-campus users to access subscription content. Libraries that integrate their LOCKSS Box into a proxy (PAC Files, EZ Proxy, ICP, Squid) ensure a reader’s URL request is seamlessly fulfilled when the content is unavailable from the publisher’s website.
Basic Serving: In the basic serving model, articles are accessed using a local URL pointing to the LOCKSS Box. The LOCKSS Box checks if the publisher will provide content to fulfill a reader’s request. If the content is not available from the publisher, the LOCKSS Box serves its own copy to the reader.
OpenURL Serving: Libraries can integrate their LOCKSS Box with their library catalog and OpenURL resolver by adding their LOCKSS Box as a target to an OpenURL Resolver.
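The publisher-first fallback of the basic serving model might look roughly like the sketch below. The `serve` function, the `fetch_from_publisher` callable and the `preserved` dictionary are hypothetical stand-ins for the real HTTP request and the Box's store.

```python
from typing import Callable, Dict, Optional, Tuple


def serve(url: str,
          fetch_from_publisher: Callable[[str], Optional[bytes]],
          preserved: Dict[str, bytes]) -> Tuple[str, bytes]:
    """Return (source, content) for a reader's request.

    The publisher is always asked first; the preserved copy is used only
    when the publisher cannot supply the content.
    """
    try:
        content = fetch_from_publisher(url)
    except OSError:
        # Network failures (server down, timeouts) count as "unavailable".
        content = None
    if content is not None:
        return ("publisher", content)
    if url in preserved:
        return ("lockss", preserved[url])
    raise KeyError(f"{url} is unavailable and not preserved locally")
```

Asking the publisher first keeps the publisher's site the primary source and its web logs complete, with the preserved copy acting only as a safety net.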
Post-cancellation access to all preserved content is ensured as the content is under the library’s local custody.
Management
Librarians administer their institution’s LOCKSS Boxes through a web browser, which lets them easily select new content for preservation, monitor the content’s preservation status and perform a variety of other functions. The Stanford University LOCKSS staff provides support to LOCKSS Alliance participants.
Three audit and verification tools detail what content is in a library’s LOCKSS Box and the content’s preservation status:
On demand, a LOCKSS Box produces a KBART (Knowledge Bases And Related Tools) report of the locally preserved content.
A LOCKSS Box displays detailed preservation status for each Archival Unit. (An Archival Unit is typically a volume of a journal, or a complete book.)
A LOCKSS Box administrator can use a properly configured web browser from an authorized IP address to view preserved content through an “audit proxy.” The viewer sees the content as it was collected by the LOCKSS system.
Sustainable Format Migration
LOCKSS preserves all web published formats (animations, datasets, moving images, still images, software, sound, text) and genres (journals, books, blogs, websites, scanned files, audio, video). The LOCKSS software is format-agnostic and preserves all content in its original format, as delivered from the publisher, including the format metadata that enables a browser to render the content.
There is a risk that web content becomes obsolete when a reader’s browser cannot render a requested format. This has yet to occur; however, the LOCKSS Program’s approach to this risk is to migrate content on access. When a format is obsolete, in other words, when a reader’s web browser cannot display the content, the LOCKSS Box dynamically migrates the content to a newer format for display. This method, called “migration on access,” leverages the capabilities built into HTTP. If a reader requests LOCKSS-preserved content and that reader’s browser cannot display the content in its original format, the LOCKSS Box converts the original format to a format that the browser can display (a temporary access copy) and delivers the content to the reader. This “on the fly” migration ensures that readers see the latest and best version of scholarly material.
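One way to picture migration on access is through HTTP content negotiation. The sketch below assumes the Box inspects the browser's `Accept` request header to decide whether a conversion is needed; the text above only says the method uses capabilities built into HTTP, so that choice is an assumption. The `CONVERTERS` registry, the `image/x-old` media type and the fake converter are hypothetical placeholders.

```python
def accepted_types(accept_header: str) -> list:
    """Parse an HTTP Accept header into media types (q-values are ignored)."""
    types = []
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type:
            types.append(media_type)
    return types


def accepts(accepted: list, media_type: str) -> bool:
    """True if the media type matches exactly or through a wildcard."""
    main = media_type.split("/")[0]
    return any(a in (media_type, main + "/*", "*/*") for a in accepted)


# Hypothetical converter registry: (stored type, target type) -> function.
CONVERTERS = {
    ("image/x-old", "image/png"): lambda data: b"PNG:" + data,  # stand-in converter
}


def deliver(stored_type: str, data: bytes, accept_header: str) -> tuple:
    """Return (media type, bytes) to send, converting only when necessary."""
    accepted = accepted_types(accept_header)
    if accepts(accepted, stored_type):
        return (stored_type, data)  # browser can render the original
    for (source, target), convert in CONVERTERS.items():
        if source == stored_type and accepts(accepted, target):
            # Temporary access copy; the preserved original is left untouched.
            return (target, convert(data))
    return (stored_type, data)  # no suitable converter: send the original
```

The preserved bytes are never modified in this sketch; the converted copy exists only for the response being delivered.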
The LOCKSS Program’s “migration on access” approach has significant advantages over “format normalization” as it preserves the original artifact, has much less overhead and expense, and uses the most up-to-date technology.
Preserving the content in its original format satisfies archival requirements. It also allows the LOCKSS system to be frugal with storage space. We know of no preservation system that discards the original bits after migrating them to a new format. Migrating and keeping both the original and the migrated copy multiplies the storage requirements for a preservation system by the number of migrations. Preserved content is migrated by the most recent, and presumably best, technology available at the time the reader requests access. Preserved content is rarely accessed. Performing migration only when and if it is needed reduces the resource cost. Content can be migrated directly from the original to the current format, minimizing the effects of format conversion artifacts. The format converters, once developed, can themselves be preserved to document the original format.
The LOCKSS system performs bit preservation and migrates content forward in time by leveraging the capabilities of HTTP and HTTPS.
https://en.wikipedia.org/wiki/LOCKSS https://www.lockss.org https://github.com/lockss