Web archiving bibliography – detailed list

Last updated: 2021-08-10
List sorted by title (704 items)


28 – After COVID? Classical mechanics

Item Type Book Section
Author Graeme Hawley
Editor David Baker
Editor Lucy Ellis
URL https://www.sciencedirect.com/science/article/pii/B9780323884938000367
Series Chandos Digital Information Review
Publisher Chandos Publishing
Pages 291-302
ISBN 978-0-323-88493-8
Date January 1, 2021
Extra DOI: 10.1016/B978-0-323-88493-8.00036-7
Accessed 2021-07-15 11:05:38
Library Catalog ScienceDirect
Language en
Abstract COVID-19 has understandably been foremost in our minds over the last year and will continue to be for some time, but it is not the only urgent crisis that individuals, societies, and nations face. This essay looks at current events through the lens of Alvin Toffler’s publication The Third Wave, focusing especially on the accelerative nature of change today and how it increases complexity. Graeme Hawley, Head of General Collections at the National Library of Scotland, considers what accelerative change means in terms of the collections he is responsible for, and the extent to which COVID-19 is likely to impact accelerative change in the immediate future. The essay takes a broad look at topics that, although distinct in themselves, all share the qualities of velocity, and all seem to be happening at roughly the same time so that we can situate the post-COVID world in its fuller context.
Book Title Libraries, Digital Information, and COVID
Short Title 28 – After COVID?
Date Added 2021-08-09 8:44:18
Modified 2021-08-09 8:44:18

Tags:

  • Web archiving
  • National libraries
  • Social media
  • Accelerative change
  • Complexity
  • Digital publishing
  • National Library of Scotland
  • Velocity

404 Not Found – Ki őrzi meg az internetet; Webarchiválás workshop az Országos Széchényi Könyvtárban

Item Type Journal Article
Author Márton Németh
Volume 64
Issue 11
Pages 577-582
Publication Tudományos és műszaki tájékoztatás
ISSN 0041-3917
Date 2017
Extra Number: 11
Abstract On 13 October 2017, the National Széchényi Library (Országos Széchényi Könyvtár, OSZK) hosted its first event devoted specifically to archiving the World Wide Web. As one of the working groups of the National Library System (Országos Könyvtári Rendszer, OKR) project, run jointly by the institution and the Governmental IT Agency (Kormányzati Informatikai Ügynökség, KIFÜ), we were able to start working on web archiving this spring within an experimental project. (More information is available at http://mekosztaly.oszk.hu/mia.) Our aim is to produce, by the end of the project period, a concept that enables OSZK to carry out and organise web archiving as a routine operational workflow, following the example of numerous European national libraries. We intend to create a system that, in addition to the long-term preservation of cultural heritage, can also serve the needs of education, scientific research, government bodies, the business sector and individual internet users. With the archive in place, the Hungarian internet, which currently exists only in the present tense, would also have a "past", and possibilities would open up for its present and future users that currently cannot be realised, or only with difficulty (e.g. finding defunct web pages, analysing and visualising how websites change over time, stable citability, running text and data mining applications that include a time dimension, internet history research, providing authentic copies). Our event was held to mark, in effect, the close of the project's first six months. In compiling the programme, we took particular care to present existing professional experience from abroad as well as the domestic antecedents. A further key aim of the workshop was the entire public collection sector (
Date Added 2021-08-09 8:43:28
Modified 2021-08-09 8:43:28

Tags:

  • web archiving
  • 404 workshop
  • MIA pilot project
  • Országos Széchényi Könyvtár

404 Not Found – Ki őrzi meg az internetet?

Item Type Journal Article
Author Csaba Latorcai
URL http://ojs.elte.hu/kf/article/view/2295
Volume 67
Issue 1
Pages 28-30
Publication Könyvtári Figyelő
ISSN 1586-5193
Date April 17, 2021
Extra Number: 1
Section: Műhely (Workshop)
Journal Abbr KF
Accessed 2021-08-04 2:00:00
Language Hungarian
Abstract The ever-growing volume of digital content and the widespread use of the internet require that the data created in the digital space, the digital past, be preserved durably and securely for the purposes of scholarly processing, transmission to future generations and use by society. This was emphasised by the article's author, Csaba Latorcai, Administrative State Secretary of the Ministry of Human Capacities (Emberi Erőforrások Minisztériuma), in his talk at the online workshop "404 Not Found – Ki őrzi meg az internetet?", organised by the National Széchényi Library (Országos Széchényi Könyvtár) on 2 December 2020. The professional and secure preservation and use of data created in the digital space require an appropriate professional background, which the National Széchényi Library provides as the library of the Hungarian nation. In 2016 the Government adopted a resolution on providing the funds needed for the IT development of the National Széchényi Library. The preparatory work has been completed, the national library is ready to carry out web archiving tasks on a continuous basis, and a Government resolution and a legislative amendment establish the legal and financial framework for web archiving from 1 January 2021. Involving the widest possible range of Hungarian libraries, and following and shaping international professional trends, the national library will ensure the long-term preservation of web content that qualifies as a Hungarikum, make it available for use, and ensure its scholarly processing.
Date Added 2021-08-09 8:44:42
Modified 2021-08-09 8:44:42

2014 not found: a cross-platform approach to retrospective web archiving

Item Type Journal Article
Author Anat Ben-David
URL https://www.tandfonline.com/doi/full/10.1080/24701475.2019.1654290
Volume 3
Issue 3-4
Pages 316-342
Publication Internet Histories
ISSN 2470-1475
Date 2019-10-02
Extra Number: 3-4
DOI 10.1080/24701475.2019.1654290
Date Added 2021-08-09 8:43:41
Modified 2021-08-09 8:43:41

Tags:

  • Google
  • Internet Archive
  • Twitter
  • War in Gaza
  • web archiving
  • Wikipedia
  • YouTube

A Baseline Search Engine for Personal Life Archives

Item Type Conference Paper
Author Liting Zhou
Author Duc-Tien Dang-Nguyen
Author Cathal Gurrin
URL http://dl.acm.org/citation.cfm?doid=3133202.3133206
Place New York, New York, USA
Publisher ACM Press
Pages 21-24
ISBN 978-1-4503-5503-2
Date 2017
DOI 10.1145/3133202.3133206
Abstract In lifelogging, as the volume of personal life archive data is ever increasing, we have to consider how to take advantage of a tool to extract or exploit valuable information from these personal life archives. In this work we motivate the need for, and present, a baseline search engine for personal life archives, which aims to make the personal life archive searchable, organizable and easy to be updated. We also present some preliminary results, which illustrate the feasibility of the baseline search engine as a tool for getting insights from personal life archives.
Proceedings Title Proceedings of the 2nd Workshop on Lifelogging Tools and Applications – LTA '17
Date Added 2021-08-09 8:42:41
Modified 2021-08-09 8:42:41

Tags:

  • Lifelogging
  • Personal Life Archive
  • Search Engine

A browser for browsing the past web

Item Type Conference Paper
Author Adam Jatowt
Author Yukiko Kawai
Author Satoshi Nakamura
Author Yutaka Kidawara
Author Katsumi Tanaka
URL http://portal.acm.org/citation.cfm?doid=1135777.1135923
Place New York, New York, USA
Publisher ACM Press
Pages 877
ISBN 1-59593-323-9
Date 2006
DOI 10.1145/1135777.1135923
Abstract We describe a browser for the past web. It can retrieve data from multiple past web resources and features a passive browsing style based on change detection and presentation. The browser shows past pages one by one along a time line. The parts that were changed between consecutive page versions are animated to reflect their deletion or insertion, thereby drawing the user’s attention to them. The browser enables automatic skipping of changeless periods and filtered browsing based on user specified query.
Proceedings Title Proceedings of the 15th international conference on World Wide Web – WWW '06
Date Added 2021-08-09 8:43:32
Modified 2021-08-09 8:43:32

Tags:

  • web archives
  • Past web
  • web archive browsing

A common language

Item Type Journal Article
Author Marc Weber
URL http://www.tandfonline.com/doi/abs/10.1080/24701475.2017.1317118
Volume 1
Issue 1-2
Pages 26-38
Publication Internet Histories
ISSN 2470-1475
Date 2017-01-02
Extra Number: 1-2
Publisher: Routledge
DOI 10.1080/24701475.2017.1317118
Abstract What would a cultural history of the Internet look like? The question almost makes no sense: the Internet spans the globe and traverses any number of completely distinct human groups. It simply cannot have a single culture. And yet, like the railroad, the telegraph and the highway system before it, the Internet has been an extraordinary agent for cultural change. How should we study that process? To begin to answer that question, this essay returns to four canonical studies of earlier technologies and cultures: Carolyn Marvin's When Old Technologies Were New; Leo Marx's The Machine in the Garden; Ruth Schwarz Cowan's More Work for Mother and Lynn Spigel's Make Room for TV. In each case, the essay mines the earlier works for research tactics and uses them as jumping-off points to explore the ways in which the Internet requires new and different approaches. It concludes by speculating on the ways that the American-centric nature of much earlier work will need to be replaced with a newly global focus and research tactics to match.
Date Added 2021-08-09 8:41:45
Modified 2021-08-09 8:41:45

A cseh web és a kötelespéldány-rendelet

Item Type Journal Article
Author Ludmila Celbová
Author Margit Prókai
Issue 3
Pages 518-520
Publication Könyvtári figyelő : külföldi lapszemle
Date 2009
Extra Number: 3
Abstract The Czech Republic has no legislation on the legal deposit of electronic publications. The national library has been working on web archiving since 2000, but the issue is not clear-cut at either the national or the international level. It touches on the copyright act, the regulation on printed legal deposit copies and the provisions of the library act.
Date Added 2021-08-09 8:43:28
Modified 2021-08-09 8:43:28

Tags:

  • article abstract
  • librarianship
  • Electronic publication
  • Accessibility
  • Legislation
  • Legal deposit
  • Preservation

A decade of web archiving in the National and University Library in Zagreb

Item Type Conference Paper
Author Karolina Holub
Author Ingeborg Rudomino
Place Cape Town
Publisher IFLA – International Federation of Library Associations and Institutions
Date 2015
Abstract Due to the dynamic nature of the web, its explosive growth, short lifespan, instability and similar characteristics, the importance of its archiving has become priceless for future generations. The National and University Library in Zagreb (Nacionalna i sveučilišna knjižnica u Zagrebu, NSK), as a memory institution responsible for collecting, cataloguing, archiving and providing access to all types of resources, recognized the significance of collecting and storing online content as part of the NSK's core activities. This is supported by positive legal environment since 1997 when Croatia passed the Law on libraries which subjected online publications to legal deposit. In 2004 NSK established the Croatian Web Archive (Hrvatski arhiv weba, HAW) in collaboration with the University Computing Centre (Srce) and developed a system for capturing and archiving Croatian web resources. From 2004 to 2010 only selective archiving of web resources was conducted according to preestablished selection criteria. Taking into account NSK’s responsibility to preserve resources on Croatian social, scientific and cultural history, the importance of taking a snapshot of all publicly available resources under the national top level domain (.hr) was recognized in 2011. Since then national domain harvestings have been conducted annually. In addition, in 2011 NSK started to run thematic harvestings of national importance. The paper will present the NSK's ten years’ experience in managing web resources with the emphasis on implementation of the system for selective and domain harvesting as well as the challenges for providing access to archived resources. Also, the harvested data from 2004 to 2014 will be analysed. The findings will illustrate the variability of URLs, frequency of harvesting and types of content. The data from the last four .hr harvestings will also be presented.
Proceedings Title Preservation and Conservation with Information Technology. IFLA 2015 South Africa
Date Added 2021-08-09 8:41:48
Modified 2021-08-09 8:41:48

Tags:

  • legal deposit
  • web archiving
  • Croatian Web Archive
  • national domain harvesting
  • selective harvesting
  • thematic harvesting

A Deep Learning Approach to Identify Not Suitable for Work Images

Item Type Journal Article
Author Daniel Bicho
Volume 6
Issue 1
Pages 11
Date 2020
Extra Number: 1
Library Catalog Zotero
Language en
Abstract Web Archiving (WA) deals with the preservation of portions of the World Wide Web (WWW) allowing their availability for the future. Arquivo.pt is a WA initiative holding a huge amount of content, including image files. However, some of these images contain nudity and pornography, that can be offensive for the users, and thus being Not Suitable For Work (NSFW). This work proposes a solution to classify NSFW images found at Arquivo.pt, with deep neural network approaches. A large dataset of images is built using Arquivo.pt data and two pre-trained neural network models, namely ResNet and SqueezeNet, are evaluated and improved for the NSFW classification task, using the dataset. The evaluation of these models reported an accuracy of 93% and 72%, respectively. After a fine tuning stage, the accuracy of these models improved to 94% and 89%, respectively. The proposed solution is integrated into the Arquivo.pt Image Search System, available at https://arquivo.pt/images.jsp.
Date Added 2021-08-09 8:44:15
Modified 2021-08-09 8:44:15

A Framework for Aggregating Private and Public Web Archives

Item Type Conference Paper
Author Mat Kelly
Author Michael L. Nelson
Author Michele C. Weigle
URL http://dl.acm.org/citation.cfm?doid=3197026.3197045
Place New York, New York, USA
Publisher ACM Press
Pages 273-282
ISBN 978-1-4503-5178-2
Date 2018
DOI 10.1145/3197026.3197045
Abstract Personal and private Web archives are proliferating due to the increase in the tools to create them and the realization that Internet Archive and other public Web archives are unable to capture personalized (e.g., Facebook) and private (e.g., banking) Web pages. We introduce a framework to mitigate issues of aggregation in private, personal, and public Web archives without compromising potential sensitive information contained in private captures. We amend Memento syntax and semantics to allow TimeMap enrichment to account for additional attributes to be expressed inclusive of the requirements for dereferencing private Web archive captures. We provide a method to involve the user further in the negotiation of archival captures in dimensions beyond time. We introduce a model for archival querying precedence and short-circuiting, as needed when aggregating private and personal Web archive captures with those from public Web archives through Memento. Negotiation of this sort is novel to Web archiving and allows for the more seamless aggregation of various types of Web archives to convey a more accurate picture of the past Web.
Proceedings Title Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries – JCDL '18
Date Added 2021-08-09 8:42:39
Modified 2021-08-09 8:42:39

Tags:

  • web archiving
  • memento
  • privacy
  • personalization

A Framework for Verifying the Fixity of Archived Web Resources

Item Type Thesis
Author Mohamed Aturban
URL https://www.proquest.com/dissertations-theses/framework-verifying-fixity-archived-web-resources/docview/2451138951/se-2?accountid=15756
Place Ann Arbor
Date 2020
Loc. in Archive 2451138951
Extra ISBN: 9798678108180
Publication Title: ProQuest Dissertations and Theses
28089682
Type Ph.D.
Language English
University Old Dominion University
Abstract The number of public and private web archives has increased, and we implicitly trust content delivered by these archives. Fixity is checked to ensure that an archived resource has remained unaltered (i.e., fixed) since the time it was captured. Currently, end users do not have the ability to easily verify the fixity of content preserved in web archives. For instance, if a web page is archived in 1999 and replayed in 2019, how do we know that it has not been tampered with during those 20 years? In order for the users of web archives to verify that archived web resources have not been altered, they should have access to fixity information associated with these resources. However, most web archives do not allow accessing fixity information and, more importantly, even if fixity information is available, it is provided by the same archive delivering the resource, not by an independent archive or service.
In this research, we present a framework for establishing and checking the fixity on the playback of archived resources, or mementos. The framework defines an archive-aware hashing function that consists of several guidelines for generating repeatable fixity information on the playback of mementos. These guidelines are results of our 14-month study for identifying and quantifying changes in replayed mementos over time that affect generating repeatable fixity information. Changes on the playback of mementos may be caused by JavaScript, transient errors, inconsistency in the availability of mementos over time, and archive-specific resources. Changes are also caused by transformations in the content of archived resources applied by web archives to appropriately replay these resources in a user's browser. The study also shows that only 11.55% of mementos always produce the same fixity information after each replay, while about 16.06% of mementos always produce different fixity information after each replay. The remaining 72.39% of mementos produce multiple unique fixity information. We also find that mementos may disappear when web archives move to different domains or archives.
In addition to defining multiple guidelines for generating fixity information, the framework introduces two approaches, Atomic and Block, that can be used to disseminate fixity information to web archives. The main difference between the two approaches is that, in the Atomic approach, the fixity information of each archived web page is stored in a separate file before being disseminated to several on-demand web archives, while in the Block approach, we batch together fixity information of multiple archived pages to a single binary-searchable file before being disseminated to archives. The framework defines the structure of URLs used to publish fixity information on the web and retrieve archived fixity information from web archives. Our framework does not require changes in the current web archiving infrastructure, and it is built based on well-known web archiving standards, such as the Memento protocol. The proposed framework will allow users to generate fixity information on any archived page at any time, preserve the fixity information independently from the archive delivering the archived page, and verify the fixity of the archived page at any time in the future.
# of Pages 260
Archive ProQuest One Academic
Date Added 2021-08-09 8:44:40
Modified 2021-08-09 8:44:40

Tags:

  • Web archiving
  • Archives
  • Memento
  • Computer science
  • Library science
  • 0399:Library science
  • 0646:Web Studies
  • 0984:Computer science
  • Archived web pages
  • Framework
  • Verifying fixity
  • Web studies

A Framework for Web Archiving and Guaranteed Retrieval

Item Type Book Section
Author A Devendran
Author K Arunkumar
Editor Neha Sharma
Editor Amlan Chakrabarti
Editor Valentina Emilia Balas
URL http://link.springer.com/10.1007/978-981-13-9364-8_16
Place Singapore
Publisher Springer Singapore
Pages 205-215
ISBN 978-981-13-9364-8
Date 2020
Extra DOI: 10.1007/978-981-13-9364-8_16
Abstract As of today, ‘web.archive.org’ has more than 338 billion web pages archived. How many of those pages are 100% retrieval. How many of the pages were left out or ignored just because the page doesn’t have some compatibility issue? How many of them were vernacular language and encoded in different formats (before UNICODE is standardized)? If we are talking about the content-type text. Consider other mime types which were encoded and decoded with different algorithms. The fundamental reason for this lies with the fundamental representation of digital data. We all know a sequence of 0s and 1s doesn’t make proper sense unless it is decoded properly. At the time of archiving, the browsers which could have rendered properly might have gone obsolete or upgraded way beyond to recognize old formats or the browser platforms could have been upgraded to recognize old formats. We studied various data preservation, web archiving related works and proposed a new framework that could store the exact client browser details (user-agent) in the WARC record and use it to load corresponding browser @ client side and render the archived content.
Book Title Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1016
Date Added 2021-08-09 8:43:40
Modified 2021-08-09 8:43:40

Tags:

  • Web archiving
  • Guaranteed retrieval
  • Personal data

A Grounded Theory of Information Quality for Web Archives

Item Type Document
Author Brenda Reyes Ayala
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Date 2018
Extra Place: United States, North America
Abstract Presentation for the dissertation defense of Brenda Reyes Ayala. This presentation builds a theory of information quality for web archives that is grounded in human-centered data.
Date Added 2021-08-09 8:42:10
Modified 2021-08-09 8:42:10

Tags:

  • web archiving
  • grounded theory
  • information quality

A historian's view on the right to be forgotten

Item Type Journal Article
Author Antoon De Baets
URL https://doi.org/10.1080/13600869.2015.1125155
Volume 30
Issue 1-2
Pages 57-66
Publication International Review of Law, Computers & Technology
ISSN 1360-0869
Date 2016-01-02
Extra Number: 1-2
Publisher: Routledge
ISBN: 13600869
DOI 10.1080/13600869.2015.1125155
Abstract This essay explores the consequences for historians of the ‘right to be forgotten’, a new concept proposed by the European Commission in 2012. I first explain that the right to be forgotten is a radical variant of the right to privacy and clarify the consequences of the concept for the historical study of public and private figures. I then treat the hard cases of spent and amnestied convictions and of internet archives. I further discuss the applicability of the right to be forgotten to dead persons as part of the problem of posthumous privacy, and finally point to the ambiguity of the impact of the passage of time. While I propose some compromise solutions, I also conclude that a generalized right to be forgotten would lead to the rewriting of history in ways that impoverish our insights not only into anecdotal lives but also into the larger trends of history.
Date Added 2021-08-09 8:42:43
Modified 2021-08-09 8:42:43

Tags:

  • WEB archives
  • privacy
  • right to be forgotten
  • amnesty
  • DATA protection laws
  • EUROPEAN Commission
  • internet archives
  • passage of time
  • PERSONALLY identifiable information
  • posthumous privacy
  • private and public figures
  • RIGHT of privacy
  • RIGHT to be forgotten
  • right to forget
  • spent convictions

A History of an Internet Exchange Point

Item Type Journal Article
Author Juan Camilo Cardona Restrepo
Author Rade Stanojevic
URL http://doi.acm.org/10.1145/2185376.2185384
Volume 42
Issue 2
Pages 58-64
Publication SIGCOMM Comput. Commun. Rev.
ISSN 0146-4833
Date 2012
Extra Number: 2
Publisher: ACM
Citation Key: CardonaRestrepo:2012:HIE:2185376.2185384
Place: New York, NY, USA
DOI 10.1145/2185376.2185384
Abstract In spite of the tremendous amount of measurement efforts on understanding the Internet as a global system, little is known about the 'local' Internet (among ISPs inside a region or a country) due to limitations of the existing measurement tools and scarce data. In this paper, empirical in nature, we characterize the evolution of one such ecosystem of local ISPs by studying the interactions between ISPs happening at the Slovak Internet eXchange (SIX). By crawling the web archive waybackmachine.org we collect 158 snapshots (spanning 14 years) of the SIX website, with the relevant data that allows us to study the dynamics of the Slovak ISPs in terms of: the local ISP peering, the traffic distribution, the port capacity/utilization and the local AS-level traffic matrix. Examining our data revealed a number of invariant and dynamic properties of the studied ecosystem that we report in detail.
Date Added 2021-08-09 8:43:11
Modified 2021-08-09 8:43:11

Tags:

  • internet exchange
  • internet traffic
  • peering
  • traffic matrix

A Holistic View on Web Archives

Item Type Book Section
Author Helge Holzmann
Author Wolfgang Nejdl
Editor Daniel Gomes
Editor Elena Demidova
Editor Jane Winters
Editor Thomas Risse
URL https://doi.org/10.1007/978-3-030-63291-5_8
Place Cham
Publisher Springer International Publishing
Pages 85-99
ISBN 978-3-030-63291-5
Date 2021
Extra DOI: 10.1007/978-3-030-63291-5_8
Accessed 2021-07-15 9:52:26
Library Catalog Springer Link
Language en
Abstract In order to address the requirements of different user groups and use cases of web archives, we have identified three views to access and explore web archives: user-, data- and graph-centric. The user-centric view is the natural way to look at the archived pages in a browser, just like the live web is consumed. By zooming out from there and looking at whole collections in a web archive, data processing methods can enable analysis at scale. In this data-centric view, the web and its dynamics as well as the contents of archived pages can be looked at from two angles: (1) by retrospectively analysing crawl metadata with respect to the size, age and growth of the web and (2) by processing archival collections to build research corpora from web archives. Finally, the third perspective is what we call the graph-centric view, which considers websites, pages or extracted facts as nodes in a graph. Links among pages or the extracted information are represented by edges in the graph. This structural perspective conveys an overview of the holdings and connections among contained resources and information. Only all three views together provide the holistic view that is required to effectively work with web archives.
Book Title The Past Web: Exploring Web Archives
Date Added 2021-08-09 8:43:58
Modified 2021-08-09 8:43:58

A Memento Web Browser for iOS

Item Type Conference Paper
Author Heather Tweedy
Author Frank McCown
Author Michael L Nelson
URL http://doi.acm.org/10.1145/2467696.2467764
Place New York, NY, USA
Publisher ACM
Pages 371-372
ISBN 978-1-4503-2077-1
Date 2013
Extra Series Title: JCDL '13
Citation Key: Tweedy:2013:MWB:2467696.2467764
DOI 10.1145/2467696.2467764
Abstract The Memento framework allows web browsers to request and view archived web pages in a transparent fashion. However, Memento is still in the early stages of adoption, and browser-plugins are often required to enable Memento support. We report on a new iOS app called the Memento Browser, a web browser that supports Memento and gives iPhone and iPad users transparent access to the world's largest web archives.
Proceedings Title Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries
Date Added 2021-08-09 8:43:35
Modified 2021-08-09 8:43:35

Tags:

  • web archiving
  • memento
  • mobile web
  • web browser

A Method for Identifying Personalized Representations in Web Archives

Item Type Journal Article
Author Mat Kelly
Author Justin F Brunelle
Author Michele C Weigle
Author Michael L Nelson
URL https://search.proquest.com/docview/1622284455?accountid=27464
Volume 19
Issue 11-12
Publication D-Lib Magazine
ISSN 1082-9873
Date 2013-11
Extra Number: 11-12
Publisher: Corporation for National Research Initiatives, Reston, VA
Place: Old Dominion University mkelly@cs.odu.edu
DOI http://dx.doi.org/10.1045/november2013-kelly
Language English
Abstract Web resources are becoming increasingly personalized – two different users clicking on the same link at the same time can see content customized for each individual user. These changes result in multiple representations of a resource that cannot be canonicalized in Web archives. We identify characteristics of this problem by presenting a potential solution to generalize personalized representations in archives. We also present our proof-of-concept prototype that analyzes WARC (Web ARChive) format files, inserts metadata establishing relationships, and provides archive users the ability to navigate on the additional dimension of environment variables in a modified Wayback Machine.
Date Added 2021-08-09 8:43:04
Modified 2021-08-09 8:43:04

Tags:

  • Web archiving
  • Web sites
  • 9.15: TECHNICAL SERVICES – PRESERVATION
  • article
  • Customization
  • Methods

A MIA pilot rövid bemutatása

Item Type Presentation
Presenter György Kampis
URL https://webarchivum.oszk.hu/wp-content/uploads/2020/03/Kampis_Gyorgy_MIA_pilot_GK_ea.pptx
Place Budapest
Date 2017
Accessed 2020-08-17 17:18:30
Meeting Name “404 Not Found” workshop
Language hu-HU
Date Added 2021-08-09 8:43:46
Modified 2021-08-09 8:43:46

A New Online Archive of Encoded Fado Transcriptions

Item Type Journal Article
Author Tiago Gonzaga Videira
Author Jorge Martins Rosa
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Volume 12
Issue 3/4
Pages 229-243
Publication Empirical Musicology Review
ISSN 15595749
Date 2017-07
Extra Number: 3/4
Publisher: Empirical Musicology Review
Abstract A new online archive of encoded fado transcriptions is presented. This dataset is relevant as the first step towards a cultural heritage archive and as source material for the study of songs associated with fado practice using empirical, analytical and systematic methodologies (namely MIR techniques). It is also relevant as a source for artistic purposes, namely the creation of new songs. We detail the constitution of this symbolic music corpus and present how we conceived of and implemented a methodology for testing its internal consistency using a supervised classification system.
Date Added 2021-08-09 8:42:38
Modified 2021-08-09 8:42:38

Tags:

  • WEB archives
  • FADOS
  • INFORMATION retrieval
  • methodology
  • music information retrieval
  • symbolic corpus

A Proposal of a Big Web Data Application and Archive for the Distributed Data Processing with Apache Hadoop

Típus Könyvfejezet
Szerző Martin Lnenicka
Szerző Jan Hovad
Szerző Jitka Komarkova
Szerkesztő Manuel Núñez
Szerkesztő Ngoc Thanh Nguyen
Szerkesztő David Camacho
Szerkesztő Bogdan Trawiński
URL http://link.springer.com/10.1007/978-3-319-24306-1_28
Hely Cham
Kiadó Springer International Publishing
Oldalszám 285-294
ISBN 978-3-319-24306-1
Dátum 2015
Egyéb DOI: 10.1007/978-3-319-24306-1_28
Kivonat In recent years, research on big data, data storage and other topics that represent innovations in the analytics field has become very popular. This paper describes a proposal of a big web data application and archive for distributed data processing with Apache Hadoop, including the framework with selected methods that can be used with this platform. It proposes a workflow to create a web content mining application and a big data archive, which uses modern technologies like Python, PHP, JavaScript, MySQL and cloud services. It also gives an overview of the architecture, methods and data structures used in the context of web mining, distributed processing and big data analytics.
Könyv címe Computational Collective Intelligence. Lecture Notes in Computer Science, vol 9330
Hozzáadás dátuma 2021. 08. 09. 8:43:27
Módosítás dátuma 2021. 08. 09. 8:43:27

Címkék:

  • Apache Hadoop
  • Web content mining
  • Big data analytics
  • Big web data
  • Distributed data processing
  • Python

A quantitative approach to evaluate Website Archivability using the CLEAR+ method

Típus Folyóiratcikk
Szerző Vangelis Banos
Szerző Yannis Manolopoulos
URL https://search.proquest.com/docview/1785958458?accountid=27464
Kötet 17
Szám 2
Oldalszám 119-141
Kiadvány International Journal on Digital Libraries
ISSN 14325012
Dátum 2016-06
Egyéb Number: 2
Publisher: Springer Science & Business Media
Place: Heidelberg
DOI 10.1007/s00799-015-0144-4
Nyelv en
Kivonat Website Archivability (WA) is a notion established to capture the core aspects of a website, crucial in diagnosing whether it has the potential to be archived with completeness and accuracy. In this work, aiming at measuring WA, we introduce and elaborate on all aspects of CLEAR+, an extended version of the Credible Live Evaluation Method for Archive Readiness (CLEAR) method. We use a systematic approach to evaluate WA from multiple different perspectives, which we call Website Archivability Facets. We then analyse archiveready.com, a web application we created as the reference implementation of CLEAR+, and discuss the implementation of the evaluation workflow. Finally, we conduct thorough evaluations of all aspects of WA to support the validity, the reliability and the benefits of our method using real-world web data.
Hozzáadás dátuma 2021. 08. 09. 8:42:36
Módosítás dátuma 2021. 08. 09. 8:42:36

Címkék:

  • Web archiving
  • Library And Information Sciences–Computer Applica
  • Digital archives
  • Web sites
  • Data mining
  • Web harvesting
  • Website Archivability

A query language for multi-version data web archives

Típus Folyóiratcikk
Szerző Marios Meimaris
Szerző George Papastefanatos
Szerző Stratis Viglas
Szerző Yannis Stavrakas
Szerző Christos Pateritsas
Szerző Ioannis Anagnostopoulos
URL https://doi.org/10.1111/exsy.12157
Kötet 33
Szám 4
Oldalszám 383-404
Kiadvány Expert Systems
ISSN 02664720
Dátum 2016-08
Egyéb Number: 4
Publisher: Wiley-Blackwell
DOI 10.1111/exsy.12157
Kivonat The Data Web refers to the vast and rapidly increasing quantity of scientific, corporate, government and crowd-sourced data published in the form of Linked Open Data, which encourages the uniform representation of heterogeneous data items on the web and the creation of links between them. The growing availability of open linked datasets has brought forth significant new challenges regarding their proper preservation and the management of evolving information within them. In this paper, we focus on the evolution and preservation challenges related to publishing and preserving evolving linked data across time. We discuss the main problems regarding their proper modelling and querying and provide a conceptual model and a query language for modelling and retrieving evolving data along with changes affecting them. We present in detail the syntax of the query language and demonstrate its functionality over a real-world use case of an evolving linked dataset from the biological domain.
Hozzáadás dátuma 2021. 08. 09. 8:42:42
Módosítás dátuma 2021. 08. 09. 8:42:42

Címkék:

  • ARCHIVES
  • WEB archives
  • archiving
  • CROWDSOURCING
  • data evolution
  • Data Web
  • HETEROGENEOUS computing
  • INFORMATION visualization
  • LINKED data (Semantic Web)
  • linked data preservation
  • QUERY languages (Computer science)

A Reference Model for a Trusted Service Guaranteeing Web-content

Típus Könyvfejezet
Szerző Mihai Togan
Szerző Ionut Florea
Szerkesztő Helmut Reimer
Szerkesztő Norbert Pohlmann
Szerkesztő Wolfgang Schneider
URL http://link.springer.com/10.1007/978-3-658-10934-9_18
Hely Wiesbaden
Kiadó Springer Fachmedien Wiesbaden
Oldalszám 216-224
ISBN 978-3-658-10933-2 978-3-658-10934-9
Dátum 2015
Egyéb DOI: 10.1007/978-3-658-10934-9_18
Hozzáférés 2020. 08. 20. 9:42:44
Könyvtár Katalógus DOI.org (Crossref)
Nyelv en
Könyv címe ISSE 2015
Hozzáadás dátuma 2021. 08. 09. 8:43:49
Módosítás dátuma 2021. 08. 09. 8:43:49

A registry of archived electronic journals

Típus Folyóiratcikk
Szerző Sue Sparks
Szerző Hugh Look
Szerző Mark Bide
Szerző Adrienne Muir
URL http://journals.sagepub.com/doi/10.1177/0961000610361552
Kötet 42
Szám 2
Oldalszám 111-121
Kiadvány Journal of Librarianship and Information Science
ISSN 0961-0006
Dátum 2010-06-07
Egyéb Number: 2
DOI 10.1177/0961000610361552
Hozzáadás dátuma 2021. 08. 09. 8:42:44
Módosítás dátuma 2021. 08. 09. 8:42:44

A semantic architecture for preserving and interpreting the information contained in Irish historical vital records

Típus Folyóiratcikk
Szerző Christophe Debruyne
Szerző Oya Deniz Beyan
Szerző Rebecca Grant
Szerző Sandra Collins
Szerző Stefan Decker
Szerző Natalie Harrower
URL http://link.springer.com/10.1007/s00799-016-0180-8
Kötet 17
Szám 3
Oldalszám 159-174
Kiadvány International Journal on Digital Libraries
ISSN 1432-5012
Dátum 2016-09-01
Egyéb Number: 3
DOI 10.1007/s00799-016-0180-8
Hozzáadás dátuma 2021. 08. 09. 8:41:41
Módosítás dátuma 2021. 08. 09. 8:41:41

A Semantic Layer Querying Tool

Típus Dolgozat
Szerző Renato Stoffalette João
URL https://doi.org/10.1145/3437963.3441710
Sorozat WSDM '21
Hely New York, NY, USA
Kiadó Association for Computing Machinery
Oldalszám 1101–1104
ISBN 978-1-4503-8297-7
Dátum March 8, 2021
DOI 10.1145/3437963.3441710
Hozzáférés 2021. 07. 15. 2:00:00
Könyvtár Katalógus ACM Digital Library
Kivonat Web archiving is the process of gathering data from the Web, storing it and ensuring the data is preserved in an archive for future explorations. Despite the increasing number of web archives, the absence of meaningful exploration methods remains a major hurdle in the way of turning them into a useful information source. With the creation of profiles describing metadata information about the archived documents it is possible to offer a more exploitable environment that goes beyond the simple keyword-based search. By exploring the expressive power of SPARQL language and providing a user friendly web-based search interface, users can run sophisticated queries searching for documents that meet their information needs.
Kiadvány címe Proceedings of the 14th ACM International Conference on Web Search and Data Mining
Hozzáadás dátuma 2021. 08. 09. 8:44:08
Módosítás dátuma 2021. 08. 09. 8:44:08

Címkék:

  • web archives
  • semantic layers
  • information retrieval
  • SPARQL

A Study of Automation from Seed URL Generation to Focused Web Archive Development: The CTRnet Context

Típus Dolgozat
Szerző Seungwon Yang
Szerző Kiran Chitturi
Szerző Gregory Wilson
Szerző Mohamed Magdy
Szerző Edward A Fox
URL http://doi.acm.org/10.1145/2232817.2232881
Hely New York, NY, USA
Kiadó ACM
Oldalszám 341-342
ISBN 978-1-4503-1154-0
Dátum 2012
Egyéb Series Title: JCDL '12
Citation Key: Yang:2012:SAS:2232817.2232881
DOI 10.1145/2232817.2232881
Kivonat In the event of emergencies and disasters, massive amounts of web resources are generated and shared. Due to the rapidly changing nature of those resources, it is important to start archiving them as soon as a disaster occurs. This led us to develop a prototype system for constructing archives with minimum human intervention using the seed URLs extracted from tweet collections. We present the details of our prototype system. We applied it to five tweet collections that had been developed in advance, for evaluation. We also identify five categories of non-relevant files and conclude with a discussion of findings from the evaluation.
Kiadvány címe Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:08
Módosítás dátuma 2021. 08. 09. 8:43:08

Címkék:

  • crawling
  • archiving
  • digital library
  • crisis tragedy and recovery network
  • seed URL generation
  • tweet

A Supplementary Tool for Web-archiving Using Blockchain Technology

Típus Folyóiratcikk
Szerző John E. de Villiers
Szerző André P. Calitz
URL http://www.scielo.org.za/scielo.php?script=sci_abstract&pid=S2077-72132020000100003&lng=en&nrm=iso&tlng=en
Kötet 25
Oldalszám 1-14
Kiadvány The African Journal of Information and Communication
ISSN 2077-7213
Dátum 2020
Egyéb Publisher: Authors
DOI 10.23962/10539/29194
Hozzáférés 2021. 07. 15. 9:42:19
Könyvtár Katalógus SciELO
Hozzáadás dátuma 2021. 08. 09. 8:43:53
Módosítás dátuma 2021. 08. 09. 8:43:53

A System for Visualizing and Analyzing the Evolution of the Web with a Time Series of Graphs

Típus Dolgozat
Szerző Masashi Toyoda
Szerző Masaru Kitsuregawa
URL http://doi.acm.org/10.1145/1083356.1083387
Hely New York, NY, USA
Kiadó ACM
Oldalszám 151-160
ISBN 1-59593-168-6
Dátum 2005
Egyéb Series Title: HYPERTEXT '05
Citation Key: Toyoda:2005:SVA:1083356.1083387
DOI 10.1145/1083356.1083387
Kivonat We propose WebRelievo, a system for visualizing and analyzing the evolution of the web structure based on a large Web archive with a series of snapshots. It visualizes the evolution with a time series of graphs, in which nodes are web pages, and edges are relationships between pages. Graphs can be clustered to show the overview of changes in graphs. WebRelievo aligns these graphs according to their time, and automatically determines their layout keeping positions of nodes synchronized over time, so that the user can keep track of pages and clusters. This visualization enables us to understand when pages appeared, how their relationships have evolved, and how clusters are merged and split over time. The current implementation of WebRelievo is based on six Japanese web archives crawled from 1999 to 2003. The user can interactively browse those graphs by changing the focused page and by changing layouts of graphs. Using WebRelievo we can answer historical questions and investigate changes in trends on the Web. We show the feasibility of WebRelievo by applying it to tracking trends in P2P systems and search engines for mobile phones, and to investigating link spamming.
Kiadvány címe Proceedings of the Sixteenth ACM Conference on Hypertext and Hypermedia
Hozzáadás dátuma 2021. 08. 09. 8:43:06
Módosítás dátuma 2021. 08. 09. 8:43:06

Címkék:

  • visualization
  • link analysis
  • evolution
  • link spamming
  • Web graph

A Time-aware Random Walk Model for Finding Important Documents in Web Archives

Típus Dolgozat
Szerző Tu Ngoc Nguyen
Szerző Nattiya Kanhabua
Szerző Claudia Niederée
Szerző Xiaofei Zhu
URL http://dl.acm.org/citation.cfm?doid=2766462.2767832
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 915-918
ISBN 978-1-4503-3621-5
Dátum 2015
DOI 10.1145/2766462.2767832
Kivonat Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, web archives are emerging as gold-mines for content analytics of many sorts. However, supporting search, which goes beyond navigational search via URLs, is a very challenging task in these unique structures with huge, redundant and noisy temporal content. In this paper, we address the search needs of expert users such as journalists, economists or historians for discovering a topic in time: Given a query, the top-k returned results should give the best representative documents that cover the most interesting time-periods for the topic. For this purpose, we propose a novel random walk-based model that integrates relevance, temporal authority, diversity and time in a unified framework. Our preliminary experimental results on a large-scale real-world web archival collection show that our method significantly improves on the state-of-the-art algorithms (i.e., PageRank) in ranking temporal web pages.
Kiadvány címe Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval – SIGIR '15
Hozzáadás dátuma 2021. 08. 09. 8:43:34
Módosítás dátuma 2021. 08. 09. 8:43:34

Címkék:

  • Algorithms
  • Temporal Ranking
  • Authority
  • Diversity
  • Web Archive
  • Experimentation
  • Performance

A Topic Transition Map for Query Expansion: A Semantic Analysis of Click-Through Data and Test Collections

Típus Könyvfejezet
Szerző Kyung-min Kim
Szerző Yuchul Jung
Szerző Sung-Hyon Myaeng
Szerkesztő Byeong Ho Kang
Szerkesztő Quan Bai
URL http://link.springer.com/10.1007/978-3-319-50127-7_57
Hely Cham
Kiadó Springer International Publishing
Oldalszám 648-664
ISBN 978-3-319-50127-7
Dátum 2016
Egyéb DOI: 10.1007/978-3-319-50127-7_57
Kivonat Term mismatching between queries and documents has long been recognized as a key problem in information retrieval (IR). Based on our analysis of a large-scale web query log and relevant documents in standard test collections, we attempt to detect topic transitions between the topical categories of a query and those of relevant documents (or clicked pages) and create a Topic Transition Map (TTM) that captures how query topic categories are linked to those of relevant or clicked documents. TTM, a kind of click-graph at the semantic level, is then used for query expansion by suggesting the terms associated with the document categories strongly related to the query category. Unlike most other query expansion methods that attempt to either interpret the semantics of queries based on a thesaurus-like resource or use the content of a small number of relevant documents, our method proposes to retrieve documents in the semantic affinity of multiple categories of the documents relevant for the queries of a similar kind. Our experiments show that the proposed method is superior in effectiveness to other representative query expansion methods such as standard relevance feedback, pseudo relevance feedback, and thesaurus-based expansion of queries.
Könyv címe AI 2016: Advances in Artificial Intelligence
Hozzáadás dátuma 2021. 08. 09. 8:43:28
Módosítás dátuma 2021. 08. 09. 8:43:28

Címkék:

  • Query expansion
  • Relevance feedback
  • Semantic categorization of terms
  • Topic Transition Map

A UWS Case for 200-Style Memento Negotiations

Típus Folyóiratcikk
Szerző Zhiwu Xie
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kiadvány Bulletin of IEEE Technical Committee on Digital Libraries
Dátum 2015
Egyéb Publisher: IEEE Technical Committee on Digital Libraries
Place: United States, North America
Kivonat Uninterruptible web service (UWS) is a web archiving application that handles server errors using the most recently archived representation of the requested web resource. The application is developed as an Apache module. It leverages the transactional web archiving tool SiteStory, which archives all previously accessed representations of web resources originating from a website. This application helps to improve the website's quality of service by temporarily masking server errors from the end user and gaining precious time for the system administrator to debug and recover from server failures. By providing value-added support to website operations, we aim to reduce the resistance to transactional web archiving, which in turn may lead to a better coverage of web history.
Hozzáadás dátuma 2021. 08. 09. 8:42:58
Módosítás dátuma 2021. 08. 09. 8:42:58

Címkék:

  • Web archiving
  • Memento
  • Uninterruptible web service

A Web Archiving Method for Preserving Content Integrity by Using Blockchain

Típus Dolgozat
Szerző Hyun Cheon Hwang
Szerző Ji Su Park
Szerző Byung Rae Lee
Szerző Jin Gon Shon
Szerkesztő James J. Park
Szerkesztő Simon James Fong
Szerkesztő Yi Pan
Szerkesztő Yunsick Sung
Sorozat Lecture Notes in Electrical Engineering
Hely Singapore
Kiadó Springer
Oldalszám 341-347
ISBN 9789811593437
Dátum 2021
DOI 10.1007/978-981-15-9343-7_47
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat A web archive system has become an essential topic for preserving historical information for descendants with the explosive growth of web data. The reference model for an Open Archival Information System (OAIS) has been providing an excellent guide for a long-term archiving system, and most web archive systems follow this guide. However, there is still a weak point in terms of content integrity, because archived web data could be altered in an unauthorized manner. In this paper, we propose the BCLinked (Blockchain Linked) web archiving method, which uses blockchain technology and an extended WARC (Web ARChive) file format to ensure content integrity. Furthermore, we confirmed through an experiment that the proposed method ensures content integrity.
Kiadvány címe Advances in Computer Science and Ubiquitous Computing
Hozzáadás dátuma 2021. 08. 09. 8:43:53
Módosítás dátuma 2021. 08. 09. 8:43:53

Címkék:

  • WARC
  • OAIS
  • Web archive
  • BCLinked web archiving method
  • Blockchain

A webarchiválás nemzetközi környezete. Mozaikok az IIPC 2019 kongresszusról

Típus Folyóiratcikk
Szerző Márton Németh
URL http://epa.oszk.hu/01300/01367/00309/pdf/EPA01367_3K_2018_12_023-027.pdf
Kötet 27
Szám 12
Oldalszám 23-27
Kiadvány Könyv, Könyvtár, Könyvtáros
ISSN 2732-0375
Dátum 2018. december
Egyéb Number: 12
Folyóirat rövid neve 3K
Nyelv hu
Hozzáadás dátuma 2021. 08. 09. 8:44:42
Módosítás dátuma 2021. 08. 09. 8:44:42

A webarchiválás oktatása = The education of web-archiving

Típus Könyvfejezet
Szerző László Drótos
Szerző Márton Németh
URL https://doi.org/10.31915/NWS.2018.4
Jogok cc_by
Hely Budapest
Kiadó HUNGARNET Egyesület
Oldalszám 31-37
Dátum 2018
Hozzáférés 2020. 08. 17. 16:25:46
Könyvtár Katalógus real.mtak.hu
Nyelv hu
Kivonat The article is focusing on three main issues. At first, an overview is being offered about an online research seminar for PhD students and web-archiving professionals organized by the NETLAB Research group, Aarhus University, Denmark. Secondly, the recently established Education and Training Working Group of the IIPC consortium is being introduced. A quick overview is being offered about a brief survey on best web archiving education practices and future. Thirdly, a Hungarian web-archiving training concept is being described. The training will be organized by the Library Institute for any kind of cultural heritage professionals that want to get basic skills and competences in this field.
Könyv címe NETWORKSHOP 2018 konferenciakiadvány
Hozzáadás dátuma 2021. 08. 09. 8:43:44
Módosítás dátuma 2021. 08. 09. 8:43:44

A webarchiválásról történeti megközelítésben

Típus Folyóiratcikk
Szerző Márton Németh
URL http://ki2.oszk.hu/3k/2018/06/a-webarchivalasrol-torteneti-megkozelitesben/
Kötet 27
Szám 2
Oldalszám 48-52
Kiadvány Könyv, Könyvtár, Könyvtáros
ISSN 1216-6804
Dátum 2018
Egyéb Number: 2
Kivonat In the form of case studies, this volume of essays is among the first to attempt to highlight many important elements of the historical and broader social-science context of web archiving. The editors' preface also notes that overviews so far have tended to deal with the web archiving process itself, its technical details, and the curatorial activities connected with archiving the World Wide Web. In the preface the editors again draw on the most recent literature to outline the more general context of web archiving, the chronology of its history so far, and its main actors. They characterize the main institutions that carry out these activities. Alongside the pioneering role of the Internet Archive, they point out that while in some countries this activity is concentrated around a single leading institution (for example, in Denmark), elsewhere there is institutional coordination with clearly separated roles (e.g. France, Great Britain). The reader gets a brief account of the software used to collect the material and of the methods that make the stored web information retrievable (e.g. the Wayback Machine software developed by the Internet Archive supports searching by URL, complemented by a full-text index service in those collections where it happens to be available).
Hozzáadás dátuma 2021. 08. 09. 8:43:29
Módosítás dátuma 2021. 08. 09. 8:43:29

Címkék:

  • data science
  • book review
  • web history

a2o: Access to Archives from the National Archives of Singapore

Típus Folyóiratcikk
Szerző Sarah Beasley
Szerző Candice Kail
URL http://www.tandfonline.com/doi/abs/10.1080/19322900902896531
Kötet 3
Szám 2
Oldalszám 149-155
Kiadvány Journal of Web Librarianship
ISSN 1932-2909
Dátum 2009-06-23
Egyéb Number: 2
DOI 10.1080/19322900902896531
Kivonat The article offers information about a2o that was created by the National Archives of Singapore in 2009. Accordingly, a2o is taken after the chemical symbol of water, which is considered as an essential element of life. It provides access to various databases, photographs, maps and plans, oral history audio files, and other audiovisual recordings in multiple ways. It also offers a variety of online exhibitions, including "Colours Behind Barbed Wires: A Prisoner of War's Story through Haxworth's Sketches" and "Colours in the Wind: Hill Street Police Station in Retrospect."
Hozzáadás dátuma 2021. 08. 09. 8:42:47
Módosítás dátuma 2021. 08. 09. 8:42:47

Academic Social Networking Sites are Smaller, Denser Networks Conducive to Formal Identity Management, Whereas Academic Twitter is Larger, More Diffuse, and Affords More Space for Novel Connections

Típus Folyóiratcikk
Szerző Scott Goldstein
URL http://search.ebscohost.com/login.aspx?direct=true&db=lxh&AN=142859999&lang=hu&site=ehost-live
Kötet 15
Szám 1
Oldalszám 226-228
Kiadvány Evidence Based Library & Information Practice
ISSN 1715720X
Dátum January 2020
Egyéb Number: 1
Folyóirat rövid neve Evidence Based Library & Information Practice
DOI 10.18438/eblip29687
Hozzáférés 2021. 07. 16. 11:14:25
Könyvtár Katalógus EBSCOhost
Kivonat Objective – To examine the structure of academics' online social networks and how academics understand and interpret them. Design – Mixed methods consisting of network analysis and semi-structured interviews. Setting – Academics based in the United Kingdom. Subjects – 55 U.K.-based academics who use an academic social networking site and Twitter, of whom 18 were interviewed. Methods – For each subject, ego-networks were collected from Twitter and either ResearchGate or Academia.edu. Twitter data were collected primarily via the Twitter API, and the social networking site data were collected either manually or using a commercial web scraping program. Edge tables were created in Microsoft Excel spreadsheets and imported into Gephi for analysis and visualization. A purposive subsample of subjects was interviewed via Skype using a semi-structured format intended to illuminate further the network analysis findings. Transcripts were deductively coded using a grounded theory-based approach. Main Results – Network analysis replicated earlier findings in the literature. A large number of academics have relatively few connections to others in the network, while a small number have relatively many connections. In terms of reciprocity (the proportion of mutual ties or pairings out of all possible pairings that could exist in the network), arts and humanities disciplines were significantly more reciprocal. Communities (measured using the modularity algorithm, which looks at the density of links within and between different subnetworks) are more frequently defined by institutions and research interests on academic social networking sites and by research interests and personal interests on Twitter. The overall picture was reinforced by the qualitative analysis. According to interview participants, academic social networking sites reflect pre-existing professional relationships and do not foreground social interaction, serving instead as a kind of virtual CV. 
By contrast, Twitter is analogized to a conference coffee break, where users can form new connections.
Hozzáadás dátuma 2021. 08. 09. 8:44:37
Módosítás dátuma 2021. 08. 09. 8:44:37

Címkék:

  • Social media
  • Online social networks
  • Academic librarians

Access and Scholarly Use of Web Archives

Típus Folyóiratcikk
Szerző Helen Hockx-Yu
URL https://search.proquest.com/docview/1623365740?accountid=27464
Kötet 25
Szám 1/2
Oldalszám 113-127
Kiadvány Alexandria
ISSN 0955-7490
Dátum 2014
Egyéb Number: 1/2
Publisher: Sage Publications Ltd.
Place: London
Nyelv en
Hozzáadás dátuma 2021. 08. 09. 8:41:40
Módosítás dátuma 2021. 08. 09. 8:41:40

Címkék:

  • Library And Information Sciences

Access Patterns for Robots and Humans in Web Archives

Típus Dolgozat
Szerző Yasmin A AlNoamany
Szerző Michele C Weigle
Szerző Michael L Nelson
URL http://doi.acm.org/10.1145/2467696.2467722
Hely New York, NY, USA
Kiadó ACM
Oldalszám 339-348
ISBN 978-1-4503-2077-1
Dátum 2013
Egyéb Series Title: JCDL '13
Citation Key: AlNoamany:2013:APR:2467696.2467722
DOI 10.1145/2467696.2467722
Kivonat Although user access patterns on the live web are well-understood, there has been no corresponding study of how users, both humans and robots, access web archives. Based on samples from the Internet Archive's public Wayback Machine, we propose a set of basic usage patterns: Dip (a single access), Slide (the same page at different archive times), Dive (different pages at approximately the same archive time), and Skim (lists of what pages are archived, i.e., TimeMaps). Robots are limited almost exclusively to Dips and Skims, but human accesses are more varied between all four types. Robots outnumber humans 10:1 in terms of sessions, 5:4 in terms of raw HTTP accesses, and 4:1 in terms of megabytes transferred. Robots almost always access TimeMaps (95% of accesses), but humans predominately access the archived web pages themselves (82% of accesses). In terms of unique archived web pages, there is no overall preference for a particular time, but the recent past (within the last year) shows significant repeat accesses.
Kiadvány címe Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:18
Módosítás dátuma 2021. 08. 09. 8:43:18

Címkék:

  • web archiving
  • user access patterns
  • web robot detection
  • web server logs
  • web usage mining

Accessing Web Archives: Integrating an Archive-It Collection into EBSCO Discovery Service

Típus Folyóiratcikk
Szerző Christina A. Beis
Szerző Kayla Nicole Harris
Szerző Stephanie L. Shreffler
URL https://www.tandfonline.com/doi/full/10.1080/19322909.2019.1625844
Kötet 13
Szám 3
Oldalszám 246-259
Kiadvány Journal of Web Librarianship
ISSN 1932-2909
Dátum 2019-07-03
Egyéb Number: 3
DOI 10.1080/19322909.2019.1625844
Kivonat Effective collaboration between archives and technical services can increase the discoverability of special collection materials. Archivists at the University of Dayton Libraries began using Archive-It to capture websites relevant to their collecting policies in 2015. However, the collections were only made available to users from the University of Dayton page on the Archive-It website. Content was isolated in a separate platform and was not promoted to users. Working together, the team of archivists and technical services librarians incorporated the web archive collections into the Libraries' EBSCO Discovery Service (EDS) discovery layer. A local data dictionary was created based on OCLC's Descriptive Metadata for Web Archiving report (2018), and metadata was added at the seed and collection levels. The result was indexed content on a single, user-friendly platform. The web archive collections were then marketed to the University of Dayton community, and statistics were generated on their use.
Hozzáadás dátuma 2021. 08. 09. 8:43:40
Módosítás dátuma 2021. 08. 09. 8:43:40

Címkék:

  • web archiving
  • academic libraries
  • Archive-It
  • collaboration
  • discovery layer: EBSCO Discovery Services: metadat
  • social media
  • special collections
  • web-scale discovery

Accountability and accessibility: ensuring the evidence of e‐governance in Australia

Típus Folyóiratcikk
Szerző Adrian Cunningham
Szerző Margaret Phillips
Szerkesztő Caroline Auty
URL https://www.emerald.com/insight/content/doi/10.1108/00012530510612059/full/html
Kötet 57
Szám 4
Oldalszám 301-317
Kiadvány Aslib Proceedings
ISSN 0001-253X
Dátum 08/2005
Egyéb Number: 4
Folyóirat rövid neve AP
DOI 10.1108/00012530510612059
Hozzáférés 2020. 08. 20. 10:21:34
Könyvtár Katalógus DOI.org (Crossref)
Nyelv en
Rövid cím Accountability and accessibility
Hozzáadás dátuma 2021. 08. 09. 8:43:50
Módosítás dátuma 2021. 08. 09. 8:43:50

Acquiring Web Content From In-Memory Cache

Típus Dolgozat
Szerző Abhinav Kumar
Szerző Zhiwu Xie
URL http://dl.acm.org/citation.cfm?doid=3197026.3203868
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 359-360
ISBN 978-1-4503-5178-2
Dátum 2018
DOI 10.1145/3197026.3203868
Kivonat Web content acquisition forms the foundation of value extraction of web data. Two main categories of acquisition methods are crawler based methods and transactional web archiving or server-side acquisition methods. In this poster, we propose a new method to acquire web content from web caches. Our method provides improvement in terms of reduced penalty on HTTP transaction, flexibility to accommodate peak web server loads and minimal involvement of System Administrator to set up the system.
Kiadvány címe Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries – JCDL '18
Hozzáadás dátuma 2021. 08. 09. 8:42:40
Módosítás dátuma 2021. 08. 09. 8:42:40

Címkék:

  • Web archiving
  • in-memory cache
  • Memcached

Adaptive search systems for web archive research

Típus Dolgozat
Szerző Hugo C. Huurdeman
URL http://dl.acm.org/citation.cfm?doid=2637002.2637063
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 354-356
ISBN 978-1-4503-2976-7
Dátum 2014
DOI 10.1145/2637002.2637063
Kivonat The wealth of digital information available in our time has become indispensable for a rich variety of tasks. We use data on the Web for work, leisure, and research, aided by various search systems, allowing us to find small needles in giant haystacks. Despite recent advances in personalization and contextualization, however, various types of tasks, ranging from simple lookup tasks to complex, exploratory and analytical ventures, are mainly supported in elementary, “one-size-fits-all” search interfaces. Web archives, keepers of our future cultural heritage, have gathered petabytes of valuable Web data, which characterize our times for future generations. Access to these archives, however, is surprisingly limited: online Web archives usually provide a URL-based Wayback Machine interface, sometimes extended with rudimentary search options. As a result of limited access, Web archives have not been widely used for research so far. For emerging research using Web archives, there is a need to move beyond URL-based and simple search access, towards providing support for complex (re)search tasks. In my thesis, I am exploring ways to move beyond the “one-size-fits-all” approach for search systems, and I work on systems which can support the flow of complex search, also in the context of archived Web data. Rich models of search and research can be incorporated into adaptive search systems, supporting search strategies in various stages of complex search tasks. Concretely, I look at the use case of the Humanities researcher, for which the large, Terabyte-scale Web archives can be a valuable addition to existing sources utilized to perform research.
Kiadvány címe Proceedings of the 5th Information Interaction in Context Symposium on – IIiX '14
Hozzáadás dátuma 2021. 08. 09. 8:41:50
Módosítás dátuma 2021. 08. 09. 8:41:50

Agent-based Approach to WEB Exploration Process

Típus Folyóiratcikk
Szerző Andrzej Opalinski
Szerző Edward Nawarecki
Szerző Stanislawa Kluska-Nawarecka
URL https://doi.org/10.1016/j.procs.2015.05.263
Kötet 51
Szám International Conference On Computational Science, ICCS 2015
Oldalszám 1052-1061
Kiadvány Procedia Computer Science
ISSN 1877-0509
Dátum 2015-01-01
Egyéb Number: International Conference On Computational Science, ICCS 2015
Publisher: Elsevier B.V.
DOI 10.1016/j.procs.2015.05.263
Kivonat The paper contains the concept of agent-based search system and monitoring of Web pages. It is oriented at the exploration of limited problem area, covering a given sector of industry or economy. The proposal of agent-based (modular) structure of the system is due to the desire to ease the introduction of modifications or enrichment of its functionality. Commonly used search engines do not offer such a feature. The second part of the article presents a pilot version of the WEB mining system, representing a simplified implementation of the previously presented concept. Testing of the implemented application was executed by referring to the problem area of foundry industry.
Hozzáadás dátuma 2021. 08. 09. 8:41:53
Módosítás dátuma 2021. 08. 09. 8:41:53

An analytical system for evaluating academia units based on metrics provided by academic social network

Típus Folyóiratcikk
Szerző Lukasz Wiechetek
Szerző Kongkiti Phusavat
Szerző Zbigniew Pastuszak
URL http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=145756320&lang=hu&site=ehost-live
Kötet 159
Oldalszám N.PAG-N.PAG
Kiadvány Expert Systems with Applications
ISSN 09574174
Dátum November 30, 2020
Folyóirat rövid neve Expert Systems with Applications
DOI 10.1016/j.eswa.2020.113608
Hozzáférés 2021. 07. 16. 11:13:21
Könyvtár Katalógus EBSCOhost
Kivonat • Data from social networks can be used for researchers and research units evaluation. • Formal and natural sciences researchers more often use RG and have higher metrics. • Different types of researchers (position, field) shouldn't be directly compared. • Free software allows developing of analytical tools for fast scientists evaluation. Social networks are becoming more and more popular, not only among young people looking for entertainment, but also among specialists, experts and researchers who wish to establish professional networks, develop business or research projects. They may be useful also for the comparison and evaluation of scientists and research organizations. This study aims to show how to build a framework of an analytical system for evaluation of researchers and research units using the data retrieved from an academic social network. Acquired data are used to find out the main differences between ResearchGate (RG) usage and values of metrics owned by scientists of different gender, scientific title and field of study to find out if various groups of employees can be directly compared. The authors apply web scraping technique for collecting data from university web page (2847 employees) and use R scripts to acquire the metrics from RG portal. Also, data of 1497 researchers and teaching workers from 11 faculties at Polish university were explored. The descriptive statistics, Chi square test, ANOVA and logistic regression were used to analyse the main RG metrics: RG Score, number of publications, reads and citations. Analysis shows the significant differences both in terms of popularity of ResearchGate and values of its main metrics.
The research confirmed that 1) the rvest package allows for fast data acquisition from RG, 2) RG metrics can be used by university managers to compare achievements and progress of single researchers, research labs, departments or faculties, 3) Researchers employed at the faculties of formal and natural sciences use RG portal more frequently, possess higher values of RG metrics, therefore different types of workers and various branches of science shouldn't be compared directly.
Hozzáadás dátuma 2021. 08. 09. 8:44:37
Módosítás dátuma 2021. 08. 09. 8:44:37

Címkék:

  • WEBSITES
  • Web scraping
  • Academic social network
  • ACQUISITION of data
  • Analytical system
  • CHI-squared test
  • Comparative analysis
  • DESCRIPTIVE statistics
  • FREEWARE (Computer software)
  • ResearchGate
  • SOCIAL networks
  • University evaluation

An Efficient Clustering Algorithm for Large-scale Topical Web Pages

Típus Dolgozat
Szerző Lei Wang
Szerző Peng Chen
Szerző Lian'en Huang
URL http://doi.acm.org/10.1145/1645953.1646247
Hely New York, NY, USA
Kiadó ACM
Oldalszám 1851-1854
ISBN 978-1-60558-512-3
Dátum 2009
Egyéb Series Title: CIKM '09
Citation Key: Wang:2009:ECA:1645953.1646247
DOI 10.1145/1645953.1646247
Kivonat The clustering of topic-related web pages has been recognized as a foundational work in exploiting large sets of web pages such as the cases in search engines and web archive systems, which collect and preserve billions of web pages. However, this task faces great challenges both in efficiency and accuracy. In this paper we present a novel clustering algorithm for large scale topical web pages which achieves high efficiency together with considerately high accuracy. In our algorithm, a two-phase divide and conquer framework is developed to solve the efficiency problem, in which both link analysis and content analysis are utilized in mining the topical similarity between pages to achieve a high accuracy. A comprehensive experiment was conducted to evaluate our method in terms of its effectiveness, efficiency, and quality of result.
Kiadvány címe Proceedings of the 18th ACM Conference on Information and Knowledge Management
Hozzáadás dátuma 2021. 08. 09. 8:43:11
Módosítás dátuma 2021. 08. 09. 8:43:11

Címkék:

  • content analysis
  • link analysis
  • clustering
  • topic model
  • topical similarity

An Empirical Comparison of Web Page Segmentation Algorithms

Típus Könyvfejezet
Szerző Johannes Kiesel
Szerző Lars Meyer
Szerző Florian Kneist
Szerző Benno Stein
Szerző Martin Potthast
Szerkesztő Djoerd Hiemstra
Szerkesztő Marie-Francine Moens
Szerkesztő Josiane Mothe
Szerkesztő Raffaele Perego
Szerkesztő Martin Potthast
Szerkesztő Fabrizio Sebastiani
URL http://link.springer.com/10.1007/978-3-030-72240-1_5
Kötet 12657
Hely Cham
Kiadó Springer International Publishing
Oldalszám 62-74
ISBN 978-3-030-72239-5 978-3-030-72240-1
Dátum 2021
Egyéb Series Title: Lecture Notes in Computer Science
DOI: 10.1007/978-3-030-72240-1_5
Hozzáférés 2021. 07. 15. 10:49:13
Könyvtár Katalógus DOI.org (Crossref)
Nyelv en
Kivonat Over the past two decades, several algorithms have been developed to segment a web page into semantically coherent units, a task with several applications in web content analysis. However, these algorithms have hardly been compared empirically and it thus remains unclear which of them—or rather, which of their underlying paradigms—performs best. To contribute to closing this gap, we report on the reproduction and comparative evaluation of five segmentation algorithms on a large, standardized benchmark dataset for web page segmentation: Three of the algorithms have been specifically developed for web pages and have been selected to represent paradigmatically different approaches to the task, whereas the other two approaches originate from the segmentation of photos and print documents, respectively. For a fair comparison, we tuned each algorithm’s parameters, if applicable, to the dataset. Altogether, the classic rule-based VIPS algorithm achieved the highest performance, closely followed by the purely visual approach of Cormier et al. For reproducibility, we provide our reimplementations of the algorithms along with detailed instructions.
Könyv címe Advances in Information Retrieval
Hozzáadás dátuma 2021. 08. 09. 8:44:14
Módosítás dátuma 2021. 08. 09. 8:44:14

An Evaluation of Caching Policies for Memento Timemaps

Típus Dolgozat
Szerző Justin F Brunelle
Szerző Michael L Nelson
URL http://doi.acm.org/10.1145/2467696.2467717
Hely New York, NY, USA
Kiadó ACM
Oldalszám 267-276
ISBN 978-1-4503-2077-1
Dátum 2013
Egyéb Series Title: JCDL '13
Citation Key: Brunelle:2013:ECP:2467696.2467717
DOI 10.1145/2467696.2467717
Kivonat As defined by the Memento Framework, TimeMaps are machine-readable lists of time-specific copies — called "mementos" — of an archived original resource. In theory, as an archive acquires additional mementos over time, a TimeMap should be monotonically increasing. However, there are reasons why the number of mementos in a TimeMap would decrease, for example: archival redaction of some or all of the mementos, archival restructuring, and transient errors of one or more archives. We study TimeMaps for 4,000 original resources over a three month period, note their change patterns, and develop a caching algorithm for TimeMaps suitable for a reverse proxy in front of a Memento aggregator. We show that TimeMap cardinality is constant or monotonically increasing for 80.2% of all TimeMap downloads in the observation period. The goal of the caching algorithm is to exploit the ideally monotonically increasing nature of TimeMaps and not cache responses with fewer mementos than the already cached TimeMap. This new caching algorithm uses conditional cache replacement and a Time To Live (TTL) value to ensure the user has access to the most complete TimeMap available. Based on our empirical data, a TTL of 15 days will minimize the number of mementos missed by users, and minimize the load on archives contributing to TimeMaps.
Kiadvány címe Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:13
Módosítás dátuma 2021. 08. 09. 8:43:13

Címkék:

  • web archiving
  • digital preservation
  • memento
  • http
  • timemaps
  • web architecture

An Exploratory Study of Advantages and Disadvantages of Website Preservation

Típus Folyóiratcikk
Szerző Rattahpinnusa Haresariu Handisa
URL https://e-journal3.unair.ac.id/index.php/rlj/article/view/113
Jogok Copyright (c) 2021 Rattahpinnusa Haresariu Handisa
Kötet 7
Szám 1
Oldalszám 1-6
Kiadvány Record and Library Journal
ISSN 2442-5168
Dátum 2021-06-29
Egyéb Number: 1
DOI 10.20473/rlj.v7i1.113
Hozzáférés 2021. 07. 15. 11:26:47
Könyvtár Katalógus e-journal3.unair.ac.id
Nyelv en
Hozzáadás dátuma 2021. 08. 09. 8:44:25
Módosítás dátuma 2021. 08. 09. 8:44:25

Címkék:

  • Accessible website

An Overview of Web Archiving

Típus Folyóiratcikk
Szerző Jinfang Niu
URL https://search.proquest.com/docview/1266143627?accountid=27464
Kötet 18
Szám 3-4
Kiadvány D-Lib Magazine
ISSN 1082-9873
Dátum 2012-03
Egyéb Number: 3-4
Publisher: Corporation for National Research Initiatives, Reston, VA
Place: University of South Florida jinfang@usf.edu
DOI 10.1045/march2012-niu1
Nyelv English
Kivonat This overview is a study of the methods used at a variety of universities, and international government libraries and archives, to select, acquire, describe and access web resources for their archives. Creating a web archive presents many challenges, and library and information schools should ensure that instruction in web archiving methods and skills is made part of their curricula, to help future practitioners meet those challenges. In preparation for developing a web archiving course, the author conducted a comprehensive literature review. The findings are reported in this paper, along with the author's views on some of the methods in use, such as how traditional archive management concepts and theories can be applied to the organization and description of archived web resources.
Hozzáadás dátuma 2021. 08. 09. 8:42:14
Módosítás dátuma 2021. 08. 09. 8:42:14

Címkék:

  • Web archiving
  • Digital preservation
  • Web archive
  • 9.15: TECHNICAL SERVICES – PRESERVATION
  • article
  • Methods
  • Government libraries
  • Universities
  • web archive methods
  • web resources

Analysing and Enriching Focused Semantic Web Archives for Parliament Applications

Típus Folyóiratcikk
Szerző Elena Demidova
Szerző Nicola Barbieri
Szerző Stefan Dietze
Szerző Adam Funk
Szerző Helge Holzmann
Szerző Diana Maynard
Szerző Nikolaos Papailiou
Szerző Wim Peters
Szerző Thomas Risse
Szerző Dimitris Spiliotopoulos
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 6
Szám 3
Oldalszám 433-456
Kiadvány Future Internet
ISSN 1999-5903
Dátum 2014
Egyéb Number: 3
Publisher: MDPI AG
DOI 10.3390/fi6030433
Kivonat The web and the social web play an increasingly important role as an information source for Members of Parliament and their assistants, journalists, political analysts and researchers. It provides important and crucial background information, like reactions to political events and comments made by the general public. The case study presented in this paper is driven by two European parliaments (the Greek and the Austrian parliament) and targets an effective exploration of political web archives. In this paper, we describe semantic technologies deployed to ease the exploration of the archived web and social web content and present evaluation results.
Hozzáadás dátuma 2021. 08. 09. 8:43:01
Módosítás dátuma 2021. 08. 09. 8:43:01

Címkék:

  • web archiving
  • Information technology
  • T58.5-58.64
  • enrichment
  • entity and event extraction
  • parliament libraries
  • semantic content analysis
  • topic detection

Analyzing web archives through topic and event focused sub-collections

Típus Dolgozat
Szerző Gerhard Gossen
Szerző Elena Demidova
Szerző Thomas Risse
URL http://dl.acm.org/citation.cfm?doid=2908131.2908175
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 291-295
ISBN 978-1-4503-4208-7
Dátum 2016
DOI 10.1145/2908131.2908175
Kiadvány címe Proceedings of the 8th ACM Conference on Web Science – WebSci '16
Hozzáadás dátuma 2021. 08. 09. 8:41:50
Módosítás dátuma 2021. 08. 09. 8:41:50

Címkék:

  • Web archive
  • events
  • sub-collection
  • topics

API-based social media collecting as a form of web archiving

Típus Folyóiratcikk
Szerző Justin Littman
Szerző Daniel Chudnov
Szerző Daniel Kerchner
Szerző Christie Peterson
Szerző Yecheng Tan
Szerző Rachel Trent
Szerző Rajat Vij
Szerző Laura Wrubel
URL https://search.proquest.com/docview/2002183484?accountid=27464
Kötet 19
Szám 1
Oldalszám 21-38
Kiadvány International Journal on Digital Libraries
ISSN 14325012
Dátum 2018-03
Egyéb Number: 1
Publisher: Springer Science & Business Media
Place: GW Libraries, The George Washington University, Washington, DC, USA ; District Data Labs, Washington, DC, USA ; GW Libraries, The George Washington University, Washington, DC, USA
DOI 10.1007/s00799-016-0201-7
Nyelv English
Kivonat Social media is increasingly a topic of study across a range of disciplines. Despite this popularity, current practices and open source tools for social media collecting do not adequately support today’s scholars or support building robust collections for future researchers. We are continuing to develop and improve Social Feed Manager (SFM), an open source application assisting scholars collecting data from Twitter’s API for their research. Based on our experience with SFM to date and the viewpoints of archivists and researchers, we are reconsidering assumptions about API-based social media collecting and identifying requirements to guide the application’s further development. We suggest that aligning social media collecting with web archiving practices and tools addresses many of the most pressing needs of current and future scholars conducting quality social media research. In this paper, we consider the basis for these new requirements, describe in depth an alignment between social media collecting and web archiving, outline a technical approach for effecting this alignment, and show how the technical approach has been implemented in SFM.
Hozzáadás dátuma 2021. 08. 09. 8:42:20
Módosítás dátuma 2021. 08. 09. 8:42:20

Címkék:

  • Web archiving
  • Web archives
  • Archives
  • Archiving
  • Library And Information Sciences–Computer Applica
  • Researchers
  • Alignment
  • Data collection – Twitter
  • Digital media
  • Freeware
  • Media
  • Social media
  • Social networks
  • Open source software
  • Acquisition of data
  • Application program interfaces

APPLICATION OF WEB ARCHIVING TECHNOLOGIES IN BNL AND NAB: A PROPOSED MODEL

Típus Folyóiratcikk
Szerző Rifat Mahmud
Szerző Raiyan Bin Reza
Kötet 25
Oldalszám 17
Dátum 2020
Könyvtár Katalógus Zotero
Nyelv en
Kivonat Web archiving has become a regular activity in many libraries and archival institutions. With the massive spread of the internet, preserving web content is now being given importance by various countries, as websites contain important legal, political and educational information. This paper investigates the issues related to web archiving that might be faced by the Bangladesh National Library (BNL) and National Archives of Bangladesh (NAB). The main aim of this paper is to describe the current state of web archiving in BNL and NAB. This paper also tries to identify the problems in archiving web content and to provide possible solutions to overcome them. Interview method was applied for this study. We interviewed officials from both BNL and NAB and explored relevant literature to gather information for our work. Web archiving activities are found to be useful in many government libraries and archival institutions around the globe, but they are yet to be undertaken in BNL and NAB. The study found that there are many challenges for implementing web archiving in BNL and NAB, such as technological difficulties, copyright issues, unskilled manpower and lack of logistical support, which should be taken into account while implementing any web archiving programme. Sufficient steps like proper planning, efficient training, logistical support, international cooperation and adequate financial support will help the authorities to establish a successful web archiving programme. Finally, we proposed an intuitive model for NAB and BNL so that it could be considered while taking any web archiving initiative.
Hozzáadás dátuma 2021. 08. 09. 8:43:56
Módosítás dátuma 2021. 08. 09. 8:43:56

Appraisal Talk in Web Archives

Típus Folyóiratcikk
Szerző Ed Summers
URL https://www.proquest.com/scholarly-journals/appraisal-talk-web-archives/docview/2518362480/se-2?accountid=15756
Kötet 89
Oldalszám 70-103
Kiadvány Archivaria
ISSN 03186954
Dátum Spring 2020
Pontos lelőhely 2518362480
Egyéb Place: Ottawa
Publisher: Association of Canadian Archivists
Nyelv English
Kivonat The Web is a vast and constantly changing information landscape that by its very nature seems to resist the idea of the archive. But for the last 20 years, archivists and technologists have worked together to build systems for doing just that. While technical infrastructures for performing web archiving have been well studied, surprisingly little is known about the interactions between archivists and these infrastructures. How do archivists decide what to archive from the Web? How do the tools for archiving the Web shape these decisions? This study analyzes a series of ethnographic interviews with web archivists to understand how their decisions about what to archive function as part of a community of practice. It uses critical discourse analysis to examine how the participants’ use of language enacts their appraisal decision-making processes. Findings suggest that the politics and positionality of the archive are reflected in the ways that archivists talk about their network of personal and organizational relationships. Appraisal decisions are expressive of the structural relationships of an archives as well as of the archivists’ identities, which form during mentoring relationships. Self-reflection acts as a key method for seeing the ways that interviewers and interviewees work together to construct the figured worlds of the web archive. These factors have implications for the ways archivists communicate with each other and interact with the communities that they document. The results help ground the encounter between archival practice and the architecture of the Web.
Alternatív absztrakt:
Le Web est un paysage informationnel vaste et en changement constant qui, par sa nature même, semble s’opposer à l’idée de l’archive. Pourtant, depuis les vingt dernières années, les archivistes et technologues ont travaillé de concert afin de bâtir des systèmes qui feraient exactement ça. Bien que les infrastructures technologiques pour archiver le Web ont été abondamment étudiées, on en sait étonnamment peu à propos des interactions entre les archivistes et ces infrastructures. Comment les archivistes décident de ce qui sera archivé du Web? Comment les outils d’archivage du Web modèlent leurs décisions? La présente étude analyse une série d’entretiens ethnographique avec des archivistes du Web afin de comprendre comment leurs décisions concernant ce qui doit être archivé s’articulent en fonction d’une communauté de pratique. Elle utilise l’analyse critique du discours pour examiner comment l’utilisation du langage par les participants joue un rôle dans leurs processus de prise de décision d’évaluation. Les résultats suggèrent que les politiques et le positionnement des archives sont reflétés dans la manière dont les archivistes parlent de leurs réseaux de relations personnelles et organisationnelles. Les décisions d’évaluation sont l’expression des relations structurelles d’une archive et des identités de l’archiviste, qui sont forgées au cours des relations de mentorat. L’introspection agit comme méthode essentielle pour voir la façon dont les intervieweurs et les interviewés travaillent de concert pour construire les mondes façonnés des archives du Web. Ces facteurs ont des répercussions sur les façons dont les archivistes communiquent entre eux et interagissent avec les communautés qu’ils documentent. Ces résultats aident à ancrer la rencontre entre la pratique archivistique et l’architecture du Web.
Archívum ProQuest One Academic
Hozzáadás dátuma 2021. 08. 09. 8:44:40
Módosítás dátuma 2021. 08. 09. 8:44:40

Címkék:

  • Web archiving
  • Digital archives
  • Library And Information Sciences
  • Archivists
  • Infrastructure
  • Decision making

Archival Crawlers and JavaScript: Discover More Stuff but Crawl More Slowly

Típus Dolgozat
Szerző Justin F Brunelle
Szerző Michele C Weigle
Szerző Michael L Nelson
URL http://dl.acm.org/citation.cfm?id=3200334.3200336
Hely Piscataway, NJ, USA
Kiadó IEEE Press
Oldalszám 1-10
ISBN 978-1-5386-3861-3
Dátum 2017
Egyéb Series Title: JCDL '17
Citation Key: Brunelle:2017:ACJ:3200334.3200336
Kivonat The web is today's primary publication medium, making web archiving an important activity for historical and analytical purposes. Web pages are increasingly interactive, resulting in pages that are correspondingly difficult to archive. JavaScript enables interactions that can potentially change the client-side state of a representation. We refer to representations that load embedded resources via JavaScript as deferred representations. It is difficult to discover and crawl all of the resources in deferred representations and the result of archiving deferred representations is archived web pages that are either incomplete or erroneously load embedded resources from the live web. We propose a method of discovering and archiving deferred representations and their descendants (representation states) that are only reachable through client-side events. Our approach identified an average of 38.5 descendants per seed URI crawled, 70.9% of which are reached through an onclick event. This approach also added 15.6 times more embedded resources than Heritrix to the crawl frontier, but at a crawl rate that was 38.9 times slower than simply using Heritrix. If our method was applied to the July 2015 Common Crawl dataset, a web-scale archival crawler will discover an additional 7.17 PB (5.12 times more) of information per year. This illustrates the significant increase in resources necessary for more thorough archival crawls.
Kiadvány címe Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:20
Módosítás dátuma 2021. 08. 09. 8:43:20

Címkék:

  • web archiving
  • digital preservation
  • memento
  • web crawling

Archival HTTP redirection retrieval policies

Típus Dolgozat
Szerző Ahmed AlSum
Szerző Michael L. Nelson
Szerző Robert Sanderson
Szerző Herbert Van de Sompel
URL http://dl.acm.org/citation.cfm?doid=2487788.2488117
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 1051-1058
ISBN 978-1-4503-2038-2
Dátum 2013
DOI 10.1145/2487788.2488117
Kivonat When retrieving archived copies of web resources (mementos) from web archives, the original resource's URI-R is typically used as the lookup key in the web archive. This is straightforward until the resource on the live web issues a redirect: R ->R`. Then it is not clear if R or R` should be used as the lookup key to the web archive. In this paper, we report on a quantitative study to evaluate a set of policies to help the client discover the correct memento when faced with redirection. We studied the stability of 10,000 resources and found that 48% of the sample URIs tested were not stable, with respect to their status and redirection location. 27% of the resources were not perfectly reliable in terms of the number of mementos of successful responses over the total number of mementos, and 2% had a reliability score of less than 0.5. We tested two retrieval policies. The first policy covered the resources which currently issue redirects and successfully resolved 17 out of 77 URIs that did not have mementos of the original URI, but did of the resource that was being redirected to. The second policy covered archived copies with HTTP redirection and helped the client in 58% of the cases tested to discover the nearest memento to the requested datetime.
Kiadvány címe Proceedings of the 22nd International Conference on World Wide Web – WWW '13 Companion
Hozzáadás dátuma 2021. 08. 09. 8:43:22
Módosítás dátuma 2021. 08. 09. 8:43:22

Címkék:

  • Design
  • Experimentation
  • Standardization

Archival strategies for contemporary collecting in a world of big data: Challenges and opportunities with curating the UK web archive

Típus Folyóiratcikk
Szerző Nicola Jayne Bingham
Szerző Helena Byrne
URL https://doi.org/10.1177/2053951721990409
Kötet 8
Szám 1
Oldalszám 2053951721990409
Kiadvány Big Data & Society
ISSN 2053-9517
Dátum January 1, 2021
Egyéb Number: 1
Publisher: SAGE Publications Ltd
Folyóirat rövid neve Big Data & Society
DOI 10.1177/2053951721990409
Hozzáférés 2021. 07. 15. 10:07:11
Könyvtár Katalógus SAGE Journals
Nyelv en
Kivonat In this contribution, we will discuss the opportunities and challenges arising from memory institutions' need to redefine their archival strategies for contemporary collecting in a world of big data. We will reflect on this topic by critically examining the case study of the UK Web Archive, which is made up of the six UK Legal Deposit Libraries: the British Library, National Library of Scotland, National Library of Wales, Bodleian Libraries Oxford, Cambridge University Library and Trinity College Dublin. The UK Web Archive aims to archive, preserve and give access to the UK web space. This is achieved through an annual domain crawl, first undertaken in 2013, in addition to more frequent crawls of key websites and specially curated collections which date back as far as 2005. These collections reflect important aspects of British culture and events that shape society. This commentary will explore a number of questions including: what heritage is captured and what heritage is instead neglected by the UK Web archive? What heritage is created in the form of new data and what are its properties? What are the ethical issues that memory institutions face when developing these web archiving practices? What transformations are required to overcome such challenges and what institutional futures can we envisage?
Rövid cím Archival strategies for contemporary collecting in a world of big data
Hozzáadás dátuma 2021. 08. 09. 8:44:03
Módosítás dátuma 2021. 08. 09. 8:44:03

Címkék:

  • big data
  • Web archiving
  • legal deposit
  • ethics
  • heritage
  • researcher access

Archival strategies for contemporary collecting in a world of big data: Challenges and opportunities with curating the UK web archive – ProQuest

Típus Weboldal
URL https://www.proquest.com/docview/2546907231/B290DF7AC35D4B04PQ/11?accountid=15756
Dátum 2021-07-16 07:26:11
Hozzáférés 2021. 07. 16. 9:26:11
Nyelv hu
Rövid cím Archival strategies for contemporary collecting in a world of big data
Hozzáadás dátuma 2021. 08. 09. 8:44:32
Módosítás dátuma 2021. 08. 09. 8:44:32

Archive This Moment D.C.: A Case Study of Participatory Collecting During COVID-19

Típus Folyóiratcikk
Szerző Julie Burns
Szerző Laura Farley
Szerző Siobhan C. Hagan
Szerző Paul Kelly
Szerző Lisa Warwick
URL https://journal.code4lib.org/articles/15534
Szám 50
Kiadvány The Code4Lib Journal
ISSN 1940-5758
Dátum 2021-02-10
Egyéb Number: 50
Hozzáférés 2021. 07. 15. 11:12:58
Könyvtár Katalógus Code4Lib Journal
Kivonat When the COVID-19 pandemic brought life in Washington, D.C. to a standstill in March 2020, staff at DC Public Library began looking for ways to document how this historic event was affecting everyday life. Recognizing the value of first-person accounts for historical research, staff launched Archive This Moment D.C. to preserve the story of daily life in the District during the stay-at-home order. Materials were collected from public Instagram and Twitter posts submitted through the hashtag #archivethismomentdc. In addition to social media, creators also submitted materials using an Airtable webform set up for the project and through email. Over 2,000 digital files were collected. This article will discuss the planning, professional collaboration, promotion, selection, access, and lessons learned from the project; as well as the technical setup, collection strategies, and metadata requirements. In particular, this article will include a discussion of the evolving collection scope of the project and the need for clear ethical guidelines surrounding privacy when collecting materials in real-time.
Rövid cím Archive This Moment D.C.
Hozzáadás dátuma 2021. 08. 09. 8:44:21
Módosítás dátuma 2021. 08. 09. 8:44:21

Archive-It 2: Internet Archive Strives to Ensure Preservation and Accessibility

Típus Folyóiratcikk
Szerző Marji Mcclure
URL https://search.proquest.com/docview/213815870?accountid=27464
Kötet 29
Szám 8
Oldalszám 14-15
Kiadvány EContent
ISSN 15252531
Dátum 2006-10
Egyéb Number: 8
Publisher: Information Today, Inc.
Place: Wilton
Nyelv English
Kivonat Preserving seemingly ephemeral Web content is a daunting task. The problem is even more difficult because the content of Web pages changes and the pages themselves come and go with great frequency, which means simply collecting URLs is not enough to keep tabs on valuable content. To help make digital content preservation possible, Internet Archive, a San Francisco-based nonprofit has led a charge to effectively capture and store Web content. The project recently released Archive-It 2 in its continued effort to archive the Web. Version 2 of Archive-It offers several new features not available in Version 1. Subscribers can now conduct test crawls, which enable them to see the type of Web material that would populate a specific collection before it is archived permanently. There is also a metadata search capability, which allows metadata to be included in the text searches of materials in a collection.
Hozzáadás dátuma 2021. 08. 09. 8:42:09
Módosítás dátuma 2021. 08. 09. 8:42:09

Címkék:

  • Digital libraries
  • Archives & records
  • United States–US
  • 7500:Product planning & development
  • 8331:Internet services industry
  • 9190:United States
  • Business And Economics–Management
  • Service introduction
  • Web content delivery

Archive-It.

Típus Folyóiratcikk
Szerző Susan Leach-Murray
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 35
Szám 2
Oldalszám 214
Kiadvány Technical Services Quarterly
ISSN 07317131
Dátum 2018-04
Egyéb Number: 2
Kivonat The article reviews the website "Archive-It" located at https://archive-it.org, which is a subscription web archiving service that collects and assesses cultural heritage on the Internet.
Hozzáadás dátuma 2021. 08. 09. 8:42:08
Módosítás dátuma 2021. 08. 09. 8:42:08

Címkék:

  • WEB archiving
  • CULTURAL property — Computer network resources
  • WEBSITE reviews

ArchiveNow

Típus Dolgozat
Szerző Mohamed Aturban
Szerző Mat Kelly
Szerző Sawood Alam
Szerző John A. Berlin
Szerző Michael L. Nelson
Szerző Michele C. Weigle
URL http://dl.acm.org/citation.cfm?doid=3197026.3203880
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 321-322
ISBN 978-1-4503-5178-2
Dátum 2018
DOI 10.1145/3197026.3203880
Kivonat ArchiveNow is a Python module for preserving web pages in on-demand web archives. This module allows a user to submit a URI of a web page for archiving at several configured web archives. Once the web page is captured, ArchiveNow provides the user with links to the archived copies of the web page. ArchiveNow is initially configured to use four archives but is easily configurable to add or remove other archives. In addition to pushing web pages to public archives, ArchiveNow, through the use of Wget and Squidwarc, allows users to generate local WARC files, enabling them to create their own personal and private archives.
Kiadvány címe Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries – JCDL '18
Hozzáadás dátuma 2021. 08. 09. 8:43:23
Módosítás dátuma 2021. 08. 09. 8:43:23

Címkék:

  • Memento
  • WARC
  • Web Archiving

Archives of the Americas, (Mostly) Free Online

Típus Folyóiratcikk
Szerző Irene E McDermott
URL https://search.proquest.com/docview/1818627659?accountid=27464
Kötet 40
Szám 3
Oldalszám 27-29
Kiadvány Online Searcher
ISSN 23249684
Dátum 2016
Egyéb Number: 3
Publisher: Information Today, Inc.
Place: Medford
Nyelv English
Kivonat Established in 2008 to archive the transcribed texts of seminal documents in law, history, and diplomacy, the collection makes freely available important documents from ancient times, e.g., Agrarian Law, 111 BCE, right up to 2003, with "A Performance-Based Roadmap to a Permanent Two-State Solution to the Israeli-Palestinian Conflict." […] visit the Digital Public Library of America (dp.la). According to Maura Marx, director of the DPLA Secretariat, "The DPLA's goal is to bring the entire nation's rich cultural collections off the shelves and into the innovative environment of the Internet for people to discover, download, remix, reuse and build on in ways we haven't yet begun to imagine" (cyber.law.harvard.edu/node/95550).
Hozzáadás dátuma 2021. 08. 09. 8:42:34
Módosítás dátuma 2021. 08. 09. 8:42:34

Címkék:

  • Web archiving
  • Public libraries
  • Digital archives
  • Digitization
  • Computers–Internet
  • Internet
  • Library collections
  • American history
  • United States–US
  • Copyright
  • Museums
  • Photographs
  • Encyclopedias
  • Letters
  • Speeches
  • Treasuries

ArchiveSpark

Típus Dolgozat
Szerző Helge Holzmann
Szerző Vinay Goel
Szerző Avishek Anand
URL http://dl.acm.org/citation.cfm?doid=2910896.2910902
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 83-92
ISBN 978-1-4503-4229-2
Dátum 2016
DOI 10.1145/2910896.2910902
Kiadvány címe Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries – JCDL '16
Hozzáadás dátuma 2021. 08. 09. 8:41:49
Módosítás dátuma 2021. 08. 09. 8:41:49

Címkék:

  • Web Archives
  • Big Data
  • Data Extraction

ArchiveSpark – MS Independent Study Final Submission

Típus Jelentés
Szerző Andrej Galad
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Hely United States, North America
Dátum 2016
Intézmény Virginia Polytechnic Institute and State University
Kivonat This project expands upon the work at the Internet Archive of researcher Vinay Goel and of Jefferson Bailey (co-PI on two NSF-funded collaborative projects with Virginia Tech: IDEAL, GETAR) on the ArchiveSpark project – a framework for efficient Web archive access, extraction, and derivation. The main goal of the project is to quantitatively and qualitatively evaluate ArchiveSpark against mainstream Web archive processing solutions and extend it as necessary with regard to the processing of testing collections. This also relates to an IMLS funded project. This report describes the efforts and contributions made as part of this project. The primary focus of these efforts lies in the comprehensive evaluation of ArchiveSpark against existing archive-processing solutions (pure Apache Spark with pre-installed Warcbase tools and HBase) in a variety of environments and setups in order to comparatively analyze performance improvements that ArchiveSpark brings to the table as well as understand the shortcomings and tradeoffs of its usage under varying scenarios. ; IMLS LG-71-16-0037-16: Developing Library Cyberinfrastructure Strategy for Big Data Sharing and Reuse ; NSF IIS-1619028, III: Small: Collaborative Research: Global Event and Trend Archive Research (GETAR) ; NSF IIS – 1319578: III: Small: Integrated Digital Event Archiving and Library (IDEAL) ; Included are the final report (PDF + Word), the final presentation (PPTX + PDF), the ArchiveSpark demo in the form of Jupyter Notebook, and the software developed during this project.
Hozzáadás dátuma 2021. 08. 09. 8:42:37
Módosítás dátuma 2021. 08. 09. 8:42:37

Címkék:

  • Internet Archive
  • WARC
  • Web Archiving
  • Big data
  • ArchiveSpark
  • CDX
  • GETAR
  • HBase
  • IDEAL
  • IMLS
  • Spark

ArchiveWeb: Collaboratively Extending and Exploring Web Archive Collections

Típus Könyvfejezet
Szerző Zeon Trevor Fernando
Szerző Ivana Marenzi
Szerző Wolfgang Nejdl
Szerző Rishita Kalyani
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Oldalszám 107-118
Dátum 2016-01
Egyéb DOI: 10.1007/978-3-319-43997-6_9
ISBN: 9783319439969
Könyv címe Research & Advanced Technology for Digital Libraries: 20th International Conference on Theory & Practice of Digital Libraries, TPDL 2016, Hannover, Germany, September 5-9, 2016, Proceedings
Hozzáadás dátuma 2021. 08. 09. 8:41:55
Módosítás dátuma 2021. 08. 09. 8:41:55

Archiving and Analysing Techniques of the Ultra-Large-Scale Web-Based Corpus Project of NINJAL, Japan

Típus Folyóiratcikk
Szerző Masayuki Asahara
Szerző Kikuo Maekawa
Szerző Mizuho Imada
Szerző Sachi Kato
Szerző Hikari Konishi
URL https://doi.org/10.7227/ALX.0024
Kötet 25
Szám 1-2
Oldalszám 129-148
Kiadvány Alexandria: The Journal of National and International Library and Information Issues
ISSN 0955-7490
Dátum 2014-08
Egyéb Number: 1-2
DOI 10.7227/ALX.0024
Kivonat In 2011, the National Institute for Japanese Language and Linguistics (NINJAL) launched a corpus compilation project to construct a web corpus for linguistic research comprising ten billion words by 2016. The project is divided into four categories: Page Collection, Linguistic Annotation, Release and Preservation. For Page Collection, web crawlers are employed to collect web text by crawling 100 million pages every three months and retaining several versions of the text for three-month periods. For Linguistic Annotation, the linguistic studies web corpus contains annotated linguistic information. To improve the usability of these linguistic resources, normalization tasks such as tag removal, word segmentation, dependency parsing, and register estimation are performed. For Release, word lists and n-gram data are published based on the crawled and annotated text corpus. In addition, applications are being developed to enable searching for morphosyntax patterns in the ten-billion-word corpus. For Preservation, crawled web pages are preserved in chronological order as web archives primarily to support the survey of ongoing linguistic changes. In this paper, we present the basic design of the four categories. Additionally, we report the current status of the corpus using basic statistics of the crawled data and discuss the importance of deduplicating sentences. [ABSTRACT FROM AUTHOR]
Hozzáadás dátuma 2021. 08. 09. 8:42:45
Módosítás dátuma 2021. 08. 09. 8:42:45

Címkék:

  • Web archiving
  • Corpora (Linguistics)
  • crawling
  • web archive
  • Japan
  • Japanese language
  • Japanese language resources
  • Language digital resources
  • linguistic annotation
  • web corpus

Archiving before Loosing Valuable Data? Development of Web Archiving in Europe

Típus Folyóiratcikk
Szerző France Lasfargues
Szerző Chloé Martin
Szerző Leïla Medjkoune
URL https://search.proquest.com/docview/1532083850?accountid=27464
Kötet 36
Szám 1
Oldalszám 117-124
Kiadvány Bibliothek Forschung und Praxis
ISSN 1865-7648
Dátum 2012-01
Egyéb Number: 1
PMID: 1532083850
Publisher: Walter de Gruyter GmbH
Place: Berlin
DOI 10.1515/bfp-2012-0014
Nyelv English
Kivonat Web content is, by nature, ephemeral: sites are updated regularly and disappear, which involves the loss of unique value information. The importance of this media grows continuously in our society and institutions are developing websites with a variety of content creating a large media-centric Web sphere. Like any media, it is essential to preserve it as a key part of our heritage.
Hozzáadás dátuma 2021. 08. 09. 8:41:59
Módosítás dátuma 2021. 08. 09. 8:41:59

Címkék:

  • Web archiving
  • Library And Information Sciences
  • preservation
  • state of the art

Archiving Catholic Faith on the Web During the COVID-19 Pandemic

Típus Folyóiratcikk
Szerző Kayla Harris
Szerző Stephanie Shreffler
Kötet 91
Szám 3
Oldalszám 7
Dátum 2021
Egyéb Number: 3
Könyvtár Katalógus Zotero
Nyelv en
Hozzáadás dátuma 2021. 08. 09. 8:44:00
Módosítás dátuma 2021. 08. 09. 8:44:00

Archiving in the Age of Digital Conversion: Notes for a Politics of "Remains."

Típus Folyóiratcikk
Szerző Éric Méchoulan
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 40
Szám 2
Oldalszám 92-104
Kiadvány Substance: A Review of Theory & Literary Criticism
ISSN 00492426
Dátum 2011-05
Egyéb Number: 2
Kivonat The article focuses on archiving in the digital age. The author notes that caught in between materiality of the means of preservation and communication of documents and the relationships of power and of the institutions of the past is archiving. The archive is a form of social transmission, a process that transforms a text, image or sound into a document, an authorization to endure beyond ephemerality. This article was translated by Roxanne Lapidus.
Hozzáadás dátuma 2021. 08. 09. 8:42:51
Módosítás dátuma 2021. 08. 09. 8:42:51

Címkék:

  • DIGITAL preservation
  • WEB archiving
  • ARCHIVES
  • ELECTRONIC information resources
  • INFORMATION resources
  • LAPIDUS, Roxanne

Archiving in the networked world: betting on the future

Típus Folyóiratcikk
Szerző Michael Seadle
Szerkesztő Judith Wusteman
URL https://www.emeraldinsight.com/doi/10.1108/07378830910968326
Kötet 27
Szám 2
Oldalszám 319-325
Kiadvány Library Hi Tech
ISSN 0737-8831
Dátum 2009-06-12
Egyéb Number: 2
DOI 10.1108/07378830910968326
Kivonat Purpose – The goal of this column is not to argue the pros and cons of digital archiving, or to propose solutions to its problems, but to describe it as a research subject and a social phenomenon. Design/methodology/approach – This column relies on cultural anthropology, in particular the approach that Clifford Geertz championed, and for cultural anthropology, language and its social context matter. Findings – Archiving systems abound with competing claims about effectiveness. Transparency and evidence of public testing is rare, with a few exceptions. The lack of public testing does not mean that systems do less than they claim, but it does mean that libraries, archives and museums need to press for proof if they want to have confidence in the product. Originality/value – When betting on the future, there cannot be certainty, but bets placed should be based on knowledge.
Hozzáadás dátuma 2021. 08. 09. 8:42:48
Módosítás dátuma 2021. 08. 09. 8:42:48

Címkék:

  • Digital libraries
  • Museums
  • Library and information networks

Archiving in the networked world: preserving plagiarized works

Típus Folyóiratcikk
Szerző Michael Seadle
URL https://doi.org/10.1108/07378831111189750
Kötet 29
Szám 4
Oldalszám 655-662
Kiadvány Library Hi Tech
ISSN 0737-8831
Dátum 2011-11-22
Egyéb Number: 4
DOI 10.1108/07378831111189750
Kivonat Purpose – Plagiarism has become a salient issue for universities and thus for university libraries in recent years. This paper aims to discuss three interrelated aspects of preserving plagiarized works: collection development issues, copyright problems, and technological requirements. Too often these three are handled separately even though in fact each has an influence on the other. Design/methodology/approach – The paper looks first at the ingest process (called the Submission Information Package or SIP), then at storage management in the archive (the AIP or Archival Information Package), and finally at the retrieval process (the DIP or Distribution Information Package). Findings – The chief argument of this paper is that works of plagiarism and the evidence exposing them are complex objects, technically, legally and culturally. Merely treating them like any other work needing preservation runs the risk of encountering problems on one of those three fronts. Practical implications – This is a problem, since currently many public preservation strategies focus on ingesting large amounts of self-contained content that resembles print on paper, rather than on online works that need special handling. Archival systems also often deliberately ignore the cultural issues that affect future usability. Originality/value – The paper discusses special handling and special considerations for archiving works of plagiarism. [ABSTRACT FROM AUTHOR]
Hozzáadás dátuma 2021. 08. 09. 8:42:54
Módosítás dátuma 2021. 08. 09. 8:42:54

Címkék:

  • Web archiving
  • Digital libraries
  • Digital preservation
  • Archiving
  • Preservation
  • Germany
  • Information retrieval
  • Collection development in libraries
  • Collections management
  • Copyright & digital preservation
  • Information resources management
  • Intellectual property
  • Plagiarism

Archiving Interactive Narratives at the British Library

Típus Dolgozat
Szerző Lynda Clark
Szerző Giulia Carla Rossi
Szerző Stella Wisdom
Szerkesztő Anne-Gwenn Bosser
Szerkesztő David E. Millard
Szerkesztő Charlie Hargood
Sorozat Lecture Notes in Computer Science
Hely Cham
Kiadó Springer International Publishing
Oldalszám 300-313
ISBN 978-3-030-62516-0
Dátum 2020
DOI 10.1007/978-3-030-62516-0_27
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat This paper describes the creation of the Interactive Narratives collection in the UK Web Archive, as part of the UK Legal Deposit Libraries Emerging Formats Project. The aim of the project is to identify, collect and preserve complex digital publications that are in scope for collection under UK Non-Print Legal Deposit Regulations. This article traces the process of building the Interactive Narratives collection, analysing the different tools and methods used and placing the collection within the wider context of Emerging Formats work and engagement activities at the British Library.
Kiadvány címe Interactive Storytelling
Hozzáadás dátuma 2021. 08. 09. 8:44:05
Módosítás dátuma 2021. 08. 09. 8:44:05

Címkék:

  • Web archiving
  • Digital preservation
  • Digital storytelling
  • Emerging Formats
  • Interactive Narratives collection
  • New media collection management

Archiving of Comprehensive Annual Financial Reports (CAFRs) on State Government Web Sites

Típus Folyóiratcikk
Szerző Joel B Thornton
URL https://search.proquest.com/docview/1550992606?accountid=27464
Kötet 31
Szám 2
Oldalszám 87-95
Kiadvány Behavioral & Social Sciences Librarian
ISSN 0163-9269
Dátum 2012-04
Egyéb Number: 2
Publisher: Taylor & Francis, Philadelphia PA
Place: Texas A&M University Libraries, College Station, Texas lzcbv2@tamu.edu
DOI 10.1080/01639269.2012.686244
Nyelv English
Kivonat Rising cost and declining revenues have hampered the financial affairs of state governments, forcing many to curtail services, reduce employee benefits, and trim the workforce, calling into question the fiscal sustainability of many state governments. As a result, stakeholders are demanding greater accountability and increased transparency into state government finances. An important link or communication tool between state governments and stakeholders is the comprehensive annual financial report. The comprehensive annual financial report (CAFR), produced by state governments, provides some insight into how taxpayer dollars are spent and the benefits derived therefrom. This article analyzes the extent to which the states electronically archive the CAFR on their websites and the accessibility of the reports to users searching state government websites. Adapted from the source document.
Hozzáadás dátuma 2021. 08. 09. 8:42:13
Módosítás dátuma 2021. 08. 09. 8:42:13

Címkék:

  • Web archiving
  • archives
  • Government information
  • article
  • 5.2: MATERIALS BY SUBJECTS
  • Access to information
  • CAFR
  • comprehensive annual financial report
  • Finance
  • Reports
  • State government
  • state government publications
  • web-based government publications

Archiving Social Media: The Case of Twitter

Típus Könyvfejezet
Szerző Zeynep Pehlivan
Szerző Jérôme Thièvre
Szerző Thomas Drugeon
Szerkesztő Daniel Gomes
Szerkesztő Elena Demidova
Szerkesztő Jane Winters
Szerkesztő Thomas Risse
URL https://doi.org/10.1007/978-3-030-63291-5_5
Hely Cham
Kiadó Springer International Publishing
Oldalszám 43-56
ISBN 978-3-030-63291-5
Dátum 2021
Egyéb DOI: 10.1007/978-3-030-63291-5_5
Hozzáférés 2021. 07. 15. 9:52:26
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat Around the world, billions of people use social media like Twitter and Facebook every day, to find, discuss and share information. Social media, which has transformed people from content readers to publishers, is not only an important data source for researchers in social science but also a “must archive” object for web archivists for future generations. In recent years, various communities have discussed the need to archive social media and have debated the issues related to its archiving. There are different ways of archiving social media data, including using traditional web crawlers and application programming interfaces (APIs) or purchasing from official company firehoses. It is important to note that the first two methods bring some issues related to capturing the dynamic and volatile nature of social media, in addition to the severe restrictions of APIs. These issues have an impact on the completeness of collections and in some cases return only a sample of the whole. In this chapter, we present these different methods and discuss the challenges in detail, using Twitter as a case study to better understand social media archiving and its challenges, from gathering data to long-term preservation.
Könyv címe The Past Web: Exploring Web Archives
Rövid cím Archiving Social Media
Hozzáadás dátuma 2021. 08. 09. 8:43:58
Módosítás dátuma 2021. 08. 09. 8:43:58

Archiving Software Surrogates on the Web for Future Reference.

Típus Folyóiratcikk
Szerző Helge Holzmann
Szerző Wolfram Sperber
Szerző Mila Runnwerth
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Oldalszám 215
Kiadvány Research & Advanced Technology for Digital Libraries: 20th International Conference on Theory & Practice of Digital Libraries, TPDL 2016, Hannover, Germany, September 5-9, 2016, Proceedings
ISBN 9783319439969
Dátum 2016-01
Kivonat Software has long been established as an essential aspect of the scientific process in mathematics and other disciplines. However, reliably referencing software in scientific publications is still challenging for various reasons. A crucial factor is that software dynamics with temporal versions or states are difficult to capture over time. We propose to archive and reference surrogates instead, which can be found on the Web and reflect the actual software to a remarkable extent. Our study shows that about a half of the webpages of software are already archived with almost all of them including some kind of documentation.
Hozzáadás dátuma 2021. 08. 09. 8:41:53
Módosítás dátuma 2021. 08. 09. 8:41:53

Címkék:

  • Web Archives
  • Analysis
  • Scientific Software Management

Archiving the Internet – Web pages of political parties

Típus Folyóiratcikk
Szerző M Peach
URL https://search.proquest.com/docview/57443754?accountid=27464
Kötet 15
Szám 4
Oldalszám 54-58
Kiadvány Assignation
ISSN 0265-2587
Dátum 1998-07
Egyéb Number: 4
Nyelv English
Kivonat The Internet has great potential as a source of grey literature. Describes the efforts of the Centro de Estudios Avanzados en Ciencias Sociales (CEACS) of the Instituto Juan March in Madrid, Spain, to take advantage of that potential as a source for researchers present and future. Discusses the following: public use of the Internet in Spain; profile of the CEACS project; nature of political party pages; current status of the project; problems and technical needs; and project expansion.
Hozzáadás dátuma 2021. 08. 09. 8:42:14
Módosítás dátuma 2021. 08. 09. 8:42:14

Címkék:

  • Politics
  • Grey literature
  • Instituto Juan March, Spain Centro de Estudios Ava
  • Online information retrieval
  • Spain

Archiving the Pandemic: UTA Libraries project preserves community experiences with COVID-19

Típus Újságcikk
URL https://www.proquest.com/wire-feeds/archiving-pandemic-uta-libraries-project/docview/2425490542/se-2?accountid=15756
Hely Carlsbad
Kiadvány University Wire
Dátum 2020 Jul 20
Pontos lelőhely 2425490542
Nyelv English
Archívum ProQuest One Academic
Hozzáadás dátuma 2021. 08. 09. 8:44:40
Módosítás dátuma 2021. 08. 09. 8:44:40

Címkék:

  • Web archiving
  • Archives & records
  • Library collections
  • Web sites
  • Students
  • Archivists
  • COVID-19
  • Coronaviruses
  • General Interest Periodicals–United States
  • Interviews
  • Mindfulness
  • Oral history
  • Pandemics
  • Quarantine

Archiving the relaxed consistency web

Típus Dolgozat
Szerző Zhiwu Xie
Szerző Herbert Van de Sompel
Szerző Jinyang Liu
Szerző Johann van Reenen
Szerző Ramiro Jordan
URL http://doi.acm.org/10.1145/2505515.2505551
Hely New York, NY, USA
Kiadó ACM
Oldalszám 2119-2128
ISBN 978-1-4503-2263-8
Dátum 2013
Egyéb Series Title: CIKM '13
Citation Key: Xie:2013:ARC:2541176.2505551
DOI 10.1145/2505515.2505551
Kivonat The historical, cultural, and intellectual importance of archiving the web has been widely recognized. Today, all countries with high Internet penetration rate have established high-profile archiving initiatives to crawl and archive the fast-disappearing web content for long-term use. As web technologies evolve, established web archiving techniques face challenges. This paper focuses on the potential impact of the relaxed consistency web design on crawler driven web archiving. Relaxed consistent websites may disseminate, albeit ephemerally, inaccurate and even contradictory information. If captured and preserved in the web archives as historical records, such information will degrade the overall archival quality. To assess the extent of such quality degradation, we build a simplified feed-following application and simulate its operation with synthetic workloads. The results indicate that a non-trivial portion of a relaxed consistency web archive may contain observable inconsistency, and the inconsistency window may extend significantly longer than that observed at the data store. We discuss the nature of such quality degradation and propose a few possible remedies.
Kiadvány címe Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Hozzáadás dátuma 2021. 08. 09. 8:43:36
Módosítás dátuma 2021. 08. 09. 8:43:36

Címkék:

  • web archiving
  • digital preservation
  • consistency
  • social network

Archiving the Russian and East European Lesbian, Gay, Bisexual, and Transgender Web, 2013: A Pilot Project

Típus Folyóiratcikk
Szerző Liladhar R Pendse
URL https://doi.org/10.1080/15228886.2014.930973
Kötet 15
Szám 3
Oldalszám 182-196
Kiadvány Slavic & East European Information Resources
ISSN 1522-8886
Dátum 2014-07-03
Egyéb Number: 3
DOI 10.1080/15228886.2014.930973
Kivonat This article focuses on the conceptualization and implementation of a web archiving pilot project of selected Russian and East European lesbian, gay, bisexual, and transgender (LGBT) websites by the University of California, Berkeley. It introduces the use of the Web Archiving Services (WAS) platform developed by the California Digital Library. While identifying the criteria used to harvest these websites, the paper also describes various complexities associated with the viability of projects related to such complex social and political issues as the Russian and Eastern European LGBT rights movements. The article does not take an ideological stance with respect to legal issues, but rather strives to preserve information for academic research. [ABSTRACT FROM AUTHOR]
Hozzáadás dátuma 2021. 08. 09. 8:42:59
Módosítás dátuma 2021. 08. 09. 8:42:59

Címkék:

  • Web archiving
  • WEB archives
  • websites
  • Russia
  • INFORMATION storage & retrieval systems
  • Berkeley
  • bisexual and transgender web
  • California Digital Library
  • CATALOGING of archival materials
  • East Europe
  • Eastern Europe
  • gay
  • lesbian
  • LGBT
  • LGBT websites
  • PILOT projects
  • University of California
  • UNIVERSITY research

Archiving the web using page changes patterns: a case study

Típus Folyóiratcikk
Szerző Myriam Ben Saad
Szerző Stéphane Gançarski
URL https://search.proquest.com/docview/1197168439?accountid=27464
Kötet 13
Szám 1
Oldalszám 33-49
Kiadvány International Journal on Digital Libraries
ISSN 1432-5012
Dátum 2012-12-06
Egyéb Number: 1
PMID: 1197168439
Publisher: Springer Science & Business Media
Place: Heidelberg
DOI 10.1007/s00799-012-0094-z
Nyelv English
Kivonat Issue Title: Focused Issue on Joint Conference on Digital Libraries (JCDL) 2011. A pattern is a model or a template used to summarize and describe the behavior (or the trend) of data having generally some recurrent events. Patterns have received a considerable attention in recent years and were widely studied in the data mining field. Various pattern mining approaches have been proposed and used for different applications such as network monitoring, moving object tracking, financial or medical data analysis, scientific data processing, etc. In these different contexts, discovered patterns were useful to detect anomalies, to predict data behavior (or trend) or, more generally, to simplify data processing or to improve system performance. However, to the best of our knowledge, patterns have never been used in the context of Web archiving. Web archiving is the process of continuously collecting and preserving portions of the World Wide Web for future generations. In this paper, we show how patterns of page changes can be useful tools to efficiently archive Websites. We first define our pattern model that describes the importance of page changes. Then, we present the strategy used to (i) extract the temporal evolution of page changes, (ii) discover patterns, and (iii) exploit them to improve Web archives. The archive of French public TV channels France Télévisions is chosen as a case study to validate our approach. Our experimental evaluation based on real Web pages shows the utility of patterns to improve archive quality and to optimize indexing or storing. [PUBLICATION ABSTRACT]
Hozzáadás dátuma 2021. 08. 09. 8:42:03
Módosítás dátuma 2021. 08. 09. 8:42:03

Címkék:

  • Models
  • Library And Information Sciences–Computer Applica
  • World Wide Web
  • Archives & records
  • Data mining
  • Case studies

Archiving the Web Using Page Changes Patterns: A Case Study

Típus Dolgozat
Szerző Myriam Ben Saad
Szerző Stéphane Gançarski
URL http://doi.acm.org/10.1145/1998076.1998098
Hely New York, NY, USA
Kiadó ACM
Oldalszám 113-122
ISBN 978-1-4503-0744-4
Dátum 2011
Egyéb Series Title: JCDL '11
Citation Key: BenSaad:2011:AWU:1998076.1998098
DOI 10.1145/1998076.1998098
Kivonat A pattern is a model or a template used to summarize and describe the behavior (or the trend) of a data having generally some recurrent events. Patterns have received a considerable attention in recent years and were widely studied in the data mining field. Various pattern mining approaches have been proposed and used for different applications such as network monitoring, moving object tracking, financial or medical data analysis, scientific data processing, etc. In these different contexts, discovered patterns were useful to detect anomalies, to predict data behavior (or trend), or more generally, to simplify data processing or to improve system performance. However, to the best of our knowledge, patterns have never been used in the context of web archiving. Web archiving is the process of continuously collecting and preserving portions of the World Wide Web for future generations. In this paper, we show how patterns of page changes can be useful tools to efficiently archive web sites. We first define our pattern model that describes the changes of pages. Then, we present the strategy used to (i) extract the temporal evolution of page changes, to (ii) discover patterns and to (iii) exploit them to improve web archives. We choose the archive of French public TV channels « France Télévisions » as a case study in order to validate our approach. Our experimental evaluation based on real web pages shows the utility of patterns to improve archive quality and to optimize indexing or storing.
Kiadvány címe Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:36
Módosítás dátuma 2021. 08. 09. 8:43:36

Címkék:

  • web archiving
  • pattern
  • web page changes

Archiving the Web, A Service Construct TT – Archiver le Web, un service en construction

Típus Folyóiratcikk
Szerző France Lasfargues
Szerző Leila Medjkoune
URL https://search.proquest.com/docview/1283633770?accountid=27464
Kötet 49
Szám 3
Oldalszám 8-9
Kiadvány Documentaliste – Sciences de l'Information
ISSN 0012-4508
Dátum 2012-09
Egyéb Number: 3
Publisher: Association Francais des Documentalistes et des Bibliothecaires Speciales, Paris, France
Place: Archivage Web et Data mining, Internet Memory france.lasfargues@internetmemory.net
Nyelv French
Kivonat Archiving the Web is an old problem that is taking shape, accompanied by new businesses contours. This article gives a few reminders of technical, historical and legal issues of web archiving before discussing the tasks entrusted to a Web archivist. The article outlines the context of Web archiving. Several duties involved in the work of web archivists are outlined: enriching collections, managing a budget, controlling the quality of the collection, giving access to the archive and preserving web content. Adapted from the source document.
Hozzáadás dátuma 2021. 08. 09. 8:42:17
Módosítás dátuma 2021. 08. 09. 8:42:17

Címkék:

  • Web archiving
  • article
  • 2.14: LIS – TYPES OF STAFF
  • Professional responsibilities
  • Role

Archiving Web Content: An Online Searcher Roundtable

Típus Folyóiratcikk
Szerző James Careless
URL https://search.proquest.com/docview/1417518328?accountid=27464
Kötet 37
Szám 2
Oldalszám 44-46
Kiadvány Online Searcher
ISSN 2324-9684
Dátum 2013-03
Egyéb Number: 2
Publisher: Information Today Inc, Medford, NJ
Nyelv English
Kivonat In a roundtable discussion, several executives shared their views about archiving web content. Library of Congress' Office of Strategic Initiatives leader Abbie Grotke said the Library's web archiving project preserves web content around events, such as the US National Elections or September 11, or related themes such as public policy topics or the US Congress. They also archive their own Web site at loc.gov. Las Vegas-Clark County Library District virtual library manager Lauren Stokes said they archive their video and audio content in a variety of media. They use local server storage, portable hard drive backups as well as CD backups. Server storage is also backed up on tapes rotated into cold storage. Boston Public Library's director of administration and technology David Leonard said their digitization efforts are focused on accessibility. Web portal accessibility — whether as part of their own web presence or the posting of materials to other Internet sites as well as some social media sites — all help with accessibility. Adapted from the source document.
Hozzáadás dátuma 2021. 08. 09. 8:42:10
Módosítás dátuma 2021. 08. 09. 8:42:10

Címkék:

  • Digital preservation
  • Web sites
  • Libraries
  • 9.15: TECHNICAL SERVICES – PRESERVATION
  • article
  • Methods
  • Storage

Archiving Web Site Resources: A Records Management View

Típus Dolgozat
Szerző Maureen Pennock
Szerző Brian Kelly
URL http://doi.acm.org/10.1145/1135777.1135978
Hely New York, NY, USA
Kiadó ACM
Oldalszám 987-988
ISBN 1-59593-323-9
Dátum 2006
Egyéb Series Title: WWW '06
Citation Key: Pennock:2006:AWS:1135777.1135978
DOI 10.1145/1135777.1135978
Kivonat In this paper, we propose the use of records management principles to identify and manage Web site resources with enduring value as records. Current Web archiving activities, collaborative or organisational, whilst extremely valuable in their own right, often do not and cannot incorporate requirements for proper records management. Material collected under such initiatives therefore may not be reliable or authentic from a legal or archival perspective, with insufficient metadata collected about the object during its active life, and valuable materials destroyed whilst ephemeral items are maintained. Education, training, and collaboration between stakeholders are integral to avoiding these risks and successfully preserving valuable Web-based materials.
Kiadvány címe Proceedings of the 15th International Conference on World Wide Web
Hozzáadás dátuma 2021. 08. 09. 8:43:38
Módosítás dátuma 2021. 08. 09. 8:43:38

Címkék:

  • best practices
  • archiving web sites
  • records management

Archiving Websites in the Nordic Countries TT – Archiwizowanie stron internetowych w krajach nordyckich

Típus Folyóiratcikk
Szerző Lilianna Nalewajska
URL https://search.proquest.com/docview/1266143226?accountid=27464
Szám 1
Kiadvány Biuletyn EBIB
ISSN 1507-7187
Dátum 2012
Egyéb Number: 1
Publisher: Stowarzyszenie Bibliotekarzy Polskich, Warsaw, Poland
Place: University of Warsaw Library
Nyelv Polish
Kivonat The Nordic countries (Norway, Sweden, Finland, Denmark and Iceland) are the pioneers of web archiving. The process of collecting materials from the web, which requires arrangements concerning technical, legal and organizational issues, was started in these countries in the late 1990s or at the beginning of the 21st century. Archiving is carried out mainly in national libraries, which also cooperate with the International Internet Preservation Consortium and co-create the Nordic Web Archive. The way archiving functions in the Nordic countries, and the difficulties that occur during it, show the complexity of the process and point out how important long-term planning is. Adapted from the source document.
Hozzáadás dátuma 2021. 08. 09. 8:42:11
Módosítás dátuma 2021. 08. 09. 8:42:11

Címkék:

  • Web archiving
  • National libraries
  • 9.15: TECHNICAL SERVICES – PRESERVATION
  • article
  • Cooperation
  • Internet archiving, Web archiving, Web archive
  • Nordic countries

Archiwizacja internetu – wnioski i rekomendacje z kilku raportów TT – Internet archiving – conclusions and recommendations from several reports

Típus Folyóiratcikk
Szerző Lidia Derfert-Wolf
URL https://search.proquest.com/docview/1951539109?accountid=27464
Szám 172
Oldalszám 1
Kiadvány Elektroniczny Biuletyn Informacyjny Bibliotekarzy : EBIB
Dátum 2017
Egyéb Number: 172
Publisher: Stowarzyszenie Bibliotekarzy Polskich
Place: Biblioteka Główna Uniwersytetu Technologiczno-Przyrodniczego w Bydgoszczy
Nyelv Polish
Kivonat W artykule omówiono trzy zagraniczne raporty dotyczące archiwizacji internetu. W materiale Web-Archiving z 2013 r. przedstawiono kluczowe problemy archiwizacji internetu, z punktu widzenia instytucji realizujących tego typu projekty, bez względu na to czy zlecają prace zewnętrznym firmom czy wykonują je we własnym zakresie. Raport Preserving Social Media, opracowany w 2016 r., dotyczy zabezpieczania zasobów mediów społecznościowych. Web Archiving Environmental Scan – stanowi analizę środowiskową, która przeprowadzono w 2015 r. na zlecenie Biblioteki Uniwersytetu Harvarda. Badaniem objęto 23 instytucje z całego świata, realizujące aktualnie tego typu projekty. W artykule przedstawiono również elementy dokumentu normalizacyjnego ISO/TR 14873:2013 Information and Documentation – Statistics and quality issues for web archiving. Na zakończenie nawiązano do prognoz dotyczących rozwoju archiwizacji internetu zaprezentowanych w raporcie Web Archives: The Future(s), opublikowanym w 2011 r.
Hozzáadás dátuma 2021. 08. 09. 8:42:31
Módosítás dátuma 2021. 08. 09. 8:42:31

Címkék:

  • Web archiving
  • Digital archives
  • Library And Information Sciences
  • 3.2:ARCHIVES

Archiwizacja internetu jako usługa naukowa TT – Internet archiving as a scientific service

Típus Folyóiratcikk
Szerző Anna Kugler
Szerző Tobias Beinert
Szerző Astrid Schoger
URL https://search.proquest.com/docview/1951541162?accountid=27464
Szám 172
Oldalszám 1
Kiadvány Elektroniczny Biuletyn Informacyjny Bibliotekarzy : EBIB
Dátum 2017
Egyéb Number: 172
Publisher: Stowarzyszenie Bibliotekarzy Polskich
Place: Munich Digitization Center/Digital Library Bavarian State Library
Nyelv Polish
Kivonat Gromadzenie i archiwizowanie stron internetowych istotnych dla nauki to jak dotąd bardzo zaniedbana sfera aktywności bibliotek niemieckich. Aby zapobiec groźnym stratom oraz zapewnić pracownikom naukowym stały dostęp do stron internetowych ponad dwa lata temu Bavarian State Library (BSB) stworzyła system archiwizacji stron internetowych. Głównym celem projektu zaakceptowanym przez German Research Foundation (DFG) był rozwój i realizacja kooperacyjnego modelu usługowego. Usługa ta ma wspierać inne instytucje dziedzictwa kulturowego w ich aktywności archiwizacyjnej i ułatwiać budowanie rozproszonego niemieckiego systemu archiwizacji naukowych stron internetowych. Dzięki temu projektowi biblioteka bawarska chce poprawić zarówno ilość, jak i jakość zarchiwizowanych treści oraz promować ich wykorzystanie w obszarze nauki.
Hozzáadás dátuma 2021. 08. 09. 8:42:23
Módosítás dátuma 2021. 08. 09. 8:42:23

Címkék:

  • Web archiving
  • Library And Information Sciences
  • 3.2:ARCHIVES
  • 3.11:NATIONAL LIBRARIES AND STATE LIBRARIES
  • Germany
  • State libraries

ArcLink

Típus Dolgozat
Szerző Ahmed AlSum
Szerző Michael L. Nelson
URL http://dl.acm.org/citation.cfm?doid=2467696.2467751
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 377-378
ISBN 978-1-4503-2077-1
Dátum 2013
DOI 10.1145/2467696.2467751
Kiadvány címe Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries – JCDL '13
Hozzáadás dátuma 2021. 08. 09. 8:41:51
Módosítás dátuma 2021. 08. 09. 8:41:51

Címkék:

  • Design
  • Experimentation

Arcomem Crawling Architecture

Típus Folyóiratcikk
Szerző Vassilis Plachouras
Szerző Florent Carpentier
Szerző Muhammad Faheem
Szerző Julien Masanès
Szerző Thomas Risse
Szerző Pierre Senellart
Szerző Patrick Siehndel
Szerző Yannis Stavrakas
URL http://www.mdpi.com/1999-5903/6/3/518
Kötet 6
Szám 3
Oldalszám 518-541
Kiadvány Future Internet
ISSN 1999-5903
Dátum 2014-08-19
Egyéb Number: 3
DOI 10.3390/fi6030518
Kivonat The World Wide Web is the largest information repository available today. However, this information is very volatile and Web archiving is essential to preserve it for the future. Existing approaches to Web archiving are based on simple definitions of the scope of Web pages to crawl and are limited to basic interactions with Web servers. The aim of the ARCOMEM project is to overcome these limitations and to provide flexible, adaptive and intelligent content acquisition, relying on social media to create topical Web archives. In this article, we focus on ARCOMEM’s crawling architecture. We introduce the overall architecture and we describe its modules, such as the online analysis module, which computes a priority for the Web pages to be crawled, and the Application-Aware Helper which takes into account the type of Web sites and applications to extract structure from crawled content. We also describe a large-scale distributed crawler that has been developed, as well as the modifications we have implemented to adapt Heritrix, an open source crawler, to the needs of the project. Our experimental results from real crawls show that ARCOMEM’s crawling architecture is effective in acquiring focused information about a topic and leveraging the information from social media.
Hozzáadás dátuma 2021. 08. 09. 8:42:49
Módosítás dátuma 2021. 08. 09. 8:42:49

Címkék:

  • web archiving
  • content acquisition
  • crawling architecture

Arcomem: From Collect-all ARchives to COmmunity MEMories

Típus Dolgozat
Szerző Thomas Risse
Szerző Wim Peters
URL http://doi.acm.org/10.1145/2187980.2188027
Hely New York, NY, USA
Kiadó ACM
Oldalszám 275-278
ISBN 978-1-4503-1230-1
Dátum 2012
Egyéb Series Title: WWW '12 Companion
Citation Key: Risse:2012:ACA:2187980.2188027
DOI 10.1145/2187980.2188027
Kivonat The ARCOMEM project is about memory institutions like archives, museums and libraries in the age of the Social Web. Social media are becoming more and more pervasive in all areas of life. ARCOMEM's aim is to help to transform archives into collective memories that are more tightly integrated with their community of users and to exploit Web 2.0 and the wisdom of crowds to make Web archiving a more selective and meaning-based process. ARCOMEM (FP7-IST-270239) is an Integrating Project in the FP7 program of the European Commission, which involves twelve partners from academia, industry and public sector. The project will run from January 1, 2011 to December 31, 2013.
Kiadvány címe Proceedings of the 21st International Conference on World Wide Web
Hozzáadás dátuma 2021. 08. 09. 8:43:16
Módosítás dátuma 2021. 08. 09. 8:43:16

Címkék:

  • web archiving
  • web crawler
  • architecture
  • text analysis
  • social web

Arhivarea Paginilor Web – Initiative Relevante de Pastrare a Patrimoniului Digital European

Típus Folyóiratcikk
Szerző Adriana Elena Boruna
Szerző Nicoleta Rahme
URL https://search.proquest.com/docview/1443688144?accountid=27464
Kötet 4
Oldalszám 39-52
Kiadvány Biblioteca Nationala a Romaniei. Informare si Documentare
ISSN 2065-1058
Dátum 2011
Egyéb PMID: 1443688144
Publisher: National Library of Romania [Biblioteca Nationale a Romaniei]
Place: Bucharest
Nyelv Romanian
Hozzáadás dátuma 2021. 08. 09. 8:41:58
Módosítás dátuma 2021. 08. 09. 8:41:58

Címkék:

  • Sciences: Comprehensive Works

Assembling the Living Archive: A Media-Archaeological Excavation of Occupy Wall Street

Típus Folyóiratcikk
Szerző Jason W Buel
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 30
Szám 2
Oldalszám 283-303
Kiadvány Public Culture
ISSN 0899-2363
Dátum 2018-05-01
Egyéb Number: 2
DOI 10.1215/08992363-4310930
Kivonat The article discusses the issues behind the social protest called Occupy Wall Street (OWS) that was staged in Zuccotti Park, Manhattan, New York in September 2011. Also cited are the efforts to archive the movement to preserve its history in a decentralized online archive, as well as the efforts by the OWS Archives Working Group in the archival process.
Hozzáadás dátuma 2021. 08. 09. 8:42:08
Módosítás dátuma 2021. 08. 09. 8:42:08

Címkék:

  • DIGITAL preservation
  • WEB archiving
  • OCCUPY protest movement
  • OCCUPY Wall Street protest movement
  • SOCIAL movements

Assessing the loss of Western Canadian digital heritage

Típus Folyóiratcikk
Szerző Tasbire Saiyera
Szerző Brenda Reyes Ayala
Szerző Qiufeng Du
URL https://journals.library.ualberta.ca/ojs.cais-acsi.ca/index.php/cais-asci/article/view/1218
Jogok Copyright (c) 2021 Tasbire Saiyera, Brenda Reyes Ayala, Qiufeng Du
Kiadvány Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l'ACSI
ISSN 2562-7589
Dátum 2021-05-31
DOI 10.29173/cais1218
Hozzáférés 2021. 07. 15. 10:51:26
Könyvtár Katalógus journals.library.ualberta.ca
Nyelv en
Hozzáadás dátuma 2021. 08. 09. 8:44:14
Módosítás dátuma 2021. 08. 09. 8:44:14

Címkék:

  • web archives

Automatic Generation of Timelines for Past-Web Events

Típus Könyvfejezet
Szerző Ricardo Campos
Szerző Arian Pasquali
Szerző Adam Jatowt
Szerző Vítor Mangaravite
Szerző Alípio Mário Jorge
Szerkesztő Daniel Gomes
Szerkesztő Elena Demidova
Szerkesztő Jane Winters
Szerkesztő Thomas Risse
URL https://doi.org/10.1007/978-3-030-63291-5_18
Hely Cham
Kiadó Springer International Publishing
Oldalszám 225-242
ISBN 978-3-030-63291-5
Dátum 2021
Egyéb DOI: 10.1007/978-3-030-63291-5_18
Hozzáférés 2021. 07. 15. 11:24:26
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat Despite significant advances in web archive infrastructures, the problem of exploring the historical heritage preserved by web archives is yet to be solved. Timeline generation emerges in this context as one possible solution for automatically producing summaries of news over time. Thanks to this, users can gain a better sense of reported news events, entities, stories or topics over time, such as getting a summary of the most important news about a politician, an organisation or a locality. Web archives play an important role here by providing access to a historical set of preserved information. This particular characteristic of web archives makes them an irreplaceable infrastructure and a valuable source of knowledge that contributes to the process of timeline generation. Accordingly, the authors of this chapter developed “Tell me Stories” (http://archive.tellmestories.pt), a news summarisation system, built on top of the infrastructure of Arquivo.pt—the Portuguese web-archive—to automatically generate a timeline summary of a given topic. In this chapter, we begin by providing a brief overview of the most relevant research conducted on the automatic generation of timelines for past-web events. Next, we describe the architecture and some use cases for “Tell me Stories”. Our system demonstrates how web archives can be used as infrastructures to develop innovative services. We conclude this chapter by enumerating open challenges in this field and possible future directions in the general area of temporal summarisation in web archives.
Könyv címe The Past Web: Exploring Web Archives
Hozzáadás dátuma 2021. 08. 09. 8:44:24
Módosítás dátuma 2021. 08. 09. 8:44:24

Avoiding Courseware With Slack

Típus Folyóiratcikk
Szerző Jessamyn West
URL https://search.proquest.com/docview/1830247744?accountid=27464
Kötet 36
Szám 8
Oldalszám 14-15
Kiadvány Computers in Libraries
ISSN 1041-7915
Dátum 2016-10
Egyéb Number: 8
Publisher: Information Today, Inc.
Place: Westport
Nyelv English
Kivonat Slack is a cloud-based software tool for team collaboration. The author used it as the primary tool to teach an asynchronous graduate level course called Tools for Community Advocacy at the University of Hawaii's library and information science (UHLIS) program, and it went well. UHLIS uses courseware that is some of the best out there — Laulima, based on Sakai — but similar to all courseware, it has a steep learning curve and some limitations. As an adjunct who was teaching a single 6-week class, she didn't have the time available to learn to use the tool well. She decided to stick with what she knew — which was Web sites, Google Docs, Skype, and Slack — using Slack as the activity hub. Slack's pricing model is also attractive, which is why she mentions it as a real option for libraries.
Hozzáadás dátuma 2021. 08. 09. 8:41:43
Módosítás dátuma 2021. 08. 09. 8:41:43

Címkék:

  • Collaboration
  • Web archiving
  • Library And Information Sciences–Computer Applica
  • Archives & records
  • Internet
  • Social networks
  • Libraries
  • Library and information science
  • Chat rooms
  • Educational software
  • Students

Avoiding spoilers: wiki time travel with Sheldon Cooper

Típus Folyóiratcikk
Szerző Shawn M Jones
Szerző Michael L Nelson
Szerző Herbert Van de Sompel
URL https://search.proquest.com/docview/2002183210?accountid=27464
Kötet 19
Szám 1
Oldalszám 77-93
Kiadvány International Journal on Digital Libraries
ISSN 1432-5012
Dátum 2018-03
Egyéb Number: 1
Publisher: Springer Science & Business Media
Place: Los Alamos National Laboratory, Los Alamos, NM, USA ; Old Dominion University, Norfolk, VA, USA ; Los Alamos National Laboratory, Los Alamos, NM, USA
DOI 10.1007/s00799-016-0200-8
Nyelv English
Kivonat A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if fans are behind in their viewing they run the risk of encountering “spoilers”—information that gives away key plot points before the intended time of the show’s writers. Because the wiki history is indexed by revisions, finding specific dates can be tedious, especially for pages with hundreds or thousands of edits. A wiki’s history interface does not permit browsing across historic pages without visiting current ones, thus revealing spoilers in the current page. Enterprising fans can resort to web archives and navigate there across wiki pages that were live prior to a specific episode date. In this paper, we explore the use of Memento with the Internet Archive as a means of avoiding spoilers in fan wikis. We conduct two experiments: one to determine the probability of encountering a spoiler when using Memento with the Internet Archive for a given wiki page, and a second to determine which date prior to an episode to choose when trying to avoid spoilers for that specific episode. Our results indicate that the Internet Archive is not safe for avoiding spoilers, and therefore we highlight the inherent capability of fan wikis to address the spoiler problem internally using existing, off-the-shelf technology. We use the spoiler use case to define and analyze different ways of discovering the best past version of a resource to avoid spoilers. We propose Memento as a structural solution to the problem, distinguishing it from prior content-based solutions to the spoiler problem. This research promotes the idea that content management systems can benefit from exposing their version information in the standardized Memento way used by other archives. We support the idea that there are use cases for which specific prior versions of web resources are invaluable.
Hozzáadás dátuma 2021. 08. 09. 8:42:20
Módosítás dátuma 2021. 08. 09. 8:42:20

Címkék:

  • Web archiving
  • Archives
  • Digital preservation
  • Library And Information Sciences–Computer Applica
  • Digital archives
  • Internet
  • Browsing
  • Content management systems
  • HTTP
  • Information management
  • Information resources
  • Internet resources
  • Management systems
  • Resource versioning
  • Spoilers
  • Time travel
  • Web sites
  • Wikis

Az első nyilvános webarchívum az Egyesült Királyságban

Típus Folyóiratcikk
Szerző Steve Bailey
Szerző Dale Thomson
Szerző Gabriella Szalóki
URL http://tmt-archive.omikk.bme.hu/show_news.html?id=4555&issue_id=476
Kötet 53
Szám 10
Kiadvány Tudományos és műszaki tájékoztatás
Dátum 2006
Egyéb Number: 10
Kivonat Sokak számára a web az elsődleges információforrás, eddig mégis kevés figyelmet fordítottak a weboldalak hosszú távú megőrzésére, ami azzal a veszéllyel jár, hogy felbecsülhetetlen tudományos és kulturális értékek vesznek el a jövő generációi számára. A probléma megoldására hat vezető brit intézmény dolgozik közösen egy tesztelési környezet kidolgozásán, amely alapján kiválaszthatók az archiválni kívánt weboldalak. A hat intézmény: Brit Nemzeti Levéltár, Brit Nemzeti Könyvtár, Közös Információs Rendszerek Bizottsága (JISC), a skót és a walesi nemzeti könyvtárak és a Wellcome Könyvtár, megalakította az Egyesült Királyság Webarchiválási Konzorciumát (UK Web Archiving Consortium = UKWAC). Az archiválásra az Ausztrál Nemzeti Könyvtár által kifejlesztett PANDAS (PANDORA Digital Archival System = Pandora Digitális Archiváló Rendszer) szoftvert használják. A partnerek az adott intézmény szakterületéhez kapcsolódó oldalakat mentik el.
Hozzáadás dátuma 2021. 08. 09. 8:43:29
Módosítás dátuma 2021. 08. 09. 8:43:29

Címkék:

  • webarchiválás
  • cikkreferátum
  • Nagy-Britannia

Az idő fogságában Ki őrzi meg a közösségi médiát?

Típus Folyóiratcikk
Szerző László Drótos
URL https://tmt.omikk.bme.hu/tmt/article/view/13062
Kötet 68
Szám 7
Oldalszám 428-439
Kiadvány Tudományos és Műszaki Tájékoztatás
ISSN 1586-2984
Dátum 2021 július
Egyéb Number: 7
Folyóirat rövid neve TMT
Hozzáférés 2021. 08. 04. 2:00:00
Nyelv magyar
Archívum TMT OJS Archívum
Hozzáadás dátuma 2021. 08. 09. 8:44:41
Módosítás dátuma 2021. 08. 09. 8:44:41

Az internet archiválása mint könyvtári feladat

Típus Folyóiratcikk
Szerző László Drótos
Kötet 64
Szám 7-8
Oldalszám 361-371
Kiadvány Tudományos és műszaki tájékoztatás
ISSN 0041-3917
Dátum 2017
Egyéb Number: 7-8
Kivonat A nyilvános internetről minden nap tömeges méretekben letörölt vagy máshová költöző dokumentumok és egyéb információforrások egyre nagyobb problémát jelentenek a tudományos publikációkban és a tananyagokban való hivatkozhatóság szempontjából, de az átlagos internetező is állandóan belefut az eltűnt weboldalakat jelző 404-es hibákba. A világháló alapvetően egy jelen idejű médium, de legalább egy részét érdemes lenne megőrizni és kutathatóvá tenni a jövő generációi számára. Ez a cikk arra a kérdésre keresi a választ, hogy ki, mit, hogyan, mivel és miért mentsen az internetről, és hol van itt a könyvtárak és a könyvtárosok feladata és felelőssége? Bemutat néhány hasznos eszközt és szolgáltatást, majd röviden ismerteti a nemzetközi helyzetet és az OSZK-ban 2017 tavaszán elindult kísérleti webarchiválási projektet.
Hozzáadás dátuma 2021. 08. 09. 8:43:29
Módosítás dátuma 2021. 08. 09. 8:43:29

Címkék:

  • internet
  • archiválás
  • honlaptérkép
  • OSZK

Az Országos Széchényi Könyvtár Webarchívumának 2020-as újdonságai.

Típus Folyóiratcikk
Szerző László Drótos
URL http://search.ebscohost.com/login.aspx?direct=true&db=lxh&AN=149857717&lang=hu&site=ehost-live
Szám 1
Oldalszám 31-38
Kiadvány Library Review / Konyvtari Figyelo
ISSN 0023-3773
Dátum Március 2021
Egyéb Number: 1
Folyóirat rövid neve Library Review / Konyvtari Figyelo
Könyvtár Katalógus EBSCOhost
Kivonat The paper describes the latest developments of the web archiving project launched in 2017 in the National Széchényi Library (NSZL) and the organizational, legal and infrastructural changes affecting the project. It also covers the results achieved in preserving different types of websites, archiving problems, and the software used. It summarizes the efforts to promote the topic, presents the international contact points, and finally lists the goals set for 2021 by the staff of the newly established Web Archiving Department in the NSZL. [ABSTRACT FROM AUTHOR]
Archívum Library, Information Science & Technology Abstracts
Hozzáadás dátuma 2021. 08. 09. 8:44:41
Módosítás dátuma 2021. 08. 09. 8:44:41

Címkék:

  • Web archiving
  • Web archives
  • Internet
  • National libraries
  • Websites
  • Preservation
  • Development plan
  • Goal (Psychology)
  • Hungary
  • International cooperation
  • National library
  • Web development

Az OSZK web-archiváló kísérleti (pilot) projektjének eredményei és egy üzemszerűen működő magyar webarchívum terve

Típus Folyóiratcikk
Szerző László Drótos
Szerző István Moldován
URL http://epa.oszk.hu/00100/00143/00355/pdf/EPA00143_konyvtari_figyelo_2019_01_038-051.pdf
Kötet 65
Szám 1
Oldalszám 38-51
Kiadvány Könyvtári Figyelő
ISSN 0023-3773
Dátum 2019 január
Egyéb Number: 1
Folyóirat rövid neve KF
Hozzáférés 2021. 08. 04. 2:00:00
Nyelv magyar
Hozzáadás dátuma 2021. 08. 09. 8:44:42
Módosítás dátuma 2021. 08. 09. 8:44:42

Az OSZK webaratás pilot projektjének gyűjtőköri tervezete

Típus Jelentés
Szerző OSZK Webarchiválási munkacsoport
Dátum 2017
Intézmény Országos Széchényi Könyvtár
Hozzáadás dátuma 2021. 08. 09. 8:43:38
Módosítás dátuma 2021. 08. 09. 8:43:38

Az OSZK Webarchívum új honlapjának felépítése és szolgáltatásai

Típus Folyóiratcikk
Szerző Márton Németh
URL https://epa.oszk.hu/01300/01367/00329/pdf/EPA01367_3K_2020_06_016-026.pdf
Kötet 29
Szám 6
Oldalszám 16-26
Kiadvány Könyv, könyvtár, könyvtáros
Dátum 2020
Egyéb Number: 6
Folyóirat rövid neve 3K
Hozzáférés 2020. 08. 18. 2:00:00
Könyvtár Katalógus Zotero
Nyelv hu
Hozzáadás dátuma 2021. 08. 09. 8:43:48
Módosítás dátuma 2021. 08. 09. 8:43:48

Az OSZK webarchívumának újdonságai.

Típus Folyóiratcikk
Szerző László Drótos
URL http://search.ebscohost.com/login.aspx?direct=true&db=lxh&AN=143065215&lang=hu&site=ehost-live
Szám 1
Oldalszám 67-73
Kiadvány Library Review / Konyvtari Figyelo
ISSN 0023-3773
Dátum Március 2020
Egyéb Number: 1
Folyóirat rövid neve Library Review / Konyvtari Figyelo
Könyvtár Katalógus EBSCOhost
Archívum Library, Information Science & Technology Abstracts
Hozzáadás dátuma 2021. 08. 09. 8:44:41
Módosítás dátuma 2021. 08. 09. 8:44:41

Az OSZK-ban folyó kísérleti webarchiválási projekt első évének tapasztalatai

Típus Folyóiratcikk
Szerző László Drótos
Szerző Márton Németh
URL http://tmt.omikk.bme.hu/tmt/article/view/7153/8156
Kötet 65
Szám 7-8
Oldalszám 389-400
Kiadvány Tudományos és műszaki tájékoztatás
ISSN 0041-3917
Dátum 2018
Egyéb Number: 7-8
Kivonat Az Országos Széchényi Könyvtárban az OKR (Országos Könyvtári Rendszer) kifejlesztése keretében 2017−2018 között zajlik egy kísérleti projekt azzal a céllal, hogy Magyarországon is megteremtsük a nyilvános webhelyek tömeges archiválásának és hosszú távú megőrzésének feltételeit, elsősorban az ehhez a munkához szükséges informatikai infrastruktúrát és szakértelmet. Ezen a téren több mint 20 éves lemaradást kell ledolgoznunk, mert például az amerikai nonprofit szervezet, az Internet Archive (IA) már 1996 óta foglalkozik ezzel, és azóta példáját számos országban követték, létrehoztak nemzeti, kormányzati vagy intézményi webarchívumokat, gyakran könyvtári, levéltári irányítással vagy közreműködéssel. Az OSZK-ban a 2000-es évek közepén merült fel egy magyar internet archívum (MIA) ötlete, de az ezt előkészítő munka feltételei csak 2017 tavaszán kezdtek megteremtődni. Az egri Networkshop első napján rendezett műhelymunka vitaindító előadásában a 2018 áprilisáig eltelt egy év fejleményeiről számoltunk be, s ezeket az eredményeket és tapasztalatokat foglaljuk össze ebben a cikkben.
Hozzáadás dátuma 2021. 08. 09. 8:43:29
Módosítás dátuma 2021. 08. 09. 8:43:29

Behind the Scenes of the Global Information Society: Libraries and Big-time Politics

Típus Folyóiratcikk
Szerző Evgeniy I. Kuzmin
URL http://bibliotekovedenie.rsl.ru/jour/article/view/848
Szám 2
Oldalszám 13-18
Kiadvány Bibliotekovedenie [Library and Information Science (Russia)]
ISSN 2587-7372
Dátum 2013-04-23
Egyéb Number: 2
DOI 10.25281/0869-608X-2013-0-2-13-18
Kivonat The paper examines the challenges facing libraries in the new information environment. Accessibility and preservation of information, information ethics, promotion of media and information literacy and reading, and the promotion of multilingualism and diversity in cyberspace reflect global problems; by solving them, libraries contribute to the creation of the information society.
Hozzáadás dátuma 2021. 08. 09. 8:42:51
Módosítás dátuma 2021. 08. 09. 8:42:51

Behind the Scenes of Web Archiving: Metadata of Harvested Websites

Típus Könyvfejezet
Szerző Emmanuel Di Pretoro
Szerző Friedel Geeraert
Szerkesztő R. Depoortere
Szerkesztő T. Gheldof
Szerkesztő D. Styven
Szerkesztő J. Van der Eycken
URL https://hal.archives-ouvertes.fr/hal-02124714
Hely Brussels
Kiadó Archives et Bibliothèques de Belgique – Archief- en Bibliotheekwezen in België
Oldalszám 63-74
Dátum 2019
Könyv címe Trust and Understanding: the value of metadata in a digitally joined-up world
Hozzáadás dátuma 2021. 08. 09. 8:43:40
Módosítás dátuma 2021. 08. 09. 8:43:40

Big data experiments with the archived Web: Methodological reflections on studying the development of a nation's Web

Típus Folyóiratcikk
Szerző Niels Brügger
Szerző Janne Nielsen
Szerző Ditte Laursen
URL https://journals.uic.edu/ojs/index.php/fm/article/view/10384
Jogok Copyright (c) 2020 First Monday
Kiadvány First Monday
ISSN 1396-0466
Dátum 2020-02-10
DOI 10.5210/fm.v25i3.10384
Hozzáférés 2021. 07. 15. 11:01:35
Könyvtár Katalógus journals.uic.edu
Nyelv en
Kivonat This article outlines how the 'digital geography' of a nation can be studied, that is the online presence of one nation. The entire Danish Web domain and its development from 2006 to 2015 is used as a case, based on the holdings in the Danish national Web archive. The following research questions guide the investigation: What has the Danish Web domain looked like in the past, and how has it developed in the period 2006-2015? Methodologically, we investigate to what extent one can delimit 'a nation' on the Web, and what characterizes the archived Web as a historical source for academic studies, as well as the general characteristics of our specific data source. Analytically, the article introduces a design for how this type of big data analysis of an entire national Web domain can be performed. Our findings show some of the ways in which a nation's digital landscape can be mapped, i.e. on size, content types and hyperlinks. On a broader canvas, this study demonstrates that with hardware and software as well as human competencies from different disciplines it is possible to perform large-scale historical studies of one of the biggest media sources of today, the World Wide Web.
Rövid cím Big data experiments with the archived Web
Hozzáadás dátuma 2021. 08. 09. 8:44:17
Módosítás dátuma 2021. 08. 09. 8:44:17

Címkék:

  • big data
  • historiography
  • Web history
  • geography
  • the World Wide Web

Big Data Processing of School Shooting Archives

Típus Dolgozat
Szerző Mohamed Farag
Szerző Pranav Nakate
Szerző Edward A. Fox
URL http://dl.acm.org/citation.cfm?doid=2910896.2925466
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 271-272
ISBN 978-1-4503-4229-2
Dátum 2016
DOI 10.1145/2910896.2925466
Kivonat Web archives about school shootings consist of webpages that may or may not be relevant to the events of interest. There are 3 main goals of this work; first is to clean the webpages, which involves getting rid of the stop words and non-relevant parts of a webpage. The second goal is to select just webpages relevant to the events of interest. The third goal is to upload the cleaned and relevant webpages to Apache Solr so that they are easily accessible. We show the details of all the steps required to achieve these goals. The results show that representative Web archives are noisy, with 2% – 40% relevant content. By cleaning the archives, we aid researchers to focus on relevant content for their analysis.
Kiadvány címe Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries – JCDL '16
Hozzáadás dátuma 2021. 08. 09. 8:41:51
Módosítás dátuma 2021. 08. 09. 8:41:51

Címkék:

  • Big Data Processing
  • Classification
  • Digital Libraries
  • Web Archives

Big is small, and changes slowly in Hungary

Típus Dolgozat
Szerző György Kampis
Szerző László Gulyás
Dátum 2013
Kivonat The Internet Archive is incomplete and national archives are necessary. We report a pilot study in Hungary, targeting the archiving of the public internet content of academic research institutions, and present some early analysis results, indicating that the internet based “big data” is unexpectedly small for Hungary, and furthermore that this dataset changes at a low rate. We suggest that differences in the productivity of the institutions can be safely correlated with the differences in content refreshment in their internet presence.
Kiadvány címe Coginfo 2013 Conference
Hozzáadás dátuma 2021. 08. 09. 8:41:47
Módosítás dátuma 2021. 08. 09. 8:41:47

Bit Rosie: A Case Study in Transforming Web-Based Multimedia Research into Digital Archives

Típus Folyóiratcikk
Szerző Adele Fournet
URL https://doi.org/10.17723/0360-9081-84.1.119
Kötet 84
Szám 1
Oldalszám 119-138
Kiadvány The American Archivist
ISSN 0360-9081
Dátum June 24, 2021
Egyéb Number: 1
Folyóirat rövid neve The American Archivist
DOI 10.17723/0360-9081-84.1.119
Hozzáférés 2021. 07. 15. 10:42:45
Könyvtár Katalógus Silverchair
Kivonat This article is a case study in transforming web-based multimedia research initiatives into digital institutional archives to safeguard against the unstable nature of the Internet as a long-term historical medium. The study examines the Bit Rosie digital archives at the New York University Fales Library, which was created as a collaboration between a doctoral researcher in ethnomusicology and the head music librarian at the Avery Fisher Center for Music and Media. The article analyzes how the Bit Rosie archives implements elements of both feminist and activist archival practice in a born-digital context to integrate overlooked women music producers into the archives of the recorded music industry. The case study illustrates how collaboration between cultural creators, researchers, and archivists can give legitimacy and longevity to projects and voices of cultural resistance in the internet era. To conclude, the article suggests that more researchers and university libraries can use this case study as a model in setting up institutional archival homes for the increasing number of multimedia initiatives and projects blossoming throughout the humanities and social sciences.
Rövid cím Bit Rosie
Hozzáadás dátuma 2021. 08. 09. 8:44:11
Módosítás dátuma 2021. 08. 09. 8:44:11

Book of Abstracts: #EWAVirtual 2020

Típus Jelentés
Szerző #EWA Conference Organisers
URL https://zenodo.org/record/4058013
Jogok Creative Commons Attribution 4.0 International, Open Access
Dátum 2020-10-07
Egyéb DOI: 10.5281/ZENODO.4058013
Hozzáférés 2021. 07. 15. 10:28:18
Intézmény Zenodo
Könyvtár Katalógus DOI.org (Datacite)
Nyelv en
Kivonat Engaging with Web Archives: ‘Opportunities, Challenges and Potentialities’, (#EWAVirtual), 21-22 September 2020, Maynooth University Arts and Humanities Institute, Co. Kildare, Ireland. The first international Engaging with Web Archives conference sought to: raise awareness for the use of web archives and the archived web for research and education across a broad range of disciplines and professions in the Arts, Humanities, Social Sciences, Political Science, Media Studies, Information Science, Computer Science and more; foster collaborations between web archiving initiatives, researchers, educators and IT professionals; highlight how the development of the internet and the web is intricately linked to the history of the 1990s. This is a Book of Abstracts from the two-day virtual conference, which took place in September 2020 after the original physical conference in April 2020 was postponed due to COVID-19.
Rövid cím Book of Abstracts
Hozzáadás dátuma 2021. 08. 09. 8:44:08
Módosítás dátuma 2021. 08. 09. 8:44:08

Címkék:

  • web archiving
  • web archives
  • archived web
  • research engagement
  • research of web archives
  • research with web archives

Bootstrapping Web Archive Collections from Micro-Collections in Social Media

Típus Weboldal
URL https://www.proquest.com/openview/80407e8fdb55962153496efb7f9dee24/1?pq-origsite=gscholar&cbl=18750&diss=y
Dátum 2021-07-15 09:29:35
Hozzáférés 2021. 07. 15. 11:29:35
Nyelv hu
Hozzáadás dátuma 2021. 08. 09. 8:44:27
Módosítás dátuma 2021. 08. 09. 8:44:27

Bootstrapping Web Archive Collections from Social Media

Típus Dolgozat
Szerző Alexander C. Nwala
Szerző Michele C. Weigle
Szerző Michael L. Nelson
URL http://dl.acm.org/citation.cfm?doid=3209542.3209560
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 64-72
ISBN 978-1-4503-5427-1
Dátum 2018
DOI 10.1145/3209542.3209560
Kivonat Human-generated collections of archived web pages are expensive to create, but provide a critical source of information for researchers studying historical events. Hand-selected collections of web pages about events shared by users on social media offer the opportunity for bootstrapping archived collections. We investigated if collections generated automatically and semi-automatically from social media sources such as Storify, Reddit, Twitter, and Wikipedia are similar to Archive-It human-generated collections. This is a challenging task because it requires comparing collections that may cater to different needs. It is also challenging to compare collections since there are many possible measures to use as a baseline for collection comparison: how does one narrow down this list to metrics that reflect if two collections are similar or dissimilar? We identified social media sources that may provide similar collections to Archive-It human-generated collections in two main steps. First, we explored the state of the art in collection comparison and defined a suite of seven measures (Collection Characterizing Suite – CCS) to describe the individual collections. Second, we calculated the distances between the CCS vectors of Archive-It collections and the CCS vectors of collections generated automatically and semi-automatically from social media sources, to identify social media collections most similar to Archive-It collections. The CCS distance comparison was done for three topics: "Ebola Virus," "Hurricane Harvey," and "2016 Pulse Nightclub Shooting." Our results showed that social media sources such as Reddit, Storify, Twitter, and Wikipedia produce collections that are similar to Archive-It collections. Consequently, curators may consider extracting URIs from these sources in order to begin or augment collections about various news topics.
Kiadvány címe Proceedings of the 29th on Hypertext and Social Media – HT '18
Hozzáadás dátuma 2021. 08. 09. 8:42:39
Módosítás dátuma 2021. 08. 09. 8:42:39

Címkék:

  • web archiving
  • social media
  • Collection evaluation
  • News

Born Digital Legal Deposit Policies and Practices

Típus Dolgozat
Szerző Frederick Zarndt
Szerző Dorothy Carner
Szerző Edward McCain
URL http://library.ifla.org/1905/
Hely Wrocław
Kiadó IFLA — International Federation of Library Associations and Institutions
Dátum 2017
Egyéb Citation Key: ifla1905
Kivonat In 2014, the authors surveyed the born digital content legal deposit policies and practices in 17 different countries and presented the results of the survey at the 2015 International News Media Conference hosted by the National Library of Sweden in Stockholm, Sweden, April 15-16, 2015. Three years later, the authors expanded their team and updated the survey in order to assess progress in creating or improving national policies and in implementing practices for preserving born digital content. The 2017 survey reach has been broadened to include countries that did not participate in the 2014 survey. To optimise survey design, and allow for comparability of results with previous surveys, the authors briefly review 17 efforts over the last 12 years to understand the state of digital legal deposit and broader digital preservation policies (a deeper analysis will be provided in a future paper), and then set out the logic behind the current survey.
Kiadvány címe IFLA WLIC 2017 – Wrocław, Poland – Libraries. Solidarity. Society. in Session S18 – Satellite Meeting: News Media Section.
Hozzáadás dátuma 2021. 08. 09. 8:43:31
Módosítás dátuma 2021. 08. 09. 8:43:31

Címkék:

  • web archiving
  • digital preservation
  • E-legal deposit
  • survey

Born-digital archives

Típus Folyóiratcikk
Szerző Thorsten Ries
Szerző Gábor Palkó
URL https://doi.org/10.1007/s42803-019-00011-x
Kötet 1
Szám 1
Oldalszám 1-11
Kiadvány International Journal of Digital Humanities
ISSN 2524-7840
Dátum 2019
Egyéb Number: 1
DOI 10.1007/s42803-019-00011-x
Hozzáadás dátuma 2021. 08. 09. 8:43:39
Módosítás dátuma 2021. 08. 09. 8:43:39

Bots, Seeds and People: Web Archives As Infrastructure

Típus Dolgozat
Szerző Ed Summers
Szerző Ricardo Punzalan
URL http://doi.acm.org/10.1145/2998181.2998345
Hely New York, NY, USA
Kiadó ACM
Oldalszám 821-834
ISBN 978-1-4503-4335-0
Dátum 2017-11-08
Egyéb Series Title: CSCW '17
Citation Key: Summers:2017:BSP:2998181.2998345
DOI 10.1145/2998181.2998345
Kivonat The field of web archiving provides a unique mix of human and automated agents collaborating to achieve the preservation of the web. Centuries old theories of archival appraisal are being transplanted into the sociotechnical environment of the World Wide Web with varying degrees of success. The work of the archivist and bots in contact with the material of the web present a distinctive and understudied CSCW shaped problem. To investigate this space we conducted semi-structured interviews with archivists and technologists who were directly involved in the selection of content from the web for archives. These semi-structured interviews identified thematic areas that inform the appraisal process in web archives, some of which are encoded in heuristics and algorithms. Making the infrastructure of web archives legible to the archivist, the automated agents and the future researcher is presented as a challenge to the CSCW and archival community.
Kiadvány címe Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing
Hozzáadás dátuma 2021. 08. 09. 8:43:32
Módosítás dátuma 2021. 08. 09. 8:43:32

Címkék:

  • collaboration
  • archive
  • Computer Science – Digital Libraries
  • design
  • H.3.7
  • K.4.3
  • practice
  • web

Breaking in to the mainstream: demonstrating the value of internet (and web) histories

Típus Folyóiratcikk
Szerző Jane Winters
URL http://www.tandfonline.com/doi/abs/10.1080/24701475.2017.1305713
Kötet 1
Szám 1-2
Oldalszám 173-179
Kiadvány Internet Histories
ISSN 2470-1475
Dátum 2017-01-02
Egyéb Number: 1-2
Publisher: Routledge
DOI 10.1080/24701475.2017.1305713
Hozzáadás dátuma 2021. 08. 09. 8:41:45
Módosítás dátuma 2021. 08. 09. 8:41:45

Bringing Your Physical Books to Digital Learners via the Open Library Project

Típus Folyóiratcikk
Szerző Ramune Kubilius
URL https://search.proquest.com/docview/2077576451?accountid=27464
Kötet 30
Szám 2
Oldalszám 63
Kiadvány Against the Grain
ISSN 1043-2094
Dátum 2018-04
Egyéb Number: 2
Publisher: Against the Grain, LLC
Place: Northwestern University, Galter Health Sciences Library
Nyelv English
Kivonat Kahle, the founder and digital librarian of Internet Archive, is a visionary, to be sure, and his plenary presentation in Charleston was sincere and enthusiastic. It was quite impressive to hear how many patrons visit Internet Archive each day (3-4 million), that there are 170 staff, and 500 libraries and university partners. It is not hard to believe that the average life of a web page is (only) 100 days before it is deleted or changed.
Hozzáadás dátuma 2021. 08. 09. 8:42:19
Módosítás dátuma 2021. 08. 09. 8:42:19

Címkék:

  • Web archiving
  • Library And Information Sciences
  • Archives & records

Building a Future for Our Digital Memory: A Collaborative Infrastructure for Permanent Access to Digital Heritage in The Netherlands

Típus Folyóiratcikk
Szerző Marcel Ras
Szerző Barbara Sierman
URL http://www.tandfonline.com/doi/full/10.1080/13614576.2015.1114828
Kötet 20
Szám 1-2
Oldalszám 219-228
Kiadvány New Review of Information Networking
ISSN 1361-4576
Dátum 2015-07-03
Egyéb Number: 1-2
DOI 10.1080/13614576.2015.1114828
Kivonat This article describes the developments in The Netherlands to establish a national Network for Digital Heritage. This network is based on three pillars: to make the digital heritage visible, usable, and sustainably preserved. Three working programs will have their own but integrated set of dedicated actions in order to create a national infrastructure in The Netherlands, based on an optimal use of existing facilities. In this article the focus is on the activities related to the sustainable preservation of the Dutch national digital heritage.
Hozzáadás dátuma 2021. 08. 09. 8:41:54
Módosítás dátuma 2021. 08. 09. 8:41:54

Building a Living, Breathing Archive: A Review of Appraisal Theories and Approaches for Web Archives

Típus Folyóiratcikk
Szerző Colin Post
URL https://search.proquest.com/docview/1940603266?accountid=27464
Kötet 46
Szám 2
Oldalszám 69-77
Kiadvány Preservation, Digital Technology & Culture
ISSN 2195-2957
Dátum 2017
Egyéb Number: 2
Publisher: Walter de Gruyter GmbH
Place: Berlin
DOI 10.1515/pdtc-2016-0031
Nyelv English
Kivonat The paper provides a review of published literature on the collection and development of Web archives, focusing specifically on the theories, techniques, tools, and approaches used to appraise Web-based materials for inclusion in collections. Facing an enormous amount of Web-based materials, archival institutions and other cultural heritage institutions need to devise methods to actively select Webpages for preservation, creating Web archives that constitute a cultural record of the Web for the benefit of users. This review outlines the challenges of collecting and appraising Web-based materials, places the theories and activities of collecting Web-based materials within the broader discourse of archival appraisal, and points out directions for future research and critical discourse for Web archives.
Hozzáadás dátuma 2021. 08. 09. 8:43:01
Módosítás dátuma 2021. 08. 09. 8:43:01

Címkék:

  • Web archiving
  • web archives
  • Archives
  • Library And Information Sciences
  • 3.2:ARCHIVES
  • appraisal
  • Archival appraisal
  • Cultural resources
  • Literature reviews

Building a story tracer out of a web archive

Típus Dolgozat
Szerző Lian'en Huang
Szerző Jonathan J. H. Zhu
Szerző Xiaoming Li
URL http://portal.acm.org/citation.cfm?doid=1378889.1379000
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 455
ISBN 978-1-59593-998-2
Dátum 2008
DOI 10.1145/1378889.1379000
Kivonat There are quite a few web archives around the world, such as Internet Archive and Web InfoMall (http://www.infomall.cn). Nevertheless, we have not seen substantial mechanism built on top of the archives to render the value of the data beyond what the Wayback machine offers. One of the reasons for this situation is the lack of a system vision and design which encompasses the oceanic data in a meaningful and cost-effective way. This paper describes an effort in this direction.
Kiadvány címe Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries – JCDL '08
Hozzáadás dátuma 2021. 08. 09. 8:43:23
Módosítás dátuma 2021. 08. 09. 8:43:23

Címkék:

  • Web archive
  • Text mining

Building and querying semantic layers for web archives (extended version)

Típus Folyóiratcikk
Szerző Pavlos Fafalios
Szerző Helge Holzmann
Szerző Vaibhav Kasturia
Szerző Wolfgang Nejdl
Kötet 19
Szám 1
Oldalszám 1-19
Kiadvány International Journal on Digital Libraries
ISSN 1432-1300
Dátum 2018
Egyéb Number: 1
ISBN: 9781538638613
DOI 10.1007/s00799-018-0251-0
Kivonat Web archiving is the process of collecting portions of the Web to ensure that the information is preserved for future exploitation. However, despite the increasing number of web archives worldwide, the absence of efficient and meaningful exploration methods still remains a major hurdle in the way of turning them into a usable and useful information source. In this paper, we focus on this problem and propose an RDF/S model and a distributed framework for building semantic profiles (layers) that describe semantic information about the contents of web archives. A semantic layer allows describing metadata information about the archived documents, annotating them with useful semantic information (like entities, concepts and events), and publishing all this data on the Web as Linked Data. Such structured repositories offer advanced query and integration capabilities and make web archives directly exploitable by other systems and tools. To demonstrate their query capabilities, we build and query semantic layers for three different types of web archives. An experimental evaluation showed that a semantic layer can answer information needs that existing keyword-based systems are not able to sufficiently satisfy.
Hozzáadás dátuma 2021. 08. 09. 8:41:44
Módosítás dátuma 2021. 08. 09. 8:41:44

Címkék:

  • Web archives
  • Exploratory search
  • Linked data
  • Profiling
  • Semantic layer

Building community at distance: a datathon during COVID-19

Típus Folyóiratcikk
Szerző Samantha Fritz
Szerző Ian Milligan
Szerző Nick Ruest
Szerző Jimmy Lin
URL https://doi.org/10.1108/DLP-04-2020-0024
Kötet 36
Szám 4
Oldalszám 415-428
Kiadvány Digital Library Perspectives
ISSN 2059-5816
Dátum 2020-01-01
Egyéb Number: 4
Publisher: Emerald Publishing Limited
DOI 10.1108/DLP-04-2020-0024
Hozzáférés 2021. 07. 15. 11:30:58
Könyvtár Katalógus Emerald Insight
Kivonat Purpose: This paper aims to use the experience of an in-person event that was forced to go virtual in the wake of COVID-19 as an entryway into a discussion on the broader implications around transitioning events online. It gives both practical recommendations to event organizers as well as broader reflections on the role of digital libraries during the COVID-19 pandemic and beyond. Design/methodology/approach: The authors draw on their personal experiences with the datathon, as well as a comprehensive review of literature. The authors provide a candid assessment of what approaches worked and which ones did not. Findings: A series of best practices are provided, including factors for assessing whether an event can be run online; the mixture of synchronous versus asynchronous content; and important technical questions around delivery. Focusing on a detailed case study of the shift of the physical team-building exercise, the authors note how cloud-based platforms were able to successfully assemble teams and jumpstart online collaboration. The existing decision to use cloud-based infrastructure facilitated the event’s transition as well. The authors use these examples to provide some broader insights on meaningful content delivery during the COVID-19 pandemic. Originality/value: Moving an event online during a novel pandemic is part of a broader shift within the digital libraries’ community. This paper thus provides a useful professional resource for others exploring this shift, as well as those exploring new program delivery in the post-pandemic period (both due to an emphasis on climate reduction as well as reduced travel budgets in a potential period of financial austerity).
Rövid cím Building community at distance
Hozzáadás dátuma 2021. 08. 09. 8:44:27
Módosítás dátuma 2021. 08. 09. 8:44:27

Címkék:

  • Web archives
  • COVID-19
  • Datathon
  • Interdisciplinary
  • Online events
  • Team formation

Building Companionship Between Community and Personal Archiving: Strengthening Personal Digital Archiving Support in Community-Based Mobile Digitization Projects

Típus Folyóiratcikk
Szerző Ruohua Han
URL https://doi.org/10.1515/pdtc-2018-0014
Kötet 48
Szám 1
Oldalszám 6-16
Kiadvány Preservation, Digital Technology & Culture
ISSN 2195-2965
Dátum 2019-03-26
Egyéb Number: 1
Publisher: De Gruyter
DOI 10.1515/pdtc-2018-0014
Kivonat The interconnectedness between personal digital archiving (PDA) and community-based digital archiving provides an entry point for thinking about how to better bridge the two within single projects. Flexibility and sustainability are dimensions that warrant special consideration to support PDA within community-based digital archiving projects. This paper examines the flexibility and sustainability of two community-based mobile digitization projects (Culture in Transit and Georgia HomePLACE DigiKits) in supporting PDA. The assessment shows that the projects are in a good position to support PDA, with only some concerns about ensuring sustainable access to digitization equipment and sufficient guidance in long-term preservation. Drawing from this work, I propose three ways community-based mobile digitization projects can be redesigned to further strengthen their support for PDA without undermining their community-based objectives. The goal of this paper is to demonstrate the value in considering connections and differences between community and personal archiving needs in current and future projects, and calls for further coordination of efforts and collaboration to build better collaboration between community and personal archiving.
Hozzáadás dátuma 2021. 08. 09. 8:43:40
Módosítás dátuma 2021. 08. 09. 8:43:40

Címkék:

  • Collaboration
  • Community archiving
  • COMMUNITY involvement
  • DIGITAL preservation
  • DIGITIZATION of archival materials
  • Mobile digitization units
  • Personal digital archiving (PDA)
  • TECHNOLOGICAL innovations
  • WEB archiving

Building Entity-centric Event Collections

Típus Dolgozat
Szerző Federico Nanni
Szerző Simone Paolo Ponzetto
Szerző Laura Dietz
URL http://dl.acm.org/citation.cfm?id=3200334.3200356
Hely Piscataway, NJ, USA
Kiadó IEEE Press
Oldalszám 199-208
ISBN 978-1-5386-3861-3
Dátum 2017
Egyéb Series Title: JCDL '17
Citation Key: Nanni:2017:BEE:3200334.3200356
Kivonat Web archives preserve an unprecedented abundance of materials regarding major events and transformations in our society. In this paper, we present an approach for building event-centric sub-collections from such large archives, which includes not only the core documents related to the event itself but, even more importantly, documents describing related aspects (e.g., premises and consequences). This is achieved by 1) identifying relevant concepts and entities from a knowledge base, and 2) detecting their mentions in documents, which are interpreted as indicators for relevance. We extensively evaluate our system on two diachronic corpora, the New York Times Corpus and the US Congressional Record, and we test its performance on the TREC KBA Stream corpus, a large and publicly available web archive.
Kiadvány címe Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:07
Módosítás dátuma 2021. 08. 09. 8:43:07

Building NED: Open Access to Australia’s Digital Documentary Heritage

Típus Folyóiratcikk
Szerző Barbara Lemon
Szerző Kerry Blinco
Szerző Brendan Somes
URL https://www.mdpi.com/2304-6775/8/2/19
Jogok http://creativecommons.org/licenses/by/3.0/
Kötet 8
Szám 2
Oldalszám 19
Kiadvány Publications
Dátum 2020-06
Egyéb Number: 2
Publisher: Multidisciplinary Digital Publishing Institute
DOI 10.3390/publications8020019
Hozzáférés 2021. 07. 15. 11:27:40
Könyvtár Katalógus www.mdpi.com
Nyelv en
Kivonat This article charts the development of Australia’s national edeposit service (NED), from concept to reality. A world-first collaboration between the national, state and territory libraries of Australia, NED was launched in 2019 and transformed our approach to legal deposits in Australia. NED is more than a repository, operating as a national online service for depositing, preserving and accessing Australian electronic publications, with benefits to publishers, libraries and the public alike. This article explains what makes NED unique in the context of global research repository infrastructure, outlining the ways in which NED member libraries worked to balance user needs with technological capacity and the variations within nine sets of legal deposit legislation.
Rövid cím Building NED
Hozzáadás dátuma 2021. 08. 09. 8:44:26
Módosítás dátuma 2021. 08. 09. 8:44:26

Címkék:

  • legal deposit
  • digital heritage
  • Australia
  • electronic publications
  • open repository

Building the Foundation: Creating an Electronic-Records Program at the University of Miami

Típus Folyóiratcikk
Szerző Laura Capell
URL https://search.proquest.com/docview/1755071188?accountid=27464
Kötet 35
Szám 9
Oldalszám 28-32
Kiadvány Computers in Libraries
ISSN 1041-7915
Dátum 2015-11
Egyéb Number: 9
Publisher: Information Today, Inc.
Place: Westport
Nyelv English
Kivonat Developing and implementing effective strategies to manage electronic records (e-records) is one of the biggest challenges facing the archives field today, as they acquire growing quantities of contemporary records generated by an increasingly digital society. However, jumping into e-records archiving can be a daunting task. As the authors continue to move through the pilot project and develop their policies and procedures for born-digital content, they're looking ahead at the next steps. First of all, they want to build more robust digital forensics workflows, including exploring methods for more extensive analysis of their digital content and developing workflows to handle a wider range of media and formats. Second, they want to use the results of their survey to start processing legacy media in their collections. Finally, they want to explore more options for providing access so that they can effectively make a wide range of born-digital content available for research.
Hozzáadás dátuma 2021. 08. 09. 8:42:16
Módosítás dátuma 2021. 08. 09. 8:42:16

Címkék:

  • Web archiving
  • Library And Information Sciences–Computer Applications
  • Academic libraries
  • Archives & records
  • Library collections
  • Social networks
  • Colleges & universities
  • Archivists
  • Digital video
  • Electronic records
  • Pilot projects
  • Special collections
  • Video recordings

Building Web Corpora for Minority Languages

Típus Dolgozat
Szerző Heidi Jauhiainen
Szerző Tommi Jauhiainen
Szerző Krister Lindén
URL https://aclanthology.org/2020.wac-1.4
Hely Marseille, France
Kiadó European Language Resources Association
Oldalszám 23-32
ISBN 979-10-95546-68-9
Dátum 2020-05
Hozzáférés 2021. 07. 15. 11:10:38
Könyvtár Katalógus ACLWeb
Nyelv English
Kivonat Web corpora creation for minority languages that do not have their own top-level Internet domain is no trivial matter. Web pages in such minority languages often contain text and links to pages in the dominant language of the country. When building corpora in specific languages, one has to decide how and at which stage to make sure the texts gathered are in the desired language. In the “Finno-Ugric Languages and the Internet” (Suki) project, we created web corpora for Uralic minority languages using web crawling combined with a language identification system in order to identify the language while crawling. In addition, we used language set identification and crowdsourcing before making sentence corpora out of the downloaded texts. In this article, we describe a strategy for collecting textual material from the Internet for minority languages. The strategy is based on the experiences we gained during the Suki project.
Kiadvány címe Proceedings of the 12th Web as Corpus Workshop
Hozzáadás dátuma 2021. 08. 09. 8:44:20
Módosítás dátuma 2021. 08. 09. 8:44:20

Can we find documents in web archives without knowing their contents?

Típus Dolgozat
Szerző Khoi Duy Vo
Szerző Tuan Tran
Szerző Tu Ngoc Nguyen
Szerző Xiaofei Zhu
Szerző Wolfgang Nejdl
URL http://dl.acm.org/citation.cfm?doid=2908131.2908165
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 173-182
ISBN 978-1-4503-4208-7
Dátum 2016
DOI 10.1145/2908131.2908165
Kivonat Recent advances of preservation technologies have led to an increasing number of Web archive systems and collections. These collections are valuable to explore the past of the Web, but their value can only be uncovered with effective access and exploration mechanisms. Ideal search and ranking methods must be robust to the high redundancy and the temporal noise of contents, as well as scalable to the huge amount of data archived. Despite several attempts in Web archive search, facilitating access to Web archive still remains a challenging problem. In this work, we conduct a first analysis on different ranking strategies that exploit evidences from metadata instead of the full content of documents. We perform a first study to compare the usefulness of non-content evidences to Web archive search, where the evidences are mined from the metadata of file headers, links and URL strings only. Based on these findings, we propose a simple yet surprisingly effective learning model that combines multiple evidences to distinguish "good" from "bad" search results. We conduct empirical experiments quantitatively as well as qualitatively to confirm the validity of our proposed method, as a first step towards better ranking in Web archives taking metadata into account.
Kiadvány címe Proceedings of the 8th ACM Conference on Web Science – WebSci '16
Hozzáadás dátuma 2021. 08. 09. 8:43:32
Módosítás dátuma 2021. 08. 09. 8:43:32

Címkék:

  • Feature Analysis
  • Temporal Ranking
  • Web Archive Search

Can we write a cultural history of the Internet? If so, how?

Típus Folyóiratcikk
Szerző Fred Turner
URL http://www.tandfonline.com/doi/abs/10.1080/24701475.2017.1307540
Kötet 1
Szám 1-2
Oldalszám 39-46
Kiadvány Internet Histories
ISSN 2470-1475
Dátum 2017-01-02
Egyéb Number: 1-2
Publisher: Routledge
DOI 10.1080/24701475.2017.1307540
Hozzáadás dátuma 2021. 08. 09. 8:41:46
Módosítás dátuma 2021. 08. 09. 8:41:46

Can web presence predict academic performance?

Típus Dolgozat
Szerző László Gulyás
Szerző Zsolt Jurányi
Szerző Sándor Soós
Szerző George Kampis
URL http://dl.acm.org/citation.cfm?doid=2567948.2579037
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 1183-1188
ISBN 978-1-4503-2745-9
Dátum 2014
DOI 10.1145/2567948.2579037
Kivonat This paper reports the preliminary results of a project that aims at incorporating the analysis of the web presence (content) of research institutions into the scientometric analysis of these research institutions. The problem is to understand and predict the dynamics of academic activity and resource allocation using web presence. The present paper approaches this problem in two parts. First we develop a crawler and an archive of the web contents obtained from academic institutions, and present an early analysis of the records. Second, we use (currently off-line) records to analyze the dynamics of resource allocation. Combination of the two parts is an ambition of ongoing work. The motivation in this study is twofold. First, we strongly believe that independent archiving, indexing and searching of (past) web content is an important task, even with regards to academic web presence. We are particularly interested in studying the dynamics of the “online scientific discourse”, based on the assumption that the changing traces of web presence is an important factor that documents the intensity of activity. Second, we maintain that the trend-analysis of scientific activity represents a hitherto unused potential. We illustrate this by a pilot where, using ‘offline’ longitudinal datasets, we study whether past (i.e. cumulative) success can predict current (and future) activity in academia. Or, in short: do institutions invest and publish in areas where they have been successful? Answer to this question is, we believe, important to understanding and predicting research policies and their changes.
Kiadvány címe Proceedings of the 23rd International Conference on World Wide Web – WWW '14 Companion
Hozzáadás dátuma 2021. 08. 09. 8:41:47
Módosítás dátuma 2021. 08. 09. 8:41:47

Capture All the URLs: First Steps in Web Archiving

Típus Folyóiratcikk
Szerző Alexis Antracoli
Szerző Steven Duckworth
Szerző Judith Silva
Szerző Kristen Yarmey
URL https://search.proquest.com/docview/1634873262?accountid=27464
Kötet 2
Szám 2
Oldalszám 155-170
Kiadvány Pennsylvania Libraries
Dátum 2014
Egyéb Number: 2
Publisher: University Library System, University of Pittsburgh
Place: Pittsburgh
DOI 10.5195/palrap.2014.67
Nyelv English
Kivonat As higher education embraces new technologies, university activities – including teaching, learning, and research – increasingly take place on university websites, on university-related social media pages, and elsewhere on the open Web. Despite perceptions that "once it's on the Web, it's there forever," this dynamic digital content is highly vulnerable to degradation and loss. In order to preserve and provide enduring access to this complex body of university records, archivists and librarians must rise to the challenge of Web archiving. As digital archivists at our respective institutions, the authors introduce the concept of Web archiving and articulate its importance in higher education. We provide our institutions' rationale for selecting subscription service Archive-It as a preservation tool, outline the progress of our institutional Web archiving initiatives, and share lessons learned, from unexpected stumbling blocks to strategies for raising funds and support from campus stakeholders.
Hozzáadás dátuma 2021. 08. 09. 8:42:13
Módosítás dátuma 2021. 08. 09. 8:42:13

Címkék:

  • Digital libraries
  • Library And Information Sciences
  • Academic libraries
  • Archives & records
  • URLs
  • Library science
  • Higher education

Capturing the Web at Large: A Critique of Current Web Referencing Practices

Típus Dolgozat
Szerző Caroline Nyvang
Szerző Thomas Kromann Hvid
Szerző Eld Zierau
Dátum 2017
Kivonat The Internet and the cultural phenomena that exist online are increasingly attracting academic awareness, and e-materials both supplement and replace physical materials. These new opportunities come with a range of challenges. Websites are connected in new and unfamiliar ways, the amount of data easily surpasses what we have experienced previously, and we do not yet have an infrastructure that can lend proper support to the increased scholarly use of web resources [1-2]. This paper is an attempt to grapple with one of the core challenges, namely our ability to provide precise and persistent references to web material. The paper charts prevailing ideals and practices regarding web references within the Humanities. We highlight the challenges based on an analysis of web references in two case studies – a selection of Danish master’s theses from 2015 and academic books on contemporary Danish literature. We propose a new best practice that is consistent with good scientific practice in terms of both precision and persistency, which cannot be obtained following the existing standards.
Kiadvány címe “Researchers, practitioners and their use of the archived web”, London, School of Advanced Study, University of London
Hozzáadás dátuma 2021. 08. 09. 8:41:47
Módosítás dátuma 2021. 08. 09. 8:41:47

Carbon Dating the Web: Estimating the Age of Web Resources

Típus Dolgozat
Szerző Hany M SalahEldeen
Szerző Michael L Nelson
URL http://doi.acm.org/10.1145/2487788.2488121
Hely New York, NY, USA
Kiadó ACM
Oldalszám 1075-1082
ISBN 978-1-4503-2038-2
Dátum 2013
Egyéb Series Title: WWW '13 Companion
Citation Key: SalahEldeen:2013:CDW:2487788.2488121
DOI 10.1145/2487788.2488121
Kivonat In the course of web research it is often necessary to estimate the creation datetime for web resources (in the general case, this value can only be estimated). While it is feasible to manually establish likely datetime values for small numbers of resources, this becomes infeasible if the collection is large. We present "carbon date", a simple web application that estimates the creation date for a URI by polling a number of sources of evidence and returning a machine-readable structure with their respective values. To establish a likely datetime, we poll bitly for the first time someone shortened the URI, topsy for the first time someone tweeted the URI, a Memento aggregator for the first time it appeared in a public web archive, Google's time of last crawl, and the Last-Modified HTTP response header of the resource itself. We also examine the backlinks of the URI as reported by Google and apply the same techniques for the resources that link to the URI. We evaluated our tool on a gold standard data set of 1200 URIs in which the creation date was manually verified. We were able to estimate a creation date for 75.90% of the resources, with 32.78% having the correct value. Given the different nature of the URIs, the union of the various methods produces the best results. While the Google last crawl date and topsy account for nearly 66% of the closest answers, eliminating the web archives or Last-Modified from the results produces the largest overall negative impact on the results. The carbon date application is available for download or use via a web API.
Kiadvány címe Proceedings of the 22nd International Conference on World Wide Web
Hozzáadás dátuma 2021. 08. 09. 8:43:10
Módosítás dátuma 2021. 08. 09. 8:43:10

Címkék:

  • social media
  • memento
  • archiving
  • creation dates

Case Studies in Web Sustainability

Típus Folyóiratcikk
Szerző Scott Turner
URL https://search.proquest.com/docview/1680141236?accountid=27464
Szám 70
Kiadvány Ariadne
ISSN 1361-3200
Dátum 2012-11
Egyéb Number: 70
Publisher: UK office for Library and Information Networking (UKOLN), University of Bath, United Kingdom
Nyelv English
Kivonat At the moment organisations often make significant investments in producing Web-based material, often funded through public money, for example from JISC. We are seeing cuts in funding or changes in governmental policy, which are resulting in the closure of some of these organisations. What happens to those Web resources when the organisations are no longer in existence? Public money has often been used to develop these resources – from that perspective it would be a shame to lose them. Moreover, the resources might be needed or someone may actually want to take over the maintenance of the site at a later date. JISC previously funded three projects to look at this area through a programme called Sustaining at risk online resources [1]. One of these projects, which ran at The University of Northampton, looked into rescuing one of the recently closed East Midlands Universities Association's online resources. This resource, called East Midlands Knowledge Network (EMKN), lists many of the knowledge transfer activities of 10 of the East Midlands universities. The project looked at options on how to migrate the site to a free hosting option to make it more sustainable even when it is no longer available on the original host's servers. This article looks at this work as a case study on Web sustainability and also includes a case study of another project where Web sustainability was central.
Hozzáadás dátuma 2021. 08. 09. 8:42:03
Módosítás dátuma 2021. 08. 09. 8:42:03

Címkék:

  • Web archiving
  • Preservation
  • 9.15: TECHNICAL SERVICES – PRESERVATION
  • article
  • Universities
  • Projects
  • Web hosting

Challenges and Opportunities within Personal Life Archives

Típus Dolgozat
Szerző Duc-Tien Dang-Nguyen
Szerző Michael Riegler
Szerző Liting Zhou
Szerző Cathal Gurrin
URL http://dl.acm.org/citation.cfm?doid=3206025.3206040
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 335-343
ISBN 978-1-4503-5046-4
Dátum 2018
DOI 10.1145/3206025.3206040
Kivonat Nowadays, almost everyone holds some form or other of a personal life archive. Automatically maintaining such an archive is an activity that is becoming increasingly common; however, without automatic support the users will quickly be overwhelmed by the volume of data and will miss out on the potential benefits that lifelogs provide. In this paper we give an overview of the current status of lifelog research and propose a concept for exploring these archives. We motivate the need for new methodologies for indexing data, organizing content and supporting information access. Finally, we describe challenges to be addressed and give an overview of the initial steps that have to be taken to address the challenges of organising and searching personal life archives.
Kiadvány címe Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval – ICMR '18
Hozzáadás dátuma 2021. 08. 09. 8:42:41
Módosítás dátuma 2021. 08. 09. 8:42:41

Címkék:

  • Lifelogging
  • Personal Life Archive
  • Search Engine

Challenges for the national, regional and thematic Web Archiving and their Use

Típus Folyóiratcikk
Szerző Thomas Risse
Szerző Wolfgang Nejdl
Kötet 62
Oldalszám 160-171
Kiadvány Zeitschrift für Bibliothekswesen und Bibliographie
Dátum 2015
Folyóirat rövid neve Zeitschrift für Bibliothekswesen und Bibliographie
Könyvtár Katalógus ResearchGate
Kivonat The World Wide Web is well established as a global information and communication medium. New technologies regularly come along which expand the forms of use and permit even inexperienced users to publish content or take part in discussions. For this reason the Web can also be seen as a good documenter of present-day society. The dynamism of the Web means that its content is, by its very nature, transitory, and new technologies and forms of use regularly present new challenges for the collection of web content for web archiving. Static pages still dominated in the early days of web archiving, whereas many dynamic types of content have now arisen which integrate information from different sources. There is now growing interest from various research disciplines in conventional domain-oriented web harvesting, in thematic web collections and in their use and exploration. This article examines a number of challenges and possible methods of collecting thematic and dynamic content from the Web and social media. Current problems which have arisen in academic use are discussed, and it is shown how web archives and other temporal collections can be searched more effectively.
Hozzáadás dátuma 2021. 08. 09. 8:43:51
Módosítás dátuma 2021. 08. 09. 8:43:51

Challenges of archiving and preserving born-digital news applications

Típus Folyóiratcikk
Szerző Katherine Boss
Szerző Meredith Broussard
URL https://search.proquest.com/docview/1900646766?accountid=27464
Kötet 43
Szám 2
Oldalszám 150-157
Kiadvány IFLA Journal
ISSN 0340-0352
Dátum 2017-06
Egyéb Number: 2
Publisher: Sage Publications Ltd.
Place: New York University Libraries, USA ; New York University, Arthur L. Carter Journalism Institute, USA ; New York University Libraries, USA
DOI 10.1177/0340035216686355
Nyelv English
Kivonat Born-digital news content is increasingly becoming the format of the first draft of history. Archiving and preserving this history is of paramount importance to the future of scholarly research, but many technical, legal, financial, and logistical challenges stand in the way of these efforts. This is especially true for news applications, or custom-built websites that comprise some of the most sophisticated journalism stories today, such as the "Dollars for Docs" project by ProPublica. Many news applications are standalone pieces of software that query a database, and this significant subset of apps cannot be archived in the same way as text-based news stories, or fully captured by web archiving tools such as Archive-It. As such, they are currently disappearing. This paper will outline the various challenges facing the archiving and preservation of born-digital news applications, as well as outline suggestions for how to approach this important work.
Hozzáadás dátuma 2021. 08. 09. 8:43:01
Módosítás dátuma 2021. 08. 09. 8:43:01

Címkék:

  • Web archiving
  • Library And Information Sciences
  • 3.2:ARCHIVES
  • Born-digital news
  • Computer software
  • Journalism
  • news applications
  • News coverage
  • Preservation
  • Scholarly publishing
  • software preservation
  • TCP-IP

Changes in Web Content in First 20 NIRF Ranking Institutes During 2010-19: an Analysis

Típus Folyóiratcikk
Szerző Subrata Gangopadhyay
URL https://www.proquest.com/scholarly-journals/changes-web-content-first-20-nirf-ranking/docview/2447005010/se-2?accountid=15756
Oldalszám 1-9
Kiadvány Library Philosophy and Practice
Dátum May 2020
Pontos lelőhely 2447005010
Egyéb Place: Lincoln
Publisher: Library Philosophy and Practice
Nyelv English
Kivonat Web content is an important source for education and research. At present it is a mandatory requirement for higher learning institutes of India to present information on their institutional home page. Due to the dynamic nature of web content and the increased use of emerging technology, the new ways of presenting information on higher education web sites have become complex. In this paper, we study the changes in web content during the last decade in the first 20 NIRF-ranked institutes. The Internet Archive's Wayback Machine has been used to get the web site update dates and the content of archived web pages.
Archívum ProQuest One Academic; Publicly Available Content Database
Hozzáadás dátuma 2021. 08. 09. 8:44:39
Módosítás dátuma 2021. 08. 09. 8:44:39

Címkék:

  • Web archiving
  • World Wide Web
  • Digital archives
  • Library And Information Sciences
  • Internet
  • National libraries
  • Web sites
  • Students
  • Higher education
  • India
  • Information dissemination

Choosing a Sustainable Web Archiving Method: A Comparison of Capture Quality

Típus Folyóiratcikk
Szerző Gabriella Gray
Szerző Scott Martin
URL https://search.proquest.com/docview/1735638237?accountid=27464
Kötet 19
Szám 5-6
Kiadvány D-Lib Magazine
ISSN 1082-9873
Dátum 2013-05
Egyéb Number: 5-6
Publisher: Corporation for National Research Initiatives, Reston, VA
Place: UCLA gsgray@library.ucla.edu
DOI 10.1045/may2013-gray
Nyelv English
Kivonat The UCLA Online Campaign Literature Archive has been collecting websites from Los Angeles and California elections since 1998. Over the years the number of websites created for these campaigns has soared while the staff manually capturing the websites has remained constant. By 2012 it became apparent that we would need to find a more sustainable model if we were to continue to archive campaign websites. Our ideal goal was to find an automated tool that could match the high quality captures produced by the Archive's existing labor-intensive manual capture process. The tool we chose to investigate was the California Digital Library's Web Archiving Service (WAS). To test the quality of WAS captures we created a duplicate capture of the June 2012 California election using both WAS and our manual capture and editing processes. We then compared the results from the two captures to measure the relative quality of the two captures. This paper presents the results of our findings and contributes a unique empirical analysis of the quality of websites archived using two divergent web archiving methods and sets of tools.
Hozzáadás dátuma 2021. 08. 09. 8:42:16
Módosítás dátuma 2021. 08. 09. 8:42:16

Címkék:

  • Web archiving
  • Web sites
  • 9.15: TECHNICAL SERVICES – PRESERVATION
  • article
  • Methods
  • CDL Web Archiving Service
  • Comparisons
  • Politics
  • Quality
  • UCLA Online Campaign Literature Archive

Citizen Web Archiving: Empowering Undergraduates to Preserve the Internet

Típus Folyóiratcikk
Szerző Kayla Harris
Szerző Stephanie Shreffler
Szerző Christina A Beis
URL https://ecommons.udayton.edu/imri_faculty_presentations/24?utm_source=ecommons.udayton.edu%2Fimri_faculty_presentations%2F24&utm_medium=PDF&utm_campaign=PDFCoverPages
Szám 24
Oldalszám 2
Kiadvány Marian Library Faculty Presentations
Dátum 2021
Egyéb Number: 24
Hozzáférés 2021. 08. 06. 2:00:00
Könyvtár Katalógus Zotero
Nyelv en
Hozzáadás dátuma 2021. 08. 09. 8:43:56
Módosítás dátuma 2021. 08. 09. 8:43:56

Client-side Reconstruction of Composite Mementos Using Serviceworker

Típus Dolgozat
Szerző Sawood Alam
Szerző Mat Kelly
Szerző Michele C Weigle
Szerző Michael L Nelson
URL http://dl.acm.org/citation.cfm?id=3200334.3200361
Hely Piscataway, NJ, USA
Kiadó IEEE Press
Oldalszám 237-240
ISBN 978-1-5386-3861-3
Dátum 2017
Egyéb Series Title: JCDL '17
Citation Key: Alam:2017:CRC:3200334.3200361
Kivonat We use the ServiceWorker (SW) API to intercept HTTP requests for embedded resources and reconstruct Composite Mementos without the need for conventional URL rewriting typically performed by web archives. URL rewriting is a problem for archival replay systems, especially for URLs constructed by JavaScript, that frequently results in incorrect URI references. By intercepting requests on the client using SW, we are able to strategically reroute instead of rewrite. Our implementation moves rewriting to clients, saving servers’ computing resources and allowing servers to return responses more quickly. In our experiments, retrieving the original instead of rewritten pages from the archive resulted in a one-third reduction in time overhead and a one-fifth reduction in data overhead. Our system, reconstructive.js, prevents the live web from leaking into Composite Mementos while being easy to distribute and maintain.
Kiadvány címe Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:15
Módosítás dátuma 2021. 08. 09. 8:43:15

Címkék:

  • memento
  • archival replay
  • composite memento
  • serviceworker
  • web archive

Collaborative collection development: current perspectives leading to future initiatives

Típus Folyóiratcikk
Szerző Helen N. Levenson
Szerző Amanda Nichols Hess
URL https://www.sciencedirect.com/science/article/pii/S009913332030104X
Kötet 46
Szám 5
Oldalszám 102201
Kiadvány The Journal of Academic Librarianship
ISSN 0099-1333
Dátum September 1, 2020
Egyéb Number: 5
Folyóirat rövid neve The Journal of Academic Librarianship
DOI 10.1016/j.acalib.2020.102201
Hozzáférés 2021. 07. 15. 11:24:41
Könyvtár Katalógus ScienceDirect
Nyelv en
Kivonat As academic libraries continue to face acquisition budget challenges, collaborative collection development (CCD) offers greater opportunities to fulfill the core role of library collecting and collection management, namely, to provide enhanced access to the widest variety of relevant resources in the most cost-responsible manner possible. Libraries have successfully implemented CCD projects of various types, and as a result, have achieved these needed cost savings. The authors conducted survey research to investigate current CCD activities and librarians' perceptions of its benefits, drawbacks, elements contributing to successful CCD programs, and possible obstacles to success. Library collections consist of a variety of material formats and librarians have applied CCD models to maintain needed access to these resources, shifting from ownership to access, all in support of building collective collections. The survey results found that, although challenges can exist, application of CCD activities has realized substantial benefits, financial and otherwise, for academic libraries overall.
Rövid cím Collaborative collection development
Hozzáadás dátuma 2021. 08. 09. 8:44:24
Módosítás dátuma 2021. 08. 09. 8:44:24

Címkék:

  • Collaborative collection development
  • Collection management
  • Collective collections
  • Cooperative collection development
  • Coordinated collection development
  • Survey research

Collect, Preserve, Access: Applying the Governing Principles of the National Archives UK Government Web Archive to Social Media Content

Típus Folyóiratcikk
Szerző Suzy Espley
Szerző Florent Carpentier
Szerző Radu Pop
Szerző Leïla Medjkoune
URL https://search.proquest.com/docview/1623367977?accountid=27464
Kötet 25
Szám 1-2
Oldalszám 31-50
Kiadvány Alexandria: The Journal of National and International Library and Information Issues
ISSN 0955-7490
Dátum 2014-08
Egyéb Number: 1-2
PMID: 1623367977
Publisher: Sage Publications Ltd.
Place: London
DOI 10.7227/ALX.0019
Nyelv English
Kivonat It is The National Archives' responsibility to collect and secure the future of the public record in all its forms and to make it as accessible as possible. The UK Government Web Archive (UKGWA) effectively preserves the open digital record. This article will explore the challenges encountered, and the Application Programming Interface (API) based solutions developed, by The National Archives and the Internet Memory Foundation (IMF) in the completion of a pilot project to capture the record as it is published on the social media services Twitter and YouTube. An outline of the wider web archiving programme and its role within the management of the government web estate is provided. The legislative framework that guides web archiving at The National Archives is described as it has necessarily influenced the policy decisions that shaped the solutions developed. A brief overview of some comparative approaches taken by other organizations and commercial services to capturing Twitter content is also presented as context to the policy and technical solutions arrived at by the authors. The National Archives has sought to develop the building blocks of a collection whose growth can be sustained over time. The publication of this part of the archive will be followed by further evaluation and improvements to the initial approach taken.
Hozzáadás dátuma 2021. 08. 09. 8:42:02
Módosítás dátuma 2021. 08. 09. 8:42:02

Címkék:

  • web archives
  • social media
  • technology
  • Library And Information Sciences
  • government
  • public records

Collecting and preserving the Ukraine conflict (2014-2015): a web archive at University of California, Berkeley

Típus Folyóiratcikk
Szerző Liladhar R Pendse
URL https://search.proquest.com/docview/1829452180?accountid=27464
Kötet 35
Szám 3
Oldalszám 64-72
Kiadvány Collection Building
ISSN 01604953
Dátum 2016
Egyéb Number: 3
Publisher: Emerald Group Publishing Limited
Place: Bradford
Nyelv English
Kivonat Purpose: The purpose of this paper is to highlight web archiving as a tool for possible collection development in a research-level academic library. The paper highlights the web-archiving project that dealt with the contemporary Ukraine conflict. Currently, as the conflict in Ukraine drags on, the need for collecting and preserving the information from various web-based resources with different ideological orientations acquires a special importance. The demise of the Soviet Union in 1991 and the emergence of independent republics were heralded by some as a peaceful transition to the "free-market" style economies. This transition was nevertheless nuanced and not seamless. Besides the incomplete market liberalization and rent-seeking behaviors of different sorts, it was also accompanied by the almost ubiquitous use of and access to the internet and the internet communication technologies. Now 24 years later, the ongoing conflict in Ukraine also appears to be unfolding on the World Wide Web. With the Russian annexation of Crimea and its unification with the Russian Federation, the governmental and non-governmental websites of the Ukrainian Crimea suddenly came to represent a sort of "an endangered archive". Design/methodology/approach: The main purpose of this project was to make the information that is contained in Ukrainian and Russian websites available to the wider body of scholars and students over the longer period of time in a web archive. The author does not take any ideological stance on the legal status of Crimea or on the ongoing conflict in Ukraine. There are currently several projects that are devoted to the preservation of these websites. This article also focuses on providing a survey of the landscape of these projects and highlights the ongoing web-archiving project that is entitled "the Ukraine Crisis: 2014-2015" at the UC Berkeley Library.
Findings: The UC Berkeley's Ukraine Conflict Archive was made available to the public in March of 2015 after enough materials were archived. The initial purpose of the archive was to selectively harvest and archive those websites that are bound to either disappear or change significantly during the evolution of Crimea's accession to Russia. However, in the aftermath of the Crimean conflict, the ensuing military conflict in Ukraine forced a reevaluation of the web-archiving strategy. The project was never envisioned to be a competing project to the Ukraine Conflict project. Instead, it was supposed to capture complementary data that could have been missed by other similar projects. This web archive has been made public to provide a glimpse of what was happening and what is happening in Ukraine. Research limitations/implications: Now 24 years later, the ongoing conflict in Ukraine also appears to be unfolding on the World Wide Web. With the Russian annexation of Crimea and its unification with the Russian Federation, the governmental and non-governmental websites of the Ukrainian Crimea suddenly came to represent a sort of "an endangered archive". The impetus for archiving the selected Ukrainian websites came as a result of the changing geopolitical realities of Crimea. The daily changes to the websites and also loss of information that is contained within them is one of the many problems faced by the users of these websites. In some cases, the likelihood of these websites is relatively high. This in turn was followed by the author's desire to preserve the information about the daily lives in Ukraine's east in light of the unfolding violent armed conflict. Originality/value: Upon close survey of the Library and Information Sciences currently published articles on Ukraine Conflict, no articles that are currently dedicated to archiving the Crimean and Ukrainian situations were found.
Hozzáadás dátuma 2021. 08. 09. 8:41:42
Módosítás dátuma 2021. 08. 09. 8:41:42

Címkék:

  • Web archiving
  • Digital archives
  • Library And Information Sciences
  • Academic libraries
  • Web sites
  • Social networks
  • Annual reports
  • Crimea
  • Institutional repositories
  • Library and information science
  • Russia
  • Ukraine

Collecting Digital Content at the Library of Congress.

Típus Folyóiratcikk
Szerző LOC Library Services Collection Development Office
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 5
Szám 11
Oldalszám 2
Kiadvány Digital Publishing Report
Dátum 2017-03-20
Egyéb Number: 11
Publisher: Library of Congress
Kivonat In January 2017, the Library of Congress adopted a set of strategic steps related to its future acquisition of digital content. The purpose of this document is to provide background information and a high-level description of the strategy. The Library has been steadily increasing its digital collecting capacity and capability over the past two decades. This has come as the product of numerous independent efforts pointed to the same goal – acquire as much selected digital content as technically possible and make that content as broadly accessible to users as possible. In the past few years, much progress has been made, and an impressive amount of content has been acquired through several acquisitions methods. Further expansion of the Library’s digital collecting program is seen as an essential part of the institution’s strategic goal to: Acquire, preserve, and provide access to a universal collection of knowledge and the record of America’s creativity. The scope of the newly-adopted strategy is limited to actions directly involved with acquisitions and collecting. It does not cover other related actions that are essential to a successful digital collections program. These primarily include the following. • Further development of the Library’s technical infrastructure • Development of various access policies and procedures appropriate to different categories of digital content • Preservation of acquired digital content • Training and development of staff • Eventual realignment of resources to match an environment where a greater portion of the Library’s collection building program focuses on digital materials The strategy also does not cover digitization, which is the process by which the Library’s physical collections materials (printed text, images, sound on tangible formats, etc.) are converted into digital formats that can be stored and accessed via a computer.
Hozzáadás dátuma 2021. 08. 09. 8:43:03
Módosítás dátuma 2021. 08. 09. 8:43:03

Címkék:

  • WEB archiving
  • DATA transmission systems
  • LIBRARIES & publishing
  • LIBRARY acquisitions
  • LIBRARY of Congress

Collecting Pennsylvania Political Twitter Data

Típus Folyóiratcikk
Szerző Andrew M. Dudash
Szerző John E. Russell
URL http://palrap.pitt.edu/ojs/index.php/palrap/article/view/249
Jogok Copyright (c) 2021 Andrew M. Dudash, John E. Russell
Kötet 9
Szám 1
Oldalszám 4-7
Kiadvány Pennsylvania Libraries: Research & Practice
ISSN 2324-7878
Dátum 2021-06-29
Egyéb Number: 1
DOI 10.5195/palrap.2021.249
Hozzáférés 2021. 07. 15. 11:17:42
Könyvtár Katalógus palrap.pitt.edu
Nyelv en
Kivonat During the two most recent elections we have seen the importance of social media, and Twitter in particular, for political discourse. This paper describes the effort of an academic library to collect election-related Twitter data from Pennsylvania-specific organizational accounts and hashtags for 2018 and 2020 in the run-up and aftermath of both election cycles. Because of its importance to understanding contemporary politics and its historic value, libraries need to consider the opportunity to collect and make this data accessible to Pennsylvanians.
Hozzáadás dátuma 2021. 08. 09. 8:44:22
Módosítás dátuma 2021. 08. 09. 8:44:22

Collection & community building through web archiving: engaging with faculty and students in a collaborative web archiving project

Típus Jelentés
Szerző Andrea Schuler
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Hely United States, North America
Dátum 2017
Intézmény Digital USD
Kivonat Tisch Library at Tufts University has recently begun a pilot web archiving project, aiming to deepen Tufts’ collections in areas of strategic importance and support more “traditional” library collection development activities, while collecting material that is not known to be comprehensively collected by other institutions. Additionally, the project offers an opportunity for collaborative collection building with faculty and students that serves as a unique way to deepen our community’s engagement with the library. The initial pilot collection focuses on environmental justice, selected due to its relevance to the Tufts community and curriculum and to build on existing Tisch Library collection strengths. Two undergraduate courses related to environmental justice were identified and invited to partner in the pilot project. This partnership would leverage student research to expand the initial collection while introducing students to concepts of web archiving and information literacy around websites and providing them with the opportunity to contribute to shaping the scholarly record. Both courses added a brief assignment to their syllabus: while doing research on their chosen topics, students would identify 3-7 web sites they felt would benefit from preservation and submit the sites to the library, to be evaluated and added to the web archive as appropriate. This presentation discusses the process of beginning a subject-based web archiving project, focusing on the collaborative project with two undergraduate classes. It addresses decisions made when starting and scoping the project; collection development issues; the logistics, benefits, and outcomes of the student and faculty collaboration; and future directions.
Hozzáadás dátuma 2021. 08. 09. 8:42:57
Módosítás dátuma 2021. 08. 09. 8:42:57

Címkék:

  • web archiving
  • collaboration
  • outreach
  • digital collections
  • collection development
  • Library and Information Science
  • undergraduates

Collection plan for online materials 2021-2024

Típus Jelentés
Szerző Jari Heikkinen
Szerző Kaisa Kaunonen
Szerző Erik Lindholm
Szerző Mikko Merioksa
Szerző Matti Pitkälä
Szerző Aija Vahtola
Szerző Petteri Veikkolainen
URL https://www.doria.fi/bitstream/handle/10024/180970/Collection%20plan%20for%20online%20materials%202021%E2%80%932024.pdf?sequence=1
Hely Helsinki
Oldalszám 8
Dátum 2021
Hozzáférés 2021. 08. 06. 2:00:00
Intézmény National Library of Finland
Könyvtár Katalógus Zotero
Nyelv en
Hozzáadás dátuma 2021. 08. 09. 8:44:21
Módosítás dátuma 2021. 08. 09. 8:44:21

Community History in Minnesota during a Pandemic: What Comes Next?

Típus Folyóiratcikk
Szerző Adam Smith
Szerző Daardi Sizemore Mixon
Szerző Rebecca Ebnet Desens
Szerző Jenna Jacobs
URL https://www.iastatedigitalpress.com/macmeetings/article/id/12571/
Kötet 2021
Szám 1
Kiadvány MAC Annual Meeting Presentations
Dátum 2021-05-14 03:00
Egyéb Number: 1
Publisher: Iowa State University Digital Press
Hozzáférés 2021. 07. 15. 10:41:25
Könyvtár Katalógus www.iastatedigitalpress.com
Nyelv en
Kivonat Three Minnesota cultural heritage organizations developed distinctly different community history projects to document the COVID-19 Pandemic. Anoka County Historical Society distributed monthly surveys asking questions relevant to the community at the time while encouraging the public to submit documentation for the archives. Hennepin County Library rapidly expanded its nascent web archiving program to capture websites of Minneapolis and suburban community organizations affected by and responding to the pandemic. Minnesota State University, Mankato developed a community history project that incorporated the international student experience to explore how our students and their families responded to the pandemic throughout the summer.
This presentation will discuss the logistics of how they organized and conducted their community history projects and next steps for those collections. Presenters will discuss processing primarily born digital materials and making the collections available for researchers while navigating privacy issues to protect contributors. Each of these projects has spawned innovative thinking along with contributing to new directions and partnerships for the organizations including an emphasis on social justice initiatives.
Rövid cím Community History in Minnesota during a Pandemic
Hozzáadás dátuma 2021. 08. 09. 8:44:11
Módosítás dátuma 2021. 08. 09. 8:44:11

Community, tools, and practices in web archiving: The state-of-the-art in relation to social science and humanities research needs

Típus Folyóiratcikk
Szerző Meghan Dougherty
Szerző Eric T Meyer
URL https://search.proquest.com/docview/1700661485?accountid=27464
Kötet 65
Szám 11
Oldalszám 2195-2209
Kiadvány Journal of the Association for Information Science and Technology
ISSN 2330-1635
Dátum 2014-11
Egyéb Number: 11
Publisher: Wiley Subscription Services, Hoboken NJ
Place: Loyola University Chicago, School of Communication, 820 N. Michigan Ave, Chicago, IL, 60611.
DOI 10.1002/asi.23099
Nyelv en
Kivonat The web encourages the constant creation and distribution of large amounts of information; it is also a valuable resource for understanding human behavior and communication. To take full advantage of the web as a research resource that extends beyond the consideration of snapshots of the present, however, it is necessary to begin to take web archiving much more seriously as an important element of any research program involving web resources. The ephemeral character of the web requires that researchers take proactive steps in the present to enable future analysis. Efforts to archive the web or portions thereof have been developed around the world, but these efforts have not yet provided reliable and scalable solutions. This article summarizes the current state of web archiving in relation to researchers and research needs. Interviews with researchers, archivists, and technologists identify the differences in purpose, scope, and scale of current web archiving practice, and the professional tensions that arise given these differences. Findings outline the challenges that still face researchers who wish to engage seriously with web content as an object of research, and archivists who must strike a balance reflecting a range of user needs. [Copyright Wiley Periodicals Inc.]
Hozzáadás dátuma 2021. 08. 09. 8:43:05
Módosítás dátuma 2021. 08. 09. 8:43:05

Címkék:

  • Web archiving
  • Digital preservation
  • Research
  • 9.15: TECHNICAL SERVICES – PRESERVATION
  • article

Compact Full-text Indexing of Versioned Document Collections

Típus Dolgozat
Szerző Jinru He
Szerző Hao Yan
Szerző Torsten Suel
URL http://doi.acm.org/10.1145/1645953.1646008
Hely New York, NY, USA
Kiadó ACM
Oldalszám 415-424
ISBN 978-1-60558-512-3
Dátum 2009
Egyéb Series Title: CIKM '09
Citation Key: He:2009:CFI:1645953.1646008
DOI 10.1145/1645953.1646008
Kivonat We study the problem of creating highly compressed full-text index structures for versioned document collections, that is, collections that contain multiple versions of each document. Important examples of such collections are Wikipedia or the web page archive maintained by the Internet Archive. A straightforward indexing approach would simply treat each document version as a separate document, such that index size scales linearly with the number of versions. However, several authors have recently studied approaches that exploit the significant similarities between different versions of the same document to obtain much smaller index sizes. In this paper, we propose new techniques for organizing and compressing inverted index structures for such collections. We also perform a detailed experimental comparison of new techniques and the existing techniques in the literature. Our results on an archive of the English version of Wikipedia, and on a subset of the Internet Archive collection, show significant benefits over previous approaches.
Kiadvány címe Proceedings of the 18th ACM Conference on Information and Knowledge Management
Hozzáadás dátuma 2021. 08. 09. 8:43:18
Módosítás dátuma 2021. 08. 09. 8:43:18

Címkék:

  • web archives
  • wikipedia
  • inverted index
  • versioned documents
  • inverted index compression
  • search engines

Comparing the Archival Rate of Arabic, English, Danish, and Korean Language Web Pages

Típus Folyóiratcikk
Szerző Lulwah M Alkwai
Szerző Michael L Nelson
Szerző Michele C Weigle
URL http://doi.acm.org/10.1145/3041656
Kötet 36
Szám 1
Oldalszám 1:1–1:34
Kiadvány ACM Trans. Inf. Syst.
ISSN 1046-8188
Dátum 2017
Egyéb Number: 1
Publisher: ACM
Citation Key: Alkwai:2017:CAR:3077622.3041656
Place: New York, NY, USA
DOI 10.1145/3041656
Kivonat It has long been suspected that web archives and search engines favor Western and English language webpages. In this article, we quantitatively explore how well indexed and archived Arabic language webpages are as compared to those from other languages. We began by sampling 15,092 unique URIs from three different website directories: DMOZ (multilingual), Raddadi, and Star28 (the last two primarily Arabic language). Using language identification tools, we eliminated pages not in the Arabic language (e.g., English-language versions of Aljazeera pages) and culled the collection to 7,976 Arabic language webpages. We then used these 7,976 pages and crawled the live web and web archives to produce a collection of 300,646 Arabic language pages. We compared the analysis of Arabic language pages with that of English, Danish, and Korean language pages. First, for each language, we sampled unique URIs from DMOZ; then, using language identification tools, we kept only pages in the desired language. Finally, we crawled the archived and live web to collect a larger sample of pages in English, Danish, or Korean. In total for the four languages, we analyzed over 500,000 webpages. We discovered: (1) English has a higher archiving rate than Arabic, with 72.04% archived. However, Arabic has a higher archiving rate than Danish and Korean, with 53.36% of Arabic URIs archived, followed by Danish and Korean with 35.89% and 32.81% archived, respectively. (2) Most Arabic and English language pages are located in the United States; only 14.84% of the Arabic URIs had an Arabic country code top-level domain (e.g., sa) and only 10.53% had a GeoIP in an Arabic country. Most Danish-language pages were located in Denmark, and most Korean-language pages were located in South Korea. (3) The presence of a webpage in a directory positively impacts indexing and presence in the DMOZ directory, specifically, positively impacts archiving in all four languages. 
In this work, we show that web archives and search engines favor English pages. However, it is not universally true for all Western-language webpages because, in this work, we show that Arabic webpages have a higher archival rate than Danish language webpages.
Hozzáadás dátuma 2021. 08. 09. 8:43:18
Módosítás dátuma 2021. 08. 09. 8:43:18

Címkék:

  • Web archiving
  • digital preservation
  • Arabic web
  • Danish web
  • English web
  • indexing
  • Korean web

Comparing Topic Coverage in Breadth-First and Depth-First Crawls Using Anchor Texts

Típus Folyóiratcikk
Szerző Thaer Samar
Szerző Myriam C Traub
Szerző Jacco van Ossenbruggen
Szerző Arjen P de Vries
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Oldalszám 133
Kiadvány Research & Advanced Technology for Digital Libraries: 20th International Conference on Theory & Practice of Digital Libraries, TPDL 2016, Hannover, Germany, September 5-9, 2016, Proceedings
ISBN 978-3-319-43996-9
Dátum 2016-01
Hozzáadás dátuma 2021. 08. 09. 8:41:54
Módosítás dátuma 2021. 08. 09. 8:41:54

Comparison of Web Services for Sentiment Analysis in Social Networking Sites

Típus Folyóiratcikk
Szerző Ain Balqis Md Nor Basmmi
Szerző Shahliza Abd Halim
Szerző Nor Azizah Saadon
URL https://doi.org/10.1088/1757-899x/884/1/012063
Kötet 884
Oldalszám 012063
Kiadvány IOP Conference Series: Materials Science and Engineering
ISSN 1757-899X
Dátum 2020-07
Egyéb Publisher: IOP Publishing
Folyóirat rövid neve IOP Conf. Ser.: Mater. Sci. Eng.
DOI 10.1088/1757-899X/884/1/012063
Hozzáférés 2021. 07. 15. 11:27:21
Könyvtár Katalógus Institute of Physics
Nyelv en
Kivonat With various type of web services available, it is hard to identify and compare which of the free access web services work best in analysing sentiment of extremist content in social networking sites. For that purpose, a generic approach by working with API of web service using PHP programming language is used to test each dataset that was extracted based on the keyword ‘extremism’. Data from both Twitter and Facebook has been used as these two are the most powerful platforms for expressing one’s feeling. The comparison for web service is done based on the analysis of its accuracy, precision, recall and f-measures in obtaining the lowest score of mean square error (MSE). Four sentiment analysis web services are used which are Sentiment Analyzer, Aylien, ParallelDots, and MonkeyLearn. From the comparison, MonkeyLearn obtained the best final results among all web services with the lowest MSE score of 14%. For the benefit of other researchers, the finding of this will reveal the suitable web service for analysing sentiment issues as critical as extremism.
Hozzáadás dátuma 2021. 08. 09. 8:44:26
Módosítás dátuma 2021. 08. 09. 8:44:26

Constituer un réseau d’accès aux archives de l’internet : l’exemple français

Típus Dolgozat
Szerző Ange Aniesa
Szerző Ariane Bouchard
URL http://library.ifla.org/1655/
Dátum 2017
Hozzáférés 2017. 06. 26. 2:00:00
Kivonat Since 2006, the BnF has been charged with collecting the French internet under legal deposit. To fulfil this mission as well as possible, it has gradually put in place a complete archiving system and has thereby collected billions of web pages. On the basis of the implementing decree of the DADVSI law, the BnF has sought to make its internet archive collections, originally consultable only in its research reading rooms, accessible in other institutions in the regions. This article presents the successive stages of opening up this access: the accreditation of the printer legal deposit libraries; the organisational and technical problems encountered and the solutions adopted; and the issues at the current stage of the project, now that sixteen institutions are already equipped with a service providing access to the internet archives.
Kiadvány címe IFLA Congress 2017, Wroclaw, Poland
Hozzáadás dátuma 2021. 08. 09. 8:41:44
Módosítás dátuma 2021. 08. 09. 8:41:44

Content Selection and Curation for Web Archiving

Típus Dolgozat
Szerző Ian Milligan
Szerző Nick Ruest
Szerző Jimmy Lin
URL http://dl.acm.org/citation.cfm?doid=2910896.2910913
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 107-110
ISBN 978-1-4503-4229-2
Dátum 2016
DOI 10.1145/2910896.2910913
Kiadvány címe Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries – JCDL '16
Hozzáadás dátuma 2021. 08. 09. 8:41:50
Módosítás dátuma 2021. 08. 09. 8:41:50

Copyright and Preservation of Born-digital Materials: Persistent Challenges and Selected Strategies

Típus Folyóiratcikk
Szerző Katherine Fisher
URL https://doi.org/10.17723/0360-9081-83.2.238
Kötet 83
Szám 2
Oldalszám 238-267
Kiadvány The American Archivist
ISSN 0360-9081
Dátum March 8, 2021
Egyéb Number: 2
Folyóirat rövid neve The American Archivist
DOI 10.17723/0360-9081-83.2.238
Hozzáférés 2021. 07. 15. 11:37:23
Könyvtár Katalógus Silverchair
Kivonat This article surveys and analyzes archival literature and legal resources (primarily United States–focused) related to copyright considerations that archivists and other content managers must be aware of to effectively and legally maintain a collection of born-digital materials. These considerations include the centrality of copying to preservation actions, shifting definitions of ownership, unclear distinctions between published and unpublished content, digital rights management laws and technologies, and the layered copyrights that can exist in complex digital objects and their dependencies. Strategies for dealing with these challenges include securing rights ahead of time, adopting legal rationales related to orphan works and fair use, adapting practices from specialized digital preservation subfields, ensuring routine procedures adequately address copyright-related recordkeeping and risk management, and advocating for preservation-enabling copyright reforms. An examination of these issues and strategies in the context of current thinking about copyright suggests that while certain legal exceptions and existing rights frameworks can help to facilitate digital preservation activities, copyright will continue to be a barrier until significant reforms are enacted.
Rövid cím Copyright and Preservation of Born-digital Materials
Hozzáadás dátuma 2021. 08. 09. 8:44:28
Módosítás dátuma 2021. 08. 09. 8:44:28

Copyright Challenges of Legal Deposit and Web Archiving in the National Library of Singapore

Típus Folyóiratcikk
Szerző Jhonny Antonio Pabón Cadavid
URL https://doi.org/10.7227/ALX.0017
Kötet 25
Szám 1/2
Oldalszám 1-19
Kiadvány Alexandria
ISSN 09557490
Dátum 2014-03
Egyéb Number: 1/2
PMID: 1623365662
Publisher: Sage Publications Ltd.
Place: London
Nyelv en
Kivonat This article discusses the development of web archiving in Singapore and its relationship to copyright law. The author describes legal deposit, its definition and historical development, the differences between voluntary and compulsory legal deposit, and the practices of such approaches within the National Library of Singapore. It highlights two main projects, the Singapore Memory Project and Web Archive Singapore (WAS). The paper analyses how the implementation of legal deposit for preserving web material creates a complex relationship between copyright and digital heritage, and describes difficulties that cover the information lifecycle of web archiving. Finally, the paper presents a set of conclusions and recommendations regarding the need for modifying copyright legislation to foster research activities within Singapore's knowledge economy.
Hozzáadás dátuma 2021. 08. 09. 8:43:33
Módosítás dátuma 2021. 08. 09. 8:43:33

Címkék:

  • Web archiving
  • web archives
  • legal deposit
  • Library And Information Sciences
  • Legal deposit of books, etc.
  • copyright
  • Copyright of digital media
  • national libraries
  • National Library (Singapore)
  • Singapore

Copyright in the networked world: digital legal deposit

Típus Folyóiratcikk
Szerző Michael Seadle
URL http://www.emeraldinsight.com/doi/10.1108/EUM0000000005893
Kötet 19
Szám 3
Oldalszám 299-303
Kiadvány Library Hi Tech
ISSN 0737-8831
Dátum 2001-09
Egyéb Number: 3
DOI 10.1108/EUM0000000005893
Kivonat Legal deposit is the requirement that particular types of material be deposited with a national library or designated research libraries. US law does not at present include any requirement for the deposit of works that exist solely in the form of Web pages. For digital materials, it makes no sense to write rules for legal deposit based on the medium. Nations and national libraries that ignore legal deposit for digital works will find themselves missing a significant and unrecoverable portion of their cultural heritage.
Hozzáadás dátuma 2021. 08. 09. 8:42:47
Módosítás dátuma 2021. 08. 09. 8:42:47

Címkék:

  • copyright
  • publishing

Correspondence as the Primary Measure of Quality for Web Archives: A Grounded Theory Study

Típus Dolgozat
Szerző Brenda Reyes Ayala
Szerkesztő Mark Hall
Szerkesztő Tanja Merčun
Szerkesztő Thomas Risse
Szerkesztő Fabien Duchateau
Sorozat Lecture Notes in Computer Science
Hely Cham
Kiadó Springer International Publishing
Oldalszám 73-86
ISBN 978-3-030-54956-5
Dátum 2020
DOI 10.1007/978-3-030-54956-5_6
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat Creating an archived website that is as close as possible to the original, live website remains one of the most difficult challenges in the field of web archiving. Failing to adequately capture a website might mean an incomplete historical record or, worse, no evidence that the site ever even existed. This paper presents a grounded theory of quality for web archives created using data from web archivists. In order to achieve this, I analysed support tickets submitted by clients of the Internet Archive’s Archive-It (AIT), a subscription-based web archiving service that helps organisations build and manage their own web archives. Overall, 305 tickets were analysed, comprising 2544 interactions. The resulting theory is comprised of three dimensions of quality in a web archive: correspondence, relevance, and archivability. The dimension of correspondence, defined as the degree of similarity or resemblance between the original website and the archived website, is the most important facet of quality in web archives, and it is the main focus of this work. This paper’s contribution is that it presents the first theory created specifically for web archives and lays the groundwork for future theoretical developments in the field. Furthermore, the theory is human-centred and grounded in how users and creators of web archives perceive their quality. By clarifying the notion of quality in a web archive, this research will be of benefit to web archivists and cultural heritage institutions.
Kiadvány címe Digital Libraries for Open Knowledge
Rövid cím Correspondence as the Primary Measure of Quality for Web Archives
Hozzáadás dátuma 2021. 08. 09. 8:44:04
Módosítás dátuma 2021. 08. 09. 8:44:04

Címkék:

  • Web archiving
  • Grounded theory
  • Information quality
  • Quality Assurance

Correspondence as the Primary Measure of Quality for Web Archives: A Grounded Theory Study

Típus Weboldal
Szerző Brenda Reyes Ayala
URL https://era.library.ualberta.ca/items/b45b9bf6-424d-4052-85a2-3517c5512cd8
Dátum 2020-06-02
Egyéb DOI: 10.7939/r3-btx5-0s86
Hozzáférés 2021. 07. 15. 10:13:28
Nyelv en
Kivonat Creating an archived website that is as close as possible to the original, live website remains one of the most difficult challenges in…
Website címe ERA
Rövid cím Correspondence as the Primary Measure of Quality for Web Archives
Hozzáadás dátuma 2021. 08. 09. 8:44:04
Módosítás dátuma 2021. 08. 09. 8:44:04

Counter-archiving Facebook

Típus Folyóiratcikk
Szerző Anat Ben-David
Kötet 35
Szám 3
Oldalszám 249-264
Kiadvány European Journal of Communication
ISSN 02673231
Dátum June 2020
Egyéb Number: 3
Folyóirat rövid neve European Journal of Communication
DOI 10.1177/0267323120922069
Könyvtár Katalógus EBSCOhost
Kivonat The article proposes archival thinking as an analytical framework for studying Facebook. Following recent debates on data colonialism, it argues that Facebook dialectically assumes a role of a new archon of public records, while being unarchivable by design. It then puts forward counter-archiving – a practice developed to resist the epistemic hegemony of colonial archives – as a method that allows the critical study of the social media platform, after it had shut down researcher's access to public data through its application programming interface. After defining and justifying counter-archiving as a method for studying datafied platforms, two counter-archives are presented as proof of concept. The article concludes by discussing the shifting boundaries between the archivist, the activist and the scholar, as the imperative of research methods after datafication.
Hozzáadás dátuma 2021. 08. 09. 8:43:44
Módosítás dátuma 2021. 08. 09. 8:43:44

Címkék:

  • archive
  • SOCIAL media
  • APPLICATION program interfaces
  • Application programming interface
  • datafication
  • Facebook
  • FACEBOOK (Web resource)
  • methods
  • POLITICAL advertising
  • PUBLIC records

Counting the uncountable: statistics for web archives

Típus Folyóiratcikk
Szerző Clement Oury
Szerző Roswitha Poll
URL https://search.proquest.com/docview/1399615625?accountid=27464
Kötet 14
Szám 2
Oldalszám 132-141
Kiadvány Performance Measurement and Metrics
ISSN 14678047
Dátum 2013
Egyéb Number: 2
ProQuest document ID: 1399615625
Publisher: Emerald Group Publishing Limited
Place: Bradford
DOI 10.1108/PMM-05-2013-0014
Nyelv en
Kivonat Purpose – The purpose of this paper is to describe the aims and contents of the ISO Report ISO/TR 14873. Design/methodology/approach – For more than a decade, libraries have started to "collect the web". National libraries in particular select, collect and store publications and websites from their national domain, seeing this as a task similar to traditional legal deposit. The collection policies and collecting methods vary, so that it is difficult to compare the quantity and quality of the respective web archives. Findings – In order to harmonize the evaluation of web archives, ISO TC 46 SC 8 has produced a Technical Report that standardizes the terminology and statistics and offers tested indicators for assessing the quality of web archiving. Originality/value – This paper describes the shortly to be published ISO/TR 14873, a potentially vital guide to harmonize web archive collection internationally.
Hozzáadás dátuma 2021. 08. 09. 8:42:01
Módosítás dátuma 2021. 08. 09. 8:42:01

Címkék:

  • Library And Information Sciences
  • Archives & records
  • Library collections
  • Internet resources
  • Web sites
  • Software
  • Quality standards
  • Statistics

Creating a billion-scale searchable web archive

Típus Dolgozat
Szerző Daniel Gomes
Szerző Miguel Costa
Szerző David Cruz
Szerző João Miranda
Szerző Simão Fontes
URL http://dl.acm.org/citation.cfm?doid=2487788.2488118
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 1059-1066
ISBN 978-1-4503-2038-2
Dátum 2013
DOI 10.1145/2487788.2488118
Kivonat Web information is ephemeral. Several organizations around the world are struggling to archive information from the web before it vanishes. However, users demand efficient and effective search mechanisms to access the already vast collections of historical information held by web archives. The Portuguese Web Archive is the largest full-text searchable web archive publicly available. It supports search over 1.2 billion files archived from the web since 1996. This study contributes with an overview of the lessons learned while developing the Portuguese Web Archive, focusing on web data acquisition, ranking search results and user interface design. The developed software is freely available as an open source project. We believe that sharing our experience obtained while developing and operating a running service will enable other organizations to start or improve their web archives.
Kiadvány címe Proceedings of the 22nd International Conference on World Wide Web – WWW '13 Companion
Hozzáadás dátuma 2021. 08. 09. 8:43:22
Módosítás dátuma 2021. 08. 09. 8:43:22

Címkék:

  • Web
  • Preservation
  • Archive
  • Portuguese Web Archive
  • Temporal Search
  • Search

Creating and Consuming Metadata from Transcribed Historical Vital Records for Ingestion in a Long-Term Digital Preservation Platform

Típus Könyvfejezet
Szerző Dolores Grant
Szerző Christophe Debruyne
Szerző Rebecca Grant
Szerző Sandra Collins
URL http://link.springer.com/10.1007/978-3-319-26138-6_47
Oldalszám 445-450
ISBN 978-3-319-26138-6
Dátum 2015
Egyéb DOI: 10.1007/978-3-319-26138-6_47
Kivonat In the Irish Record Linkage 1864-1913 (IRL) project, digital archivists transcribe digitized register pages containing vital records into a database, which is then used to generate RDF triples. Historians then use those triples to answer some specific research questions on the IRL platform. Though the triples themselves are a highly valuable asset that can be adopted by many, the digitized records and their RDF representations need to be adequately stored and preserved according to best standards and guidelines to ensure those do not get lost over time. This was a problem currently not investigated within this project. This paper reports on the creation of Qualified Dublin Core from those triples for ingestion with the digitized register pages in an adequate long-term digital preservation platform and repository. Rather than creating RDF only for the purpose of this project, we demonstrate how we can distill artifacts from the RDF that is fit for discovery, access, and even reuse via that repository and how we elicit and conserve the knowledge and memories about Ireland, its history and culture contained in those register pages.
Könyv címe Confederated International Workshops: OTM Academy, OTM Industry Case Studies Program, EI2N, FBM, INBAST, ISDE, META4eS, and MSC 2015 Rhodes, Greece, October 26–30, 2015, Proceedings
Hozzáadás dátuma 2021. 08. 09. 8:41:49
Módosítás dátuma 2021. 08. 09. 8:41:49

Címkék:

  • Linked data
  • Metadata
  • Mapping
  • Vital records

Creating Event-Centric Collections from Web Archives

Típus Könyvfejezet
Szerző Elena Demidova
Szerző Thomas Risse
Szerkesztő Daniel Gomes
Szerkesztő Elena Demidova
Szerkesztő Jane Winters
Szerkesztő Thomas Risse
URL https://doi.org/10.1007/978-3-030-63291-5_6
Hely Cham
Kiadó Springer International Publishing
Oldalszám 57-67
ISBN 978-3-030-63291-5
Dátum 2021
Egyéb DOI: 10.1007/978-3-030-63291-5_6
Hozzáférés 2021. 07. 15. 9:52:26
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat Web archives are an essential information source for research on historical events. However, the large scale and heterogeneity of web archives make it difficult for researchers to access relevant event-specific materials. In this chapter, we discuss methods for creating event-centric collections from large-scale web archives. These methods are manifold and may require manual curation, adopt search or deploy focused crawling. In this chapter, we focus on the crawl-based methods that identify relevant documents in and across web archives and include link networks as context in the resulting collections.
Könyv címe The Past Web: Exploring Web Archives
Hozzáadás dátuma 2021. 08. 09. 8:43:58
Módosítás dátuma 2021. 08. 09. 8:43:58

Critical Web Archive Research

Típus Könyvfejezet
Szerző Anat Ben-David
Szerkesztő Daniel Gomes
Szerkesztő Elena Demidova
Szerkesztő Jane Winters
Szerkesztő Thomas Risse
URL https://doi.org/10.1007/978-3-030-63291-5_14
Hely Cham
Kiadó Springer International Publishing
Oldalszám 181-188
ISBN 978-3-030-63291-5
Dátum 2021
Egyéb DOI: 10.1007/978-3-030-63291-5_14
Hozzáférés 2021. 07. 15. 9:52:26
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat Following the familiar distinction between software and hardware, this chapter argues that web archives deserve to be treated as a third category—memoryware: specific forms of preservation techniques which involve both software and hardware, but also crawlers, bots, curators, and users. While historically the term memoryware refers to the art of cementing together bits and pieces of sentimental objects to commemorate loved ones, understanding web archives as a complex socio-technical memoryware moves beyond their perception as bits and pieces of the live Web. Instead, understanding web archives as memoryware hints at the premise of the web’s exceptionalism in media and communication history and calls for revisiting some of the concepts and best practices in web archiving and web archive research that have consolidated over the years. The chapter, therefore, presents new challenges for web archive research by turning a critical eye on web archiving itself and on the specific types of histories that are constructed with web archives.
Könyv címe The Past Web: Exploring Web Archives
Hozzáadás dátuma 2021. 08. 09. 8:43:59
Módosítás dátuma 2021. 08. 09. 8:43:59

Crook, Edgar: Webarchiválás a webkettes világban

Típus Folyóiratcikk
Szerző Drótos László
Szerző Edgar Crook
Kötet 57
Szám 2
Oldalszám 78-81
Kiadvány Tudományos és műszaki tájékoztatás
ISSN 0041-3917
Dátum 2010
Egyéb Number: 2
Kivonat The National Library of Australia has played a leading role in collecting and preserving the Australian web since 1996, when the PANDORA archive (pandora.nla.gov.au) was established. There are also other, narrower projects, such as Tasmania's Our Digital Island (odi.statelibrary.tas.gov.au) and Territory Stories (territorystories.nt.gov.au), which operates in the Northern Territory. The national library now archives in three ways: it selectively chooses online resources for the PANDORA collection, it harvests the entire .au domain with the help of the Internet Archive, and it has also begun using the Archive-It service. It can therefore be said that a significant part of Australian online content is being saved for the future in this way. Because of technological change, however, the library must adapt continuously: developing its archiving tools, broadening the range of content it collects, and allying with new partners so that it can carry on this important work effectively.
Hozzáadás dátuma 2021. 08. 09. 8:43:28
Módosítás dátuma 2021. 08. 09. 8:43:28

Címkék:

  • webarchiválás
  • Ausztrália

Cross-lingual Web Spam Classification

Típus Dolgozat
Szerző András Garzó
Szerző Bálint Daróczy
Szerző Tamás Kiss
Szerző Dávid Siklósi
Szerző András A Benczúr
URL http://doi.acm.org/10.1145/2487788.2488139
Hely New York, NY, USA
Kiadó ACM
Oldalszám 1149-1156
ISBN 978-1-4503-2038-2
Dátum 2013
Egyéb Series Title: WWW '13 Companion
Citation Key: Garzo:2013:CWS:2487788.2488139
DOI 10.1145/2487788.2488139
Kivonat While Web spam training data exists in English, we face an expensive human labeling procedure if we want to filter a Web domain in a different language. In this paper we overview how existing content and link based classification techniques work, how models can be "translated" from English into another language, and how language-dependent and independent methods combine. In particular we show that simple bag-of-words translation works very well and in this procedure we may also rely on mixed language Web hosts, i.e. those that contain an English translation of part of the local language text. Our experiments are conducted on the ClueWeb09 corpus as the training English collection and a large Portuguese crawl of the Portuguese Web Archive. To foster further research, we provide labels and precomputed values of term frequencies, content and link based features for both ClueWeb09 and the Portuguese data.
Kiadvány címe Proceedings of the 22nd International Conference on World Wide Web
Hozzáadás dátuma 2021. 08. 09. 8:43:10
Módosítás dátuma 2021. 08. 09. 8:43:10

Címkék:

  • content analysis
  • cross-lingual text processing
  • link analysis
  • web classification
  • web spam

Cuéntalo: the path between archival activism and the social archive(s)

Típus Folyóiratcikk
Szerző Vicenç Ruiz Gómez
Szerző Aniol Maria Vallès
URL https://www.tandfonline.com/doi/full/10.1080/01576895.2020.1802306
Kötet 48
Szám 3
Oldalszám 271-290
Kiadvány Archives and Manuscripts
ISSN 0157-6895, 2164-6058
Dátum 2020-09-01
Egyéb Number: 3
Folyóirat rövid neve Archives and Manuscripts
DOI 10.1080/01576895.2020.1802306
Hozzáférés 2021. 07. 15. 11:02:37
Könyvtár Katalógus DOI.org (Crossref)
Nyelv en
Rövid cím #Cuéntalo
Hozzáadás dátuma 2021. 08. 09. 8:44:17
Módosítás dátuma 2021. 08. 09. 8:44:17

Current research on theory and practice of digital libraries: best papers from TPDL 2017

Típus Folyóiratcikk
Szerző Giannis Tsakonas
Szerző Jaap Kamps
URL http://link.springer.com/10.1007/s00799-020-00278-4
Kötet 21
Szám 1
Oldalszám 1-3
Kiadvány International Journal on Digital Libraries
ISSN 1432-5012, 1432-1300
Dátum 03/2020
Egyéb Number: 1
Folyóirat rövid neve Int J Digit Libr
DOI 10.1007/s00799-020-00278-4
Hozzáférés 2021. 07. 15. 11:26:02
Könyvtár Katalógus DOI.org (Crossref)
Nyelv en
Kivonat This volume presents a special issue on the 2017 edition of the Theory and Practice of Digital Libraries (TPDL) conference, held in Thessaloniki, Greece. We provide a brief overview of TPDL 2017 and introduce the selected papers that make up the rest of this volume. The papers cover different aspects of current digital library research, highlighting the important and multidisciplinary nature of the field.
Rövid cím Current research on theory and practice of digital libraries
Hozzáadás dátuma 2021. 08. 09. 8:44:25
Módosítás dátuma 2021. 08. 09. 8:44:25

Data Management of Web Archive Research Data

Típus Dolgozat
Szerző Bolette Jurik
Szerző Eld Zierau
Dátum 2017
Kivonat This paper will provide recommendations to overcome various challenges for data management of web materials. The recommendations are based on results from two independent Danish research projects with different requirements to data management: The first project focuses on high precision on a par with traditional references for analogue material and with web materials found in different web archives. The second project focuses on large corpora (collections) of archived web references as basis for analysis.
Kiadvány címe “Researchers, practitioners and their use of the archived web”, London, School of Advanced Study, University of London
Hozzáadás dátuma 2021. 08. 09. 8:41:47
Módosítás dátuma 2021. 08. 09. 8:41:47

Data Quality in Web Archiving

Típus Dolgozat
Szerző Marc Spaniol
Szerző Dimitar Denev
Szerző Arturas Mazeika
Szerző Gerhard Weikum
Szerző Pierre Senellart
URL http://doi.acm.org/10.1145/1526993.1526999
Hely New York, NY, USA
Kiadó ACM
Oldalszám 19-26
ISBN 978-1-60558-488-1
Dátum 2009
Egyéb Series Title: WICOW '09
Citation Key: Spaniol:2009:DQW:1526993.1526999
DOI 10.1145/1526993.1526999
Kivonat Web archives preserve the history of Web sites and have high long-term value for media and business analysts. Such archives are maintained by periodically re-crawling entire Web sites of interest. From an archivist's point of view, the ideal case to ensure highest possible data quality of the archive would be to "freeze" the complete contents of an entire Web site during the time span of crawling and capturing the site. Of course, this is practically infeasible. To comply with the politeness specification of a Web site, the crawler needs to pause between subsequent http requests in order to avoid unduly high load on the site's http server. As a consequence, capturing a large Web site may span hours or even days, which increases the risk that contents collected so far are incoherent with the parts that are still to be crawled. This paper introduces a model for identifying coherent sections of an archive and, thus, measuring the data quality in Web archiving. Additionally, we present a crawling strategy that aims to ensure archive coherence by minimizing the diffusion of Web site captures. Preliminary experiments demonstrate the usefulness of the model and the effectiveness of the strategy.
Kiadvány címe Proceedings of the 3rd Workshop on Information Credibility on the Web
Hozzáadás dátuma 2021. 08. 09. 8:43:20
Módosítás dátuma 2021. 08. 09. 8:43:20

Címkék:

  • web archiving
  • data quality
  • temporal coherence

Demonstrating intelligent crawling and archiving of web applications

Típus Dolgozat
Szerző Muhammad Faheem
Szerző Pierre Senellart
URL http://doi.acm.org/10.1145/2505515.2508197
Hely New York, NY, USA
Kiadó ACM
Oldalszám 2481-2484
ISBN 978-1-4503-2263-8
Dátum 2013
Egyéb Series Title: CIKM '13
Citation Key: Faheem:2013:DIC:2541176.2508197
DOI 10.1145/2505515.2508197
Kivonat We demonstrate here a new approach to Web archival crawling, based on an application-aware helper that drives crawls of Web applications according to their types (especially, according to their content management systems). By adapting the crawling strategy to the Web application type, one is able to crawl a given Web application (say, a given forum or blog) with fewer requests than traditional crawling techniques. Additionally, the application-aware helper is able to extract semantic content from the Web pages crawled, which results in a Web archive of richer value to an archive user. In our demonstration scenario, we invite a user to compare application-aware crawling to regular Web crawling on the Web site of their choice, both in terms of efficiency and of experience in browsing and searching the archive.
Kiadvány címe Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Hozzáadás dátuma 2021. 08. 09. 8:43:21
Módosítás dátuma 2021. 08. 09. 8:43:21

Címkék:

  • web archiving
  • crawling
  • content management system
  • web application

Deriving Dynamics of Web Pages: A Survey

Típus Dokumentum
Szerző Pierre Senellart
Szerző Marilena Oita
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kiadó HAL CCSD
Dátum 2011
Kivonat The World Wide Web is dynamic by nature: content is continuously added, deleted, or changed, which makes it challenging for Web crawlers to keep up-to-date with the current version of a Web page, all the more so since not all apparent changes are significant ones. We review major approaches to change detection in Web pages and extraction of temporal properties (especially, timestamps) of Web pages. We focus our attention on techniques and systems that have been proposed in the last ten years and we analyze them to get some insight into the practical solutions and best practices available. We aim at providing an analytical view of the range of methods that can be used, distinguishing them on several dimensions, especially, their static or dynamic nature, the modeling of Web pages, or, for dynamic methods relying on comparison of successive versions of a page, the similarity metrics used. We advocate for more comprehensive studies of the effectiveness of Web page change detection methods, and finally highlight open issues.
Hozzáadás dátuma 2021. 08. 09. 8:42:50
Módosítás dátuma 2021. 08. 09. 8:42:50

Címkék:

  • Web archiving
  • [INFO.INFO-WB] Computer Science [cs]/Web
  • ACM : H.3.5.2
  • Change monitoring
  • Timestamping

Descriptive metadata for web archiving: Literature review of user needs

Típus Dokumentum
Szerző OCLC
URL https://www.oclc.org/research/publications/2018/oclcresearch-descriptive-metadata/recommendations.html
Kiadó OCLC Research
Dátum 2018
Egyéb Place: United States, North America
Hozzáférés 2020. 08. 14. 2:00:00
Kivonat Under the auspices of the OCLC Research Library Partnership Web Archiving Metadata Working Group, this document is a literature review to inform the development of descriptive metadata best practices for archived web content that would meet end-user needs, enhance discovery, and improve metadata consistency. Selected readings include — at minimum — a substantive section related to metadata, but most covered a wider swath of issues. This helped the Working Group to learn much else about who the users of web archives are, the strategies they use and the challenges they face.
Hozzáadás dátuma 2021. 08. 09. 8:43:32
Módosítás dátuma 2021. 08. 09. 8:43:32

Címkék:

  • Web archiving
  • Archives
  • Electronic information resources–Management
  • Library metadata

Descriptive metadata for web archiving: Review of harvesting tools

Típus Dokumentum
Szerző OCLC
URL https://www.oclc.org/research/publications/2018/oclcresearch-descriptive-metadata/recommendations.html
Kiadó OCLC Research
Dátum 2018
Egyéb Place: United States, North America
Hozzáférés 2020. 08. 14. 2:00:00
Kivonat OCLC Research Library Partnership Web Archiving Working Group, Tools Subgroup's objective analysis of 11 tools designed to extract descriptive metadata from harvested web content. Selected tools included those tools that harvest or replay web content, are actively under development and/or are actively supported, and appeared to include descriptive metadata capture features. Tools reviewed include: Archive-It, Heritrix, HTTrack, Memento, Netarchive Suite, SiteStory, Social Feed Manager, Wayback Machine, Web Archive.
Hozzáadás dátuma 2021. 08. 09. 8:42:10
Módosítás dátuma 2021. 08. 09. 8:42:10

Címkék:

  • Web archiving
  • Archives
  • Electronic information resources–Management
  • Application software–Reviews

Desiderata for Exploratory Search Interfaces to Web Archives in Support of Scholarly Activities

Típus Dolgozat
Szerző Andrew Jackson
Szerző Jimmy Lin
Szerző Ian Milligan
Szerző Nick Ruest
URL http://doi.acm.org/10.1145/2910896.2910912
Hely New York, NY, USA
Kiadó ACM
Oldalszám 103-106
ISBN 978-1-4503-4229-2
Dátum 2016
Egyéb Series Title: JCDL '16
Citation Key: Jackson:2016:DES:2910896.2910912
DOI 10.1145/2910896.2910912
Kivonat Web archiving initiatives around the world capture ephemeral web content to preserve our collective digital memory. In this paper, we describe initial experiences in providing an exploratory search interface to web archives for humanities scholars and social scientists. We describe our initial implementation and discuss our findings in terms of desiderata for such a system. It is clear that the standard organization of a search engine results page (SERP), consisting of an ordered list of hits, is inadequate to support the needs of scholars. Shneiderman's mantra for visual information seeking ("overview first, zoom and filter, then details-on-demand") provides a nice organizing principle for interface design, to which we propose an addendum: "Make everything transparent". We elaborate on this by highlighting the importance of the temporal dimension of web pages as well as issues surrounding metadata and veracity.
Kiadvány címe Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:16
Módosítás dátuma 2021. 08. 09. 8:43:16

Címkék:

  • metadata
  • faceted browsing
  • shneiderman's mantra
  • veracity

Design and implementation of crawling algorithm to collect deep web information for web archiving

Típus Folyóiratcikk
Szerző Hyo-Jung Oh
Szerző Dong-Hyun Won
Szerző Chonghyuck Kim
Szerző Sung-Hee Park
Szerző Yong Kim
URL https://search.proquest.com/docview/2083825786?accountid=27464
Kötet 52
Szám 2
Oldalszám 266-277
Kiadvány Data Technologies and Applications
ISSN 25149288
Dátum 2018
Egyéb Number: 2
Publisher: Emerald Group Publishing Limited
Place: Graduate School of Archives and Records Management, Chonbuk National University, Jeonju, The Republic of Korea ; Center for Disaster Safety Information, Chonbuk National University, Jeonju, The Republic of Korea ; Department of English Language and Litera
DOI 10.1108/DTA-07-2017-0053
Nyelv English
Kivonat Purpose: The purpose of this paper is to describe the development of an algorithm for realizing web crawlers that automatically collect dynamically generated webpages from the deep web. Design/methodology/approach: This study proposes and develops an algorithm to collect web information as if the web crawler gathers static webpages by managing script commands as links. The proposed web crawler actually experiments with the algorithm by collecting deep webpages. Findings: Among the findings of this study is that if the actual crawling process provides search results as script pages, the outcome only collects the first page. However, the proposed algorithm can collect deep webpages in this case. Research limitations/implications: To use a script as a link, a human must first analyze the web document. This study uses the web browser object provided by Microsoft Visual Studio as a script launcher, so it cannot collect deep webpages if the web browser object cannot launch the script, or if the web document contains script errors. Practical implications: The research results show deep webs are estimated to have 450 to 550 times more information than surface webpages, and it is difficult to collect web documents. However, this algorithm helps to enable deep web collection through script runs. Originality/value: This study presents a new method to be utilized with script links instead of adopting previous keywords. The proposed algorithm is available as an ordinary URL. From the conducted experiment, analysis of scripts on individual websites is needed to employ them as links.
Hozzáadás dátuma 2021. 08. 09. 8:42:20
Módosítás dátuma 2021. 08. 09. 8:42:20

Címkék:

  • Web archiving
  • Archiving
  • Library And Information Sciences–Computer Applica
  • Digital archives
  • Web sites
  • Algorithms
  • Case depth
  • Electronic documents
  • Links
  • Visual programming languages
  • Webs
  • Websites

Design of an Enhanced Web Archiving System for Preserving Content Integrity with Blockchain

Típus Folyóiratcikk
Szerző Hyun Cheon Hwang
Szerző Jin Gon Shon
Szerző Ji Su Park
URL https://www.mdpi.com/2079-9292/9/8/1255
Jogok http://creativecommons.org/licenses/by/3.0/
Kötet 9
Szám 8
Oldalszám 1255
Kiadvány Electronics
Dátum 2020/8
Egyéb Number: 8
Publisher: Multidisciplinary Digital Publishing Institute
DOI 10.3390/electronics9081255
Hozzáférés 2021. 07. 15. 9:36:56
Könyvtár Katalógus www.mdpi.com
Nyelv en
Kivonat A Web archive system is a traditional subject for preserving web content for the future and the importance is getting more significant due to the explosive growth of web content. The reference model for an open archival information system (OAIS) has been advising guidance for a long-term archiving system and most organizations that archive web content follow this guidance. In addition, the web archive (WARC) ISO standard is for web content archiving. However, there is no way to secure content integrity, and it is hard to identify the original. Because of limitations, a web archive system has a weakness against the dispute of content integrity. In this paper, we proposed the blockchain linked (BCLinked) web archiving system, which uses blockchain technology and an extended WARC field to keep a web content integrity metadata into a blockchain. Furthermore, we designed the BCLinked web archiving system, and we confirmed the proposed system secures content integrity through the experiment.
Hozzáadás dátuma 2021. 08. 09. 8:43:52
Módosítás dátuma 2021. 08. 09. 8:43:52

Címkék:

  • WARC
  • web archive
  • web crawling
  • BCLinked
  • blockchain
  • web archiving system

Designing Efficient Sampling Techniques to Detect Webpage Updates

Típus Dolgozat
Szerző Qingzhao Tan
Szerző Ziming Zhuang
Szerző Prasenjit Mitra
Szerző C Lee Giles
URL http://doi.acm.org/10.1145/1242572.1242738
Hely New York, NY, USA
Kiadó ACM
Oldalszám 1147-1148
ISBN 978-1-59593-654-7
Dátum 2007
Egyéb Series Title: WWW '07
Citation Key: Tan:2007:DES:1242572.1242738
DOI 10.1145/1242572.1242738
Kivonat Due to resource constraints, Web archiving systems and search engines usually have difficulties keeping the entire local repository synchronized with the Web. We advance the state-of-art of the sampling-based synchronization techniques by answering a challenging question: Given a sampled webpage and its change status, which other webpages are also likely to change? We present a study of various downloading granularities and policies, and propose an adaptive model based on the update history and the popularity of the webpages. We run extensive experiments on a large dataset of approximately 300,000 webpages to demonstrate that it is most likely to find more updated webpages in the current or upper directories of the changed samples. Moreover, the adaptive strategies outperform the non-adaptive one in terms of detecting important changes.
Kiadvány címe Proceedings of the 16th International Conference on World Wide Web
Hozzáadás dátuma 2021. 08. 09. 8:43:37
Módosítás dátuma 2021. 08. 09. 8:43:37

Címkék:

  • web crawler
  • sampling
  • search engine

Detecting Age of Page Content

Típus Dolgozat
Szerző Adam Jatowt
Szerző Yukiko Kawai
Szerző Katsumi Tanaka
URL http://doi.acm.org/10.1145/1316902.1316925
Hely New York, NY, USA
Kiadó ACM
Oldalszám 137-144
ISBN 978-1-59593-829-9
Dátum 2007
Egyéb Series Title: WIDM '07
Citation Key: Jatowt:2007:DAP:1316902.1316925
DOI 10.1145/1316902.1316925
Kivonat Web pages often contain objects created at different times. The information about the age of such objects may provide useful context for understanding page content and may serve many potential uses. In this paper, we describe a novel concept for detecting approximate creation dates of content elements in Web pages. Our approach is based on dynamically reconstructing page histories using data extracted from external sources – Web archives and efficiently searching inside them to detect insertion dates of content elements. We discuss various issues involving the proposed approach and demonstrate the example of an application that enhances browsing the Web by inserting annotations with temporal metadata into page content on user request.
Kiadvány címe Proceedings of the 9th Annual ACM International Workshop on Web Information and Data Management
Hozzáadás dátuma 2021. 08. 09. 8:43:35
Módosítás dátuma 2021. 08. 09. 8:43:35

Címkék:

  • metadata
  • web archive
  • age detection
  • document annotation

Detecting Off-Topic Pages in Web Archives

Típus Könyvfejezet
Szerző Yasmin AlNoamany
Szerző Michele C Weigle
Szerző Michael L Nelson
Szerkesztő Sarantos Kapidakis
Szerkesztő Cezary Mazurek
Szerkesztő Marcin Werla
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Hely Cham
Kiadó Springer International Publishing
Oldalszám 225-237
ISBN 978-3-319-24592-8
Dátum 2015-01
Egyéb DOI: 10.1007/978-3-319-24592-8_17
ISSN: 9783662481615
Kivonat Web archives have become a significant repository of our recent history and cultural heritage. Archival integrity and accuracy is a precondition for future cultural research. Currently, there are no quantitative or content-based tools that allow archivists to judge the quality of the Web archive captures. In this paper, we address the problems of detecting off-topic pages in Web archive collections. We evaluate six different methods to detect when the page has gone off-topic through subsequent captures. Those predicted off-topic pages will be presented to the collection’s curator for possible elimination from the collection or cessation of crawling. We created a gold standard data set from three Archive-It collections to evaluate the proposed methods at different thresholds. We found that combining cosine similarity at threshold 0.10 and change in size using word count at threshold −0.85 performs the best with accuracy = 0.987, F1 score = 0.906, and AUC = 0.968. We evaluated the performance of the proposed method on several Archive-It collections. The average precision of detecting the off-topic pages is 0.92.
Könyv címe Research and Advanced Technology for Digital Libraries. TPDL 2015. Lecture Notes in Computer Science, vol 9316.
Hozzáadás dátuma 2021. 08. 09. 8:43:26
Módosítás dátuma 2021. 08. 09. 8:43:26

Címkék:

  • Web archiving
  • Internet Archive
  • Archived collections
  • Document filtering
  • Document similarity
  • Information retrieval
  • Web content mining

Detecting off-topic pages within TimeMaps in Web archives.

Típus Folyóiratcikk
Szerző Yasmin AlNoamany
Szerző Michele C Weigle
Szerző Michael L Nelson
URL https://search.proquest.com/docview/1811905000?accountid=27464
Kötet 17
Szám 3
Oldalszám 203-221
Kiadvány International Journal on Digital Libraries
ISSN 14325012
Dátum 2016-09
Egyéb Number: 3
Publisher: Springer Science & Business Media B.V.
Place: Heidelberg
DOI 10.1007/s00799-016-0183-5
Nyelv English
Kivonat Web archives have become a significant repository of our recent history and cultural heritage. Archival integrity and accuracy is a precondition for future cultural research. Currently, there are no quantitative or content-based tools that allow archivists to judge the quality of the Web archive captures. In this paper, we address the problems of detecting when a particular page in a Web archive collection has gone off-topic relative to its first archived copy. We do not delete off-topic pages (they remain part of the collection), but they are flagged as off-topic so they can be excluded for consideration for downstream services, such as collection summarization and thumbnail generation. We propose different methods (cosine similarity, Jaccard similarity, intersection of the 20 most frequent terms, Web-based kernel function, and the change in size using the number of words and content length) to detect when a page has gone off-topic. Those predicted off-topic pages will be presented to the collection's curator for possible elimination from the collection or cessation of crawling. We created a gold standard data set from three Archive-It collections to evaluate the proposed methods at different thresholds. We found that combining cosine similarity at threshold 0.10 and change in size using word count at threshold −0.85 performs the best with accuracy = 0.987, F1 score = 0.906, and AUC = 0.968. We evaluated the performance of the proposed method on several Archive-It collections. The average precision of detecting off-topic pages in the collections is 0.89.
Hozzáadás dátuma 2021. 08. 09. 8:43:32
Módosítás dátuma 2021. 08. 09. 8:43:32

Címkék:

  • Web archiving
  • Internet Archive
  • ARCHIVES
  • WEB archives
  • Library And Information Sciences–Computer Applica
  • Archives & records
  • Internet
  • Data mining
  • INFORMATION retrieval
  • Archived collections
  • Document filtering
  • Document similarity
  • Filtering systems
  • HTTP (Computer network protocol)
  • Information retrieval
  • UNIFORM Resource Identifiers
  • Web content mining

Determining Users' Motivations to Participate in Online Community Archives: A Preliminary Study of Documenting Ferguson

Típus Dolgozat
Szerző Chris Freeland
Szerző Kodjo Atiso
URL http://dl.acm.org/citation.cfm?id=2857070.2857176
Hely Silver Spring, MD, USA
Kiadó American Society for Information Science
Oldalszám 106:1–106:4
ISBN 0-87715-547-X
Dátum 2015
Egyéb Series Title: ASIST '15
Citation Key: Freeland:2015:DUM:2857070.2857176
Kivonat The shooting death of teenager Michael Brown in Ferguson, Missouri, spurred an immediate national and international response in the fall of 2014. Washington University Libraries in St. Louis, Missouri, established the Documenting Ferguson web archive to gather digital media documenting local protests and demonstrations as captured by community members in order to archive the materials for future research and scholarly use. This preliminary study identified the factors that motivated participants to contribute content to the Documenting Ferguson online community archive, uncovering themes of altruism, reciprocity, and personal development.
Kiadvány címe Proceedings of the 78th ASIS&T Annual Meeting: Information Science with Impact: Research in and for the Community
Hozzáadás dátuma 2021. 08. 09. 8:43:05
Módosítás dátuma 2021. 08. 09. 8:43:05

Címkék:

  • human-computer interaction
  • motivation
  • participatory archives

Developing and raising awareness of the zine collections at the British Library

Típus Folyóiratcikk
Szerző Debbie Cox
URL https://search.proquest.com/docview/2018595315?accountid=27464
Kötet 43
Szám 2
Oldalszám 77-81
Kiadvány Art Libraries Journal
ISSN 03074722
Dátum 2018-04
Egyéb Number: 2
Publisher: Cambridge University Press
Place: Cambridge
DOI 10.1017/alj.2018.5
Nyelv English
Kivonat This article presents a practice-based account of collection development related to zines in the British Library. Rather than making the case for the collecting of zines, it aims to describe the process of collection building in a specific time and place, so that researchers have a better understanding of why certain resources are offered to them and others are not, and to share experiences with other librarians with zine collections. Zines form an element of the cultural memory of activists and cultural creators, and for researchers studying them it would seem useful to make transparent the motivations, methods and limitations of collection building. Librarians in the USA have written about their collecting practices for some time, for instance at Barnard College and New York Public Library; there has been less written about the practices of UK libraries. The article aims to make a contribution as a case study alongside accounts of collection development in a range of other libraries with zine collections, and it is written primarily from my own perspective as a curator in Contemporary British Collections since 2015, focusing on current practice, with some reference to earlier collecting.
Hozzáadás dátuma 2021. 08. 09. 8:42:19
Módosítás dátuma 2021. 08. 09. 8:42:19

Címkék:

  • Web archiving
  • Collection development
  • Library And Information Sciences
  • Library collections
  • Cultural heritage
  • Depository libraries
  • Donations
  • National libraries
  • Research
  • Researchers
  • United Kingdom–UK

Developing Web Archiving Metadata Best Practices to Meet User Needs

Típus Folyóiratcikk
Szerző Jackie Dooley
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 8
Szám 2
Kiadvány Journal of Western Archives
Dátum 2017
Egyéb Number: 2
Publisher: DigitalCommons@USU
Place: United States, North America
Kivonat The OCLC Research Library Partnership Web Archiving Metadata Working Group was established to meet a widely recognized need for best practices for descriptive metadata for archived websites. The Working Group recognizes that development of successful best practices intended to ensure discoverability requires an understanding of user needs and behavior. We have therefore conducted an extensive literature review to build our knowledge and will issue a white paper summarizing what we have learned. We are also studying existing and emerging approaches to descriptive metadata in this realm and will publish a second report recommending best practices. We will seek broad community input prior to publication.
Hozzáadás dátuma 2021. 08. 09. 8:42:08
Módosítás dátuma 2021. 08. 09. 8:42:08

Címkék:

  • web archiving
  • Metadata
  • best practices
  • Cataloging of archival materials
  • descriptive metadata

Development of the National Library of the Czech Republic 2011–2016: Past, Present and Future

Típus Folyóiratcikk
Szerző Tomáš Böhm
URL https://doi.org/10.7227/ALX.0028
Kötet 25
Szám 3
Oldalszám 17-24
Kiadvány Alexandria: The Journal of National and International Library and Information Issues
ISSN 0955-7490
Dátum 2014-12
Egyéb Number: 3
DOI 10.7227/ALX.0028
Kivonat The National Library of the Czech Republic, which was founded in 1773 by the Austrian Empress Maria Theresa, is one of the oldest National Libraries in Europe. It has been through various organizational changes incorporating other libraries and institutions. In addition to providing traditional library service, the library is active in such fields as digitization, paper documents restoration and preservation, refurbishment of its main seat in the baroque Klementinum building and international cooperation. The most important digitization project is the creation of the National Digital Library, which will also serve as the LTP (Long Term Preservation) repository for other digitization projects carried out by either the National Library or by other libraries and institutions in the Czech Republic. Other projects in this field are: the world's biggest digital manuscript library (Manuscriptorium), creation of the Web Archive, digitization of rare books in partnership with Google, formation of the repository for digitized Czech cultural heritage and, together with other main Czech libraries, work on the creation of the Czech Libraries Portal. The Library is further active in paper documents restoration and preservation where it is trying to tackle the problem of de-acidification as well as the formation of the physical Czech Depository Library and the Interdisciplinary Methodological Centre for Book Restoration and Conservation. The Library continues to serve its users during the refurbishment of the Klementinum. It aims to create 'a modern library in baroque walls' by the end of 2018. Furthermore, a new physical depository has been built on the outskirts of Prague.
Hozzáadás dátuma 2021. 08. 09. 8:42:53
Módosítás dátuma 2021. 08. 09. 8:42:53

Címkék:

  • web archiving
  • Web archives
  • digitization
  • Digitization of library materials
  • library refurbishment
  • Narodni knihovna Ceske republiky
  • National Library of the Czech Republic
  • paper documents restoration

Developments in Digital Preservation at the University of Illinois: The Hub and Spoke Architecture for Supporting Repository Interoperability and Emerging Preservation Standards

Típus Folyóiratcikk
Szerző Thomas Habing
Szerző Janet Eke
Szerző Matthew A. Cordial
Szerző William Ingram
Szerző Robert Manaster
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 57
Szám 3
Oldalszám 556-579
Kiadvány Library Trends
ISSN 1559-0682
Dátum 2009
Egyéb Number: 3
DOI 10.1353/lib.0.0052
Kivonat Funded by the National Digital Information Infrastructure and Preservation Program (NDIIPP), the ECHO DEPository Project supports the digital preservation efforts of the Library of Congress by contributing research and software to help society GET, SAVE, and KEEP its digital cultural heritage. Project activities include building Web archiving tools, evaluating existing repository software, developing architectures to enhance existing repositories' interoperability and preservation features, and modeling next-generation repositories for supporting long-term preservation. This article describes the development of the Hub and Spoke (HandS) Tool Suite, built to help curators of digital objects manage content in multiple repository systems while preserving valuable preservation metadata. Implementing METS and PREMIS, HandS provides a standards-based method for packaging content that allows digital objects to be moved between repositories more easily while supporting the collection of technical and provenance information crucial for long-term preservation. Related project work investigating the more fundamental semantic issues underlying the preservation of the meaning of digital objects over time is profiled separately in this issue (Dubin et al., 2009).
Hozzáadás dátuma 2021. 08. 09. 8:42:48
Módosítás dátuma 2021. 08. 09. 8:42:48

Címkék:

  • Web archiving
  • Information science
  • Web archives
  • Digitization of archival materials
  • Digital preservation
  • Digitization
  • Library science
  • Digitization of library materials
  • Archives — Computer network resources
  • Preservation of materials
  • Library of Congress

Digital Archaeology in the Web of Links: Reconstructing a Late-1990s Web Sphere

Típus Könyvfejezet
Szerző Peter Webster
Szerkesztő Daniel Gomes
Szerkesztő Elena Demidova
Szerkesztő Jane Winters
Szerkesztő Thomas Risse
URL https://doi.org/10.1007/978-3-030-63291-5_12
Hely Cham
Kiadó Springer International Publishing
Oldalszám 155-164
ISBN 978-3-030-63291-5
Dátum 2021
Egyéb DOI: 10.1007/978-3-030-63291-5_12
Hozzáférés 2021. 07. 15. 9:52:26
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat One unit of analysis within the archived Web is the “web sphere”, a body of material from different hosts that is related in some meaningful sense. This chapter outlines a method of reconstructing such a web sphere from the late 1990s, that of conservative British Christians as they interacted with each other and with others in the USA in relation to issues of morality, domestic and international politics, law and the prophetic interpretation of world events. Using an iterative method of interrogation of the graph of links for the archived UK Web, it shows the potential for the reconstruction of what I describe as a “soft” web sphere from what is in effect an archive with a finding aid with only classmarks and no descriptions.
Könyv címe The Past Web: Exploring Web Archives
Rövid cím Digital Archaeology in the Web of Links
Hozzáadás dátuma 2021. 08. 09. 8:43:58
Módosítás dátuma 2021. 08. 09. 8:43:58

Digital Contemporary History Sources, Tools, Methods, Issues

Típus Folyóiratcikk
Szerző Peter Webster
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 7
Szám 14
Oldalszám 30-38
Kiadvány Temp – tidsskrift for historie
Dátum 2017
Egyéb Number: 14
Publisher: Nyt Selskab for Historie
Place: Denmark, Europe
Kivonat This essay suggests that there has been a relative lack of digitally enabled historical research on the recent past, when compared to earlier periods of history. It explores why this might be the case, focussing in particular on both the obstacles and some missing drivers to mass digitisation of primary sources for the 20th century. It suggests that the situation is likely to change, and relatively soon, as a result of the increasing availability of sources that were born digital, and of Web archives in particular. The article ends with some reflections on several shifts in method and approach which that changed situation is likely to entail.
Hozzáadás dátuma 2021. 08. 09. 8:43:25
Módosítás dátuma 2021. 08. 09. 8:43:25

Címkék:

  • web archives
  • digital history
  • digital research
  • digital sources
  • information society

Digital curation: the development of a discipline within information science

Típus Folyóiratcikk
Szerző Sarah Higgins
URL https://www.emeraldinsight.com/doi/10.1108/JD-02-2018-0024
Kötet 74
Szám 6
Oldalszám 1318-1338
Kiadvány Journal of Documentation
ISSN 0022-0418
Dátum 2018-10-08
Egyéb Number: 6
DOI 10.1108/JD-02-2018-0024
Kivonat Digital curation addresses the technical, administrative and financial ecology required to ensure that digital information remains accessible and usable over the long term. The purpose of this paper is to trace digital curation’s disciplinary emergence and examine its position within the information sciences domain in terms of theoretical principles, using a case study of developments in the UK and the USA.
Hozzáadás dátuma 2021. 08. 09. 8:43:42
Módosítás dátuma 2021. 08. 09. 8:43:42

Címkék:

  • Development
  • Digital curation
  • Education
  • History
  • Models
  • Professional associations

Digital Heritage and Heritagization ; Patrimoine et patrimonialisation numériques

Típus Folyóiratcikk
Szerző Francesca Musiani
Szerző Valerie Schafer
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Dátum 2017
Egyéb Place: Luxembourg, Europe
Kivonat Introduction to a special issue. The six articles and the introduction composing this issue fully situate themselves within the interdisciplinary dimension of digital heritage analyses, including perspectives from history, information and communication sciences, sociology of innovation, digital humanities or juridical sciences.
Hozzáadás dátuma 2021. 08. 09. 8:43:24
Módosítás dátuma 2021. 08. 09. 8:43:24

Címkék:

  • digital
  • web archives
  • Arts & humanities :: Multidisciplinary
  • Arts & sciences humaines :: Multidisciplinaire
  • born-digital heritage
  • digital traces
  • general & others [A99]
  • généralités & autres [A99]
  • history

Digital humanities and web archives: Possible new paths for combining datasets

Típus Folyóiratcikk
Szerző Niels Brügger
URL https://doi.org/10.1007/s42803-021-00038-z
Kiadvány International Journal of Digital Humanities
ISSN 2524-7840
Dátum 2021-05-28
Folyóirat rövid neve Int J Digit Humanities
DOI 10.1007/s42803-021-00038-z
Hozzáférés 2021. 07. 15. 11:10:23
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat This article discusses the importance of web archives making their collections available as data and not only as sources seen through the Wayback Machine’s interface where only individual web pages are displayed. This will help unlock the full potential of the treasure trove that web archives constitute, and thereby also open up for methods from the wider field of digital humanities. Based on a case study of the entire Danish web domain .dk the article discusses methodological challenges involved in combining large extracted datasets from web archives, namely metadata about the size of websites and data about hyperlinks from the same websites. The aim is to answer the following two questions: 1) How to combine two different types of datasets extracted from a web archive, in this case the Danish Netarkivet? 2) What can the result of such a combination teach us about the structural characteristics of the Danish web domain from 2006 to 2015? The article shows that, indeed, it is possible to go beyond the Wayback Machine as the prime interface to web archives by combining two distinct datasets, and that such a venture can provide valuable knowledge about the overall structure of the Danish web domain, thus highlighting that websites of the same size tend to constitute isolated ‘link islands’, and that big websites are also the most important in the hyperlink network, which is more clearly the case in 2015 than in 2006.
Rövid cím Digital humanities and web archives
Hozzáadás dátuma 2021. 08. 09. 8:44:20
Módosítás dátuma 2021. 08. 09. 8:44:20

Digital Humanities in the 21st Century: Digital Material as a Driving Force

Típus Folyóiratcikk
Szerző Niels Brügger
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 10
Szám 3
Kiadvány Digital Humanities Quarterly
Dátum 2016
Egyéb Number: 3
Place: Denmark, Europe
Kivonat In this article it is argued that one of the major transformative factors of the humanities at the beginning of the 21st century is the shift from analogue to digital source material, and that this shift will affect the humanities in a variety of ways. But various kinds of digital material are not digital in the same way, which a distinction between digitized, born-digital, and reborn-digital may help us acknowledge, thereby helping us to understand how each of these types of digital material affects different phases of scholarly work in its own way. This is illustrated by a detailed comparison of the nature of digitized collections and web archives.
Hozzáadás dátuma 2021. 08. 09. 8:42:03
Módosítás dátuma 2021. 08. 09. 8:42:03

Címkék:

  • web archiving
  • web
  • web archive
  • born digital
  • digital humaniora
  • digital humanities
  • digital material
  • digitaliseret
  • digitalitet
  • digitality
  • digitalt materiale
  • digitised
  • født digitalt
  • genfødt digitalt
  • reborn digital
  • webarkiv
  • webarkivering

Digital humanities preservation: A conversation for developing sustainable digital projects

Típus Könyvfejezet
Szerző A. Miller
Szerző Molly Taylor-Poleskey
Kiadó Routledge
ISBN 978-0-429-39992-3
Dátum 2020
Egyéb Num Pages: 17
Kivonat This chapter describes an urgent need to transform practice in digital humanities scholarship to include preservation at the forefront of digital project planning. Typical obstacles to an effective preservation plan include lack of time and funding, the structure of digital scholarship grants, lack of a culture of collaboration outside of the humanities, and uncertain ownership of projects after the end of implementation. Creating and teaching DH typically take precedence over thought about project preservation. This chapter argues instead that working with an interdisciplinary team on a preservation plan at the outset of a project will greatly increase the longevity and reproducibility of digital projects. Additionally, this chapter offers one way to spark the preservation conversation with an accompanying Preservation Plan template. Ideally, the principles of preservation will become an expected part of DH education to circumvent lost projects due to lack of technical, human, institutional, and financial support.
Könyv címe Transformative Digital Humanities
Rövid cím Digital humanities preservation
Hozzáadás dátuma 2021. 08. 09. 8:44:25
Módosítás dátuma 2021. 08. 09. 8:44:25

Digital Libraries and Engines of Search: New Information Systems in the Context of the Digital Preservation

Típus Dolgozat
Szerző Ricardo Campos
URL http://doi.acm.org/10.1145/1352694.1352703
Hely New York, NY, USA
Kiadó ACM
Oldalszám 8:1–8:9
ISBN 978-1-59593-598-4
Dátum 2007
Egyéb Series Title: EATIS '07
Citation Key: Campos:2007:DLE:1352694.1352703
DOI 10.1145/1352694.1352703
Kivonat The first's library projects occur some years ago with digitization, but just in 1996, the first's web archive initiatives start occurring. Such, was based in the Internet growth and in its increasing use, items that revealed to be an opportunity to transform and readapt the traditional library services. In this context, search engines play a fundamental role of support to the new paradigm of knowledge, by capturing, storing and providing access to the resources, allowing the existence of a digital library in each computer with internet access. In this article we analyze the ways of developing a digital library, taking higher attention to the web harvesting technique, and presenting digital libraries capabilities and limitations. Then we fully summarize relevant projects and initiatives, to finally study the role of search engines in what concerns to, digital preservation, access and information diffusion.
Kiadvány címe Proceedings of the 2007 Euro American Conference on Telematics and Information Systems
Hozzáadás dátuma 2021. 08. 09. 8:43:18
Módosítás dátuma 2021. 08. 09. 8:43:18

Címkék:

  • web archiving
  • digital preservation
  • digital libraries
  • web harvesting
  • search engines
  • information systems

Digital methods in a post-API environment

Típus Folyóiratcikk
Szerző Jessamy Perriam
Szerző Andreas Birkbak
Szerző Andy Freeman
URL http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=142313286&lang=hu&site=ehost-live
Kötet 23
Szám 3
Oldalszám 277-290
Kiadvány International Journal of Social Research Methodology
ISSN 1364-5579
Dátum May 2020
Egyéb Number: 3
Folyóirat rövid neve International Journal of Social Research Methodology
DOI 10.1080/13645579.2019.1682840
Hozzáférés 2021. 07. 16. 11:16:13
Könyvtár Katalógus EBSCOhost
Kivonat Qualitative and mixed methods digital social research often relies on gathering and storing social media data through the use of APIs (Application Programming Interfaces). In past years this has been relatively simple, with academic developers and researchers using APIs to access data and produce visualisations and analysis of social networks and issues. In recent years, API access has become increasingly restricted and regulated by corporations at the helm of social media networks. Facebook (the corporation) has restricted academic research access to Facebook (the social media platform) along with Instagram (a Facebook-owned social media platform). Instead, they have allowed access to sources where monetisation can easily occur, in particular, marketers and advertisers. This leaves academic researchers of digital social life in a difficult situation where API related research has been curtailed. In this paper we describe some rationales and methodologies for using APIs in social research. We then introduce some of the major events in academic API use that have led to the prohibitive situation researchers now find themselves in. Finally, we discuss the methodological and ethical issues this produces for researchers and, suggest some possible steps forward for API related research.
Hozzáadás dátuma 2021. 08. 09. 8:44:38
Módosítás dátuma 2021. 08. 09. 8:44:38

Címkék:

  • Twitter
  • ethics
  • UNIVERSITY research
  • SOCIAL media
  • Facebook
  • APIs
  • web scraping
  • SOCIAL networks
  • Digital methods
  • Netvizz
  • SOCIAL media in business
  • SOCIAL media in education
  • SOCIAL network analysis
  • SOCIAL network theory
  • SOCIAL science research

Digital Preservation and Authentic Legal Information

Típus Jelentés
Szerző G. Patrick Flanagan
URL https://papers.ssrn.com/abstract=2463288
Hely Rochester, NY
Dátum 2010
Egyéb Issue: ID 2463288
DOI: 10.2139/ssrn.2463288
Hozzáférés 2020. 08. 20. 10:20:14
Intézmény Social Science Research Network
Jelentés típusa SSRN Scholarly Paper
Könyvtár Katalógus papers.ssrn.com
Nyelv en
Kivonat Writing and researching about the permanence of digital documents is a quizzical, self-referential activity. I set out to uncover approaches to the problems facing the longevity of authentic legal information. How did I do this? Primarily, I accessed and read electronic documents. My exercise here might very well suffer the same issues raised in the information science and legal literature. Faulty, inconsistent, and potentially inauthentic electronic databases may unduly – however subtly – shade my analysis. For what I’m doing here – a student’s attempt to add to an academic discourse – I’m pretty unconcerned
Jelentés száma ID 2463288
Hozzáadás dátuma 2021. 08. 09. 8:43:50
Módosítás dátuma 2021. 08. 09. 8:43:50

Címkék:

  • digital preservation
  • convergence
  • legal information
  • obsolescence

Digital Preservation Metadata Practice for Web Archives

Típus Könyvfejezet
Szerző Clément Oury
Szerző Karl-Rainer Blumenthal
Szerző Sébastien Peyrard
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Oldalszám 59-82
Dátum 2016
Egyéb ISBN: 9783319437613
Könyv címe Digital Preservation Metadata for Practitioners
Hozzáadás dátuma 2021. 08. 09. 8:41:55
Módosítás dátuma 2021. 08. 09. 8:41:55

Digital Preservation through Archival Collaboration: The Data Preservation Alliance for the Social Sciences

Típus Folyóiratcikk
Szerző Micah Altman
Szerző Margaret O Adams
Szerző Jonathan Crabtree
Szerző Darrell Donakowski
Szerző Marc Maynard
Szerző Amy Pienta
Szerző Copeland H Young
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 72
Szám 1
Oldalszám 170-184
Kiadvány American Archivist
ISSN 0360-9081
Dátum 2009
Egyéb Number: 1
Kivonat The Data Preservation Alliance for the Social Sciences (Data-PASS) is a partnership of five major U.S. institutions with a strong focus on archiving social science research. The Library of Congress supports the partnership through its National Digital Information Infrastructure and Preservation Program (NDIIPP). The goal of Data-PASS is to acquire and preserve data from opinion polls, voting records, large-scale surveys, and other social science studies at risk of being lost to the research community. This paper discusses the agreements, processes, and infrastructure that provide a foundation for the collaboration.
Hozzáadás dátuma 2021. 08. 09. 8:42:49
Módosítás dátuma 2021. 08. 09. 8:42:49

Címkék:

  • Web archiving
  • Digitization of archival materials
  • Digital preservation
  • Metadata
  • Electronic records
  • Information resources management
  • Archives — Computer network resources
  • Preservation of materials
  • Archives collection management
  • Document imaging systems
  • Social science methodology
  • Social science research

Digital Preservation: challenges, requirements, strategies and scientific output

Típus Folyóiratcikk
Szerző Danilo Formenton
Szerző Luciana Gracioso
Kötet 18
Oldalszám 1-26
Dátum June 14, 2020
Könyvtár Katalógus ResearchGate
Kivonat The aim of this article is to provide a broad and reflective perspective on the main aspects of digital preservation, based on the challenges indicated, the recognized requirements and the strategies analyzed by the scientific community. The methodology adopts quantitative-qualitative and exploratory-descriptive research, with a review of the national and international literature of the last twenty-one years on digital preservation, in order to delineate the trends and policies on the theme as well as deepening the discussion on the needs for archiving and long-term preservation of digital content. Data is analyzed from the bibliographic survey of scientific publications indexed by Scopus and Web of Science from the last five years (2015-2019) that deal with the subject "digital preservation". It was found that among the themes discussed, budgets, costs and metadata for preserving and Web archiving are emerging and studies lacking in Brazilian Information Science. In the international scientific output, Brazil stands out for publication quantity, indicating a maturation of the theme, coinciding with the advance of national projects, such as the Cariniana Network. However, we have financial, human and technological demands that, together with the characteristics of strategies for digital preservation, highlight the usefulness of collaborations and of little-explored national topics.
Rövid cím Digital Preservation
Hozzáadás dátuma 2021. 08. 09. 8:44:15
Módosítás dátuma 2021. 08. 09. 8:44:15

Digitálna knižnica v dobe vírovej II. – Vývoj a služba národného systému sprístupňovania diel

Típus Folyóiratcikk
Szerző Zdenko Vozár
URL http://search.ebscohost.com/login.aspx?direct=true&db=lxh&AN=151051032&lang=hu&site=ehost-live
Kötet 32
Szám 1
Oldalszám 43-58
Kiadvány Digital Library in the era of coronavirus II – the development and series of the national system of accessing publications.
ISSN 1801-3252
Dátum June 2021
Egyéb Number: 1
Folyóirat rövid neve Knihovna
Hozzáférés 2021. 07. 16. 10:47:14
Könyvtár Katalógus EBSCOhost
Kivonat This article describes implementation, operation and development of emergency systems of digital libraries throughout the Czech Republic during the pandemic crisis of Covid-19 from 2020 till present (2021/03) as alternative information source, mainly for the university students and the public R&D sector. It details changes, evaluation and evolution of these operations, which were brought by the successive and unexpected development of pandemic of Covid-19 during the year 2020 and the early spring of 2021 – mainly the launch of National digital library portal, pressure on licencing policies and intern migration of data, also as introduction of continuous automatic web archiving campaign on this momentous event. An emergency of this magnitude created a pressure upon the necessity of re-evaluation of licencing policies in the short term, but also furthered the agenda of adjusting the general strategy of libraries towards building and enriching online services, especially digital libraries and repositories. It is precisely the access from home-office which facilitates accessibility of otherwise inaccessible titles for all students and registered readers. Moreover, this type of access allows instant and sustainable long term culture exchange during the times of almost total suspension of the circulation of printed words. Principally, this kind of new type of information circulation provided by digital library services should be attainable, but only in the environment of fair licencing agreements for all participants in the book market and information transmission.
Hozzáadás dátuma 2021. 08. 09. 8:44:34
Módosítás dátuma 2021. 08. 09. 8:44:34

Címkék:

  • Národní digitální knihovna
  • National digital library
  • digital libraries
  • Česká republika
  • continuous covid crawl
  • Covid druhá vlna
  • Covid jaro 2020
  • Covid second and third wave
  • Covid spring 2020
  • covid-19
  • Czech Republic
  • data migration
  • digitální knihovny
  • díla nedostupná na trhu
  • emergency licence
  • home-office
  • kontinuální sklizeň webů
  • Kramerius
  • licence
  • migrace
  • Moravian Library
  • Moravská zemská knihovna
  • NDK.cz
  • nouzový stav
  • online access
  • online přístup
  • out of commerce works
  • práva
  • rights
  • rozvoj software
  • state of emergency
  • studenti
  • students
  • SW development
  • system of electronic access
  • systém zpřístupnění
  • universities
  • univerzity
  • výjimečná licence

Digitálne pramene – národný projekt zberu a archivácie v roku 1.

Típus Folyóiratcikk
Szerző Ing. Alojz Androvič
Szerző Bc. Andrej Bizík
Szerző Ing. Peter Hausleitner
Szerző PhDr. Beáta Katrincová
Szerző Mgr. Iveta Lacková
Szerző PhDr. Jana Matúšková
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Szám 1
Oldalszám 1-14
Kiadvány Knihovna PLUS
ISSN 1801-5948
Dátum 2017-01
Egyéb Number: 1
Kivonat In 2015 the University Library in Bratislava put in the practice the national project Digital Resources — Webharvesting and E-Born Content Archiving. The project was running in the framework of the Operational Program Informatisation of Society. Its ambition was to establish a technical, application and management infrastructure for systematical harvesting and long term preservation of web pages and e-Born resources. The implementation is based on open source software modules (Heritrix, OpenWayback, Invenio). The systems management is optimized for parallel webharvesting. This article presents the experiences and results of the operation of IS Digital Resources in 2016. It describes the workflow of webharvesting and acquisition of e-Born resources and discusses some methodological and practical problems in dealing with e-Born serials. The article brings the analytical and statistical overview of harvests realised in 2016 with a special highlight on the complex harvest of the national .sk domain.
Hozzáadás dátuma 2021. 08. 09. 8:42:07
Módosítás dátuma 2021. 08. 09. 8:42:07

Címkék:

  • web archiving
  • WARC
  • archivácia webu
  • digital curation
  • digitálne kurátorstvo
  • e-Born pramene
  • e-Born resources
  • ISSN
  • web analytics
  • webharvesting
  • webová analytika
  • zber webu

Disappearing News Archives

Típus Folyóiratcikk
Szerző Sarah Jane Davis
URL https://search.proquest.com/docview/1861822700?accountid=27464
Kötet 40
Szám 6
Oldalszám 46
Kiadvány Online Searcher
ISSN 2324-9684
Dátum 2016
Egyéb Number: 6
Publisher: Information Today, Inc.
Place: Medford
Nyelv en
Kivonat Part of the preservation problem lies in the fact that newspapers are not official public records. According to the ProQuest title list, ProQuest News has the full text of the Milwaukee Journal Sentinel from April 1, 1995, to Dec. 31, 2009, a fraction of the full 123 years (1884-2007) formerly in Google News Archive.
Hozzáadás dátuma 2021. 08. 09. 8:42:26
Módosítás dátuma 2021. 08. 09. 8:42:26

Címkék:

  • Web archiving
  • Public libraries
  • Digital archives
  • Digitization
  • Computers–Internet
  • Internet
  • Technological obsolescence
  • Erdogan, Recep Tayyip
  • Information professionals
  • Newspapers
  • Turkey

Discovery Happens Here: PW Talks with Wikipedia's Jake Orlowitz

Típus Folyóiratcikk
Szerző Anonymous
URL https://search.proquest.com/docview/1940703367?accountid=27464
Kötet 264
Szám 38
Oldalszám 28
Kiadvány Publishers Weekly
ISSN 0000-0019
Dátum 2017-09-15
Egyéb Number: 38
Publisher: PWxyz, LLC
Place: New York
Nyelv en
Kivonat […] we're looking to provide a better experience for our users. […] we're working with partners like the Internet Archive to make sure more than a million URLs are properly archived and functioning; with OCLC to make it possible to cite books automatically, via an ISBN; and with OAdoi and OAbot to make free versions of paywalled sources cited on Wikipedia accessible and easy to find. […] our hope is that readers who engage with Wikipedia will go on to explore the full-text resources cited there, whether in books, repositories, publisher websites, or, of course, in their public or university libraries. […] those edits must pass through machine learning bots running on increasingly sophisticated neural networks looking for common vandalism patterns, through hundreds of language-matching RegEx filters catching bad words, through thousands of human "recent change" patrollers, and through tens of thousands of people's personal article watch lists. There's been tremendous evolution and flux around everything from peer review, to article levels and alternative metrics, open access and business models, creative commons licensing, social media, you name it.
Hozzáadás dátuma 2021. 08. 09. 8:42:25
Módosítás dátuma 2021. 08. 09. 8:42:25

Címkék:

  • Web archiving
  • Library And Information Sciences
  • Archives & records
  • Internet
  • Library collections
  • Information literacy
  • E-books
  • Community
  • Essays
  • Librarians
  • Library associations

Doing Web history with the Internet Archive: screencast documentaries

Típus Folyóiratcikk
Szerző Richard Rogers
URL http://www.tandfonline.com/doi/abs/10.1080/24701475.2017.1307542
Kötet 1
Szám 1-2
Oldalszám 160-172
Kiadvány Internet Histories
ISSN 2470-1475
Dátum 2017-01-02
Egyéb Number: 1-2
Publisher: Routledge
DOI 10.1080/24701475.2017.1307542
Kivonat This short article explores the challenges involved in demonstrating the value of web archives, and the histories that they embody, beyond media and Internet studies. Given the difficulties of working with such complex archival material, how can researchers in the humanities and social sciences more generally be persuaded to integrate Internet histories into their research? How can institutions and organisations be sufficiently convinced of the worth of their own online histories to take steps to preserve them? And how can value be demonstrated to the wider general public? It touches on public attitudes to personal and institutional Internet histories, barriers to access to web archives – technical, legal and methodological – and the cultural factors within academia that have hindered the penetration of new ways of working with new kinds of primary source. Rather than providing answers, this article is intended to provoke discussion and dialogue between the communities for whom Internet histories can and should be of significance.
Hozzáadás dátuma 2021. 08. 09. 8:41:44
Módosítás dátuma 2021. 08. 09. 8:41:44

Durable Top-k Search in Document Archives

Típus Dolgozat
Szerző Leong Hou U
Szerző Nikos Mamoulis
Szerző Klaus Berberich
Szerző Srikanta Bedathur
URL http://doi.acm.org/10.1145/1807167.1807228
Hely New York, NY, USA
Kiadó ACM
Oldalszám 555-566
ISBN 978-1-4503-0032-2
Dátum 2010
Egyéb Series Title: SIGMOD '10
Citation Key: U:2010:DTS:1807167.1807228
DOI 10.1145/1807167.1807228
Kivonat We propose and study a new ranking problem in versioned databases. Consider a database of versioned objects which have different valid instances along a history (e.g., documents in a web archive). Durable top-k search finds the set of objects that are consistently in the top-k results of a query (e.g., a keyword query) throughout a given time interval (e.g., from June 2008 to May 2009). Existing work on temporal top-k queries mainly focuses on finding the most representative top-k elements within a time interval. Such methods are not readily applicable to durable top-k queries. To address this need, we propose two techniques that compute the durable top-k result. The first is adapted from the classic top-k rank aggregation algorithm NRA. The second technique is based on a shared execution paradigm and is more efficient than the first approach. In addition, we propose a special indexing technique for archived data. The index, coupled with a space partitioning technique, improves performance even further. We use data from Wikipedia and the Internet Archive to demonstrate the efficiency and effectiveness of our solutions.
Kiadvány címe Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data
Hozzáadás dátuma 2021. 08. 09. 8:43:37
Módosítás dátuma 2021. 08. 09. 8:43:37

Címkék:

  • temporal queries
  • document archives
  • top-k search

Dynamic Classification in Web Archiving Collections

Típus Dolgozat
Szerző Krutarth Patel
Szerző Cornelia Caragea
Szerző Mark Phillips
URL https://aclanthology.org/2020.lrec-1.182
Hely Marseille, France
Kiadó European Language Resources Association
Oldalszám 1459–1468
ISBN 979-10-95546-34-4
Dátum 2020-05
Hozzáférés 2021. 07. 15. 9:37:39
Könyvtár Katalógus ACLWeb
Konferencia címe LREC 2020
Nyelv en
Kivonat The Web archived data usually contains high-quality documents that are very useful for creating specialized collections of documents. To create such collections, there is a substantial need for automatic approaches that can distinguish the documents of interest for a collection out of the large collections (of millions in size) from Web Archiving institutions. However, the patterns of the documents of interest can differ substantially from one document to another, which makes the automatic classification task very challenging. In this paper, we explore dynamic fusion models to find, on the fly, the model or combination of models that performs best on a variety of document types. Our experimental results show that the approach that fuses different models outperforms individual models and other ensemble methods on three datasets.
Kiadvány címe Proceedings of the 12th Language Resources and Evaluation Conference
Hozzáadás dátuma 2021. 08. 09. 8:43:52
Módosítás dátuma 2021. 08. 09. 8:43:52

Effects of Maximum Flow Algorithm on Identifying Web Community

Típus Dolgozat
Szerző Noriko Imafuji
Szerző Masaru Kitsuregawa
URL http://doi.acm.org/10.1145/584931.584941
Hely New York, NY, USA
Kiadó ACM
Oldalszám 43-48
ISBN 1-58113-593-9
Dátum 2002
Egyéb Series Title: WIDM '02
Citation Key: Imafuji:2002:EMF:584931.584941
DOI 10.1145/584931.584941
Kivonat In this paper, we describe the effects of using maximum flow algorithm on extracting web community from the web. A web community is a set of web pages having a common topic. Since the web can be recognized as a graph that consists of nodes and edges that represent web pages and hyperlinks respectively, so far various graph theoretical approaches have been proposed to extract web communities from the web graph. The method of finding a web community using maximum flow algorithm was proposed by NEC Research Institute in Princeton two years ago. However the properties of web communities derived by this method have been seldom known. To examine the effects of this method, we selected 30 topics randomly and experimented using Japanese web archives crawled in 2000. Through these experiments, it became clear that the method has both advantages and disadvantages. We will describe some strategies to use this method effectively. Moreover, by using same topics, we examined another method that is based on complete bipartite graphs. We compared the web communities obtained by those methods and analyzed those characteristics.
Kiadvány címe Proceedings of the 4th International Workshop on Web Information and Data Management
Hozzáadás dátuma 2021. 08. 09. 8:43:37
Módosítás dátuma 2021. 08. 09. 8:43:37

Címkék:

  • web graph
  • web community
  • maximum-flow algorithm

Efficient Temporal Keyword Search over Versioned Text

Típus Dolgozat
Szerző Avishek Anand
Szerző Srikanta Bedathur
Szerző Klaus Berberich
Szerző Ralf Schenkel
URL http://doi.acm.org/10.1145/1871437.1871528
Hely New York, NY, USA
Kiadó ACM
Oldalszám 699-708
ISBN 978-1-4503-0099-5
Dátum 2010
Egyéb Series Title: CIKM '10
Citation Key: Anand:2010:ETK:1871437.1871528
DOI 10.1145/1871437.1871528
Kivonat Modern text analytics applications operate on large volumes of temporal text data such as Web archives, newspaper archives, blogs, wikis, and micro-blogs. In these settings, searching and mining needs to use constraints on the time dimension in addition to keyword constraints. A natural approach to address such queries is using an inverted index whose entries are enriched with valid-time intervals. It has been shown that these indexes have to be partitioned along time in order to achieve efficiency. However, when the temporal predicate corresponds to a long time range, requiring the processing of multiple partitions, naive query processing incurs high cost of reading of redundant entries across partitions. We present a framework for efficient approximate processing of keyword queries over a temporally partitioned inverted index which minimizes this overhead, thus speeding up query processing. By using a small synopsis for each partition we identify partitions that maximize the number of final non-redundant results, and schedule them for processing early on. Our approach aims to balance the estimated gains in the final result recall against the cost of index reading required. We present practical algorithms for the resulting optimization problem of index partition selection. Our experiments with three diverse, large-scale text archives reveal that our proposed approach can provide close to 80% result recall even when only about half the index is allowed to be read.
Kiadvány címe Proceedings of the 19th ACM International Conference on Information and Knowledge Management
Hozzáadás dátuma 2021. 08. 09. 8:43:37
Módosítás dátuma 2021. 08. 09. 8:43:37

Címkék:

  • partition selection
  • partitioned inverted index
  • synopses
  • time-travel search

Efficient Topical Focused Crawling Through Neighborhood Feature

Típus Folyóiratcikk
Szerző Tanaphol Suebchua
Szerző Bundit Manaskasemsak
Szerző Arnon Rungsawang
Szerző Hayato Yamana
URL http://link.springer.com/10.1007/s00354-017-0029-8
Kötet 36
Szám 2
Oldalszám 95-118
Kiadvány New Generation Computing
ISSN 0288-3635
Dátum 2018-04-15
Egyéb Number: 2
DOI 10.1007/s00354-017-0029-8
Kivonat A focused web crawler is an essential tool for gathering domain-specific data used by national web corpora, vertical search engines, and so on, since it is more efficient than general Breadth-First or Depth-First crawlers. The problem in focused crawling research is the prioritization of unvisited web pages in the crawling frontier followed by crawling these web pages in the order of their priority. The most common feature, adopted in many focused crawling researches, to prioritize an unvisited web page is the relevancy of the set of its source web pages, i.e., its in-linked web pages. However, this feature is limited, because we cannot estimate the relevancy of the unvisited web page correctly if we have few source web pages. To solve this problem and enhance the efficiency of focused web crawlers, we propose a new feature, called the “neighborhood feature”. This enables the adoption of additional already-downloaded web pages to estimate the priority of a target web page. The additionally adopted web pages consist both of web pages located at the same directory as that of the target web page and web pages whose directory paths are similar to that of the target web page. Our experimental results show that our enhanced focused crawlers outperform the crawlers not utilizing the neighborhood feature as well as the state-of-the-art focused crawlers, including HMM crawler.
Hozzáadás dátuma 2021. 08. 09. 8:42:37
Módosítás dátuma 2021. 08. 09. 8:42:37

Címkék:

  • Web archive
  • Domain-specific dataset
  • Focused crawler
  • Vertical search engine

Egyedi mentésekre szolgáló webarchiváló szoftverek

Típus Folyóiratcikk
Szerző László Drótos
Szerző Márton Németh
URL http://ojs.elte.hu/3k/article/view/1371
Kötet 29
Szám 12.
Oldalszám 3-11
Kiadvány Könyv, Könyvtár, Könyvtáros
Dátum Január 5, 2021
Egyéb Number: 12.
Section: Műhelykérdések
Folyóirat rövid neve 3K
Hozzáférés 2021. 08. 04. 2:00:00
Hozzáadás dátuma 2021. 08. 09. 8:44:41
Módosítás dátuma 2021. 08. 09. 8:44:41

Együttműködési lehetőségek a webarchiválás területén.

Típus Folyóiratcikk
Szerző Ákos László Visky
URL http://ojs.elte.hu/kf/article/view/2297
Kötet 67
Szám 1
Oldalszám 39-45
Kiadvány Opportunities for collaboration in the field of web archiving.
ISSN 00233773
Dátum Március 2021
Egyéb Number: 1
Folyóirat rövid neve Library Review / Konyvtari Figyelo
Könyvtár Katalógus EBSCOhost
Kivonat This is the published version of a presentation at the online workshop “404 Not Found – Who is to preserve the internet?”. It describes a collaboration model established within the framework of the Public Collection Digitization Strategy (KDS) between the Web Archive of the National Széchényi Library (NSZL) and some main county libraries. The advantages of collaboration and the co-ordination of workflows among the partners in the field of web archiving are presented. Some further professional discussions and presentations are also mentioned along with proposals for further research and training activities related to NSZL’s Web Archive. [ABSTRACT FROM AUTHOR]
Archívum Library, Information Science & Technology Abstracts
Hozzáadás dátuma 2021. 08. 09. 8:44:41
Módosítás dátuma 2021. 08. 09. 8:44:41

Címkék:

  • Web archiving
  • Web archives
  • Public libraries
  • Digitization
  • Internet
  • National libraries
  • Preservation
  • Development plan
  • Hungary
  • National library
  • Co-operation
  • National archives

Electronic Legal Deposit: Shaping the library collections of the future

Típus Könyv
Szerző Paul Gooding
Szerző Melissa Terras
Kiadó Facet Publishing
ISBN 978-1-78330-377-9
Dátum 2020-10-02
Egyéb Google-Books-ID: 7HoUEAAAQBAJ
Könyvtár Katalógus Google Books
Nyelv en
Kivonat Legal deposit libraries, the national and academic institutions who systematically preserve our written cultural record, have recently been mandated with expanding their collection practices to include digitised and born-digital materials. The regulations that govern electronic legal deposit often also prescribe how these materials can be accessed. Although a growing international activity, there has been little consideration of the impact of e-legal deposit on the 21st Century library, or on its present or future users. This edited collection is a timely opportunity to bring together international authorities who are placed to explore the social, institutional and user impacts of e-legal deposit. It uniquely provides a thorough overview of this worldwide issue at an important juncture in the history of library collections in our changing information landscape, drawing on evidence gathered from real-world case studies produced in collaboration with leading libraries, researchers and practitioners (Biblioteca Nacional de México, Bodleian Libraries, British Library, National Archives of Zimbabwe, National Library of Scotland, National Library of Sweden). Chapters consider the viewpoint of a variety of stakeholders, including library users, researchers, and publishers, and provide overviews of the complex digital preservation and access issues that surround e-legal deposit materials, such as web archives and interactive media. The book will be essential reading for practitioners and researchers in national and research libraries, those developing digital library infrastructures, and potential users of these collections, but also those interested in the long-term implications of how our digital collections are conceived, regulated and used. Electronic legal deposit is shaping our digital library collections, but also their future use, and this volume provides a rigorous account of its implementation and impact.
Rövid cím Electronic Legal Deposit
Terjedelem 272
Hozzáadás dátuma 2021. 08. 09. 8:44:05
Módosítás dátuma 2021. 08. 09. 8:44:05

Címkék:

  • Language Arts & Disciplines / Library & Information Science / Digital & Online Resources

Embracing Web 2.0: Archives and the Newest Generation of Web Applications.

Típus Folyóiratcikk
Szerző Mary Samouelian
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 72
Szám 1
Oldalszám 42-71
Kiadvány American Archivist
ISSN 03609081
Dátum 2009
Egyéb Number: 1
Kivonat Archivists are converting physical collections to digital formats and displaying surrogates of these primary sources on their websites. Simultaneously, the Web is moving toward a shared environment that embraces collective intelligence and participation, which is often called Web 2.0. This paper investigates the extent to which Web 2.0 features have been integrated into archival digitization projects. Although the use of Web 2.0 features has not yet been widely discussed in the professional archival literature, this exploratory study of college and university repository websites in the United States suggests that archival professionals are embracing Web 2.0 to promote their digital content and redefine relationships with their patrons. [ABSTRACT FROM AUTHOR]
Hozzáadás dátuma 2021. 08. 09. 8:42:49
Módosítás dátuma 2021. 08. 09. 8:42:49

Címkék:

  • Web archiving
  • Digitization of archival materials
  • Digital preservation
  • Digitization
  • Institutional repositories
  • Archivists
  • Archival materials
  • University & college archives
  • Collection management (Libraries)
  • Internet publishing
  • Scholarly electronic publishing
  • Scholarly websites
  • Technological innovations
  • Web 2.0

Empirical Research on Web Harvesting in the Process of Text and Data Mining in National Libraries of EU Member States

Típus Folyóiratcikk
Szerző Marinos Papadopoulos
Szerző Maria Botti
Szerző M. A. Paraskevi (Vicky) Ganatsiou
Szerző Christos Zampakolas
URL http://www.scirp.org/journal/Paperabs.aspx?PaperID=98160
Kötet 10
Szám 01
Oldalszám 88
Kiadvány Open Journal of Philosophy
Dátum 2020-02-07
Egyéb Number: 01
Publisher: Scientific Research Publishing
DOI 10.4236/ojpp.2020.101007
Hozzáférés 2021. 07. 15. 10:44:27
Könyvtár Katalógus www.scirp.org
Nyelv en
Kivonat Almost two decades of experience on web harvesting and archiving are counted; the subject of web harvesting and web archiving have been top in the interest of researchers, technologists and librarians-information scientists. Web harvesting projects and pilot programs on archiving content traced on the Web are becoming priorities for national libraries and cultural heritage organizations in the EU. This paper pertains to web harvesting as a process for data mining from web and only through web (“pull” function); this paper elaborates upon research implemented in the framework of the funded research project titled “Web Archiving in Public Libraries and IP Law” that focused on the processes of web-harvesting and archiving as well as Text and Data Mining (TDM) operations in the national libraries of EU Member States. Web archiving as an official operation in national libraries of EU Member States creates web collections and preserves them for the purpose of being accessible and usable in perpetuity. This paper pertains to research on various components of web harvesting and archiving through an online survey (qualitative research) which targeted the national libraries of EU Member States. The research team of authors posed seventeen questions to EU national libraries. The survey output comes from answers delivered by 22 national libraries of EU Member States. The questionnaire was created through the use of Google forms. The researchers reached the EU national libraries via email and follow up telephone calls seeking libraries’ participation in the research. The aim of the research was to delve on participant libraries’ Text and Data Mining operation leveraging on Web harvesting and Web archiving technologies and operations. 
Results analysis reveals that web harvesting is considered among national libraries’ top priorities; the relevant projects increase in number, the web collections become more and more and the technological infrastructures and tools for web harvesting improve. Yet, there are many issues that remain unresolved. A significant number of surveyed libraries consider that legal and technical issues remain the most important to resolve. Access to harvested material is still under legal restrictions. The Directive 2019/790/EU on Copyright in the Digital Single Market (DSM) creates a favorable legal foundation for the deployment of web harvesting operations in national libraries of the EU Member States. TDM technologies make possible new areas of research. Web harvesting that was initially aimed for preservation purposes now expands to unprecedented research of national heritage through state-of-the-art automated TDM processes.
Hozzáadás dátuma 2021. 08. 09. 8:44:12
Módosítás dátuma 2021. 08. 09. 8:44:12

End of Term 2016 Presidential Web Archive

Típus Folyóiratcikk
Szerző Mark E Phillips
Szerző Kristy K Phillips
URL https://search.proquest.com/docview/2077076158?accountid=27464
Kötet 29
Szám 6
Oldalszám 27
Kiadvány Against the Grain
ISSN 1043-2094
Dátum 2018
Egyéb Number: 6
Publisher: Against the Grain, LLC
Place: Associate Dean for Digital Libraries, the University of North Texas; University of North Texas
Nyelv English
Kivonat During every Presidential election in the US since 2008, a group of librarians, archivists, and technologists representing institutions across the nation can be found hard at work, preserving the federal web domain and documenting the changes that occur online during the transition. Anecdotally, evidence exists that the data available on the federal web changes after each election cycle, either as a new president takes office, or when an incumbent president changes messages during the transition into a new term of office. Until 2004, nothing had been done to document this change. Originally, the National Archives and Records Administration (NARA) conducted the first large-scale capture of the federal web at the end of George W. Bush’s first term in office in 2004. This is noteworthy because, while institutions like the Library of Congress, the Government Publishing Office, and NARA itself have web archiving as part of their imperative, none of their mandates are so broad as to cover the capture and preservation of the entirety of the federal web.
Hozzáadás dátuma 2021. 08. 09. 8:42:17
Módosítás dátuma 2021. 08. 09. 8:42:17

Címkék:

  • Web archiving
  • Digital archives
  • Library And Information Sciences
  • Presidential elections

Ensuring Long-Term Access to the Memory of the Web: Preservation Working Group of the International Internet Preservation Consortium

Típus Folyóiratcikk
Szerző Clément Oury
Szerző Tobias Steinke
Szerző Gina Jones
URL https://search.proquest.com/docview/1272325401?accountid=27464
Szám 58
Oldalszám 34-37
Kiadvány International Preservation News
Dátum 2012-12
Egyéb Number: 58
PMID: 1272325401
Publisher: IFLA — International Federation of Library Associations and Institutions
Place: The Hague
Nyelv English
Kivonat Archiving the Web is the process through which documents and objects on the World Wide Web are captured and stored. There are and have been a number of ways through which this has been accomplished, but the end result is archived Web content (Web site, page, or part of a Website) that is preserved for future researchers, historians and the general public. Preservation involves maintaining the ability to present meaningful access to information over time. In the context of Web archives, the intention of preservation is to retain access to archived Web resources, so they can continue to be used and understood despite changes in access technologies or without unacceptable loss of integrity or meaning. The International Internet Preservation Consortium, chartered in 2003, is made up of institutions with basically similar goals of preserving Web content for heritage purposes and which generally share the same harvesting and access tools.
Hozzáadás dátuma 2021. 08. 09. 8:41:39
Módosítás dátuma 2021. 08. 09. 8:41:39

Címkék:

  • World Wide Web
  • Library And Information Sciences
  • Archives & records
  • Research
  • Web sites
  • Migration
  • Preservation
  • Data bases
  • Public access

Entity Extraction and Consolidation for Social Web Content Preservation

Típus Könyv
Szerző Stefan Dietze
Szerző Diana Maynard
Szerző Elena Demidova
Szerző Thomas Risse
Szerző Wim Peters
Szerző Katerina Doka
Szerző Yannis Stavrakas
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Hely United States, North America
Dátum 2012
Egyéb DOI: 10.1.1.423.3432
Kivonat With the rapidly increasing pace at which Web content is evolving, particularly social media, preserving the Web and its evolution over time becomes an important challenge. Meaningful analysis of Web content lends itself to an entity-centric view to organise Web resources according to the information objects related to them. Therefore, the crucial challenge is to extract, detect and correlate entities from a vast number of heterogeneous Web resources where the nature and quality of the content may vary heavily. While a wealth of information extraction tools aid this process, we believe that, the consolidation of automatically extracted data has to be treated as an equally important step in order to ensure high quality and non-ambiguity of generated data. In this paper we present an approach which is based on an iterative cycle exploiting Web data for (1) targeted archiving/crawling of Web objects, (2) entity extraction, and detection, and (3) entity correlation. The long-term goal is to preserve Web content over time and allow its navigation and analysis based on well-formed structured RDF data about entities.
Hozzáadás dátuma 2021. 08. 09. 8:42:46
Módosítás dátuma 2021. 08. 09. 8:42:46

Címkék:

  • Web Archiving
  • Data Consolidation
  • Data Enrichment
  • Entity Recognition
  • Linked Data

Erasing history. (Cover story)

Típus Folyóiratcikk
Szerző Maria Bustillos
Szerző Shannon Freshwater
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 57
Szám 1
Oldalszám 112-118
Kiadvány Columbia Journalism Review
ISSN 0010194X
Dátum 2018
Egyéb Number: 1
Kivonat The article discusses digital journalism, focusing on the failure of the online news outlet "The Honolulu Advertiser". The author discusses the history of digital archiving systems, the role played by the U.S. government in protecting digital archival documents, and technological innovations that protect internet archives.
Hozzáadás dátuma 2021. 08. 09. 8:42:42
Módosítás dátuma 2021. 08. 09. 8:42:42

Címkék:

  • WEB archives
  • ARCHIVES — United States
  • BUSINESS failures
  • HISTORY
  • HONOLULU Advertiser (Newspaper)
  • NEWS websites
  • ONLINE journalism
  • TECHNOLOGICAL innovations in journalism

Ereviews

Típus Folyóiratcikk
Szerző Henrietta Verma
Szerző Gary Price
URL https://search.proquest.com/docview/1964143235?accountid=27464
Kötet 142
Szám 19
Oldalszám 100
Kiadvány Library Journal
ISSN 03630277
Dátum 2017-11-15
Egyéb Number: 19
Publisher: Media Source
Place: New York
Nyelv English
Kivonat According to IA founder Brewster Kahle, the BPL collection includes "hillbilly music, early brass bands, and accordion recordings from the turn of the last century, offering an authentic audio portrait of how America sounded a century ago. The Presidential Records Act has in the past been understood to mean that executive branch administrative communication must be archived, but the U.S. Justice Department is moving to dismiss the lawsuit, saying that the president has authority over what is saved in accordance with the act. […] FCW, a publication for federal technology executives, quotes Jason R. Baron, formerly chief litigator for the National Archives and Records Administration: "If White House counsel reads [the statute] narrowly…resulting in White House staff not being required to copy or transfer presidential records to an official electronic account before individual communications self-destruct, is that decision reviewable?" For further information on this case, see ow.ly/RAnE30fT8Yc.
Hozzáadás dátuma 2021. 08. 09. 8:42:18
Módosítás dátuma 2021. 08. 09. 8:42:18

Címkék:

  • Web archiving
  • Communication
  • Digital archives
  • Library And Information Sciences
  • Digitization
  • Books
  • Internet
  • Library collections
  • Metadata
  • Copyright
  • Litigation
  • Online instruction

Estimating PageRank deviations in crawled graphs

Típus Folyóiratcikk
Szerző Helge Holzmann
Szerző Avishek Anand
Szerző Megha Khosla
URL https://doi.org/10.1007/s41109-019-0201-9
Kötet 4
Szám 1
Oldalszám 86
Kiadvány Applied Network Science
ISSN 2364-8228
Dátum 2019
Egyéb Number: 1
DOI 10.1007/s41109-019-0201-9
Kivonat Most real-world graphs collected from the Web like Web graphs and social network graphs are partially discovered or crawled. This leads to inaccurate estimates of graph properties based on link analysis such as PageRank. In this paper we focus on studying such deviations in ordering/ranking imposed by PageRank over crawled graphs. We first show that deviations in rankings induced by PageRank are indeed possible. We measure how much a ranking, induced by PageRank, on an input graph could deviate from the original unseen graph. More importantly, we are interested in conceiving a measure that approximates the rank correlation among them without any knowledge of the original graph. To this extent we formulate the HAK measure that is based on computing the impact redistribution of PageRank according to the local graph structure. We further propose an algorithm that identifies connected subgraphs over the input graph for which the relative ordering is preserved. Finally, we perform extensive experiments on both real-world Web and social network graphs with more than 100M vertices and 10B edges as well as synthetic graphs to showcase the utility of HAK and our High-fidelity Component Selection approach.
Hozzáadás dátuma 2021. 08. 09. 8:43:39
Módosítás dátuma 2021. 08. 09. 8:43:39

Címkék:

  • Crawls
  • PageRank
  • Ranking deviations

Ether Today, Gone Tomorrow: 21st Century Sound Recording Collection in Crisis

Típus Folyóiratcikk
Szerző Judy Tsou
Szerző John Vallier
URL https://search.proquest.com/docview/1761140761?accountid=27464
Kötet 72
Szám 3
Oldalszám 461-483
Kiadvány Music Library Association. Notes
ISSN 00274380
Dátum 2016-03
Egyéb Number: 3
Publisher: Music Library Association
Place: Philadelphia
Nyelv English
Kivonat Today's music industry increasingly favors online-only, direct-to-consumer distribution. No longer can librarians expect to collect recordings on tangible media where first-sale doctrine applies. Instead, at an ever-increasing rate, librarians are discovering that music recordings are available only via such online distribution sites as iTunes or Amazon.com. These distributors require individual purchasers to agree to restrictive end-user license agreements (EULAs) that explicitly forbid institutional ownership and such core library functions as lending. What does this mean for the future of music libraries? The coauthors present an overview of an Institute of Museum and Library Services (IMLS) funded project tasked with investigating the issue, and recommend a series of next steps designed to build our professional capacity toward addressing the challenge.
Hozzáadás dátuma 2021. 08. 09. 8:42:35
Módosítás dátuma 2021. 08. 09. 8:42:35

Címkék:

  • Web archiving
  • Academic libraries
  • Archives & records
  • Library collections
  • Cultural heritage
  • Librarians
  • Library associations
  • Apple iTunes
  • Blues music
  • Emergency preparedness
  • Motion pictures
  • Music libraries
  • Musical recordings
  • Online sales
  • Public access
  • Sound Recording And Reproduction
  • Streaming media

Ethical Challenges and Current Practices in Activist Social Media Archives

Típus Folyóiratcikk
Szerző Ashlyn Velte
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 81
Szám 1
Oldalszám 112-134
Kiadvány The American Archivist
ISSN 0360-9081
Dátum 2018-03
Egyéb Number: 1
DOI 10.17723/0360-9081-81.1.112
Kivonat Social media (Web applications supporting communication between Internet users) empower current activist groups to create records of their activities. Recent digital collections, such as the digital archives of the Occupy Wall Street movement and the Documenting Ferguson Project, demonstrate archival interest in preserving and providing access to activist social media. Literature describing current practices exists for related topics such as Web and social media archives, privacy and access for digital materials, and activist archives. However, research on activist social media archives is scarce. These materials likely present subject- and format-specific challenges not yet identified in peer-reviewed research. Using a survey and semistructured interviews with archivists who collect activist social media, this study describes ethical challenges regarding acquisition and access. Specifically, the respondents were concerned about acquiring permission to collect and provide long-term access to activist groups' social media. When collecting social media as data sets, archivists currently intend to provide moderated access to the archives, whereas when dealing with social media accounts, archivists intend to seek permission to collect from the activist groups and provide access online. These current practices addressing ethical issues may serve as models for other institutions interested in collecting social media from activists. Understanding how to approach activist social media ethically decreases the risk that these important records of modern activism will be left out of the historical narrative. [ABSTRACT FROM AUTHOR]
Hozzáadás dátuma 2021. 08. 09. 8:42:41
Módosítás dátuma 2021. 08. 09. 8:42:41

Címkék:

  • Web archives
  • Archival theory and principles
  • Copyright and intellectual property
  • Digital preservation
  • Ethics
  • Privacy and confidentiality
  • Social media archives

Evaluating sliding and sticky target policies by measuring temporal drift in acyclic walks through a web archive

Típus Folyóiratcikk
Szerző Scott G. Ainsworth
Szerző Michael L. Nelson
URL https://search.proquest.com/docview/1681852984?accountid=27464
Kötet 16
Szám 2
Oldalszám 129-144
Kiadvány International Journal on Digital Libraries
ISSN 14325012
Dátum 2015-06-05
Egyéb Number: 2
PMID: 1681852984
Publisher: Springer Science & Business Media
Place: Heidelberg
DOI 10.1007/s00799-014-0120-4
Nyelv English
Kivonat When viewing an archived page using the archive’s user interface (UI), the user selects a datetime to view from a list. The archived web page, if available, is then displayed. From this display, the web archive UI attempts to simulate the web browsing experience by smoothly transitioning between archived pages. During this process, the target datetime changes with each link followed, potentially drifting away from the datetime originally selected. For sparsely archived resources, this almost transparent drift can be many years in just a few clicks. We conducted 200,000 acyclic walks of archived pages, following up to 50 links per walk, comparing the results of two target datetime policies. The Sliding Target policy allows the target datetime to change as it does in archive UIs such as the Internet Archive’s Wayback Machine. The Sticky Target policy, represented by the Memento API, keeps the target datetime the same throughout the walk. We found that the Sliding Target policy drift increases with the number of walk steps, number of domains visited, and choice (number of links available). However, the Sticky Target policy controls temporal drift, holding it to <30 days on average regardless of walk length or number of domains visited. The Sticky Target policy shows some increase as choice increases, but this may be caused by other factors. We conclude that based on walk length, the Sticky Target policy generally produces at least 30 days less drift than the Sliding Target policy.
Hozzáadás dátuma 2021. 08. 09. 8:42:28
Módosítás dátuma 2021. 08. 09. 8:42:28

Címkék:

  • Digital libraries
  • Library And Information Sciences–Computer Applica
  • Digital archives
  • Temporal logic

Evaluating Sliding and Sticky Target Policies by Measuring Temporal Drift in Acyclic Walks Through a Web Archive

Típus Dolgozat
Szerző Scott G Ainsworth
Szerző Michael L Nelson
URL http://doi.acm.org/10.1145/2467696.2467718
Hely New York, NY, USA
Kiadó ACM
Oldalszám 39-48
ISBN 978-1-4503-2077-1
Dátum 2013
Egyéb Series Title: JCDL '13
Citation Key: Ainsworth:2013:ESS:2467696.2467718
DOI 10.1145/2467696.2467718
Kivonat When a user views an archived page using the archive's user interface (UI), the user selects a datetime to view from a list. The archived web page, if available, is then displayed. From this display, the web archive UI attempts to simulate the web browsing experience by smoothly transitioning between archived pages. During this process, the target datetime changes with each link followed; drifting away from the datetime originally selected. When browsing sparsely-archived pages, this nearly-silent drift can be many years in just a few clicks. We conducted 200,000 acyclic walks of archived pages, following up to 50 links per walk, comparing the results of two target datetime policies. The Sliding Target policy allows the target datetime to change as it does in archive UIs such as the Internet Archive's Wayback Machine. The Sticky Target policy, represented by the Memento API, keeps the target datetime the same throughout the walk. We found that the Sliding Target policy drift increases with the number of walk steps, number of domains visited, and choice (number of links available). However, the Sticky Target policy controls temporal drift, holding it to less than 30 days on average regardless of walk length or number of domains visited. The Sticky Target policy shows some increase as choice increases, but this may be caused by other factors. We conclude that based on walk length, the Sticky Target policy generally produces at least 30 days less drift than the Sliding Target policy.
Kiadvány címe Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:18
Módosítás dátuma 2021. 08. 09. 8:43:18

Címkék:

  • web archiving
  • digital preservation
  • http
  • web architecture
  • resource versioning
  • temporal applications

EventSearch: A System for Event Discovery and Retrieval on Multi-type Historical Data

Típus Dolgozat
Szerző Dongdong Shan
Szerző Wayne Xin Zhao
Szerző Rishan Chen
Szerző Baihan Shu
Szerző Ziqi Wang
Szerző Junjie Yao
Szerző Hongfei Yan
Szerző Xiaoming Li
URL http://doi.acm.org/10.1145/2339530.2339781
Hely New York, NY, USA
Kiadó ACM
Oldalszám 1564-1567
ISBN 978-1-4503-1462-6
Dátum 2012
Egyéb Series Title: KDD '12
Citation Key: Shan:2012:ESE:2339530.2339781
DOI 10.1145/2339530.2339781
Kivonat We present EventSearch, a system for event extraction and retrieval on four types of news-related historical data, i.e., Web news articles, newspapers, TV news programs, and micro-blog short messages. The system incorporates over 11 million web pages extracted from "Web InfoMall", the Chinese Web Archive since 2001. The newspaper and TV news video clips also span from 2001 to 2011. The system, upon a user query, returns a list of event snippets from multiple data sources. A novel burst model is used to discover events from time-stamped texts. In addition to offline event extraction, our system also provides online event extraction to further meet the user needs. EventSearch provides meaningful analytics that synthesize an accurate description of events. Users interact with the system by ranking the identified events using different criteria (scale, recency and relevance) and submitting their own information needs in different input fields.
Kiadvány címe Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Hozzáadás dátuma 2021. 08. 09. 8:43:10
Módosítás dátuma 2021. 08. 09. 8:43:10

Címkék:

  • event detection
  • event search

EverLast: A Distributed Architecture for Preserving the Web

Típus Dolgozat
Szerző Avishek Anand
Szerző Srikanta Bedathur
Szerző Klaus Berberich
Szerző Ralf Schenkel
Szerző Christos Tryfonopoulos
URL http://doi.acm.org/10.1145/1555400.1555455
Hely New York, NY, USA
Kiadó ACM
Oldalszám 331-340
ISBN 978-1-60558-322-8
Dátum 2009
Egyéb Series Title: JCDL '09
Citation Key: Anand:2009:EDA:1555400.1555455
DOI 10.1145/1555400.1555455
Kivonat The World Wide Web has become a key source of knowledge pertaining to almost every walk of life. Unfortunately, much of the data on the Web is highly ephemeral in nature, with 50-80% of content estimated to be changing within a short time. Continuing the pioneering efforts of many national (digital) libraries, organizations such as the International Internet Preservation Consortium (IIPC), the Internet Archive (IA) and the European Archive (EA) have been tirelessly working towards preserving the ever changing Web. However, while these web archiving efforts have paid significant attention towards long term preservation of Web data, they have paid little attention to developing a global-scale infrastructure for collecting, archiving, and performing historical analyses on the collected data. Based on insights from our recent work on building text analytics for Web Archives, we propose EverLast, a scalable distributed framework for next generation Web archival and temporal text analytics over the archive. Our system is built on a loosely-coupled distributed architecture that can be deployed over large-scale peer-to-peer networks. In this way, we allow the integration of many archival efforts taken mainly at a national level by national digital libraries. Key features of EverLast include support of time-based text search & analysis and the use of human-assisted archive gathering. In this paper, we outline the overall architecture of EverLast, and present some promising preliminary results.
Kiadvány címe Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:35
Módosítás dátuma 2021. 08. 09. 8:43:35

Címkék:

  • web archives
  • crawling
  • indexing
  • time-travel search

Everything on the Internet can be saved: Archive Team and the death/resurrection of Tumblr NSFW

Típus Folyóiratcikk
Szerző Jessica Ogden
URL https://research-information.bris.ac.uk/en/publications/everything-on-the-internet-can-be-saved-archive-team-and-the-deat
Kiadvány Internet Histories
ISSN 2470-1483
Dátum 2020/10/03
Egyéb Publisher: Taylor & Francis Group
Hozzáférés 2021. 07. 15. 10:17:54
Könyvtár Katalógus research-information.bris.ac.uk
Nyelv English
Rövid cím ‘Everything on the Internet can be saved’
Hozzáadás dátuma 2021. 08. 09. 8:44:06
Módosítás dátuma 2021. 08. 09. 8:44:06

Evolution of legal deposit in New Zealand

Típus Folyóiratcikk
Szerző Jhonny Antonio Pabón Cadavid
URL https://search.proquest.com/docview/1979964191?accountid=27464
Kötet 43
Szám 4
Oldalszám 379-390
Kiadvány IFLA Journal
ISSN 0340-0352
Dátum 2017-12
Egyéb Number: 4
Publisher: Sage Publications Ltd.
Place: Universidad Externado de Colombia, Colombia; Victoria University of Wellington, New Zealand
DOI 10.1177/0340035217713763
Nyelv English
Kivonat The evolution of legal deposit shows changes and challenges in collecting, access to and use of documentary heritage. Legal deposit emerged in New Zealand at the beginning of the 20th century with the aim of preserving print publications mainly for the use of a privileged part of society. In the 21st century legal deposit has evolved to include the safeguarding of electronic resources and providing access to the documentary heritage for all New Zealanders. The National Library of New Zealand has acquired new functions for a proper stewardship of digital heritage. E-deposit and web harvesting are two new mechanisms for collecting New Zealand publications. The article proposes that legal deposit through human rights and multiculturalism should involve different communities of heritage in web curation.
Hozzáadás dátuma 2021. 08. 09. 8:42:19
Módosítás dátuma 2021. 08. 09. 8:42:19

Címkék:

  • web archiving
  • Library And Information Sciences
  • Cultural pluralism
  • Digital heritage
  • Human rights
  • Legal deposit
  • Multiculturalism & pluralism
  • national library
  • New Zealand
  • Publications
  • Twenty first century

Experimenting with computational methods for large-scale studies of tracking technologies in web archives

Típus Folyóiratcikk
Szerző Janne Nielsen
URL https://www.tandfonline.com/doi/full/10.1080/24701475.2019.1671074
Kötet 3
Szám 3-4
Oldalszám 293-315
Kiadvány Internet Histories
ISSN 2470-1475
Dátum 2019-10-02
Egyéb Number: 3-4
DOI 10.1080/24701475.2019.1671074
Hozzáadás dátuma 2021. 08. 09. 8:43:41
Módosítás dátuma 2021. 08. 09. 8:43:41

Címkék:

  • big data
  • computational methods
  • historiography
  • Web history
  • web tracking

Exploiting the Social and Semantic Web for Guided Web Archiving

Típus Könyvfejezet
Szerző Thomas Risse
Szerző Stefan Dietze
Szerző Wim Peters
Szerző Katerina Doka
Szerző Yannis Stavrakas
Szerző Pierre Senellart
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Hely Germany, Europe
Kiadó Heidelberg : Springer Verlag
Oldalszám 426-432
Dátum 2012
Egyéb DOI: 10.1007/978-3-642-33290-6_47
Kivonat The constantly growing amount of Web content and the success of the Social Web lead to increasing needs for Web archiving. These needs go beyond the pure preservation of Web pages. Web archives are turning into "community memories" that aim at building a better understanding of the public view on, e.g., celebrities, court decisions, and other events. In this paper we present the ARCOMEM architecture that uses semantic information such as entities, topics, and events complemented with information from the social Web to guide a novel Web crawler. The resulting archives are automatically enriched with semantic meta-information to ease the access and allow retrieval based on conditions that involve high-level concepts. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-33290-6_47.
Hozzáadás dátuma 2021. 08. 09. 8:42:48
Módosítás dátuma 2021. 08. 09. 8:42:48

Címkék:

  • Web archives
  • Web Archiving
  • Digital libraries
  • Artificial intelligence
  • Court decisions
  • ddc:004
  • Meta information
  • Semantic information
  • Social Web
  • Text Analysis
  • Web content
  • Web Crawler

Exploring Online Diasporas: London’s French and Latin American Communities in the UK Web Archive

Típus Könyvfejezet
Szerző Saskia Huc-Hepher
Szerző Naomi Wells
Szerkesztő Daniel Gomes
Szerkesztő Elena Demidova
Szerkesztő Jane Winters
Szerkesztő Thomas Risse
URL https://doi.org/10.1007/978-3-030-63291-5_15
Hely Cham
Kiadó Springer International Publishing
Oldalszám 189-201
ISBN 978-3-030-63291-5
Dátum 2021
Egyéb DOI: 10.1007/978-3-030-63291-5_15
Hozzáférés 2021. 07. 15. 9:52:26
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat The aim of the UK Web Archive to collect and preserve the entire UK web domain ensures that it is able to reflect the diversity of voices and communities present on the open Web, including migrant communities who sustain a presence across digital and physical environments. At the same time, patterns of wider social and political exclusion, as well as the use of languages other than English, mean these communities’ web presence is often overlooked in more generic and Anglophone web archiving and (re)searching practices.
Könyv címe The Past Web: Exploring Web Archives
Rövid cím Exploring Online Diasporas
Hozzáadás dátuma 2021. 08. 09. 8:43:59
Módosítás dátuma 2021. 08. 09. 8:43:59

Exploring special web archives collections related to COVID-19: The case of the National Széchényi Library in Hungary

Típus Interjú
Interjú alanya Márton Németh
Interjú készítője Friedel Geeraert
URL https://cc.au.dk/fileadmin/user_upload/WARCnet/Geeraert_et_al_COVID-19_Hungary.pdf
Dátum 2020
Hozzáférés 2021. 08. 04. 2:00:00
Médium WARCnet Papers
Nyelv English
Hozzáadás dátuma 2021. 08. 09. 8:44:41
Módosítás dátuma 2021. 08. 09. 8:44:41

Exploring the 20-year evolution of a research community: web-archives as essential sources for historical research

Típus Folyóiratcikk
Szerző Niels Brügger
Szerző Valerie Schafer
Szerző Friedel Geeraert
Szerző Nadège Isbergue
Szerző Sally Chambers
URL https://orbilu.uni.lu/handle/10993/43903
Kötet 2
Kiadvány Cahiers de la documentation
ISSN 0007-9804
Dátum 2020-07
Egyéb Publisher: Belgium
Hozzáférés 2021. 07. 15. 10:39:51
Könyvtár Katalógus orbilu.uni.lu
Nyelv en
Rövid cím Exploring the 20-year evolution of a research community
Hozzáadás dátuma 2021. 08. 09. 8:44:11
Módosítás dátuma 2021. 08. 09. 8:44:11

Exploring the Past of the Web: Alexandria & Archive-it Hackathon

Típus Dolgozat
Szerző Avishek Anand
Szerző Jefferson Bailey
URL http://doi.acm.org/10.1145/2908131.2908212
Hely New York, NY, USA
Kiadó ACM
Oldalszám 14
ISBN 978-1-4503-4208-7
Dátum 2016
Egyéb Series Title: WebSci '16
Citation Key: Anand:2016:EPW:2908131.2908212
DOI 10.1145/2908131.2908212
Kivonat The Web has pervaded all walks of life and has become an important corpus for studying the humanities, social sciences, and for use by computer scientists and other disciplines. Web archives collect, preserve, and provide ongoing access to ephemeral Web pages and hence encode traces of human thought, activity, and history. This makes them a valuable resource for analysis and study. However, there have been only a few concerted efforts to bring together tools, platforms, storage, processing frameworks, and existing collections for mining and analysing Web archives.
Kiadvány címe Proceedings of the 8th ACM Conference on Web Science
Hozzáadás dátuma 2021. 08. 09. 8:43:08
Módosítás dátuma 2021. 08. 09. 8:43:08

Exploring Web Archives Through Temporal Anchor Texts

Típus Dolgozat
Szerző Helge Holzmann
Szerző Wolfgang Nejdl
Szerző Avishek Anand
URL http://doi.acm.org/10.1145/3091478.3091500
Hely New York, NY, USA
Kiadó ACM
Oldalszám 289-298
ISBN 978-1-4503-4896-6
Dátum 2017
Egyéb Series Title: WebSci '17
Citation Key: Holzmann:2017:EWA:3091478.3091500
DOI 10.1145/3091478.3091500
Kivonat Web archives have been instrumental in digital preservation of the Web and provide a great opportunity for the study of the societal past and evolution. These Web archives are massive collections, typically on the order of terabytes and petabytes. Due to this, search and exploration of archives has been limited, as full-text indexing is both resource-intensive and computationally expensive. We identify that for typical access methods to archives, which are navigational and temporal in nature, we do not always require indexing full-text. Instead, meaningful text surrogates like anchor texts already go a long way in providing meaningful solutions and can act as reasonable entry points to exploring Web archives. In this paper, we present a new approach to searching Web archives based on temporal link graphs and corresponding anchor texts. Departing from traditional informational intents, we show how temporal anchor texts can be effective in answering queries beyond purely navigational intents, like finding the most central webpages of an entity in a given time period. We propose indexing methods and a temporal retrieval model based on anchor texts. Further, we discuss several interesting search results as well as one experiment in which we demonstrate how such results can be integrated in a data processing workflow to scale up to thousands of pages. In this analysis we were able to replicate results reported by an offline study, showing that restaurant prices indeed increased in Germany when the Euro was introduced as Europe's currency.
Kiadvány címe Proceedings of the 2017 ACM on Web Science Conference
Hozzáadás dátuma 2021. 08. 09. 8:43:14
Módosítás dátuma 2021. 08. 09. 8:43:14

Címkék:

  • web archives
  • big data analysis
  • temporal information retrieval

Extracting Evolution of Web Communities from a Series of Web Archives

Típus Dolgozat
Szerző Masashi Toyoda
Szerző Masaru Kitsuregawa
URL http://doi.acm.org/10.1145/900051.900059
Hely New York, NY, USA
Kiadó ACM
Oldalszám 28-37
ISBN 1-58113-704-4
Dátum 2003
Egyéb Series Title: HYPERTEXT '03
Citation Key: Toyoda:2003:EEW:900051.900059
DOI 10.1145/900051.900059
Kivonat Recent advances in storage technology make it possible to store a series of large Web archives. It is now an exciting challenge for us to observe evolution of the Web. In this paper, we propose a method for observing evolution of web communities. A web community is a set of web pages created by individuals or associations with a common interest on a topic. So far, various link analysis techniques have been developed to extract web communities. We analyze evolution of web communities by comparing four Japanese web archives crawled from 1999 to 2002. Statistics of these archives and community evolution are examined, and the global behavior of evolution is described. Several metrics are introduced to measure the degree of web community evolution, such as growth rate, novelty, and stability. We developed a system for extracting detailed evolution of communities using these metrics. It allows us to understand when and how communities emerged and evolved. Some evolution examples are shown using our system.
Kiadvány címe Proceedings of the Fourteenth ACM Conference on Hypertext and Hypermedia
Hozzáadás dátuma 2021. 08. 09. 8:43:09
Módosítás dátuma 2021. 08. 09. 8:43:09

Címkék:

  • web
  • link analysis
  • evolution
  • web community

Extracting Online Publications Embedded in Websites: NDL Initiatives and Challenges

Típus Előadás
Előadó INOIE Nobuaki
Előadó SHIBATA Masaki
Előadó KUDO Tetsuro
URL https://origin-www.ifla.org/files/assets/information-technology/Webinars/ifla_professional_units_virtual_events_-_inoie-en.pdf
Hely Dublin
Dátum 2020
Hozzáférés 2021. 08. 06. 2:00:00
Találkozó neve IFLA WLIC 2020
Nyelv en
Kivonat The National Diet Library (NDL) has been operating the Web ARchiving Project (WARP) since 2002 and steadily archiving Japanese websites. However, it is often difficult for users to find e-books, ezines and other online publications embedded in websites, because they are stored as a part of websites and do not have sufficient metadata.
Hozzáadás dátuma 2021. 08. 09. 8:44:09
Módosítás dátuma 2021. 08. 09. 8:44:09

Factors Affecting Website Reconstruction from the Web Infrastructure

Típus Dolgozat
Szerző Frank McCown
Szerző Norou Diawara
Szerző Michael L Nelson
URL http://doi.acm.org/10.1145/1255175.1255182
Hely New York, NY, USA
Kiadó ACM
Oldalszám 39-48
ISBN 978-1-59593-644-8
Dátum 2007
Egyéb Series Title: JCDL '07
Citation Key: McCown:2007:FAW:1255175.1255182
DOI 10.1145/1255175.1255182
Kivonat When a website is suddenly lost without a backup, it may be reconstituted by probing web archives and search engine caches for missing content. In this paper we describe an experiment where we crawled and reconstructed 300 randomly selected websites on a weekly basis for 14 weeks. The reconstructions were performed using our web-repository crawler named Warrick, which recovers missing resources from the Web Infrastructure (WI), the collective preservation effort of web archives and search engine caches. We examine several characteristics of the websites over time including birth rate, decay and age of resources. We evaluate the reconstructions when compared to the crawled sites and develop a statistical model for predicting reconstruction success from the WI. On average, we were able to recover 61% of each website's resources. We found that Google's PageRank, number of hops and resource age were the three most significant factors in determining if a resource would be recovered from the WI.
Kiadvány címe Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:20
Módosítás dátuma 2021. 08. 09. 8:43:20

Címkék:

  • web archiving
  • digital preservation
  • search engine caches

Fair Use, Notice Failure, and the Limits of Copyright as Property

Típus Folyóiratcikk
Szerző Joseph P. Liu
URL http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=118226656&lang=hu&site=ehost-live
Kötet 96
Szám 3
Oldalszám 833-856
Kiadvány Boston University Law Review
ISSN 0006-8047
Dátum May 2016
Egyéb Number: 3
Publisher: Boston University, School of Law
Folyóirat rövid neve Boston University Law Review
Hozzáférés 2020. 08. 17. 8:58:54
Könyvtár Katalógus EBSCOhost
Kivonat If we start with the assumption that copyright law creates a system of property rights, to what extent does this system give adequate notice to third parties regarding the scope of such rights, particularly given the prominent role played by the fair use doctrine? This essay argues that, although the fair use doctrine may provide adequate notice to sophisticated third parties, it fails to provide adequate notice to less sophisticated parties. Specifically, the fair use doctrine imposes nearly insuperable informational burdens upon the general public regarding the scope of the property entitlement and the corresponding duty to avoid infringement. Moreover, these burdens have only increased with changes in technology that enable more, and more varied, uses of copyrighted works. The traditional response to uncertainty in fair use has been to suggest ways of curing the notice failure by providing clearer rules about what is and is not permitted. This essay suggests, however, that these efforts to reinforce the property framework feel increasingly strained and fail to reflect how copyright law is actually experienced by the general public. Indeed, the extent of the notice failure is such that it may be time to stop treating copyright like a property right, at least for certain classes of users. The essay ends by suggesting a number of alternative frameworks that would seek to regulate public behavior regarding copyrighted works without imposing the unrealistic informational burdens required by a system of property rights.
Hozzáadás dátuma 2021. 08. 09. 8:43:43
Módosítás dátuma 2021. 08. 09. 8:43:43

Címkék:

  • COPYRIGHT infringement
  • COPYRIGHT notices
  • COPYRIGHT of digital media
  • FAIR use (Copyright)
  • PROPERTY rights

Featured Web Resource: Theological Commons

Típus Folyóiratcikk
Szerző Gregory P Murray
URL https://search.proquest.com/docview/1842842888?accountid=27464
Kötet 9
Szám 2
Oldalszám 1
Kiadvány Theological Librarianship
ISSN 1937-8904
Dátum 2016-10
Egyéb Number: 2
Publisher: American Theological Library Association
Place: Chicago
Nyelv English
Kivonat In late 2010, Dr Iain Torrance, at that time the President of Princeton Theological Seminary, asked a small subset of library staff to consider how to improve discoverability and access to the thousands of volumes on theology and religion that Princeton Seminary and other institutions had digitized through the Internet Archive, to facilitate research by students, scholars, and pastors both locally and globally. However, because the goal was to provide access to relevant resources, not to showcase Princeton's digital content, the digital library team subsequently took a detailed list of Library of Congress subject headings provided by Don Vorp, at that time Collection Development Librarian at Princeton Seminary, and performed searches in the Internet Archive system for digitized books with those subjects, irrespective of library of origin. Those items were then harvested in the same manner. This procedure soon amassed tens of thousands of digital texts, and in March 2012, the Theological Commons was publicly released as a free, web-accessible digital library.
Hozzáadás dátuma 2021. 08. 09. 8:41:42
Módosítás dátuma 2021. 08. 09. 8:41:42

Címkék:

  • Web archiving
  • Collection development
  • Digital libraries
  • Digitization
  • Archives & records
  • Internet resources
  • Access to materials
  • Princeton New Jersey
  • Religions And Theology
  • Theological schools

Felieton "archiwalny" – ponownie po pięciu latach

Típus Folyóiratcikk
Szerző Lidia Derfert-Wolf
Szerző Marcin Wilkowski
URL https://search.proquest.com/docview/1951541363?accountid=27464
Szám 172
Oldalszám 1
Kiadvány Elektroniczny Biuletyn Informacyjny Bibliotekarzy : EBIB
Dátum 2017
Egyéb Number: 172
Publisher: Stowarzyszenie Bibliotekarzy Polskich
Place: Biblioteka Główna Uniwersytetu Technologiczno-Przyrodniczego w Bydgoszczy Stowarzyszenie EBIB ; Laboratorium Cyfrowe Humanistyki Uniwersytet Warszawski
Nyelv Polish
Hozzáadás dátuma 2021. 08. 09. 8:42:32
Módosítás dátuma 2021. 08. 09. 8:42:32

Címkék:

  • Library And Information Sciences

Finding Pages on the Unarchived Web

Típus Dolgozat
Szerző Hugo C Huurdeman
Szerző Anat Ben-David
Szerző Jaap Kamps
Szerző Thaer Samar
Szerző Arjen P de Vries
URL http://dl.acm.org/citation.cfm?id=2740769.2740827
Hely Piscataway, NJ, USA
Kiadó IEEE Press
Oldalszám 331-340
ISBN 978-1-4799-5569-5
Dátum 2014
Egyéb Series Title: JCDL '14
Citation Key: Huurdeman:2014:FPU:2740769.2740827
Kivonat Web archives preserve the fast changing Web, yet are highly incomplete due to crawling restrictions, crawling depth and frequency, or restrictive selection policies—most of the Web is unarchived and therefore lost to posterity. In this paper, we propose an approach to recover significant parts of the unarchived Web, by reconstructing descriptions of these pages based on links and anchors in the set of crawled pages, and experiment with this approach on the Dutch Web archive. Our main findings are threefold. First, the crawled Web contains evidence of a remarkable number of unarchived pages and websites, potentially dramatically increasing the coverage of the Web archive. Second, the link and anchor descriptions have a highly skewed distribution: popular pages such as home pages have more terms, but the richness tapers off quickly. Third, the succinct representation is generally rich enough to uniquely identify pages on the unarchived Web: in a known-item search setting we can retrieve these pages within the first ranks on average.
Kiadvány címe Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:35
Módosítás dátuma 2021. 08. 09. 8:43:35

Címkék:

  • web archiving
  • web archives
  • information retrieval
  • anchor text
  • link evidence
  • web crawlers

Finding the unfound: Recovery of missing URLs through Internet Archive

Típus Folyóiratcikk
Szerző Vinay D Kumar
Szerző B T Sampath Kumar
URL https://search.proquest.com/docview/2073135310?accountid=27464
Kötet 64
Szám 3
Oldalszám 165
Kiadvány Annals of Library and Information Studies
ISSN 0972-5423
Dátum 2017-09
Egyéb Number: 3
Publisher: National Institute of Science Communication & Information Resources
Place: Lecturer, Department of Library and Information Science, Kuvempu University, Jnanasahyadri, Shankaraghatta, Karnataka, India ; Professor, Department of Library and Information Science, Tumkur University, Tumakuru, Karnataka, India ; Lecturer, Department of
Nyelv English
Kivonat The study investigated the accessibility and permanency of citations containing URLs in the articles published in the DESIDOC Journal of Library and Information Technology during 2006-2015. A total of 2133 URL citations were identified, out of which 823 were found to be incorrect or missing. HTTP-404 was the most common error message associated with the missing URLs. The study also tried to recover the incorrect or missing URL citations using the Internet Archive and recovered a total of 484 (58.81%) missing URL citations.
Hozzáadás dátuma 2021. 08. 09. 8:42:24
Módosítás dátuma 2021. 08. 09. 8:42:24

Címkék:

  • Web archiving
  • Library And Information Sciences
  • Archives & records
  • URLs

First crawling of the Slovenian National web domain *.si: pitfalls, obstacles and challenges

Típus Dolgozat
Szerző Matjaž KRAGELJ
Szerző Mitja KOVAČIČ
Hely Cape Town
Kiadó IFLA — International Federation of Library Associations and Institutions
Dátum 2015
Kivonat The National and University Library (NUK) has been archiving the web for almost fifteen years. During the last six years, we have been trying to act on different levels of harvesting. For most of the time, we have dealt with harvesting of selected web sites that might be significant for future generations. The harvesting process runs smoothly, with the exception of some technical difficulties resulting from the use of scripting languages and dynamic technologies (for instance Ajax, Flash, JavaScript, asynchronous transmissions, real time streaming protocols, etc.). The number of archived web pages keeps growing very fast. We are also very successful in harvesting social media web sites with tools developed in NUK. Aware that the web as a whole is far more extensive than what we had harvested, we decided to start harvesting the Slovenian domain (*.si). The first domain harvesting was successful; however, we realized that much deeper and broader levels should be harvested by using heuristic methods. Our experience showed that the most informative web content is hidden beneath the *.si domain's data provided by ARNES (Academic Research Network of Slovenia) and is therefore not accessible. The paper presents the results of the first harvesting iteration of the Slovenian web. Further, on a sample of the first hundred domains, the results of the first and second harvesting iteration will be compared and analysed. At the end, the relevance of data acquired in the harvested web pages as a digital library complementary data source will be presented.
Kiadvány címe Preservation and Conservation with Information Technology. IFLA 2015 South Africa
Hozzáadás dátuma 2021. 08. 09. 8:41:48
Módosítás dátuma 2021. 08. 09. 8:41:48

Címkék:

  • web archiving
  • digital library
  • harvesting
  • national domain
  • social networks harvesting

First steps in archiving the mobile web

Típus Dolgozat
Szerző Richard Schneider
Szerző Frank McCown
URL http://dl.acm.org/citation.cfm?doid=2467696.2467735
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 53-56
ISBN 978-1-4503-2077-1
Dátum 2013
DOI 10.1145/2467696.2467735
Kivonat Smartphones and tablets are increasingly used to access the Web, and many websites now provide alternative sites tailored specifically for these mobile devices. Web archivists are in need of tools to aid in archiving this equally ephemeral Mobile Web. We present Findmobile, a tool for automating the discovery of mobile websites. We tested our tool in an experiment examining 10K popular websites and found that the technique most frequently used by popular websites to direct mobile users to mobile sites was automated client and server-side redirection. We found that nearly half of mobile web pages differ dramatically from their stationary web counterparts and that the most popular websites are those most likely to have mobile-specific pages.
Kiadvány címe Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries – JCDL '13
Hozzáadás dátuma 2021. 08. 09. 8:41:50
Módosítás dátuma 2021. 08. 09. 8:41:50

Címkék:

  • web archiving
  • web crawling
  • mobile web

FluxCapacitor: Efficient Time-travel Text Search

Típus Dolgozat
Szerző Klaus Berberich
Szerző Srikanta Bedathur
Szerző Thomas Neumann
Szerző Gerhard Weikum
URL http://dl.acm.org/citation.cfm?id=1325851.1326029
Kiadó VLDB Endowment
Oldalszám 1414-1417
ISBN 978-1-59593-649-3
Dátum 2007
Egyéb Series Title: VLDB '07
Citation Key: Berberich:2007:FET:1325851.1326029
Kivonat An increasing number of temporally versioned text collections is available today with Web archives being a prime example. Search on such collections, however, is often not satisfactory and ignores their temporal dimension completely. Time-travel text search solves this problem by evaluating a keyword query on the state of the text collection as of a user-specified time point. This work demonstrates our approach to efficient time-travel text search and its implementation in the FLUXCAPACITOR prototype.
Kiadvány címe Proceedings of the 33rd International Conference on Very Large Data Bases
Hozzáadás dátuma 2021. 08. 09. 8:43:06
Módosítás dátuma 2021. 08. 09. 8:43:06

Focused Crawl of Web Archives to Build Event Collections

Típus Dolgozat
Szerző Martin Klein
Szerző Lyudmila Balakireva
Szerző Herbert Van de Sompel
URL http://doi.acm.org/10.1145/3201064.3201085
Hely New York, NY, USA
Kiadó ACM
Oldalszám 333-342
ISBN 978-1-4503-5563-6
Dátum 2018
Egyéb Series Title: WebSci '18
Citation Key: Klein:2018:FCW:3201064.3201085
DOI 10.1145/3201064.3201085
Kivonat Event collections are frequently built by crawling the live web on the basis of seed URIs nominated by human experts. Focused web crawling is a technique where the crawler is guided by reference content pertaining to the event. Given the dynamic nature of the web and the pace with which topics evolve, the timing of the crawl is a concern for both approaches. We investigate the feasibility of performing focused crawls on the archived web. By utilizing the Memento infrastructure, we obtain resources from 22 web archives that contribute to building event collections. We create collections on four events and compare the relevance of their resources to collections built from crawling the live web as well as from a manually curated collection. Our results show that focused crawling on the archived web can be done and indeed results in highly relevant collections, especially for events that happened further in the past.
Kiadvány címe Proceedings of the 10th ACM Conference on Web Science
Hozzáadás dátuma 2021. 08. 09. 8:43:20
Módosítás dátuma 2021. 08. 09. 8:43:20

Címkék:

  • web archiving
  • collection building
  • focused crawling
  • memento

Focused crawler for events

Típus Folyóiratcikk
Szerző Mohamed M G Farag
Szerző Sunshin Lee
Szerző Edward A Fox
URL https://search.proquest.com/docview/2002183191?accountid=27464
Kötet 19
Szám 1
Oldalszám 3-19
Kiadvány International Journal on Digital Libraries
ISSN 1432-5012
Dátum 2018-03
Egyéb Number: 1
Publisher: Springer Science & Business Media
Place: Virginia Tech, Blacksburg, VA, USA ; Virginia Tech, Blacksburg, VA, USA
DOI http://dx.doi.org/10.1007/s00799-016-0207-1
Nyelv English
Kivonat There is need for an Integrated Event Focused Crawling system to collect Web data about key events. When a disaster or other significant event occurs, many users try to locate the most up-to-date information about that event. Yet, there is little systematic collecting and archiving anywhere of event information. We propose intelligent event focused crawling for automatic event tracking and archiving, ultimately leading to effective access. We developed an event model that can capture key event information, and incorporated that model into a focused crawling algorithm. For the focused crawler to leverage the event model in predicting webpage relevance, we developed a function that measures the similarity between two event representations. We then conducted two series of experiments to evaluate our system about two recent events: California shooting and Brussels attack. The first experiment series evaluated the effectiveness of our proposed event model representation when assessing the relevance of webpages. Our event model-based representation outperformed the baseline method (topic-only); it showed better results in precision, recall, and F1-score with an improvement of 20% in F1-score. The second experiment series evaluated the effectiveness of the event model-based focused crawler for collecting relevant webpages from the WWW. Our event model-based focused crawler outperformed the state-of-the-art baseline focused crawler (best-first); it showed better results in harvest ratio with an average improvement of 40%. [ABSTRACT FROM AUTHOR]
Hozzáadás dátuma 2021. 08. 09. 8:42:55
Módosítás dátuma 2021. 08. 09. 8:42:55

Címkék:

  • Web archiving
  • DIGITAL libraries
  • WORLD Wide Web
  • Digital libraries
  • WEB archives
  • Archiving
  • AUTOMATIC tracking
  • Data analysis
  • Event archiving
  • Event modeling
  • Focused crawling
  • Library And Information Sciences–Computer Applica
  • Representations
  • Shooting
  • WEBSITES
  • World Wide Web

For Old Times' Sake: Technostalgia's Greatest Hits

Típus Folyóiratcikk
Szerző Carly Lamphere
URL https://search.proquest.com/docview/1942462381?accountid=27464
Kötet 41
Szám 5
Oldalszám 27-29
Kiadvány Online Searcher
ISSN 23249684
Dátum 2017
Egyéb Number: 5
Publisher: Information Today, Inc.
Place: Crowell Public Library
Nyelv English
Kivonat Nostalgia is a powerful feeling/emotion. In my case, chasing childhood nostalgia caused me to lug around an almost obsolete format for years before reluctantly parting with it, but only for practical reasons. Naturally, nostalgia's strong emotional pull makes it a driving force in consumption and marketing today. Nostalgia marketing is everywhere, from foods and advertising to technology. When it comes to technology, the coined word "technostalgia" describes a "fond reminiscence of, or longing for, outdated technology" (en.wiktionary.org/wiki/technostalgia).
Hozzáadás dátuma 2021. 08. 09. 8:42:26
Módosítás dátuma 2021. 08. 09. 8:42:26

Címkék:

  • Web archiving
  • Digital preservation
  • Computers–Internet
  • Computer & video games
  • 14:COMMUNICATIONS AND INFORMATION TECHNOLOGY
  • Consumers
  • Marketing
  • Nostalgia
  • Photographs
  • Technological obsolescence
  • Trends

Forget me net, not.

Típus Folyóiratcikk
Szerző Helen Hockx-Yu
Szerző Brewster Kahle
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 163
Szám 2
Oldalszám 1-6
Kiadvány Newsweek Global
ISSN 00289604
Dátum 2014-07-11
Egyéb Number: 2
Publisher: Newsweek LLC
Kivonat The article discusses web archiving, focussing on a private project, Internet Archive, founded by Brewster Kahle and the project of the British Library to capture and preserve every web-page in the British domain, .co.uk, led by Helen Hockx-Yu. Topics include estimates of the amount of digital data created each year, estimates of the amount of data lost or altered in a year and the evolution of the role of libraries as they branch out to web archiving.
Hozzáadás dátuma 2021. 08. 09. 8:42:57
Módosítás dátuma 2021. 08. 09. 8:45:44

Címkék:

  • WEB archiving
  • WEB archives
  • INTERNET Archive (Firm)
  • BRITISH Library
  • KAHLE, Brewster
  • HOCKX-Yu, Helen

Formátová analýza sklízených dat v rámci projektu Webarchiv NK ČR. (Czech)

Típus Folyóiratcikk
Szerző Jaroslav Kvasnica
Szerző Rudolf Kreibich
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Szám 2
Oldalszám 1
Kiadvány File Format Recognition of Data Harvested by Web Archiving Project of National Library of the Czech Republic. (English)
ISSN 18042406
Dátum 2013-09
Egyéb Number: 2
Kivonat The National Library of the Czech Republic has just begun to ingest harvested data from its web archiving project into its Long-term Preservation System. This article is an output of an Institutional Science and Research project that aims to implement a retrospective file format recognition framework for harvested data and to map tools related to file format recognition. Precise knowledge of archived data is a cornerstone for building a Long-term Preservation Strategy. Such analysis may also improve conditions of end-user access. (English) [ABSTRACT FROM AUTHOR]
Hozzáadás dátuma 2021. 08. 09. 8:42:52
Módosítás dátuma 2021. 08. 09. 8:42:52

Címkék:

  • DIGITAL preservation
  • WEB archiving
  • WARC
  • web archive
  • ARC
  • archiving
  • Heritrix
  • archivace
  • dlouhodobá ochrana digitálních dokumentů
  • file formats
  • FILE organization (Computer science)
  • long term preservation
  • METADATA harvesting
  • Národní digitální knihovna
  • NARODNI knihovna Ceske republiky
  • National digital library
  • souborové formáty
  • web archiv

Forschungsdatenmanagement an der ETH-Bibliothek

Típus Könyv
Szerző Matthias Töwe
URL https://www.degruyter.com/document/doi/10.1515/9783110553796-015/html
Kiadó De Gruyter Saur
ISBN 978-3-11-055379-6
Dátum 2018-06-11
Egyéb Pages: 250-260
Publication Title: Bibliotheken der Schweiz: Innovation durch Kooperation
Section: Bibliotheken der Schweiz: Innovation durch Kooperation
Hozzáférés 2021. 07. 16. 10:22:45
Könyvtár Katalógus www.degruyter.com
Nyelv de
Kivonat Forschungsdatenmanagement an der ETH-Bibliothek was published in Bibliotheken der Schweiz: Innovation durch Kooperation on page 250.
Hozzáadás dátuma 2021. 08. 09. 8:44:34
Módosítás dátuma 2021. 08. 09. 8:44:34

Fostering Community Engagement through Datathon Events: The Archives Unleashed Experience

Típus Folyóiratcikk
Szerző Samantha Fritz
Szerző Ian Milligan
Szerző Nick Ruest
Szerző Jimmy Lin
URL http://hdl.handle.net/10315/38257
Oldalszám 14
Dátum 2021
Könyvtár Katalógus Zotero
Nyelv en
Kivonat This article explores the impact that a series of Archives Unleashed datathon events have had on community engagement both within the web archiving field, and more specifically, on the professional practices of attendees. We present results from surveyed datathon participants, in addition to related evidence from our events, to discuss how our participants saw the datathons as dramatically impacting both their professional practices as well as the broader web archiving community. Drawing on and adapting two leading community engagement models, we combine them to introduce a new understanding of how to build and engage users in an open-source digital humanities project. Our model illustrates both the activities undertaken by our project as well as the related impact they have on the field. The model can be broadly applied to other digital humanities projects seeking to engage their communities.
Hozzáadás dátuma 2021. 08. 09. 8:44:10
Módosítás dátuma 2021. 08. 09. 8:44:10

Fotografia cyfrowa i technologia 360° – zastosowanie w projektach realizowanych przez Politechnikę Wrocławską TT – Digital photography and 360° technology – applied in projects realized by Wroclaw University of Technology

Típus Folyóiratcikk
Szerző Monika Laura Pichlak
URL https://search.proquest.com/docview/1951540134?accountid=27464
Szám 172
Oldalszám 1
Kiadvány Elektroniczny Biuletyn Informacyjny Bibliotekarzy : EBIB
Dátum 2017
Egyéb Number: 172
Publisher: Stowarzyszenie Bibliotekarzy Polskich
Place: Centrum Wiedzy i Informacji Naukowo-Technicznej Politechnika Wrocławska
Nyelv Polish
Kivonat W artykule została krótko przytoczona historia aparatu cyfrowego oraz podstawowe różnice między fotografią cyfrową a tradycyjną. Opisano działanie studia fotograficznego 360° oraz jego wykorzystanie w projektach Politechniki Wrocławskiej.
Hozzáadás dátuma 2021. 08. 09. 8:42:31
Módosítás dátuma 2021. 08. 09. 8:42:31

Címkék:

  • History
  • Library And Information Sciences
  • 10.12:INFORMATION COMMUNICATION – HUMANITIES
  • Digital photography
  • Wroclaw Poland

From a System of Journals to a Web of Objects

Típus Folyóiratcikk
Szerző Herbert Van de Sompel
Szerző Susan Davis
URL https://doi.org/10.1080/0361526X.2015.1026748
Kötet 68
Szám 1-4
Oldalszám 51-63
Kiadvány The Serials Librarian
ISSN 0361-526X
Dátum 2015-05-19
Egyéb Number: 1-4
DOI 10.1080/0361526X.2015.1026748
Kivonat The article focuses on the web-based research process presented by Herbert Van de Sompel, Prototyping Team Leader at the Research Library of the Los Alamos National Laboratory in New Mexico, in which he explored the transition from a paper-based system to a web-based scholarly communication system. Topics discussed include Van de Sompel's current and ongoing projects, the core functions of the scholarly communication system, and the possibility of long-term access to the scholarly record.
Hozzáadás dátuma 2021. 08. 09. 8:42:59
Módosítás dátuma 2021. 08. 09. 8:42:59

Címkék:

  • web archiving
  • ACCESS to information
  • ARCHIVES
  • WORLD Wide Web
  • link rot
  • scholarly communication
  • LEARNING & scholarship
  • INFORMATION resources management
  • reference rot
  • SERIAL publications
  • web of objects

From archive to analysis: accessing web archives at scale through a cloud-based interface

Típus Folyóiratcikk
Szerző Nick Ruest
Szerző Samantha Fritz
Szerző Ryan Deschamps
Szerző Jimmy Lin
Szerző Ian Milligan
URL https://doi.org/10.1007/s42803-020-00029-6
Kiadvány International Journal of Digital Humanities
ISSN 2524-7840
Dátum 2021-01-06
Folyóirat rövid neve Int J Digit Humanities
DOI 10.1007/s42803-020-00029-6
Hozzáférés 2021. 07. 15. 10:43:31
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat This paper introduces the Archives Unleashed Cloud, a web-based interface for working with web archives at scale. Current access paradigms, largely driven by the scope and scale of web archives, generally involve using the command line and writing code. This access gap means that subject-matter experts, as opposed to developers and programmers, have few options to directly work with web archives beyond the page-by-page paradigm of the Wayback Machine. Drawing on first-hand research and analysis of how scholars use web archives, we present the interface design and underpinning architecture of the Archives Unleashed Cloud. We also discuss the sustainability implications of providing a cloud-based service for researchers to analyze their collections at scale.
Rövid cím From archive to analysis
Hozzáadás dátuma 2021. 08. 09. 8:44:12
Módosítás dátuma 2021. 08. 09. 8:44:12

From Print to Digital, from Document to Data: Digitalisation at the Publications Office of the European Union

Típus Folyóiratcikk
Szerző Valérie Schafer
URL https://www.degruyter.com/document/doi/10.1515/opis-2020-0015/html
Kötet 4
Szám 1
Oldalszám 203-216
Kiadvány Open Information Science
ISSN 2451-1781
Dátum 2020-01-01
Egyéb Number: 1
Publisher: De Gruyter Open Access
Section: Open Information Science
DOI 10.1515/opis-2020-0015
Hozzáférés 2021. 07. 16. 10:21:05
Könyvtár Katalógus www.degruyter.com
Nyelv en
Kivonat Since the 1970s, the Publications Office of the European Union, the official publisher of all the institutions and bodies of the EU, has had to adapt to a fast-changing situation as the number of EU Member States has grown and the number and nature of publications has evolved (including publishing public tenders of EU institutions and Member States in 1978 through a supplement to the Official Journal of the European Union and handling CELEX, an interinstitutional and multilingual automated documentation system for community law, in 1992). These changes occurred over several ages of computing. The computerisation of the Publications Office was primarily a response to the need for rationalisation and productivity, but the aim was also to gradually adapt to new types of document publication and consultation. These different stages of digitalisation required the constant transfer of information to a multitude of media. Supports, such as punched cards, optical discs and CD-ROMs, had varying life expectancies and are all evidence of attempts to digitise information before the Web. This evolution not only illustrates the need to constantly harmonise a large amount of information, it also highlights some continuities. It affects the management of information systems but also meets regularly updated standardisation, interoperability and sustainability needs within a complex ecosystem.
Rövid cím From Print to Digital, from Document to Data
Hozzáadás dátuma 2021. 08. 09. 8:44:33
Módosítás dátuma 2021. 08. 09. 8:44:33

From tree to network: reordering an archival catalogue

Típus Folyóiratcikk
Szerző Mark Bell
URL https://www.proquest.com/scholarly-journals/tree-network-reordering-archival-catalogue/docview/2466656573/se-2?accountid=15756
Kötet 30
Szám 3
Oldalszám 379-394
Kiadvány Records Management Journal
ISSN 09565698
Dátum 2020
Pontos lelőhely 2466656573
Egyéb Number: 3
Place: Bradford
Publisher: Emerald Group Publishing Limited
DOI 10.1108/RMJ-09-2019-0051
Nyelv English
Kivonat Purpose: This paper presents the results of a number of experiments performed at the National Archives, all related to the theme of linking collections of records. This paper aims to present a methodology for translating a hierarchy into a network structure using a number of methods for deriving statistical distributions from records metadata or content and then aggregating them. Simple similarity metrics are then used to compare and link collections of records with similar characteristics. Design/methodology/approach: The approach taken is to consider a record at any level of the catalogue hierarchy as a summary of its children. A distribution for each child record is created (e.g. word counts and date distribution) and averaged/summed with the other children. This process is repeated up the hierarchy to find a representative distribution of the whole series. By doing this the authors can compare record series together and create a similarity network. Findings: The summarising method was found to be applicable not only to a hierarchical catalogue but also to web archive data, which is by nature stored in a hierarchical folder structure. The case studies raised many questions worthy of further exploration, such as how to present distributions and uncertainty to users and how to compare methods that produce similarity scores on different scales. Originality/value: Although the techniques used to create distributions, such as topic modelling and word frequency counts, are not new and have been used to compare documents, to the best of the authors' knowledge applying the averaging approach to the archival catalogue is new. This provides an interesting method for zooming in and out of a collection, creating networks at different levels of granularity according to user needs.
Archívum ProQuest One Academic
Hozzáadás dátuma 2021. 08. 09. 8:44:39
Módosítás dátuma 2021. 08. 09. 8:44:39

Címkék:

  • Web archiving
  • Archives
  • Digital archives
  • Case studies
  • Business And Economics–Management
  • Network analysis
  • Record linkage
  • Topic modelling

From Web Archive to WebDigest: Concept and Examples

Típus Dolgozat
Szerző Li Xiaoming
Szerző Huang Lianen
URL http://dl.acm.org/citation.cfm?id=1378307.1378313
Hely Darlinghurst, Australia
Kiadó Australian Computer Society, Inc.
Oldalszám 11
ISBN 978-1-920682-56-9
Dátum 2007
Egyéb Series Title: ADC '08
Citation Key: Xiaoming:2008:WAW:1378307.1378313
Kivonat Much like a black hole, the Web, since its birth, has been absorbing all sorts of data (information) around the globe, ever generated along the path of human civilization. On the other hand, the digitized and networked (webbed) nature of web data, which generally means "easy to access", gives rise to much imagination on re-discovering, re-engineering, and re-using of the oceanic information. Nevertheless, lunch is not free. The same time when we see the grand opportunities, tremendous challenges are ahead. In this talk, I'll first introduce Web InfoMall (http://www.infomall.cn), the Chinese web archive we have been constructing since 2001. Along with the activities, we observe some useful capabilities have been developed, such as large scale web crawling and very large scale data organization. In addition, we discuss a step beyond the WebArchive, called WebDigest, which is an effort aimed at making use of the data in the web archive. With a web archive and associated capability, "web mining" here has a more or less different meaning, which spans from the structure analysis of the web to named entity and relation extractions, from spatial (if we consider URL as a space) information discovery to temporal information exhibition. The main challenge for us is around the theme of achieving reasonably good performance with affordable cost. As we are from a university lab, the underlying question is: what can be done (and how) in a university lab environment with modest resource. After all, most of the researches started from university lab. We need to understand the feasibilities and compromises while seeing the promises.
Kiadvány címe Proceedings of the Nineteenth Conference on Australasian Database – Volume 75
Hozzáadás dátuma 2021. 08. 09. 8:43:12
Módosítás dátuma 2021. 08. 09. 8:43:12

Full-Text and URL Search Over Web Archives

Típus Könyvfejezet
Szerző Miguel Costa
Szerkesztő Daniel Gomes
Szerkesztő Elena Demidova
Szerkesztő Jane Winters
Szerkesztő Thomas Risse
URL https://doi.org/10.1007/978-3-030-63291-5_7
Hely Cham
Kiadó Springer International Publishing
Oldalszám 71-84
ISBN 978-3-030-63291-5
Dátum 2021
Egyéb DOI: 10.1007/978-3-030-63291-5_7
Hozzáférés 2021. 07. 15. 9:52:26
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat Web archives are a historically valuable source of information. In some respects, web archives are the only record of the evolution of human society in the last two decades. They preserve a mix of personal and collective memories, the importance of which tends to grow as they age. However, the value of web archives depends on their users being able to search and access the information they require in efficient and effective ways. Without the possibility of exploring and exploiting the archived contents, web archives are useless. Web archive access functionalities range from basic browsing to advanced search and analytical services, accessed through user-friendly interfaces. Full-text and URL search have become the predominant and preferred forms of information discovery in web archives, fulfilling user needs and supporting search APIs that feed complex applications. Both full-text and URL search are based on the technology developed for modern web search engines, since the Web is the main resource targeted by both systems. However, while web search engines enable searching over the most recent web snapshot, web archives enable searching over multiple snapshots from the past. This means that web archives have to deal with a temporal dimension that is the cause of new challenges and opportunities, discussed throughout this chapter.
Könyv címe The Past Web: Exploring Web Archives
Hozzáadás dátuma 2021. 08. 09. 8:43:58
Módosítás dátuma 2021. 08. 09. 8:43:58

Functionalities of Web Archives

Típus Folyóiratcikk
Szerző Jinfang Niu
URL https://search.proquest.com/docview/1266143632?accountid=27464
Kötet 18
Szám 3-4
Kiadvány D-Lib Magazine
ISSN 1082-9873
Dátum 2012-03
Egyéb Number: 3-4
Publisher: Corporation for National Research Initiatives, Reston, VA
Place: University of South Florida jinfang@usf.edu
DOI 10.1045/march2012-niu2
Nyelv English
Kivonat The functionalities that are important to the users of web archives range from basic searching and browsing to advanced personalized and customized services, data mining, and website reconstruction. The author examined ten of the most established English language web archives to determine which functionalities each of the archives supported, and how they compared. A functionality checklist was designed, based on use cases created by the International Internet Preservation Consortium (IIPC), and the findings of two related user studies. The functionality review was conducted, along with a comprehensive literature review of web archiving methods, in preparation for the development of a web archiving course for Library and Information School students. This paper describes the functionalities used in the checklist, the extent to which those functionalities are implemented by the various archives, and discusses the author's findings. Adapted from the source document.
Hozzáadás dátuma 2021. 08. 09. 8:42:12
Módosítás dátuma 2021. 08. 09. 8:42:12

Címkék:

  • Web archiving
  • Web archive
  • article
  • Methods
  • 5.18: ELECTRONIC MEDIA
  • evaluation
  • Evaluation
  • functionality
  • overview
  • usability
  • Usability

Fuse: A Reproducible, Extendable, Internet-scale Corpus of Spreadsheets

Típus Dolgozat
Szerző Titus Barik
Szerző Kevin Lubick
Szerző Justin Smith
Szerző John Slankas
Szerző Emerson Murphy-Hill
URL http://dl.acm.org/citation.cfm?id=2820518.2820594
Hely Piscataway, NJ, USA
Kiadó IEEE Press
Oldalszám 486-489
ISBN 978-0-7695-5594-2
Dátum 2015
Egyéb Series Title: MSR '15
Citation Key: Barik:2015:FUR:2820518.2820594
Kivonat Spreadsheets are perhaps the most ubiquitous form of end-user programming software. This paper describes a corpus, called Fuse, containing 2,127,284 URLs that return spreadsheets (and their HTTP server responses), and 249,376 unique spreadsheets, contained within a public web archive of over 26.83 billion pages. Obtained using nearly 60,000 hours of computation, the resulting corpus exhibits several useful properties over prior spreadsheet corpora, including reproducibility and extendability. Our corpus is unencumbered by any license agreements, available to all, and intended for wide usage by end-user software engineering researchers. In this paper, we detail the data and the spreadsheet extraction process, describe the data schema, and discuss the trade-offs of Fuse with other corpora.
Kiadvány címe Proceedings of the 12th Working Conference on Mining Software Repositories
Hozzáadás dátuma 2021. 08. 09. 8:43:10
Módosítás dátuma 2021. 08. 09. 8:43:10

Gathering the 'Net: Efforts and Challenges in Archiving Pacific Websites

Típus Folyóiratcikk
Szerző Eleanor Kleiber
URL https://search.proquest.com/docview/1629324578?accountid=27464
Kötet 26
Szám 1
Oldalszám 158-166
Kiadvány The Contemporary Pacific
ISSN 1043-898X
Dátum 2014
Egyéb Number: 1
Publisher: University of Hawaii Press, Honolulu
DOI 10.1353/cp.2014.0017
Nyelv English
Kivonat In addition to more traditional material — books, journals and other serial publications, brochures, music, films, manuscripts, photographs, postcards and archives — the University of Hawai'i-Manoa (UHM) Library's Hawaiian and Pacific Collections are now actively collecting websites. With so many new websites being created in and about the Pacific Islands region, and so much more information being made available online — and at times exclusively so — it has become increasingly clear to the librarians of these collections that to adequately document this period in history it is necessary to collect and preserve websites. The UHM Library has been attempting to archive websites in one form or another since 2001. This essay will discuss the importance of collecting Pacific websites, describe how the Hawaiian and Pacific Collections are finding solutions for the inherent challenges of preserving websites, and explore some potential future directions that would strengthen the project and meet the information and research needs of the Pacific Islands region. Adapted from the source document.
Hozzáadás dátuma 2021. 08. 09. 8:43:04
Módosítás dátuma 2021. 08. 09. 8:43:04

Címkék:

  • Web archiving
  • Web sites
  • 9.15: TECHNICAL SERVICES – PRESERVATION
  • article
  • Pacific Region
  • University libraries

Generating Stories From Archived Collections

Típus Dolgozat
Szerző Yasmin AlNoamany
Szerző Michele C Weigle
Szerző Michael L Nelson
URL http://doi.acm.org/10.1145/3091478.3091508
Hely New York, NY, USA
Kiadó ACM
Oldalszám 309-318
ISBN 978-1-4503-4896-6
Dátum 2017
Egyéb Series Title: WebSci '17
Citation Key: AlNoamany:2017:GSA:3091478.3091508
DOI 10.1145/3091478.3091508
Kivonat With the extensive growth of the Web, multiple Web archiving initiatives have been started to archive different aspects of the Web. Services such as Archive-It exist to allow institutions to develop, curate, and preserve collections of Web resources. Understanding the contents and boundaries of these archived collections is a challenge, resulting in the paradox of the larger the collection, the harder it is to understand. Meanwhile, as the sheer volume of data grows on the Web, "storytelling" is becoming a popular technique in social media for selecting Web resources to support a particular narrative or "story". We address the problem of understanding archived collections by proposing the Dark and Stormy Archive (DSA) framework, in which we integrate "storytelling" social media and Web archives. In the DSA framework, we identify, evaluate, and select candidate Web pages from archived collections that summarize the holdings of these collections, arrange them in chronological order, and then visualize these pages using tools that users already are familiar with, such as Storify. Inspired by the Turing Test, we evaluate the stories automatically generated by the DSA framework against a ground truth dataset of hand-crafted stories, generated by expert archivists from Archive-It collections. Using Amazon's Mechanical Turk, we found that the stories automatically generated by DSA are indistinguishable from those created by human subject domain experts, while at the same time both kinds of stories (automatic and human) are easily distinguished from randomly generated stories.
Kiadvány címe Proceedings of the 2017 ACM on Web Science Conference
Hozzáadás dátuma 2021. 08. 09. 8:43:19
Módosítás dátuma 2021. 08. 09. 8:43:19

Címkék:

  • web archiving
  • archived collections
  • document similarity
  • information retrieval
  • internet archive
  • storytelling
  • web content mining

Getting acquainted with social networks and apps: capturing and archiving social media content

Típus Folyóiratcikk
Szerző Katie Elson Anderson
URL https://www.proquest.com/trade-journals/getting-acquainted-with-social-networks-apps/docview/2499027983/se-2?accountid=15756
Kötet 37
Szám 2
Oldalszám 18-22
Kiadvány Library Hi Tech News
ISSN 07419058
Dátum 2020
Pontos lelőhely 2499027983
Egyéb Number: 2
Place: Bradford
Publisher: Emerald Group Publishing Limited
DOI 10.1108/LHTN-03-2019-0011
Nyelv English
Kivonat The ephemeral nature of the content and perceived lack of permanency of the platforms led to questions about the actual staying power of sites such as Facebook and Twitter.

The important thing to remember is that while the platforms and apps may continue to thrive or be shuttered, created or forgotten, the underlying nature of connection, networking, data storage and content sharing is unlikely to change dramatically, just the platform, method and space may change.

Libraries and librarians have been part of that consistent group of users, embracing the ability to post in a number of different formats, provide attribution and connect with communities (Power, 2014; Anderson, 2015).

Looking beyond the personal risk of losing one’s teenage online past or the comments on an old blog post to the larger impact of social media and society, one can quickly see the importance of preserving and archiving large chunks of internet history represented on these social media platforms.

Archívum ProQuest One Academic; SciTech Premium Collection
Hozzáadás dátuma 2021. 08. 09. 8:44:39
Módosítás dátuma 2021. 08. 09. 8:44:39

Címkék:

  • Web archiving
  • Web archives
  • Archiving
  • Digital archives
  • Digital media
  • Social media
  • Social networks
  • Libraries
  • Librarians
  • Automation
  • Information professionals
  • Internet content
  • Data storage
  • History of archives
  • Library And Information Sciences–Computer Applications

Getting Started in Web Archiving

Típus Dolgozat
Szerző Abigail Grotke
URL http://library.ifla.org/1637/
Dátum 2017
Hozzáférés 2017. 06. 26. 2:00:00
Kivonat The purpose of this paper is to provide general information about how organizations can get started in web archiving, for both those who are developing new web archiving programs and for libraries that are just beginning to explore the possibilities. The paper includes an overview of considerations when establishing a web archiving program, including typical approaches that national libraries take when preserving the web. These include: collection development, legal issues, tools and approaches, staffing, and whether to do work in-house or outsource some or most of the work. The paper will introduce the International Internet Preservation Consortium and the benefits of collaboration when building web archives.
Kiadvány címe IFLA Congress 2017, Wroclaw, Poland
Hozzáadás dátuma 2021. 08. 09. 8:41:46
Módosítás dátuma 2021. 08. 09. 8:41:46

Getting Structured Data from the Internet: Running Web Crawlers/Scrapers on a Big Data Production Scale

Típus Könyv
Szerző Jay M. Patel
URL http://link.springer.com/10.1007/978-1-4842-6576-5
Hely Berkeley, CA
Kiadó Apress
ISBN 978-1-4842-6575-8 978-1-4842-6576-5
Dátum 2020
Egyéb DOI: 10.1007/978-1-4842-6576-5
Hozzáférés 2021. 07. 15. 11:51:31
Könyvtár Katalógus DOI.org (Crossref)
Nyelv en
Rövid cím Getting Structured Data from the Internet
Hozzáadás dátuma 2021. 08. 09. 8:44:31
Módosítás dátuma 2021. 08. 09. 8:44:31

Getting to Know Our Web Archive: A Pilot Project to Collaboratively Increase Access to Digital Cultural Heritage Materials in Wyoming

Típus Könyv
Szerző Amanda R Lehman
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Hely United States, North America
Kiadó Digital USD
Dátum 2018
Kivonat The University of Wyoming is the only four-year higher education institution in the state, a unique position amongst colleges and universities in the United States. Given this unusual status, it is especially important that the university libraries use their resources to identify and partner with communities around the state to build collections that preserve their cultural heritage. An Archive-It subscription was purchased in 2016, with an initial goal of capturing university-related materials. In an effort to expand the scope and meaningfulness of the web archive, a project has been undertaken to use university and statewide relationships to build a Wyoming-focused Native American digital cultural heritage collection comprised of web-based materials. This is an interdepartmental effort led by the Digital Collections Librarian and the Metadata Librarian that includes collaboration within the library, the university, and the state.
Hozzáadás dátuma 2021. 08. 09. 8:42:10
Módosítás dátuma 2021. 08. 09. 8:42:10

Címkék:

  • web archiving
  • Archive-It
  • collaboration
  • metadata
  • access
  • Cataloging and Metadata
  • Collection Development and Management
  • Digital Humanities
  • outreach
  • WorldCat

Giving with one click, taking with the other: e-legal deposit, web archives and researcher access

Típus Könyvfejezet
Szerző Jane Winters
URL https://sas-space.sas.ac.uk/9439/1/Giving%20with%20one%20click%2C%20taking%20with%20the%20other.pdf
Kiadás 1.
Hely London
Kiadó Facet Publishing
Oldalszám 159-178
ISBN 978-1-78330-377-9
Dátum 2020.
Egyéb URL points to a pre-print version.
Hozzáférés 2021. 08. 06. 2:00:00
Könyvtár Katalógus Zotero
Nyelv en
Könyv címe Electronic Legal Deposit: Shaping the library collections of the future
Hozzáadás dátuma 2021. 08. 09. 8:44:12
Módosítás dátuma 2021. 08. 09. 8:44:12

Glitch

Típus Folyóiratcikk
Szerző Ursula K. Frederick
URL https://doi.org/10.1558/jca.v2i1.28284
Kötet 2
Szám 1
Oldalszám S28-S32
Kiadvány Journal of Contemporary Archaeology
ISSN 2051-3429
Dátum 2015-08-29
Egyéb Number: 1
Publisher: Equinox Publishing Group
DOI 10.1558/jca.v2i1.28244
Kivonat The rapid and continual advancement of the internet as a platform for communication on archaeological topics has brought permanent changes to the methods through which we present information from the sector to the public. This article discusses the potential for an exploration of the UK web archives for information about the history of archaeology online, and a case study undertaken as part of a Big Data project at the British Library by the author. The article concludes that we have a significant issue for media archaeologists in the future; the lack of material evidence for these iterations means we risk losing an understanding of our social, economic, cultural, and technological histories and our perception of these developments over time. It suggests that further exploration of these archives from an archaeological perspective could be beneficial both as an investigation of the iterations of digital archaeology (the creation of a history of public engagement with the subject), and as a study of the use of archaeological techniques for archival research.
Hozzáadás dátuma 2021. 08. 09. 8:43:24
Módosítás dátuma 2021. 08. 09. 8:43:24

Címkék:

  • web archives
  • ARCHIVES
  • archaeology
  • ARCHAEOLOGY & history
  • ARCHIVAL resources
  • digital communications
  • digital data
  • DOCUMENTATION
  • media archaeology
  • SOCIOECONOMICS

Global trends in library web-archives

Típus Folyóiratcikk
Szerző Natalya S. Redkina
Szám 1
Oldalszám 100
Kiadvány Scientific and Technical Libraries
Dátum 2021
Egyéb Number: 1
Könyvtár Katalógus Google Scholar
Hozzáadás dátuma 2021. 08. 09. 8:44:02
Módosítás dátuma 2021. 08. 09. 8:44:02

Global Web Archive Integration with Memento

Típus Dolgozat
Szerző Robert Sanderson
URL http://doi.acm.org/10.1145/2232817.2232900
Hely New York, NY, USA
Kiadó ACM
Oldalszám 379-380
ISBN 978-1-4503-1154-0
Dátum 2012
Egyéb Series Title: JCDL '12
Citation Key: Sanderson:2012:GWA:2232817.2232900
DOI 10.1145/2232817.2232900
Kivonat In this poster, we describe the approach taken to designing and implementing a tera-scale multi-repository index of archived web resources using massively parallel processing.
Kiadvány címe Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:11
Módosítás dátuma 2021. 08. 09. 8:43:11

Címkék:

  • digital preservation
  • memento
  • high performance computing

Go fish: Conceptualising the challenges of engaging national web archives for digital research

Típus Folyóiratcikk
Szerző Jessica Ogden
Szerző Emily Maemura
URL https://doi.org/10.1007/s42803-021-00032-5
Kiadvány International Journal of Digital Humanities
ISSN 2524-7840
Dátum 2021-04-27
Folyóirat rövid neve Int J Digit Humanities
DOI 10.1007/s42803-021-00032-5
Hozzáférés 2021. 07. 15. 10:10:06
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat Our work considers the sociotechnical and organisational constraints of web archiving in order to understand how these factors and contingencies influence research engagement with national web collections. In this article, we compare and contrast our experiences of undertaking web archival research at two national web archives: the UK Web Archive located at the British Library and the Netarchive at the Royal Danish Library. Based on personal interactions with the collections, interviews with library staff and observations of web archiving activities, we invoke three conceptual devices (orientating, auditing and constructing) to describe common research practices and associated challenges in the context of each national web archive. Through this framework we centre the early stages of the research process that are often only given cursory attention in methodological descriptions of web archival research, to discuss the epistemological entanglements of researcher practices, instruments, tools and methods that create the conditions of possibility for new knowledge and scholarship in this space. In this analysis, we highlight the significant time and energy required on the part of researchers to begin using national web archives, as well as the value of engaging with the curatorial infrastructure that enables web archiving in practice. Focusing an analysis on these research infrastructures facilitates a discussion of how these web archival interfaces both enable and foreclose on particular forms of researcher engagement with the past Web and in turn contributes to critical ongoing debates surrounding the opportunities and constraints of digital sources, methodologies and claims within the Digital Humanities.
Rövid cím ‘Go fish’
Hozzáadás dátuma 2021. 08. 09. 8:44:03
Módosítás dátuma 2021. 08. 09. 8:44:03

Going Back in Time to Find What Existed on the Web and How much has been Preserved: How much of Palestinian Web has been Archived?

Típus Könyv
Szerző Thaer Samar
Szerző Hadi Khalilia
Dátum January 9, 2021
Egyéb DOI: 10.24897/acn.64.68.7108
Könyvtár Katalógus ResearchGate
Kivonat The web is an important resource for publishing and sharing content. The main characteristic of the web is its volatility. Content is added, updated, and deleted all the time. Therefore, many national and international institutes started crawling and archiving the content of the web. The main focus of national institutes is to archive the web related to their country's heritage; for example, the National Library of the Netherlands focuses on archiving websites that are of value to Dutch heritage. However, there are still countries that have not taken action to archive their web, which will result in losing content and leaving a gap in knowledge. In this research, we focus on shedding light on the Palestinian web, specifically on how much of the Palestinian web has been archived. First, we create a list of Palestinian hosts that were on the web. For that, we queried the Google index, exploiting the time range filter in order to get hosts over time. We collected 98 hosts on average at 5-year granularity from 1990 to 2019. We also obtained Palestinian hosts from the DMOZ directory, collecting 188 hosts. Second, we investigate the coverage of the collected hosts in the Internet Archive and the Common-Crawl. We found that coverage of Google hosts in the Internet Archive ranges from 0% to 89% from the oldest to the newest time granularity. The coverage of DMOZ hosts was 96%. The coverage of Google hosts in the Common-Crawl ranged from 57.1% to 74.3%, while the coverage of DMOZ hosts in the Common-Crawl was on average 25% across all crawls. We found that even when a host is covered in the Internet Archive and the Common-Crawl, its lifespan and number of archived versions are low.
Rövid cím Going Back in Time to Find What Existed on the Web and How much has been Preserved
Hozzáadás dátuma 2021. 08. 09. 8:44:10
Módosítás dátuma 2021. 08. 09. 8:44:10

Government Surveillance and Declassified Documents.

Típus Folyóiratcikk
Szerző Gail Golderman
Szerző Bruce Connolly
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 143
Szám 1
Oldalszám 124-131
Kiadvány Library Journal
ISSN 03630277
Dátum 2018-01
Egyéb Number: 1
Publisher: Media Source, Inc.
Kivonat Reviews are presented for several websites, including Digital National Security Archive at www.proquest.com/productsservices/databases/dnsa.html, ProQuest History Vault: Black Freedom Struggle in the 20th Century at www.proquest.com/productsservices/historyvault.html, and Secret Files from World.
Hozzáadás dátuma 2021. 08. 09. 8:42:38
Módosítás dátuma 2021. 08. 09. 8:42:38

Címkék:

  • ARCHIVES — Computer network resources
  • UNITED States. National Security Agency
  • WEB archives

Growing a web archiving program: A case study for evolving an organization-management plan

Típus Dolgozat
Szerző Todd Suomela
Hely Cape Town
Kiadó IFLA
Dátum 2015
Hozzáférés 2017. 06. 23. 2:00:00
Kivonat Web archiving presents a number of technical and organizational challenges for libraries. The University of Alberta Libraries has been using Archive-It to manage a web archiving program since 2009. This presentation will describe the history of web archiving at the University of Alberta and show the evolution of those services over time. Web archiving is not just technically challenging, it can also be organizationally challenging. Alberta has elected to use a distributed model for collection management by spreading the work for collection development and maintenance across subject librarians and library support staff. Some of the challenges of such a management plan include collection scoping, skill transfer, quality assurance, and metadata creation. The libraries also collaborate with regional and national consortia while working to expand services to researchers and casual users of the library. Attendees will take away lessons about collection management, collaboration, and research services for web archives.
Kiadvány címe Preservation and Conservation with Information Technology. IFLA 2015 South Africa
Hozzáadás dátuma 2021. 08. 09. 8:41:48
Módosítás dátuma 2021. 08. 09. 8:41:48

Címkék:

  • Web archives
  • Distributed collaboration
  • Library Collections Management

Growing an Archives Department: (and other concerns of a new library manager)

Típus Folyóiratcikk
Szerző Joe Marciniak
URL https://search.proquest.com/docview/1680526979?accountid=27464
Kötet 35
Szám 3
Oldalszám 16-19
Kiadvány Computers in Libraries
ISSN 10417915
Dátum 2015-04
Egyéb Number: 3
Publisher: Information Today, Inc.
Place: Westport
Nyelv English
Kivonat The difference between a librarian and an archivist is that a librarian will drill, glue, and tape a resource to get it back in the stacks, whereas an archivist will seal, hide, and lock up a resource to preserve it. In other words, the difference between a librarian and an archivist is everything. Librarians and archivists just have different professional philosophies. It comes down to access versus preservation. Although the archives department had existed in various ways for many decades, it was only given a permanent library home in 2009. The library's mission statement outlines the overall goal of the library: to provide quality resources, a high level of service, and innovative learning environments with leading-edge technology. Providing quality resources was where the author felt the archives department could fit into the overall mission of the library. The mission statement of your institution is an essential starting point for establishing common ground with a colleague when working on a project.
Hozzáadás dátuma 2021. 08. 09. 8:42:15
Módosítás dátuma 2021. 08. 09. 8:42:15

Címkék:

  • Web archiving
  • Library And Information Sciences–Computer Applica
  • Digitization
  • Archives & records
  • Library collections
  • Metadata
  • Librarians
  • Archivists
  • Library managers
  • Meetings
  • Mission statements

Gyorsmérleg – az OSZK Webarchívumának és néhány könyvtárnak a KDS-K pályázat keretében történt együttműködéséről

Típus Folyóiratcikk
Szerző Ákos László Visky
Kötet 29.
Szám 7-8.
Kiadvány Könyv, könyvtár, könyvtáros
Dátum 2020
Egyéb Number: 7-8.
megjelenés alatt áll.
Nyelv magyar
Kivonat Most, hogy a végéhez értünk a Közgyűjteményi Digitalizálási Stratégia könyvtári ágának keretében (KDS-K) megvalósult együttműködésnek – mely során az Országos Széchényi Könyvtár (OSZK) Webarchívuma és a pályázatban nyertesként részt vevő megyei hatókörű városi könyvtárak elsősorban a nemzeti webtér regionális vonatkozású webhelyeinek feltárásában működtek együtt –, illő rövid összefoglalót adni a közös munka eredményéről.
Hozzáadás dátuma 2021. 08. 09. 8:43:48
Módosítás dátuma 2021. 08. 09. 8:43:48

Hachette Book Group v. Internet Archive: Is There a Better Way to Restore Balance in Copyright?

Típus Folyóiratcikk
Szerző Robin Schard
URL https://www.proquest.com/scholarly-journals/hachette-book-group-v-internet-archive-is-there/docview/2497898976/se-2?accountid=15756
Kötet 24
Szám 1-2
Oldalszám 53-58
Kiadvány Internet Reference Services Quarterly
ISSN 1087-5301
Dátum Jan 2021
Pontos lelőhely 2497898976
Egyéb Number: 1-2
Place: Binghamton
Publisher: Taylor & Francis Ltd.
DOI 10.1080/10875301.2021.1875100
Nyelv English
Kivonat Using the opening of the National Emergency Library as an opportunity, four large publishers, Hachette Book Group, HarperCollins Publishers, John Wiley & Sons, and Penguin Random House, filed suit against the Internet Archive claiming copyright infringement. This article discusses the lawsuit and the claims on both sides before discussing the weaknesses for the parties, and recommending that negotiation would be the best way to move forward.
Archívum SciTech Premium Collection
Hozzáadás dátuma 2021. 08. 09. 8:44:39
Módosítás dátuma 2021. 08. 09. 8:44:39

Címkék:

  • Web archiving
  • Library And Information Sciences
  • Archives & records
  • Internet
  • Copyright
  • Litigation
  • copyright
  • fair use
  • controlled digital lending
  • Hachette Book Group, Inc. v. Internet Archive
  • Infringement
  • National Emergency Library
  • Open Library
  • Publishing industry

Hachette Book Group v. Internet Archive: Is There a Better Way to Restore Balance in Copyright? – ProQuest

Típus Weboldal
URL https://www.proquest.com/docview/2497898976/B290DF7AC35D4B04PQ/12?accountid=15756
Dátum 2021-07-16 07:30:56
Hozzáférés 2021. 07. 16. 9:30:56
Nyelv hu
Rövid cím Hachette Book Group v. Internet Archive
Hozzáadás dátuma 2021. 08. 09. 8:44:32
Módosítás dátuma 2021. 08. 09. 8:44:32

Hard Content, Fab Front-End: Archiving Websites of Dutch Public Broadcasters

Típus Folyóiratcikk
Szerző Lotte Belice Baltussen
Szerző Jaap Blom
Szerző Leïla Medjkoune
Szerző Radu Pop
Szerző Jasmijn Van Gorp
Szerző Hugo Huurdeman
Szerző Leidi Haaijer
URL https://search.proquest.com/docview/1623365171?accountid=27464
Kötet 25
Szám 1-2
Oldalszám 69-91
Kiadvány Alexandria: The Journal of National and International Library and Information Issues
ISSN 0955-7490
Dátum 2014-08
Egyéb Number: 1-2
PMID: 1623365171
Publisher: Sage Publications Ltd.
Place: London
DOI 10.7227/ALX.0021
Nyelv English
Kivonat Although there are a great variety of web archiving projects around the world, there are not many that focus explicitly on websites of broadcasters. The reason is that funds are often lacking to do this, and that broadcaster websites are difficult to archive, due to their dynamic and audiovisual content. The Netherlands Institute for Sound and Vision, with its collection of over 800,000 hours of audiovisual content has been involved in a small-scale research project related to web archiving since 2008. When Sound and Vision was approached by Dutch public broadcaster NTR to archive four of its websites, it was decided to start a collaborative pilot project that focused both on learning more about archiving broadcaster websites and developing a clean and modern public access interface. The main lesson learned from this pilot is that to archive highly dynamic and AV-heavy broadcaster websites it is vital to use supplementary capture tools and manual archiving of this ‘difficult’ content. Furthermore, since the focus of web archiving projects is usually not on a good-looking front-end, the wheel had to be partly re-invented by involving various stakeholders and determining the most important requirements. The first version of the web archive was evaluated by various prospective target users. This evaluation revealed that the participants indeed appreciated the look and speed of the web archive, and that users needed to be made more aware of the web archive's purpose and limitations. The work will be continued and scaled up, by archiving more broadcaster websites, continuing the research on how best to capture and make accessible dynamic and AV content, and by creating standard practices for making the web archive publicly available.
Hozzáadás dátuma 2021. 08. 09. 8:42:00
Módosítás dátuma 2021. 08. 09. 8:42:00

Címkék:

  • web archives
  • Library And Information Sciences
  • audiovisual material
  • broadcasters' websites
  • user studies

Here Today, Gone within a Month: The Fleeting Life of Digital News

Típus Dolgozat
Szerző Martin Halbert
Szerző Katherine Skinner
Szerző Marc Wilson
Szerző Frederick Zarndt
URL http://library.ifla.org/id/eprint/2077
Hely Lexington, KY, USA
Kiadó IFLA — International Federation of Library Associations and Institutions
Dátum 2016
Kivonat In 1989 on the shores of Montana’s beautiful Flathead Lake, the owners of the weekly newspaper the Bigfork Eagle started TownNews.com to help community newspapers with developing technology. TownNews.com has since evolved into an integrated digital publishing and content management system used by more than 1600 newspaper, broadcast, magazine, and web-native publications in North America. TownNews.com is now headquartered on the banks of the mighty Mississippi river in Moline, Illinois. Not long ago Marc Wilson, CEO of TownNews.com, noticed that of the 220,000+ e-edition pages posted on behalf of its customers at the beginning of the month, 210,000 were deleted by month’s end. What? The front page story about a local business being sold to an international corporation that I read online September 1 will be gone by September 30? As well as the story about my daughter’s 1st place finish in the district field and track meet? A 2014 national survey by the Reynolds Journalism Institute (RJI) of 70 digital-only and 406 hybrid (digital and print) newspapers conclusively showed that newspaper publishers also do not maintain archives of the content they produce. RJI found a dismal 12% of the “hybrid” newspapers reported even backing up their digital news content and fully 20% of the “digital-only” newspapers reported that they are backing up none of their content. Educopia Institute’s 2012 and 2015 surveys with newspapers and libraries concur, and further demonstrate that the longstanding partner to the newspaper—the library—likewise is neither collecting nor preserving this digital content. This leaves us with a bitter irony, that today, one can find stories published prior to 1922 in the Library of Congress’s Chronicling America and other digitized, out-of-copyright newspaper collections but cannot, and never will be able to, read a story published online less than a month ago.
In this paper we look at how much news is published online that is never published in print or on more permanent media. We estimate how much online news is or will soon be forever lost because no one preserves it: not publishers, not libraries, not content management systems, and not the Internet Archive. We delve into some of the reasons why this content is not yet preserved, and we examine the persistent challenges of digital preservation and of digital curation of this content type. We then suggest a pathway forward, via some initial steps that journalists, producers, legislators, libraries, distributors, and readers may each take to begin to rectify this historical loss going forward.
Kiadvány címe IFLA WLIC 2016 – Columbus, OH – Connections. Collaboration. Community in Session S21 – Satellite Meeting: News Media. In: News, new roles & preservation advocacy: moving libraries into action, 10-12 August 2016, Lexington, KY, USA.
Hozzáadás dátuma 2021. 08. 09. 8:43:31
Módosítás dátuma 2021. 08. 09. 8:43:31

Címkék:

  • preservation
  • news
  • born digital news
  • e-edition
  • newspapers

Hiberlink: Towards Time Travel for the Scholarly Web

Típus Dolgozat
Szerző Robert Sanderson
Szerző Herbert Van de Sompel
Szerző Peter Burnhill
Szerző Claire Grover
URL http://doi.acm.org/10.1145/2499583.2500370
Hely New York, NY, USA
Kiadó ACM
Oldalszám 21
ISBN 978-1-4503-2185-3
Dátum 2013
Egyéb Series Title: DPRMA '13
Citation Key: Sanderson:2013:HTT:2499583.2500370
DOI 10.1145/2499583.2500370
Kivonat The preservation of traditional, digital scholarly output, such as PDF or HTML journal articles, is relatively well understood, and adequately organized through systems such as Portico and LOCKSS. However, the scholarly record is expanding with a wide variety of materials for which no established archival approaches exist. This includes, for example, workflows and software, project descriptions, demonstrations, datasets, and videos published on the web. Some of these resources are referenced in traditional papers and the lack of archival infrastructure yields a scholarly record with many loose ends. The Hiberlink project aims to quantify the extent to which such referenced resources are preserved in web archives, and to propose solutions that ensure the longevity of the context of the research, alongside the formal publication. The Hiberlink project regards the problem of preserving web resources referenced in scholarly papers as a special case of the more general problem of preserving scholarly compound objects, aka Research Objects, which consist of resources with a variety of relationships and dependencies.
Kiadvány címe Proceedings of the 1st International Workshop on Digital Preservation of Research Methods and Artefacts
Hozzáadás dátuma 2021. 08. 09. 8:43:09
Módosítás dátuma 2021. 08. 09. 8:43:09

Címkék:

  • memento
  • preservation
  • web
  • research objects
  • repositories

Historians and Web Archives.

Típus Folyóiratcikk
Szerző Susanne Belovari
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Szám 83
Oldalszám 59-79
Kiadvány Archivaria
ISSN 03186954
Dátum 2017
Egyéb Number: 83
Kivonat Since the 1990s, the Web has increasingly become the location where we carry out our activities and generate primary and secondary records. Increasingly, such records exist only on the Web, with no complementary or supplementary records available elsewhere. While web archives began to preserve this legacy in 1991, web history has not yet emerged as a fully developed field. One explanation may be historians' concerns that they will not be able to replicate their historical research process when using web archives, and may not find essential and authoritative records. The article's first section proposes a thought experiment in which a future historian in 2050 wants to research web history using web archives as they existed in 2015. She relies on the customary historical research process through which historians choose topics and search, browse, and contextualize sources in depth and iteratively. The experiment fails when our historian is unable to locate appropriate repositories and authoritative records without resorting to the live Web of 2015. The second section then analyzes 21 eminent web archives in 2015 and issues that may have an impact on historical research. Most web archives are apparently akin to libraries of information resources. Archivists and historians, however, need web repositories to contain and make accessible essential web records of enduring cultural, historical, and evidentiary value. The article suggests that historians may once again prove invaluable in figuring out basic archival issues related to web records and archives, just as they helped shape archival policies a couple of centuries ago.
Hozzáadás dátuma 2021. 08. 09. 8:42:05
Módosítás dátuma 2021. 08. 09. 8:42:05

Címkék:

  • Web archiving
  • Web archives
  • Archives
  • Digital libraries
  • Historians

Historical Web as a Tool for Analyzing Social Change

Típus Könyvfejezet
Szerző Ralph Schroeder
Szerző Niels Brügger
Szerző Josh Cowls
Szerkesztő Jeremy Hunsinger
Szerkesztő Matthew M Allen
Szerkesztő Lisbeth Klastrup
URL https://doi.org/10.1007/978-94-024-1555-1_24
Hely Dordrecht
Kiadó Springer Netherlands
Oldalszám 489-504
ISBN 978-94-024-1555-1
Dátum 2020
Egyéb DOI: 10.1007/978-94-024-1555-1_24
Kivonat This chapter discusses how the World Wide Web can be used as a resource for historians and social scientists. The web has existed for more than two decades and been used for many purposes, including as a source of information, entertainment, and much else. It has become an indispensable part of our daily lives. Future historians and social scientists are therefore bound to look to the web, its content, and structure, to understand how society was changing – just as they have used various records such as letters, novels, newspapers, radio, television, and other artifacts as a record of the past for the pre-digital era. This chapter explores how scholars can make use of the archived web as a source for understanding historical patterns of culture and society, including the challenges they face in doing so.
Könyv címe Second International Handbook of Internet Research
Hozzáadás dátuma 2021. 08. 09. 8:43:38
Módosítás dátuma 2021. 08. 09. 8:43:38

History in the Age of Abundance? How the Web is Transforming Historical Research. Ian Milligan.

Típus Folyóiratcikk
Szerző Lisa Dillon
URL https://www.utpjournals.press/doi/abs/10.3138/chr.102.1.br16
Kötet 102
Szám 1
Oldalszám 202-204
Kiadvány Canadian Historical Review
ISSN 0008-3755
Dátum March 1, 2021
Egyéb Number: 1
Publisher: University of Toronto Press
DOI 10.3138/chr.102.1.br16
Hozzáférés 2021. 07. 15. 10:19:26
Könyvtár Katalógus utpjournals.press (Atypon)
Rövid cím History in the Age of Abundance?
Hozzáadás dátuma 2021. 08. 09. 8:44:07
Módosítás dátuma 2021. 08. 09. 8:44:07

History’s Future in the Age of the Internet

Típus Folyóiratcikk
Szerző Daniel J. Story
Szerző Jo Guldi
Szerző Tim Hitchcock
Szerző Michelle Moravec
URL https://doi.org/10.1093/ahr/rhaa477
Kötet 125
Szám 4
Oldalszám 1337-1346
Kiadvány The American Historical Review
ISSN 0002-8762
Dátum October 21, 2020
Egyéb Number: 4
Folyóirat rövid neve The American Historical Review
DOI 10.1093/ahr/rhaa477
Hozzáférés 2021. 07. 15. 11:23:50
Könyvtár Katalógus Silverchair
Kivonat Ian Milligan’s History in the Age of Abundance? How the Web Is Transforming Historical Research (2019) presents and interrogates the challenges and opportunities that born-digital materials have for historians. Milligan argues that historians who wish to grapple with the archived internet need to think much more aggressively about engaging with digital methods and tools that can complement and extend the well-honed practices of close reading with approaches that can help analyze the vast and often unstructured archives of internet data. In this AHR Review Roundtable, three historians—Jo Guldi, Tim Hitchcock, and Michelle Moravec, all of whom incorporate digital approaches and concerns into their work—engage with a set of questions developed by Digital Scholarship Librarian Daniel J. Story, to discuss Milligan’s treatment of the digital archive of the web and its implications for historians’ work. Milligan offers a response to these insights and critiques, emphasizing the need for the historical discipline to change from within and build upon its valuable qualities.
Hozzáadás dátuma 2021. 08. 09. 8:44:24
Módosítás dátuma 2021. 08. 09. 8:44:24

Histrace: Building a Search Engine of Historical Events

Típus Dolgozat
Szerző Lian'en Huang
Szerző Jonathan J H Zhu
Szerző Xiaoming Li
URL http://doi.acm.org/10.1145/1367497.1367703
Hely New York, NY, USA
Kiadó ACM
Oldalszám 1155-1156
ISBN 978-1-60558-085-2
Dátum 2008
Egyéb Series Title: WWW '08
Citation Key: Huang:2008:HBS:1367497.1367703
DOI 10.1145/1367497.1367703
Kivonat In this paper, we describe an experimental search engine built on our Chinese web archive, which has been collected since 2001. The original data set contains nearly 3 billion Chinese web pages crawled over the past 5 years. From the collection, 430 million "article-like" pages are selected and then partitioned into 68 million sets of similar pages. The titles and publication dates are determined for the pages, and an index is built. When searching, the system returns related pages in chronological order. This way, a user interested in news reports or commentaries on a certain past event will be able to find a rich set of highly related pages in a convenient way.
Kiadvány címe Proceedings of the 17th International Conference on World Wide Web
Hozzáadás dátuma 2021. 08. 09. 8:43:19
Módosítás dátuma 2021. 08. 09. 8:43:19

Címkék:

  • web archive
  • text mining
  • replica detection

Hogyan tudjuk fejleszteni a webes gyűjteményünket? A Holland Nemzeti Könyvtár webarchiválási tevékenységének értékelése (2007–2017)

Típus Folyóiratcikk
Szerző Márton Németh
Recenzált mű szerzője Kees Teszelszky
Recenzált mű szerzője Barbara Sierman
URL http://epa.oszk.hu/00100/00143/00354/pdf/EPA00143_konyvtari_figyelo_2018_04_623-672.pdf
Kötet 64
Szám 4
Oldalszám 434
Kiadvány Könyvtári figyelő : külföldi lapszemle
ISSN 0023-3773
Dátum 2018
Egyéb Number: 4
Folyóirat rövid neve KF
Hozzáférés 2021. 08. 04. 2:00:00
Nyelv magyar
Hozzáadás dátuma 2021. 08. 09. 8:44:42
Módosítás dátuma 2021. 08. 09. 8:44:42

How Can We Be Ready to Study History in the Age of Abundance? A Response

Típus Folyóiratcikk
Szerző Ian Milligan
URL https://doi.org/10.1093/ahr/rhaa478
Kötet 125
Szám 4
Oldalszám 1347-1349
Kiadvány The American Historical Review
ISSN 0002-8762
Dátum October 21, 2020
Egyéb Number: 4
Folyóirat rövid neve The American Historical Review
DOI 10.1093/ahr/rhaa478
Hozzáférés 2021. 07. 15. 11:26:19
Könyvtár Katalógus Silverchair
Kivonat Ian Milligan’s History in the Age of Abundance? How the Web Is Transforming Historical Research (2019) presents and interrogates the challenges and opportunities that born-digital materials have for historians. Milligan argues that historians who wish to grapple with the archived internet need to think much more aggressively about engaging with digital methods and tools that can complement and extend the well-honed practices of close reading with approaches that can help analyze the vast and often unstructured archives of internet data. In this AHR Review Roundtable, three historians—Jo Guldi, Tim Hitchcock, and Michelle Moravec, all of whom incorporate digital approaches and concerns into their work—engage with a set of questions developed by Digital Scholarship Librarian Daniel J. Story, to discuss Milligan’s treatment of the digital archive of the web and its implications for historians’ work. Milligan offers a response to these insights and critiques, emphasizing the need for the historical discipline to change from within and build upon its valuable qualities.
Rövid cím How Can We Be Ready to Study History in the Age of Abundance?
Hozzáadás dátuma 2021. 08. 09. 8:44:25
Módosítás dátuma 2021. 08. 09. 8:44:25

How can we improve our web collection? An evaluation of webarchiving at the KB National Library of the Netherlands (2007-2017)

Típus Folyóiratcikk
Szerző Barbara Sierman
Szerző Kees Teszelszky
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 27
Szám 2
Oldalszám 94-106
Kiadvány Alexandria
Dátum 2017
Egyéb Number: 2
Place: Europe, Europe
DOI 10.1177/0955749017725930
Kivonat The Koninklijke Bibliotheek, the Dutch National Library (KB-NL), started in 2007 the project “web archiving” based on a selection of Dutch websites. The initial selection of 1,000 websites has currently grown into over 12,000 selected web sites, crawled on different intervals. Although due to legal restrictions the current use is limited to the KB-NL reading room, it is important that the KB-NL includes the requirements of the (future) users in her approach of creating a web collection. With respect to the long term preservation of the collection, we also need to incorporate the requirements for long term archiving in our approach, as described in the Open Archival Information Model (OAIS). This article describes the results of a research project on web archiving and the web collection of archived sites in the KB-NL, investigating the following questions. What is web archiving in the Netherlands? What are the selection criteria of KB-NL and how are these related to what can be found on the Dutch web by the contemporary user? What is the influence of the choice of tools we use to harvest on the final archived website? Do we know enough of the value of the web collection and the potential usage of it by researchers and how can we improve this value? This article will describe the outcomes of the research, the conclusions and advice that can be drawn from it and will hopefully inspire broader discussions about the essence of creating web collections for long term preservation as part of cultural heritage.
Hozzáadás dátuma 2021. 08. 09. 8:42:08
Módosítás dátuma 2021. 08. 09. 8:42:08

Címkék:

  • web archiving
  • digital preservation
  • KB National Library of the Netherlands
  • OAIS

How it Happened

Típus Dolgozat
Szerző Omar Alonso
Szerző Vasileios Kandylas
Szerző Serge-Eric Tremblay
URL http://dl.acm.org/citation.cfm?doid=3197026.3197034
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 193-202
ISBN 978-1-4503-5178-2
Dátum 2018
DOI 10.1145/3197026.3197034
Kivonat Social networks like Twitter and Facebook are the largest sources of public opinion and real-time information on the Internet. If an event is of general interest, news articles follow and eventually a Wikipedia page. We propose the problem of automatic event story generation and archiving by combining social and news data to construct a new type of document in the form of a Wiki-like page structure. We introduce a technique that shows the evolution of a story as perceived by the crowd in social media, along with editorially authored articles annotated with examples of social media as supporting evidence. At the core of our research, is the temporally sensitive extraction of data that serve as context for retrieval purposes. Our approach includes a fine-grained vote counting strategy that is used for weighting purposes, pseudo-relevance feedback and query expansion with social data and web query logs along with a timeline algorithm as the base for a story. We demonstrate the effectiveness of our approach by processing a dataset comprising millions of English language tweets generated over a one year period and present a full implementation of our system.
Kiadvány címe Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries – JCDL '18
Hozzáadás dátuma 2021. 08. 09. 8:42:40
Módosítás dátuma 2021. 08. 09. 8:42:40

How Much of the Web is Archived?

Típus Dolgozat
Szerző Scott G Ainsworth
Szerző Ahmed Alsum
Szerző Hany SalahEldeen
Szerző Michele C Weigle
Szerző Michael L Nelson
URL http://doi.acm.org/10.1145/1998076.1998100
Hely New York, NY, USA
Kiadó ACM
Oldalszám 133-136
ISBN 978-1-4503-0744-4
Dátum 2011
Egyéb Series Title: JCDL '11
Citation Key: Ainsworth:2011:MWA:1998076.1998100
DOI 10.1145/1998076.1998100
Kivonat The Memento Project's archive access additions to HTTP have enabled development of new web archive access user interfaces. After experiencing this web time travel, the inevitable question that comes to mind is "How much of the Web is archived?" This question is studied by approximating the Web via sampling URIs from DMOZ, Delicious, Bitly, and search engine indexes and measuring number of archive copies available in various public web archives. The results indicate that 35%-90% of URIs have at least one archived copy, 17%-49% have two to five copies, 1%-8% have six to ten copies, and 8%-63% at least ten copies. The number of URI copies varies as a function of time, but only 14.6-31.3% of URIs are archived more than once per month.
Kiadvány címe Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:17
Módosítás dátuma 2021. 08. 09. 8:43:17

Címkék:

  • web archiving
  • digital preservation
  • HTTP
  • web architecture
  • resource versioning
  • temporal applications

How Perceptions of Web Resource Boundaries Differ for Institutional and Personal Archives

Típus Könyv
Szerző Faryaneh Poursardar
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kiadó IEEE
ISBN 978-1-5386-2659-7
Dátum 2018
Egyéb Publication Title: 2018 IEEE International Conference on Information Reuse and Integration (IRI)
DOI: 10.1109/IRI.2018.00026
Kivonat What is and is not part of a web resource does not have a simple answer. Exploration of web resource boundaries has shown that people's assessments of resource bounds rely on understanding relationships between content fragments on the same web page and between content fragments on different web pages. This study explores whether such perceptions change based on whether the archive is for personal use or is institutional in nature. This survey explores user expectations when accessing archived web resources. Participants in the study were asked to assume they are making use of an archive provided by an institution tasked with preserving online resources, such as a digital archive that is part of the Library of Congress. Groups of paired web pages were presented to the participants. Each group has a primary web page that is the resource being saved by the institutional archive. Each group has several subsequent parts or pages, which we will ask about. Consistent with our previous study on personal archiving, the primary-page content in the study comes from multi-page stories, multi-image collections, product pages with reviews and ratings on separate pages, and short single page writings. Participants were asked to assume the institutional archive wants to preserve the primary page and then answer what else they would expect to be saved along with the primary page. The results show that, while there are similar expectations for preserving continuations of the main content in personal and institutional archiving scenarios, institutional archives are more likely to be expected to preserve the context of the main content, such as additional linked content, advertisements, and author information.
Terjedelem 126
Hozzáadás dátuma 2021. 08. 09. 8:42:09
Módosítás dátuma 2021. 08. 09. 8:42:09

Címkék:

  • web archiving
  • digital preservation
  • Communication
  • Computer science
  • Computing and Processing
  • Conferences
  • Data science
  • General Topics for Engineers
  • Image sequences
  • Institutional archiving
  • Networking and Broadcast Technologies
  • personal archiving
  • Robotics and Control Systems
  • Signal Processing and Analysis
  • Task analysis
  • Uniform resource locators
  • user study
  • Web pages

How to Assess the Exhaustiveness of Longitudinal Web Archives: A Case Study of the German Academic Web

Típus Dolgozat
Szerző Michael Paris
Szerző Robert Jäschke
URL https://doi.org/10.1145/3372923.3404836
Sorozat HT '20
Hely New York, NY, USA
Kiadó Association for Computing Machinery
Oldalszám 85–89
ISBN 978-1-4503-7098-1
Dátum July 13, 2020
DOI 10.1145/3372923.3404836
Hozzáférés 2021. 07. 15. 2:00:00
Könyvtár Katalógus ACM Digital Library
Kivonat Longitudinal web archives can be a foundation for investigating structural and content-based research questions. One prerequisite is that they contain a faithful representation of the relevant subset of the web. Therefore, an assessment of the authority of a given dataset with respect to a research question should precede the actual investigation. Next to proper creation and curation, this requires measures for estimating the potential of a longitudinal web archive to yield information about the central objects the research question aims to investigate. In particular, content-based research questions often lack the ab-initio confidence about the integrity of the data. In this paper we focus on one specifically important aspect, namely the exhaustiveness of the dataset with respect to the central objects. Therefore, we investigate the recall coverage of researcher names in a longitudinal academic web crawl over a seven year period and the influence of our crawl method on the dataset integrity. Additionally, we propose a method to estimate the amount of missing information as a means to describe the exhaustiveness of the crawl and motivate a use case for the presented corpus.
Kiadvány címe Proceedings of the 31st ACM Conference on Hypertext and Social Media
Rövid cím How to Assess the Exhaustiveness of Longitudinal Web Archives
Hozzáadás dátuma 2021. 08. 09. 8:44:09
Módosítás dátuma 2021. 08. 09. 8:44:09

Címkék:

  • web archive
  • dataset
  • exhaustive
  • focused web crawl
  • longitudinal

How to catalogue a web archive?

Típus Előadás
Előadó Márton Németh
URL http://mekosztaly.oszk.hu/mia/doc/How_to_catalogue_a_web_archive.pptx
Hely Bratislava
Dátum 2018
Találkozó neve Information Interactions 2018 workshop
Nyelv English
Hozzáadás dátuma 2021. 08. 09. 8:43:46
Módosítás dátuma 2021. 08. 09. 8:43:46

How to Catch a Digital Speed Goat: A Web Archiving Case Study at the University of Wyoming

Típus Folyóiratcikk
Szerző Sara Davis
Szerző Rachel Gattermeyer
URL https://digitalcommons.kennesaw.edu/provenance/vol37/iss1/4
Kötet 37
Szám 1
Kiadvány Provenance, Journal of the Society of Georgia Archivists
ISSN 0739-4241
Dátum 2021-01-01
Egyéb Number: 1
Rövid cím How to Catch a Digital Speed Goat
Hozzáadás dátuma 2021. 08. 09. 8:43:55
Módosítás dátuma 2021. 08. 09. 8:43:55

How to Choose a Digital Preservation Strategy: Evaluating a Preservation Planning Procedure

Típus Dolgozat
Szerző Stephan Strodl
Szerző Christoph Becker
Szerző Robert Neumayer
Szerző Andreas Rauber
URL http://doi.acm.org/10.1145/1255175.1255181
Hely New York, NY, USA
Kiadó ACM
Oldalszám 29-38
ISBN 978-1-59593-644-8
Dátum 2007
Egyéb Series Title: JCDL '07
Citation Key: Strodl:2007:CDP:1255175.1255181
DOI 10.1145/1255175.1255181
Kivonat An increasing number of institutions throughout the world face legal obligations or business needs to collect and preserve digital objects over several decades. A range of tools exists today to support the variety of preservation strategies such as migration or emulation. Yet, different preservation requirements across institutions and settings make the decision on which solution to implement very difficult. This paper presents the PLANETS Preservation Planning approach. It provides an approved way to make informed and accountable decisions on which solution to implement in order to optimally preserve digital objects for a given purpose. It is based on Utility Analysis to evaluate the performance of various solutions against well-defined requirements and goals. The viability of this approach is shown in a range of case studies for different settings. We present its application to two scenarios of web archives, two collections of electronic publications, and a collection of multimedia art. This work focuses on the different requirements and goals in the various preservation settings.
Kiadvány címe Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:37
Módosítás dátuma 2021. 08. 09. 8:43:37

Címkék:

  • digital preservation
  • evaluation
  • digital libraries
  • OAIS model
  • preservation planning
  • utility analysis

HOW TO DIG INTO THE HISTORY OF A NATION'S WEB? THE DEVELOPMENT OF THE DANISH WEB 2006-2015

Típus Előadás
Előadó Niels Brügger
Hely online webinar
Dátum 2018
Találkozó neve IIPC RWG webinar series
Nyelv en
Hozzáadás dátuma 2021. 08. 09. 8:43:45
Módosítás dátuma 2021. 08. 09. 8:43:45

How to Search the Internet Archive Without Indexing It

Típus Könyvfejezet
Szerző Nattiya Kanhabua
Szerző Philipp Kemkes
Szerző Wolfgang Nejdl
Szerző Tu Ngoc Nguyen
Szerző Felipe Reis
Szerző Nam Khanh Tran
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Oldalszám 147-160
Dátum 2016-01
Egyéb DOI: 10.1007/978-3-319-43997-6_12
ISBN: 978-3-319-43996-9
Kivonat Significant parts of cultural heritage are produced on the web during the last decades. While easy accessibility to the current web is a good baseline, optimal access to the past web faces several challenges. This includes dealing with large-scale web archive collections and a lack of usage logs that contain implicit human feedback most relevant for today’s web search. In this paper, we propose an entity-oriented search system to support retrieval and analytics on the Internet Archive. We use Bing to retrieve a ranked list of results from the current web. In addition, we link retrieved results to the WayBack Machine; thus allowing keyword search on the Internet Archive without processing and indexing its raw archived content. Our search system complements existing web archive search tools through a user-friendly interface, which comes close to the functionalities of modern web search engines (e.g., keyword search, query auto-completion and related query suggestion), and provides a great benefit of taking user feedback on the current web into account also for web archive search. Through extensive experiments, we conduct quantitative and qualitative analyses in order to provide insights that enable further research on and practical applications of web archives.
Könyv címe Research & Advanced Technology for Digital Libraries: 20th International Conference on Theory & Practice of Digital Libraries, TPDL 2016, Hannover, Germany, September 5-9, 2016, Proceedings
Hozzáadás dátuma 2021. 08. 09. 8:41:52
Módosítás dátuma 2021. 08. 09. 8:41:52

How Well Are Arabic Websites Archived?

Típus Dolgozat
Szerző Lulwah M Alkwai
Szerző Michael L Nelson
Szerző Michele C Weigle
URL http://doi.acm.org/10.1145/2756406.2756912
Hely New York, NY, USA
Kiadó ACM
Oldalszám 223-232
ISBN 978-1-4503-3594-2
Dátum 2015
Egyéb Series Title: JCDL '15
Citation Key: Alkwai:2015:WAW:2756406.2756912
DOI 10.1145/2756406.2756912
Kivonat It has long been anecdotally known that web archives and search engines favor Western and English-language sites. In this paper we quantitatively explore how well indexed and archived are Arabic language web sites. We began by sampling 15,092 unique URIs from three different website directories: DMOZ (multi-lingual), Raddadi and Star28 (both primarily Arabic language). Using language identification tools we eliminated pages not in the Arabic language (e.g., English language versions of Al-Jazeera sites) and culled the collection to 7,976 definitely Arabic language web pages. We then used these 7,976 pages and crawled the live web and web archives to produce a collection of 300,646 Arabic language pages. We discovered: 1) 46% are not archived and 31% are not indexed by Google (www.google.com), 2) only 14.84% of the URIs had an Arabic country code top-level domain (e.g., .sa) and only 10.53% had a GeoIP in an Arabic country, 3) having either only an Arabic GeoIP or only an Arabic top-level domain appears to negatively impact archiving, 4) most of the archived pages are near the top level of the site and deeper links into the site are not well-archived, 5) the presence in a directory positively impacts indexing and presence in the DMOZ directory, specifically, positively impacts archiving.
Kiadvány címe Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:20
Módosítás dátuma 2021. 08. 09. 8:43:20

Címkék:

  • web archiving
  • digital preservation
  • Arabic web
  • indexing
  • Design
  • Experimentation
  • Measurement

Hungarian web archiving pilot project in the National Széchényi Library

Típus Dolgozat
Szerző Marton Nemeth
Szerző Laszlo Drotos
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kiadó IEEE
Oldalszám 000209-000212
ISBN 978-1-5386-1264-4
Dátum 2017-09
DOI 10.1109/CogInfoCom.2017.8268244
Kivonat This demo paper introduces the web archiving pilot project in the Hungarian National Széchényi Library. The basic concept and goals are described.
Kiadvány címe 2017 8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom)
Hozzáadás dátuma 2021. 08. 09. 8:42:06
Módosítás dátuma 2021. 08. 09. 8:42:06

Címkék:

  • Collaboration
  • web archiving
  • Communication
  • Conferences
  • General Topics for Engineers
  • Networking and Broadcast Technologies
  • Robotics and Control Systems
  • Internet
  • Web sites
  • Libraries
  • National Széchényi Library
  • pilot project
  • Software
  • Terrorism

Hypertext and "Twitterature"

Típus Folyóiratcikk
Szerző Massimo Lollini
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Oldalszám 1
Kiadvány Profession
ISSN 0740-6959
Dátum 2018-03-22
Kivonat The article offers information on the Oregon Petrarch Open Book (OPOB), a database-driven hypertext version of the poetry collection "Rerum vulgarium fragmenta" (Rvf) by Francesco Petrarca. Topics discussed include the use of features of Web archive and hypertext for the creation of the database; the use of technology in teaching Petrarchism; and the archive of separate editions of Rvf in the database.
Hozzáadás dátuma 2021. 08. 09. 8:42:10
Módosítás dátuma 2021. 08. 09. 8:42:10

Címkék:

  • WEB archiving
  • DIGITAL libraries
  • EDITIONS
  • HYPERTEXT systems
  • PETRARCA, Francesco, 1304-1374
  • PETRARCHISM
  • POETRY collections

iCrawl: Improving the Freshness of Web Collections by Integrating Social Web and Focused Web Crawling

Típus Dolgozat
Szerző Gerhard Gossen
Szerző Elena Demidova
Szerző Thomas Risse
URL http://doi.acm.org/10.1145/2756406.2756925
Hely New York, NY, USA
Kiadó ACM
Oldalszám 75-84
ISBN 978-1-4503-3594-2
Dátum 2015
Egyéb Series Title: JCDL '15
Citation Key: Gossen:2015:IIF:2756406.2756925
DOI 10.1145/2756406.2756925
Kivonat Researchers in the Digital Humanities and journalists need to monitor, collect and analyze fresh online content regarding current events such as the Ebola outbreak or the Ukraine crisis on demand. However, existing focused crawling approaches only consider topical aspects while ignoring temporal aspects and therefore cannot achieve thematically coherent and fresh Web collections. Especially Social Media provide a rich source of fresh content, which is not used by state-of-the-art focused crawlers. In this paper we address the issues of enabling the collection of fresh and relevant Web and Social Web content for a topic of interest through seamless integration of Web and Social Media in a novel integrated focused crawler. The crawler collects Web and Social Media content in a single system and exploits the stream of fresh Social Media content for guiding the crawler.
Kiadvány címe Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:22
Módosítás dátuma 2021. 08. 09. 8:43:22

Címkék:

  • web archives
  • social media
  • focused crawling
  • web crawling

Identifying Documents In-Scope of a Collection from Web Archives

Típus Dolgozat
Szerző Krutarth Patel
Szerző Cornelia Caragea
Szerző Mark E. Phillips
Szerző Nathaniel T. Fox
URL https://doi.org/10.1145/3383583.3398540
Sorozat JCDL '20
Hely New York, NY, USA
Kiadó Association for Computing Machinery
Oldalszám 167–176
ISBN 978-1-4503-7585-6
Dátum August 1, 2020
DOI 10.1145/3383583.3398540
Hozzáférés 2021. 07. 15. 2:00:00
Könyvtár Katalógus ACM Digital Library
Kivonat Web archive data usually contains high-quality documents that are very useful for creating specialized collections of documents, e.g., scientific digital libraries and repositories of technical reports. In doing so, there is a substantial need for automatic approaches that can distinguish the documents of interest for a collection out of the huge number of documents collected by web archiving institutions. In this paper, we explore different learning models and feature representations to determine the best performing ones for identifying the documents of interest from the web archived data. Specifically, we study both machine learning and deep learning models and "bag of words" (BoW) features extracted from the entire document or from specific portions of the document, as well as structural features that capture the structure of documents. We focus our evaluation on three datasets that we created from three different Web archives. Our experimental results show that the BoW classifiers that focus only on specific portions of the documents (rather than the full text) outperform all compared methods on all three datasets.
Kiadvány címe Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020
Hozzáadás dátuma 2021. 08. 09. 8:44:03
Módosítás dátuma 2021. 08. 09. 8:44:03

Címkék:

  • web archiving
  • digital libraries
  • text classification

iEcology: Harnessing Large Online Resources to Generate Ecological Insights

Típus Folyóiratcikk
Szerző Ivan Jarić
Szerző Ricardo A. Correia
Szerző Barry W. Brook
Szerző Jessie C. Buettel
Szerző Franck Courchamp
Szerző Enrico Di Minin
Szerző Josh A. Firth
Szerző Kevin J. Gaston
Szerző Paul Jepson
Szerző Gregor Kalinkat
Szerző Richard Ladle
Szerző Andrea Soriano-Redondo
Szerző Allan T. Souza
Szerző Uri Roll
URL http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=143557824&lang=hu&site=ehost-live
Kötet 35
Szám 7
Oldalszám 630-639
Kiadvány Trends in Ecology & Evolution
ISSN 0169-5347
Dátum July 2020
Egyéb Number: 7
Folyóirat rövid neve Trends in Ecology & Evolution
DOI 10.1016/j.tree.2020.03.003
Hozzáférés 2021. 07. 16. 11:17:26
Könyvtár Katalógus EBSCOhost
Kivonat Digital data are accumulating at unprecedented rates. These contain a lot of information about the natural world, some of which can be used to answer key ecological questions. Here, we introduce iEcology (i.e., internet ecology), an emerging research approach that uses diverse online data sources and methods to generate insights about species distribution over space and time, interactions and dynamics of organisms and their environment, and anthropogenic impacts. We review iEcology data sources and methods, and provide examples of potential research applications. We also outline approaches to reduce potential biases and improve reliability and applicability. As technologies and expertise improve, and costs diminish, iEcology will become an increasingly important means to gain novel insights into the natural world. iEcology is a new research approach that seeks to quantify patterns and processes in the natural world using data accumulated in digital sources collected for other purposes. iEcology studies have provided new insights into species occurrences, traits, phenology, functional roles, behavior, and abiotic environmental features. iEcology is expanding, and will be able to provide valuable support for ongoing research efforts, as comparatively low-cost research based on freely available data. We expect that iEcology will experience rapid development over coming years and become one of the major research approaches in ecology, enhanced by emerging technologies such as automated content analysis, apps, internet of things, ecoacoustics, web scraping, and open source hardware.
Rövid cím iEcology
Hozzáadás dátuma 2021. 08. 09. 8:44:38
Módosítás dátuma 2021. 08. 09. 8:44:38

Címkék:

  • TECHNOLOGICAL innovations
  • social media
  • digital data
  • internet
  • data mining
  • ARCHITECTURAL acoustics
  • biodiversity
  • biogeography
  • CONTENT analysis
  • culturomics
  • INTERNET of things
  • INTERNET privacy
  • PHENOLOGY
  • SPECIES distribution

If these crawls could talk: Studying and documenting web archives provenance

Típus Folyóiratcikk
Szerző Emily Maemura
Szerző Nicholas Worby
Szerző Ian Milligan
Szerző Christoph Becker
URL http://doi.wiley.com/10.1002/asi.24048
Kötet 69
Szám 10
Oldalszám 1223-1233
Kiadvány Journal of the Association for Information Science and Technology
ISSN 2330-1635
Dátum 2018-10
Egyéb Number: 10
DOI 10.1002/asi.24048
Kivonat The increasing use and prominence of web archives raises the urgency of establishing mechanisms for transparency in the making of web archives to facilitate the process of evaluating a web archive’s provenance, scoping, and absences. Some choices and process events are captured automatically, but their interactions are not currently well understood or documented. This study examined the decision space of web archives and its role in shaping what is and what is not captured in the web archiving process. By comparing how three different web archives collections were created and documented, we investigate how curatorial decisions interact with technical and external factors and we compare commonalities and differences. The findings reveal the need to understand both the social and technical context that shapes those decisions and the ways in which these individual decisions interact. Based on the study, we propose a framework for documenting key dimensions of a collection that addresses the situated nature of the organizational context, technical specificities, and unique characteristics of web materials that are the focus of a collection. The framework enables future researchers to undertake empirical work studying the process of creating web archives collections in different contexts.
Hozzáadás dátuma 2021. 08. 09. 8:43:26
Módosítás dátuma 2021. 08. 09. 8:43:26

IFLA könyvtári referenciamodell

Típus Folyóiratcikk
Szerző Pat Riva
Szerző Patrick Le Boeuf
Szerző Maja Žumer
Szerző Egységesítési Szerkesztőbizottsága
Oldalszám 99
Dátum 2017
Könyvtár Katalógus Zotero
Nyelv hu
Hozzáadás dátuma 2021. 08. 09. 8:43:49
Módosítás dátuma 2021. 08. 09. 8:43:49

IIPC Content Development Working Group

Típus Blogbejegyzés
URL http://netpreserve.org/about-us/working-groups/content-development-working-group/
Dátum 2019
Hozzáférés 2020. 08. 17. 18:02:07
Nyelv en-GB
Blog címe IIPC
Hozzáadás dátuma 2021. 08. 09. 8:43:46
Módosítás dátuma 2021. 08. 09. 8:43:46

IIPC portal

Típus Weboldal
Szerző IIPC
Dátum 2019
Hozzáférés 2019. 01. 28. 1:00:00
Hozzáadás dátuma 2021. 08. 09. 8:43:42
Módosítás dátuma 2021. 08. 09. 8:43:42

IIPC Training Working Group survey

Típus Weboldal
Szerző IIPC TWG
URL https://www.surveymonkey.com/r/V7MVXXW
Dátum 2017
Hozzáférés 2018. 06. 12. 2:00:00
Website címe IIPC Training Working Group Survey
Hozzáadás dátuma 2021. 08. 09. 8:43:42
Módosítás dátuma 2021. 08. 09. 8:43:42

Image Analytics in Web Archives

Típus Könyvfejezet
Szerző Eric Müller-Budack
Szerző Kader Pustu-Iren
Szerző Sebastian Diering
Szerző Matthias Springstein
Szerző Ralph Ewerth
Szerkesztő Daniel Gomes
Szerkesztő Elena Demidova
Szerkesztő Jane Winters
Szerkesztő Thomas Risse
URL https://doi.org/10.1007/978-3-030-63291-5_11
Hely Cham
Kiadó Springer International Publishing
Oldalszám 141-151
ISBN 978-3-030-63291-5
Dátum 2021
Egyéb DOI: 10.1007/978-3-030-63291-5_11
Hozzáférés 2021. 07. 15. 9:52:26
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat The multimedia content published on the World Wide Web is constantly growing and contains valuable information in various domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties, but unfortunately, they are rarely provided with appropriate metadata. This lack of structured data limits the exploration of the archives, and automated solutions are required to enable semantic search. While many approaches exploit the textual content of news in the Internet Archive to detect named entities and their relations, visual information is generally disregarded. In this chapter, we present an approach that leverages deep learning techniques for the identification of public personalities in the images of news articles stored in the Internet Archive. In addition, we elaborate on how this approach can be extended to enable detection of other entity types such as locations or events. The approach complements named entity recognition and linking tools for text and allows researchers and analysts to track the media coverage and relations of persons more precisely. We have analysed more than one million images from news articles in the Internet Archive and demonstrated the feasibility of the approach with two use cases in different domains: politics and entertainment.
Könyv címe The Past Web: Exploring Web Archives
Hozzáadás dátuma 2021. 08. 09. 8:43:59
Módosítás dátuma 2021. 08. 09. 8:43:59

Impact of URI Canonicalization on Memento Count

Típus Dolgozat
Szerző Matt Kelly
Szerző Lulwah M. Alkwai
Szerző Sawood Alam
Szerző Herbert Van de Sompel
Szerző Michael L. Nelson
Szerző Michele C. Weigle
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Hely United States, North America
Kiadó IEEE
Oldalszám 1-2
ISBN 978-1-5386-3861-3
Dátum 2017-06
DOI 10.1109/JCDL.2017.7991601
Kivonat Quantifying the captures of a URI over time is useful for researchers to identify the extent to which a Web page has been archived. Memento TimeMaps provide a format to list mementos (URI-Ms) for captures along with brief metadata, like Memento-Datetime, for each URI-M. However, when some URI-Ms are dereferenced, they simply provide a redirect to a different URI-M (instead of a unique representation at the datetime), often also present in the TimeMap. This implies that confidently obtaining an accurate count quantifying the number of non-forwarding captures for a URI-R is not possible using a TimeMap alone and that the magnitude of a TimeMap is not equivalent to the number of representations it identifies. In this work we discuss this particular phenomenon in depth. We also perform a breakdown of the dynamics of counting mementos for a particular URI-R (google.com) and quantify the prevalence of the various canonicalization patterns that exacerbate attempts at counting using only a TimeMap. For google.com we found that 84.9% of the URI-Ms result in an HTTP redirect when dereferenced. We expand on and apply this metric to TimeMaps for seven other URI-Rs of large Web sites and thirteen academic institutions. Using a ratio metric DI for the number of URI-Ms without redirects to those requiring a redirect when dereferenced, five of the eight large web sites' and two of the thirteen academic institutions' TimeMaps had a ratio of less than one, indicating that more than half of the URI-Ms in these TimeMaps result in redirects when dereferenced.
Kiadvány címe 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL)
Hozzáadás dátuma 2021. 08. 09. 8:42:07
Módosítás dátuma 2021. 08. 09. 8:42:07

Címkék:

  • Web archiving
  • Memento
  • Computer Sciences
  • HTTP
  • Canonicalization patterns
  • Data patterns
  • URI
  • URI-M
  • Web Archive
  • Redirection
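
The ratio metric described in the abstract of this entry can be illustrated with a short calculation. This is a minimal sketch, not the authors' code: the function name `redirect_ratio` and the TimeMap size of 1,000 URI-Ms are hypothetical, chosen only so that the redirect share matches the 84.9% reported for google.com.

```python
def redirect_ratio(non_redirecting: int, redirecting: int) -> float:
    """Ratio of URI-Ms that return a representation to URI-Ms that redirect."""
    if redirecting == 0:
        raise ValueError("ratio is undefined when no URI-M redirects")
    return non_redirecting / redirecting


# Hypothetical TimeMap of 1,000 URI-Ms with the 84.9% redirect share
# that the abstract reports for google.com.
total = 1000
redirecting = 849
non_redirecting = total - redirecting

ratio = redirect_ratio(non_redirecting, redirecting)
majority_redirect = ratio < 1  # True when more than half of the URI-Ms redirect
print(f"ratio = {ratio:.3f}, majority redirect: {majority_redirect}")
```

A ratio below one means that redirecting URI-Ms outnumber non-redirecting ones, which is the condition the abstract reports for five of the eight large web sites.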

Increasing Access to Web Archives: Archive-It and the Discovery Layer

Típus Folyóiratcikk
Szerző Christina A. Beis
Szerző Kayla Harris
Szerző Stephanie Shreffler
Kötet 47
Szám 4
Kiadvány MAC Newsletter
Dátum 2020
Egyéb Number: 4
Könyvtár Katalógus Zotero
Nyelv en
Hozzáadás dátuma 2021. 08. 09. 8:44:07
Módosítás dátuma 2021. 08. 09. 8:44:07

Increasing Copyright Protection for Social Media Users by Expanding Social Media Platforms' Rights

Típus Folyóiratcikk
Szerző Ryan Wichtowski
URL http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=127588435&lang=hu&site=ehost-live
Kötet 16
Szám 1
Oldalszám 253-268
Kiadvány Duke Law & Technology Review
ISSN 23289600
Dátum January 2017
Egyéb Number: 1
Publisher: Duke University, School of Law
Folyóirat rövid neve Duke Law & Technology Review
Hozzáférés 2020. 08. 17. 9:32:41
Könyvtár Katalógus EBSCOhost
Kivonat Social media platforms allow users to share their creative works with the world. Users take great advantage of this functionality, as Facebook, Instagram, Flickr, Snapchat, and WhatsApp users alone uploaded 1.8 billion photos per day in 2014. Under the terms of service and terms of use agreements of most U.S. based social media platforms, users retain ownership of this content, since they only grant social media platforms nonexclusive licenses to their content. While nonexclusive licenses protect users vis-à-vis the social media platforms, these licenses preclude social media platforms from bringing copyright infringement claims on behalf of their users against infringers of user content under the Copyright Act of 1976. Since the average cost of litigating a copyright infringement case might be as high as two million dollars, the average social media user cannot protect his or her content against copyright infringers. To remedy this issue, Congress should amend 17 U.S.C. § 501 to allow social media platforms to bring copyright infringement claims against those who infringe their users' content. Through this amendment, Congress would create a new protection for social media users while ensuring that users retain ownership over the content they create.
Hozzáadás dátuma 2021. 08. 09. 8:43:43
Módosítás dátuma 2021. 08. 09. 8:43:43

Címkék:

  • COPYRIGHT — United States
  • COPYRIGHT lawsuits
  • INTERNET users
  • MINDEN Pictures Inc.
  • SOCIAL media
  • UNITED States. Copyrights (1976)

Index Maintenance for Time-travel Text Search

Típus Dolgozat
Szerző Avishek Anand
Szerző Srikanta Bedathur
Szerző Klaus Berberich
Szerző Ralf Schenkel
URL http://doi.acm.org/10.1145/2348283.2348318
Hely New York, NY, USA
Kiadó ACM
Oldalszám 235-244
ISBN 978-1-4503-1472-5
Dátum 2012
Egyéb Series Title: SIGIR '12
Citation Key: Anand:2012:IMT:2348283.2348318
DOI 10.1145/2348283.2348318
Kivonat Time-travel text search enriches standard text search by temporal predicates, so that users of web archives can easily retrieve document versions that are considered relevant to a given keyword query and existed during a given time interval. Different index structures have been proposed to efficiently support time-travel text search. None of them, however, can easily be updated as the Web evolves and new document versions are added to the web archive. In this work, we describe a novel index structure that efficiently supports time-travel text search and can be maintained incrementally as new document versions are added to the web archive. Our solution uses a sharded index organization, bounds the number of spuriously read index entries per shard, and can be maintained using small in-memory buffers and append-only operations. We present experiments on two large-scale real-world datasets demonstrating that maintaining our novel index structure is an order of magnitude more efficient than periodically rebuilding one of the existing index structures, while query-processing performance is not adversely affected.
Kiadvány címe Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval
Hozzáadás dátuma 2021. 08. 09. 8:43:22
Módosítás dátuma 2021. 08. 09. 8:43:22

Címkék:

  • web archives
  • index maintenance
  • time-travel text search

Information Evolution in Wikipedia

Típus Dolgozat
Szerző Andrea Ceroni
Szerző Mihai Georgescu
Szerző Ujwal Gadiraju
Szerző Kaweh Djafari Naini
Szerző Marco Fisichella
URL http://doi.acm.org/10.1145/2641580.2641612
Hely New York, NY, USA
Kiadó ACM
Oldalszám 24:1–24:10
ISBN 978-1-4503-3016-9
Dátum 2014
Egyéb Series Title: OpenSym '14
Citation Key: Ceroni:2014:IEW:2641580.2641612
DOI 10.1145/2641580.2641612
Kivonat The Web of data is constantly evolving based on the dynamics of its content. Current Web search engine technologies consider static collections and do not factor in explicitly or implicitly available temporal information, that can be leveraged to gain insights into the dynamics of the data. In this paper, we hypothesize that by employing the temporal aspect as the primary means for capturing the evolution of entities, it is possible to provide entity-based accessibility to Web archives. We empirically show that the edit activity on Wikipedia can be exploited to provide evidence of the evolution of Wikipedia pages over time, both in terms of their content and in terms of their temporally defined relationships, classified in literature as events. Finally, we present results from our extensive analysis of a dataset consisting of 31,998 Wikipedia pages describing politicians, and observations from in-depth case studies. Our findings reflect the usefulness of leveraging temporal information in order to study the evolution of entities and breed promising grounds for further research.
Kiadvány címe Proceedings of The International Symposium on Open Collaboration
Hozzáadás dátuma 2021. 08. 09. 8:43:37
Módosítás dátuma 2021. 08. 09. 8:43:37

Címkék:

  • Wikipedia
  • Entity Evolution
  • Events
  • Temporal Information

Infrastructure for Supporting Exploration and Discovery in Web Archives

Típus Dolgozat
Szerző Jimmy Lin
Szerző Milad Gholami
Szerző Jinfeng Rao
URL http://doi.acm.org/10.1145/2567948.2579045
Hely New York, NY, USA
Kiadó ACM
Oldalszám 851-856
ISBN 978-1-4503-2745-9
Dátum 2014
Egyéb Series Title: WWW '14 Companion
Citation Key: Lin:2014:ISE:2567948.2579045
DOI 10.1145/2567948.2579045
Kivonat Web archiving initiatives around the world capture ephemeral web content to preserve our collective digital memory. However, unlocking the potential of web archives requires tools that support exploration and discovery of captured content. These tools need to be scalable and responsive, and to this end we believe that modern "big data" infrastructure can provide a solid foundation. We present Warcbase, an open-source platform for managing web archives built on the distributed datastore HBase. Our system provides a flexible data model for storing and managing raw content as well as metadata and extracted knowledge. Tight integration with Hadoop provides powerful tools for analytics and data processing. Relying on HBase for storage infrastructure simplifies the development of scalable and responsive applications. We describe a service that provides temporal browsing and an interactive visualization based on topic models that allows users to explore archived content.
Kiadvány címe Proceedings of the 23rd International Conference on World Wide Web
Hozzáadás dátuma 2021. 08. 09. 8:43:17
Módosítás dátuma 2021. 08. 09. 8:43:17

Címkék:

  • HBase
  • Hadoop

Innovation on the web: the end of the S-curve?

Típus Folyóiratcikk
Szerző Maria Priestley
Szerző T. J. Sluckin
Szerző Thanassis Tiropanis
URL https://doi.org/10.1080/24701475.2020.1747261
Kötet 4
Szám 4
Oldalszám 390-412
Kiadvány Internet Histories
ISSN 2470-1475
Dátum October 1, 2020
Egyéb Number: 4
Publisher: Routledge
_eprint: https://doi.org/10.1080/24701475.2020.1747261
DOI 10.1080/24701475.2020.1747261
Hozzáférés 2021. 07. 15. 11:45:07
Könyvtár Katalógus Taylor and Francis+NEJM
Kivonat Rigorous research into the historical past of Web technology-driven innovation becomes timely as technological growth and forecasting are attracting popular interest. Drawing on economic and management literature relating to the typical trends of technological innovation, we examine the long-term development of Web technology in a theoretically informed and empirical manner. An original longitudinal dataset of 20,493 Web-related US patents is used to trace the growth curve of Web technology from 1990 through 2013. We find that the accumulation of corporate Web inventions followed an S-shaped curve which shifted to linear growth after 2004. This transition is unusual in relation to the traditional S-curve model of technological development, which typically approaches a limit. The point of inflection on the S-curve coincided reasonably closely with the timing of the dot-com crash in 2000. Moreover, we find a complex bi-directional relationship between patenting rates in Web technology and movements in the NASDAQ composite stock index. The implications of these results are discussed in theoretical and practical terms for sustained technological growth. Specific recommendations for different stakeholders in commercial Web development are included.
Rövid cím Innovation on the web
Hozzáadás dátuma 2021. 08. 09. 8:44:29
Módosítás dátuma 2021. 08. 09. 8:44:29

Címkék:

  • empirical measurement
  • innovation
  • patents
  • technological revolutions
  • Web technology

Intelligent Crawling of Web Applications for Web Archiving

Típus Dolgozat
Szerző Muhammad Faheem
URL http://doi.acm.org/10.1145/2187980.2187996
Hely New York, NY, USA
Kiadó ACM
Oldalszám 127-132
ISBN 978-1-4503-1230-1
Dátum 2012
Egyéb Series Title: WWW '12 Companion
Citation Key: Faheem:2012:ICW:2187980.2187996
DOI 10.1145/2187980.2187996
Kivonat The steady growth of the World Wide Web raises challenges regarding the preservation of meaningful Web data. Tools used currently by Web archivists blindly crawl and store Web pages found while crawling, disregarding the kind of Web site currently accessed (which leads to suboptimal crawling strategies) and whatever structured content is contained in Web pages (which results in page-level archives whose content is hard to exploit). We focus in this PhD work on the crawling and archiving of publicly accessible Web applications, especially those of the social Web. A Web application is any application that uses Web standards such as HTML and HTTP to publish information on the Web, accessible by Web browsers. Examples include Web forums, social networks, geolocation services, etc. We claim that the best strategy to crawl these applications is to make the Web crawler aware of the kind of application currently processed, allowing it to refine the list of URLs to process, and to annotate the archive with information about the structure of crawled content. We add adaptive characteristics to an archival Web crawler: being able to identify when a Web page belongs to a given Web application and applying the appropriate crawling and content extraction methodology.
Kiadvány címe Proceedings of the 21st International Conference on World Wide Web
Hozzáadás dátuma 2021. 08. 09. 8:43:09
Módosítás dátuma 2021. 08. 09. 8:43:09

Címkék:

  • crawling
  • archiving
  • web application
  • extraction
  • xpath

Intelligent Event Focused Crawling

Típus Szakdolgozat
Szerző Mohamed Magdy Gharib Farag
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Dátum 2016
Kivonat There is a need for an integrated event focused crawling system to collect Web data about key events. When an event occurs, many users try to locate the most up-to-date information about that event. Yet, there is little systematic collecting and archiving anywhere of information about events. We propose intelligent event focused crawling for automatic event tracking and archiving, as well as effective access. We extend the traditional focused (topical) crawling techniques in two directions, modeling and representing: events and webpage source importance. We developed an event model that can capture key event information (topical, spatial, and temporal). We incorporated that model into the focused crawler algorithm. For the focused crawler to leverage the event model in predicting a webpage's relevance, we developed a function that measures the similarity between two event representations, based on textual content. Although the textual content provides a rich set of features, we proposed an additional source of evidence that allows the focused crawler to better estimate the importance of a webpage by considering its website. We estimated webpage source importance by the ratio of the number of relevant webpages to non-relevant webpages found while crawling a website. We combined the textual content information and source importance into a single relevance score. For the focused crawler to work well, it needs a diverse set of high quality seed URLs (URLs of relevant webpages that link to other relevant webpages). Although manual curation of seed URLs guarantees quality, it requires exhaustive manual labor. We proposed an automated approach for curating seed URLs using social media content. We leveraged the richness of social media content about events to extract URLs that can be used as seed URLs for further focused crawling. We evaluated our system through four series of experiments, using recent events: Orlando shooting, Ecuador earthquake, Panama papers, California shooting, Brussels attack, Paris attack, and Oregon shooting. In the first experiment series our proposed event model representation, used to predict webpage relevance, outperformed the topic-only approach, showing better results in precision, recall, and F1-score. In the second series, using harvest ratio to measure ability to collect relevant webpages, our event model-based focused crawler outperformed the state-of-the-art focused crawler (best-first search). The third series evaluated the effectiveness of our proposed webpage source importance for collecting more relevant webpages. The focused crawler with webpage source importance managed to collect roughly the same number of relevant webpages as the focused crawler without webpage source importance, but from a smaller set of sources. The fourth series provides guidance to archivists regarding the effectiveness of curating seed URLs from social media content (tweets) using different methods of selection.
Hozzáadás dátuma 2021. 08. 09. 8:42:07
Módosítás dátuma 2021. 08. 09. 8:42:07

Címkék:

  • Web Archiving
  • Digital Libraries
  • Event Modeling
  • Focused Crawling
  • Seed URLs Selection
  • Social Media Mining
  • Web Mining

International Initiatives and Advances in Brazil for Government Web Archiving

Típus Dolgozat
Szerző Jonas Ferrigolo Melo
Szerző Moisés Rockembach
Szerkesztő Edgar Bisset Álvarez
Sorozat Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Hely Cham
Kiadó Springer International Publishing
Oldalszám 83-95
ISBN 978-3-030-77417-2
Dátum 2021
DOI 10.1007/978-3-030-77417-2_6
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat This study aimed to illustrate some government web archiving initiatives in several countries and establish an overview of the Brazilian scenario with regard to the preservation of content published on government websites. In Brazil, although there is a robust set of laws that require the State to manage, provide access to and preserve its documents and information, there is still no policy for the preservation of web content. The result is the erasure and permanent loss of government information produced exclusively through websites. It is noted that there are several government initiatives for web archiving around the world, which can be used as examples for the implementation of a Brazilian policy. It is concluded that the long-term maintenance of governmental information available on the web is fundamental for public debate and for monitoring governmental actions. To ensure the preservation of this content, the country must define its policy for the preservation of documents produced in a web environment.
Kiadvány címe Data and Information in Online Environments
Hozzáadás dátuma 2021. 08. 09. 8:43:53
Módosítás dátuma 2021. 08. 09. 8:43:53

Címkék:

  • Web archiving
  • Digital preservation
  • Websites
  • Government web archiving

International Internet Preservation Consortium (IIPC) repository

Típus Weboldal
URL https://digital.library.unt.edu/explore/partners/IIPC/
Dátum 2020
Hozzáférés 2020. 08. 20. 11:33:01
Nyelv en
Kivonat The mission of the IIPC is to acquire, preserve and make accessible knowledge and information from the Internet for future generations everywhere, promoting global exchange and international relations.
Website címe UNT Digital Library
Hozzáadás dátuma 2021. 08. 09. 8:43:51
Módosítás dátuma 2021. 08. 09. 8:43:51

Internet Archive joins history's great libraries

Típus Folyóiratcikk
Szerző Mick O'Leary
URL https://search.proquest.com/docview/214817883?accountid=27464
Kötet 20
Szám 10
Oldalszám 41
Kiadvány Information Today
Dátum 2003-11
Egyéb Number: 10
Publisher: Information Today, Inc.
Place: Medford
Nyelv English
Kivonat Brewster Kahle is a man of many roles: a famous Internet pioneer, a successful dot-com entrepreneur, a digital visionary, and a darned good librarian. Right now, he's best-known as the founder of Alexa and the WAIS system. However, with Kahle's creation of the Internet Archive (IA), the future may well ascribe greater importance to his work as a librarian. IA is the largest archival project in history. Kahle compares it – without presumption or exaggeration – to the ancient Library of Alexandria. It intends to do for the Internet what that great library did for antiquity: to capture and preserve the world's knowledge for everyone's benefit. IA has been hard at work for several years creating the largest database in the world. At first, it concentrated on preservation. Now, with that task well in hand, it's working on access tools for this unique information resource.
Hozzáadás dátuma 2021. 08. 09. 8:42:14
Módosítás dátuma 2021. 08. 09. 8:42:14

Címkék:

  • Digital libraries
  • Library And Information Sciences–Computer Applica
  • Online data bases
  • 9190:United States
  • 5240:Software & systems
  • 9120:Product specific
  • Software reviews
  • United States
  • US

Internet Archive, Reed Tech Agree

Típus Folyóiratcikk
Szerző Judy Duke
URL https://search.proquest.com/docview/1622279345?accountid=27464
Kötet 42
Szám 12
Oldalszám 6-7
Kiadvány Advanced Technology Libraries
ISSN 0044-636X
Dátum 2013-12
Egyéb Number: 12
Publisher: Millwood Group Corp., Millwood NY
Nyelv English
Kivonat Internet Archive and Reed Technology and Information Services Inc., part of the LexisNexis family, have agreed to jointly market and sell Internet Archive's Archive-It service and continue to support the growing community of organizations currently using the service. First launched at Internet Archive in early 2006, Archive-It has been providing a sophisticated and flexible solution to a broad range of organizations and institutions focused on creating and managing collections of Web content. Adapted from the source document.
Hozzáadás dátuma 2021. 08. 09. 8:43:04
Módosítás dátuma 2021. 08. 09. 8:43:04

Címkék:

  • Collaboration
  • Web archiving
  • Marketing
  • article
  • 13.1: INFORMATION STORAGE AND RETRIEVAL – ECONOMIC
  • Information industry

"Internet Archive": la conservación de lo efímero ("Internet Archive": the conservation of the ephemeral)

Típus Folyóiratcikk
Szerző Ana Mayagoitia
Szerző Juan Manuel González Aguilar
URL https://search.proquest.com/docview/2050416699?accountid=27464
Kötet 40
Oldalszám 157-167
Kiadvány Documentación de las Ciencias de la Información
ISSN 0210-4210
Dátum 2017
Egyéb Publisher: Universidad Complutense de Madrid
Place: Madrid
DOI 10.5209/DCIN.57196
Nyelv Spanish
Kivonat The ephemeral tends to be discarded, finding little room in traditional museums or archives. The emergence of digital archives and their acceptance by a sector of academia have helped to slowly modify the perception of ephemeral content. This article aims to analyze the evolution of the Internet Archive, a digital repository specialized in the compilation and conservation of ephemeral media. To conclude, it reflects on the future of digital preservation and the possibility of creating similar digital archives in Spanish-speaking countries.
Hozzáadás dátuma 2021. 08. 09. 8:42:26
Módosítás dátuma 2021. 08. 09. 8:42:26

Címkék:

  • Web archiving
  • Archives
  • Digital preservation
  • Digital archives
  • Internet
  • Journalism
  • Digital archive
  • Ephemeral patrimony
  • Internet archive
  • Museums
  • Public domain

Internet histories: the view from the design process

Típus Folyóiratcikk
Szerző Sandra Braman
URL http://www.tandfonline.com/doi/abs/10.1080/24701475.2017.1305716
Kötet 1
Szám 1-2
Oldalszám 70-78
Kiadvány Internet Histories
ISSN 2470-1475
Dátum 2017-01-02
Egyéb Number: 1-2
Publisher: Routledge
DOI 10.1080/24701475.2017.1305716
Kivonat The electrical engineers and computer scientists who have designed the Internet are among those who have written Internet history. They have done so within the technical document series created to provide a medium for and record of the design process, the Internet Requests for Comments (RFCs), as well as in other venues. Internet designers have written the network's history directly in documents explicitly devoted to history as well as indirectly in documents focused on technical matters. The Internet RFCs also provide data for research on Internet history and on large-scale sociotechnical infrastructure written by outsiders to the design process. Incorporating the history of the Internet as understood by those responsible for its design, whether in their own words or by treating the design conversation as data, makes visible some elements of that history not otherwise available, corrects misperceptions of factors underlying some of its features, and provides fascinating details on the people and events involved that are of interest to those seeking to understand the Internet. Within the RFCs, history has served both technical and social functions.
Hozzáadás dátuma 2021. 08. 09. 8:41:46
Módosítás dátuma 2021. 08. 09. 8:41:46

Interoperability for Accessing Versions of Web Resources with the Memento Protocol

Típus Könyvfejezet
Szerző Shawn M. Jones
Szerző Martin Klein
Szerző Herbert Van de Sompel
Szerző Michael L. Nelson
Szerző Michele C. Weigle
Szerkesztő Daniel Gomes
Szerkesztő Elena Demidova
Szerkesztő Jane Winters
Szerkesztő Thomas Risse
URL https://doi.org/10.1007/978-3-030-63291-5_9
Hely Cham
Kiadó Springer International Publishing
Oldalszám 101-126
ISBN 978-3-030-63291-5
Dátum 2021
Egyéb DOI: 10.1007/978-3-030-63291-5_9
Hozzáférés 2021. 07. 15. 9:52:26
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat The Internet Archive pioneered web archiving and remains the largest publicly accessible web archive hosting archived copies of web pages (Mementos) going back as far as early 1996. Its holdings have grown steadily since, and it hosts more than 881 billion URIs as of September 2019. However, the landscape of web archiving has changed significantly over the last two decades. Today we can freely access Mementos from more than 20 web archives around the world, operated by for-profit and nonprofit organisations, national libraries and academic institutions, as well as individuals. The resulting diversity improves the odds of the survival of archived records but also requires technical standards to ensure interoperability between archival systems. To date, the Memento Protocol and the WARC file format are the main enablers of interoperability between web archives. We describe a variety of tools and services that leverage the broad adoption of the Memento Protocol and discuss a selection of research efforts that would likely not have been possible without these interoperability standards. In addition, we outline examples of technical specifications that build on the ability of machines to access resource versions on the Web in an automatic, standardised and interoperable manner.
Könyv címe The Past Web: Exploring Web Archives
Hozzáadás dátuma 2021. 08. 09. 8:43:59
Módosítás dátuma 2021. 08. 09. 8:43:59

InterPlanetary Wayback: The Permanent Web Archive

Típus Dolgozat
Szerző Sawood Alam
Szerző Mat Kelly
Szerző Michael L. Nelson
URL http://doi.acm.org/10.1145/2910896.2925467
Hely New York, NY, USA
Kiadó ACM
Oldalszám 273-274
ISBN 978-1-4503-4229-2
Dátum 2016
Egyéb Series Title: JCDL '16
Citation Key: Alam:2016:IWP:2910896.2925467
DOI 10.1145/2910896.2925467
Kivonat To facilitate permanence and collaboration in web archives, we built InterPlanetary Wayback to disseminate the contents of WARC files into the IPFS network. IPFS is a peer-to-peer content-addressable file system that inherently allows deduplication and facilitates opt-in replication. We split the header and payload of WARC response records before disseminating into IPFS to leverage the deduplication, build a CDXJ index, and combine them at the time of replay. From a 1.0 GB sample Archive-It collection of WARCs containing 21,994 mementos, we found that, on average, 570 files can be indexed and disseminated into IPFS per minute. We also found that in our naive prototype implementation, replay took on average 370 milliseconds per request.
Kiadvány címe Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:15
Módosítás dátuma 2021. 08. 09. 8:43:15

Címkék:

  • web archives
  • memento
  • interplanetary wayback
  • ipfs
  • ipwb
  • p2p file system
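
The split-and-recombine idea in the abstract of this entry can be sketched in a few lines. This is an illustration under stated assumptions, not the InterPlanetary Wayback implementation: an in-memory dictionary keyed by SHA-256 digest stands in for IPFS, a plain list of tuples stands in for the CDXJ index, and the names `put`, `add_record`, and `replay` are hypothetical.

```python
import hashlib

store = {}  # digest -> bytes; hypothetical stand-in for a content-addressable network
index = []  # (uri, datetime, header_key, payload_key); simplified stand-in for an index


def put(content: bytes) -> str:
    """Store content under its SHA-256 digest; identical content maps to one key."""
    key = hashlib.sha256(content).hexdigest()
    store[key] = content
    return key


def add_record(uri: str, datetime: str, header: bytes, payload: bytes) -> None:
    """Split a response record: store header and payload separately, index both keys."""
    index.append((uri, datetime, put(header), put(payload)))


def replay(entry: tuple) -> bytes:
    """Recombine the stored header and payload at replay time."""
    _, _, header_key, payload_key = entry
    return store[header_key] + b"\r\n\r\n" + store[payload_key]


# Two captures of the same page: the headers differ (Date), the payload is identical.
body = b"<html>hello</html>"
add_record("http://example.com/", "20160101000000",
           b"HTTP/1.1 200 OK\r\nDate: Fri, 01 Jan 2016 00:00:00 GMT", body)
add_record("http://example.com/", "20160102000000",
           b"HTTP/1.1 200 OK\r\nDate: Sat, 02 Jan 2016 00:00:00 GMT", body)

print(len(store))  # two distinct headers plus one shared payload
```

Because response headers usually change between captures while the payload often does not, storing the two parts separately lets identical payloads collapse into a single stored object.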

Into the Dark Domain: The UK Web Archive as a Source for the Contemporary History of Public Health.

Típus Folyóiratcikk
Szerző Martin Gorsky
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 28
Szám 3
Oldalszám 596
Kiadvány Social History of Medicine
ISSN 0951631X
Dátum 2015-08
Egyéb Number: 3
Kivonat With the migration of the written record from paper to digital format, archivists and historians must urgently consider how web content should be conserved, retrieved and analysed. The British Library has recently acquired a large number of UK domain websites, captured 1996-2010, which is colloquially termed the Dark Domain Archive while technical issues surrounding user access are resolved. This article reports the results of an invited pilot project that explores methodological issues surrounding use of this archive. It asks how the relationship between UK public health and local government was represented on the web, drawing on the 'declinist' historiography to frame its questions. It points up some difficulties in developing an aggregate picture of web content due to duplication of sites. It also highlights their potential for thematic and discourse analysis, using both text and image, illustrated through an argument about the contradictory rationale for public health policy under New Labour.
Hozzáadás dátuma 2021. 08. 09. 8:41:54
Módosítás dátuma 2021. 08. 09. 8:41:54

Címkék:

  • methodology
  • websites
  • BRITISH Library
  • HISTORY — Sources — Computer network resources
  • INTERNET — History
  • local government
  • public health
  • PUBLIC health — Computer network resources
  • PUBLIC health — History
  • WEBSITES — History

Introducing A Dark Web Archival Framework

Típus Folyóiratcikk
Szerző Justin F. Brunelle
Szerző Ryan Farley
Szerző Grant Atkins
Szerző Trevor Bostic
Szerző Marites Hendrix
Szerző Zak Zebrowski
URL http://arxiv.org/abs/2107.04070
Kiadvány arXiv:2107.04070 [cs]
Dátum 2021-07-08
Egyéb arXiv: 2107.04070
Hozzáférés 2021. 07. 16. 9:49:25
Könyvtár Katalógus arXiv.org
Kivonat We present a framework for web-scale archiving of the dark web. While commonly associated with illicit and illegal activity, the dark web provides a way to privately access web information. This is a valuable and socially beneficial tool to global citizens, such as those wishing to access information while under oppressive political regimes that work to limit information availability. However, little institutional archiving is performed on the dark web (limited to the Archive.is dark web presence, a page-at-a-time archiver). We use surface web tools, techniques, and procedures (TTPs) and adapt them for archiving the dark web. We demonstrate the viability of our framework in a proof-of-concept and narrowly scoped prototype, implemented with the following lightly adapted open source tools: the Brozzler crawler for capture, WARC file for storage, and pywb for replay. Using these tools, we demonstrate the viability of modified surface web archiving TTPs for archiving the dark web.
Hozzáadás dátuma 2021. 08. 09. 8:44:33
Módosítás dátuma 2021. 08. 09. 8:44:33

Címkék:

  • Computer Science – Digital Libraries

Introducing Web Archives as a New Library Service: the Experience of the National Library of France

Típus Folyóiratcikk
Szerző Sara Aubry
URL https://www.liberquarterly.eu/article/10.18352/lq.7987/
Kötet 20
Szám 2
Oldalszám 179
Kiadvány LIBER Quarterly
ISSN 2213-056X
Dátum 2010-09-29
Egyéb Number: 2
DOI 10.18352/lq.7987
Kivonat The collections held by the National Library of France (BnF) are part of the national heritage and include nearly 31 million documents of all types (books, journals, manuscripts, photographs, maps, etc.). New collection challenges have been posed by the emergence of the Internet. Within an international framework, the BnF is developing policy guidelines, workflows and tools to harvest relevant and representative segments of the French part of the Internet and organise their preservation and access. The Web archives of the French national domain were developed as a new service, released as a new application and made available to the public in April 2008. Since then, strategies have been and continue to be developed to involve librarians and reach out to end users. This article will discuss the BnF experiment and will focus specifically on four issues: collection building (Web archives as a new and challenging collection); resource discovery (access services and tools for end users); usage (facts and figures); and involvement (strategies to build a librarian community and reach out to end users).
Hozzáadás dátuma 2021. 08. 09. 8:42:50
Módosítás dátuma 2021. 08. 09. 8:42:50

Címkék:

  • web archives
  • collection building
  • France
  • archiving websites
  • end users
  • resource discovery
  • usage

Introduction: Internet histories

Típus Folyóiratcikk
Szerző Niels Brügger
Szerző Gerard Goggin
Szerző Ian Milligan
Szerző Valérie Schafer
URL http://www.tandfonline.com/doi/abs/10.1080/24701475.2017.1317128
Kötet 1
Szám 1-2
Oldalszám 1-7
Kiadvány Internet Histories
ISSN 2470-1475
Dátum 2017-01-02
Egyéb Number: 1-2
Publisher: Routledge
DOI 10.1080/24701475.2017.1317128
Kivonat The ways in which historians define the Internet profoundly shape the histories we write. Many studies implicitly define the Internet in material terms, as a particular set of hardware and software, and consequently tend to frame the development of the Internet as the spread of these technologies from the United States. This essay explores implications of defining the Internet alternatively in terms of technology, use and local experience. While there is not a single “correct” definition, historians should be aware of the politics of the definitions they use.
Hozzáadás dátuma 2021. 08. 09. 8:41:45
Módosítás dátuma 2021. 08. 09. 8:41:45

Introduction: The Web’s first 25 years

Típus Folyóiratcikk
Szerző Niels Brügger
URL http://journals.sagepub.com/doi/10.1177/1461444816643787
Kötet 18
Szám 7
Oldalszám 1059-1065
Kiadvány New Media & Society
ISSN 1461-4448
Dátum 2016-08-08
Egyéb Number: 7
DOI 10.1177/1461444816643787
Kivonat In August 2016, we can celebrate the 25th anniversary of the World Wide Web. Or can we? There is no doubt that the World Wide Web – or simply: the Web – has played an important role in the communicative infrastructure of most societies since the mid-1990s, but when did the Web actually start? And how has the Web developed from its beginning until today? The six articles in this Special Issue/section revolve around one of these questions in various ways.
Hozzáadás dátuma 2021. 08. 09. 8:42:38
Módosítás dátuma 2021. 08. 09. 8:42:38

Investigation of the Currency, Disappearance and Half-Life of Urls of Web Resources Cited In Iranian Researchers: A Comparative Study.

Típus Folyóiratcikk
Szerző Oranus Tajedini
Szerző Ali Sadatmoosavi
Szerző Azita Ghazizade
Szerző Atefe Tajedini
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 16
Szám 1
Oldalszám 27-47
Kiadvány International Journal of Information Science & Management
ISSN 20088302
Dátum 2018-01
Egyéb Number: 1
Kivonat This research was intended to comparatively investigate the currency, disappearance and half-life of URLs of web resources cited in Iranian researchers' articles indexed in ISI in information science, psychology and management from 2009 to 2011. The research method was citation analysis. The statistical population of this research was all articles by Iranian researchers in psychology, information science and management from 2009 to 2011 which were indexed in SSCI. In order to extract bibliographic information of articles, ISI database was searched and the titles of the articles were extracted. After investigating the currency and disappearance of cited URLs and calculating the half-life of web resources, collected data were analyzed in accordance with research questions by means of Excel Software. The results of this research revealed that in articles written by Iranian researchers indexed in ISI in information science, psychology and management there were 6152, 3639 and 8926 citations, respectively, of which 13.7, 44.8 and 14.23 percent were online citations, respectively. The most frequently used domain in all three fields was .org. The most stable and persistent domain in psychology was .com, in information science was .org and in management was for those domains other than the mentioned domains. The most frequent file format was pdf in all three fields. In information science, pdf files were the most stable while in management, rtf files and in psychology, ppt files were the most stable ones, respectively. In the initial search for online citations in psychology, information science and management, respectively, 58, 82 and 88 percent of citations were accessible which were even increased after second check with due measurements to 95, 98 and 97 percent, respectively. The research results also demonstrated that most accessible internet addresses in investigated articles of all three fields were found in the cited internet address.
The status of inaccessible internet addresses in all investigated articles regarding error messages also indicated that in psychology and management 404 error message (Not found) was the most frequent error with 34 and 22 percent, respectively and in information science, 403 error message (forbidden) was the most frequent error message with 21 percent. The average half-life of online citations calculated in all investigated articles was 2.6 years which was calculated as 3 years and 4 months in information science, 2 years and 5 months in management and 1 year and 9 months in psychology. The results of this research showed that decay of internet addresses should be regarded as a problem the most important reason of which is website reorganization and changes made to the names of internet domains. Some fields are more exposed to and affected by the consequences of decay of internet addresses. The influence of inactive links on the journals of a field is different based on the reliance of authors on internet based information. The absolute number of internet addresses also strengthens the problem of decay of internet addresses for the readers of the articles as compared with those journals whose authors have only cited a few online citations. The consequences of inactive links for those articles and resources which can be accessed through different ways or their print version is accessible are less serious. Tools like internet archives might make it possible to have a snapshot of the content of a site in a particular time. Google doesn't index dynamic pages or pages and sites which use robots.txt coding to prevent crawling. The best solution to improve the accessibility of internet resources is to request for all internet information be analyzed and recorded while examining the manuscripts. In so doing, the responsibility to archive information will be assigned to the publisher.
Hozzáadás dátuma 2021. 08. 09. 8:43:24
Módosítás dátuma 2021. 08. 09. 8:43:24

Címkék:

  • Internet Archive
  • Information science
  • Web archives
  • Citation analysis
  • Citation of electronic information resources
  • Half-life of Web References
  • Uniform Resource Locators
  • URL Persistence
  • Web Citation Availability

InZeit: Efficiently Identifying Insightful Time Points

Típus Folyóiratcikk
Szerző Vinay Setty
Szerző Srikanta Bedathur
Szerző Klaus Berberich
Szerző Gerhard Weikum
URL http://dx.doi.org/10.14778/1920841.1921050
Kötet 3
Szám 1-2
Oldalszám 1605-1608
Kiadvány Proc. VLDB Endow.
ISSN 2150-8097
Dátum 2010
Egyéb Number: 1-2
Publisher: VLDB Endowment
Citation Key: Setty:2010:IEI:1920841.1921050
DOI 10.14778/1920841.1921050
Kivonat Web archives are useful resources to find out about the temporal evolution of persons, organizations, products, or other topics. However, even when advanced text search functionality is available, gaining insights into the temporal evolution of a topic can be a tedious task and often requires sifting through many documents. The demonstrated system named InZeit (pronounced "insight") assists users by determining insightful time points for a given query. These are the time points at which the top-k time-travel query result changes substantially and for which the user should therefore inspect query results. InZeit determines the m most insightful time points efficiently using an extended segment tree for in-memory bookkeeping.
Hozzáadás dátuma 2021. 08. 09. 8:43:05
Módosítás dátuma 2021. 08. 09. 8:43:05

Itsy-Bitsy Spider: A Look at Web Crawlers and Web Archiving

Típus Dolgozat
Szerző Caroline Oliveira
URL https://www.nyu.edu/tisch/preservation/program/student_work/2017fall/17f_1807_Oliveira_a2a.pdf
Dátum 2017
Hozzáférés 2020. 08. 17. 9:39:46
Kiadvány címe Digital Preservation: CINE-GT 1807
Hozzáadás dátuma 2021. 08. 09. 8:43:43
Módosítás dátuma 2021. 08. 09. 8:43:43

Journey to the past

Típus Dolgozat
Szerző Adam Jatowt
Szerző Yukiko Kawai
Szerző Satoshi Nakamura
Szerző Yutaka Kidawara
Szerző Katsumi Tanaka
URL http://doi.acm.org/10.1145/1149941.1149969
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 135
ISBN 1-59593-417-0
Dátum 2006
Egyéb Series Title: HYPERTEXT '06
Citation Key: Jatowt:2006:JPP:1149941.1149969
DOI 10.1145/1149941.1149969
Kivonat While the Internet community recognized early on the need to store and preserve past content of the Web for future use, the tools developed so far for retrieving information from Web archives are still difficult to use and far less efficient than those developed for the "live Web." We expect that future information retrieval systems will utilize both the "live" and "past Web" and have thus developed a general framework for a past Web browser. A browser built using this framework would be a client-side system that downloads, in real time, past page versions from Web archives for their customized presentation. It would use passive browsing, change detection and change animation to provide a smooth and satisfactory browsing experience. We propose a meta-archive approach for increasing the coverage of past Web pages and for providing a unified interface to the past Web. Finally, we introduce query-based and localized approaches for filtered browsing that enhance and speed up browsing and information retrieval from Web archives.
Kiadvány címe Proceedings of the seventeenth conference on Hypertext and hypermedia – HYPERTEXT '06
Hozzáadás dátuma 2021. 08. 09. 8:43:19
Módosítás dátuma 2021. 08. 09. 8:43:19

Címkék:

  • web archive
  • past web
  • past web browser

Just-in-time Recovery of Missing Web Pages

Típus Dolgozat
Szerző Terry L Harrison
Szerző Michael L Nelson
URL http://doi.acm.org/10.1145/1149941.1149971
Hely New York, NY, USA
Kiadó ACM
Oldalszám 145-156
ISBN 1-59593-417-0
Dátum 2006
Egyéb Series Title: HYPERTEXT '06
Citation Key: Harrison:2006:JRM:1149941.1149971
DOI 10.1145/1149941.1149971
Kivonat We present Opal, a light-weight framework for interactively locating missing web pages (http status code 404). Opal is an example of "in vivo" preservation: harnessing the collective behavior of web archives, commercial search engines, and research projects for the purpose of preservation. Opal servers learn from their experiences and are able to share their knowledge with other Opal servers by mutual harvesting using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Using cached copies that can be found on the web, Opal creates lexical signatures which are then used to search for similar versions of the web page. We present the architecture of the Opal framework, discuss a reference implementation of the framework, and present a quantitative analysis of the framework that indicates that Opal could be effectively deployed.
Kiadvány címe Proceedings of the Seventeenth Conference on Hypertext and Hypermedia
Hozzáadás dátuma 2021. 08. 09. 8:43:11
Módosítás dátuma 2021. 08. 09. 8:43:11

Címkék:

  • digital preservation
  • 404 web pages
  • apache web server

Keyphrase extraction and its applications to digital libraries

Típus Folyóiratcikk
Szerző Krutarth Indubhai Patel
URL https://krex.k-state.edu/dspace/handle/2097/41306
Dátum 2021
Egyéb Accepted: 2021-03-26T21:58:22Z
Hozzáférés 2021. 07. 15. 11:20:16
Könyvtár Katalógus krex.k-state.edu
Nyelv en_US
Kivonat Scholarly digital libraries provide access to scientific publications and comprise useful resources for researchers. Moreover, they are very useful in many applications such as document and citation recommendation, expert search, scientific paper summarization, collaborator recommendation, topic classification, and keyphrase extraction. Despite the advancements in search engine features, ranking methods, technologies, and the availability of programmable APIs, current-day open-access digital libraries still rely on crawl-based approaches for acquiring their underlying document collections. Furthermore, keyphrases associated with research papers provide an effective way to find useful information in the large and growing scholarly digital collections. Keyphrases are useful in many applications such as document indexing and summarization, topic tracking, contextual advertising, and opinion mining. However, keyphrases are not always provided with the papers, but they need to be extracted from their content. A growing number of scholarly digital libraries, museums, and archives around the world are embracing web archiving as a mechanism to collect born-digital material made available via the web. To create the specialized collection from the Web archived data, there is a substantial need for automatic approaches that can distinguish the documents of interest for a collection.
In this dissertation, we first explore keyphrase extraction as a supervised task and formulated as sequence labeling and utilize the power of Conditional Random Fields in capturing label dependencies through a transition parameter matrix consisting of the transition probabilities from one label to the neighboring label. Our proposed CRF-based supervised approach exploits word embeddings as features along with traditional, document-specific features. Our results on five datasets of research papers show that the word embeddings combined with document-specific features achieve high performance and outperform strong baselines for this task. We also propose KPRank, an unsupervised graph-based algorithm for keyphrase extraction that exploits both positional information and contextual word embeddings into a biased PageRank. Our experimental results on five benchmark datasets show that KPRank that uses contextual word embeddings with additional position signal outperforms previous approaches and strong baselines for this task. Furthermore, we investigate and contrast three supervised keyphrase extraction models to explore their deployment in CiteSeerX digital library for extracting high-quality keyphrases.
Further, we propose a novel search-driven framework for acquiring documents for such scientific portals. Within our framework, publicly-available research paper titles and author names are used as queries to a Web search engine. We were able to obtain ≈ 267,000 unique research papers through our fully-automated framework using ≈ 76,000 queries, resulting in almost 200,000 more papers than the number of queries. Furthermore, we propose a novel search-driven approach to build and maintain a large collection of homepages that can be used as seed URLs in any digital library including CiteSeerX to crawl scientific documents. We use Self-Training in order to reduce the labeling effort and to utilize the unlabeled data to train the efficient researcher homepage classifier. Our experiments on a large-scale dataset highlight the effectiveness of our approach, and position Web search as an effective method for acquiring authors' homepages.
Finally, we explore different learning models and feature representations to determine the best-performing ones for identifying the documents of interest from the web archived data. Specifically, we study both machine learning and deep learning models and "bag of words" (BoW) features extracted from the entire document or from specific portions of the document, as well as structural features that capture the structure of documents. Moreover, we explore dynamic fusion models to find, on the fly, the model or combination of models that perform best on a variety of document types. We proposed two dynamic classifier selection algorithms: Dynamic Classifier Selection for Document Classification (or DCSDC), and Dynamic Decision level Fusion for Document Classification (or DDFC). Our experimental results show that the approach that fuses different models outperforms individual models and other ensemble methods on all three datasets.
Hozzáadás dátuma 2021. 08. 09. 8:44:23
Módosítás dátuma 2021. 08. 09. 8:44:23

Kultura brytyjskiej sieci web TT – British culture web

Típus Folyóiratcikk
Szerző Josh Cowls
URL https://search.proquest.com/docview/1951541478?accountid=27464
Szám 172
Oldalszám 1
Kiadvány Elektroniczny Biuletyn Informacyjny Bibliotekarzy : EBIB
Dátum 2017
Egyéb Number: 172
Publisher: Stowarzyszenie Bibliotekarzy Polskich
Place: Comparative Media Studies Massachusetts Institute of Technology
Nyelv Polish
Kivonat The author presents the British BUDDAH project, in which researchers used archived resources collected from the web to carry out humanities research. The aim was to determine whether archiving websites for research purposes is worthwhile. The article describes many different studies, methodological approaches, case studies, and technical tools that were created to carry out this research.
Hozzáadás dátuma 2021. 08. 09. 8:42:33
Módosítás dátuma 2021. 08. 09. 8:42:33

Címkék:

  • Web archiving
  • Library And Information Sciences
  • 3.2:ARCHIVES

Labor Gone Digital (DigiFacket)! Experiences from Creating a Web Archive for Swedish Trade Unions

Típus Folyóiratcikk
Szerző Jenny Jansson
Szerző Katrin Uba
Szerző Jaanus Karo
URL https://elischolar.library.yale.edu/jcas/vol7/iss1/19
Kötet 7
Szám 1
Kiadvány Journal of Contemporary Archival Studies
ISSN 2380-8845
Dátum 2020-11-20
Egyéb Number: 1
Hozzáadás dátuma 2021. 08. 09. 8:44:09
Módosítás dátuma 2021. 08. 09. 8:44:09

Learning temporal-dependent ranking models

Típus Dolgozat
Szerző Miguel Costa
Szerző Francisco Couto
Szerző Mário Silva
URL http://dl.acm.org/citation.cfm?doid=2600428.2609619
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 757-766
ISBN 978-1-4503-2257-7
Dátum 2014
DOI 10.1145/2600428.2609619
Kivonat Web archives already hold together more than 534 billion files and this number continues to grow as new initiatives arise. Searching on all versions of these files acquired throughout time is challenging, since users expect as fast and precise answers from web archives as the ones provided by current web search engines. This work studies, for the first time, how to improve the search effectiveness of web archives, including the creation of novel temporal features that explore the correlation found between web document persistence and relevance. The persistence was analyzed over 14 years of web snapshots. Additionally, we propose a temporal-dependent ranking framework that exploits the variance of web characteristics over time influencing ranking models. Based on the assumption that closer periods are more likely to hold similar web characteristics, our framework learns multiple models simultaneously, each tuned for a specific period. Experimental results show significant improvements over the search effectiveness of single-models that learn from all data independently of its time. Thus, our approach represents an important step forward on the state-of-the-art IR technology usually employed in web archives.
Kiadvány címe Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval – SIGIR '14
Hozzáadás dátuma 2021. 08. 09. 8:43:23
Módosítás dátuma 2021. 08. 09. 8:43:23

Címkék:

  • web archives
  • temporal-dependent ranking

Legal deposit and collection development in a digital world

Típus Folyóiratcikk
Szerző Nicholas Joint
URL https://www.emeraldinsight.com/doi/10.1108/00242530610689310
Kötet 55
Szám 8
Oldalszám 468-473
Kiadvány Library Review
ISSN 0024-2535
Dátum 2006-10
Egyéb Number: 8
DOI 10.1108/00242530610689310
Kivonat Purpose – To compare and contrast national collection management principles for hard copy deposit collections and for digital deposit collections. Design/methodology/approach – A selective overview and summary of work to date on digital legal deposit and digital preservation. Findings – That the comprehensive nature of traditional print deposit collection often absolves national libraries from the more intractable problems of stock selection; whereas the difficulty of collecting the entire national digital web space means that intelligent selection is vital for the building of meaningful digital deposit collections. Research limitations/implications – These are indicative and partial insights based on small scale interrogation of trial digital deposit collections: the issue of collection development and selection biases in digital collection building needs greater in-depth research before hard and fast recommendations about collection management criteria can be arrived at. Practical implications – The principles outlined may offer practitioners in national libraries some useful insights into how to manage their digital deposit collections. Originality/value – This paper emphasises the social and political aspects of digital deposit issues, rather than the legal or technical aspects.
Hozzáadás dátuma 2021. 08. 09. 8:42:46
Módosítás dátuma 2021. 08. 09. 8:42:46

Címkék:

  • Digital libraries
  • National libraries
  • Collections management

Legal Issues Related to Whole-of-Domain Web Harvesting in Australia

Típus Folyóiratcikk
Szerző Laura Simes
Szerző Bob Pymm
URL http://www.tandfonline.com/doi/abs/10.1080/19322900902787227
Kötet 3
Szám 2
Oldalszám 129-142
Kiadvány Journal of Web Librarianship
ISSN 1932-2909
Dátum 2009-06-23
Egyéb Number: 2
DOI 10.1080/19322900902787227
Kivonat Selective archiving of Web sites in Australia has been under way since 1996. This approach has seen carefully selected sites preserved after site owners granted permission. The labor-intensive nature of this process means only a small number of sites can ever be acquired in this manner. An alternate approach is an automated “whole-of-domain” capture of sites, which has been undertaken in a number of countries, including Australia. This article considers the existing legal position in taking this approach and looks at how legal deposit and copyright legislation constrains the process. It also considers recent amendments to the Copyright Act to provide more flexibility along the lines of the U.S. fair-use approach and the possible impact these new provisions may have for those involved with large-scale Web archiving in Australia.
Hozzáadás dátuma 2021. 08. 09. 8:42:46
Módosítás dátuma 2021. 08. 09. 8:42:46

Címkék:

  • legal deposit
  • digital preservation
  • Web harvesting
  • copyright
  • Internet archiving
  • fair use

Legibility Machines: Archival Appraisal and the Genealogies of Use

Típus Weboldal
URL https://www.proquest.com/openview/18394f8f0fe123c09f08114c7b3d36f0/1?pq-origsite=gscholar&cbl=18750&diss=y
Dátum 2021-07-15 08:48:15
Hozzáférés 2021. 07. 15. 10:48:15
Nyelv hu
Rövid cím Legibility Machines
Hozzáadás dátuma 2021. 08. 09. 8:44:13
Módosítás dátuma 2021. 08. 09. 8:44:13

Leveraging Heritrix and the Wayback Machine on a Corporate Intranet: A Case Study on Improving Corporate Archives

Típus Folyóiratcikk
Szerző Justin F Brunelle
Szerző Krista Ferrante
Szerző Eliot Wilczek
Szerző Michele C Weigle
Szerző Michael L Nelson
URL https://search.proquest.com/docview/1806649179?accountid=27464
Kötet 22
Szám 1/2
Oldalszám 1
Kiadvány D-Lib Magazine
ISSN 1082-9873
Dátum 2016-01
Egyéb Number: 1/2
PMID: 1806649179
Publisher: Corporation for National Research Initiatives
Place: Reston
DOI 10.1045/january2016-brunelle
Nyelv English
Kivonat In this work, we present a case study in which we investigate using open-source, web-scale web archiving tools (i.e., Heritrix and the Wayback Machine installed on the MITRE Intranet) to automatically archive a corporate Intranet. We use this case study to outline the challenges of Intranet web archiving, identify situations in which the open source tools are not well suited for the needs of the corporate archivists, and make recommendations for future corporate archivists wishing to use such tools. We performed a crawl of 143,268 URIs (125 GB and 25 hours) to demonstrate that the crawlers are easy to set up, efficiently crawl the Intranet, and improve archive management. However, challenges exist when the Intranet contains sensitive information, areas with potential archival value require user credentials, or archival targets make extensive use of internally developed and customized web services. We elaborate on and recommend approaches for overcoming these challenges.
Hozzáadás dátuma 2021. 08. 09. 8:41:41
Módosítás dátuma 2021. 08. 09. 8:41:41

Címkék:

  • Web archiving
  • Library And Information Sciences–Computer Applica
  • Archives & records
  • 3.2:ARCHIVES
  • Archivists
  • Case studies
  • Open source software

Libraries and digital memory

Típus Folyóiratcikk
Szerző Bruce Massis
URL https://search.proquest.com/docview/1830312026?accountid=27464
Kötet 117
Szám 9/10
Oldalszám 673-676
Kiadvány New Library World
ISSN 03074803
Dátum 2016
Egyéb Number: 9/10
Publisher: Emerald Group Publishing Limited
Place: London
Nyelv English
Kivonat Purpose – The purpose of this column is to consider the role of libraries in an effort to preserve and protect a collective digital memory. Design/methodology/approach – This paper addresses literature review and commentary on this topic that has been addressed by professionals, researchers and practitioners. Findings – Libraries and library consortia will help go forward into the future and expand as trusted repositories where digital memory can be preserved and shared. Originality/value – The value in exploring this topic is to examine the library environment for collection, storage and dissemination of digital information.
Hozzáadás dátuma 2021. 08. 09. 8:41:44
Módosítás dátuma 2021. 08. 09. 8:41:44

Címkék:

  • Web archiving
  • Digital archives
  • Library And Information Sciences
  • Academic libraries
  • Digitization
  • Books
  • Internet
  • Library collections
  • National libraries
  • Social networks
  • Consortia
  • Museums
  • Funding
  • Industrialized nations
  • Oral tradition

Life Span of Web Pages: A Survey of 10 Million Pages Collected in 2001

Típus Dolgozat
Szerző Teru Agata
Szerző Yosuke Miyata
Szerző Emi Ishita
Szerző Atsushi Ikeuchi
Szerző Shuichi Ueda
URL http://dl.acm.org/citation.cfm?id=2740769.2740869
Hely Piscataway, NJ, USA
Kiadó IEEE Press
Oldalszám 463-464
ISBN 978-1-4799-5569-5
Dátum 2014
Egyéb Series Title: JCDL '14
Citation Key: Agata:2014:LSW:2740769.2740869
Kivonat Identifying and tracking new information on the Web is important in sociology, marketing, and survey research, since new trends might be apparent in the new information. Such changes can be observed by crawling the Web periodically. In practice, however, it is impossible to crawl the entire expanding Web repeatedly. This means that the novelty of a page remains unknown, even if that page did not exist in previous snapshots. In this paper, we propose a novelty measure for estimating the certainty that a newly crawled page appeared between the previous and current crawls. Using this novelty measure, new pages can be extracted from a series of unstable snapshots for further analysis and mining to identify new trends on the Web. We evaluated the precision, recall, and miss rate of the novelty measure using our Japanese web archive, and applied it to a Web archive search engine.
Kiadvány címe Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:15
Módosítás dátuma 2021. 08. 09. 8:43:15

Címkék:

  • web archiving
  • digital preservation
  • internet archive
  • web page life span

Linked Research on the Decentralised Web

Típus Szakdolgozat
Szerző Sarven Capadisli
URL https://bonndoc.ulb.uni-bonn.de/xmlui/handle/20.500.11811/8352
Hely Bonn
Dátum 2020
Könyvtár Katalógus Zotero
Nyelv en
Egyetem Universität Bonn
Hozzáadás dátuma 2021. 08. 09. 8:44:18
Módosítás dátuma 2021. 08. 09. 8:44:18

Linking Objects and their Stories: An API For Exploring Cultural Heritage Using Formal Concept Analysis

Típus Folyóiratcikk
Szerző Peter Eklund
Szerző Tim Wray
Szerző Jon Ducrou
URL http://www.jetwi.us/index.php?m=content&c=index&a=show&catid=157&id=883
Kötet 3
Szám 3
Kiadvány Journal of Emerging Technologies in Web Intelligence
ISSN 1798-0461
Dátum 2011-08-01
Egyéb Number: 3
DOI 10.4304/jetwi.3.3.239-252
Hozzáadás dátuma 2021. 08. 09. 8:41:49
Módosítás dátuma 2021. 08. 09. 8:41:49

Linking Twitter Archives with Television Archives

Típus Könyvfejezet
Szerző Zeynep Pehlivan
Szerkesztő Daniel Gomes
Szerkesztő Elena Demidova
Szerkesztő Jane Winters
Szerkesztő Thomas Risse
URL https://doi.org/10.1007/978-3-030-63291-5_10
Hely Cham
Kiadó Springer International Publishing
Oldalszám 127-139
ISBN 978-3-030-63291-5
Dátum 2021
Egyéb DOI: 10.1007/978-3-030-63291-5_10
Hozzáférés 2021. 07. 15. 9:52:26
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat Social media data has already established itself as an important data source for researchers working in a number of different domains. It has also attracted the attention of archiving institutions, many of which have already extended their crawling processes to capture at least some forms of social media data. However, far too little attention has been paid to providing access to this data, which has generally been collected using application programming interfaces (APIs). There is a growing need to contextualize the data gathered from APIs, so that researchers can make informed decisions about how to analyse it, and to develop efficient ways of providing access to it. This chapter will discuss one possible means of providing enhanced access: a new interface developed at the Institut national de l’audiovisuel (INA) that links Twitter and television archives to recreate the phenomenon of the “second screen”, or more precisely the experience of “social television”. The phrase “second screen” describes the increasingly ubiquitous activity of using a second computing device (commonly a mobile phone or tablet) while watching television. If the second device is used to comment on, like or retweet television-related content via social media, this results in the so-called social television. The analysis of this activity, and this data, offers a promising new avenue of research for scholars, especially those based on digital humanities. To the best of our knowledge, the work that will be discussed here is the first attempt at considering how best to recreate the experience of “social television” using archived data.
Könyv címe The Past Web: Exploring Web Archives
Hozzáadás dátuma 2021. 08. 09. 8:43:59
Módosítás dátuma 2021. 08. 09. 8:43:59

Living Movements, Living Archives: Selecting and Archiving Web Content During Times of Social Unrest

Típus Folyóiratcikk
Szerző Sylvie Rollason-Cass
Szerző Scott Reed
URL https://search.proquest.com/docview/1877779886?accountid=27464
Kötet 20
Szám 1-2
Oldalszám 241-247
Kiadvány New Review of Information Networking
ISSN 1361-4576
Dátum 2015
Egyéb Number: 1-2
Publisher: Taylor & Francis Ltd.
Place: Archive-It Internet Archive, San Francisco, California, USA
DOI 10.1080/13614576.2015.1114839
Nyelv English
Kivonat The ease of creating and sharing content on the web has had a profound impact on the scope, pace, and mobility of social movements, as well as on how the documents and evidence of these movements are collected and preserved. This article will focus on the process of creating a web based archive around the #blacklivesmatter movement while exploring the concept of the "living archive" through collaborative collection building around social movements. By examining this and other event-based web collections, best practices and strategies to improve the process of selection and capture of web content in Living Archives are presented.
Hozzáadás dátuma 2021. 08. 09. 8:42:15
Módosítás dátuma 2021. 08. 09. 8:42:15

Címkék:

  • Web archiving
  • Computers–Internet
  • 3.2:ARCHIVES
  • cultural responsibility
  • living archives
  • Social activism
  • social movements

Local Memory Project: Providing Tools to Build Collections of Stories for Local Events from Local Sources

Típus Dolgozat
Szerző Alexander C Nwala
Szerző Michele C Weigle
Szerző Michael L Nelson
Szerző Adam B Ziegler
Szerző Anastasia Aizman
URL http://dl.acm.org/citation.cfm?id=3200334.3200358
Hely Piscataway, NJ, USA
Kiadó IEEE Press
Oldalszám 219-228
ISBN 978-1-5386-3861-3
Dátum 2017
Egyéb Series Title: JCDL '17
Citation Key: Nwala:2017:LMP:3200334.3200358
Kivonat The national (non-local) news media has different priorities than the local news media. If one seeks to build a collection of stories about local events, the national news media may be insufficient, with the exception of local news which "bubbles" up to the national news media. If we rely exclusively on national media, or build collections exclusively on their reports, we could be late to the important milestones which precipitate major local events, thus, run the risk of losing important stories due to link rot and content drift. Consequently, it is important to consult local sources affected by local events. Our goal is to provide a suite of tools (beginning with two) under the umbrella of the Local Memory Project (LMP) to help users and small communities discover, collect, build, archive, and share collections of stories for important local events by leveraging local news sources. The first service (Geo) returns a list of local news sources (newspaper, TV and radio stations) in order of proximity to a user-supplied zip code. The second service (Local Stories Collection Generator) discovers, collects and archives a collection of news stories about a story or event represented by a user-supplied query and zip code pair. We evaluated 20 pairs of collections, Local (generated by our system) and non-Local, by measuring archival coverage, tweet index rate, temporal range, precision, and sub-collection overlap. Our experimental results showed Local and non-Local collections with archive rates of 0.63 and 0.83, respectively, and tweet index rates of 0.59 and 0.80, respectively. Local collections produced older stories than non-Local collections, at a higher precision (relevance) of 0.84 compared to a non-Local precision of 0.72. These results indicate that Local collections are less exposed, thus less popular than their non-Local counterpart.
Kiadvány címe Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries
Hozzáadás dátuma 2021. 08. 09. 8:43:21
Módosítás dátuma 2021. 08. 09. 8:43:21

Címkék:

  • web archiving
  • collections building
  • digital collections
  • journalism
  • local news
  • news

Local Methods for Estimating Pagerank Values

Típus Dolgozat
Szerző Yen-Yu Chen
Szerző Qingqing Gan
Szerző Torsten Suel
URL http://doi.acm.org/10.1145/1031171.1031248
Hely New York, NY, USA
Kiadó ACM
Oldalszám 381-389
ISBN 1-58113-874-1
Dátum 2004
Egyéb Series Title: CIKM '04
Citation Key: Chen:2004:LME:1031171.1031248
DOI 10.1145/1031171.1031248
Kivonat The Google search engine uses a method called PageRank, together with term-based and other ranking techniques, to order search results returned to the user. PageRank uses link analysis to assign a global importance score to each web page. The PageRank scores of all the pages are usually determined off-line in a large-scale computation on the entire hyperlink graph of the web, and several recent studies have focused on improving the efficiency of this computation, which may require multiple hours on a workstation. However, in some scenarios, such as online analysis of link evolution and mining of large web archives such as the Internet Archive, it may be desirable to quickly approximate or update the PageRanks of individual nodes without performing a large-scale computation on the entire graph. We address this problem by studying several methods for efficiently estimating the PageRank score of a particular web page using only a small subgraph of the entire web. In our model, we assume that the graph is accessible remotely via a link database (such as the AltaVista Connectivity Server) or is stored in a relational database that performs lookups on disks to retrieve node and connectivity information. We show that a reasonable estimate of the PageRank value of a node is possible in most cases by retrieving only a moderate number of nodes in the local neighborhood of the node.
Kiadvány címe Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management
Hozzáadás dátuma 2021. 08. 09. 8:43:38
Módosítás dátuma 2021. 08. 09. 8:43:38

Címkék:

  • search engines
  • pagerank
  • external memory algorithms
  • link database
  • link-based ranking
  • out-of-core

Long-term preservation at the National Library of France (BnF): Scalable Preservation and Archiving Repository (SPAR)

Típus Folyóiratcikk
Szerző Thomas Ledoux
URL https://search.proquest.com/docview/1124539611?accountid=27464
Szám 57
Oldalszám 18-20
Kiadvány International Preservation News
Dátum 2012-08
Egyéb Number: 57
PMID: 1124539611
Publisher: IFLA — International Federation of Library Associations and Institutions
Place: The Hague
Nyelv English
Kivonat The National Library of France (BnF) has the mission to collect, preserve and give access to all the published material in France. To this aim, the legal deposit has been extended to the different forms of publishing from the printed material in 1537, to electronic documents in 1992, as well as the Internet in 2006. To preserve all this digital cultural heritage, the BnF has designed a Scalable Preservation and Archiving Repository (SPAR). This central repository has to handle the diversity (media, formats, departments) by taking inspiration from good practices and standards. The key requirements of the system were: 1. OAIS compliance, 2. modularity and scalability, 3. abstraction, 4. use of well known formats and standards, 5. use of open-source technical building blocks.
Hozzáadás dátuma 2021. 08. 09. 8:42:01
Módosítás dátuma 2021. 08. 09. 8:42:01

Címkék:

  • Library And Information Sciences
  • Archives & records
  • Migration
  • Metadata
  • Information storage
  • Infrastructure
  • Product introduction

Lost but not forgotten: finding pages on the unarchived web

Típus Folyóiratcikk
Szerző Hugo C Huurdeman
Szerző Jaap Kamps
Szerző Thaer Samar
Szerző Arjen P de Vries
Szerző Anat Ben-David
Szerző Richard A Rogers
URL https://search.proquest.com/docview/1703890962?accountid=27464
Kötet 16
Szám 3-4
Oldalszám 247-265
Kiadvány International Journal on Digital Libraries
ISSN 1432-5012
Dátum 2015-09-03
Egyéb Number: 3-4
PMID: 1703890962
Publisher: Springer Science & Business Media
Place: Heidelberg
DOI 10.1007/s00799-015-0153-3
Nyelv English
Kivonat Issue Title: Focused Issue on Digital Libraries 2014. Web archives attempt to preserve the fast changing web, yet they will always be incomplete. Due to restrictions in crawling depth, crawling frequency, and restrictive selection policies, large parts of the Web are unarchived and, therefore, lost to posterity. In this paper, we propose an approach to uncover unarchived web pages and websites and to reconstruct different types of descriptions for these pages and sites, based on links and anchor text in the set of crawled pages. We experiment with this approach on the Dutch Web Archive and evaluate the usefulness of page and host-level representations of unarchived content. Our main findings are the following: First, the crawled web contains evidence of a remarkable number of unarchived pages and websites, potentially dramatically increasing the coverage of a Web archive. Second, the link and anchor text have a highly skewed distribution: popular pages such as home pages have more links pointing to them and more terms in the anchor text, but the richness tapers off quickly. Aggregating web page evidence to the host-level leads to significantly richer representations, but the distribution remains skewed. Third, the succinct representation is generally rich enough to uniquely identify pages on the unarchived web: in a known-item search setting we can retrieve unarchived web pages within the first ranks on average, with host-level representations leading to further improvement of the retrieval effectiveness for websites.
Hozzáadás dátuma 2021. 08. 09. 8:42:02
Módosítás dátuma 2021. 08. 09. 8:42:02

Címkék:

  • Web archiving
  • Web archives
  • Digital libraries
  • Library And Information Sciences–Computer Applica
  • World Wide Web
  • Digital archives
  • Information retrieval
  • Anchor text
  • Link evidence
  • Web crawlers

Lost in the Infinite Archive: The Promise and Pitfalls of Web Archives.

Típus Folyóiratcikk
Szerző Ian Milligan
URL http://10.0.13.38/ijhac.2016.0161
Kötet 10
Szám 1
Oldalszám 78-94
Kiadvány International Journal of Humanities & Arts Computing: A Journal of Digital Humanities
ISSN 17538548
Dátum 2016-03
Egyéb Number: 1
Publisher: Edinburgh University Press
Kivonat Contemporary and future historians need to grapple with and confront the challenges posed by web archives. These large collections of material, accessed either through the Internet Archive's Wayback Machine or through other computational methods, represent both a challenge and an opportunity to historians. Through these collections, we have the potential to access the voices of millions of non-elite individuals (recognizing of course the cleavages in both Web access as well as method of access). To put this in perspective, the Old Bailey Online currently describes its monumental holdings of 197,745 trials between 1674 and 1913 as the 'largest body of texts detailing the lives of non-elite people ever published.' GeoCities.com, a platform for everyday web publishing in the mid-to-late 1990s and early 2000s, amounted to over thirty-eight million individual webpages. Historians will have access, in some form, to millions of pages: written by everyday people of various classes, genders, ethnicities, and ages. While the Web was not a perfect democracy by any means – it was and is unevenly accessed across each of those categories – this still represents a massive collection of non-elite speech. Yet a figure like thirty-eight million webpages is both a blessing and a curse. We cannot read every website, and must instead rely upon discovery tools to find the information that we need. Yet these tools largely do not exist for web archives, or are in a very early state of development: what will they look like? What information do historians want to access? We cannot simply map over web tools optimized for discovering current information through online searches or metadata analysis. We need to find information that mattered at the time, to diverse and very large communities. Furthermore, web pages cannot be viewed in isolation, outside of the networks that they inhabited. 
In theory, amongst corpuses of millions of pages, researchers can find whatever they want to confirm. The trick is situating it into a larger social and cultural context: is it representative? Unique? In this paper, 'Lost in the Infinite Archive,' I explore what the future of digital methods for historians will be when they need to explore web archives. Historical research of periods beginning in the mid-1990s will need to use web archives, and right now we are not ready. This article draws on first-hand research with the Internet Archive and Archive-It web archiving teams. It draws upon three exhaustive datasets: the large Web ARChive (WARC) files that make up Wide Web Scrapes of the Web; the metadata-intensive WAT files that provide networked contextual information; and the lifted-straight-from-the-web guerilla archives generated by groups like Archive Team. Through these case studies, we can see – hands-on – what richness and potentials lie in these new cultural records, and what approaches we may need to adopt. It helps underscore the need to have humanists involved at this early, crucial stage.
Hozzáadás dátuma 2021. 08. 09. 8:43:34
Módosítás dátuma 2021. 08. 09. 8:43:34

Címkék:

  • WEB archiving
  • RESEARCH
  • WORLD Wide Web
  • ARCHIVES — Computer network resources
  • WEB archives
  • archive
  • digital history
  • HISTORIANS
  • historical studies
  • WEB archives — Research
  • webscraping
  • world wide web
  • WORLD Wide Web — Research

Making Recommendations from Web Archives for

Típus Dolgozat
Szerző Lulwah M. Alkwai
Szerző Michael L. Nelson
Szerző Michele C. Weigle
URL https://doi.org/10.1145/3383583.3398533
Sorozat JCDL '20
Hely New York, NY, USA
Kiadó Association for Computing Machinery
Oldalszám 87–96
ISBN 978-1-4503-7585-6
Dátum August 1, 2020
DOI 10.1145/3383583.3398533
Hozzáférés 2021. 07. 15. 2:00:00
Könyvtár Katalógus ACM Digital Library
Kivonat When a user requests a web page from a web archive, the user will typically either get an HTTP 200 if the page is available, or an HTTP 404 if the web page has not been archived. This is because web archives are typically accessed by Uniform Resource Identifier (URI) lookup, and the response is binary: the archive either has the page or it does not, and the user will not know of other archived web pages that exist and are potentially similar to the requested web page. In this paper, we propose augmenting these binary responses with a model for selecting and ranking recommended web pages in a Web archive. This is to enhance both HTTP 404 responses and HTTP 200 responses by surfacing web pages in the archive that the user may not know existed. First, we check if the URI is already classified in DMOZ or Wikipedia. If the requested URI is not found, we use machine learning to classify the URI using DMOZ as our ontology and collect candidate URIs to recommend to the user. The classification is in two parts, a first-level classification and a deep classification. Next, we filter the candidates based on whether they are present in the archive. Finally, we rank candidates based on several features, such as archival quality, web page popularity, temporal similarity, and URI similarity. We calculated the F1 score for different methods of classifying the requested web page at the first level. We found that using all-grams from the URI after removing numerals and the top-level domain (TLD) produced the best result with F1 = 0.59. For the deep-level classification, we measured the accuracy at each classification level. For second-level classification, the micro-average F1 = 0.30 and for third-level classification, F1 = 0.15. We also found that 44.89% of the correctly classified URIs contained at least one word that exists in a dictionary and 50.07% of the correctly classified URIs contained long strings in the domain.
In comparison with the URIs from our Wayback access logs, only 5.39% of those URIs contained only words from a dictionary, and 26.74% contained at least one word from a dictionary. These percentages are low and may affect the ability for the requested URI to be correctly classified.
Kiadvány címe Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020
Hozzáadás dátuma 2021. 08. 09. 8:44:17
Módosítás dátuma 2021. 08. 09. 8:44:17

Címkék:

  • web archiving
  • URI
  • classifying
  • recommending

Managing Duplicates in a Web Archive

Típus Dolgozat
Szerző Daniel Gomes
Szerző André L Santos
Szerző Mário J Silva
URL http://doi.acm.org/10.1145/1141277.1141465
Hely New York, NY, USA
Kiadó ACM
Oldalszám 818-825
ISBN 1-59593-108-2
Dátum 2006
Egyéb Series Title: SAC '06
Citation Key: Gomes:2006:MDW:1141277.1141465
DOI 10.1145/1141277.1141465
Kivonat Crawlers harvest the web by iteratively downloading documents referenced by URLs. It is frequent to find different URLs that refer to the same document, leading crawlers to download duplicates. Hence, web archives built through incremental crawls waste space storing these documents. In this paper, we study the existence of duplicates within a web archive and discuss strategies to eliminate them at storage level during the crawl. We present a storage system architecture that addresses the requirements of web archives and detail its implementation and evaluation. The system is now supporting an archive for the Portuguese web replacing previous NFS-based storage servers. Experimental results showed that the elimination of duplicates can improve storage throughput. The web storage system outperformed NFS based storage by 68% in read operations and by 50% in write operations.
Kiadvány címe Proceedings of the 2006 ACM Symposium on Applied Computing
Hozzáadás dátuma 2021. 08. 09. 8:43:13
Módosítás dátuma 2021. 08. 09. 8:43:13

Managing Your Digital Afterlife

Típus Folyóiratcikk
Szerző Jessamyn West
URL https://search.proquest.com/docview/1918332139?accountid=27464
Kötet 37
Szám 5
Oldalszám 23-25
Kiadvány Computers in Libraries
ISSN 10417915
Dátum 2017-06
Egyéb Number: 5
Publisher: Information Today, Inc.
Place: Westport
Nyelv English
Kivonat More and more, people's lives are lived online. When the author's father died 6 years ago, they were pleased to find a Google Docs file with the usernames and passwords to every account he owned. He was an engineer, so this was not terribly surprising. Most of these were things such as bank accounts and cable subscriptions, but a few were email accounts and (small) social media profiles. This made a complicated time much simpler. What if they hadn't been able to access his information? Jan Zastrow has written a great article in this issue on digital estate planning, which touches on these same ideas. In this article are some specific tech tools you can use to help archive and prepare your legacy on social media sites and in content repositories.
Hozzáadás dátuma 2021. 08. 09. 8:42:23
Módosítás dátuma 2021. 08. 09. 8:42:23

Címkék:

  • Web archiving
  • Library And Information Sciences–Computer Applica
  • Digital archives
  • Electronic documents
  • Digital media
  • Social networks
  • 14.11:COMMUNICATIONS AND INFORMATION TECHNOLOGY –
  • Electronic mail
  • Passwords
  • Repositories
  • Subscriptions

Mapping of audiences for academic web archiving initiatives

Típus Folyóiratcikk
Szerző Marina Rodrigues Martins
Szerző Moisés Rockembach
URL http://www.scielo.br/j/interc/a/TCW4JZ5Y7PvWfYCYjWkbkbz/abstract/?lang=en
Kötet 43
Oldalszám 71-88
Kiadvány Intercom: Revista Brasileira de Ciências da Comunicação
ISSN 1809-5844, 1980-3508
Dátum 2020-04-27
Egyéb Publisher: Sociedade Brasileira de Estudos Interdisciplinares da Comunicação (INTERCOM)
Folyóirat rövid neve Intercom, Rev. Bras. Ciênc. Comun.
DOI 10.1590/1809-5844202014
Hozzáférés 2021. 07. 15. 9:49:23
Könyvtár Katalógus SciELO
Nyelv en
Kivonat The study presents the potential network of strategic audiences of the Federal University of Rio Grande do Sul, with the aim of promoting web archiving initiatives in the academic sphere. It took into account the relational environment projected from the Higher Administration Bodies and the Graduate Program in Communication of the University's Faculty of Library Science and Communication. As a reference, the initiatives implemented and the organizational structures of Columbia University and Harvard University were observed. The methodology encompassed bibliographic, documentary, and content research. The understanding of audiences was based on the approaches of logical conceptualization, power, and communication. The study concluded that organizational actors exert influence at different levels, each according to its responsibilities. The greater the number of archived collections, the more complex the networks of audiences involved, on whose different subjects financial, infrastructural, technological, legal, and other support depends.
Hozzáadás dátuma 2021. 08. 09. 8:43:56
Módosítás dátuma 2021. 08. 09. 8:43:56

Címkék:

  • Web archiving
  • Web archive
  • Mapping audiences
  • Public profile
  • Public Relations

Mapping the UK Webspace: Fifteen Years of British Universities on the Web

Típus Dolgozat
Szerző Scott A Hale
Szerző Taha Yasseri
Szerző Josh Cowls
Szerző Eric T Meyer
Szerző Ralph Schroeder
Szerző Helen Margetts
URL http://doi.acm.org/10.1145/2615569.2615691
Hely New York, NY, USA
Kiadó ACM
Oldalszám 62-70
ISBN 978-1-4503-2622-3
Dátum 2014
Egyéb Series Title: WebSci '14
Citation Key: Hale:2014:MUW:2615569.2615691
DOI 10.1145/2615569.2615691
Kivonat This paper maps the national UK web presence on the basis of an analysis of the .uk domain from 1996 to 2010. It reviews previous attempts to use web archives to understand national web domains and describes the dataset. Next, it presents an analysis of the .uk domain, including the overall number of links in the archive and changes in the link density of different second-level domains over time. We then explore changes over time within a particular second-level domain, the academic subdomain .ac.uk, and compare linking practices with variables, including institutional affiliation, league table ranking, and geographic location. We do not detect institutional affiliation affecting linking practices and find only partial evidence of league table ranking affecting network centrality, but find a clear inverse relationship between the density of links and the geographical distance between universities. This echoes prior findings regarding offline academic activity, which allows us to argue that real-world factors like geography continue to shape academic relationships even in the Internet age. We conclude with directions for future uses of web archive resources in this emerging area of research.
Kiadvány címe Proceedings of the 2014 ACM Conference on Web Science
Hozzáadás dátuma 2021. 08. 09. 8:43:12
Módosítás dátuma 2021. 08. 09. 8:43:12

Címkék:

  • big data
  • web archives
  • world wide web
  • academic web
  • hyperlink analysis
  • network analysis

Medical informatics labor market analysis using web crawling, web scraping, and text mining

Típus Folyóiratcikk
Szerző Jürgen Schedlbauer
Szerző Georgios Raptis
Szerző Bernd Ludwig
URL http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=150299043&lang=hu&site=ehost-live
Kötet 150
Oldalszám N.PAG-N.PAG
Kiadvány International Journal of Medical Informatics
ISSN 13865056
Dátum June 2021
Folyóirat rövid neve International Journal of Medical Informatics
DOI 10.1016/j.ijmedinf.2021.104453
Hozzáférés 2021. 07. 16. 10:59:53
Könyvtár Katalógus EBSCOhost
Kivonat Objectives: The European University Association (EUA) defines "employability" as a major goal of higher education. Therefore, competence-based orientation is an important aspect of education. The representation of a standardized job profile in the field of medical informatics, which is based on the most common labor market requirements, is fundamental for identifying and conveying the learning goals corresponding to these competences. Methods: To identify the most common requirements, we extracted 544 job advertisements from the German job portal, STEPSTONE. This process was conducted via a program we developed in R with the "rvest" library, utilizing web crawling, web extraction, and text mining. After removing duplicates and filtering for jobs that required a bachelor's degree, 147 job advertisements remained, from which we extracted qualification terms. We categorized the terms into six groups: professional expertise, soft skills, teamwork, processes, learning, and problem-solving abilities. Results: The results showed that only 45% of the terms are related to professional expertise, while 55% are related to soft skills. Studies of employee soft skills have shown similar results. The most prevalent terms were programming, experience, project, and server. Our second major finding is the importance of experience, further underlining how essential practical skills are. Conclusions: Previous studies used surveys and narrative descriptions. This is the first study to use web crawling, web extraction, and text mining. Our research shows that soft skills and specialist knowledge carry equal weight. The insights gained from this study may be of assistance in developing curricula for medical informatics.
Hozzáadás dátuma 2021. 08. 09. 8:44:36
Módosítás dátuma 2021. 08. 09. 8:44:36

Címkék:

  • Text mining
  • Competence-based education
  • Graduate employability
  • Medical informatics
  • Soft skills

MementoEmbed and Raintale for Web Archive Storytelling

Típus Folyóiratcikk
Szerző Shawn M. Jones
Szerző Martin Klein
Szerző Michele C. Weigle
Szerző Michael L. Nelson
URL http://arxiv.org/abs/2008.00137
Kiadvány arXiv:2008.00137 [cs]
Dátum 2020-07-31
Egyéb arXiv: 2008.00137
Hozzáférés 2021. 07. 15. 11:45:33
Könyvtár Katalógus arXiv.org
Kivonat For traditional library collections, archivists can select a representative sample from a collection and display it in a featured physical or digital library space. Web archive collections may consist of thousands of archived pages, or mementos. How should an archivist display this sample to drive visitors to their collection? Search engines and social media platforms often represent web pages as cards consisting of text snippets, titles, and images. Web storytelling is a popular method for grouping these cards in order to summarize a topic. Unfortunately, social media platforms are not archive-aware and fail to consistently create a good experience for mementos. They also allow no UI alterations for their cards. Thus, we created MementoEmbed to generate cards for individual mementos and Raintale for creating entire stories that archivists can export to a variety of formats.
Hozzáadás dátuma 2021. 08. 09. 8:44:29
Módosítás dátuma 2021. 08. 09. 8:44:29

Címkék:

  • Computer Science – Digital Libraries
  • H.3.7
  • Computer Science – Human-Computer Interaction
  • Computer Science – Information Retrieval
  • H.3.4
  • H.3.6

MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing

Típus Szakdolgozat
Szerző Sawood Alam
URL https://www.proquest.com/dissertations-theses/mementomap-web-archive-profiling-framework/docview/2478763660/se-2?accountid=15756
Hely Ann Arbor
Dátum 2020
Pontos lelőhely 2478763660
Egyéb ISBN: 9798557052580
Publication Title: ProQuest Dissertations and Theses
28259812
Típus Ph.D.
Nyelv English
Egyetem Old Dominion University
Kivonat With the proliferation of public web archives, it is becoming more important to better profile their contents, both to understand their immense holdings as well as to support routing of requests in Memento aggregators. A memento is a past version of a web page and a Memento aggregator is a tool or service that aggregates mementos from many different web archives. To save resources, the Memento aggregator should only poll the archives that are likely to have a copy of the requested Uniform Resource Identifier (URI). Using the Crawler Index (CDX), we generate profiles of the archives that summarize their holdings and use them to inform routing of the Memento aggregator's URI requests. Additionally, we use fulltext search (when available) or sample URI lookups to build an understanding of an archive's holdings. Previous work in profiling ranged from using full URIs (no false positives, but with large profiles) to using only top-level domains (TLDs) (smaller profiles, but with many false positives). This work explores strategies in between these two extremes. For evaluation we used CDX files from Archive-It, UK Web Archive, Stanford Web Archive Portal, and Arquivo.pt. Moreover, we used web server access log files from the Internet Archive's Wayback Machine, UK Web Archive, Arquivo.pt, LANL's Memento Proxy, and ODU's MemGator Server. In addition, we utilized a historical dataset of URIs from DMOZ. In early experiments with various URI-based static profiling policies we successfully identified about 78% of the URIs that were not present in the archive with less than 1% relative cost as compared to the complete knowledge profile and 94% URIs with less than 10% relative cost without any false negatives.
In another experiment we found that we can correctly route 80% of the requests while maintaining about 0.9 recall by discovering only 10% of the archive holdings and generating a profile that costs less than 1% of the complete knowledge profile. We created MementoMap, a framework that allows web archives and third parties to express holdings and/or voids of an archive of any size with varying levels of details to fulfil various application needs. Our archive profiling framework enables tools and services to predict and rank archives where mementos of a requested URI are likely to be present. In static profiling policies we predefined the maximum depth of host and path segments of URIs for each policy that are used as URI keys. This gave us a good baseline for evaluation, but was not suitable for merging profiles with different policies. Later, we introduced a more flexible means to represent URI keys that uses wildcard characters to indicate whether a URI key was truncated. Moreover, we developed an algorithm to rollup URI keys dynamically at arbitrary depths when sufficient archiving activity is detected under certain URI prefixes. In an experiment with dynamic profiling of archival holdings we found that a MementoMap of less than 1.5% relative cost can correctly identify the presence or absence of 60% of the lookup URIs in the corresponding archive without any false negatives (i.e., 100% recall). In addition, we separately evaluated archival voids based on the most frequently accessed resources in the access log and found that we could have avoided more than 8% of the false positives without introducing any false negatives. We defined a routing score that can be used for Memento routing. Using a cut-off threshold technique on our routing score we achieved over 96% accuracy if we accept about 89% recall and for a recall of 99% we managed to get about 68% accuracy, which translates to about 72% saving in wasted lookup requests in our Memento aggregator.
Moreover, when using top-k archives based on our routing score for routing and choosing only the topmost archive, we missed only about 8% of the sample URIs that are present in at least one archive, but when we selected top-2 archives, we missed less than 2% of these URIs. We also evaluated a machine learning-based routing approach, which resulted in an overall better accuracy, but poorer recall due to low prevalence of the sample lookup URI dataset in different web archives. We contributed various algorithms, such as a space and time efficient approach to ingest large lists of URIs to generate MementoMaps and a Random Searcher Model to discover samples of holdings of web archives. We contributed numerous tools to support various aspects of web archiving and replay, such as MemGator (a Memento aggregator), InterPlanetary Wayback (a novel archival replay system), Reconstructive (a client-side request rerouting ServiceWorker), and AccessLog Parser. Moreover, this work yielded a file format specification draft called Unified Key Value Store (UKVS) that we use for serialization and dissemination of MementoMaps. It is a flexible and extensible file format that allows easy interactions with Unix text processing tools. UKVS can be used in many applications beyond MementoMaps.
Terjedelem 251
Archívum ProQuest One Academic
Hozzáadás dátuma 2021. 08. 09. 8:44:38
Módosítás dátuma 2021. 08. 09. 8:44:38

Címkék:

  • Web archiving
  • Information science
  • Memento
  • Computer science
  • 0984:Computer science
  • Query routing
  • Information technology
  • 0489:Information Technology
  • 0723:Information science
  • Memento routing
  • MementoMap
  • World wide web

MemGator – A Portable Concurrent Memento Aggregator

Típus Dolgozat
Szerző Sawood Alam
Szerző Michael L. Nelson
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 243-244
ISBN 978-1-4503-4229-2
Dátum 2016
DOI 10.1145/2910896.2925452
Kivonat The Memento protocol makes it easy to build a uniform lookup service to aggregate the holdings of web archives. However, there is a lack of tools to utilize this capability in archiving applications and research projects. We created MemGator, an open source, easy to use, portable, concurrent, cross-platform, and self-documented Memento aggregator CLI and server tool written in Go. MemGator implements all the basic features of a Memento aggregator (e.g., TimeMap and TimeGate) and gives the ability to customize various options including which archives are aggregated. It is being used heavily by tools and services such as Mink, WAIL, OldWeb.today, and archiving research projects and has proved to be reliable even in conditions of extreme load.
Kiadvány címe Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries – JCDL '16
Hozzáadás dátuma 2021. 08. 09. 8:42:04
Módosítás dátuma 2021. 08. 09. 8:42:04

Címkék:

  • Memento
  • Web Archiving
  • Computer science
  • Computing and Processing
  • Aggregates
  • Aggregator
  • Concurrent computing
  • MemGator
  • Protocols
  • Reliability
  • Servers
  • Stress

Memory Entanglements and Collection Development in a Transnational Media Landscape

Típus Dolgozat
Szerző Eva Maria Häusner
Kiadó IFLA
Oldalszám 5
Dátum 2017
Kivonat Defining a national domain is the crux of the matter of every National Library’s mission. The National Library of Sweden collects, preserves, registers, and guarantees access to all materials published and distributed in Sweden, printed, audio-visual and since 2012, even electronic. Furthermore the National Library of Sweden collects Suecana, foreign publications which possess historical significance to Sweden and even Swedish literature in translation. Collection strategies have to be updated and developed to fit the times: Digitalization and media convergence presuppose a new concept and new definition of the national domain. How should the National Library work with selection and collection strategies in a way that makes sure that the Suecana-collection and the Swedish collection are truly representative and relevant? This paper describes difficulties inherent to defining a national domain in today’s media landscape and presents s
Kiadvány címe IFLA 2017
Hozzáadás dátuma 2021. 08. 09. 8:41:44
Módosítás dátuma 2021. 08. 09. 8:41:44

Címkék:

  • National libraries
  • collection development
  • digitalization
  • international collaboration
  • Sweden

Memory Hole or Right to Delist? Implications of the Right to be Forgotten for Web Archiving ; Trou mémoriel ou droit au déréférencement ? Les implications du droit à l’oubli pour l’archivage du Web

Típus Folyóiratcikk
Szerző Melanie Dulong de Rosnay
Szerző Andrés Guadamuz
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Dátum 2017
Egyéb Publisher: HAL CCSD
Place: France, Europe
Kivonat This article studies the possible impact of the “right to be forgotten” (RTBF) on the preservation of native digital heritage. It analyses the extent to which archival practices may be affected by the new right, and whether the web may become impossible to preserve for future generations, risking to disappear from memories and history since no version would be available in public or private archives. Collective rights to remember and to memory, free access to information and freedom of expression, seem to clash with private individuals’ right to privacy. After a presentation of core legal concepts of privacy, data protection and freedom of expression, we analyse the case of the European Union Court of Justice vs. Google concerning the right to be forgotten, and look deeper into the controversies generated by the decision. We conclude that there is no room for concern for archives and for the right to remember given the restricted application of RTBF. ; Cet article étudie l’impact possible du « droit à l’oubli » (RTBF) sur la préservation du patrimoine numérique natif. Il analyse si les pratiques d'archivage sont susceptibles d’être affectées par le nouveau droit et s’il pourrait devenir impossible de préserver le Web pour les générations futures, avec le risque pour certains contenus de disparaître de la mémoire et de l’histoire si aucune version n’était disponible dans les archives publiques ou privées. Le droit collectif au souvenir et à la mémoire, l’accès libre à l'information et la liberté d'expression semblent entrer en conflit avec les droits individuels à la vie privée. Après une présentation des concepts juridiques fondamentaux de la vie privée, de la protection des données personnelles et de la liberté d'expression, nous analysons l’arrêt Google de la Cour de Justice de l’Union Européenne et le droit à l’oubli, et examinons les controverses qui ont été générées par la décision. On conclut que les archives et le droit au souvenir ne seront pas affectés par le droit à l’oubli, étant donné son application restreinte.
Hozzáadás dátuma 2021. 08. 09. 8:43:23
Módosítás dátuma 2021. 08. 09. 8:43:23

Címkék:

  • Google
  • Wikipedia
  • web archives
  • [ SHS.DROIT ] Humanities and Social Sciences/Law
  • [ SHS.INFO ] Humanities and Social Sciences/Librar
  • [ SHS.SCIPO ] Humanities and Social Sciences/Polit
  • data protection
  • digital archives
  • memory
  • privacy
  • right to be forgotten
  • right to remember

Memory of the World, Documentary Heritage and Digital Technology: Critical Perspectives

Típus Könyvfejezet
Szerző Anca Claudia Prodan
Szerkesztő Ray Edmondson
Szerkesztő Lothar Jordan
Szerkesztő Anca Claudia Prodan
URL https://doi.org/10.1007/978-3-030-18441-4_11
Hely Cham
Kiadó Springer International Publishing
Oldalszám 159-174
ISBN 978-3-030-18441-4
Dátum 2020
Egyéb DOI: 10.1007/978-3-030-18441-4_11
Kivonat This chapter explores the potential that critically oriented perspectives hold for broadened insights about the heritage value of digital documents. Digital technology has significantly changed the way documents are conceptualized, created, accessed, transmitted and preserved, and digital documents are characterized by features that challenge established perspectives. Although any of these features may hold heritage significance, digital documentary heritage is poorly represented in the context of the UNESCO Memory of the World Programme (MoW), in particular on the International Memory of the World Register, which contains a selection of some of the most globally representative documents in any form, including the digital. Observing that libraries and archives, and their underlying disciplines, which have informed MoW, have been dominated by positivism, this chapter builds on the assumption that approaching documents too narrowly entails the risk of overlooking the manifold significance they could have. Consequently, I suggest that moving away from positivism and adopting critical perspectives might help us understand more comprehensively the manifold heritage significance of digital documents. For illustration, I am using the example of software, and I discuss how the adoption of critical perspectives enables broadened insights about the significance of software, not just as a component in a digital document but also as a document in its own right.
Könyv címe The UNESCO Memory of the World Programme. Heritage Studies.
Hozzáadás dátuma 2021. 08. 09. 8:43:39
Módosítás dátuma 2021. 08. 09. 8:43:39

Címkék:

  • Critical code studies
  • Critical perspectives
  • Definitions
  • Digital documentary heritage
  • Software heritage
  • Software studies

Metadata

Típus Könyv
Szerző Marcia Zeng
Szerző Jian Qin
Kiadó Facet Publishing
Dátum 2016
Hozzáadás dátuma 2021. 08. 09. 8:43:42
Módosítás dátuma 2021. 08. 09. 8:43:42

Metadata for a Web Archive: PREMIS and XMP as Tools for the Task.

Típus Folyóiratcikk
Szerző Laurentia M Romaniuk
URL http://search.ebscohost.com/login.aspx?direct=true&db=lxh&AN=97212804&lang=hu&site=ehost-live
Oldalszám 1-20
Kiadvány Library Philosophy & Practice
ISSN 15220222
Dátum 2014-02-26
Kivonat In a time where websites are ever changing, what metadata standards and tools are best for ensuring that web archive objects (such as snapshots of websites) are readable for users of the future? Can the evolution of web interfaces be documented? Initiatives that explore these questions already exist such as the Internet Archive's Wayback Machine (which stores source code from websites along with images); however, other archive building solutions are also available but have yet to be explored. The field of digital asset management (DAM), for example, has long examined how assets (digital files) are stored, organized, retrieved, and preserved. Best practices related to the use of metadata standards and tools found in digital asset management are useful and relevant to web archive building. In order to better understand the practicality of implementing DAM best practices in building a web archive, a small project was performed which involved cross-walking two metadata standards, Adobe's eXtensible Metadata Platform (XMP) and PREservation Metadata: Implementation Strategies (PREMIS), and recording metadata related to snapshots of a website, the Perseus Digital Library, over a span of over a decade. The findings of this project showed that it is possible, at least in part, to encode PREMIS within XMP.
Hozzáadás dátuma 2021. 08. 09. 8:41:49
Módosítás dátuma 2021. 08. 09. 8:41:49

Címkék:

  • RESEARCH
  • Web archiving
  • Web archives
  • Digital libraries
  • Web
  • Metadata
  • Archive
  • Crosswalking
  • Digital Asset Management
  • Interface
  • PREMIS
  • Tags (Metadata)
  • XMP

Metadata Management and Future Plans to Generate Linked Open Data in the Hungarian Web Archiving Pilot Project

Típus Folyóiratcikk
Szerző Márton Németh
Szerző László Drótos
URL https://itlib.cvtisr.sk/buxus/docs/38-metadata.pdf
Kötet 2019
Szám 2
Kiadvány ITLIB
Dátum 2019
Egyéb Number: 2
Hozzáférés 2020. 08. 19. 16:31:33
Nyelv English
Kivonat In this article we offer a short overview of the metadata management model of our web archiving pilot project, together with the international recommendations that form the background of the modelling. It includes an outlook on the scope of metadata management (archive-level and website-level), an overview of major metadata types and a description of some major metadata fields (more than one hundred fields are available). Metadata-based full-text search and retrieval capabilities are also described. The second chapter of the article points out that the absence of efficient and meaningful exploration methods for archived content is a major hurdle in turning web archives into a usable and useful information resource. A major challenge in information science can be the adaptation of semantic web tools and methods to web archive environments. Web archives must be part of the linked data universe with advanced query and integration capabilities, and must be directly exploitable by other systems and tools. We describe some basic considerations for successfully managing this semantic web integration process as a plan for the future.
Hozzáadás dátuma 2021. 08. 09. 8:43:48
Módosítás dátuma 2021. 08. 09. 8:43:48

Metadata Mix and Match

Típus Folyóiratcikk
Szerző Karen Coyle
URL https://search.proquest.com/docview/1735033500?accountid=27464
Kötet 21
Szám 1
Oldalszám 8-11
Kiadvány Information Standards Quarterly
Dátum 2009
Egyéb Number: 1
Publisher: National Information Standards Organization
Place: Baltimore
Nyelv English
Kivonat The author was asked to consult with the Internet Archive's Open Library project primarily to lend her expertise in bibliographic data. To her dismay, the Open Library data did not look anything like library bibliographic data. She learned, however, that there were some good reasons for this. The first was that the Open Library was not limiting itself to library data. Another reason the Open Library does not limit itself to the more rigorous library data style was that the Open Library allows editing of its data by the general public: people with no particular bibliographic training. The most compelling reason to deviate from the standard view posited by library bibliographic data, however, has to do with the concept of linked data. It's an unfortunate fact that many systems combine data from different sources using only the "dumb down" method, reducing the metadata to the few matching elements and resulting in the least rich metadata record possible.
Hozzáadás dátuma 2021. 08. 09. 8:42:15
Módosítás dátuma 2021. 08. 09. 8:42:15

Címkék:

  • Web archiving
  • Library And Information Sciences
  • Metadata
  • Data bases
  • Bibliographic records
  • Curriculum development
  • Information sources
  • Library cataloging
  • Resource Description Framework-RDF
  • Standards

Methods and Approaches to Using Web Archives in Computational Communication Research

Típus Folyóiratcikk
Szerző Matthew S. Weber
URL https://www.tandfonline.com/doi/full/10.1080/19312458.2018.1447657
Kötet 12
Szám 2-3
Oldalszám 200-215
Kiadvány Communication Methods and Measures
ISSN 1931-2458
Dátum 2018-04-03
Egyéb Number: 2-3
DOI 10.1080/19312458.2018.1447657
Kivonat This article examines the role of web archives as a critical source of data for conducting computational communication research. Web archives are large-scale databases containing comprehensive records of websites showing how those websites have evolved over time. Recent communication scholarship using web archives is reviewed, demonstrating the breadth of research conducted in this space. Subsequently, a methodological framework is proposed for using web archives in computational communication research. As a source of data, web archives present a number of methodological challenges, particularly with regards to the accuracy and completeness of web archives. These problems are addressed in order to better inform future work in this area. The closing sections outline a forward-looking trajectory for computational communication research using web archives.
Hozzáadás dátuma 2021. 08. 09. 8:42:44
Módosítás dátuma 2021. 08. 09. 8:42:44

Methods of Web Philology: Computer Metadata and Web Archiving in the Primary Source Documents of Contemporary Esotericism

Típus Folyóiratcikk
Szerző Christopher Plaisance
URL https://doi.org/10.1558/ijsnr.v7i1.26074
Kötet 7
Szám 1
Oldalszám 43-68
Kiadvány International Journal for the Study of New Religions
ISSN 2041-9511
Dátum 2016-05-31
Egyéb Number: 1
Publisher: Equinox Publishing Group
DOI 10.1558/ijsnr.v7i1.26074
Kivonat This article explores the issues surrounding the critical analysis of first generation electronic objects within the context of the study of contemporary esoteric discourse. This is achieved through a detailed case study of Benjamin Rowe's work, A Short Course in Scrying, which is solely exemplified by digital witnesses. This article demonstrates that the critical analysis of these witnesses is only possible by adapting the general methods of textual scholarship to the specific techniques of digital forensics, particularly the analysis of computer metadata and web archives. The resulting method, here termed web philology, is applicable to the critical analysis by the scholar of religion of any primary source documents originating on the web as electronic objects.
Hozzáadás dátuma 2021. 08. 09. 8:42:04
Módosítás dátuma 2021. 08. 09. 8:42:04

Címkék:

  • WEB archiving
  • methodology
  • contemporary esotericism
  • DIGITAL electronics
  • digital forensics
  • ESOTERICISM
  • METADATA
  • PHILOLOGY
  • textual scholarship
  • web philology

Micro Archives as Rich Digital Object Representations

Típus Dolgozat
Szerző Helge Holzmann
Szerző Mila Runnwerth
URL http://dl.acm.org/citation.cfm?doid=3201064.3201110
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 353-357
ISBN 978-1-4503-5563-6
Dátum 2018
DOI 10.1145/3201064.3201110
Kivonat Digital objects as well as real-world entities are commonly referred to in literature or on the Web by mentioning their name, linking to their website or citing unique identifiers, such as DOI and ORCID, which are backed by a set of meta information. All of these methods have severe disadvantages and are not always suitable though: They are not very precise, not guaranteed to be persistent or mean a big additional effort for the author, who needs to collect the metadata to describe the reference accurately. Especially for complex, evolving entities and objects like software, pre-defined metadata schemas are often not expressive enough to capture its temporal state comprehensively. We found in previous work that a lot of meaningful information about software, such as a description, rich metadata, its documentation and source code, is usually available online. However, all of this needs to be preserved coherently in order to constitute a rich digital representation of the entity. We show that this is currently not the case, as only 10% of the studied blog posts and roughly 30% of the analyzed software websites are archived completely, i.e., all linked resources are captured as well. Therefore, we propose Micro Archives as rich digital object representations, which semantically and logically connect archived resources and ensure a coherent state. With Micrawler we present a modular solution to create, cite and analyze such Micro Archives. In this paper, we show the need for this approach as well as discuss opportunities and implications for various applications also beyond scholarly writing.
Kiadvány címe Proceedings of the 10th ACM Conference on Web Science – WebSci '18
Hozzáadás dátuma 2021. 08. 09. 8:42:40
Módosítás dátuma 2021. 08. 09. 8:42:40

Címkék:

  • Crawling
  • Data Representation
  • Scientific Workflow
  • Web Archives

Migrating Web Archives from HTML4 to HTML5: A Block-Based Approach and Its Evaluation

Típus Könyvfejezet
Szerző Andrés Sanoja
Szerző Stéphane Gançarski
Szerkesztő Mārīte Kirikova
Szerkesztő Kjetil Nørvåg
Szerkesztő George A Papadopoulos
URL http://link.springer.com/10.1007/978-3-319-66917-5_25
Hely Cham
Kiadó Springer International Publishing
Oldalszám 375-393
ISBN 978-3-319-66917-5
Dátum 2017
Egyéb DOI: 10.1007/978-3-319-66917-5_25
Kivonat Web archives (and the Web itself) are likely to suffer from format obsolescence. In a few years or decades, future Web browsers will no more be able to properly render Web pages written in HTML4 format. Thus we propose a migration tool from HTML4 to HTML5. This is challenging, because it requires to generate HTML5 semantic elements that do not exist in HTML4 pages. To solve this issue, we propose to use a Web page segmenter. Indeed, blocks generated by a segmenter are good candidates for being semantic elements as both reflect the content structure of the page. We use an evaluation framework for Web page segmentation, that helps defining and computing relevant metrics to measure the quality of the migration process. We ran experiments on a sample of 40 pages. The migrated pages we produce are compared to a ground truth. The automatic labeling of blocks is quite similar to the ground truth, though its quality depends on the type of page we migrate. When comparing the rendering of the original page and the rendering of its migrated version, we note some differences, mainly due to the fact that rendering engines do not (yet) properly render the content of semantic elements.
Könyv címe ADBIS 2017: Advances in Databases and Information Systems
Hozzáadás dátuma 2021. 08. 09. 8:43:28
Módosítás dátuma 2021. 08. 09. 8:43:28

Címkék:

  • archive
  • Blocks
  • Format obsolescence
  • HTML5
  • Migration
  • Segmentation
  • Web

Mining Relevant Time for Query Subtopics in Web Archives

Típus Dolgozat
Szerző Tu Ngoc Nguyen
Szerző Nattiya Kanhabua
Szerző Wolfgang Nejdl
Szerző Claudia Niederée
URL http://doi.acm.org/10.1145/2740908.2741702
Hely New York, NY, USA
Kiadó ACM
Oldalszám 1357-1362
ISBN 978-1-4503-3473-0
Dátum 2015
Egyéb Series Title: WWW '15 Companion
Citation Key: Nguyen:2015:MRT:2740908.2741702
DOI 10.1145/2740908.2741702
Kivonat With the reflection of nearly all types of social cultural, societal and everyday processes of our lives in the web, web archives from organizations such as the Internet Archive have the potential of becoming huge gold-mines for temporal content analytics of many kinds (e.g., on politics, social issues, economics or media). First hand evidences for such processes are of great benefit for expert users such as journalists, economists, historians, etc. However, searching in this unique longitudinal collection of huge redundancy (pages of near-identical content are crawled all over again) is completely different from searching over the web. In this work, we present our first study of mining the temporal dynamics of subtopics by leveraging the value of anchor text along the time dimension of the enormous web archives. This task is especially useful for one important ranking problem in the web archive context, the time-aware search result diversification. Due to the time uncertainty (the lagging nature and unpredicted behavior of the crawlers), identifying the trending periods for such temporal subtopics relying solely on the timestamp annotations of the web archive (i.e., crawling times) is extremely difficult. We introduce a brute-force approach to detect a time-reliable sub-collection and propose a method to leverage them for relevant time mining of subtopics. This is empirically found effective in solving the problem.
Kiadvány címe Proceedings of the 24th International Conference on World Wide Web
Hozzáadás dátuma 2021. 08. 09. 8:43:17
Módosítás dátuma 2021. 08. 09. 8:43:17

Címkék:

  • temporal ranking
  • anchor text mining
  • result diversification
  • temporal subtopic

Mining the information architecture of the WWW using automated website boundary detection.

Típus Folyóiratcikk
Szerző Ayesh Alshukri
Szerző Frans Coenen
URL https://doi.org/10.3233/WEB-170365
Kötet 15
Szám 4
Oldalszám 269-290
Kiadvány Web Intelligence (2405-6456)
ISSN 24056456
Dátum 2017-10
Egyéb Number: 4
Publisher: IOS Press
Kivonat The world wide web has two main forms of architecture: the first is that which is explicitly encoded into web pages, and the second is that which is implied by the web content, particularly pertaining to look and feel. The latter is exemplified by the concept of a website, a concept that is only loosely defined, although users intuitively understand it. The Website Boundary Detection (WBD) problem is concerned with the task of identifying the complete collection of web pages/resources that are contained within a single website. Whatever the case, the concept of a website is used with respect to a number of application domains including website archiving, spam detection, and www analysis. In the context of such applications it is beneficial if a website can be automatically identified. This is usually done by identifying a website of interest in terms of its boundary, the so-called WBD problem. In this paper seven WBD techniques are proposed and compared: four statistical techniques where the web data to be used is obtained a priori, and three dynamic techniques where the data to be used is obtained as the process progresses. All seven techniques are presented in detail and evaluated.
Hozzáadás dátuma 2021. 08. 09. 8:42:06
Módosítás dátuma 2021. 08. 09. 8:42:06

Címkék:

  • web archiving
  • INTERNET
  • digital preservation
  • WEBSITES
  • SPAM (Email)
  • COMPUTER network resources
  • INTERNET content
  • random walk techniques
  • web graphs
  • web page clustering
  • Web structure mining
  • website boundary detection

Mirkwood: An Online Parallel Crawler

Típus Könyvfejezet
Szerző Juan F García
Szerző Miguel V Carriegos
Szerkesztő Francisco Martínez Álvarez
Szerkesztő Alicia Troncoso Lora
Szerkesztő José António Sáez Muñoz
Szerkesztő Héctor Quintián
Szerkesztő Emilio Corchado
Hely Cham
Kiadó Springer International Publishing
Oldalszám 47-56
ISBN 978-3-030-20005-3
Dátum 2020
Kivonat In this research we present Mirkwood, a parallel crawler for fast and online syntactic analysis of websites. Configured by default to behave as a focused crawler, analysing exclusively a limited set of hosts, it includes seed extraction capabilities, which allows it to autonomously obtain high quality sites to crawl. Mirkwood is designed to run in a computer cluster, taking advantage of all the cores of its individual machines (virtual or physical), although it can also run on a single machine. By analysing sites online and not downloading the web content, we achieve crawling speeds several orders of magnitude faster than if we did, while assuring that the content we check is up to date. Our crawler relies on MPI, for the cluster of computers, and threading, for each individual machine of the cluster. Our software has been tested in several platforms, including the Supercomputer Calendula. Mirkwood is entirely written in Java language, making it multi–platform and portable.
Könyv címe International Joint Conference: 12th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2019) and 10th International Conference on EUropean Transnational Education (ICEUTE 2019). CISIS 2019, ICEUTE 2019. Adva
Hozzáadás dátuma 2021. 08. 09. 8:43:39
Módosítás dátuma 2021. 08. 09. 8:43:39

Címkék:

  • computation
  • Crawler Parallel
  • High performance computing

MIT's CWSpace project: packaging metadata for archiving educational content in DSpace

Típus Folyóiratcikk
Szerző William Reilly
Szerző Robert Wolfe
Szerző MacKenzie Smith
URL http://link.springer.com/10.1007/s00799-005-0131-2
Kötet 6
Szám 2
Oldalszám 139-147
Kiadvány International Journal on Digital Libraries
ISSN 1432-5012
Dátum 2006-04-20
Egyéb Number: 2
DOI 10.1007/s00799-005-0131-2
Hozzáadás dátuma 2021. 08. 09. 8:41:49
Módosítás dátuma 2021. 08. 09. 8:41:49

Mobile Mink

Típus Dolgozat
Szerző Wesley Jordan
Szerző Mat Kelly
Szerző Justin F. Brunelle
Szerző Laura Vobrak
Szerző Michele C. Weigle
Szerző Michael L. Nelson
URL http://dl.acm.org/citation.cfm?doid=2756406.2756956
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 243-244
ISBN 978-1-4503-3594-2
Dátum 2015
DOI 10.1145/2756406.2756956
Kivonat We describe the mobile app Mobile Mink, which extends Mink, a browser extension that integrates the live and archived web. Mobile Mink discovers mobile and desktop URIs and provides the user an aggregated TimeMap of both mobile and desktop mementos. Mobile Mink also allows users to submit mobile and desktop URIs for archiving at the Internet Archive and Archive.today. Mobile Mink helps to increase the archival coverage of the growing mobile web.
Kiadvány címe Proceedings of the 15th ACM/IEEE-CS on Joint Conference on Digital Libraries – JCDL '15
Hozzáadás dátuma 2021. 08. 09. 8:41:51
Módosítás dátuma 2021. 08. 09. 8:41:51

Mobilny pracownik – sprawozdanie z międzynarodowych warsztatów

Típus Folyóiratcikk
Szerző Joanna Radzicka
URL https://search.proquest.com/docview/1951541346?accountid=27464
Szám 172
Oldalszám 1
Kiadvány Elektroniczny Biuletyn Informacyjny Bibliotekarzy : EBIB
Dátum 2017
Egyéb Number: 172
Publisher: Stowarzyszenie Bibliotekarzy Polskich
Place: Biblioteka Politechniki Krakowskiej
Nyelv Polish
Hozzáadás dátuma 2021. 08. 09. 8:42:27
Módosítás dátuma 2021. 08. 09. 8:42:27

Címkék:

  • Library And Information Sciences

Modeling Updates of Scholarly Webpages Using Archived Data

Típus Dolgozat
Szerző Yasith Jayawardana
Szerző Alexander C. Nwala
Szerző Gavindya Jayawardena
Szerző Jian Wu
Szerző Sampath Jayarathna
Szerző Michael L. Nelson
Szerző C. Lee Giles
Oldalszám 1868-1877
Dátum 2020-12
DOI 10.1109/BigData50022.2020.9377796
Könyvtár Katalógus IEEE Xplore
Konferencia címe 2020 IEEE International Conference on Big Data (Big Data)
Kivonat The vastness of the web imposes a prohibitive cost on building large-scale search engines with limited resources. Crawl frontiers thus need to be optimized to improve the coverage and freshness of crawled content. In this paper, we propose an approach for modeling the dynamics of change in the web using archived copies of webpages. To evaluate its utility, we conduct a preliminary study on the scholarly web using 19,977 seed URLs of authors' homepages obtained from their Google Scholar profiles. We first obtain archived copies of these webpages from the Internet Archive (IA), and estimate when their actual updates occurred. Next, we apply maximum likelihood to estimate their mean update frequency (λ) values. Our evaluation shows that λ values derived from a short history of archived data provide a good estimate for the true update frequency in the short-term, and that our method provides better estimations of updates at a fraction of resources compared to the baseline models. Based on this, we demonstrate the utility of archived data to optimize the crawling strategy of web crawlers, and uncover important challenges that inspire future research directions.
Kiadvány címe 2020 IEEE International Conference on Big Data (Big Data)
Hozzáadás dátuma 2021. 08. 09. 8:44:31
Módosítás dátuma 2021. 08. 09. 8:44:31

Címkék:

  • History
  • Internet
  • Search engines
  • Big Data
  • Crawl Scheduling
  • Data models
  • Frequency estimation
  • Portable document format
  • Search Engines
  • Web Crawling

Multiple Media Analysis and Visualization for Understanding Social Activities

Típus Dolgozat
Szerző Masashi Toyoda
URL http://doi.acm.org/10.1145/2567948.2579040
Hely New York, NY, USA
Kiadó ACM
Oldalszám 825-826
ISBN 978-1-4503-2745-9
Dátum 2014
Egyéb Series Title: WWW '14 Companion
Citation Key: Toyoda:2014:MMA:2567948.2579040
DOI 10.1145/2567948.2579040
Kivonat The Web has involved diverse media services, such as blogs, photo/video/link sharing, social networks, and microblogs. These Web media react to and affect real-world events, while the mass media still has big influence on social activities. The Web and mass media now affect each other. Our use of media has evolved dynamically in the last decade, and this affects our societal behavior. For instance, the first photo of a plane crash landing during the "Miracle on the Hudson" on January 15, 2009 appeared and spread on Twitter and was then used in TV news. During the "Chelyabinsk Meteor" incident on February 15, 2013, many people posted videos of the incident on YouTube, and mass media then reused them in TV programs. Large-scale collection, analysis, and visualization of those multiple media are strongly required for sociology, linguistics, risk management, and marketing research. We are building a huge-scale Japanese web archive, and various analytics engines with a large-scale display wall. Our archive consists of 30 billion web pages crawled for 14 years, 1 billion blog posts for 7 years, and 15 billion tweets for 3 years. In this talk, I present several analysis and visualization systems based on network analysis, natural language processing, image processing, and three-dimensional visualization.
Kiadvány címe Proceedings of the 23rd International Conference on World Wide Web
Hozzáadás dátuma 2021. 08. 09. 8:43:20
Módosítás dátuma 2021. 08. 09. 8:43:20

Címkék:

  • web archive
  • multiple media analysis
  • visualization

Named Entity Evolution Analysis on Wikipedia

Típus Dolgozat
Szerző Helge Holzmann
Szerző Thomas Risse
URL http://doi.acm.org/10.1145/2615569.2615639
Hely New York, NY, USA
Kiadó ACM
Oldalszám 241-242
ISBN 978-1-4503-2622-3
Dátum 2014
Egyéb Series Title: WebSci '14
Citation Key: Holzmann:2014:NEE:2615569.2615639
DOI 10.1145/2615569.2615639
Kivonat Accessing Web archives raises a number of issues caused by their temporal characteristics. Additional knowledge is needed to find and understand older texts. Especially entities mentioned in texts are subject to change. Most severe in terms of information retrieval are name changes. In order to find entities that have changed their name over time, search engines need to be aware of this evolution. We tackle this problem by analyzing Wikipedia in terms of entity evolutions mentioned in articles. We present statistical data on excerpts covering name changes, which will be used to discover similar text passages and extract evolution knowledge in future work.
Kiadvány címe Proceedings of the 2014 ACM Conference on Web Science
Hozzáadás dátuma 2021. 08. 09. 8:43:38
Módosítás dátuma 2021. 08. 09. 8:43:38

Címkék:

  • named entity evolution
  • semantics
  • wikipedia

Named entity evolution recognition on the Blogosphere.

Típus Folyóiratcikk
Szerző Helge Holzmann
Szerző Nina Tahmasebi
Szerző Thomas Risse
URL https://doi.org/10.1007/s00799-014-0135-x
Kötet 15
Szám 2-4
Oldalszám 209-235
Kiadvány International Journal on Digital Libraries
ISSN 14325012
Dátum 2015-04
Egyéb Number: 2-4
Publisher: Springer Science & Business Media B.V.
Kivonat Advancements in technology and culture lead to changes in our language. These changes create a gap between the language known by users and the language stored in digital archives. This affects users' ability first to find content and second to interpret that content. In a previous work, we introduced our approach for named entity evolution recognition (NEER) in newspaper collections. Lately, increasing efforts in Web preservation have led to increased availability of Web archives covering longer time spans. However, language on the Web is more dynamic than in traditional media and many of the basic assumptions from the newspaper domain do not hold for Web data. In this paper we discuss the limitations of existing methodology for NEER. We approach these by adapting an existing NEER method to work on noisy data like the Web and the Blogosphere in particular. We develop novel filters that reduce the noise and make use of Semantic Web resources to obtain more information about terms. Our evaluation shows the potential of the proposed approach.
Hozzáadás dátuma 2021. 08. 09. 8:43:02
Módosítás dátuma 2021. 08. 09. 8:43:02

Címkék:

  • DIGITAL preservation
  • WEB archiving
  • Blogs
  • DBpedia
  • Named entity evolution
  • Semantic Web
  • WEB databases

National Libraries' Traditional Collection Policy Facing Web Archiving

Típus Folyóiratcikk
Szerző Rivka Shveiky
Szerző Judit Bar-Ilan
URL https://search.proquest.com/docview/1548796786?accountid=27464
Kötet 24
Szám 3
Oldalszám 37-72
Kiadvány Alexandria
ISSN 0955-7490
Dátum 2013
Egyéb Number: 3
ProQuest document ID: 1548796786
Publisher: Sage Publications Ltd.
Place: London
Nyelv English
Kivonat One of the main missions of a national library is to preserve the national creative works in printed and non-printed formats. In the 1990s, national libraries began to harvest and archive the national body of creative work that was published on the internet. The aim of the study was to examine to what extent national libraries implement their general collection policy when they establish a national web archive. The study, which was based on a qualitative approach, had three phases: examining the characteristics of a traditional collection policy of a national library; identifying the characteristics of a collection policy of a national library’s web archive; and comparing the traditional collection characteristics with the national library’s web archive characteristics. The results showed that although the libraries that were studied were from different regions of the world and various cultures, the characteristics of their traditional collections are similar. In contrast, the difference between their web archives is more significant. National libraries do not apply the traditional policy to the internet, and struggle to shape new rules for coping with web contents.
Hozzáadás dátuma 2021. 08. 09. 8:42:01
Módosítás dátuma 2021. 08. 09. 8:42:01

Címkék:

  • Library And Information Sciences

National Web Archiving in Australia: Representing the Comprehensive

Típus Könyvfejezet
Szerző Paul Koerbin
Szerkesztő Daniel Gomes
Szerkesztő Elena Demidova
Szerkesztő Jane Winters
Szerkesztő Thomas Risse
URL https://doi.org/10.1007/978-3-030-63291-5_3
Hely Cham
Kiadó Springer International Publishing
Oldalszám 23-32
ISBN 978-3-030-63291-5
Dátum 2021
Egyéb DOI: 10.1007/978-3-030-63291-5_3
Hozzáférés 2021. 07. 15. 9:52:26
Könyvtár Katalógus Springer Link
Nyelv en
Kivonat National libraries have been at the forefront of web archiving since the activity commenced in the mid-1990s. This effort is built upon and sustained by their long-term strategic focus, curatorial experience and mandate to collect a nation’s documentary heritage. Nevertheless, their specific legal remit, resources and strategic priorities will affect the objectives and the outcomes of national web archiving programmes. The National Library of Australia’s web archiving programme, being among the earliest established and longest sustained activities, provides a case study on the origin and building of a practical approach to comprehensive national collecting and access.
Könyv címe The Past Web: Exploring Web Archives
Rövid cím National Web Archiving in Australia
Hozzáadás dátuma 2021. 08. 09. 8:43:58
Módosítás dátuma 2021. 08. 09. 8:43:58

Nationale Grenzen im World Wide Web – Erfahrungen bei der Webarchivierung in der Österreichischen Nationalbibliothek

Típus Folyóiratcikk
Szerző Michaela Mayr
Szerző Andreas Predikaka
URL https://search.proquest.com/docview/1780113609?accountid=27464
Kötet 40
Szám 1
Oldalszám 90-95
Kiadvány Bibliothek Forschung und Praxis
ISSN 1865-7648
Dátum 2016-01-01
Egyéb Number: 1
ProQuest document ID: 1780113609
Publisher: Walter de Gruyter GmbH
Place: Berlin
DOI 10.1515/bfp-2016-0007
Nyelv English
Kivonat Since 2009, the Austrian National Library has performed four broad crawls, based on the Austrian Media Act, which focused primarily on the top level domain .at. The analysis of the crawls indicates that the aspect of national borders for the cultural heritage within the World Wide Web plays an important role for collection methods.
Hozzáadás dátuma 2021. 08. 09. 8:41:41
Módosítás dátuma 2021. 08. 09. 8:41:41

Címkék:

  • Web archiving
  • World Wide Web
  • Library And Information Sciences
  • National libraries
  • 3.11:NATIONAL LIBRARIES AND STATE LIBRARIES
  • 14.11:COMMUNICATIONS AND INFORMATION TECHNOLOGY –
  • Webarchivierung
  • Austria
  • Broad Crawl
  • Domain Crawl
  • Österreichische Nationalbibliothek

NEAR-Miner: Mining Evolution Associations of Web Site Directories for Efficient Maintenance of Web Archives

Típus Folyóiratcikk
Szerző Ling Chen
Szerző Sourav S Bhowmick
Szerző Wolfgang Nejdl
URL http://dx.doi.org/10.14778/1687627.1687757
Kötet 2
Szám 1
Oldalszám 1150-1161
Kiadvány Proc. VLDB Endow.
ISSN 2150-8097
Dátum 2009
Egyéb Number: 1
Publisher: VLDB Endowment
Citation Key: Chen:2009:NME:1687627.1687757
DOI 10.14778/1687627.1687757
Kivonat Web archives preserve the history of autonomous Web sites and are potential gold mines for all kinds of media and business analysts. The most common Web archiving technique uses crawlers to automate the process of collecting Web pages. However, (re)downloading entire collection of pages periodically from a large Web site is unfeasible. In this paper, we take a step towards addressing this problem. We devise a data mining-driven policy for selectively (re)downloading Web pages that are located in hierarchical directory structures which are believed to have changed significantly (e.g., a substantial percentage of pages are inserted to/removed from the directory). Consequently, there is no need to download and maintain pages that have not changed since the last crawl as they can be easily retrieved from the archive. In our approach, we propose an off-line data mining algorithm called near-Miner that analyzes the evolution history of Web directory structures of the original Web site stored in the archive and mines negatively correlated association rules (near) between ancestor-descendant Web directories. These rules indicate the evolution correlations between Web directories. Using the discovered rules, we propose an efficient Web archive maintenance algorithm called warm that optimally skips the subdirectories (during the next crawl) which are negatively correlated with it in undergoing significant changes. Our experimental results with real data show that our approach improves the efficiency of the archive maintenance process significantly while sacrificing slightly in keeping the "freshness" of the archives. Furthermore, our experiments demonstrate that it is not necessary to discover nears frequently as the mining rules can be utilized effectively for archive maintenance over multiple versions.
Hozzáadás dátuma 2021. 08. 09. 8:43:17
Módosítás dátuma 2021. 08. 09. 8:43:17

Nearline Web Archiving

Típus Jelentés
Szerző Zhiwu Xie
Szerző Krati Nayyar
Szerző Edward A. Fox
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Hely United States, North America
Dátum 2016
Kivonat In this paper, we propose a modified approach to realtime transactional web archiving. It leverages the web caching infrastructure that is already prevalent on web servers. Instead of archiving web content at HTTP transaction time, in our approach the archiving happens when the cached copy expires and is about to be expunged. Before the deletion, all expired cache copies are combined and then sent to the web archive in small batches. Since the cache is purged at much lower frequency than HTTP transactions, the archival workload is also much lower than that for transactional archiving. To further decrease the processing load at the origin server, archival copy deduplication is carried out at the archive instead of at the origin server. It is crucial to note that the cache purging process is separate from those that serve the HTTP requests. It can be, and usually is set to lower priority. The archiving therefore occurs only when the server is not busy fulfilling its more mission critical tasks; this is much less disruptive to the origin server. This approach, however, does not guarantee that the freshest copy is archived, although the cache purging policy may be adjusted to attempt to bound the freshness of the archive.
Hozzáadás dátuma 2021. 08. 09. 8:42:58
Módosítás dátuma 2021. 08. 09. 8:42:58

Címkék:

  • Web archiving
  • Apache web server
  • Nearline web archiving
  • Web cache

Negotiating the Web of the Past

Típus Folyóiratcikk
Szerző Valérie Schafer
Szerző Francesca Musiani
Szerző Marguerite Borelli
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kiadvány French Journal for Media Research
Dátum 2016
Egyéb Publisher: HAL CCSD
Place: France, Europe
Nyelv English
Kivonat The material, practical, theoretical elements of Web archiving as an ensemble of practices and a terrain of inquiry are inextricably entwined. Thus, its processes and infrastructures – often discreet and invisible – are increasingly relevant. Approaches inspired by Science and Technology Studies (STS) can contribute to shed light on the shaping of Web archives.
Hozzáadás dátuma 2021. 08. 09. 8:43:33
Módosítás dátuma 2021. 08. 09. 8:43:33

Címkék:

  • Web archiving
  • Web archives
  • [ SHS.INFO ] Humanities and Social Sciences/Librar
  • Archivage du Web
  • Archives du Web
  • Born-Digital Heritage
  • gouvernance
  • governance
  • Patrimoine nativement numérique
  • STS

Nemzetközi körkép a webarchiválás gyakorlatáról

Típus Folyóiratcikk
Szerző Márton Németh
Kötet 63
Szám 4
Oldalszám 575-582
Kiadvány Könyvtári figyelő
ISSN 0023-3773
Dátum 2017
Egyéb Number: 4
Kivonat Web archiving is a rapidly developing field that has already appeared in several respects in the pages of Könyvtári Figyelő, particularly in reviews of the international literature. (In 2014, for example, Ilona Hegyközi surveyed international trends in web archiving.) We felt the time had come for a new summary. This is given particular weight by the fact that, following several earlier initiatives, the foundations were laid this spring, within the development project of the National Széchényi Library (OSZK), for launching a pilot project in which we assess the hardware and software requirements and the professional knowledge needed for web archiving. The main goal is to integrate this field, on a well-founded basis and for the long term, into the OSZK's service activities. At the OSZK's Department of Electronic Library Services we have created a Hungarian Internet Archive website (http://mekosztaly.oszk.hu/mia), where the various methods and basic concepts of web archiving, as well as the international literature, can be studied. We also provide current information about the project there, and readers can subscribe to a mailing list that discusses professional questions of web archiving. The aim of this article is therefore not to cover the professional foundations of web archiving activities (which can be explored by browsing the website), but to give an overview of the international good practices that underpin web archiving services.
Hozzáadás dátuma 2021. 08. 09. 8:43:30
Módosítás dátuma 2021. 08. 09. 8:43:30

Címkék:

  • webarchiválás
  • nemzetközi körkép

Netlab Web Archiving Course Brochure

Típus Weboldal
Szerző Netlab
Dátum 2018
Hozzáférés 2019. 01. 28. 1:00:00
Website címe Course Brochure
Hozzáadás dátuma 2021. 08. 09. 8:43:42
Módosítás dátuma 2021. 08. 09. 8:43:42

Netlab-courses

Típus Weboldal
Szerző Netlab
URL http://www.netlab.dk/services/courses/
Dátum 2018
Hozzáférés 2019. 01. 28. 1:00:00
Website címe Netlab Course page
Hozzáadás dátuma 2021. 08. 09. 8:43:42
Módosítás dátuma 2021. 08. 09. 8:43:42

New medium, old archives? Exploring archival potential in The Live Art Collection of the UK Web Archive

Típus Folyóiratcikk
Szerző Vanessa Bartlett
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 10
Szám 1
Oldalszám 91-103
Kiadvány International Journal of Performance Arts and Digital Media
ISSN 1479-4713
Dátum 2014-01-02
Egyéb Number: 1
DOI 10.1080/14794713.2014.912504
Kivonat This article speculates about the new kinds of historical information that performance scholars may be able to preserve as a result of recent innovations in web archiving. Using The Live Art Collection of the UK Web Archive as its case study, the article draws on influences from oral history, new media theory and the digital humanities. Beginning with an assertion that the Web has a tendency to aggregate existing media forms into one archival location, the article makes the case that online writing is key to web archiving's potential to document new kinds of knowledge about performance and live art. Subsequently it points to limitations in the current archival structures of the collection and concludes that further innovation is required in order to maximize the scholarly potential of the material contained within it. Interviews with the team who manage and curate the collection are used throughout to support assertions about the collection's intended use and functions.
Hozzáadás dátuma 2021. 08. 09. 8:42:52
Módosítás dátuma 2021. 08. 09. 8:42:52

Címkék:

  • web archiving
  • DIGITAL libraries
  • WEB archives
  • DOCUMENTATION
  • ART museums
  • DIGITAL humanities
  • internet
  • live art
  • oral history
  • UK Web Archive

No Copies, No Comments

Típus Folyóiratcikk
Szerző Lauree Padgett
URL https://search.proquest.com/docview/1861789618?accountid=27464
Kötet 33
Szám 10
Oldalszám 19
Kiadvány Information Today
Dátum 2016-12
Egyéb Number: 10
Publisher: Information Today, Inc.
Place: Medford
Nyelv English
Kivonat MIA-Missing in Archives: "Disappearing News Archives," an Online Searcher feature by Sarah Jane Davis, contains, in part, text Davis cites from a March 16, 2016, ResearchBuzz blog post by Tara Calishain, as well as additional comments Calishain emailed to Online Searcher editor-in-chief Marydee Ojala. There is an irreplaceable connection that comes from holding and reading an ink-lined paper, with a few crossed-out words and some smudges, or fingering a faded snapshot, yellowing and curling up at the edges, that was lovingly pressed into a page by hand, not automatically done with perfect precision via Shutterfly.
Hozzáadás dátuma 2021. 08. 09. 8:42:27
Módosítás dátuma 2021. 08. 09. 8:42:27

Címkék:

  • Web archiving
  • Library And Information Sciences–Computer Applica
  • Archives & records
  • Internet
  • Social networks
  • Turkey

No More 404s

Típus Dolgozat
Szerző Ke Zhou
Szerző Claire Grover
Szerző Martin Klein
Szerző Richard Tobin
URL http://dl.acm.org/citation.cfm?doid=2756406.2756940
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 233-236
ISBN 978-1-4503-3594-2
Dátum 2015
Egyéb Series Title: JCDL '15
Citation Key: Zhou:2015:NMP:2756406.2756940
DOI 10.1145/2756406.2756940
Kivonat The citation of resources is a fundamental part of scholarly discourse. Due to the popularity of the web, there is an increasing trend for scholarly articles to reference web resources (e.g. software, data). However, due to the dynamic nature of the web, the referenced links may become inaccessible ('rotten') sometime after publication, returning a "404 Not Found" HTTP error. In this paper we first present some preliminary findings of a study of the persistence and availability of web resources referenced from papers in a large-scale scholarly repository. We reaffirm previous research that link rot is a serious problem in the scholarly world and that current web archives do not always preserve all rotten links. Therefore, a more pro-active archival solution needs to be developed to further preserve web content referenced in scholarly articles. To this end, we propose to apply machine learning techniques to train a link rot predictor for use by an archival framework to prioritise pro-active archiving of links that are more likely to be rotten. We demonstrate that we can obtain a fairly high link rot prediction AUC (0.72) with only a small set of features. By simulation, we also show that our prediction framework is more effective than current web archives for preserving links that are likely to be rotten. This work has a potential impact for the scholarly world where publishers can utilise this framework to prioritise the archiving of links for digital preservation, especially when there is a large quantity of links to be archived.
Kiadvány címe Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries – JCDL '15
Hozzáadás dátuma 2021. 08. 09. 8:43:26
Módosítás dátuma 2021. 08. 09. 8:43:26

Címkék:

  • digital preservation
  • repositories
  • web persistence

Not all mementos are created equal: measuring the impact of missing resources

Típus Folyóiratcikk
Szerző Justin F Brunelle
Szerző Mat Kelly
Szerző Hany Salaheldeen
Szerző Michele C Weigle
Szerző Michael L Nelson
URL https://search.proquest.com/docview/1703891222?accountid=27464
Kötet 16
Szám 3-4
Oldalszám 283-301
Kiadvány International Journal on Digital Libraries
ISSN 14325012
Dátum 2015-09
Egyéb Number: 3-4
Publisher: Springer Science & Business Media
Place: Heidelberg
DOI 10.1007/s00799-015-0150-6
Nyelv English
Kivonat Issue Title: Focused Issue on Digital Libraries 2014. Web archives do not always capture every resource on every page that they attempt to archive. This results in archived pages missing a portion of their embedded resources. These embedded resources have varying historic, utility, and importance values. The proportion of missing embedded resources does not provide an accurate measure of their impact on the Web page; some embedded resources are more important to the utility of a page than others. We propose a method to measure the relative value of embedded resources and assign a damage rating to archived pages as a way to evaluate archival success. In this paper, we show that Web users' perceptions of damage are not accurately estimated by the proportion of missing embedded resources. In fact, the proportion of missing embedded resources is a less accurate estimate of resource damage than a random selection. We propose a damage rating algorithm that provides closer alignment to Web user perception, providing an overall improved agreement with users on memento damage by 17 % and an improvement by 51 % if the mementos have a damage rating delta ……0.30. We use our algorithm to measure damage in the Internet Archive, showing that it is getting better at mitigating damage over time (going from a damage rating of 0.16 in 1998 to 0.13 in 2013). However, we show that a greater number of important embedded resources (2.05 per memento on average) are missing over time. Alternatively, the damage in WebCite is increasing over time (going from 0.375 in 2007 to 0.475 in 2014), while the missing embedded resources remain constant (13 % of the resources are missing on average). Finally, we investigate the impact of JavaScript on the damage of the archives, showing that a crawler that can archive JavaScript-dependent representations will reduce memento damage by 13.5 %.
Hozzáadás dátuma 2021. 08. 09. 8:42:16
Módosítás dátuma 2021. 08. 09. 8:42:16

Címkék:

  • Web archiving
  • Digital libraries
  • Digital preservation
  • Library And Information Sciences–Computer Applica
  • World Wide Web
  • Digital archives
  • Web architecture
  • Memento damage

Now You See It, Now You Don't. Unless …

Típus Folyóiratcikk
Szerző Shirley Duglin Kennedy
URL https://search.proquest.com/docview/1761628166?accountid=27464
Kötet 32
Szám 10
Oldalszám 8
Kiadvány Information Today
Dátum 2015-12
Egyéb Number: 10
Publisher: Information Today, Inc.
Place: Medford
Nyelv English
Kivonat According to Jill Lepore, the average life of a webpage is 100 days. As she notes, the embarrassing stuff seems to stick around a lot longer, but it's an indisputable fact that web-based content often goes missing: corporate reports, scholarly articles, government documents, working papers, maps, and creative works of all sorts. The Internet Archive and its Wayback Machine are pretty much universally loved by information professionals. You already know this, but aside from the Wayback Machine's valuable research function, the Internet Archive itself is a major time suck. Entertainment value aside, in late October, the Internet Archive announced on its blog that "with generous support from the Laura and John Arnold Foundation," it was planning to build "the Next Generation Wayback Machine".
Hozzáadás dátuma 2021. 08. 09. 8:42:20
Módosítás dátuma 2021. 08. 09. 8:42:20

Címkék:

  • Web archiving
  • Library And Information Sciences–Computer Applica
  • Archives & records
  • Internet
  • Information professionals
  • 5.18:ELECTRONIC MEDIA
  • High density storage

Nuove prospettive per il web archiving: gli standard ISO 28500 (formato WARC) e ISO/TR 14873 sulla qualità del web archiving

Típus Jelentés
Szerző Stefano Allegrezza
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Hely Italy, Europe
Dátum 2015
Egyéb ISSN: 1972-6201
Kivonat Web archiving is a highly topical subject because, as is well known, unless solutions that are effective and sustainable in the long term are found soon, there is a risk of losing forever what has been produced and published on the Web over the last twenty to thirty years, since this material is extremely changeable and dynamic and entire websites often change or disappear within a short time. The solutions proposed so far are partial and have not always achieved their aim. Recently, however, two developments appear to offer better prospects: on the one hand, the proposal of an electronic format designed specifically for archiving the Web (the WARC format); on the other, the publication of a specific ISO standard dedicated to quality in Web preservation (ISO/TR 14873:2013). The topic is so relevant to the cultural heritage sector that it is worth clarifying these issues by analysing both the state of the art and future prospects.
Hozzáadás dátuma 2021. 08. 09. 8:43:00
Módosítás dátuma 2021. 08. 09. 8:43:00

Címkék:

  • web archiving
  • digital preservation
  • WARC
  • archiviazione del web
  • conservazione digitale

Observations on the development of non-print legal deposit in the UK

Típus Folyóiratcikk
Szerző Richard Gibby
Szerző Caroline Brazier
URL https://search.proquest.com/docview/1080973857?accountid=27464
Kötet 61
Szám 5
Oldalszám 362-377
Kiadvány Library Review
ISSN 00242535
Dátum 2012
Egyéb Number: 5
ProQuest document ID: 1080973857
Publisher: Emerald Group Publishing Limited
Place: Bradford
DOI 10.1108/00242531211280487
Nyelv English
Kivonat Purpose – The process of developing and implementing UK legislation for the legal deposit of electronic and other non-print publications has been lengthy and remains incomplete, although the Government has consulted on draft regulations for implementation in 2013. The purpose of this paper is to provide a short account of progress and review the experience, analysing several factors that have influenced the legislative process and helped shape the proposed regulations. It summarises the regulatory and non-regulatory steps taken by the UK legal deposit libraries to address the legitimate concerns of publishers and describes some of the practical implications of implementing legal deposit for non-print publications. Design/methodology/approach – The paper draws upon the personal experiences of the authors, who have been directly involved in the legislative process and negotiations with publishers and other stakeholders. Findings – The paper provides new information and a summary of key issues and outcomes, with explanations and some insights into the factors that have influenced them. Originality/value – This paper provides new information about the development of legal deposit in the UK and a review of the issues that have affected its progress.
Hozzáadás dátuma 2021. 08. 09. 8:41:59
Módosítás dátuma 2021. 08. 09. 8:41:59

Címkék:

  • Library And Information Sciences
  • Archives & records
  • United Kingdom–UK
  • Libraries
  • Metadata
  • Publications

Observing Web Archives: The Case for an Ethnographic Study of Web Archiving

Típus Dolgozat
Szerző Jessica Ogden
Szerző Susan Halford
Szerző Leslie Carr
URL http://doi.acm.org/10.1145/3091478.3091506
Hely New York, NY, USA
Kiadó ACM
Oldalszám 299-308
ISBN 978-1-4503-4896-6
Dátum 2017
Egyéb Series Title: WebSci '17
Citation Key: Ogden:2017:OWA:3091478.3091506
DOI 10.1145/3091478.3091506
Kivonat This paper makes the case for studying the work of web archivists, in an effort to explore the ways in which practitioners shape the preservation and maintenance of the archived Web in its various forms. An ethnographic approach is taken through the use of observation, interviews and documentary sources over the course of several weeks in collaboration with web archivists, engineers and managers at the Internet Archive – a private, non-profit digital library that has been archiving the Web since 1996. The concept of web archival labour is proposed to encompass and highlight the ways in which web archivists (as both networked human and non-human agents) shape and maintain the preserved Web through work that is often embedded in and obscured by the complex technical arrangements of collection and access. As a result, this engagement positions web archives as places of knowledge and cultural production in their own right, revealing new insights into the performative nature of web archiving that have implications for how these data are used and understood.
Kiadvány címe Proceedings of the 2017 ACM on Web Science Conference
Hozzáadás dátuma 2021. 08. 09. 8:43:22
Módosítás dátuma 2021. 08. 09. 8:43:22

Címkék:

  • web archiving
  • information labour
  • knowledge production
  • materiality
  • sts

Offene Archive: Archive, Nutzer und Technologie im Miteinander

Típus Folyóiratcikk
Szerző Bastian Gillner
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Kötet 71
Szám 1
Oldalszám 13-21
Kiadvány Archivar
ISSN 00039500
Dátum 2018-01
Egyéb Number: 1
Translated title: Open Archives: Archives, Users and Technology Interconnected
Kivonat The use of archives in the digital age is still a mostly analogue activity. This is due not only to the fact that digitizing materials is costly and time-consuming, but also to a widespread lack of interest in using the possibilities of the internet for archives' own agenda. For two decades the internet has primarily been a place for archives to present fixed (meta)data about archival materials. The concept of open archives strives to adapt the established use of archives to the realities of the digital age. Its goal is to facilitate open data, a focus on users, and the use of digital tools. Only the interaction of those aspects can show archives how to make cultural heritage available to a large audience in a digital environment and how to make use of it in a variety of ways.
Hozzáadás dátuma 2021. 08. 09. 8:43:24
Módosítás dátuma 2021. 08. 09. 8:43:24

Címkék:

  • Web archives
  • Archives
  • Digital libraries
  • Digitization of archival materials
  • Open data movement
  • Preservation of cultural property

On Automatically Tagging Web Documents from Examples

Típus Dolgozat
Szerző Nicholas Joel Woodward
Szerző Weijia Xu
Szerző Kent Norsworthy
URL http://doi.acm.org/10.1145/2348283.2348494
Hely New York, NY, USA
Kiadó ACM
Oldalszám 1111-1112
ISBN 978-1-4503-1472-5
Dátum 2012
Egyéb Series Title: SIGIR '12
Citation Key: Woodward:2012:ATW:2348283.2348494
DOI 10.1145/2348283.2348494
Kivonat An emerging need in information retrieval is to identify a set of documents conforming to an abstract description. This task presents two major challenges to existing methods of document retrieval and classification. First, similarity based on overall content is less effective because there may be great variance in both content and subject of documents produced for similar functions, e.g. a presidential speech or a government ministry white paper. Second, the function of the document can be defined based on user interests or the specific data set through a set of existing examples, which cannot be described with standard categories. Additionally, the increasing volume and complexity of document collections demands new scalable computational solutions. We conducted a case study using web-archived data from the Latin American Government Documents Archive (LAGDA) to illustrate these problems and challenges. We propose a new hybrid approach based on Naïve Bayes inference that uses mixed n-gram models obtained from a training set to classify documents in the corpus. The approach has been developed to exploit parallel processing for large scale data set. The preliminary work shows promising results with improved accuracy for this type of retrieval problem.
Kiadvány címe Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval
Hozzáadás dátuma 2021. 08. 09. 8:43:36
Módosítás dátuma 2021. 08. 09. 8:43:36

Címkék:

  • web archive
  • naïve bayesian classification

On Identifying the Bounds of an Internet Resource

Típus Dolgozat
Szerző Faryaneh Poursardar
Szerző Frank Shipman
URL http://dl.acm.org/citation.cfm?doid=2854946.2854982
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 305-308
ISBN 978-1-4503-3751-9
Dátum 2016
DOI 10.1145/2854946.2854982
Kivonat Systems for retrieving or archiving Internet resources often assume a URI acts as a delimiter for the resource. But there are many situations where Internet resources do not have a one-to-one mapping with URIs. For URIs that point to the first page of a document that has been broken up over multiple pages, users are likely to consider the whole article as the resource, even though it is spread across multiple URIs. Comments, tags, ratings, and advertising might or might not be perceived as part of the resource whether they are retrieved as part of the primary URI or accessed via a link. Similarly, whether content accessible via links, tabs, or other navigation available at the primary URI is perceived as part of the resource may depend on the design of the website. We are examining what people believe are the bounds of Internet resources with the hope of informing systems that better match user perceptions. To understand this challenge we explore a situation where the user is assumed to have identified a resource by a URI, particularly for archiving. To begin to answer these questions, we asked 110 participants how desirable it would be for web contents related to an identified archived resource to also be archived. Results indicate that the features important to this decision likely vary considerably from resource to resource.
Kiadvány címe Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval – CHIIR '16
Hozzáadás dátuma 2021. 08. 09. 8:41:51
Módosítás dátuma 2021. 08. 09. 8:41:51

On the Applicability of Delicious for Temporal Search on Web Archives

Típus Dolgozat
Szerző Helge Holzmann
Szerző Wolfgang Nejdl
Szerző Avishek Anand
URL http://dl.acm.org/citation.cfm?doid=2911451.2914724
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 929-932
ISBN 978-1-4503-4069-4
Dátum 2016
DOI 10.1145/2911451.2914724
Kiadvány címe Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval – SIGIR '16
Hozzáadás dátuma 2021. 08. 09. 8:41:51
Módosítás dátuma 2021. 08. 09. 8:41:51

Online British Official Publications from the University of Southampton

Típus Folyóiratcikk
Szerző Joy Caisley
Szerző Julian Ball
Szerző Matthew Phillips
URL https://search.proquest.com/docview/1803448852?accountid=27464
Kötet 32
Szám 2
Oldalszám 27-32
Kiadvány Refer
ISSN 01442384
Dátum 2016
Egyéb Number: 2
Publisher: Information Services Group, Chartered Institute of Library and Information Professionals
Place: London
Nyelv English
Kivonat The Library at the University of Southampton has a particularly strong collection of printed British Official Publications, known as the Ford Collection. The collection is named after the late Professor Percy Ford and his wife Dr Grace Ford, who brought the collection to the University of Southampton in the 1950s from the Carlton Club and conducted research based on the collection. Hoping to increase both the appreciation and the use of official publications, the Fords compiled breviates, or select lists, in seven volumes covering the years 1833-1983. These were not catalogues of all British Official Publications. Instead the Fords identified and summarised documents which have been, or might have been, the subject of legislation or have dealt with public policy. Although funding sources were for specific tasks and periods, the Library continues to work unfunded with these valuable digital collections in 2016 to ensure that they are made fully accessible for readers worldwide.
Hozzáadás dátuma 2021. 08. 09. 8:41:42
Módosítás dátuma 2021. 08. 09. 8:41:42

Címkék:

  • Collaboration
  • Web archiving
  • Library And Information Sciences
  • Academic libraries
  • Internet
  • Library collections
  • Metadata
  • Publications
  • 18th century
  • 20th century
  • Bibliographic records
  • Colleges & universities
  • Current awareness services
  • Funding

Only One Out of Five Archived Web Pages Existed as Presented

Típus Dolgozat
Szerző Scott G. Ainsworth
Szerző Michael L. Nelson
Szerző Herbert Van de Sompel
URL http://dl.acm.org/citation.cfm?doid=2700171.2791044
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 257-266
ISBN 978-1-4503-3395-5
Dátum 2015
DOI 10.1145/2700171.2791044
Kivonat When a user retrieves a page from a web archive, the page is marked with the acquisition datetime of the root resource, which effectively asserts "this is how the page looked at that datetime." However, embedded resources, such as images, are often archived at different datetimes than the main page. The presentation appears temporally coherent, but is composed from resources acquired over a wide range of datetimes. We examine the completeness and temporal coherence of composite archived resources (composite mementos) under two selection heuristics. The completeness and temporal coherence achieved using a single archive was compared to the results achieved using multiple archives. We found that at most 38.7% of composite mementos are temporally coherent and that at most only 17.9% (roughly 1 in 5) are temporally coherent and 100% complete. Using multiple archives increases mean completeness by 3.1-4.1% but also reduces temporal coherence.
Kiadvány címe Proceedings of the 26th ACM Conference on Hypertext & Social Media – HT '15
Hozzáadás dátuma 2021. 08. 09. 8:41:50
Módosítás dátuma 2021. 08. 09. 8:41:50

Ontology-Based Automatic Annotation: An Approach for Efficient Retrieval of Semantic Results of Web Documents

Típus Könyvfejezet
Szerző R Lakshmi Tulasi
Szerző Meda Sreenivasa Rao
Szerző K Ankita
Szerző R Hgoudar
Szerkesztő Suresh Chandra Satapathy
Szerkesztő V Kamakshi Prasad
Szerkesztő B Padmaja Rani
Szerkesztő Siba K Udgata
Szerkesztő K Srujan Raju
URL http://link.springer.com/10.1007/978-981-10-2471-9_32
Hely Singapore
Kiadó Springer Singapore
Oldalszám 331-339
ISBN 978-981-10-2471-9
Dátum 2017
Egyéb DOI: 10.1007/978-981-10-2471-9_32
Kivonat The Web contains a large amount of unstructured data, which yields relevant as well as irrelevant results. To remove the irrelevancy in results, a methodology is defined which would retrieve the semantic information. Semantic search directly deals with the knowledge base, which is domain specific. Everyone constructs an ontology knowledge base in their own way, which results in heterogeneity in ontology. The problem of heterogeneity can be resolved by applying the algorithm of ontology mapping. All the documents are collected by a Web crawler from the Web and a document base is created. The documents are then given as an input for performing semantic annotation on the updated ontology. The results for the user's query are retrieved from the semantic information retrieval system after applying a searching algorithm on it. The experiments conducted with this methodology show that the results thus obtained provide more accurate and precise information.
Könyv címe Proceedings of the First International Conference on Computational Intelligence and Informatics
Hozzáadás dátuma 2021. 08. 09. 8:43:27
Módosítás dátuma 2021. 08. 09. 8:43:27

Open Challenges for the Management and Preservation of Evolving Data on the Web

Típus Folyóiratcikk
Szerző Lars Gleim
Szerző Stefan Decker
URL http://ceur-ws.org/Vol-2821/paper9.pdf
Oldalszám 7
Dátum 2020
Könyvtár Katalógus Zotero
Nyelv en
Kivonat As the volume, variety, and velocity of data published on the Web continue to increase, the management, governance and preservation of these data play an increasingly important role. Data-driven decision making and algorithmic control systems rely on the persistent availability of critical information. However, to date, the free sharing, reuse and interoperability of data are hindered by a number of fundamental open challenges for the management and preservation of evolving data on the Web. In this work, we provide an overview of open challenges and recent efforts to address them. We then propose a data persistence layer for data management and preservation, paving the way for increased interoperability and compatibility.
Hozzáadás dátuma 2021. 08. 09. 8:44:05
Módosítás dátuma 2021. 08. 09. 8:44:05

Open data as political web archives : citizen involvement or reputation’s elected in a « digital public sphere » ?

Típus Jelentés
Szerző Mariannig Le Béchec
Szerző Isabelle Hare
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Hely France, Europe
Dátum 2015
Intézmény HAL CCSD
Kivonat The access to digital data is an economic, social and political issue. Accessibility does not only focus on the online publication of these data in a database, but as well on the discourses made by stakeholders on the web, such as the French association Regards citoyens. Since 2009, this group has aggregated data on the activity of French Deputies in the French National Assembly, on the website nosdeputes.fr. In this case, political people allow the circulation of data that are arranged by actors without professional requirements, unlike journalists. We are here interested in the enrichment of public data by citizens who participate in the public sphere in a form that differs from the mass media. We do not want to comment on this public sphere but to describe it from the devices, the mediations that connect institutions and citizens. Therefore, we discuss the opportunity that a website like nosdeputes.fr can become the holder of a "digital public sphere" and interrogate the form of citizen oversight it induces. The frame of data on nosdeputes.fr questions the relationship between citizens, media and elected officials. On the one hand, these devices change the relationship between citizens and political action. On the other hand, we can assume that these devices bring politicians to adapt some of their practices in the French National Assembly according to the electoral agenda. We do not focus on the influence of some actors but on the oversight of citizens induced by this device. For example, nosdeputes.fr has listed activities of the 577 French Deputies since 2009. This survey provides detailed analysis of political activity in the National Assembly but it is also interested in the look of the "citizen", by the comments he leaves on MPs' action.
Hozzáadás dátuma 2021. 08. 09. 8:42:38
Módosítás dátuma 2021. 08. 09. 8:42:38

Címkék:

  • web archives
  • [ SHS.INFO ] Humanities and Social Sciences/Librar
  • digital public sphere
  • Iramuteq
  • open data

Optimizing Positional Index Structures for Versioned Document Collections

Típus Dolgozat
Szerző Jinru He
Szerző Torsten Suel
URL http://doi.acm.org/10.1145/2348283.2348319
Hely New York, NY, USA
Kiadó ACM
Oldalszám 245-254
ISBN 978-1-4503-1472-5
Dátum 2012
Egyéb Series Title: SIGIR '12
Citation Key: He:2012:OPI:2348283.2348319
DOI 10.1145/2348283.2348319
Kivonat Versioned document collections are collections that contain multiple versions of each document. Important examples are Web archives, Wikipedia and other wikis, or source code and documents maintained in revision control systems. Versioned document collections can become very large, due to the need to retain past versions, but there is also a lot of redundancy between versions that can be exploited. Thus, versioned document collections are usually stored using special differential (delta) compression techniques, and a number of researchers have recently studied how to exploit this redundancy to obtain more succinct full-text index structures. In this paper, we study index organization and compression techniques for such versioned full-text index structures. In particular, we focus on the case of positional index structures, while most previous work has focused on the non-positional case. Building on earlier work, we propose a framework for indexing and querying in versioned document collections that integrates non-positional and positional indexes to enable fast top-k query processing. Within this framework, we define and study the problem of minimizing positional index size through optimal substring partitioning. Experiments on Wikipedia and web archive data show that our techniques achieve significant reductions in index size over previous work while supporting very fast query processing.
Kiadvány címe Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval
Hozzáadás dátuma 2021. 08. 09. 8:43:11
Módosítás dátuma 2021. 08. 09. 8:43:11

Címkék:

  • index compression
  • inverted index
  • versioned documents

Out from the PLATO cave: uncovering the pre-Internet history of social computing

Típus Folyóiratcikk
Szerző Steve Jones
Szerző Guillaume Latzko-Toth
URL http://www.tandfonline.com/doi/abs/10.1080/24701475.2017.1307544
Kötet 1
Szám 1-2
Oldalszám 60-69
Kiadvány Internet Histories
ISSN 2470-1475
Dátum 2017-01-02
Egyéb Number: 1-2
Publisher: Routledge
DOI 10.1080/24701475.2017.1307544
Hozzáadás dátuma 2021. 08. 09. 8:41:45
Módosítás dátuma 2021. 08. 09. 8:41:45

Partnerships on Campus: Roles and Impacts on developing a New Online Research Resource at Boston College

Típus Folyóiratcikk
Szerző Kimberly C Kowal
Szerző Seth Meehan
URL https://search.proquest.com/docview/2024455820?accountid=27464
Kötet 88
Szám 3
Oldalszám 177-184
Kiadvány The Catholic Library World
ISSN 0008820X
Dátum 2018-03
Egyéb Number: 3
Publisher: Catholic Library Association
Place: Associate University Librarian Digital Initiatives & Services Boston College Libraries Boston College ; Associate Director Institute for Advanced Jesuit Studies Boston College
Nyelv English
Kivonat A partnership at Boston College (BC) between the Libraries and the Institute for Advanced Jesuit Studies resulted in a blossoming of services and resources, made possible via a combination of discipline-focused scholarship and library digital expertise. With a shared mission, the last three years have produced a number of programs and projects that relied upon a relationship of reciprocation, support, and ultimately the strategic directions guiding this Jesuit university.
Hozzáadás dátuma 2021. 08. 09. 8:42:18
Módosítás dátuma 2021. 08. 09. 8:42:18

Címkék:

  • Collaboration
  • Web archiving
  • Digital libraries
  • Digital preservation
  • Library And Information Sciences
  • Academic libraries
  • Boston Massachusetts
  • Digitization
  • Partnerships
  • Religious missions
  • Religious orders

Passing on the Lessons of the Great East Japan Earthquake to Future Generations—The National Diet Library Great East Japan Earthquake Archive

Típus Dolgozat
Szerző Sachiko INOUE
URL http://library.ifla.org/id/eprint/2217
Hely Kuala Lumpur
Kiadó IFLA
Dátum 2018
Kivonat In the aftermath of the Great East Japan Earthquake, which struck on March 11, 2011, the Japanese government recognized an urgent need to create a national archive of information about this unprecedented natural disaster, so that the learned lessons from this experience would not be lost. Having an obligation as a national library to collect, preserve, and share materials that record all aspects of Japan’s cultural heritage, the National Diet Library (NDL), in cooperation with other Japanese government agencies, has responded to this need by creating a portal site, called HINAGIKU, through which researchers can search and access a wide variety of earthquake archives. In this paper, I will report on our achievements as well as the challenges we face in configuring HINAGIKU to facilitate access to documentation published or archived primarily by the national and municipal government agencies. At present, HINAGIKU enables access to materials documenting both past experience and current disaster prevention planning via an integrated search functionality of multiple digital archives established by municipal governments, academic institutions, the Ministry of Internal Affairs and Communications, and other organizations as well as the NDL. Visitors to HINAGIKU are able to search records stored at the NDL and other institutions, and new knowledge generated from such research can also be integrated into HINAGIKU as new content. Over time, as interest in earthquake-related materials decreases, it becomes imperative that the NDL acquire and preserve these materials before such archives disappear. The NDL also has a role to play in handing down these most valuable records to future generations by managing issues related to copyright, personality rights, and secondary use, thereby making HINAGIKU even more useful.
Kiadvány címe IFLA WLIC 2018 – Kuala Lumpur, Malaysia – Transform Libraries, Transform Societies in Session 233 – Government Information and Official Publications.
Hozzáadás dátuma 2021. 08. 09. 8:43:30
Módosítás dátuma 2021. 08. 09. 8:43:30

Címkék:

  • disaster archive
  • Great East Japan Earthquake Archive
  • metadata
  • portal site
  • rights handling

Performance Measurement and Analysis of Transactional Web Archiving

Típus Szakdolgozat
Szerző Shivam Maharshi
URL http://search.ebscohost.com/login.aspx?authtype=ip,cookie,cpid&custid=s6213251&groupid=main&profile=eds
Dátum 2017
Kivonat Web archiving is necessary to retain the history of the World Wide Web and to study its evolution. It is important for the cultural heritage community. Some organizations are legally obligated to capture and archive Web content. The advent of transactional Web archiving makes the archiving process more efficient, thereby aiding organizations to archive their Web content. This study measures and analyzes the performance of transactional Web archiving systems. To conduct a detailed analysis, we construct a meaningful design space defined by the system specifications that determine the performance of these systems. SiteStory, a state-of-the-art transactional Web archiving system, and local archiving, an alternative archiving technique, are used in this research. We experimentally evaluate the performance of these systems using the Greek version of Wikipedia deployed on dedicated hardware on a private network. Our benchmarking results show that the local archiving technique uses a Web server’s resources more efficiently than SiteStory for one data point in our design space. Better performance than SiteStory in such scenarios makes our archiving solution favorable to use for transactional archiving. We also show that SiteStory does not impose any significant performance overhead on the Web server for the rest of the data points in our design space.
Hozzáadás dátuma 2021. 08. 09. 8:42:05
Módosítás dátuma 2021. 08. 09. 8:42:05

Címkék:

  • Digital Preservation
  • Web Archiving
  • Performance Benchmark

Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations

Típus Folyóiratcikk
Szerző Jonathan Zittrain
Szerző Kendra Albert
Szerző Lawrence Lessig
URL https://search.proquest.com/docview/1535097054?accountid=27464
Kötet 14
Szám 2
Oldalszám 88-99
Kiadvány Legal Information Management
ISSN 14726696
Dátum 2014-06
Egyéb Number: 2
Publisher: Cambridge University Press
Place: Cambridge
DOI 10.1017/S1472669614000255
Nyelv English
Kivonat It has become increasingly common for a reader to follow a URL cited in a court opinion or a law review article, only to be met with an error message because the resource has been moved from its original online address. This form of reference rot, commonly referred to as 'linkrot', has arisen from the disconnect between the transience of online materials and the permanence of legal citation, and will only become more prevalent as scholarly materials move online. The present paper, written by Jonathan Zittrain, Kendra Albert and Lawrence Lessig, explores the pervasiveness of linkrot in academic and legal citations, finding that more than 70% of the URLs within the Harvard Law Review and other journals, and 50% of the URLs within United States Supreme Court opinions, do not link to the originally cited information. In light of these results, a solution is proposed for authors and editors of new scholarship that involves libraries undertaking the distributed, long-term preservation of link contents.
Hozzáadás dátuma 2021. 08. 09. 8:43:04
Módosítás dátuma 2021. 08. 09. 8:43:04

Címkék:

  • web archiving
  • link rot
  • Library And Information Sciences
  • websites
  • legal citations

Perma.cc and Web Archival Dissonance with Copyright Law

Típus Folyóiratcikk
Szerző Paul Douglas Callister
URL https://doi.org/10.1080/0270319X.2021.1886785
Kötet 40
Szám 1
Oldalszám 1-57
Kiadvány Legal Reference Services Quarterly
ISSN 0270-319X
Dátum January 2, 2021
Egyéb Number: 1
Publisher: Routledge
DOI 10.1080/0270319X.2021.1886785
Hozzáférés 2021. 07. 15. 11:20:43
Könyvtár Katalógus Taylor and Francis+NEJM
Kivonat Harvard’s Perma.cc offers the solution to link rot—the phenomenon that citations in academic journals to Web materials disappear with the passage of time, resulting in “broken links” and disappearance of material from the Web. This article will describe Perma.cc and outline the kinds of copyright issues that may arise, including heavy use of copyright statutes and case law. It will examine the kind of preservation use of copyrighted materials, with reference to fair use, and the library prerogatives as exceptions to the exclusive rights of authors of materials found on the Web. This analysis includes detailed analysis of “transformative use” and the four factors of 17 U.S.C. § 107. It will consider the liability of Perma.cc and participating libraries and institutions under theories of contributory infringement and vicarious liability, including as modified by 17 U.S.C. § 512(c) and (d), governing takedown notices. The article concludes that Perma.cc’s archival use is neither firmly grounded in existing fair use nor library exemptions; that Perma.cc, its “registrar” library, institutional affiliates, and its contributors have some (at least theoretical) exposure to risk; and that current copyright doctrines and law do not adequately address Web archival storage for scholarly purposes. In doing so, it will question what the role of the scholarly Perma.cc citation ought to play—confirmation of scholarly propositions or preservation of and access to Web materials. The material and conclusions in this article are important for legal authors, law review editors, and librarians (especially those who use, support, or are considering partnering with Perma.cc) so that they might better assess copyright compliance, especially when selecting materials for archiving, such as articles from news sites, blogs, and professional and scholarly papers, articles, or books.
Hozzáadás dátuma 2021. 08. 09. 8:44:23
Módosítás dátuma 2021. 08. 09. 8:44:23

Címkék:

  • web archiving
  • link rot
  • copyright
  • broken links
  • fair use
  • contributory infringement
  • Contributory infringement (Copyright & trademark)
  • Fair use (Copyright)
  • library exceptions
  • Perma.cc
  • Scholarly periodicals
  • transformative use
  • vicarious liability

Persistent annotations deserve new URIs

Típus Dolgozat
Szerző Abdulla Alasaadi
Szerző Michael L. Nelson
URL http://portal.acm.org/citation.cfm?doid=1998076.1998113
Hely New York, New York, USA
Kiadó ACM Press
Oldalszám 195
ISBN 978-1-4503-0744-4
Dátum 2011
DOI 10.1145/1998076.1998113
Kivonat Some digital libraries support annotations, but sharing these annotations with other systems or across the web is difficult because special applications are needed to read and decode them. Because web resources change frequently, an annotation's meaning can change if the underlying resources change. This project concentrates on minting a new URI for every annotation and creating a persistent and independent archived version of all resources. Users should be able to select a segment of an image or a video to be part of the annotation. The media fragment URIs described in the Open Annotation Collaboration data model can be used, but in practice they have limits, and they lack browser support. So in this project, segments of images and videos can be used in the annotations without using media fragment URIs.
Kiadvány címe Proceeding of the 11th annual international ACM/IEEE joint conference on Digital libraries – JCDL '11
Hozzáadás dátuma 2021. 08. 09. 8:43:23
Módosítás dátuma 2021. 08. 09. 8:43:23

Címkék:

  • Web Archiving
  • URI
  • Reliability
  • Design
  • Annotation
  • Persistence