Difference between revisions of "Virginia Tech Events Archive"
Revision as of 18:18, 18 March 2018
The event-based web and Twitter archive of the Virginia Polytechnic Institute and State University (Virginia Tech for short). It collects the online traces of crises, disasters, and tragedies (e.g., school shootings) from around the world. (As of September 2015 it held 12 terabytes of web material and more than 1 billion tweets.) The archive relies partly on the Archive-It service (66 event-based subcollections in early 2018) and partly on its own crawling with Heritrix and other tools (including an in-house focused crawler). It also uses the Twitter analysis and archiving tools DMI-TCAT, yourTwapperKeeper, and Social Feed Manager. The resulting dataset supports research in information retrieval, text mining, network analysis, computational linguistics, machine learning, and visualization, carried out on a Hadoop cluster.
This site covers 4 related projects funded by NSF:
1) Global Event and Trend Archive Research (GETAR) is supported by NSF (IIS-1619028 and 1619371) starting in late 2016. This project will devise interactive, integrated, digital library/archive systems coupled with linked and expert-curated webpage/tweet collections, covering key parts of the 1997-2020 timeframe, supporting research on urgent global challenge events and initiatives. It will allow diverse stakeholder communities to interactively: collect, organize, browse, visualize, study, analyze, summarize, and explore content and sources related to biodiversity, climate change, crises, disasters, elections, energy policy, environmental policy/planning, geospatial information, green engineering, human rights, inequality, migrations, nuclear power, population growth, resiliency, shootings, sustainability, violence, etc. GETAR will leverage VT research on digital libraries, natural language processing, HCI, information retrieval, machine learning, discovery analytics, and Web archiving.
2) Finishing up its 4th year is NSF grant IIS-1319578, III: Small: Integrated Digital Event Archiving and Library (IDEAL). Final Report.
3) Precursor to IDEAL was the Crisis, Tragedy, and Recovery Network (CTRnet), supported by NSF IIS-0916733 from 2009 to 2013.
4) Earliest in this series was SGER: DL-VT416: A Digital Library Testbed for Research Related to 4/16/2007 at Virginia Tech, supported by NSF IIS-0736055.
The Integrated Digital Event Archive and Library (IDEAL) system addresses the need for combining the best of digital library and archive technologies in support of stakeholders who are remembering and/or studying important events. It extends the work at Virginia Tech on the Crisis, Tragedy, and Recovery network (see http://www.ctrnet.net) to handle government and community events, in addition to a range of significant natural or manmade disasters. It addresses needs of those interested in emergency preparedness/response, digital government, and the social sciences. It proves the effectiveness of the 5S (Societies, Scenarios, Spaces, Structures, Streams) approach to intelligent information systems by crawling and archiving events of broad interest. It leverages and extends the capabilities of the Internet Archive to develop spontaneous event collections that can be permanently archived as well as searched and accessed, and of the LucidWorks Big Data software that supports scalable indexing, analyzing, and accessing of very large collections. Through a new model-based approach to intelligent focused crawling, it improves the quality (e.g., accuracy, coverage, and elimination of noise) of collections of webpages so as to ensure comprehensiveness, balance, and low bias, as is needed for scholarly study of historically important events by social scientists. It incorporates a range of visualization capabilities in support of key stakeholder communities, including archivists, librarians, researchers, scholars, and the general public. IDEAL connects the processing of tweets and webpages, combining informal and formal media, to automatically detect important events, as well as to support building collections on chosen general or specific topics. 
It supports integration of multiple types and at multiple levels, including models of the event being crawled (event models), the sources of information about the event (source models), the mechanisms used to disseminate information about the event (publishing venue models), and the entities related to the event (society/organization models). Integrated services include topic identification, categorization (building upon special ontologies being devised), sentiment analysis, and visualization of data, information, and context.
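The idea of focused crawling guided by an event model can be illustrated with a minimal sketch. This is not IDEAL's actual crawler: the `EventModel` class, its term weights, the `focused_crawl` function, the relevance threshold, and the in-memory toy web are all hypothetical, chosen only to show a best-first crawl that keeps on-topic pages and stops following links from off-topic ones.

```python
import heapq
import re
from dataclasses import dataclass


@dataclass
class EventModel:
    """Hypothetical event model: weighted terms that describe the event."""
    terms: dict

    def score(self, text):
        """Average event-term weight per token (0.0 for empty text)."""
        tokens = re.findall(r"[a-z]+", text.lower())
        if not tokens:
            return 0.0
        return sum(self.terms.get(t, 0.0) for t in tokens) / len(tokens)


def focused_crawl(seeds, fetch, model, threshold=0.1, max_pages=100):
    """Best-first crawl that visits the most promising URLs first.

    `fetch(url)` is supplied by the caller and returns (text, outlinks),
    or None when the page cannot be retrieved.
    """
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    collected = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        page = fetch(url)
        if page is None:
            continue
        fetched += 1
        text, outlinks = page
        relevance = model.score(text)
        if relevance < threshold:
            continue  # off-topic: neither keep the page nor follow its links
        collected.append((url, relevance))
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                # A link inherits the relevance of the page that cites it.
                heapq.heappush(frontier, (-relevance, link))
    return collected


# Toy in-memory "web" (assumed data) so the sketch runs without a network.
TOY_WEB = {
    "seed": ("earthquake relief shelter update", ["a", "b"]),
    "a": ("earthquake damage report", ["c"]),
    "b": ("celebrity gossip column", ["d"]),
    "c": ("shelter locations", []),
    "d": ("earthquake rumor", []),
}

model = EventModel({"earthquake": 1.0, "shelter": 0.5, "relief": 0.5})
pages = focused_crawl(["seed"], TOY_WEB.get, model)
print([url for url, _ in pages])
```

In this toy run, page `d` is never fetched because it is reachable only through the off-topic page `b`. That is the trade-off such a crawler makes: less noise in the collection at the risk of missing relevant pages behind irrelevant ones.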
The IDEAL website (http://www.eventsarchive.org) supports searching, browsing, analyzing, and visualizing event collections (of both tweets and webpages), and provides access to project software, methods, findings, publications, and other results. Use of the integrated system and its growing number of collections is encouraged, as is use of particular tools such as the focused crawler, which should help curators avoid non-relevant content while including a broader range of sources, a significant improvement over current crawling and archiving methods. Important data and information on events of interest are saved rather than lost, helping preserve our history and culture, in support of public interest, education, policy making, historical analyses, and comparative studies. Students studying sociology, human-computer interaction, digital libraries, information retrieval, computational linguistics, multimedia, and hypertext are gaining experience and contributing to scholarly studies, algorithms, software, interfaces, and big data handling.