Harvests by theme, genre and geographical location

The websites that make up the thematic sub-collections are selected by librarians and archived several times a year. The sub-collections typically contain websites and blogs. They do not contain social media that cannot be harvested by a robot, nor online periodicals, which are kept in a separate collection. In addition to the websites of institutions, organizations and companies, the pages of professionals and artists working in a given field can also be included.

The table below lists the existing and planned thematic sub-collections. At the end we have also included the genre-based selections: the electronic periodicals (magazines, news portals, journals, newsletters, etc.), the collections of Facebook, Instagram and Twitter pages, and podcasts. The last entry is the Transcarpathian sub-collection created in March 2022, which was the first geographically delimited part of the web archive.

A seed list consists of several hundred to several thousand URLs. We continuously update and expand these lists, and add new topics to the archive every few months. Click on the abbreviations in the table to see the seed URLs. The materials of these selective collections are stored in a closed archive to guarantee long-term preservation and to support future research. Only a small fraction of the selected websites is available through the demo collection: those for which the copyright owner has given permission for public access, or for which no individual contract is required.
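As an illustration of how a seed list might be expanded without introducing duplicates, here is a minimal sketch. It assumes a seed list is simply a sequence of URL strings; the function names `normalize_seed` and `merge_seeds` are hypothetical and do not describe the archive's actual tooling.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_seed(url: str) -> str:
    """Lower-case the scheme and host, use "/" for an empty path and drop
    the fragment, so trivially different spellings of a seed compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def merge_seeds(existing: list[str], additions: list[str]) -> list[str]:
    """Return the normalized existing seeds followed by the genuinely new
    ones, preserving the order in which they were first seen."""
    seen = set()
    merged = []
    for url in existing + additions:
        if not url.strip():
            continue  # skip blank lines
        key = normalize_seed(url)
        if key not in seen:
            seen.add(key)
            merged.append(key)
    return merged
```

With this approach, adding `http://example.org/` to a list that already holds `http://Example.org` leaves the list unchanged, because both normalize to the same seed.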


| Name of the sub-collection | Abbreviation | Metadata | Frequency of harvests | First harvest | Last harvest |
|---|---|---|---|---|---|
| by topic: | | | | | |
| Archives | LEVTAR | MIA_SET-00006.xml | closed | 2017-09-07 | 2020-05-22 |
| Book and other publishers, vendors and resellers | KONYVKIAD | MIA_SET-00024.xml | quarterly | 2019-11-27 | |
| Cultural institutions, community centers, event venues | MUVHAZ | MIA_SET-00016.xml | quarterly | 2019-05-10 | |
| Fine arts, performing arts, music and cinema | MUVESZ | MIA_SET-00015.xml | quarterly | 2019-04-29 | |
| Government, municipalities, political and civil organizations | KORMONKOR | MIA_SET-00010.xml | quarterly | 2018-02-21 | |
| History, local and family history | TORTENELEM | MIA_SET-00030.xml | quarterly | 2021-04-30 | |
| Libraries, archives, museums and galleries | KOZGYUJT | MIA_SET-00040.xml | quarterly | 2018-07-24 | |
| Libraries, library science | KONYVTAR | MIA_SET-00002.xml | closed | 2017-06-23 | 2020-05-22 |
| Literature, literary science and history | IRODALOM | MIA_SET-00013.xml | quarterly | 2018-07-24 | |
| Media, press, broadcasting | MEDIA | MIA_SET-00025.xml | quarterly | 2020-01-23 | |
| Museums, galleries, exhibitions | MUZGAL | MIA_SET-00007.xml | closed | 2017-09-07 | 2020-05-22 |
| Natural and technical sciences | TERMUSZ | in preparation | quarterly | 2021-12-17 | |
| Public education and other training | OKTAT | MIA_SET-00028.xml | quarterly | 2020-05-16 | |
| Religions, belief systems, churches | VALLAS | MIA_SET-00020.xml | quarterly | 2019-07-15 | |
| Research institutes, scientific organizations | KUTINT | MIA_SET-00008.xml | quarterly | 2018-01-20 | |
| Sport, physical training | SPORT | MIA_SET-00029.xml | quarterly | 2020-09-18 | |
| Tourism, hospitality industry | TURIZMUS | MIA_SET-00038.xml | quarterly | 2021-01-05 | |
| Universities, colleges | EGYETEM | MIA_SET-00005.xml | quarterly | 2017-07-20 | |
| in preparation: | | | | | |
| Healthcare, social sphere | EGESZSEG | | | | |
| Humanities and social sciences | TARSTUD | | | | |
| Industry, agriculture, labor | IPARMEZ | | | | |
| Lifestyle, leisure, hobby | ELETMOD | | | | |
| Service, trade, transport, marketing | SZOLGKER | | | | |
| by genre: | | | | | |
| Electronic periodicals | ELPERI | MIA_SET-00003.xml | quarterly | 2017-06-23 | |
| Instagram pages | INSTA | in preparation | yearly | 2020-02-01 | |
| Facebook pages | FACEBOOK | in preparation | yearly | 2020-09-21 | |
| Twitter accounts | TWITTER | in preparation | yearly | 2021-04-28 | |
| Podcasts | PODCAST | in preparation | yearly | not harvested yet separately | |
| by geographical location: | | | | | |
| Transcarpathia | KARPATALJA | in preparation | weekly | 2022-03-08 | |
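
For readers who want to work with the table programmatically, the rows can be modelled as simple records. The following is a minimal sketch with a handful of rows copied from the table; the class `SubCollection`, its field names and the helper `active_by_frequency` are illustrative assumptions, not part of the archive's metadata schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubCollection:
    name: str
    abbreviation: str
    frequency: str                       # "quarterly", "yearly", "weekly" or "closed"
    first_harvest: Optional[str] = None  # ISO date as printed in the table
    last_harvest: Optional[str] = None   # given only for closed sub-collections

    @property
    def is_active(self) -> bool:
        """A sub-collection counts as active if it is not closed
        and has been harvested at least once."""
        return self.frequency != "closed" and self.first_harvest is not None


# A few sample rows taken from the table above.
ROWS = [
    SubCollection("Archives", "LEVTAR", "closed", "2017-09-07", "2020-05-22"),
    SubCollection("Universities, colleges", "EGYETEM", "quarterly", "2017-07-20"),
    SubCollection("Instagram pages", "INSTA", "yearly", "2020-02-01"),
    SubCollection("Podcasts", "PODCAST", "yearly"),  # not harvested yet separately
    SubCollection("Transcarpathia", "KARPATALJA", "weekly", "2022-03-08"),
]


def active_by_frequency(rows):
    """Group the abbreviations of active sub-collections by harvest frequency."""
    groups = {}
    for row in rows:
        if row.is_active:
            groups.setdefault(row.frequency, []).append(row.abbreviation)
    return groups
```

Calling `active_by_frequency(ROWS)` groups the sample rows into quarterly, yearly and weekly harvests, leaving out the closed Archives sub-collection and the podcasts that have not yet been harvested separately.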