Thematic and genre-based harvests

The websites that make up the thematic sub-collections are selected by librarians and archived several times a year. The sub-collections typically contain websites and blogs; they exclude social media that cannot be harvested by a robot, as well as online periodicals, which are kept separately. In addition to the websites of institutions, organizations and companies, the sub-collections can also include the pages of professionals and artists working in the given field. A seed list consists of several hundred to several thousand URLs. We continuously update and expand these lists, and add new topics to the archive every few months.
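Expanding a seed list without introducing duplicates can be sketched as follows. This is only an illustration under stated assumptions: the function names `normalize_seed` and `merge_seed_lists`, the normalization rules and the example URLs are all hypothetical and do not describe the archive's actual tooling.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_seed(url: str) -> str:
    """Lower-case scheme and host, use "/" for an empty path, drop the fragment.

    Assumption: these rules are enough to treat trivially different
    spellings of the same seed URL as equal.
    """
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def merge_seed_lists(existing, additions):
    """Return the existing seeds followed by the new ones, without duplicates."""
    seen = set()
    merged = []
    for url in list(existing) + list(additions):
        if not url.strip():
            continue  # skip blank lines in a seed file
        key = normalize_seed(url)
        if key not in seen:
            seen.add(key)
            merged.append(key)
    return merged


# Hypothetical seed URLs, for illustration only.
current = ["https://example.org/", "https://blog.example.net/posts"]
new = ["HTTPS://Example.org", "https://artist.example.com/#about"]
print(merge_seed_lists(current, new))
```

Keeping the original order means that the seeds already in the list stay where they were, and newly selected sites are appended at the end.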

The table below shows the sub-collections of the archive as of early 2020. At the end of the table you can find two collections selected by genre, which contain the websites of electronic periodicals (magazines, news portals, journals, e-periodicals, newsletters, etc.) and Instagram pages. The materials of these selective collections are stored in a closed archive to guarantee long-term preservation and to support future research. Only a small fraction of the selected websites is available through the demo collection: those for which the copyright owner has given us permission for public access.

| Name of the sub-collection | Abbreviation | Metadata | Frequency of harvests | First harvest | Last harvest |
|---|---|---|---|---|---|
| by topic: | | | | | |
| Archives | LEVTAR | MIA_SET-00006.xml | closed | 2017-09-07 | 2020-05-22 |
| Book and other publishers, vendors and resellers | KONYVKIAD | MIA_SET-00024.xml | quarterly | 2019-11-27 | |
| Cultural institutions, community centers, event venues | MUVHAZ | MIA_SET-00016.xml | quarterly | 2019-05-10 | |
| Fine arts, performing arts, music and cinema | MUVESZ | MIA_SET-00015.xml | quarterly | 2019-04-29 | |
| Government, municipalities, political and civil organizations | KORMONKOR | MIA_SET-00010.xml | quarterly | 2018-02-21 | |
| History, local and family history | TORTENELEM | MIA_SET-00030.xml | quarterly | 2021-04-30 | |
| Libraries, archives, museums and galleries | KOZGYUJT | MIA_SET-00040.xml | quarterly | 2018-07-24 | |
| Libraries, library science | KONYVTAR | MIA_SET-00002.xml | closed | 2017-06-23 | 2020-05-22 |
| Literature, literary science and history | IRODALOM | MIA_SET-00013.xml | quarterly | 2018-07-24 | |
| Media, press, broadcasting | MEDIA | MIA_SET-00025.xml | quarterly | 2020-01-23 | |
| Museums, galleries, exhibitions | MUZGAL | MIA_SET-00007.xml | closed | 2017-09-07 | 2020-05-22 |
| Public education and other training | OKTAT | MIA_SET-00028.xml | quarterly | 2020-05-16 | |
| Religions, belief systems, churches | VALLAS | MIA_SET-00020.xml | quarterly | 2019-07-15 | |
| Research institutes, scientific organizations | KUTINT | MIA_SET-00008.xml | quarterly | 2018-01-20 | |
| Sport, physical training | SPORT | MIA_SET-00029.xml | quarterly | 2020-09-18 | |
| Tourism, hospitality industry | TURIZMUS | MIA_SET-00038.xml | quarterly | 2021-01-05 | |
| Universities, colleges | EGYETEM | MIA_SET-00005.xml | quarterly | 2017-07-20 | |
| in preparation: | | | | | |
| Healthcare, social sphere | EGESZSEG | | | | |
| Humanities and social sciences | TARSTUD | | | | |
| Industry, agriculture, labor | IPARMEZ | | | | |
| Lifestyle, leisure, hobby | ELETMOD | | | | |
| Natural and technical sciences | TERMUSZ | | | | |
| Service, trade, transport, marketing | SZOLGKER | | | | |
| by genre: | | | | | |
| Electronic periodicals | ELPERI | MIA_SET-00003.xml | quarterly | 2017-06-23 | |
| Instagram pages | INSTA | in preparation | yearly | 2020-02-01 | |
| Facebook pages | FACEBOOK | in preparation | yearly | 2020-09-21 | |
| Twitter accounts | TWITTER | in preparation | yearly | 2021-04-28 | |
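
For readers who want to process the list above programmatically, the rows could be modelled roughly as follows. This is a minimal sketch: the `SubCollection` class, its field names and the `is_active` rule are assumptions made for illustration, not part of the archive's metadata schema. Only four rows are copied from the list as sample data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SubCollection:
    """One row of the sub-collection list; empty cells become None."""
    name: str
    abbreviation: str
    metadata: Optional[str] = None
    frequency: Optional[str] = None
    first_harvest: Optional[date] = None
    last_harvest: Optional[date] = None

    def is_active(self) -> bool:
        # Assumption: a sub-collection counts as active once it has been
        # harvested at least once and is not marked "closed".
        return self.first_harvest is not None and self.frequency != "closed"


# Sample rows taken from the list above.
rows = [
    SubCollection("Archives", "LEVTAR", "MIA_SET-00006.xml", "closed",
                  date(2017, 9, 7), date(2020, 5, 22)),
    SubCollection("Universities, colleges", "EGYETEM", "MIA_SET-00005.xml",
                  "quarterly", date(2017, 7, 20)),
    SubCollection("Healthcare, social sphere", "EGESZSEG"),
    SubCollection("Instagram pages", "INSTA", "in preparation", "yearly",
                  date(2020, 2, 1)),
]

active = [c.abbreviation for c in rows if c.is_active()]
print(active)
```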