Basic information and data

Aggregate data of the bulk harvests in 2023
    • The web archiving project started at the beginning of 2017 at the National Széchényi Library;
      the test period lasted until 2019.
    • The aim is to preserve and make searchable documents and information sources produced and distributed in digital form.
    • The primary scope of collection is scientific, cultural, educational, and public sphere web content.
    • Types of archiving:
      – periodic harvests of selected Hungarian websites (by theme, genre, institution);
      – event-related harvests (news portal sections, relevant websites and blogs);
      – snapshots of the Hungarian web space (servers under the .hu domain and other Hungarian-related content).
    • Growth and total size of the non-public archive
      (data after compression)
    • The web archive uses free, open-source software.
    • Only a small part of the collection is public, for legal reasons.
    • Statistics at the end of 2023:

Closed archive:
18 thematic sub-collections (e.g. literature, art, culture, religion, higher education, research, government, public collections)
6 sub-collections by genre (e-periodicals, news portals, podcasts, Facebook, Instagram, Twitter)
19 event-based sub-collections (e.g. elections, sport events, war, pandemic, Katalin Karikó)
approx. 80 thousand selected websites saved quarterly with frontpage screenshots
approx. 1.37 million semi-automatically collected sites saved semiannually with frontpage screenshots
approx. 100 terabytes total size

Public archives:
360 selected sites, either licensed or not subject to licensing, saved quarterly
102 NSZL websites saved 1-2 times
2 event-based sub-collections (Rákóczi Memorial Year, Foundation of the library by Ferenc Széchényi)
approx. 1.9 terabytes total size