Basic information and data

    Aggregate data of the bulk harvests in 2021

  • The web archiving project started at the beginning of 2017 at the National Széchényi Library;
    the test period lasted until 2019.
  • The aim is to preserve documents and information sources produced and distributed in digital form, and to make them searchable.
  • The primary scope of collection is scientific, cultural, educational, and public sphere web content.
  • Types of archiving:
    – periodic harvests of selected Hungarian websites (by theme, genre, institution);
    – event-related harvests (news portal sections, relevant websites and blogs);
    – snapshots of the Hungarian web space (servers under the .hu domain and other Hungarian-related content).
  • Growth and total size of the non-public archive
  • The web archive uses free, open-source software.
  • Only a small part of the collection is public, for legal reasons.
  • Statistics at the end of 2021:
  • Closed archive:
    15 thematic sub-collections (e.g. literature, art, culture, religion, higher education, research, government, public collections)
    5 sub-collections by genre (e-periodicals, news portals, Facebook, Instagram, Twitter)
    13 event-based sub-collections (e.g. elections, sport events, pandemic)
    approx. 51,000 selected websites saved quarterly with front-page screenshots
    approx. 446,000 semi-automatically collected sites saved semiannually with front-page screenshots
    approx. 56 terabytes total size

  • Public archive:
    307 selected sites, either licensed or not subject to licensing, saved quarterly
    99 NSZL websites saved 1-2 times
    1 event-based sub-collection (Rákóczi Memorial Year)
    approx. 6 terabytes total size