Harvests of the national web space

In addition to selective (thematic or event-based) harvests, we try to make snapshot harvests of a representatively large part of the Hungarian web space once or twice a year. This means harvesting several hundred thousand websites from the starting page to a depth of at least two levels, excluding large files in order to save storage space. The initial URLs are collected from several sources: public lists of URLs in the Hungarian domain, links to Hungarian domains and sub-domains found in earlier harvests, the .hu "zonefile" from the Internet Archive, and website addresses that have been selected for thematic collections or recommended through the corresponding template (these also include addresses outside the .hu domain).
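
The seed collection and scoping described above can be sketched roughly as follows. This is only an illustration, not the configuration of the actual harvester: the function names, the example URLs, and the 50 MB size cap are hypothetical, and the depth limit of 2 is simply the minimum mentioned in the text.

```python
from urllib.parse import urlsplit

MAX_DEPTH = 2                      # assumption: the text says "at least two levels"
MAX_FILE_BYTES = 50 * 1024 * 1024  # hypothetical cap; the real size limit is not stated


def normalize_seed(url):
    """Reduce a URL to scheme://host/ so that each website yields one seed.

    Ports and paths are dropped, which is a simplification for this sketch.
    """
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    return f"{parts.scheme}://{parts.hostname}/"


def merge_seed_lists(*sources):
    """Merge several URL lists into one de-duplicated seed list, keeping first-seen order."""
    seen = set()
    merged = []
    for source in sources:
        for url in source:
            seed = normalize_seed(url)
            if seed and seed not in seen:
                seen.add(seed)
                merged.append(seed)
    return merged


def in_scope(depth, content_length):
    """Decide whether a discovered URL should be fetched.

    depth is the number of link hops from the seed's starting page;
    content_length is the reported size in bytes, or None if unknown.
    """
    if depth > MAX_DEPTH:
        return False
    if content_length is not None and content_length > MAX_FILE_BYTES:
        return False
    return True


if __name__ == "__main__":
    public_lists = ["http://Example.hu/page", "http://example.hu/"]
    earlier_harvest_links = ["https://example.org/x"]  # seeds may lie outside .hu
    print(merge_seed_lists(public_lists, earlier_harvest_links))
    print(in_scope(depth=2, content_length=None))
```

Normalizing every address to its starting page before de-duplication reflects the idea that each website is harvested from its starting page, whichever source list it came from.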

The table below lists the web-space-level harvests completed so far. The material of these archived collections is stored in a closed archive in order to guarantee long-term preservation and to support future research.

Start of harvest    End of harvest    Number of seed URLs    Number of downloaded URLs
2022-12-02          2022-12-20        1 371 617              158 416 570
2022-06-24          2022-07-20        1 371 617              174 282 398
2021-12-26          2022-01-03        433 863                69 356 724
2021-07-07          2021-07-12        433 863                71 878 955
2020-12-30          2021-01-04        251 230                47 881 581
2020-06-30          2020-07-05        269 430                46 380 598
2019-12-23          2020-01-02        246 819                110 367 190
2018-09-24          2018-09-28        291 078                172 639 350
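
To compare harvests of different sizes, one can divide the number of downloaded URLs by the number of seed URLs. The short sketch below does this for the figures in the table. The ratio is only an illustrative measure introduced here, not a statistic reported by the archive, and the names `HARVESTS` and `urls_per_seed` are hypothetical.

```python
# (start of harvest, number of seed URLs, number of downloaded URLs), copied from the table
HARVESTS = [
    ("2022-12-02", 1_371_617, 158_416_570),
    ("2022-06-24", 1_371_617, 174_282_398),
    ("2021-12-26", 433_863, 69_356_724),
    ("2021-07-07", 433_863, 71_878_955),
    ("2020-12-30", 251_230, 47_881_581),
    ("2020-06-30", 269_430, 46_380_598),
    ("2019-12-23", 246_819, 110_367_190),
    ("2018-09-24", 291_078, 172_639_350),
]


def urls_per_seed(seeds, downloaded):
    """Average number of downloaded URLs per seed URL."""
    return downloaded / seeds


if __name__ == "__main__":
    for start, seeds, downloaded in HARVESTS:
        ratio = urls_per_seed(seeds, downloaded)
        print(f"{start}: {ratio:.1f} downloaded URLs per seed")
```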