Harvests of the national web space

In addition to selective (thematic or event-based) harvests, we aim to run snapshot harvests once or twice a year that cover a representatively large part of the Hungarian web space. This means harvesting several hundred thousand websites, starting from each site's home page and going at least two levels deep, while excluding large files to save storage space. The seed URLs are collected from several sources: public lists of URLs in the Hungarian (.hu) domain; links to Hungarian domains and subdomains found during earlier harvests; and website addresses selected for thematic collections or recommended through the corresponding form (these also include addresses outside the .hu domain).
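
The seed-list and scope rules described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the harvester's actual configuration: the function names, the URL normalization, the example hosts and the 100 MiB size cap are all hypothetical, since the text does not say how the source lists are merged or what file-size limit is used.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlsplit

# Hypothetical limits: the text says "at least two levels" and "excluding large files"
# but gives no size threshold, so 100 MiB is an assumed value for illustration.
MAX_DEPTH = 2
MAX_FILE_SIZE = 100 * 1024 * 1024


def normalize(url: str) -> str:
    """Lower-case scheme and host and strip a trailing slash so near-duplicates merge."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # assumption: bare host names from public lists are HTTP
    parts = urlsplit(url)
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}{query}"


def build_seed_list(*sources: Iterable[str]) -> List[str]:
    """Merge seed URLs from several sources, keeping first-seen order without duplicates."""
    seeds = {}  # dicts preserve insertion order (Python 3.7+)
    for source in sources:
        for url in source:
            seeds.setdefault(normalize(url), None)
    return list(seeds)


def in_scope(depth: int, size_bytes: Optional[int]) -> bool:
    """Keep a URL within MAX_DEPTH links of its seed unless it is known to be too large."""
    if depth > MAX_DEPTH:
        return False
    return size_bytes is None or size_bytes <= MAX_FILE_SIZE


# Hypothetical seed sources mirroring the three kinds listed in the text.
public_list = ["http://Example.hu/", "news.example.hu"]
earlier_harvest_links = ["http://example.hu", "http://blog.example.hu/"]
curated = ["https://example.com/page/"]  # addresses outside .hu are allowed

print(build_seed_list(public_list, earlier_harvest_links, curated))
```

Normalizing before deduplication matters because the same site often appears in several sources with different capitalization or a trailing slash; the path itself is not lower-cased, because URL paths can be case-sensitive.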

The table below lists the web-space-level harvests completed so far. The material from these archived collections is stored in a closed archive to guarantee long-term preservation and to support future research.

Start of harvest  End of harvest  Number of seed URLs  Number of downloaded URLs
2021-07-07        2021-07-12                  433,863                 71,878,955
2020-12-30        2021-01-04                  251,230                 47,881,581
2020-06-30        2020-07-05                  269,430                 46,380,598
2019-12-23        2020-01-02                  246,819                110,367,190
2018-09-24        2018-09-28                  291,078                172,639,350
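
The table also makes it possible to derive two simple figures for each harvest: its length in days and the average number of downloaded URLs per seed URL. The short Python sketch below computes them from the rows transcribed from the table; the function name is hypothetical, and counting the duration as end date minus start date (rather than inclusively) is an assumption about how the dates should be read.

```python
from datetime import date

# Rows transcribed from the table: (start, end, seed URLs, downloaded URLs).
HARVESTS = [
    (date(2021, 7, 7), date(2021, 7, 12), 433_863, 71_878_955),
    (date(2020, 12, 30), date(2021, 1, 4), 251_230, 47_881_581),
    (date(2020, 6, 30), date(2020, 7, 5), 269_430, 46_380_598),
    (date(2019, 12, 23), date(2020, 1, 2), 246_819, 110_367_190),
    (date(2018, 9, 24), date(2018, 9, 28), 291_078, 172_639_350),
]


def summarize(rows):
    """Return (start, duration in days, downloaded URLs per seed URL) for each harvest."""
    result = []
    for start, end, seeds, downloaded in rows:
        # Duration is end minus start, i.e. not counting the end day itself.
        result.append((start, (end - start).days, downloaded / seeds))
    return result


for start, days, per_seed in summarize(HARVESTS):
    print(f"{start}  {days} days  {per_seed:.1f} downloaded URLs per seed")
```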