Harvests of the national web space

In addition to selective (thematic or event-based) harvests, we try to take snapshot harvests of a representatively large part of the Hungarian web space once or twice a year. This means harvesting several hundred thousand websites from the starting page to a depth of at least two levels, excluding large files in order to save storage space. The initial URLs are collected from several sources: public lists of URLs in the Hungarian domain; links containing Hungarian domains and sub-domains that we found during earlier harvests; and website addresses that have been selected for thematic collections or recommended through the corresponding template (these also include addresses outside the .hu domain).
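As a rough illustration only, the seed collection and harvest limits described above could be sketched as follows. This is not the archive's actual tooling: the function names, the host-level deduplication rule, and the 10 MB size cap are all assumptions made for the example, since the text gives no concrete figure for "large" files.

```python
from urllib.parse import urlsplit

# Assumed limits: the text says "at least two levels" and gives no size figure.
MAX_DEPTH = 2
MAX_FILE_BYTES = 10 * 1024 * 1024  # hypothetical cap


def normalize_seed(url):
    """Reduce a URL to scheme + lowercase host, or return None if it has no host."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # bare host names get a default scheme
    parts = urlsplit(url)
    host = (parts.hostname or "").rstrip(".")
    if not host:
        return None
    return f"{parts.scheme}://{host}/"


def merge_seeds(*sources):
    """Merge several seed lists, keeping the first seed seen for each host."""
    seen_hosts = set()
    seeds = []
    for source in sources:
        for url in source:
            seed = normalize_seed(url)
            if seed is None:
                continue
            host = urlsplit(seed).hostname
            if host not in seen_hosts:
                seen_hosts.add(host)
                seeds.append(seed)
    return seeds


def should_fetch(depth, size_bytes):
    """Apply the depth limit and the size cap; an unknown size (None) is allowed."""
    if depth > MAX_DEPTH:
        return False
    return size_bytes is None or size_bytes <= MAX_FILE_BYTES


if __name__ == "__main__":
    public_lists = ["http://Example.hu/page", "example.hu"]
    earlier_links = ["https://sub.example.hu/a"]
    recommended = ["https://example.org/"]  # addresses outside .hu are allowed
    for seed in merge_seeds(public_lists, earlier_links, recommended):
        print(seed)
```

Deduplicating by host keeps one starting page per website regardless of which source supplied it; a real harvest would likely need finer rules (for example, treating `www.` variants as the same site).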

The table below lists the web-space-level harvests completed so far. The material of these archived collections is stored in a closed archive in order to guarantee long-term preservation and to support future research.

Start of harvest   End of harvest   Number of seed URLs   Number of downloaded URLs
09/24/2018         09/28/2018       291 078               172 639 350
12/23/2019         01/02/2020       246 819               110 367 190