In addition to selective (thematic or event-based) harvests, we try to run snapshot harvests once or twice a year covering a representatively large part of the Hungarian web space. This means harvesting several hundred thousand websites from the start page to a depth of at least two levels, excluding large files in order to save storage space. The initial URLs are collected from several sources: public lists of URLs in the Hungarian domain; links to Hungarian domains and sub-domains found during earlier harvests; the .hu “zonefile” from the Internet Archive; and website addresses that have been selected for thematic collections or recommended via the corresponding template (these also include addresses outside the .hu domain).
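As a rough illustration of the seed collection and scoping described above, the following sketch merges several seed lists, keeps one URL per host, and applies a depth and file-size check. It is not the archive's actual tooling: the function names, the per-host deduplication rule, the depth limit of 2, and the 10 MB size cap are all assumptions made for the example (the text gives no size threshold).

```python
from urllib.parse import urlsplit

MAX_DEPTH = 2                      # assumed limit; the text says "at least two levels"
MAX_FILE_BYTES = 10 * 1024 * 1024  # hypothetical size cap; the source gives no figure


def seed_host(url):
    """Return the lowercased host of a seed URL, or None if it has none."""
    url = url.strip()
    if not url:
        return None
    if "://" not in url:
        # Bare host names such as "example.hu" get a default scheme for parsing.
        url = "http://" + url
    return urlsplit(url).hostname or None


def merge_seeds(*sources):
    """Merge several seed lists, keeping the first URL seen for each host."""
    seen = {}
    for source in sources:
        for raw in source:
            host = seed_host(raw)
            if host and host not in seen:
                seen[host] = raw.strip()
    return list(seen.values())


def in_scope(depth, size_bytes):
    """True if a resource is within the assumed depth and size limits."""
    return depth <= MAX_DEPTH and size_bytes <= MAX_FILE_BYTES
```

For instance, `merge_seeds(public_list, earlier_links, zonefile_hosts, thematic_seeds)` would yield one starting URL per distinct host across all four kinds of source; the four list names here are placeholders.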
The table below lists the web-space-level harvests completed so far. The material of these archived collections is stored in a closed archive in order to guarantee long-term preservation and to support future research.
Start of harvest | End of harvest | Number of seed URLs | Number of downloaded URLs |
2024-06-24 | 2024-07-23 | 865 982 | 129 728 757 |
2024-01-11 | 2024-02-03 | 1 371 617 | 138 409 426 |
2023-10-04 | 2023-10-31 | 992 303 | 126 850 047 |
2022-12-02 | 2022-12-20 | 1 371 617 | 158 416 570 |
2022-06-24 | 2022-07-20 | 1 371 617 | 174 282 398 |
2021-12-26 | 2022-01-03 | 433 863 | 69 356 724 |
2021-07-07 | 2021-07-12 | 433 863 | 71 878 955 |
2020-12-30 | 2021-01-04 | 251 230 | 47 881 581 |
2020-06-30 | 2020-07-05 | 269 430 | 46 380 598 |
2019-12-23 | 2020-01-02 | 246 819 | 110 367 190 |
2018-09-24 | 2018-09-28 | 291 078 | 172 639 350 |
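To compare harvests of different sizes, it can help to look at how many URLs were downloaded per seed. The sketch below parses a row in the pipe-separated layout used above and computes that ratio; the helper names are hypothetical, and the ratio is only a rough indicator, not a measure reported by the archive.

```python
def parse_row(line):
    """Split one 'start | end | seeds | downloaded |' row into typed fields."""
    cells = [c.strip() for c in line.split("|") if c.strip()]
    start, end, seeds, downloaded = cells

    def to_int(text):
        # Numbers in the table use spaces as thousands separators.
        return int("".join(text.split()))

    return start, end, to_int(seeds), to_int(downloaded)


def urls_per_seed(line):
    """Return the number of downloaded URLs per seed URL for one table row."""
    _, _, seeds, downloaded = parse_row(line)
    return downloaded / seeds


if __name__ == "__main__":
    rows = [
        "2024-06-24 | 2024-07-23 | 865 982 | 129 728 757 |",
        "2018-09-24 | 2018-09-28 | 291 078 | 172 639 350 |",
    ]
    for row in rows:
        start = parse_row(row)[0]
        print(start, round(urls_per_seed(row), 1))
```

On the two rows shown, the 2018 harvest collected far more URLs per seed than the 2024 one, even though the latter started from roughly three times as many seeds.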