Sydney's public institutions are sitting on hundreds of thousands of duplicate digital images — redundant scans, re-uploaded heritage photographs and repeated planning documents — that are quietly draining storage budgets and slowing access to records that residents, researchers and developers actually need. The problem has become acute enough that at least two major Sydney organisations have begun dedicated deduplication programs in 2026, raising questions about whether the city is acting fast enough compared to peers in London, Amsterdam and Singapore.
The pressure point is timing. Sydney's Metropolitan Local Aboriginal Land Council, the City of Sydney Council's digital archive division and the State Library of NSW have all expanded their digitisation programs over the past three years, pulling in material from suburban libraries, demolished buildings and community groups. More digitisation means more duplication, and without automated detection tools embedded at the point of ingestion, the problem compounds with every new batch of material. The City of Sydney's archive, based at the Customs House precinct on Alfred Street in the CBD, reportedly holds millions of image files across multiple storage environments, though the council has not publicly released a breakdown of what proportion are duplicates.
What Sydney Is Doing — and What It Isn't
The State Library of NSW, on Macquarie Street, has been running a quiet internal audit of its digital collections since at least late 2025, according to publicly available tender documents on the NSW Government eTendering portal. A contract published in March 2026 sought software capable of perceptual hashing — a technique that identifies visually similar images even when file names or metadata differ. That is a meaningful step. But the contract covered only the Library's photographic holdings, leaving the broader question of how council and university repositories handle duplicates largely unanswered.
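The tender describes the requirement only as perceptual hashing and does not name an algorithm. As a rough illustration of the idea, the sketch below implements average hashing (aHash), one of the simplest perceptual hashes: shrink an image to an 8×8 grid, mark each cell as brighter or darker than the grid's mean, and compare two images by counting how many of those 64 bits differ (the Hamming distance). It assumes images have already been decoded into grayscale pixel grids at least 8 pixels on each side. The function names and the five-bit threshold are hypothetical choices for this example, not anything specified by the Library.

```python
from typing import List

# A grayscale image as rows of 0-255 pixel values (assumed already decoded).
# Illustrative aHash sketch; not the Library's tendered software.
Grid = List[List[int]]


def downsample(pixels: Grid, size: int = 8) -> List[List[float]]:
    """Shrink an image to size x size cells by averaging each rectangular block.

    Assumes the image is at least `size` pixels in each dimension.
    """
    h, w = len(pixels), len(pixels[0])
    cells = []
    for cy in range(size):
        y0, y1 = cy * h // size, (cy + 1) * h // size
        row = []
        for cx in range(size):
            x0, x1 = cx * w // size, (cx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        cells.append(row)
    return cells


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Average hash: one bit per cell, set when the cell is brighter than the mean."""
    flat = [v for row in downsample(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def likely_duplicates(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Hypothetical rule: treat images as near-duplicates if at most `threshold` bits differ."""
    return hamming(average_hash(a), average_hash(b)) <= threshold


if __name__ == "__main__":
    scan = [[x * 3 for x in range(64)] for _ in range(64)]              # left-to-right gradient
    brighter = [[v + 10 for v in row] for row in scan]                  # same scan, lighter exposure
    rescan = [[(x // 2) * 3 for x in range(128)] for _ in range(128)]   # same scan at double size
    mirrored = [row[::-1] for row in scan]                              # genuinely different image
    print(likely_duplicates(scan, brighter))  # True
    print(likely_duplicates(scan, rescan))    # True
    print(likely_duplicates(scan, mirrored))  # False
```

Because the fingerprint is built from relative brightness rather than file bytes or metadata, a uniformly lightened copy or a larger rescan of the same image hashes identically here, while a mirrored image differs in every bit. More robust variants, such as the DCT-based pHash, are common in practice, but the matching principle is the same.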
Western Sydney University's Parramatta campus library, which holds significant community heritage collections from the Hawkesbury and Penrith regions, does not appear to have a publicly described deduplication program. The university has not responded to questions from The Daily Sydney.
Compare that with Amsterdam's Rijksmuseum, which completed a full deduplication sweep of its 700,000-image Rijksstudio collection in 2024, cutting redundant files by an estimated 18 percent. The British Library in London embedded automated duplicate detection into its digitisation pipeline in 2023 as part of a broader infrastructure overhaul. Singapore's National Library Board went further still, integrating AI-assisted deduplication across all thirteen of its digital repositories by mid-2025 — a project that officials there described publicly as a cost-avoidance measure worth millions of Singapore dollars annually.
The Cost of Doing Nothing
Cloud storage is not free. Amazon Web Services and Microsoft Azure, the two platforms most commonly used by NSW government agencies, charge in the range of $25 to $35 per terabyte per month for standard storage tiers, depending on configuration and data transfer volumes. For a mid-sized council archive holding, say, 50 terabytes of image data where 15 to 20 percent is duplicated, that is 7.5 to 10 terabytes of redundant files costing roughly $2,250 to $4,200 a year to store. Across a city with thirty-three local government areas, the aggregate figure becomes meaningful.
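The arithmetic behind that estimate is simple enough to reproduce. The sketch below multiplies the duplicated share of a hypothetical 50-terabyte archive by the quoted monthly price per terabyte and by twelve months, taking the low end of both ranges for the lower bound and the high end of both for the upper bound. It deliberately ignores request and transfer charges, tiered or discounted pricing and backup copies, so it should be read as an order-of-magnitude check rather than a billing model; the names are illustrative.

```python
# Inputs taken from the scenario in the text; all figures are illustrative.
ARCHIVE_TB = 50                      # hypothetical mid-sized council archive
DUPLICATE_SHARE = (0.15, 0.20)       # 15 to 20 percent duplicated
PRICE_PER_TB_MONTH = (25.0, 35.0)    # quoted standard-tier range, per TB per month


def annual_duplicate_cost(total_tb: float, duplicate_share: float,
                          price_per_tb_month: float) -> float:
    """Yearly cost of storing only the redundant fraction of an archive."""
    duplicated_tb = total_tb * duplicate_share
    return duplicated_tb * price_per_tb_month * 12


def cost_range(total_tb: float) -> tuple:
    """Lowest and highest annual waste across the quoted share and price ranges."""
    low = annual_duplicate_cost(total_tb, DUPLICATE_SHARE[0], PRICE_PER_TB_MONTH[0])
    high = annual_duplicate_cost(total_tb, DUPLICATE_SHARE[1], PRICE_PER_TB_MONTH[1])
    return low, high


if __name__ == "__main__":
    low, high = cost_range(ARCHIVE_TB)
    print(f"Annual duplicate storage cost: ${low:,.0f} to ${high:,.0f}")
    # prints "Annual duplicate storage cost: $2,250 to $4,200"
```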
Beyond cost, there is a practical consequence for users. Researchers at institutions such as the Powerhouse Museum in Ultimo, which is in the middle of its own controversial expansion to a new site in Parramatta, have noted informally that search results across digitised heritage collections are cluttered with near-identical images, making genuine research slower and less reliable. The museum's digital team did not provide a formal comment.
For Sydney to close the gap on Amsterdam, London and Singapore, the most obvious path is coordinated policy rather than institution-by-institution tinkering. The NSW Department of Communities and Justice oversees record-keeping standards under the State Records Act 1998, and a revision of digital storage guidelines — last meaningfully updated before cloud storage became the norm — would give agencies a mandate to act together. Institutions waiting for that guidance should at least begin internal audits now. The cost of cataloguing what you have is almost always lower than the cost of storing what you don't need.