The Daily Sydney


Sydney's duplicate image problem: how the city stacks up against London, Singapore and New York

From Parramatta council chambers to the State Library on Macquarie Street, Sydney's public institutions are grappling with a digital archiving headache that counterparts overseas have been wrestling with for years.

By Sydney News Desk · Published 5 July 2026, 5:06 am


Sydney's major cultural and government institutions are sitting on tens of thousands of duplicate digital images — redundant scans, re-uploaded photographs and mirrored records clogging storage systems and distorting public archives. The problem is not unique to this city, but how local organisations handle it is increasingly drawing comparisons with approaches taken in London, Singapore and New York.

The issue has sharpened in 2026 as NSW government agencies accelerate their digital-first records mandates under the State Archives and Records Authority of New South Wales framework. Every duplicate filed under a permanent retention class costs money to store, complicates Freedom of Information searches, and, for cultural institutions, risks presenting the public with contradictory or mislabelled versions of the same image. For a city midway through infrastructure projects like Metro West, and managing population growth concentrated in Western Sydney corridors, the administrative overhead is not trivial.

What Sydney institutions are actually doing

The City of Sydney Council's digital asset management system, overhauled in stages since 2023, now runs automated deduplication checks against incoming image uploads before they enter the master repository. The council manages visual records for areas stretching from Redfern to Pyrmont, and its library services branch — covering the Customs House Library at Alfred Street as well as neighbourhood branches — had accumulated duplicate scan rates estimated internally at above 30 per cent before the system change, according to publicly available council efficiency reports from 2024.

The State Library of New South Wales on Macquarie Street has taken a different tack, relying on human-reviewed batch processing tied to its Recollect and Libraries Tasmania consortium agreements. Archivists there work through the Picture Australia legacy dataset, a collection that spans colonial-era photographs through to digitised press imagery, identifying and flagging duplicates without deleting originals — a conservative approach driven by the risk of destroying a record that might turn out to be a variant rather than a true duplicate.

Parramatta City Council, managing one of the fastest-growing local government areas in the country, adopted a cloud-based perceptual hashing tool for its planning and heritage photography records in late 2025. Perceptual hashing compares fingerprints derived from what an image looks like rather than from its exact bytes, so it catches re-saved JPEGs and resized versions that traditional checksum tools, which only match byte-identical files, miss. A similar rollout happened at the Museum of Applied Arts and Sciences, which operates the Powerhouse Museum at Ultimo and the Castle Hill site, as part of a broader collection management push.
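For readers curious how a perceptual hash differs from a checksum, the following is a minimal sketch of one common variant, the "average hash", and not a description of the tool any council uses. It assumes images have already been decoded into grids of greyscale values from 0 to 255; the function names and the synthetic test images are illustrative only.

```python
import hashlib


def average_hash(pixels, hash_size=8):
    """Return a 64-bit perceptual hash of a greyscale image.

    `pixels` is a list of rows, each a list of 0-255 values, at least
    `hash_size` pixels wide and high. The image is shrunk to a
    hash_size x hash_size grid by block averaging, then each cell
    becomes one bit: 1 if brighter than the grid's mean, else 0.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(hash_size):
        r0, r1 = r * height // hash_size, (r + 1) * height // hash_size
        for c in range(hash_size):
            c0, c1 = c * width // hash_size, (c + 1) * width // hash_size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def checksum(pixels):
    """Exact-match fingerprint: SHA-256 over the raw pixel bytes."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()


# Synthetic stand-ins for real scans (assumptions for illustration):
# a 32x32 left-to-right gradient, the same picture halved to 16x16,
# and an unrelated top-to-bottom gradient.
original = [[x * 8 for x in range(32)] for _ in range(32)]
resized = [[x * 16 + 4 for x in range(16)] for _ in range(16)]
different = [[y * 8 for _ in range(32)] for y in range(32)]

same_picture = hamming_distance(average_hash(original), average_hash(resized))
other_picture = hamming_distance(average_hash(original), average_hash(different))
```

Here the resized copy has a different checksum from the original but a perceptual hash distance of zero, while the unrelated image sits far away. Real tools typically treat a small distance, rather than an exact match, as a likely duplicate, and the right threshold depends on the collection.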

How that compares with London, Singapore and New York

London's approach has been more centralised. The Greater London Authority and its linked borough councils share a single image deduplication protocol tied to the London Metropolitan Archives at Clerkenwell. The protocol, in place since 2022, runs on open-source software and processed more than 1.2 million records in its first three years of operation, according to figures the LMA published in its 2024-25 annual review.

Singapore's National Archives, housed at the old Hill Street Police Station building, went further, embedding AI-assisted duplicate detection directly into its ingest pipeline in 2023 as part of the National Library Board's digital preservation strategy. New submissions from government agencies are screened before storage is allocated, not after. That upstream approach has cut its reported duplicate rate on new ingests to under 4 per cent, the NLB stated in a 2025 parliamentary submission.

New York City's municipal archive on Chambers Street still leans heavily on manual review for legacy collections, though the New York Public Library's digital collections team has published open-source scripts on GitHub used by peer institutions globally, including at least one NSW regional council.

Sydney sits somewhere in the middle tier globally — ahead of many comparable cities on automation for new records, but carrying a significant legacy backlog. The State Archives authority's next compliance review cycle is due to report before December 2026, and institutions that have not yet benchmarked their duplicate rates against the authority's best-practice guidelines will face pressure to do so. For organisations still running ad-hoc file structures on shared drives — and there are still plenty in the Greater Sydney region — the practical first step is an audit using publicly available perceptual hashing tools before committing to any vendor solution.
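As a rough illustration of what a first-pass audit of a shared drive might look like, the sketch below walks a folder, groups image files by fingerprint and reports a duplicate rate. It is an assumption-laden example, not the authority's method: it defaults to a plain SHA-256 checksum, which finds only byte-identical copies, and the `fingerprint` parameter is a hypothetical hook where a perceptual hashing tool could be substituted. The extension list and function names are likewise illustrative.

```python
import hashlib
import os
from collections import defaultdict

# Assumed set of extensions to audit; adjust for the collection at hand.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def sha256_of_file(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large scans do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_duplicates(root, fingerprint=sha256_of_file):
    """Group image files under `root` by fingerprint.

    Returns (duplicates, rate): `duplicates` maps each fingerprint shared
    by two or more files to their sorted paths, and `rate` is the share
    of audited files that are redundant copies. Nothing is deleted.
    """
    groups = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                path = os.path.join(folder, name)
                groups[fingerprint(path)].append(path)
    total = sum(len(paths) for paths in groups.values())
    redundant = sum(len(paths) - 1 for paths in groups.values())
    rate = redundant / total if total else 0.0
    duplicates = {k: sorted(v) for k, v in groups.items() if len(v) > 1}
    return duplicates, rate
```

Calling `audit_duplicates("/path/to/shared/drive")` returns the duplicate groups and a single rate that can be compared over time. Because the sketch only reports and never deletes, it leaves the decision about variants versus true duplicates to a human reviewer.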


About this article

Published by The Daily Sydney

This article was produced by The Daily Sydney editorial desk and covers news in Sydney. See our editorial standards for how we use AI.
