What does it take to produce a national geocoded address database?



In 2014, the Australian and New Zealand Land Information Council (ANZLIC) commissioned the Cooperative Research Centre for Spatial Information (CRC SI) to undertake an independent review of the supply chain for geocoded addressing in Australia. The resulting report, Optimising the Supply Chain for Geocoded Addressing in Australia*, sets out the following key findings:

  • There is considerable variation in the understanding of addressing terms and definitions throughout the supply chain. This leads to inconsistencies in how addresses are created, interpreted and managed across the state jurisdictions, as well as variability between local government authorities and their respective state jurisdictions.
  • Local government, and hence the land valuation system, is interested in the “property” address, whereas surveyors and the land titling system identify an associated address through the “parcel”. This dichotomy of address references causes most of the problems in resolving the correct or most likely address in the aggregation processes at state and national levels.
  • There is no legislation in place to control address creation through the naming and numbering of streets and properties in private estates and complexes such as retirement villages, universities and hospitals.
  • The supply chain is complex, non-linear and in many aspects convoluted.
  • Addresses have no authorised owner. PSMA develops a comprehensive, national geocoded address dataset yet has no control over the addresses themselves, which could be changed or misinterpreted by a new property owner without notice. The G-NAF address set it creates could be seen as “optimal”, but it is not “authoritative”, as there is no authority that is ultimately accountable.

Together, these findings describe a very complex system. The following diagram from the report shows how address data flows between the various stakeholders in the supply chain:

Geocoded Address Supply Chain

Source: CRC SI, Optimising the Supply Chain for Geocoded Addressing in Australia, page 73

Participants in the supply chain collect address information for their own purposes, which may not align with, or be consistent with, the needs of a national geocoded address dataset.

Recognising the value of such a national asset, the governments of Australia opted to use PSMA as the vehicle for the collaboration needed to construct a comprehensive database of addressing knowledge that connects the address officially recognised by government, the commonly used address adopted by citizens, and the precise latitude and longitude of the geocode.

The G-NAF production process

Why not simplify the process?

The distinctive feature of G-NAF is that it is a knowledge base of all addresses in use, not simply a list of official addresses. For this reason, the G-NAF production process makes no assumptions about the quality of the input addresses; it independently examines and validates every candidate address.

This approach captures and stores the variations between the addresses people actually use and the official principal addresses, giving a much richer knowledge base for address validation and geocoding. The G-NAF production process also allows geocodes to be validated or allocated.
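One way to picture such a knowledge base is as a set of principal addresses, each carrying the alternative forms people actually use, plus an index that resolves any known form back to its principal record. The following is a minimal sketch under that assumption; the class and method names (`AddressKnowledgeBase`, `add_alias`, `resolve`) and the simple case-and-whitespace normalisation are hypothetical illustrations, not G-NAF's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def normalise(text: str) -> str:
    """Fold case and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(text.upper().split())


@dataclass
class AddressRecord:
    """A principal (official) address plus the alternative forms seen in use."""
    address_id: str
    principal: str
    aliases: set[str] = field(default_factory=set)


class AddressKnowledgeBase:
    """Hypothetical store that resolves official or colloquial forms to one record."""

    def __init__(self) -> None:
        self._records: dict[str, AddressRecord] = {}
        self._index: dict[str, str] = {}  # normalised form -> address_id

    def add_principal(self, address_id: str, principal: str) -> None:
        self._records[address_id] = AddressRecord(address_id, principal)
        self._index[normalise(principal)] = address_id

    def add_alias(self, address_id: str, alias: str) -> None:
        # Keep the alias exactly as supplied, but index it by its normalised form.
        self._records[address_id].aliases.add(alias)
        self._index[normalise(alias)] = address_id

    def resolve(self, text: str) -> AddressRecord | None:
        address_id = self._index.get(normalise(text))
        if address_id is None:
            return None
        return self._records[address_id]
```

Usage: after `kb.add_principal("A1", "10 Example Street, Sampletown")` and `kb.add_alias("A1", "10 Example St Sampletown")`, calling `kb.resolve("10 example st  sampletown")` returns the record for `A1`. Note that "Example St" only resolves because it was stored as an alias; a real matcher would need much richer normalisation (street-type abbreviations, unit and flat notation) than the case and whitespace folding shown here.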

In the ten years that we’ve been producing G-NAF, PSMA has been able to condense hundreds of millions of checks and updates into a 13-week process.

It is possible to simplify the process further by reducing the number of tasks we perform, and thereby the richness of the data in G-NAF; however, market feedback tells us that it is exactly this complexity that delivers value not obtainable from a regular address list. So, for now, quarterly updates strike the best balance between currency and accuracy.

Addresses change

One thing that we can be certain of is that addresses are created and changed – a lot.

Events that can lead to changes include:

  • Governments gazette a new or changed suburb.
  • Land use changes through new developments, urban infill, new commercial/industrial zones, land consolidations and subdivisions.
  • Roads are created or changed.
  • The quality of the representation of suburbs, roads or cadastres changes.

Any of these events can lead to changes in addresses and their (spatial) representations.

This is why the process of maintaining G-NAF is an ongoing and iterative process that provides considerable value to the user. PSMA is continually testing incoming data to deliver a G-NAF in which users can have confidence.

From 30 million to 13 million

Every quarter, over 30 million addresses are supplied to PSMA by our contributors. In the November 2015 release, contributor data was supplied to PSMA in:

  • 343 files
  • 7,248 fields
  • 149,416,889 records
  • 24,822 MB of data
  • multiple formats and data models

G-NAF contains approximately 13.5 million addresses.

The process to distil these addresses from the supplied data consists of two broad stages, executed over a 13-week period. The stages are:

  1. Contributor and reference data processing
  2. Validation of G-NAF data

Contributor and reference data processing

Addresses are a descriptive spatial reference, which means they can be tested and validated against other spatial data.

All contributing datasets are deemed to have equal authoritative status and are weighted equally. Address usage can then be assessed from the number of different datasets in which an address occurs.
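The equal-weighting rule can be sketched in a few lines: count how many distinct contributor datasets contain each (normalised) address, so that every source adds the same weight. The dataset names and the normalisation function below are hypothetical; this illustrates the idea and is not PSMA's implementation.

```python
from __future__ import annotations

from collections import defaultdict


def normalise(text: str) -> str:
    """Fold case and collapse whitespace (a deliberately simple stand-in)."""
    return " ".join(text.upper().split())


def usage_counts(datasets: dict[str, list[str]]) -> dict[str, int]:
    """Count the distinct contributor datasets in which each address occurs.

    Every dataset carries equal weight: an address found in three sources
    scores 3, whichever three sources they are.
    """
    seen_in: dict[str, set[str]] = defaultdict(set)
    for source, addresses in datasets.items():
        for address in addresses:
            seen_in[normalise(address)].add(source)
    return {address: len(sources) for address, sources in seen_in.items()}
```

Counting distinct sources rather than raw rows means a contributor that supplies the same address twice does not inflate that address's score.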

This analysis and comparison is conducted within a geospatial environment so that each component of each address can be matched and validated against the geospatial region to which it relates: state, locality, street and finally, property.
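Matching a geocode against successively narrower regions can be illustrated with a point-in-polygon test. The sketch below uses the standard even-odd ray-casting algorithm and checks a point against a hypothetical hierarchy of polygons, broadest first, reporting the first level that does not contain it. It assumes simple polygons and treats coordinates as planar, which is acceptable only for a small illustration. The street level is left out because roads are lines rather than regions. A real pipeline would use a GIS library and projected coordinates.

```python
from __future__ import annotations

Point = tuple[float, float]  # (x, y), e.g. (longitude, latitude)
Polygon = list[Point]        # simple polygon, vertices in order, not repeated at the end


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Even-odd ray casting: count edges crossed by a ray heading in the +x direction."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value, so y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def first_failed_level(pt: Point, hierarchy: list[tuple[str, Polygon]]) -> str | None:
    """Check the point against each region from broadest to narrowest.

    Returns the name of the first region that does not contain the point,
    or None when every level agrees.
    """
    for level, polygon in hierarchy:
        if not point_in_polygon(pt, polygon):
            return level
    return None
```

Checking broadest first means a failure is reported at the coarsest level that disagrees, which is usually the most informative place to start investigating a suspect geocode.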

G-NAF processing

The second stage is about validation. In this stage, each component of every address is validated against the address supplied by the custodian (jurisdiction), as well as against our reference datasets: PSMA’s roads, land parcels, and suburbs and localities datasets. Importantly, validation is not limited to individual address components; the geospatial validation extends to combinations of components. For instance, does this road name with this street type appear in this locality in this state?
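The difference between validating components and validating combinations can be sketched with a set of reference tuples standing in for a roads dataset that records which road name and type occur in which locality and state. The `StreetKey` structure, the `validate_street` function and the sample values are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class StreetKey(NamedTuple):
    state: str
    locality: str
    street_name: str
    street_type: str


def validate_street(candidate: StreetKey, reference: set[StreetKey]) -> list[str]:
    """Return a list of problems; an empty list means the candidate passed."""
    problems: list[str] = []
    # Individual components: does each value exist anywhere in the reference data?
    for position, component in enumerate(StreetKey._fields):
        known = {key[position] for key in reference}
        if candidate[position] not in known:
            problems.append(f"unknown {component}: {candidate[position]!r}")
    # Combination: does this exact name and type occur in this locality in this state?
    if not problems and candidate not in reference:
        problems.append("components are valid individually but not in this combination")
    return problems
```

For example, if the reference data holds `StreetKey("VIC", "SAMPLETOWN", "EXAMPLE", "STREET")` and `StreetKey("VIC", "OTHERTON", "EXAMPLE", "ROAD")`, the candidate `StreetKey("VIC", "SAMPLETOWN", "EXAMPLE", "ROAD")` passes every individual check but fails the combination check, because "EXAMPLE ROAD" is only recorded in a different locality.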

Through this process:

  • Geocodes are also confirmed, and in some cases assigned. Attributes are added indicating the geocode quality.
  • Addresses that contain invalid data are corrected, but all of the original information is retained in G-NAF to aid matching.
  • Multiple sources of addresses provide redundancy and confidence, and enable the capture of alias and vanity address information.
  • Textual matching of addresses only occurs after the address has been spatially validated.
  • Addresses from different sources found to be identical are merged into a single G-NAF record, with feature-level metadata capturing its lineage and quality.
  • Addresses are also assigned a value indicating the degree of matching between contributors.
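Several of these steps (spatial validation before textual matching, merging identical addresses, keeping lineage and a contributor-agreement value) can be combined in one illustrative sketch. The record fields and the agreement measure, defined here as the share of all contributors that supplied the address, are assumptions chosen for illustration; the source does not say how PSMA computes its matching value.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    source: str            # contributor that supplied the record
    text: str              # address as supplied
    spatially_valid: bool  # outcome of the earlier spatial checks


@dataclass
class MergedAddress:
    key: str
    sources: list[str] = field(default_factory=list)  # lineage, in order first seen
    variants: set[str] = field(default_factory=set)   # forms as supplied
    agreement: float = 0.0  # hypothetical: share of contributors supplying this address


def merge(candidates: list[Candidate]) -> dict[str, MergedAddress]:
    contributors = {c.source for c in candidates}
    merged: dict[str, MergedAddress] = {}
    for c in candidates:
        if not c.spatially_valid:
            continue  # textual matching happens only after spatial validation
        key = " ".join(c.text.upper().split())  # simple stand-in normalisation
        record = merged.setdefault(key, MergedAddress(key))
        if c.source not in record.sources:
            record.sources.append(c.source)
        record.variants.add(c.text)
    for record in merged.values():
        record.agreement = len(record.sources) / len(contributors)
    return merged
```

Keeping every supplied variant alongside the merged key mirrors the point above that corrected addresses still retain their original information for later matching.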

An iterative process

The G-NAF production process was founded on a continuous improvement model. Each release is enhanced and modified based on the outputs of previous iterations, and each iteration removes further anomalies and discrepancies from the candidate data. Now that G-NAF is in its maintenance phase, each iteration constitutes an update while also delivering incremental improvements drawn from everything learned in previous iterations.
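Because each iteration is also an update, one useful output is the difference between consecutive releases. A minimal sketch, assuming every address carries an identifier that persists from one release to the next; the identifiers and the use of address text as the compared value are assumptions for illustration.

```python
from __future__ import annotations


def release_diff(previous: dict[str, str], current: dict[str, str]) -> dict[str, set[str]]:
    """Compare two releases keyed by a (hypothetical) persistent address identifier.

    Values are the address text, or any fingerprint of the attributes being tracked.
    """
    prev_ids, curr_ids = previous.keys(), current.keys()
    return {
        "added": set(curr_ids - prev_ids),
        "retired": set(prev_ids - curr_ids),
        "changed": {i for i in prev_ids & curr_ids if previous[i] != current[i]},
    }
```

Separating additions, retirements and changes lets users apply a quarterly update incrementally instead of reloading the whole dataset.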

G-NAF Production Process

Source: PSMA Australia

The Future

PSMA’s vision for the future of Australian addressing is not only to achieve continuous maintenance with the same robust approach to G-NAF, but also to incorporate real-time address validation failure notifications from client address validation services. Such an approach closes the loop on the address maintenance process, empowers the citizen and guarantees the highest levels of quality and currency.

In 2014, we launched our web service, G-NAF Live. The service references a continuously maintained resource of jurisdictional data with a refresh rate equal to the highest update rate available in each jurisdiction (in some cases, daily), works with G-NAF and maintains linkages to other PSMA Australia datasets.

G-NAF Live references the most relevant address attributes rather than the full attribution of G-NAF. Through the multiple award-winning PSMA Cloud web interface, it allows users to customise and orchestrate address verification services with OGC Web Feature and Web Map services (WFS and WMS) into a single consumable workflow. 
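Since G-NAF Live is described as working alongside OGC Web Feature Services, a client request would follow the WFS key-value-pair conventions. The sketch below only builds a WFS 2.0 `GetFeature` URL with Python's standard library; the endpoint and the feature type name are hypothetical placeholders, not the actual G-NAF Live or PSMA Cloud interface, and no network call is made.

```python
from __future__ import annotations

from urllib.parse import urlencode


def wfs_get_feature_url(
    endpoint: str,
    type_name: str,
    bbox: tuple[float, float, float, float],
    count: int = 100,
) -> str:
    """Build a WFS 2.0 GetFeature request URL (key-value-pair encoding)."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "2.0.0",
        "REQUEST": "GetFeature",
        "TYPENAMES": type_name,                     # hypothetical feature type
        "BBOX": ",".join(str(v) for v in bbox),     # axis order depends on the CRS
        "COUNT": count,                             # maximum features to return
    }
    return f"{endpoint}?{urlencode(params)}"
```

WFS 2.0 interprets `BBOX` axis order according to the coordinate reference system in use, so the order of the four values must match what the target service expects.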

PSMA Australia’s Chairman, Glenn Appleyard, said, “PSMA Australia is driving forward with G-NAF Live and other ‘data as a service’ offerings, while continuing to explore the fringes of geocoded addressing and geospatial data management as we know it. With the growing interest in location data beyond specialist industries, PSMA has actively engaged with major users of location data as well as their value added reseller network to understand the market’s future needs for foundation spatial data.”

PSMA Australia CEO Dan Paull added that “the most useful approach to our Australian geospatial future must be one that recognises that the digital age of geospatial is very early in its lifecycle. And as far as G-NAF has come in these past ten years, there is so much more ground ahead for this foundation data resource – as there is for the geospatial industry itself...”

 

* CRC SI (2015), Optimising the Supply Chain for Geocoded Addressing in Australia – Current State Supply Chain, Final Review 6.0, http://www.crcsi.com.au/assets/Resources/Geocoded-...

 
