4.10 Data management

Considering the cost and effort that go into collecting it, monitoring data is commonly the most expensive asset of a mining project’s monitoring section, so it is astonishing how little attention is often given to optimising data storage and management systems, and the uses to which the data could be put. To realise maximum value from the investment in data collection, database management systems must be in place to ensure not only that the data is accurate and readily accessible, but also that adequate security exists to prevent tampering or unauthorised access. A leading practice data management and reporting system automatically alerts operations staff if key parameters are approaching performance limit values, and facilitates the production of timely and fit-for-purpose reports. The monitoring data can also be used to support research and to identify previously unrecognised relationships between monitoring parameters. This is leading practice data management; storing monitoring data in a spreadsheet on a local hard drive is not.
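The automated alerting described above can be sketched in a few lines. This is a minimal illustration only: the parameter names, limit values and the 90% alert threshold are assumptions for the example, not values prescribed by any guideline.

```python
# Hypothetical performance limits for a few monitored parameters.
LIMITS = {"pH_max": 8.5, "EC_uS_cm": 1500.0, "turbidity_NTU": 25.0}
ALERT_FRACTION = 0.9  # flag a reading once it reaches 90% of its limit (assumed)

def check_reading(parameter: str, value: float) -> str:
    """Classify a reading as 'ok', 'alert' (approaching limit) or 'exceedance'."""
    limit = LIMITS[parameter]
    if value > limit:
        return "exceedance"
    if value >= ALERT_FRACTION * limit:
        return "alert"
    return "ok"
```

In a production system the status would trigger a notification to operations staff rather than simply being returned.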

Adequate data management is the first step in data quality control. As the Australian guidelines for water quality monitoring and reporting note, ‘Once the “certified” data leave the laboratory, there is ample opportunity for “contamination” of results to occur’ (ANZECC–ARMCANZ 2000b). Data insertions, deletions and repetitions, the mixing of units of measurement and the mis-assignment of sites or dates can readily occur. Such data errors can be very difficult to detect without regular detailed checking of new sets of data by personnel who are familiar with the monitoring program. Rigorous data entry quality assurance and quality control, using a database with appropriate authorisations for access and tracking of edits and internal consistency checks, can eliminate or minimise such errors, and are well worth the cost and effort.
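The internal consistency checks mentioned above can be as simple as validating each incoming record against a register of known sites, expected units and already-stored records. The site codes and unit register below are hypothetical, chosen only to illustrate catching the error types listed (mis-assigned sites, mixed units, impossible dates, repetitions).

```python
from datetime import date

VALID_SITES = {"SW01", "SW02", "GW01"}        # hypothetical site register
EXPECTED_UNITS = {"EC": "uS/cm", "pH": "pH"}  # hypothetical unit register

def validate_record(record: dict, seen: set) -> list:
    """Return a list of quality-control issues detected for one data record."""
    issues = []
    if record["site"] not in VALID_SITES:
        issues.append("unknown site code")
    if record["unit"] != EXPECTED_UNITS.get(record["param"]):
        issues.append("unit mismatch")
    if record["date"] > date.today():
        issues.append("date in the future")
    key = (record["site"], record["param"], record["date"])
    if key in seen:
        issues.append("duplicate record")
    seen.add(key)
    return issues
```

Running such checks at the point of data entry, before records reach the master database, is far cheaper than detecting the same errors later by manual review.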

Because most data storage systems are primarily electronic, it is critical that they are adequately backed up (both onsite and offsite). Ideally, hard copies of the data should also be maintained. As with any aspect of quality management, good housekeeping is the essential element. The adequacy and quality of the backups, and their locations, should be regularly checked. This applies especially where multiple networked systems are involved and where there is potential for parts of the system not to be backed up as a result of software error or hardware failure.

For operations with long lives, it is important to use data storage software that is widely used, allows easy data transfer to another system, or both. Software systems evolve, and there is no guarantee that the software used today will continue to be supported or that future hardware and operating systems will be able to run it. The same applies to mass storage media and internal formatting used for the archiving of data.

For larger datasets, relational databases are generally better futureproofed because the data structures can be maintained in future software implementations, and robust data transfer systems are generally well developed for them. For smaller datasets and projects of short duration, standard spreadsheet formats can provide adequate futureproofing, but they might not be the best option if they do not allow easy data quality checking. Online database options are available even for small datasets, and free relational database software is readily available.
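As a concrete illustration of the futureproofing point, freely available relational database software such as SQLite can hold structured monitoring data while a plain-text export preserves easy transfer to any future system. The table and column names here are illustrative only.

```python
import csv
import io
import sqlite3

# Free relational database software: an in-memory SQLite database with an
# illustrative table of monitoring readings.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (site TEXT, param TEXT, value REAL)")
con.executemany("INSERT INTO readings VALUES (?, ?, ?)",
                [("SW01", "pH", 7.2), ("SW01", "EC", 540.0)])

# Export to CSV, a widely supported plain-text format that any future
# software implementation can read.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["site", "param", "value"])
writer.writerows(con.execute("SELECT site, param, value FROM readings"))
csv_export = buf.getvalue()
```

The relational structure supports quality checking and controlled access while the data is live; the plain-text export guards against the storage software itself becoming unsupported.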

A number of relational database packages tailored for storing and reporting monitoring data are available. The leading packages include the ability to automate some aspects of data quality checking (for example, the ion balance in the case of water sample analysis) and provide for data quality scores to be associated with the stored measurements. Such features are highly recommended for leading practice data management. In selecting a monitoring data management package, it is essential that its suitability and coverage of data types are matched to the requirements of the monitoring program. It is ill-advised to design monitoring information content to suit the capabilities of the software, as that may mean that a number of important components of the monitoring program cannot be effectively incorporated into the data structure or need to be downgraded or summarised to be stored, potentially reducing the future utility of the data.
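The ion balance check mentioned above compares the sum of cation and anion concentrations (in milliequivalents per litre), which should be approximately equal in a correctly analysed water sample. The sketch below uses the conventional percentage charge-balance error; the ±5% acceptance threshold is a common rule of thumb, not a value prescribed here.

```python
def charge_balance_error(cations_meq: list, anions_meq: list) -> float:
    """Percentage charge-balance error:
    100 * (sum of cations - sum of anions) / (sum of cations + sum of anions),
    with all concentrations in meq/L."""
    total_cat, total_an = sum(cations_meq), sum(anions_meq)
    return 100.0 * (total_cat - total_an) / (total_cat + total_an)

def quality_flag(cbe_percent: float) -> str:
    """Assign a simple data-quality flag from the charge-balance error
    (5% threshold assumed for illustration)."""
    return "acceptable" if abs(cbe_percent) <= 5.0 else "check analysis"
```

A leading practice package would compute such a score automatically on data entry and store it alongside the measurement as a data quality attribute.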

Flexibility and adaptability in data management systems are necessities in the selection of a monitoring data management package. The sites, parameters and precision of monitoring may change over time in response to changing management needs. The data management system needs to be flexible enough to accommodate such changes and maintain the right balance between standardisation to facilitate data quality management and adaptation to facilitate the optimisation of the monitoring program. Usually this will entail a multi-level security system, so that only an authorised and technically competent system manager is able to make the changes necessary to adapt the database structure to changing needs.
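The multi-level security described above amounts to tying each action to a role. The roles and action names below are hypothetical, used only to show the principle that structural changes are reserved for the system manager.

```python
# Hypothetical role-to-permission mapping: field staff can only enter data,
# while only the system manager may alter the database structure.
PERMISSIONS = {
    "field_officer": {"enter_data"},
    "data_custodian": {"enter_data", "edit_data"},
    "system_manager": {"enter_data", "edit_data", "alter_structure"},
}

def is_allowed(role: str, action: str) -> bool:
    """Check whether a given role is authorised to perform an action."""
    return action in PERMISSIONS.get(role, set())
```

In practice this sits inside the database's own authorisation system, together with audit logging of who changed what and when.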

Most modern monitoring programs collect data of different types, such as datasets of different sizes, continuous or semi-continuous time-series measurements, and discrete samples of a limited number of parameters. They may also include datasets of very different complexity. For example, a program may combine biological measurements of several parameters, taken from different body parts of individuals of several species from different taxonomic groupings, collected using several different sampling methods at a number of sites on a number of occasions, with monthly spot water-quality measurements of a few parameters per sample.

Leading practice use of the different datasets includes comparison and synthesis of results to provide multiple lines of evidence to assess the impact of the mining operation. Where possible, this should be facilitated by the use of a single data management system. However, it might not be possible to effectively include all types of monitoring data in a single system. In that case, standardisation of some data elements across datasets, such as the use of common site code descriptors, is essential to facilitate the analysis of data stored in different databases. The location of a compliance site commonly changes through time while the same station name is retained for reporting convenience. In such a case, it is essential that a record be kept of the changes in location, as previously unrecognised differences in behaviour may subsequently be found. There may also be cases in which the same site has been assigned different names because data was collected by different teams for different purposes (such as water quality and taxonomic identification). In such cases, it is essential that the data management system is capable of matching the various site names (aliases) to the location, so that all of the datasets for that site can be collated and reported consistently.
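Matching site aliases to a canonical location can be sketched as a simple lookup table. The site names below are invented for illustration; a real system would also record when each alias was in use and by which team.

```python
# Hypothetical alias table: different teams recorded the same physical
# location under different names.
ALIASES = {
    "WQ-03": "SITE_A",      # water-quality team's code
    "BIO-North": "SITE_A",  # biological sampling team's code
    "SITE_A": "SITE_A",     # canonical name maps to itself
}

def canonical_site(name: str) -> str:
    """Resolve any recorded site name to its canonical location identifier."""
    return ALIASES[name]

def collate(records: list) -> dict:
    """Group records from different datasets by canonical site."""
    grouped = {}
    for rec in records:
        grouped.setdefault(canonical_site(rec["site"]), []).append(rec)
    return grouped
```

With the aliases resolved, datasets held in separate databases can still be collated and reported against a single set of locations.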

The data systems must be accessible to those who need to use them, and sufficiently intuitive that new users can quickly learn to access the monitoring data they need. Clarity in data management systems should extend to data sources, their quality and their relevance. Remember that the person in charge of the data now may not be the person responsible for it in years to come. Leading practice data management systems facilitate transfer of the monitoring knowledge base and should be person-independent.

Maintaining corporate memory of monitoring and auditing results can also be a big issue for a mining project. Procedures must be in place to ensure that monitoring techniques, locations, data and reports are securely recorded in a manner that will enable new staff to continue to implement and report monitoring programs without any loss of information or quality control.

A robust spatial database is also an essential requirement for keeping a record of the location of all monitoring sites. A common problem relating to spatial data management is that of different mapping datums being used, making data conversion necessary. This is a straightforward task if the process is known or well documented, but it can lead to serious errors if there is a high turnover of staff or if data points are plotted onto base imagery using GIS without rigorous review and checking.

The use of spatial data acquired by portable global positioning system (GPS) units is standard and usually includes inbuilt translation to a range of datums, but once transferred to a database the selection of a common datum is essential for the accurate positioning of field points in the GIS and the relocation of sites by new samplers.
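One way to enforce a common datum in the spatial database is to store the datum alongside each coordinate pair and refuse to compare positions recorded against different datums. The datum names and coordinates below are illustrative, and the haversine formula gives only an approximate ground separation.

```python
import math

def separation_m(p1: dict, p2: dict) -> float:
    """Approximate ground separation (metres) between two stored points,
    using the haversine formula on a spherical Earth (radius 6 371 000 m).
    Raises if the points were recorded against different datums."""
    if p1["datum"] != p2["datum"]:
        raise ValueError("points use different datums; transform to a common datum first")
    lat1, lon1, lat2, lon2 = map(math.radians,
                                 (p1["lat"], p1["lon"], p2["lat"], p2["lon"]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))
```

Such a guard catches the common error of plotting points recorded in one datum onto base imagery referenced to another, which can silently shift positions by tens or hundreds of metres.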

GIS software can also be used to point and click on a mapped site to bring up specific monitoring details (contained in separate databases but linked to the GIS). Particularly large or complex sites may need data visualisation tools that link the spatial data to a range of conventional data sources in spreadsheets and databases. Leading practice requires good integration of monitoring data with GIS, web-based interfaces, site operational data and information management systems, or combinations of them. Increasingly, components of monitoring data are being made available to external parties (such as regulators and community groups) using web-based platforms. The availability of this facility should be considered when a data management and reporting system is being selected. Good data accessibility combined with secure data storage is the key.
