Big Data Project
Big-Data Knowledge Discovery
The Big Data Knowledge Discovery project is developing new methods and tools for applying machine learning in the natural sciences, focusing particularly on disciplines where machine learning is not yet widely adopted. It is an interdisciplinary project with collaborators in Ecology (Macquarie University – Forest Ecosystems), Geophysics (Sydney University – Plate Tectonics), and Non-linear Laser Physics (Macquarie University), as well as a data-centric financial services company (SIRCA).
Within the project there are sub-projects focused on the following problems:
- Improving our understanding of how tectonic plates have moved in the past and quantifying the uncertainty in these reconstructions. Understanding these motions will give us better tools for locating mineral deposits deep under the surface of Australia.
- Modeling how different plant traits (such as leaf size, wood density, and seed size) allow plants to compete effectively for resources in forest ecosystems. The eventual goal of this work is to understand the evolution of biodiversity in forest ecosystems, and how such systems may respond to external stresses.
- Non-linear laser systems exhibit many of the complex dynamical behaviours that exist in other parts of nature, but can be experimentally controlled in a laboratory. We are developing new methods for measuring complexity in non-linear laser systems and for efficiently exploring the parameter space of these systems, with the goal of extending these techniques to other complex systems.
These three problems are the specific focus of the collaborators in this project, and various machine learning and data science techniques are being brought to bear on each of them. We are focusing particularly on Bayesian methods that allow noise in the measurements and uncertainty in the parameters of the system to be quantified and tracked, thereby giving an estimate of the uncertainty in our conclusions.
More specifically, some basic problems that underlie each field are being investigated from a machine learning and data science perspective:
- Bayesian methods for quantifying the uncertainty in estimation of processes involving non-linear differential systems.
- Efficient exploration of high-dimensional parameter spaces using Bayesian experimental design and active learning.
- Data management and parallelization of single-use codes across distributed datasets.
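As a rough illustration of the first item, the sketch below estimates one parameter of a non-linear differential equation from noisy observations and reports a posterior spread alongside the point estimate. It is not the project's method. The logistic growth model, the forward Euler integrator, the flat prior, the grid over the parameter, the noise level, and all function names are assumptions chosen to keep the example small.

```python
import math
import random


def simulate(r, x0=0.1, dt=0.1, steps=50):
    """Integrate dx/dt = r * x * (1 - x) with forward Euler and return the trajectory."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + dt * r * x * (1.0 - x))
    return xs


def grid_posterior(data, r_grid, sigma):
    """Posterior over r on a grid, assuming a flat prior and independent Gaussian noise."""
    log_post = []
    for r in r_grid:
        pred = simulate(r)
        sse = sum((d - p) ** 2 for d, p in zip(data, pred))
        log_post.append(-0.5 * sse / sigma ** 2)
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def summarize(r_grid, post):
    """Return the posterior mean and standard deviation of r."""
    mean = sum(r * p for r, p in zip(r_grid, post))
    var = sum((r - mean) ** 2 * p for r, p in zip(r_grid, post))
    return mean, math.sqrt(var)


# Synthetic data: the "true" growth rate and noise level are made up for the demo.
random.seed(0)
TRUE_R, SIGMA = 1.5, 0.02
data = [x + random.gauss(0.0, SIGMA) for x in simulate(TRUE_R)]

r_grid = [0.5 + 0.01 * i for i in range(201)]  # candidate values from 0.5 to 2.5
post = grid_posterior(data, r_grid, SIGMA)
mean, std = summarize(r_grid, post)
print(f"posterior mean r = {mean:.3f}, posterior std = {std:.3f}")
```

A grid is only practical for one or two parameters. In higher-dimensional settings of the kind described above, sampling or approximation methods would normally take its place.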
At the end of the project we expect to have released some general purpose software, tools, and methodologies that can be used to deal with these issues across the natural sciences and beyond.
For further information please contact:
Dr. Stephen Hardy
Technology Director – Computational Analytics
National ICT Australia
p: 02 9376 2017 m: 0403074365