Iron Population Clustering

In data science, a domain defines the type of value that an attribute can hold in a data model. For the Rocklea dome case, we want to define domains within which samples have similar iron concentrations. For instance, there could be two domains: one with a high iron concentration and one with a low iron concentration.

Different types of domain

Domains differ according to how they are created. Those created through classical clustering show poor spatial contiguity compared to those obtained by geostatistical clustering. Classical clustering does not use spatial information and tends to produce spatially scattered groups of samples, which is a problem for mining, whereas geostatistical clustering produces spatially contiguous clusters. Geostatistical clustering therefore yields better domains for mining exploration, but it is more complicated to use. An alternative approach is to apply classical clustering methods to a spatially contiguous data set.
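As a rough illustration of why classical clustering can scatter in space, the sketch below runs scikit-learn's `KMeans` on the iron grade alone. All values are synthetic and invented for this example (they are not Rocklea dome data), and the coordinates are deliberately drawn independently of grade. This is only a minimal sketch of classical clustering under those assumptions, not a geostatistical method.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic iron grades (assumption): a low-grade and a high-grade population.
fe_low = rng.normal(loc=30.0, scale=3.0, size=50)
fe_high = rng.normal(loc=58.0, scale=3.0, size=50)
fe = np.concatenate([fe_low, fe_high])

# Synthetic sample coordinates (assumption), unrelated to grade.
xy = rng.uniform(0.0, 1000.0, size=(fe.size, 2))

# Classical clustering: the only feature is the iron grade.
# The coordinates are never passed to the algorithm.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    fe.reshape(-1, 1)
)

# Summarize each cluster: mean grade and extent along the x coordinate.
for k in range(2):
    mask = labels == k
    print(
        f"cluster {k}: n={mask.sum()}, mean Fe={fe[mask].mean():.1f}, "
        f"x from {xy[mask, 0].min():.0f} to {xy[mask, 0].max():.0f}"
    )
```

Because the coordinates play no role in the fit, the two clusters separate cleanly by grade while each one can spread over the whole x range. Real deposits usually have some spatial structure, so the effect is less extreme than with these random coordinates.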

To visualize the different domains, we can represent them with histograms.

Histograms

A histogram is an approximate representation of the distribution of numerical data. To construct a histogram, the first step is to divide the entire range of values into a series of intervals (bins) and then count how many values fall into each interval. The vertical axis (y-axis) represents the count or percentage of occurrences in the data for each bin.
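The binning and counting steps described above can be sketched with NumPy's `np.histogram`. The ten values, the five bins, and the 10 to 65 range are made up for illustration only and do not come from the drillhole data.

```python
import numpy as np

# Made-up sample values (assumption), standing in for iron percentages.
values = np.array([12.0, 25.5, 31.0, 33.2, 47.8, 52.1, 55.0, 58.3, 59.9, 61.4])

# Step 1: divide the range [10, 65] into 5 equal-width intervals.
# Step 2: count how many values fall into each interval.
counts, edges = np.histogram(values, bins=5, range=(10.0, 65.0))

# The y-axis can show raw counts or the percentage of samples per bin.
percent = 100.0 * counts / counts.sum()

for left, right, n, p in zip(edges[:-1], edges[1:], counts, percent):
    print(f"[{left:.0f}, {right:.0f}): count={n}, percent={p:.0f}%")
```

Note that the number of bins changes how the distribution looks: too few bins can hide a second population, while too many can make the plot noisy.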

First, we need to import the required libraries and load the data file.

import geolime as geo
from pyproj import CRS
import numpy as np
from sklearn.cluster import KMeans
import pyvista as pv

pv.set_jupyter_backend('panel')


geo.Project().set_crs(CRS("EPSG:20350"))
dh = geo.read_file("../data/domained_drillholes.geo")
dh.user_properties()
['X_COLLAR',
 'Y_COLLAR',
 'Z_COLLAR',
 'X_M',
 'Y_M',
 'Z_M',
 'X_B',
 'Y_B',
 'Z_B',
 'X_E',
 'Y_E',
 'Z_E',
 'Fe_pct',
 'Al2O3',
 'SiO2_pct',
 'K2O_pct',
 'CaO_pct',
 'MgO_pct',
 'TiO2_pct',
 'P_pct',
 'S_pct',
 'Mn_pct',
 'Fe_ox_ai',
 'hem_over_goe',
 'kaolin_abundance',
 'kaolin_composition',
 'wmAlsmai',
 'wmAlsmci',
 'carbai3pfit',
 'carbci3pfit',
 'Sample_ID',
 'Fe',
 'Fe2o3',
 'P',
 'S',
 'SiO2',
 'MnO',
 'Mn',
 'CaO',
 'K2O',
 'MgO',
 'Na2O',
 'TiO2',
 'LOI_100',
 'Depth',
 'ellipsoidal_distance',
 'OreZone',
 'domain_code',
 'domain']

To determine how many domains we need, we can look at the geology of the Rocklea dome area. Most of the iron deposits were formed during the same era and form a layer of alternating shales. As this formation is the only one mined for iron, we can create two domains, one rich in iron and one poor in iron, with the boundary set by this iron-rich formation. There is also a paleochannel that is very rich in iron, but it is sourced from the iron-rich deposit described above, so we can assume that both belong to the same domain.

To create these two domains, we can first look at the distribution of iron in the area.

geo.histogram_plot(data=[{"object": dh, "property": "Fe_pct"}], nbins=20)