In spite of the increasing international prevalence of big data, and the consequent exponential growth in data mining techniques, most large organisations still fail to network internal datasets in ways that can maximise commercial advantage. The two studies reported here demonstrate that it is possible to deploy informatics methodologies on randomly selected sub-sets of intranet data in order to extend development corpora. In each case, the first stage of the data mining process was to develop and quantify intranet thesauri. These thesauri were trained using previously verified datasets, which allowed them to mesh strategic paradigms successfully when applied to similar datasets sourced from elsewhere in the organisation. Both studies deploy relevance feedback infrastructures in order to sense-check emerging data, providing reliable validity ratings that enable these data to be used in commercially sensitive environments.
Randomly generated phrases: extend development corpora, mesh strategic paradigms, deploy relevance feedback infrastructures, quantify intranet thesauri, deploy informatics methodologies