What is the sensitivity and specificity of the peer review process?
Publication: Journal of Information & Knowledge Management
This paper proposes a new method of dimensionality reduction for text classification that applies the discrete wavelet transform to the document-term frequency matrix. We analyse the features provided by the wavelet coefficients in each orientation: (1) the high-energy coefficients in the horizontal orientation correspond to relevant terms in a single document; (2) the high-energy coefficients in the vertical orientation correspond to terms that are relevant for a single document but not for the others; (3) the high-energy coefficients in the diagonal orientation correspond to terms that are relevant in a document in comparison with other terms. Filtering on the wavelet coefficients so that all three conditions are fulfilled simultaneously yields a reduced vocabulary of the corpus, with fewer dimensions than the original one. To test the reduced vocabulary, we recoded the corpus with it and obtained a statistically relevant level of accuracy for document classification.
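The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a single-level Haar wavelet, rows as documents and columns as terms, "horizontal" taken to mean differences across adjacent terms and "vertical" differences across adjacent documents, and a hypothetical rule that keeps a term pair only if its energy exceeds the mean energy in all three detail orientations. The helper names (`haar_dwt2`, `column_energy`, `reduced_vocabulary`) and the toy data are invented here. In practice a library routine such as `pywt.dwt2` from PyWavelets would likely replace the hand-written transform.

```python
def haar_dwt2(matrix):
    """Single-level 2D Haar transform of a matrix with even dimensions.

    Returns (approx, horiz, vert, diag); each band has half the rows and
    half the columns of the input. Orientation naming is an assumption:
    horiz = difference across adjacent columns (terms),
    vert  = difference across adjacent rows (documents).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    if n_rows % 2 or n_cols % 2:
        raise ValueError("this sketch expects an even number of rows and columns")
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, n_rows, 2):
        row_a, row_h, row_v, row_d = [], [], [], []
        for j in range(0, n_cols, 2):
            # One 2x2 block: two adjacent documents, two adjacent terms.
            a, b = matrix[i][j], matrix[i][j + 1]
            c, d = matrix[i + 1][j], matrix[i + 1][j + 1]
            row_a.append((a + b + c + d) / 2)
            row_h.append((a - b + c - d) / 2)
            row_v.append((a + b - c - d) / 2)
            row_d.append((a - b - c + d) / 2)
        approx.append(row_a)
        horiz.append(row_h)
        vert.append(row_v)
        diag.append(row_d)
    return approx, horiz, vert, diag


def column_energy(band):
    """Sum of squared coefficients in each column of a detail band."""
    return [sum(row[j] ** 2 for row in band) for j in range(len(band[0]))]


def reduced_vocabulary(matrix, vocab):
    """Keep term pairs whose energy is above the mean in all three bands.

    The mean-energy threshold is a hypothetical choice for this sketch.
    """
    _, horiz, vert, diag = haar_dwt2(matrix)
    keep = None
    for band in (horiz, vert, diag):
        energy = column_energy(band)
        threshold = sum(energy) / len(energy)
        high = {j for j, e in enumerate(energy) if e > threshold}
        keep = high if keep is None else keep & high
    # Each kept coefficient column covers two adjacent terms.
    return [term for j in sorted(keep) for term in vocab[2 * j: 2 * j + 2]]


# Toy corpus: 4 documents x 4 terms (invented term frequencies).
vocab = ["wavelet", "haar", "the", "of"]
doc_term = [
    [3, 2, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
reduced = reduced_vocabulary(doc_term, vocab)
print(reduced)  # ['wavelet', 'haar']
```

Because a single-level Haar coefficient spans two adjacent columns, this sketch can only keep or drop terms in pairs, so the column order of the vocabulary affects the result. The terms that appear uniformly in every document produce zero detail coefficients and are dropped, which matches the intuition behind conditions (1) to (3).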