The data that describe corrosion phenomena in terms of individual cases and their feature values specify a mapping onto an abstract multidimensional feature space. Regions of high density, or clusters, indicate significant areas of interest and, possibly, similar corrosion phenomena. This paper describes a method for calculating the similarity between cases based on a generalized measure of Euclidean distance. Evidence for a correlation between this similarity and the observed corrosion phenomena is presented for a real-world database on chloride stress corrosion cracking (SCC) of Type 304 stainless steel in water. In contrast to conventional data analysis techniques, the method can tolerate incomplete data. The performance of the method under different combinations of features was evaluated by calculating the error rate, defined as the number of erroneous predictions divided by the total number of cases examined. When incomplete data on pH and oxygen content were considered in addition to complete data on temperature and chloride content, the error rate was half that obtained from the complete data alone. Additional data on evaporative service correlated poorly with SCC behavior for the case examined. These results illustrate the importance of feature selection in empirical modeling.
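The abstract does not spell out how the Euclidean distance is generalized to handle missing values, so the following is only a minimal sketch of one common approach, not the paper's method. It assumes that each feature is normalized by a user-supplied range, that features missing from either case are skipped, and that the partial sum is rescaled by the fraction of features actually compared. It also assumes that predictions come from a leave-one-out 1-nearest-neighbor rule, so the error rate is the number of wrong predictions divided by the number of cases. The function names (`distance`, `error_rate`) and the toy data (temperature in °C, chloride in ppm, pH) are hypothetical illustrations.

```python
import math
from typing import Optional, Sequence

# A case is a vector of feature values; None marks a missing entry.
Case = Sequence[Optional[float]]


def distance(a: Case, b: Case, ranges: Sequence[float]) -> float:
    """Range-normalized Euclidean distance over features present in both cases.

    Assumption (hypothetical): missing features are skipped and the partial sum
    is rescaled by (total features / features compared). This is one common way
    to tolerate incomplete data, not necessarily the paper's exact measure.
    """
    total, used = 0.0, 0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None or r == 0:
            continue  # cannot compare this feature
        total += ((x - y) / r) ** 2
        used += 1
    if used == 0:
        return math.inf  # no shared features: treat as maximally dissimilar
    return math.sqrt(total * len(ranges) / used)


def error_rate(cases: Sequence[Case], labels: Sequence[str],
               ranges: Sequence[float]) -> float:
    """Leave-one-out 1-nearest-neighbor error rate: wrong predictions / total cases."""
    errors = 0
    for i, case in enumerate(cases):
        # Predict case i from its nearest neighbor among all other cases.
        nearest = min((j for j in range(len(cases)) if j != i),
                      key=lambda j: distance(case, cases[j], ranges))
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / len(cases)


if __name__ == "__main__":
    # Hypothetical toy cases: (temperature degC, chloride ppm, pH)
    cases = [
        [100.0, 500.0, 7.0],
        [95.0, 450.0, None],   # pH not recorded
        [30.0, 10.0, 7.5],
        [25.0, None, 8.0],     # chloride not recorded
    ]
    labels = ["SCC", "SCC", "no SCC", "no SCC"]
    ranges = [100.0, 1000.0, 14.0]  # assumed normalization ranges
    print(error_rate(cases, labels, ranges))  # prints 0.0
```

Rescaling by the fraction of compared features keeps distances computed from different numbers of features roughly comparable. Comparing error rates for different feature subsets (for example, with and without a pH column) is one way to carry out the kind of feature-selection comparison described above.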
Keywords: aqueous corrosion, corrosion data, empirical modeling, Euclidean distance, nearest neighbor, stainless steel stress corrosion cracking