Building Classifiers to Identify Split Files.

Green, P. D., Lane, P.C.R., Rainer, A. and Scholz, S. (2009) Building Classifiers to Identify Split Files. In: UNSPECIFIED.

We apply machine-learning techniques to help automate the process of mining the version history of software projects. Analysis of version histories is important in the study of software evolution. One of the associated problems is tracing program elements which have changed or moved as the result of file restructuring. As an initial application, we have developed classifiers to identify one such type of file change, 'split files'. Our process involves extracting features through syntactic analysis of the original source code, and then training and evaluating classifiers against a set of data assessed by visual inspection. We analysed 266K files from 84 open-source projects, filtering out a set of candidate files for which our classifiers achieve either 89% overall accuracy, or a false positive rate of 5%.
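
The two measures quoted above, overall accuracy and false positive rate, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy labels are assumptions made for illustration, with `True` standing for "this candidate is a split file" as judged by visual inspection.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of classifier outcomes against hand-assessed labels."""
    tp: int = 0  # predicted split, actually split
    fp: int = 0  # predicted split, actually not split
    tn: int = 0  # predicted not split, actually not split
    fn: int = 0  # predicted not split, actually split


def confusion(labels, predictions):
    """Tally outcomes for paired actual labels and predictions."""
    c = Confusion()
    for actual, predicted in zip(labels, predictions):
        if predicted and actual:
            c.tp += 1
        elif predicted and not actual:
            c.fp += 1
        elif not predicted and not actual:
            c.tn += 1
        else:
            c.fn += 1
    return c


def accuracy(c):
    """Fraction of all candidates classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def false_positive_rate(c):
    """Fraction of true non-split files wrongly flagged as split."""
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0


# Toy data (hypothetical): five candidate files.
labels = [True, True, False, False, False]
predictions = [True, False, True, False, False]

counts = confusion(labels, predictions)
print(accuracy(counts), false_positive_rate(counts))
```

A classifier tuned for a low false positive rate will usually give up some overall accuracy, which is one way to read the abstract's report of the two figures as alternatives.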


mldm-split-files.pdf
Available under Creative Commons: 4.0
