According to WIRED, Rachel Greenstadt, an associate professor of computer science at Drexel University, and Aylin Caliskan, Greenstadt’s former PhD student and now an assistant professor at George Washington University, have developed a machine learning algorithm that can identify programmers by how they’ve written their code.
Whether it works from raw source code or from compiled binaries, the approach trains an algorithm on examples of a programmer’s work so that it learns to recognise the traits that recur in that programmer’s coding style.
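To make the idea concrete, here is a minimal sketch of style-based attribution in Python. It is not the researchers' method: the five layout and naming features, the nearest-centroid comparison, the authors "alice" and "bob", and the tiny C-like samples are all assumptions invented for illustration. It only shows the general shape of learning a style profile from examples and comparing unseen code against it.

```python
import math
import re
from collections import defaultdict


def style_features(source):
    """Turn a code sample into a few crude style statistics (hypothetical features)."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    n = max(len(lines), 1)
    names = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    m = max(len(names), 1)
    return [
        sum(ln.startswith("\t") for ln in lines) / n,    # share of tab-indented lines
        sum(ln.strip() == "{" for ln in lines) / n,      # opening brace on its own line
        sum("_" in name for name in names) / m,          # snake_case identifiers
        sum(bool(re.search(r"[a-z][A-Z]", name)) for name in names) / m,  # camelCase
        sum(" = " in ln for ln in lines) / n,            # assignments written with spaces
    ]


def train(samples):
    """Average the feature vectors of each author's samples into one profile."""
    grouped = defaultdict(list)
    for author, source in samples:
        grouped[author].append(style_features(source))
    return {
        author: [sum(column) / len(vectors) for column in zip(*vectors)]
        for author, vectors in grouped.items()
    }


def predict(profiles, source):
    """Return the author whose profile is closest to the sample's features."""
    features = style_features(source)
    return min(profiles, key=lambda author: math.dist(features, profiles[author]))


# Toy training data: two invented authors with visibly different habits.
TRAINING = [
    ("alice", "int total_count = 0;\nfor (int i = 0; i < n; i++) {\n    total_count += i;\n}\n"),
    ("alice", "int max_value = a;\nif (b > max_value) {\n    max_value = b;\n}\n"),
    ("bob", "int totalCount=0;\nfor(int i=0;i<n;i++)\n{\n\ttotalCount+=i;\n}\n"),
    ("bob", "int maxValue=a;\nif(b>maxValue)\n{\n\tmaxValue=b;\n}\n"),
]

model = train(TRAINING)
unknown = "int itemSum=0;\nwhile(k<n)\n{\n\titemSum+=k;\n}\n"
print(predict(model, unknown))  # prints "bob" for this toy data
```

A real system would need far richer features and many more samples per author. The sketch only illustrates why consistent habits such as indentation, brace placement, and naming can act as a fingerprint.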
Programmers Can No Longer Hide Behind The Code
Caliskan, Greenstadt, and two other researchers have previously demonstrated that even small examples of code on the repository site GitHub can be enough to differentiate one coder from another with a high degree of accuracy.
In a separate paper, Caliskan and a team of other researchers showed that it’s possible to completely de-anonymize a programmer using only their compiled binary code.