A new paper, to be presented at next week’s IEEE International Conference on Data Science and Advanced Analytics, details the evolution of the university’s Data Science Machine – a sort-of AI system that is adept at spotting trends and patterns in large chunks of data.
MIT ran the machine as a ringer in three human data science tournaments and had considerable success. Across the three competitions, which together drew 906 human teams, the computer system beat 615 of them, or roughly 68 per cent.
Depending on the competition, the Data Science Machine's results were between 87 and 96 per cent as accurate as the best human submissions. But, crucially, it did the job much faster than its fleshy competitors – human teams took weeks to divine patterns from the data, while the computer took at most 12 hours.
As the paper's authors put it: “The competitive success of the Data Science Machine suggests it has a role alongside data scientists. Currently, data scientists are very involved in the feature generation and selection processes. Our results show that the Data Science Machine can automatically create features of value and figure out how to use those features in creating a model. Although humans beat the Data Science Machine for all datasets, the machine’s success-to-effort ratio suggests there is a place for it in data science.”
The Data Science Machine derives predictive models from raw data automatically. To achieve this, the researchers first developed the Deep Feature Synthesis algorithm, which automatically generates features for relational datasets: it follows relationships in the data to a base field, then sequentially applies mathematical functions along that path to create the final feature. Second, they implemented a generalizable machine learning pipeline and tuned it using a novel approach based on a Gaussian Copula Process. They entered the Data Science Machine in three data science competitions that featured 906 other teams, and it beat 615 of them. In two of the three competitions it beat a majority of competitors, and in the third it achieved 94 per cent of the best competitor's score. In the best case, a competition that was still ongoing, it beat 85.6 per cent of the teams and achieved 95.7 per cent of the top submission's score.
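To make the "follow relationships, then stack functions" idea concrete, here is a minimal toy sketch in Python. It is not MIT's code: the three tables (`customers`, `orders`, `items`), the `CHILDREN` schema and the choice of SUM, MEAN and MAX as the aggregation functions are all hypothetical. It only covers aggregation across one-to-many links, not every kind of feature the full algorithm can generate. Starting from a base field (`items.price`), each hop back up the relationship chain applies another function, so a customer ends up with stacked features such as `MEAN(orders.SUM(items.price))`, the average order total.

```python
from statistics import mean

# Hypothetical toy relational dataset: customers -> orders -> items.
TABLES = {
    "customers": [{"id": 1}, {"id": 2}],
    "orders": [
        {"id": 10, "customer_id": 1},
        {"id": 11, "customer_id": 1},
        {"id": 12, "customer_id": 2},
    ],
    "items": [
        {"id": 100, "order_id": 10, "price": 5.0},
        {"id": 101, "order_id": 10, "price": 15.0},
        {"id": 102, "order_id": 11, "price": 8.0},
        {"id": 103, "order_id": 12, "price": 20.0},
    ],
}

# Hypothetical schema: parent table -> [(child table, foreign-key column in child)].
CHILDREN = {
    "customers": [("orders", "customer_id")],
    "orders": [("items", "order_id")],
    "items": [],
}

# Illustrative aggregation functions applied at each relationship hop.
AGGREGATES = {"SUM": sum, "MEAN": mean, "MAX": max}


def numeric_fields(row):
    """Base fields: numeric columns that are not ids or foreign keys."""
    return {k: v for k, v in row.items()
            if isinstance(v, (int, float)) and k != "id" and not k.endswith("_id")}


def synthesize(table, depth):
    """Return {row_id: {feature_name: value}} for every row of `table`.

    Recursively builds features for child tables (up to `depth` hops away),
    then aggregates each child feature per parent row, stacking functions.
    """
    features = {row["id"]: numeric_fields(row) for row in TABLES[table]}
    if depth == 0:
        return features
    for child, fk in CHILDREN[table]:
        child_feats = synthesize(child, depth - 1)
        # Group the child rows' feature dicts by the parent row they point to.
        groups = {pid: [] for pid in features}
        for row in TABLES[child]:
            groups[row[fk]].append(child_feats[row["id"]])
        for pid, rows in groups.items():
            features[pid][f"COUNT({child})"] = len(rows)
            # A parent with no children gets only COUNT = 0 (no empty aggregates).
            for name in sorted({n for r in rows for n in r}):
                values = [r[name] for r in rows if name in r]
                for agg, fn in AGGREGATES.items():
                    features[pid][f"{agg}({child}.{name})"] = fn(values)
    return features


if __name__ == "__main__":
    customer_features = synthesize("customers", depth=2)
    # Customer 1's order totals are 5 + 15 = 20 and 8, so their mean is 14.0.
    print(customer_features[1]["MEAN(orders.SUM(items.price))"])  # 14.0
```

The `depth` parameter plays the role of how far the algorithm is allowed to travel through the relationships: with `depth=1` a customer would only see `COUNT(orders)`, while `depth=2` reaches the item prices and produces the stacked features.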