Largest Semantic Map of the English Language, and AI Deciphers Complex Multigene Relationships

Artificial intelligence is helping to accelerate and deepen our understanding of genetics, and it is also getting better at understanding our language.

Cognition Technologies, a next-generation Semantic Natural Language Processing (NLP) company, has announced the release of the largest commercially available Semantic Map of the English language. The scope of Cognition’s Semantic Map is more than double the size of any other computational linguistic dictionary for English, and it includes over 10 million semantic connections made up of semantic contexts, meaning representations, taxonomy and word meaning distinctions.

The semantic map is reportedly the world’s largest and gives computers a vocabulary more than 10 times as extensive as that of a typical US college graduate.

Researchers used artificial intelligence to identify the genes and genetic interrelationships underlying the effect of calorie restriction (CR) on maximum lifespan.

– They pooled data from several different microarray studies.
– They used an unusual algorithm to classify samples as CR or normal (for the computer scientists: they used genetic programming to learn an ensemble of classification rules).

Their algorithm votes for whether a sample is CR or normal based on the outputs of several short classification rules (short because each rule looks only at the expression levels of a few genes).
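To make the voting idea concrete, here is a minimal sketch of an ensemble of short rules casting votes. It does not reproduce Biomind's genetic-programming output: the gene names (`geneA` to `geneD`), the thresholds, and the simple majority vote with ties going to "normal" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A sample maps gene name -> expression level (hypothetical units).
Sample = Dict[str, float]

@dataclass
class Rule:
    genes: List[str]                      # the few genes this rule looks at
    predicate: Callable[[Sample], bool]   # True means "this rule votes CR"

def classify(sample: Sample, rules: List[Rule]) -> str:
    """Majority vote over the ensemble; a tie is treated as 'normal' (an assumption)."""
    cr_votes = sum(1 for rule in rules if rule.predicate(sample))
    return "CR" if cr_votes > len(rules) / 2 else "normal"

# Hypothetical short rules over placeholder genes; thresholds are invented.
rules = [
    Rule(["geneA", "geneB"], lambda s: s["geneA"] > 1.2 and s["geneB"] < 0.8),
    Rule(["geneA", "geneC"], lambda s: s["geneA"] > 1.0 or s["geneC"] > 2.0),
    Rule(["geneB", "geneD"], lambda s: s["geneB"] < 1.0 and s["geneD"] > 0.5),
]

cr_like = {"geneA": 1.5, "geneB": 0.6, "geneC": 0.3, "geneD": 0.9}
normal_like = {"geneA": 0.5, "geneB": 1.5, "geneC": 0.1, "geneD": 0.2}

print(classify(cr_like, rules))      # all three rules vote CR
print(classify(normal_like, rules))  # no rule votes CR
```

Because each rule reads only two genes, anyone can inspect why the ensemble voted the way it did, which is the interpretability point made next.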
An advantage of this type of approach (over, say, a ‘black box’ neural network) is that the classification rules are easy to interpret biologically: you can search through them to identify important genes and genetic relationships. A gene is important for CR if it appears in many different rules, and two (or more) genes are related if they appear together in many rules.
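The two interpretation heuristics above, counting how many rules a gene appears in and how often pairs of genes appear together, can be sketched as below. This is an illustrative assumption, not Biomind's actual code: each rule is reduced to the list of genes it tests, the rule ensemble is hypothetical, and only pairs are counted (the text also allows larger groups of genes).

```python
from collections import Counter
from itertools import combinations
from typing import List

def gene_importance(rules: List[List[str]]) -> Counter:
    """Count the number of rules each gene appears in (once per rule)."""
    counts: Counter = Counter()
    for genes in rules:
        counts.update(sorted(set(genes)))
    return counts

def gene_pairs(rules: List[List[str]]) -> Counter:
    """Count the number of rules in which each unordered gene pair co-occurs."""
    pairs: Counter = Counter()
    for genes in rules:
        for a, b in combinations(sorted(set(genes)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical ensemble: each rule is represented only by the genes it tests.
rules = [
    ["geneA", "geneB"],
    ["geneA", "geneC"],
    ["geneA", "geneB", "geneD"],
    ["geneC", "geneD"],
]

print(gene_importance(rules).most_common(1))  # most frequently used gene
print(gene_pairs(rules).most_common(1))       # most frequently co-occurring pair
```

In this toy ensemble `geneA` appears in three rules and the pair (`geneA`, `geneB`) in two, so they would be flagged as the most important gene and the strongest relationship; a real analysis would apply the same counting to the learned rules.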

The AI-based analysis of the gene expression data was done by Biomind:

Ben Goertzel, Biomind LLC, Rockville, Maryland.
Cassio Pennachin, Biomind LLC, Rockville, Maryland.
Maurício de Alvarenga Mudado, Biomind LLC, Rockville, Maryland.
Lúcio de Souza Coelho, Biomind LLC, Rockville, Maryland.

Novel artificial intelligence methodologies were applied to analyze gene expression microarray data gathered from mice under a calorie restriction (CR) regimen. The data were gathered from three previously published mouse studies; these datasets were merged into a single composite dataset for the purpose of conducting a broader-based analysis. The result was a list of genes that are important for the impact of CR on lifespan, not necessarily in terms of their individual actions but in terms of their interactions with other genes. Furthermore, a map of gene interrelationships was provided, suggesting which intergene interactions are most important for the effect of CR on life extension. In particular, our analysis showed that the genes Mrpl12, Uqcrh, and Snip1 play central roles regarding the effects of CR on life extension, interacting with many other genes (which the analysis enumerates) in carrying out their roles. This is the first time that the genes Snip1 and Mrpl12 have been identified in the context of aging. In a follow-up analysis aimed at validating these results, the analytic process was rerun with a fourth dataset included, yielding largely comparable results. Broadly, the biological interpretation of these analytical results suggests that the effects of CR on life extension are due to multiple factors, including factors identified in prior theories of aging, such as the hormesis, development, cellular, and free radical theories.