A subject rapidly views a small sample of photographs culled from a much larger database—as many as 10 pictures a second. The device transmits the data to a computer that ranks which photographs elicited the strongest cortical recognition responses. The computer looks for similarities in the visual characteristics of different high-ranking photographs, such as color, texture and the shapes of edges and lines.
Then it scans the much larger database—it could contain upward of 50 million images—and pulls out those that rank high in visual characteristics most highly correlated with the “aha” moments detected by the EEG.
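To make the ranking step concrete, here is a minimal sketch in Python. It is not the researchers' code. It assumes each image has already been reduced to a numeric feature vector (color, texture and edge descriptors, for example) and that the EEG decoder gives one score per viewed image. The function name `rank_database` and the simple correlation weighting are illustrative assumptions.

```python
import numpy as np

def rank_database(sample_features, eeg_scores, database_features, top_k=5):
    """Rank database images by the features that track EEG responses.

    sample_features:   (n_sample, n_features) features of the viewed images
    eeg_scores:        (n_sample,) decoded EEG response per viewed image
    database_features: (n_db, n_features) features of the full database
    All inputs are hypothetical; a real system would use richer descriptors.
    """
    # Standardize features using statistics of the viewed sample.
    mu = sample_features.mean(axis=0)
    sigma = sample_features.std(axis=0) + 1e-12
    z_sample = (sample_features - mu) / sigma
    z_scores = (eeg_scores - eeg_scores.mean()) / (eeg_scores.std() + 1e-12)

    # Pearson correlation of each feature with the EEG score.
    weights = z_sample.T @ z_scores / len(eeg_scores)

    # Score every database image by its correlation-weighted features.
    relevance = ((database_features - mu) / sigma) @ weights
    order = np.argsort(-relevance)  # highest relevance first
    return order[:top_k], relevance
```

Calling `rank_database(sample, scores, database, top_k=100)` would return the indices of the 100 database images whose features most resemble those of the photographs that drew the strongest responses.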
System architecture for using EEG to bootstrap computer vision model construction. A sample set of images is taken from a database and the subject processes these images in rapid serial visual presentation (RSVP) mode while EEG is simultaneously recorded. The EEG is decoded and used to tag images in terms of how strongly they grabbed the user’s attention. The images can be seen as a small set of labeled images, some of which might be the images of interest (e.g., images of soldiers) and some of which just grabbed the user’s attention because of novelty (e.g., the fellow with the interesting hairstyle). These small sets of labeled images are used as training data in a transductive graphic model which operates in the feature space of the images. The transductive model uses the limited training data and manifold structures in the image feature space to propagate the initial labels to the rest of the images in the database. The system includes a self-tuning mechanism which enables removal of images tagged by the EEG as being interesting but that deviate from the manifold structures. For example, the image with the blue border can be interpreted as a false positive and removed based on self-tuning. The computer vision model is then used to predict the relevance (priority) scores of the rest of the images in the database. Images taken from Caltech image database.
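The label propagation described in the caption can be sketched as follows. This is a hedged illustration and not the authors' model: it assumes a standard graph-based scheme in which images are nodes, edge weights come from a Gaussian kernel on feature distance, and labels spread iteratively along the graph. The function name, the parameters `sigma` and `alpha`, and the use of +1/-1 labels are assumptions, and the self-tuning step that discards EEG false positives is not included.

```python
import numpy as np

def propagate_labels(features, labeled_idx, labels,
                     sigma=1.0, alpha=0.9, n_iter=200):
    """Spread a few EEG-derived labels to all images over a similarity graph.

    features:    (n_images, n_features) image descriptors
    labeled_idx: indices of the images the subject viewed
    labels:      +1 (EEG flagged as interesting) or -1 for those images
    Returns one relevance score per image; higher means more relevant.
    """
    # Pairwise squared distances and Gaussian affinities (no self-loops).
    diff = features[:, None, :] - features[None, :, :]
    W = np.exp(-(diff ** 2).sum(axis=-1) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetrically normalized affinity matrix D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    S = W / (np.sqrt(np.outer(d, d)) + 1e-12)

    # Initial labels: zero for every image the subject did not see.
    Y = np.zeros(len(features))
    Y[labeled_idx] = labels

    # Each image repeatedly absorbs its neighbors' scores while the
    # labeled images stay anchored to their initial labels.
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return F
```

Sorting the database by the returned scores gives a priority ordering. Images that sit on the same manifold as the EEG-flagged examples rise to the top even though the subject never saw them.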
Computer vision is limited in its ability to identify interesting imagery, particularly as “interesting” might be defined by an individual. In this paper we describe our efforts in developing brain–computer interfaces (BCIs) which synergistically integrate computer vision and human vision so as to construct a system for image triage. Our approach exploits machine learning for real-time decoding of brain signals which are recorded noninvasively via electroencephalography (EEG). The signals we decode are specific for events related to imagery attracting a user’s attention. We describe two architectures we have developed for this type of cortically coupled computer vision and discuss potential applications and challenges for the future.
The researchers also have a highly efficient algorithm for searching their image database.
The resulting algorithm, hybrid iterative shrinkage (HIS), comprises a fixed-point continuation phase and an interior-point phase. The first phase is based entirely on memory-efficient operations such as matrix-vector multiplications, while the second phase is based on a truncated Newton’s method. Furthermore, we show that various optimization techniques, including line search and continuation, can significantly accelerate convergence. The algorithm has global convergence at a geometric rate (a Q-linear rate in optimization terminology). We present a numerical comparison with several existing algorithms, using benchmark data from the UCI machine learning repository, and show our algorithm is the most computationally efficient without loss of accuracy.
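The excerpt does not name the objective being minimized. As a rough sketch of what a fixed-point continuation phase can look like, the Python below assumes an l1-regularized logistic regression problem and uses plain soft-thresholding (shrinkage) steps with a decreasing regularization schedule. The function names and parameters are hypothetical, there is no line search, and the interior-point phase with truncated Newton steps is not shown.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrinkage operator: move each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss, with labels y in {-1, +1}."""
    margins = y * (X @ w)
    # 1 / (1 + exp(m)) written with tanh for numerical stability.
    prob = 0.5 * (1.0 - np.tanh(margins / 2.0))
    return -(X.T @ (y * prob)) / len(y)

def shrinkage_with_continuation(X, y, lam, n_stages=5, iters_per_stage=200):
    """Minimize mean logistic loss + lam * ||w||_1 (assumed objective)."""
    n, d = X.shape
    # Step size 1/L, where L = ||X||_2^2 / (4n) bounds the gradient's
    # Lipschitz constant. Only matrix-vector products are used below.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(d)

    # Continuation: begin with a large penalty (all-zero solution) and
    # decrease it geometrically to the target, warm-starting each stage.
    lam_max = np.abs(logistic_grad(w, X, y)).max()
    for lam_k in np.geomspace(max(lam_max, lam), lam, n_stages):
        for _ in range(iters_per_stage):
            w = soft_threshold(w - step * logistic_grad(w, X, y),
                               step * lam_k)
    return w
```

Each iteration costs two matrix-vector products, which is why this kind of first phase is memory efficient. Warm-starting from the solution at a larger penalty is what the continuation refers to in this sketch.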