Computational Cognition & Perception Lab

Abstracts of Selected Publications

2024

Hu R, Jacobs RA. Does Stimulus Category Coherence Influence Visual Working Memory? A Rational Analysis. Cogn Sci. 2024 Sep;48(9):e13498. doi: 10.1111/cogs.13498.

Visual working memory (VWM) refers to the temporary storage and manipulation of visual information. Although visually different, objects we view and remember can share the same higher-level category information, such as an apple, orange, and banana all being classified as fruit. We study the influence of category information on VWM, focusing on the question of whether stimulus category coherence (i.e., whether all to-be-remembered items belong to the same semantic category) influences VWM performance. This question is addressed in two behavioral experiments using a change-detection paradigm and a rational analysis using an ideal observer based on a Bayesian model. Both experimental participants and the ideal observer often, but not always, performed numerically better on coherent trials (i.e., when all stimuli belonged to the same category). We hypothesize that the influence of category coherence information on VWM may be task-dependent and/or stimulus-dependent. In conditions in which category coherence information was highly valuable for task performance, as indicated by the ideal observer, participants tended to make use of it. However, when the ideal observer suggested this information was not crucial to performance, participants did not. In addition, both participants and the ideal observer showed a bias toward responding “same,” and often showed a stronger influence of category coherence on change trials. The consistencies between participant and ideal observer responses suggest participants often behaved as they did because these behaviors are optimal (or approximately so) for maximizing task performance. This may help explain conflicting results reported in the scientific literature.
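
For concreteness, the sketch below shows the general shape of such an ideal-observer computation for a single-item change-detection trial. It is only an illustration, not the paper's model: it assumes Gaussian memory noise and a Gaussian stimulus distribution, ignores category coherence, and all parameter values are made up.

```python
import numpy as np
from scipy.stats import norm

def prob_same(memory_sample, test_value, sigma_mem=1.0,
              stim_mean=0.0, stim_sd=5.0, prior_same=0.5):
    """Posterior probability of 'same' on a single-item change-detection trial."""
    # If nothing changed, the memory sample is a noisy copy of the test value.
    like_same = norm.pdf(memory_sample, loc=test_value, scale=sigma_mem)
    # If the item changed, the memory sample reflects an independent draw from
    # the stimulus distribution, so it is only weakly related to the test value.
    like_change = norm.pdf(memory_sample, loc=stim_mean,
                           scale=np.sqrt(sigma_mem ** 2 + stim_sd ** 2))
    post_same = prior_same * like_same
    post_change = (1.0 - prior_same) * like_change
    return post_same / (post_same + post_change)

# A prior_same above 0.5 produces the bias toward responding "same".
print(prob_same(memory_sample=2.1, test_value=2.0, prior_same=0.6))
```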


2023

German, J.S., Jacobs, R.A. (2023). Implications of capacity-limited, generative models for human vision. Behav Brain Sci. 2023 Dec 6;46:e391. doi: 10.1017/S0140525X23001772.

Although discriminative deep neural networks are currently dominant in cognitive modeling, we suggest that capacity-limited, generative models are a promising avenue for future work. Generative models tend to learn both local and global features of stimuli and, when properly constrained, can learn componential representations and response biases found in people's behaviors.


German, J.S., Cui, G., Xu, C., Jacobs, R.A. (2023). Rapid runtime learning by curating small datasets of high-quality items obtained from memory. PLOS Computational Biology 19(10): e1011445. https://doi.org/10.1371/journal.pcbi.1011445

We propose the “runtime learning” hypothesis which states that people quickly learn to perform unfamiliar tasks as the tasks arise by using task-relevant instances of concepts stored in memory during mental training. To make learning rapid, the hypothesis claims that only a few class instances are used, but these instances are especially valuable for training. The paper motivates the hypothesis by describing related ideas from the cognitive science and machine learning literatures. Using computer simulation, we show that deep neural networks (DNNs) can learn effectively from small, curated training sets, and that valuable training items tend to lie toward the centers of data item clusters in an abstract feature space. In a series of three behavioral experiments, we show that people can also learn effectively from small, curated training sets. Critically, we find that participant reaction times and fitted drift rates are best accounted for by the confidences of DNNs trained on small datasets of highly valuable items. We conclude that the runtime learning hypothesis is a novel conjecture about the relationship between learning and memory with the potential for explaining a wide variety of cognitive phenomena.
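
The curation idea can be illustrated with a short sketch: cluster stored exemplars in a feature space, keep only the items nearest the cluster centers, and train a small classifier on that curated set. The choice of k-means, logistic regression, and random stand-in features below are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def curate_high_value_items(features, labels, items_per_class=5, seed=0):
    """Pick a few items per class that lie near cluster centers in feature space."""
    curated_X, curated_y = [], []
    for c in np.unique(labels):
        X_c = features[labels == c]
        k = min(items_per_class, len(X_c))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_c)
        # For each cluster, keep the single item closest to its centroid.
        for center in km.cluster_centers_:
            idx = np.argmin(np.linalg.norm(X_c - center, axis=1))
            curated_X.append(X_c[idx])
            curated_y.append(c)
    return np.array(curated_X), np.array(curated_y)

# Hypothetical usage: 'features' could be penultimate-layer DNN activations.
X = np.random.randn(500, 64)           # stand-in feature vectors
y = np.random.randint(0, 3, size=500)  # stand-in class labels
X_small, y_small = curate_high_value_items(X, y, items_per_class=5)
clf = LogisticRegression(max_iter=1000).fit(X_small, y_small)
```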


2022

Sims, C.R., Lerch, R.A., Tarduno, J.A. et al. Conceptual knowledge shapes visual working memory for complex visual information. Sci Rep 12, 8088 (2022). https://doi.org/10.1038/s41598-022-12137-0

Human visual working memory (VWM) is a memory store people use to maintain the visual features of objects and scenes. Although it is obvious that bottom-up information influences VWM, the extent to which top-down conceptual information influences VWM is largely unknown. We report an experiment in which groups of participants were trained in one of two different categories of geologic faults (left/right lateral, or normal/reverse faults), or received no category training. Following training, participants performed a visual change detection task in which category knowledge was irrelevant to the task. Participants were more likely to detect a change in geologic scenes when the changes crossed a trained categorical distinction (e.g., the left/right lateral fault boundary), compared to within-category changes. In addition, participants trained to distinguish left/right lateral faults were more likely to detect changes when the scenes were mirror images along the left/right dimension. Similarly, participants trained to distinguish normal/reverse faults were more likely to detect changes when scenes were mirror images along the normal/reverse dimension. Our results provide direct empirical evidence that conceptual knowledge influences VWM performance for complex visual information. An implication of our results is that cognitive scientists may need to reconceptualize VWM so that it is closer to “conceptual short-term memory”.


2021

Hu, R. & Jacobs, R. A. (2021). Semantic influence on visual working memory of object identity and location. Cognition, 217, 104891. https://doi.org/10.1016/j.cognition.2021.104891

Does semantic information—in particular, regularities in category membership across objects—influence visual working memory (VWM) processing? We predict that the answer is “yes”. Four experiments evaluating this prediction are reported. Experimental stimuli were images of real-world objects arranged in either one or two spatial clusters. On coherent trials, all objects belonging to a cluster also belonged to the same category. On incoherent trials, at least one cluster contained objects from different categories. Experiments using a change-detection paradigm (Experiments 1–3) and an experiment in which participants recalled the locations of objects in a scene (Experiment 4) yielded the same result: participants showed better memory performance on coherent trials than on incoherent trials. Taken as a whole, these experiments provide the best (perhaps only) data to date demonstrating that statistical regularities in semantic category membership improve VWM performance. Because a conventional perspective in cognitive science regards VWM as being sensitive solely to bottom-up visual properties of objects (e.g., shape, color, orientation), our results indicate that cognitive science may need to modify its conceptualization of VWM so that it is closer to “conceptual short-term memory”, a short-term memory store representing current stimuli and their associated concepts (Potter, 1993, Potter, 2012).


Wu M.H., Anderson, A.J., Jacobs, R.A., Raizada, R.D.S (2021). Analogy-Related Information Can be Accessed by Simple Addition and Subtraction of fMRI Activation Patterns, without Participants Performing any Analogy Task. Neurobiology of Language. https://doi.org/10.1162/nol_a_00045

Analogical reasoning, e.g., inferring that teacher is to chalk as mechanic is to wrench, plays a fundamental role in human cognition. However, whether brain activity patterns of individual words are encoded in a way that could facilitate analogical reasoning is unclear. Recent advances in computational linguistics have shown that information about analogical problems can be accessed by simple addition and subtraction of word embeddings (e.g., wrench = mechanic + chalk − teacher). Critically, this property emerges in artificial neural networks that were not trained to produce analogies but instead were trained to produce general-purpose semantic representations. Here, we test whether such an emergent property can be observed in representations in human brains, as well as in artificial neural networks. fMRI activation patterns were recorded while participants viewed isolated words but did not perform analogical reasoning tasks. Analogy relations were constructed from word pairs that were categorically or thematically related, and we tested whether the predicted fMRI pattern calculated with simple arithmetic was more correlated with the pattern of the target word than with the patterns of other words. We observed that the predicted fMRI patterns contain information not only about the identity of the target word but also its category and theme (e.g., teaching-related). In summary, this study demonstrated that information about analogy questions can be reliably accessed with the addition and subtraction of fMRI patterns, and that, similar to word embeddings, this property holds for task-general patterns elicited when participants were not explicitly told to perform analogical reasoning.
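
The word-embedding arithmetic the study builds on can be written in a few lines. The sketch below uses random stand-in vectors, so it only illustrates the computation; real analyses would use pretrained word embeddings or, as in the paper, voxel activation patterns.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def solve_analogy(emb, a, b, c):
    """Return the word whose vector is most similar to emb[b] - emb[a] + emb[c].

    For 'teacher : chalk :: mechanic : ?', call
    solve_analogy(emb, 'teacher', 'chalk', 'mechanic') and hope for 'wrench'.
    """
    predicted = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(predicted, emb[w]))

# Stand-in embeddings; with random vectors the answer is not meaningful,
# it simply shows how the arithmetic is carried out.
rng = np.random.default_rng(0)
vocab = ['teacher', 'chalk', 'mechanic', 'wrench', 'apple']
emb = {w: rng.standard_normal(50) for w in vocab}
print(solve_analogy(emb, 'teacher', 'chalk', 'mechanic'))
```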


Bates, C. J. & Jacobs, R. A. (2021). Optimal attentional allocation in the presence of capacity constraints in uncued and cued visual search. Journal of Vision, 21(5):3, 1-23. https://doi.org/10.1167/jov.21.5.3

The vision sciences literature contains a large diversity of experimental and theoretical approaches to the study of visual attention. We argue that this diversity arises, at least in part, from the field's inability to unify differing theoretical perspectives. In particular, the field has been hindered by a lack of a principled formal framework for simultaneously thinking about both optimal attentional processing and capacity-limited attentional processing, where capacity is limited in a general, task-independent manner. Here, we supply such a framework based on rate-distortion theory (RDT) and optimal lossy compression. Our approach defines Bayes-optimal performance when an upper limit on information processing rate is imposed. In this article, we compare Bayesian and RDT accounts in both uncued and cued visual search tasks. We start by highlighting a typical shortcoming of unlimited-capacity Bayesian models that is not shared by RDT models, namely, that they often overestimate task performance when information-processing demands are increased. Next, we reexamine data from two cued-search experiments that have previously been modeled as the result of unlimited-capacity Bayesian inference and demonstrate that they can just as easily be explained as the result of optimal lossy compression. To model cued visual search, we introduce the concept of a "conditional communication channel." This simple extension generalizes the lossy-compression framework such that it can, in principle, predict optimal attentional-shift behavior in any kind of perceptual task, even when inputs to the model are raw sensory data such as image pixels. To demonstrate this idea's viability, we compare our idealized model of cued search, which operates on a simplified abstraction of the stimulus, to a deep neural network version that performs approximately optimal lossy compression on the real (pixel-level) experimental stimuli.
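
For readers unfamiliar with rate-distortion theory, the sketch below computes an optimal lossy-compression channel for a small discrete stimulus space using the standard Blahut-Arimoto iteration. It is a generic textbook routine, not the authors' model code, and it omits the conditional-channel extension described in the abstract.

```python
import numpy as np

def blahut_arimoto(p_x, distortion, beta, n_iter=200):
    """Optimal lossy-compression channel q(x_hat | x) for a discrete source.

    p_x        : (n,) prior over stimulus values
    distortion : (n, m) cost of reporting x_hat (column) when the stimulus is x (row)
    beta       : trade-off parameter; larger beta allows a higher rate and lower distortion
    Returns q(x_hat | x) with rows summing to one.
    """
    n, m = distortion.shape
    q_xhat = np.full(m, 1.0 / m)                      # marginal over outputs
    for _ in range(n_iter):
        # Update the channel given the current output marginal.
        q = q_xhat[None, :] * np.exp(-beta * distortion)
        q /= q.sum(axis=1, keepdims=True)
        # Update the output marginal given the channel.
        q_xhat = p_x @ q
    return q

# Example: 8 stimulus values, squared-error distortion, uniform prior.
x = np.arange(8.0)
D = (x[:, None] - x[None, :]) ** 2
channel = blahut_arimoto(np.full(8, 1 / 8), D, beta=0.5)
```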


2020

Wu M.H., Kleinschmidt D., Emberson L., Doko D., Edelman S., Jacobs R., Raizada R. (2020). Cortical Transformation of Stimulus Space in Order to Linearize a Linearly Inseparable Task. J Cogn Neurosci. 32(12):2342-2355. doi: 10.1162/jocn_a_01533.

The human brain is able to learn difficult categorization tasks, even ones that have linearly inseparable boundaries; however, it is currently unknown how it achieves this computational feat. We investigated this by training participants on an animal categorization task with a linearly inseparable prototype structure in a morph shape space. Participants underwent fMRI scans before and after 4 days of behavioral training. Widespread representational changes were found throughout the brain, including an untangling of the categories’ neural patterns that made them more linearly separable after behavioral training. These neural changes were task dependent, as they were only observed while participants were performing the categorization task, not during passive viewing. Moreover, they were found to occur in frontal and parietal areas, rather than ventral temporal cortices, suggesting that they reflected attentional and decisional reweighting, rather than changes in object recognition templates. These results illustrate how the brain can flexibly transform neural representational space to solve computationally challenging tasks.


Bates, C. J. & Jacobs, R. A. (2020). Efficient data compression in perception and perceptual memory. Psychological Review, 127, 891-917. http://dx.doi.org/10.1037/rev0000197

Efficient data compression is essential for capacity-limited systems, such as biological perception and perceptual memory. We hypothesize that the need for efficient compression shapes biological systems in many of the same ways that it shapes engineered systems. If true, then the tools that engineers use to analyze and design systems, namely rate-distortion theory (RDT), can profitably be used to understand human perception and memory. The first portion of this article discusses how three general principles for efficient data compression provide accounts for many important behavioral phenomena and experimental results. We also discuss how these principles are embodied in RDT. The second portion notes that exact RDT methods are computationally feasible only in low-dimensional stimulus spaces. To date, researchers have used deep neural networks to approximately implement RDT in high-dimensional spaces, but these implementations have been limited to tasks in which the sole goal is compression with respect to reconstruction error. Here, we introduce a new deep neural network architecture that approximately implements RDT. An important property of our architecture is that it is trained "end-to-end," operating on raw perceptual input (e.g., pixel values) rather than intermediate levels of abstraction, as is the case with most psychological models. The article's final portion conjectures on how efficient compression can occur in memory over time, thereby providing motivations for multiple memory systems operating at different time scales, and on how efficient compression may explain some attentional phenomena such as RTs in visual search.
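
One common way to approximate rate-distortion-optimal compression end-to-end in high-dimensional spaces is an autoencoder trained with a reconstruction (distortion) term plus a weighted rate penalty, in the style of a variational autoencoder. The PyTorch sketch below illustrates that general recipe under those assumptions; it is not the architecture published in the article.

```python
import torch
import torch.nn as nn

class RateLimitedAutoencoder(nn.Module):
    """Toy encoder/decoder trained with loss = distortion + beta * rate."""
    def __init__(self, n_pixels=784, n_latent=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_pixels, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, n_latent)
        self.to_logvar = nn.Linear(256, n_latent)
        self.decoder = nn.Sequential(nn.Linear(n_latent, 256), nn.ReLU(),
                                     nn.Linear(256, n_pixels))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decoder(z), mu, logvar

def rd_loss(x, x_hat, mu, logvar, beta=4.0):
    distortion = ((x - x_hat) ** 2).sum(dim=1).mean()  # reconstruction error
    # Rate term: KL divergence of the encoding distribution from a N(0, I) prior.
    rate = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1).mean()
    return distortion + beta * rate
```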


Bates, C.J., Sims, C.R., Jacobs, R.A. (2020). The importance of constraints on constraints. Behavioral and Brain Sciences, 43, e3.

The "resource-rational" approach is ambitious and worthwhile. A shortcoming of the proposed approach is that it fails to constrain what counts as a constraint. As a result, constraints used in different cognitive domains often have nothing in common. We describe an alternative framework that satisfies many of the desiderata of the resource-rational approach, but in a more disciplined manner.


German, J.S., Jacobs, R.A. (2020). Can machine learning account for human visual object shape similarity judgments? Vision Research, 167, 87-99.

We describe and analyze the performance of metric learning systems, including deep neural networks (DNNs), on a new dataset of human visual object shape similarity judgments of naturalistic, part-based objects known as "Fribbles". In contrast to previous studies which asked participants to judge similarity when objects or scenes were rendered from a single viewpoint, we rendered Fribbles from multiple viewpoints and asked participants to judge shape similarity in a viewpoint-invariant manner. Metrics trained using pixel-based or DNN-based representations fail to explain our experimental data, but a metric trained with a viewpoint-invariant, part-based representation produces a good fit. We also find that although neural networks can learn to extract the part-based representation—and therefore should be capable of learning to model our data—networks trained with a "triplet loss" function based on similarity judgments do not perform well. We analyze this failure, providing a mathematical description of the relationship between the metric learning objective function and the triplet loss function. The poor performance of neural networks appears to be due to the nonconvexity of the optimization problem in network weight space. We conclude that viewpoint insensitivity is a critical aspect of human visual shape perception, and that neural network and other machine learning methods will need to learn viewpoint-insensitive representations in order to account for people's visual object shape similarity judgments.
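
For reference, the triplet loss discussed in the abstract has the standard form sketched below (distances are computed in the learned embedding space; the margin value is arbitrary).

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: push the positive closer to the anchor than the
    negative by at least `margin`, otherwise incur a proportional penalty."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# In the shape-similarity setting, anchor and positive might be embeddings of
# two objects judged similar, and negative the embedding of a dissimilar object.
```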


2019

Bates, C. J., Lerch, R. A., Sims, C. R., & Jacobs, R. A. (2019). Adaptive allocation of human visual working memory capacity during statistical and categorical learning. Journal of Vision, 19(2):11, 1-23.

Human brains are finite, and thus have bounded capacity. An efficient strategy for a capacity-limited agent is to continuously adapt by dynamically reallocating capacity in a task-dependent manner. Here we study this strategy in the context of visual working memory (VWM). People use their VWM stores to remember visual information over seconds or minutes. However, their memory performances are often error-prone, presumably due to VWM capacity limits. We hypothesize that people attempt to be flexible and robust by strategically reallocating their limited VWM capacity based on two factors: (a) the statistical regularities (e.g., stimulus feature means and variances) of the to-be-remembered items, and (b) the requirements of the task that they are attempting to perform. The latter specifies, for example, which types of errors are costly versus irrelevant for task performance. These hypotheses are formalized within a normative computational modeling framework based on rate-distortion theory, an extension of conventional Bayesian approaches that uses information theory to study rate-limited (or capacity-limited) processes. Using images of plants that are naturalistic and precisely controlled, we carried out two sets of experiments. Experiment 1 found that when a stimulus dimension (the widths of plants' leaves) was assigned a distribution, subjects adapted their VWM performances based on this distribution. Experiment 2 found that when one stimulus dimension (e.g., leaf width) was relevant for distinguishing plant categories but another dimension (leaf angle) was irrelevant, subjects' responses in a memory task became relatively more sensitive to the relevant stimulus dimension. Together, these results illustrate the task-dependent robustness of VWM, thereby highlighting the dependence of memory on learning.


Jacobs, R. A. & Bates, C. J. (2019). Comparing the visual representations and performance of human and deep neural networks. Current Directions in Psychological Science, 28, 34-39.

Although deep neural networks (DNNs) are state-of-the-art artificial intelligence systems, it is unclear what insights, if any, they provide about human intelligence. We address this issue in the domain of visual perception. After briefly describing DNNs, we provide an overview of recent results comparing human visual representations and performance with those of DNNs. In many cases, DNNs acquire visual representations and processing strategies that are very different from those used by people. We conjecture that there are at least two factors preventing them from serving as better psychological models. First, DNNs are currently trained with impoverished data, such as data lacking important visual cues to three-dimensional structure, data lacking multisensory statistical regularities, and data in which stimuli are unconnected to an observer's actions and goals. Second, DNNs typically lack adaptations to capacity limits, such as attentional mechanisms, visual working memory, and compressed mental representations biased toward preserving task-relevant abstractions.


Jacobs, R. A. & Xu, C. (2019). Can multisensory training aid visual learning? A computational investigation. Journal of Vision, 19(11):1, 1-12.

Although real-world environments are often multisensory, visual scientists typically study visual learning in unisensory environments containing visual signals only. Here, we use deep or artificial neural networks to address the question, Can multisensory training aid visual learning? We examine a network's internal representations of objects based on visual signals in two conditions: (a) when the network is initially trained with both visual and haptic signals, and (b) when it is initially trained with visual signals only. Our results demonstrate that a network trained in a visual-haptic environment (in which visual, but not haptic, signals are orientation-dependent) tends to learn visual representations containing useful abstractions, such as the categorical structure of objects, and also learns representations that are less sensitive to imaging parameters, such as viewpoint or orientation, that are irrelevant for object recognition or classification tasks. We conclude that researchers studying perceptual learning in vision-only contexts may be overestimating the difficulties associated with important perceptual learning problems. Although multisensory perception has its own challenges, perceptual learning can become easier when it is considered in a multisensory setting.


2018

Chen, Q., Garcea, F. E., Jacobs, R. A., & Mahon, B. Z. (2018). Abstract representations of object directed action in the left inferior parietal lobule. Cerebral Cortex, 28, 2162-2174.

Prior neuroimaging and neuropsychological research indicates that the left inferior parietal lobule in the human brain is a critical substrate for representing object manipulation knowledge. In the present functional MRI study we used multivoxel pattern analyses to test whether action similarity among objects can be decoded in the inferior parietal lobule independent of the task applied to objects (identification or pantomime) and stimulus format in which stimuli are presented (pictures or printed words). Participants pantomimed the use of objects, cued by printed words, or identified pictures of objects. Classifiers were trained and tested across task (e.g., training data: pantomime; testing data: identification), stimulus format (e.g., training data: word format; testing format: picture) and specific objects (e.g., training data: scissors vs. corkscrew; testing data: pliers vs. screwdriver). The only brain region in which action relations among objects could be decoded across task, stimulus format and objects was the inferior parietal lobule. By contrast, medial aspects of the ventral surface of the left temporal lobe represented object function, albeit not at the same level of abstractness as actions in the inferior parietal lobule. These results suggest compulsory access to abstract action information in the inferior parietal lobe even when simply identifying objects.
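
The logic of the cross-decoding analysis (train a classifier on patterns from one task or stimulus format, test it on the other) can be sketched with scikit-learn as below. The arrays are random placeholders standing in for trial-by-voxel data; above-chance generalization in both directions is what would indicate an abstract, task-independent code.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical voxel patterns (trials x voxels) and action-relation labels.
X_pantomime, y_pantomime = np.random.randn(80, 200), np.random.randint(0, 2, 80)
X_identify,  y_identify  = np.random.randn(80, 200), np.random.randint(0, 2, 80)

# Train on the pantomime task, test on the identification task (and vice versa).
clf = make_pipeline(StandardScaler(), LinearSVC())
clf.fit(X_pantomime, y_pantomime)
acc_cross = clf.score(X_identify, y_identify)
print(f"cross-task decoding accuracy: {acc_cross:.2f}")
```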


2017

Erdogan, G. & Jacobs, R. A. (2017). Visual shape perception as Bayesian inference of 3D object-centered shape representations. Psychological Review, 124, 740-761.

Despite decades of research, little is known about how people visually perceive object shape. We hypothesize that a promising approach to shape perception is provided by a "visual perception as Bayesian inference" framework which augments an emphasis on visual representation with an emphasis on the idea that shape perception is a form of statistical inference. Our hypothesis claims that shape perception of unfamiliar objects can be characterized as statistical inference of 3D shape in an object-centered coordinate system. We describe a computational model based on our theoretical framework, and provide evidence for the model along two lines. First, we show that, counterintuitively, the model accounts for viewpoint-dependency of object recognition, traditionally regarded as evidence against people's use of 3D object-centered shape representations. Second, we report the results of an experiment using a shape similarity task, and present an extensive evaluation of existing models' abilities to account for the experimental data. We find that our shape inference model captures subjects' behaviors better than competing models. Taken as a whole, our experimental and computational results illustrate the promise of our approach and suggest that people's shape representations of unfamiliar objects are probabilistic, 3D, and object-centered.


Overlan, M. C., Jacobs, R. A., & Piantadosi, S. T. (2017). Learning abstract visual concepts via probabilistic program induction in a Language of Thought. Cognition, 168, 320-334.

The ability to learn abstract concepts is a powerful component of human cognition. It has been argued that variable binding is the key element enabling this ability, but the computational aspects of variable binding remain poorly understood. Here, we address this shortcoming by formalizing the Hierarchical Language of Thought (HLOT) model of rule learning. Given a set of data items, the model uses Bayesian inference to infer a probability distribution over stochastic programs that implement variable binding. Because the model makes use of symbolic variables as well as Bayesian inference and programs with stochastic primitives, it combines many of the advantages of both symbolic and statistical approaches to cognitive modeling. To evaluate the model, we conducted an experiment in which human subjects viewed training items and then judged which test items belong to the same concept as the training items. We found that the HLOT model provides a close match to human generalization patterns, significantly outperforming two variants of the Generalized Context Model, one variant based on string similarity and the other based on visual similarity using features from a deep convolutional neural network. Additional results suggest that variable binding happens automatically, implying that binding operations do not add complexity to people’s hypothesized rules. Overall, this work demonstrates that a cognitive model combining symbolic variables with Bayesian inference and stochastic program primitives provides a new perspective for understanding people’s patterns of generalization.


2016

Erdogan, G., Chen, Q., Garcea, F. E., Mahon, B. Z., & Jacobs, R. A. (2016). Multisensory part-based representations of objects in human lateral occipital cortex. Journal of Cognitive Neuroscience, 28, 869-881.

The format of high-level object representations in temporal-occipital cortex is a fundamental and as yet unresolved issue. Here we use fMRI to show that human lateral occipital cortex (LOC) encodes novel 3-D objects in a multisensory and part-based format. We show that visual and haptic exploration of objects leads to similar patterns of neural activity in human LOC and that the shared variance between visually and haptically induced patterns of BOLD contrast in LOC reflects the part structure of the objects. We also show that linear classifiers trained on neural data from LOC on a subset of the objects successfully predict a novel object based on its component part structure. These data demonstrate a multisensory code for object representations in LOC that specifies the part structure of objects.


Piantadosi, S. T. & Jacobs, R. A. (2016). Four problems solved by the probabilistic language of thought. Current Directions in Psychological Science, 25, 54-59.

We argue for the advantages of the probabilistic language of thought (pLOT), a recently emerging approach to modeling human cognition. Work using this framework demonstrates how the pLOT (a) refines the debate between symbols and statistics in cognitive modeling, (b) permits theories that draw on insights from both nativist and empiricist approaches, (c) explains the origins of novel and complex computational concepts, and (d) provides a framework for abstraction that can link sensation and conception. In each of these areas, the pLOT provides a productive middle ground between historical divides in cognitive psychology, pointing to a promising way forward for the field.


2015

Erdogan, G., Yildirim, I., & Jacobs, R. A. (2015). From sensory signals to modality-independent conceptual representations: A probabilistic language of thought approach. PLoS Computational Biology, 11(11), e1004610.

People learn modality-independent, conceptual representations from modality-specific sensory signals. Here, we hypothesize that any system that accomplishes this feat will include three components: a representational language for characterizing modality-independent representations, a set of sensory-specific forward models for mapping from modality-independent representations to sensory signals, and an inference algorithm for inverting forward models, that is, an algorithm for using sensory signals to infer modality-independent representations. To evaluate this hypothesis, we instantiate it in the form of a computational model that learns object shape representations from visual and/or haptic signals. The model uses a probabilistic grammar to characterize modality-independent representations of object shape, uses a computer graphics toolkit and a human hand simulator to map from object representations to visual and haptic features, respectively, and uses a Bayesian inference algorithm to infer modality-independent object representations from visual and/or haptic signals. Simulation results show that the model infers identical object representations when an object is viewed, grasped, or both. That is, the model's percepts are modality invariant. We also report the results of an experiment in which different subjects rated the similarity of pairs of objects in different sensory conditions, and show that the model provides a very accurate account of subjects' ratings. Conceptually, this research significantly contributes to our understanding of modality invariance, an important type of perceptual constancy, by demonstrating how modality-independent representations can be acquired and used. Methodologically, it provides an important contribution to cognitive modeling, particularly an emerging probabilistic language-of-thought approach, by showing how symbolic and statistical approaches can be combined in order to understand aspects of human perception.


Yildirim, I. & Jacobs, R. A. (2015). Learning multisensory representations for auditory-visual transfer of sequence category knowledge: A probabilistic language of thought approach. Psychonomic Bulletin and Review, 22, 673-686.

If a person is trained to recognize or categorize objects or events using one sensory modality, the person can often recognize or categorize those same (or similar) objects and events via a novel modality. This phenomenon is an instance of cross-modal transfer of knowledge. Here, we study the Multisensory Hypothesis which states that people extract the intrinsic, modality-independent properties of objects and events, and represent these properties in multisensory representations. These representations underlie cross-modal transfer of knowledge. We conducted an experiment evaluating whether people transfer sequence category knowledge across auditory and visual domains. Our experimental data clearly indicate that we do. We also developed a computational model accounting for our experimental results. Consistent with the probabilistic language of thought approach to cognitive modeling, our model formalizes multisensory representations as symbolic “computer programs” and uses Bayesian inference to learn these representations. Because the model demonstrates how the acquisition and use of amodal, multisensory representations can underlie cross-modal transfer of knowledge, and because the model accounts for subjects’ experimental performances, our work lends credence to the Multisensory Hypothesis. Overall, our work suggests that people automatically extract and represent objects’ and events’ intrinsic properties, and use these properties to process and understand the same (and similar) objects and events when they are perceived through novel sensory modalities.


2014

Orhan, A. E. & Jacobs, R. A. (2014). Toward ecologically realistic theories in visual short-term memory research. Attention, Perception, and Psychophysics, 76, 1058-1070.

Recent evidence from neuroimaging and psychophysics suggests common neural and representational substrates for visual perception and visual short-term memory (VSTM). Visual perception is adapted to a rich set of statistical regularities present in the natural visual environment. Common neural and representational substrates for visual perception and VSTM suggest that VSTM is adapted to these same statistical regularities too. This paper discusses how the study of VSTM can be extended to stimuli that are ecologically more realistic than those commonly used in standard VSTM experiments, and what the implications of such an extension could be for our current view of VSTM. We advocate for the development of unified models of visual perception and VSTM—probabilistic and hierarchical in nature—incorporating prior knowledge of natural scene statistics.


Orhan, A. E., Sims, C. R., Jacobs, R. A., & Knill, D. C. (2014). The adaptive nature of visual working memory. Current Directions in Psychological Science, 23, 164-170.

A growing body of scientific evidence suggests that visual working memory and statistical learning are intrinsically linked. Although visual working memory is severely resource limited, in many cases it makes efficient use of its available resources by adapting to statistical regularities in the visual environment. However, experimental evidence also suggests that there are clear limits and biases in statistical learning. This raises the intriguing possibility that performance limitations observed in visual working memory tasks can to some degree be explained in terms of limits and biases in statistical-learning ability, rather than limits in memory capacity.


2013

Orhan, A. E. & Jacobs, R. A. (2013). A probabilistic clustering theory of the organization of visual short-term memory. Psychological Review, 120, 297-328.

Experimental evidence suggests that the content of a memory for even a simple display encoded in visual short-term memory (VSTM) can be very complex. VSTM uses organizational processes that make the representation of an item dependent on the feature values of all displayed items as well as on these items' representations. Here, we develop a probabilistic clustering theory (PCT) for modeling the organization of VSTM for simple displays. PCT states that VSTM represents a set of items in terms of a probability distribution over all possible clusterings or partitions of those items. Because PCT considers multiple possible partitions, it can represent an item at multiple granularities or scales simultaneously. Moreover, using standard probabilistic inference, it automatically determines the appropriate partitions for the particular set of items at hand and the probabilities or weights that should be allocated to each partition. A consequence of these properties is that PCT accounts for experimental data that have previously motivated hierarchical models of VSTM, thereby providing an appealing alternative to hierarchical models with prespecified, fixed structures. We explore both an exact implementation of PCT based on Dirichlet process mixture models and approximate implementations based on Bayesian finite mixture models. We show that a previously proposed 2-level hierarchical model can be seen as a special case of PCT with a single cluster. We show how a wide range of previously reported results on the organization of VSTM can be understood in terms of PCT. In particular, we find that, consistent with empirical evidence, PCT predicts biases in estimates of the feature values of individual items and also predicts a novel form of dependence between estimates of the feature values of different items. We qualitatively confirm this last prediction in 3 novel experiments designed to directly measure biases and dependencies in subjects' estimates.
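
The Dirichlet process version of PCT induces a prior over partitions that is equivalent to the Chinese restaurant process. The sketch below draws one partition of a set of display items from that prior; it is a minimal illustration of the representational idea, not the paper's full inference procedure.

```python
import numpy as np

def sample_partition_crp(n_items, alpha=1.0, rng=None):
    """Draw one partition of n_items from a Chinese restaurant process prior."""
    rng = rng or np.random.default_rng()
    assignments = [0]                       # the first item starts the first cluster
    for _ in range(1, n_items):
        counts = np.bincount(assignments)
        # Probability of joining an existing cluster is proportional to its size;
        # probability of starting a new cluster is proportional to alpha.
        probs = np.append(counts, alpha).astype(float)
        probs /= probs.sum()
        assignments.append(int(rng.choice(len(probs), p=probs)))
    return assignments

print(sample_partition_crp(6, alpha=1.0))   # e.g., [0, 0, 1, 0, 2, 1]
```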


Sims, C. R., Neth, H., Jacobs, R. A., & Gray, W. D. (2013). Melioration as rational choice: Sequential decision making in uncertain environments. Psychological Review, 120, 139-154.

Melioration, defined as choosing a lesser, local gain over a greater longer-term gain, is a behavioral tendency that people and pigeons share. As such, the empirical occurrence of meliorating behavior has frequently been interpreted as evidence that the mechanisms of human choice violate the norms of economic rationality. In some environments, the relationship between actions and outcomes is known. In this case, the rationality of choice behavior can be evaluated in terms of how successfully it maximizes utility given knowledge of the environmental contingencies. In most complex environments, however, the relationship between actions and future outcomes is uncertain and must be learned from experience. When the difficulty of this learning challenge is taken into account, it is not evident that melioration represents suboptimal choice behavior. In the present article, we examine human performance in a sequential decision-making experiment that is known to induce meliorating behavior. In keeping with previous results using this paradigm, we find that the majority of participants in the experiment fail to adopt the optimal decision strategy and instead demonstrate a significant bias toward melioration. To explore the origins of this behavior, we develop a rational analysis (Anderson, 1990) of the learning problem facing individuals in uncertain decision environments. Our analysis demonstrates that an unbiased learner would adopt melioration as the optimal response strategy for maximizing long-term gain. We suggest that many documented cases of melioration can be reinterpreted not as irrational choice but rather as globally optimal choice under uncertainty.


Yildirim, I. & Jacobs, R. A. (2013). Transfer of object category knowledge across visual and haptic modalities: Experimental and computational studies. Cognition, 126, 135-148.

We study people's abilities to transfer object category knowledge across visual and haptic domains. If a person learns to categorize objects based on inputs from one sensory modality, can the person categorize these same objects when the objects are perceived through another modality? Can the person categorize novel objects from the same categories when these objects are, again, perceived through another modality? Our work makes three contributions. First, by fabricating Fribbles (3-D, multi-part objects with a categorical structure), we developed visual-haptic stimuli that are highly complex and realistic, and thus more ecologically valid than objects that are typically used in haptic or visual-haptic experiments. Based on these stimuli, we developed the See and Grasp data set, a data set containing both visual and haptic features of the Fribbles, and are making this data set freely available on the world wide web. Second, complementary to previous research such as studies asking if people transfer knowledge of object identity across visual and haptic domains, we conducted an experiment evaluating whether people transfer object category knowledge across these domains. Our data clearly indicate that we do. Third, we developed a computational model that learns multisensory representations of prototypical 3-D shape. Similar to previous work, the model uses shape primitives to represent parts, and spatial relations among primitives to represent multi-part objects. However, it is distinct in its use of a Bayesian inference algorithm allowing it to acquire multisensory representations, and sensory-specific forward models allowing it to predict visual or haptic features from multisensory representations. The model provides an excellent qualitative account of our experimental data, thereby illustrating the potential importance of multisensory representations and sensory-specific forward models to multisensory perception.


2012

Evans, K. M., Jacobs, R. A., Tarduno, J. A., & Pelz, J. B. (2012). Collecting and analyzing eye-tracking data in outdoor environments. Journal of Eye Movement Research, 5(2):6, 1-19.

Natural outdoor conditions pose unique obstacles for researchers, above and beyond those inherent to all mobile eye-tracking research. During analyses of a large set of eye-tracking data collected on geologists examining outdoor scenes, we have found that the nature of calibration, pupil identification, fixation detection, and gaze analysis all require procedures different from those typically used for indoor studies. Here, we discuss each of these challenges and present solutions, which together define a general method useful for investigations relying on outdoor eye-tracking data. We also discuss recommendations for improving the tools that are available, to further increase the accuracy and utility of outdoor eye-tracking data.


Sims, C. R., Jacobs, R. A., & Knill, D. C. (2012). An ideal observer analysis of visual working memory. Psychological Review, 119, 807-830.

Limits in visual working memory (VWM) strongly constrain human performance across many tasks. However, the nature of these limits is not well understood. In this article we develop an ideal observer analysis of human VWM by deriving the expected behavior of an optimally performing but limited-capacity memory system. This analysis is framed around rate-distortion theory, a branch of information theory that provides optimal bounds on the accuracy of information transmission subject to a fixed information capacity. The result of the ideal observer analysis is a theoretical framework that provides a task-independent and quantitative definition of visual memory capacity and yields novel predictions regarding human performance. These predictions are subsequently evaluated and confirmed in 2 empirical studies. Further, the framework is general enough to allow the specification and testing of alternative models of visual memory (e.g., how capacity is distributed across multiple items). We demonstrate that a simple model developed on the basis of the ideal observer analysis---one that allows variability in the number of stored memory representations but does not assume the presence of a fixed item limit---provides an excellent account of the empirical data and further offers a principled reinterpretation of existing models of VWM.
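
A worked example of the rate-distortion framing: for a Gaussian feature with variance sigma^2 encoded at R bits, the minimum achievable mean squared error is sigma^2 * 2^(-2R). If a fixed capacity is split evenly across the items in a display (an assumption made here purely for illustration), the per-item error floor necessarily rises with set size.

```python
import numpy as np

def min_mse_gaussian(sigma2, rate_bits):
    """Rate-distortion bound for a Gaussian source under squared-error distortion."""
    return sigma2 * 2.0 ** (-2.0 * rate_bits)

# Assumed values: total capacity C bits, stimulus variance sigma^2.
C, sigma2 = 6.0, 1.0
for n_items in [1, 2, 4, 8]:
    mse = min_mse_gaussian(sigma2, C / n_items)
    print(f"set size {n_items}: minimum per-item MSE = {mse:.3f}")
```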


Yildirim, I. & Jacobs, R. A. (2012). A rational analysis of the acquisition of multisensory representations. Cognitive Science, 36, 305-332.

How do people learn multisensory, or amodal, representations, and what consequences do these representations have for perceptual performance? We address this question by performing a rational analysis of the problem of learning multisensory representations. This analysis makes use of a Bayesian nonparametric model that acquires latent multisensory features that optimally explain the unisensory features arising in individual sensory modalities. The model qualitatively accounts for several important aspects of multisensory perception: (a) it integrates information from multiple sensory sources in such a way that it leads to superior performances in, for example, categorization tasks; (b) its performances suggest that multisensory training leads to better learning than unisensory training, even when testing is conducted in unisensory conditions; (c) its multisensory representations are modality invariant; and (d) it predicts "missing" sensory representations in modalities when the input to those modalities is absent. Our rational analysis indicates that all of these aspects emerge as part of the optimal solution to the problem of learning to represent complex multisensory environments.


2011

Yakushijin, R. & Jacobs, R. A. (2011). Are people successful at learning sequences of actions on a perceptual matching task? Cognitive Science, 35, 939-962.

We report the results of an experiment in which human subjects were trained to perform a perceptual matching task. Subjects were asked to manipulate comparison objects until they matched target objects using the fewest manipulations possible. An unusual feature of the experimental task is that efficient performance requires an understanding of the hidden or latent causal structure governing the relationships between actions and perceptual outcomes. We use two benchmarks to evaluate the quality of subjects' learning. One benchmark is based on optimal performance as calculated by a dynamic programming procedure. The other is based on an adaptive computational agent that uses a reinforcement-learning method known as Q-learning to learn to perform the task. Our analyses suggest that subjects were successful learners. In particular, they learned to perform the perceptual matching task in a near-optimal manner (i.e., using a small number of manipulations) at the end of training. Subjects were able to achieve near-optimal performance because they learned, at least partially, the causal structure underlying the task. In addition, subjects' performances were broadly consistent with those of model-based reinforcement-learning agents that built and used internal models of how their actions influenced the external environment. We hypothesize that people will achieve near-optimal performances on tasks requiring sequences of action—especially sensorimotor tasks with underlying latent causal structures—when they can detect the effects of their actions on the environment, and when they can represent and reason about these effects using an internal mental model.
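
The Q-learning benchmark mentioned in the abstract is the standard tabular algorithm sketched below. The environment is left as a hypothetical `step` function; the perceptual matching task itself is not reproduced here, and all hyperparameter values are illustrative.

```python
import numpy as np

def q_learning(n_states, n_actions, step, n_episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning on an episodic task.

    `step(state, action)` must return (next_state, reward, done); it stands in
    for the environment (here, the perceptual matching task would supply it).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
            s_next, r, done = step(s, a)
            # Standard Q-learning update toward the one-step bootstrapped target.
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
            s = s_next
    return Q
```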


Sims, C. R., Jacobs, R. A., & Knill, D. C. (2011). Adaptive allocation of vision under competing task demands. Journal of Neuroscience, 31, 928-943.

Human behavior in natural tasks consists of an intricately coordinated dance of cognitive, perceptual, and motor activities. Although much research has progressed in understanding the nature of cognitive, perceptual, or motor processing in isolation or in highly constrained settings, few studies have sought to examine how these systems are coordinated in the context of executing complex behavior. Previous research has suggested that, in the course of visually guided reaching movements, the eye and hand are yoked, or linked in a nonadaptive manner. In this work, we report an experiment that manipulated the demands that a task placed on the motor and visual systems, and then examined in detail the resulting changes in visuomotor coordination. We develop an ideal actor model that predicts the optimal coordination of vision and motor control in our task. On the basis of the predictions of our model, we demonstrate that human performance in our experiment reflects an adaptive response to the varying costs imposed by our experimental manipulations. Our results stand in contrast to previous theories that have assumed a fixed control mechanism for coordinating vision and motor control in reaching behavior.


Jacobs, R. A. & Kruschke, J. K. (2011). Bayesian learning theory applied to human cognition. Wiley Interdisciplinary Reviews: Cognitive Science, 2, 8-21.

Probabilistic models based on Bayes’ rule are an increasingly popular approach to understanding human cognition. Bayesian models allow immense representational latitude and complexity. Because they use normative Bayesian mathematics to process those representations, they define optimal performance on a given task. This article focuses on key mechanisms of Bayesian information processing, and provides numerous examples illustrating Bayesian approaches to the study of human cognition. We start by providing an overview of Bayesian modeling and Bayesian networks. We then describe three types of information processing operations (inference, parameter learning, and structure learning) in both Bayesian networks and human cognition. This is followed by a discussion of the important roles of prior knowledge and of active learning. We conclude by outlining some challenges for Bayesian models of human cognition that will need to be addressed by future research.
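
As a minimal example of the "inference" operation, the snippet below applies Bayes' rule in a two-node network (a binary cause generating binary evidence). All probabilities are invented for illustration.

```python
# Tiny Bayesian-network inference example: P(cause | evidence) by Bayes' rule.
p_cause = 0.2                      # prior P(cause = true)
p_evidence_given_cause = 0.9       # likelihood P(evidence | cause)
p_evidence_given_not_cause = 0.3   # likelihood P(evidence | no cause)

p_evidence = (p_evidence_given_cause * p_cause
              + p_evidence_given_not_cause * (1 - p_cause))
posterior = p_evidence_given_cause * p_cause / p_evidence
print(f"P(cause | evidence) = {posterior:.3f}")   # 0.429
```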


2010

Orhan, A. E., Michel, M. M., and Jacobs, R. A. (2010). Visual learning with reliable and unreliable features. Journal of Vision, 10(2):2, 1-15.

Existing studies of sensory integration demonstrate how the reliabilities of perceptual cues or features influence perceptual decisions. However, these studies tell us little about the influence of feature reliability on visual learning. In this article, we study the implications of feature reliability for perceptual learning in the context of binary classification tasks. We find that finite sets of training data (i.e., the stimuli and corresponding class labels used on training trials) contain different information about a learner’s parameters associated with reliable versus unreliable features. In particular, the statistical information provided by a finite number of training trials strongly constrains the set of possible parameter values associated with unreliable features, but only weakly constrains the parameter values associated with reliable features. Analyses of human subjects’ performances reveal that subjects were sensitive to this statistical information. Additional analyses examine why subjects were sub-optimal visual learners.


Jacobs, R. A. and Shams, L. (2010). Visual learning in multisensory environments. Topics in Cognitive Science, 2, 217-225.

We study the claim that multisensory environments are useful for visual learning because non-visual percepts can be processed to produce error signals that people can use to adapt their visual systems. This hypothesis is motivated by a Bayesian network framework. The framework is useful because it ties together three observations that have appeared in the literature: (a) signals from non-visual modalities can teach the visual system; (b) signals from nonvisual modalities can facilitate learning in the visual system; and (c) visual signals can become associated with (or be predicted by) signals from nonvisual modalities. Experimental data consistent with each of these observations are reviewed.


2009

Jacobs, R. A. (2009). Adaptive precision pooling of model neuron activities predicts the efficiency of human visual learning. Journal of Vision, 9(4):22, 1-15.

When performing a perceptual task, precision pooling occurs when an organism's decisions are based on the activities of a small set of highly informative neurons. The Adaptive Precision Pooling Hypothesis links perceptual learning and decision making by stating that improvements in performance occur when an organism starts to base its decisions on the responses of neurons that are more informative for a task than the responses that the organism had previously used. We trained human subjects on a visual slant discrimination task and found their performances to be suboptimal relative to an ideal probabilistic observer. Why were subjects suboptimal learners? Our computer simulation results suggest a possible explanation, namely that there are few neurons providing highly reliable information for the perceptual task, and that learning involves searching for these rare, informative neurons during the course of training. This explanation can account for several characteristics of human visual learning, including the fact that people often show large differences in their learning performances with some individuals showing no performance improvements, other individuals showing gradual improvements during the course of training, and still others showing abrupt improvements. The approach described here potentially provides a unifying framework for several theories of perceptual learning including theories stating that learning is due to adaptations of the weightings of read-out connections from early visual representations, external noise filtering or internal noise reduction, increases in the efficiency with which learners encode task-relevant information, and attentional selection of specific neural populations which should undergo adaptation.


2008

Chhabra, M. and Jacobs, R. A. (2008) Learning to combine motor primitives via greedy additive regression. Journal of Machine Learning Research, 9, 1535-1558.

The computational complexities arising in motor control can be ameliorated through the use of a library of motor synergies. We present a new model, referred to as the Greedy Additive Regression (GAR) model, for learning a library of torque sequences, and for learning the coefficients of a linear combination of sequences minimizing a cost function. From the perspective of numerical optimization, the GAR model is interesting because it creates a library of local features (each sequence in the library is a solution to a single training task) and learns to combine these sequences using a local optimization procedure, namely, additive regression. We speculate that learners with local representational primitives and local optimization procedures will show good performance on nonlinear tasks. The GAR model is also interesting from the perspective of motor control because it outperforms several competing models. Results using a simulated two-joint arm suggest that the GAR model consistently shows excellent performance in the sense that it rapidly learns to perform novel, complex motor tasks. Moreover, its library is overcomplete and sparse, meaning that only a small fraction of the stored torque sequences are used when learning a new movement. The library is also robust in the sense that, after an initial training period, nearly all novel movements can be learned as additive combinations of sequences in the library, and in the sense that it shows good generalization when an arm's dynamics are altered between training and test conditions, such as when a payload is added to the arm. Lastly, the GAR model works well regardless of whether motor tasks are specified in joint space or Cartesian space. We conclude that learning techniques using local primitives and optimization procedures are viable and potentially important methods for motor control and possibly other domains, and that these techniques deserve further examination by the artificial intelligence and cognitive science communities.
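
The additive-regression idea resembles matching pursuit: repeatedly add the library sequence, with a fitted scalar coefficient, that most reduces the remaining error. The sketch below implements that simplified reading for a squared-error cost; it is not the published GAR implementation, which works with a general task cost function rather than plain squared error.

```python
import numpy as np

def greedy_additive_fit(library, target, n_rounds=10):
    """Approximate `target` as a sum of scaled library sequences, added greedily.

    library : (k, T) array of stored torque sequences
    target  : (T,) torque sequence for the new task
    """
    residual = target.copy()
    coefs = np.zeros(len(library))
    for _ in range(n_rounds):
        # Best scalar coefficient for each sequence, and the resulting error.
        proj = library @ residual / np.sum(library ** 2, axis=1)
        errors = [np.sum((residual - c * seq) ** 2) for c, seq in zip(proj, library)]
        best = int(np.argmin(errors))
        coefs[best] += proj[best]
        residual = residual - proj[best] * library[best]
    return coefs, residual
```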


Clayards, M., Tanenhaus, M. K., Aslin, R. N., and Jacobs, R. A. (2008) Perception of speech reflects optimal use of probabilistic speech cues. Cognition, 108, 804-809.

Listeners are exquisitely sensitive to fine-grained acoustic detail within phonetic categories for sounds and words. Here we show that this sensitivity is optimal given the probabilistic nature of speech cues. We manipulated the probability distribution of one probabilistic cue, voice onset time (VOT), which differentiates word initial labial stops in English (e.g., beach and peach). Participants categorized words from distributions of VOT with wide or narrow variances. Uncertainty about word identity was measured by four-alternative forced-choice judgments and by the probability of looks to pictures. Both measures closely reflected the posterior probability of the word given the likelihood distributions of VOT, suggesting that listeners are sensitive to these distributions.
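
The optimal-listener computation has a simple form: the posterior probability of each word given a voice onset time, with category likelihoods given by the VOT distributions. The sketch below uses Gaussian likelihoods with placeholder means and variances, not the experiment's actual distributions; widening the variances flattens the posterior near the category boundary, which is the graded uncertainty the choice and eye-movement measures tracked.

```python
from scipy.stats import norm

def posterior_b(vot_ms, mean_b=0.0, mean_p=50.0, sd=15.0, prior_b=0.5):
    """P(word starts with /b/ | observed VOT), assuming Gaussian VOT distributions."""
    like_b = norm.pdf(vot_ms, loc=mean_b, scale=sd)
    like_p = norm.pdf(vot_ms, loc=mean_p, scale=sd)
    return prior_b * like_b / (prior_b * like_b + (1 - prior_b) * like_p)

# The wide-variance condition yields a shallower posterior at the same VOT.
print(posterior_b(20.0), posterior_b(20.0, sd=30.0))
```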


Michel, M. M. and Jacobs, R. A. (2008) Learning optimal integration of arbitrary features in a perceptual discrimination task. Journal of Vision, 8(2):3, 1-16.

A number of studies have demonstrated that people often integrate information from multiple perceptual cues in a statistically optimal manner when judging properties of surfaces in a scene. For example, subjects typically weight the information based on each cue to a degree that is inversely proportional to the variance of the distribution of a scene property given a cue's value. We wanted to determine whether subjects similarly use information about the reliabilities of arbitrary low-level visual features when making image-based discriminations, as in visual texture discrimination. To investigate this question, we developed a modification of the classification image technique and conducted two experiments that explored subjects' discrimination strategies using this improved technique. We created a basis set consisting of 20 low-level features and generated stimuli by linearly combining the basis vectors. Subjects were trained to discriminate between two prototype signals corrupted with Gaussian feature noise. When we analyzed subjects' classification images over time, we found that they modified their decision strategies in a manner consistent with optimal feature integration, giving greater weight to reliable features and less weight to unreliable features. We conclude that optimal integration is not a characteristic specific to conventional visual cues or to judgments involving three-dimensional scene properties. Rather, just as researchers have previously demonstrated that people are sensitive to the reliabilities of conventionally defined cues when judging the depth or slant of a surface, we demonstrate that they are likewise sensitive to the reliabilities of arbitrary low-level features when making image-based discriminations.
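
The optimality claim can be illustrated with a small sketch (made-up prototypes and noise levels, not the classification-image procedure itself): when features are corrupted by independent Gaussian noise, the ideal linear template weights each feature by its prototype difference divided by its noise variance, so unreliable features receive less weight.

    # Minimal sketch of the optimal-weighting claim: with two prototypes corrupted by
    # independent Gaussian noise of per-feature variance sigma_i^2, the ideal linear
    # template weights feature i by (a_i - b_i) / sigma_i^2.
    import numpy as np

    rng = np.random.default_rng(2)
    n_features = 20
    proto_a = rng.normal(size=n_features)
    proto_b = rng.normal(size=n_features)
    sigma = np.linspace(0.5, 3.0, n_features)           # feature reliabilities differ

    ideal_w = (proto_a - proto_b) / sigma**2             # reliability-weighted template
    equal_w = (proto_a - proto_b)                        # ignores reliabilities

    def accuracy(w, n_trials=20000):
        labels = rng.choice([0, 1], size=n_trials)
        protos = np.where(labels[:, None] == 0, proto_a, proto_b)
        stimuli = protos + rng.normal(scale=sigma, size=(n_trials, n_features))
        decide_a = stimuli @ w > (proto_a + proto_b) @ w / 2
        return np.mean(decide_a == (labels == 0))

    print("ideal weights :", accuracy(ideal_w))
    print("equal weights :", accuracy(equal_w))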

Download PDF file | Close Window

2007

Chhabra, M., Jacobs, R. A., and Stefanovic, D. (2007) Behavioral shaping for geometric concepts. Journal of Machine Learning Research, 8, 1835-1865

In a search problem, an agent uses the membership oracle of a target concept to find a positive example of the concept. In a shaped search problem the agent is aided by a sequence of increasingly restrictive concepts leading to the target concept (analogous to behavioral shaping). The concepts are given by membership oracles, and the agent has to find a positive example of the target concept while minimizing the total number of oracle queries. We show that for the concept class of intervals on the real line an agent using a bounded number of queries per oracle exists. In contrast, for the concept class of unions of two intervals on the real line no agent with a bounded number of queries per oracle exists. We then investigate the (amortized) number of queries per oracle needed for the shaped search problem over other concept classes. We explore the following methods to obtain efficient agents. For axis-parallel rectangles we use a bootstrapping technique to obtain gradually better approximations of the target concept. We show that, given rectangles R ⊆ A in d-dimensional real space, one can obtain a rectangle A' ⊇ R with vol(A')/vol(R) ≤ 2 using only O(d · vol(A)/vol(R)) random samples from A. For ellipsoids of bounded eccentricity in d-dimensional real space we analyze a deterministic ray-shooting process which uses a sequence of rays to get close to the centroid. Finally, we use algorithms for generating random points in convex bodies (Dyer et al., 1991; Kannan et al., 1997) to give a randomized agent for the concept class of convex bodies. In the final section, we explore connections between our bootstrapping method and active learning. Specifically, we use the bootstrapping technique for axis-parallel rectangles to actively learn axis-parallel rectangles under the uniform distribution using O(d ln(1/ε)) labeled samples.

Download PDF file | Close Window

Ivanchenko, V. and Jacobs, R. A. (2007) Visual learning by cue-dependent and cue-invariant mechanisms. Vision Research, 47, 145-156

We examined learning at multiple levels of the visual system. Subjects were trained and tested on a same/different slant judgment task or a same/different curvature judgment task using simulated planar surfaces or curved surfaces defined by either stereo or monocular (texture and motion) cues. Taken as a whole, the results of four experiments are consistent with the hypothesis that learning takes place at both cue-dependent and cue-invariant levels, and that learning at these levels can have different generalization properties. If so, then cue-invariant mechanisms may mediate the transfer of learning from familiar cue conditions to novel cue conditions, thereby allowing perceptual learning to be robust and efficient. We claim that learning takes place at multiple levels of the visual system, and that a comprehensive understanding of visual perception requires a good understanding of learning at each of these levels.

Download PDF file | Close Window

Michel, M. M. and Jacobs, R. A. (2007) Parameter learning but not structure learning: A Bayesian network model of constraints on early perceptual learning. Journal of Vision, 7(1):4, 1-18

Visual scientists have shown that people are capable of perceptual learning in a large variety of circumstances. Are there constraints on such learning? We propose a new constraint on early perceptual learning, namely, that people are capable of parameter learning (they can modify their knowledge of the prior probabilities of scene variables or of the statistical relationships among scene and perceptual variables that are already considered to be potentially dependent) but they are not capable of structure learning: they cannot learn new relationships among variables that are not considered to be potentially dependent, even when placed in novel environments in which these variables are strongly related. These ideas are formalized using the notation of Bayesian networks. We report the results of five experiments that evaluate whether subjects can demonstrate cue acquisition, which means that they can learn that a sensory signal is a cue to a perceptual judgment. In Experiment 1, subjects were placed in a novel environment that resembled natural environments in the sense that it contained systematic relationships among scene and perceptual variables that are normally dependent. In this case, cue acquisition requires parameter learning and, as predicted, subjects succeeded in learning a new cue. In Experiments 2–5, subjects were placed in novel environments that did not resemble natural environments: they contained systematic relationships among scene and perceptual variables that are not normally dependent. Cue acquisition requires structure learning in these cases. Consistent with our hypothesis, subjects failed to learn new cues in Experiments 2–5. Overall, the results suggest that the mechanisms of early perceptual learning are biased such that people can only learn new contingencies between scene and sensory variables that are considered to be potentially dependent.

Download PDF file | Close Window

2006

Michel, M.M. and Jacobs, R.A. (2006) The costs of ignoring high-order correlations in populations of model neurons. Neural Computation, 18, 660-682

Investigators debate the extent to which neural populations use pairwise and higher-order statistical dependencies among neural responses to represent information about a visual stimulus. To study this issue, three statistical decoders were used to extract the information in the responses of model neurons about the binocular disparities present in simulated pairs of left-eye and right-eye images: (1) the full joint probability decoder considered all possible statistical relations among neural responses as potentially important; (2) the dependence tree decoder also considered all possible relations as potentially important, but it approximated high-order statistical correlations using a computationally tractable procedure; and (3) the independent response decoder, which assumed that neural responses are statistically independent, meaning that all correlations should be zero and thus can be ignored. Simulation results indicate that high-order correlations among model neuron responses contain significant information about binocular disparities and that the amount of this high-order information increases rapidly as a function of neural population size. Furthermore, the results highlight the potential importance of the dependence tree decoder to neuroscientists as a powerful but still practical way of approximating high-order correlations among neural responses.
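
A toy example (much simpler than the simulations in the paper) shows why ignoring correlations can discard information: two binary model neurons with identical marginal firing probabilities under both stimuli, but correlated under one stimulus and anti-correlated under the other, are perfectly decodable by a joint decoder and undecodable by an independent-response decoder.

    # Toy illustration (not the paper's simulation): identical marginals, different
    # correlation structure under the two stimuli. The full-joint decoder is perfect;
    # the independent-response decoder is at chance.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 10000

    def sample(stim):
        r1 = rng.integers(0, 2, size=n)
        r2 = r1 if stim == "A" else 1 - r1            # correlated vs anti-correlated
        return np.stack([r1, r2], axis=1)

    # Decoders expressed as log-likelihoods of the responses under each stimulus.
    def full_joint_loglik(resp, stim):
        match = (resp[:, 0] == resp[:, 1])
        p = np.where(match, 0.5, 1e-9) if stim == "A" else np.where(match, 1e-9, 0.5)
        return np.log(p)

    def independent_loglik(resp, stim):
        return np.full(len(resp), np.log(0.25))       # marginals are 0.5 for both neurons

    for decoder in (full_joint_loglik, independent_loglik):
        correct = 0
        for stim in ("A", "B"):
            resp = sample(stim)
            choose_A = decoder(resp, "A") > decoder(resp, "B")
            correct += np.sum(choose_A == (stim == "A"))
        print(decoder.__name__, "accuracy:", correct / (2 * n))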

Download PDF file | Close Window

Chhabra, M. and Jacobs, R.A. (2006) Near-optimal human adaptive control across different noise environments. The Journal of Neuroscience, 26, 10883-10887

A person learning to control a complex system needs to learn about both the dynamics and the noise of the system. We evaluated human subjects' abilities to learn to control a stochastic dynamic system under different noise conditions. These conditions were created by corrupting the forces applied to the system with noise whose magnitudes were either proportional or inversely proportional to the sizes of subjects' control signals. We also used dynamic programming to calculate the mathematically optimal control laws of an "ideal actor" for each noise condition. The results suggest that people learned control strategies tailored to the specific noise characteristics of their training conditions. In particular, as predicted by the ideal actors, they learned to use smaller control signals when forces were corrupted by proportional noise and to use larger signals when forces were corrupted by inversely proportional noise, thereby achieving levels of performance near the information-theoretic upper bounds. We conclude that subjects learned to behave in a near-optimal manner, meaning that they learned to efficiently use all available information to plan and execute control policies that maximized performances on their tasks.
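
A one-step toy analogue (far simpler than the dynamic-programming analysis in the paper) captures the qualitative prediction: when noise grows with the control signal, the optimal signal is smaller than the target demands; when noise shrinks with the control signal, the optimal signal is larger.

    # Toy one-step analogue (assumed): drive an outcome toward a target of 1 with a
    # single control signal u, under signal-dependent noise of two kinds.
    import numpy as np

    u = np.linspace(0.2, 3.0, 2000)
    k, c = 0.6, 0.6      # hypothetical noise scales

    # Expected squared error E[(outcome - 1)^2] in each noise regime.
    cost_proportional = (u - 1.0) ** 2 + (k * u) ** 2          # noise std = k * |u|
    cost_inverse      = (u - 1.0) ** 2 + (c / u) ** 2          # noise std = c / |u|

    print("optimal u, proportional noise:", u[np.argmin(cost_proportional)])   # < 1
    print("optimal u, inverse noise:     ", u[np.argmin(cost_inverse)])        # > 1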

Download PDF file | Close Window

Chhabra, M. and Jacobs, R.A. (2006) Properties of synergies arising from a theory of optimal motor behavior. Neural Computation, 18, 2320-2342

We consider the properties of motor components, also known as synergies, arising from a computational theory (in the sense of Marr, 1982) of optimal motor behavior. An actor's goals were formalized as cost functions, and the optimal control signals minimizing the cost functions were calculated. Optimal synergies were derived from these optimal control signals using a variant of nonnegative matrix factorization. This was done using two different simulated two-joint arms—an arm controlled directly by torques applied at the joints and an arm in which forces were applied by muscles—and two types of motor tasks—reaching tasks and via-point tasks. Studies of the motor synergies reveal several interesting findings. First, optimal motor actions can be generated by summing a small number of scaled and time-shifted motor synergies, indicating that optimal movements can be planned in a low-dimensional space by using optimal motor synergies as motor primitives or building blocks. Second, some optimal synergies are task independent—they arise regardless of the task context—whereas other synergies are task dependent—they arise in the context of one task but not in the contexts of other tasks. Biological organisms use a combination of task-independent and task-dependent synergies. Our work suggests that this may be an efficient combination for generating optimal motor actions from motor primitives. Third, optimal motor actions can be rapidly acquired by learning new linear combinations of optimal motor synergies. This result provides further evidence that optimal motor synergies are useful motor primitives. Fourth, synergies with similar properties arise regardless of whether one uses an arm controlled by torques applied at the joints or an arm controlled by muscles, suggesting that synergies, when considered in "movement space," are more a reflection of task goals and constraints than of fine details of the underlying hardware.
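
The decomposition step can be sketched with standard multiplicative-update nonnegative matrix factorization on hypothetical nonnegative control signals (the paper uses a variant of NMF; this is only the generic algorithm): columns of W play the role of synergies and H holds the per-movement combination coefficients.

    # Sketch: Lee & Seung multiplicative-update NMF on hypothetical nonnegative data.
    import numpy as np

    rng = np.random.default_rng(4)
    n_movements, T, n_synergies = 40, 60, 4

    # Hypothetical nonnegative control signals built from 4 underlying components.
    true_W = np.abs(rng.normal(size=(T, n_synergies)))
    true_H = np.abs(rng.normal(size=(n_synergies, n_movements)))
    V = true_W @ true_H

    W = np.abs(rng.normal(size=(T, n_synergies)))
    H = np.abs(rng.normal(size=(n_synergies, n_movements)))
    eps = 1e-9
    for it in range(500):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)

    print("relative reconstruction error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))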

Download PDF file | Close Window

2004

Aslin, R.N., Battaglia, P.W., and Jacobs, R.A. (2004) Depth-dependent contrast gain-control. Vision Research, 44, 685-693

Contrast adaptation that was limited to a small region of the peripheral retina was induced as observers viewed a multiple depth-plane textured surface. The small region undergoing contrast adaptation was present only in one depth-plane to determine whether contrast gain-control is depth-dependent. After adaptation, observers performed a contrast-matching task in both the adapted and a non-adapted depth-plane to measure the magnitude and spatial specificity of contrast adaptation. Results indicated that contrast adaptation was depth-dependent under full-cue (disparity, linear perspective, texture gradient) conditions; there was a highly significant change in contrast gain in the depth-plane of adaptation and no significant gain change in the unadapted depth-plane. A second experiment showed that under some monocular viewing conditions a similar change in contrast gain was present in the adapted depth-plane despite the absence of disparity information for depth. Two control experiments with no-depth displays showed that contrast adaptation can also be texture- and location-dependent, but the magnitude of these effects was significantly smaller than the depth-dependent effect. These results demonstrate that mechanisms of contrast adaptation are conditioned by 3-D and 2-D viewing contexts.

Download PDF file | Close Window

Battaglia, P.W., Jacobs, R.A., and Aslin, R.N. (2004) Depth-dependent blur adaptation. Vision Research, 44, 113-117

Variations in blur are present in retinal images of scenes containing objects at multiple depth planes. Here we examine whether neural representations of image blur can be recalibrated as a function of depth. Participants were exposed to textured images whose blur changed with depth in a novel manner. For one group of participants, image blur increased as the images moved closer; for the other group, blur increased as the images moved away. A comparison of post-test versus pre-test performances on a blur-matching task at near and far test positions revealed that both groups of participants showed significant experience-dependent recalibration of the relationship between depth and blur. These results demonstrate that blur adaptation is conditioned by 3D viewing contexts.

Download PDF file | Close Window

2003

Atkins, J.E., Jacobs, R.A., and Knill, D.C. (2003) Experience-dependent visual cue recalibration based on discrepancies between visual and haptic percepts. Vision Research, 43, 2603-2613

We studied the hypothesis that observers can recalibrate their visual percepts when visual and haptic (touch) cues are discordant and the haptic information is judged to be reliable. Using a novel visuo-haptic virtual reality environment, we conducted a set of experiments in which subjects interacted with scenes consisting of two fronto-parallel surfaces. Subjects judged the distance between the two surfaces based on two perceptual cues: a visual stereo cue obtained when viewing the scene binocularly and a haptic cue obtained when subjects grasped the two surfaces between their thumb and index fingers. Visual and haptic cues regarding the scene were manipulated independently so that they could either be consistent or inconsistent. Experiment 1 explored the effect of visuo-haptic inconsistencies on depth-from-stereo estimates. Our findings suggest that when stereo and haptic cues are inconsistent, subjects recalibrate their interpretations of the visual stereo cue so that depth-from-stereo percepts are in greater agreement with depth-from-haptic percepts. In Experiment 2 the visuo-haptic discrepancy took a different form when the two surfaces were near the subject than when they were far from the subject. The results indicate that subjects recalibrated their interpretations of the stereo cue in a context-sensitive manner that depended on viewing distance, thereby making them more consistent with depth-from-haptic estimates at all viewing distances. Together these findings suggest that observers' visual and haptic percepts are tightly coupled in the sense that haptic percepts provide a standard to which visual percepts can be recalibrated when the visual percepts are deemed to be erroneous.

Download PDF file | Close Window

Battaglia, P.W., Jacobs, R.A, and Aslin, R.N. (2003) Bayesian integration of visual and auditory signals for spatial localization. Journal of the Optical Society of America A, 20, 1391-1397

Human observers localize events in the world by using sensory signals from multiple modalities. We evaluated two theories of spatial localization that predict how visual and auditory information are weighted when these signals specify different locations in space. According to one theory (visual capture), the signal that is typically most reliable dominates in a winner-take-all competition, whereas the other theory (maximum likelihood estimation) proposes that perceptual judgments are based on a weighted average of the sensory signals in proportion to each signal's relative reliability. Our results indicate that both theories are partially correct, in that relative signal reliability significantly altered judgments of spatial location, but these judgments were also characterized by an overall bias to rely on visual over auditory information. These results have important implications for the development of cue integration and for neural plasticity in the adult brain that enables humans to optimally integrate multimodal information.
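
The maximum-likelihood-estimation rule being tested is the standard reliability-weighted average; the following is a worked example with hypothetical numbers.

    # Reliability-weighted (maximum-likelihood) cue integration: each cue's weight is its
    # inverse variance, normalized across cues. Numbers below are illustrative only.
    import numpy as np

    def mle_combine(estimates, sigmas):
        sigmas = np.asarray(sigmas, dtype=float)
        w = 1.0 / sigmas**2
        w /= w.sum()
        combined = np.dot(w, estimates)
        combined_sigma = np.sqrt(1.0 / np.sum(1.0 / sigmas**2))
        return combined, combined_sigma, w

    # Visual estimate at 0 deg (sigma 1), auditory at 10 deg (sigma 3): the prediction
    # lands much closer to the more reliable visual signal.
    loc, sigma, weights = mle_combine([0.0, 10.0], [1.0, 3.0])
    print(f"combined location {loc:.2f} deg, sigma {sigma:.2f}, weights {weights}")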

Download PDF file | Close Window

Dominguez, M. and Jacobs, R.A. (2003) Developmental constraints aid the acquisition of binocular disparity sensitivities. Neural Computation, 15, 161-182

This article considers the hypothesis that systems learning aspects of visual perception may benefit from the use of suitably designed developmental progressions during training. We report the results of simulations in which four models were trained to detect binocular disparities in pairs of visual images. Three of the models were developmental models in the sense that the nature of their visual input changed during the course of training. These models received a relatively impoverished visual input early in training, and the quality of this input improved as training progressed. One model used a coarse-scale-to-multiscale developmental progression, another used a fine-scale-to-multiscale progression, and the third used a random progression. The final model was nondevelopmental in the sense that the nature of its input remained the same throughout the training period. The simulation results show that the two developmental models whose progressions were organized by spatial frequency content consistently outperformed the nondevelopmental and random developmental models. We speculate that the superior performance of these two models is due to two important features of their developmental progressions: (1) these models were exposed to visual inputs at a single scale early in training, and (2) the spatial scale of their inputs progressed in an orderly fashion from one scale to a neighboring scale during training. Simulation results consistent with these speculations are presented. We conclude that suitably designed developmental sequences can be useful to systems learning to detect binocular disparities. The idea that visual development can aid visual learning is a viable hypothesis in need of study.

Download PDF file | Close Window

Ivanchenko, V. and Jacobs, R.A. (2003) A developmental approach aids motor learning. Neural Computation, 15, 2051-2065

Bernstein (1967) suggested that people attempting to learn to perform a difficult motor task try to ameliorate the degrees-of-freedom problem through the use of a developmental progression. Early in training, people maintain a subset of their control parameters (e.g., joint positions) at constant settings and attempt to learn to perform the task by varying the values of the remaining parameters. With practice, people refine and improve this early-learned control strategy by also varying those parameters that were initially held constant. We evaluated Bernstein's proposed developmental progression using six neural network systems and found that a network whose training included developmental progressions of both its trajectory and its feedback gains outperformed all other systems. These progressions, however, yielded performance benefits only on motor tasks that were relatively difficult to learn. We conclude that development can indeed aid motor learning.

Download PDF file | Close Window

Jacobs, R.A. and Dominguez, M. (2003) Visual development and the acquisition of motion velocity sensitivities. Neural Computation, 15, 761-781

We consider the hypothesis that systems learning aspects of visual perception may benefit from the use of suitably designed developmental progressions during training. Four models were trained to estimate motion velocities in sequences of visual images. Three of the models were developmental models in the sense that the nature of their visual input changed during the course of training. These models received a relatively impoverished visual input early in training, and the quality of this input improved as training progressed. One model used a coarse-to-multiscale developmental progression (it received coarse-scale motion features early in training and finer-scale features were added to its input as training progressed), another model used a fine-to-multiscale progression, and the third model used a random progression. The final model was nondevelopmental in the sense that the nature of its input remained the same throughout the training period. The simulation results show that the coarse-to-multiscale model performed best. Hypotheses are offered to account for this model's superior performance, and simulation results evaluating these hypotheses are reported. We conclude that suitably designed developmental sequences can be useful to systems learning to estimate motion velocities. The idea that visual development can aid visual learning is a viable hypothesis in need of further study.

Download PDF file | Close Window

2002

Fine, I. and Jacobs, R.A. (2002) Comparing perceptual learning across tasks: A review. Journal of Vision, 2, 190-203

We compared perceptual learning in 16 psychophysical studies, ranging from low-level spatial frequency and orientation discrimination tasks to high-level object and face-recognition tasks. All studies examined learning over at least four sessions and were carried out foveally or using free fixation. Comparison of learning effects across this wide range of tasks demonstrates that the amount of learning varies widely between different tasks. A variety of factors seems to affect learning, including the number of perceptual dimensions relevant to the task, external noise, familiarity, and task complexity.

Download PDF file | Close Window

Jacobs, R.A. (2002) What determines visual cue reliability? Trends in Cognitive Sciences, 6, 345-350

Visual environments contain many cues to properties of an observed scene. To integrate information provided by multiple cues in an efficient manner, observers must assess the degree to which each cue provides reliable versus unreliable information. Two hypotheses are reviewed regarding how observers estimate cue reliabilities, namely that the estimated reliability of a cue is related to the ambiguity of the cue, and that people use correlations among cues to estimate cue reliabilities. Cue reliabilities are shown to be important both for cue combination and for aspects of visual learning.

Download PDF file | Close Window

Jacobs, R.A., Jiang, W., and Tanner, M.A. (2002) Factorial hidden Markov models and the generalized backfitting algorithm. Neural Computation, 14, 2415-2437

Previous researchers developed new learning architectures for sequential data by extending conventional hidden Markov models through the use of distributed state representations. Although exact inference and parameter estimation in these architectures is computationally intractable, Ghahramani and Jordan (1997) showed that approximate inference and parameter estimation in one such architecture, factorial hidden Markov models (FHMMs), is feasible in certain circumstances. However, the learning algorithm proposed by these investigators, based on variational techniques, is difficult to understand and implement and is limited to the study of real-valued data sets. This article proposes an alternative method for approximate inference and parameter estimation in FHMMs based on the perspective that FHMMs are a generalization of a well-known class of statistical models known as generalized additive models (GAMs; Hastie & Tibshirani, 1990). Using existing statistical techniques for GAMs as a guide, we have developed the generalized backfitting algorithm. This algorithm computes customized error signals for each hidden Markov chain of an FHMM and then trains each chain one at a time using conventional techniques from the hidden Markov models literature. Relative to previous perspectives on FHMMs, we believe that the viewpoint taken here has a number of advantages. First, it places FHMMs on firm statistical foundations by relating them to a class of models that are well studied in the statistics community, yet it generalizes this class of models in an interesting way. Second, it leads to an understanding of how FHMMs can be applied to many different types of time-series data, including Bernoulli and multinomial data, not just data that are real valued. Finally, it leads to an effective learning procedure for FHMMs that is easier to understand and easier to implement than existing learning procedures. Simulation results suggest that FHMMs trained with the generalized backfitting algorithm are a practical and powerful tool for analyzing sequential data.

Download PDF file | Close Window

Triesch, J., Ballard, D.H., and Jacobs, R.A. (2002) Fast temporal dynamics of visual cue integration. Perception, 31, 421-434

The integration of information from different sensors, cues, or modalities lies at the very heart of perception. We are studying adaptive phenomena in visual cue integration. To this end, we have designed a visual tracking task, where subjects track a target object among distractors and try to identify the target after an occlusion. Objects are defined by three different attributes (color, shape, size) which change randomly within a single trial. When the attributes differ in their reliability (two change frequently, one is stable), our results show that subjects dynamically adapt their processing. The results are consistent with the hypothesis that subjects rapidly re-weight the information provided by the different cues by emphasizing the information from the stable cue. This effect seems to be automatic, i.e. not requiring subjects' awareness of the differential reliabilities of the cues. The hypothesized re-weighting seems to take place in about 1 s. Our results suggest that cue integration can exhibit adaptive phenomena on a very fast time scale. We propose a probabilistic model with temporal dynamics that accounts for the observed effect.

Download PDF file | Close Window

2001

Atkins, J.E., Fiser, J., and Jacobs, R.A. (2001) Experience-dependent visual cue integration based on consistencies between visual and haptic percepts. Vision Research, 41, 449-461

We study the hypothesis that observers can use haptic percepts as a standard against which the relative reliabilities of visual cues can be judged, and that these reliabilities determine how observers combine depth information provided by these cues. Using a novel visuo-haptic virtual reality environment, subjects viewed and grasped virtual objects. In Experiment 1, subjects were trained under motion relevant conditions, during which haptic and visual motion cues were consistent whereas haptic and visual texture cues were uncorrelated, and texture relevant conditions, during which haptic and texture cues were consistent whereas haptic and motion cues were uncorrelated. Subjects relied more on the motion cue after motion relevant training than after texture relevant training, and more on the texture cue after texture relevant training than after motion relevant training. Experiment 2 studied whether or not subjects could adapt their visual cue combination strategies in a context-dependent manner based on context-dependent consistencies between haptic and visual cues. Subjects successfully learned two cue combination strategies in parallel, and correctly applied each strategy in its appropriate context. Experiment 3, which was similar to Experiment 1 except that it used a more naturalistic experimental task, yielded the same pattern of results as Experiment 1 indicating that the findings do not depend on the precise nature of the experimental task. Overall, the results suggest that observers can involuntarily compare visual and haptic percepts in order to evaluate the relative reliabilities of visual cues, and that these reliabilities determine how cues are combined during three-dimensional visual perception.

Download PDF file | Close Window

2000

Fine, I. and Jacobs, R.A. (2000) Perceptual learning for a pattern discrimination task. Vision Research, 40, 3209-3230

Our goal was to differentiate low and mid level perceptual learning. We used a complex grating discrimination task that required observers to combine information across wide ranges of spatial frequency and orientation. Stimuli were 'wicker'-like textures containing two orthogonal signal components of 3 and 9 c/deg. Observers discriminated a 15% spatial frequency shift in these components. Stimuli also contained four noise components, separated from the signal components by at least 45° of orientation or ~2 octaves in spatial frequency. In Experiment 1 naive observers were trained for eight sessions with a four-alternative same-different forced choice judgment with feedback. Observers showed significant learning: thresholds dropped to 1/3 of their original value. In Experiment 2 we found that observers showed far less learning when the noise components were not present. Experiment 3 found, unlike many other studies, almost complete transfer of learning across orientation. The results of Experiments 2 and 3 suggest that, unlike many other perceptual learning studies, most learning in Experiment 1 occurs at mid to high levels of processing rather than within low level analyzers tuned for spatial frequency and orientation. Experiment 4 found that performance was more severely impaired by spatial frequency shifts in noise components of the same spatial frequency or orientation as the signal components (though there was significant variability between observers). This suggests that after training observers based their responses on mechanisms tuned for selective regions of Fourier space. Experiment 5 examined transfer of learning from a same-sign task (the two signal components both increased/decreased in spatial frequency) to an opposite-sign task (signal components shifted in opposite directions in frequency space). Transfer of learning from same-sign to opposite-sign tasks and vice versa was complete, suggesting that observers combined information from the two signal components independently.

Download PDF file | Close Window

Meegan, D.V., Aslin, R.N., and Jacobs, R.A. (2000) Motor timing learned without motor training. Nature Neuroscience, 3, 860-862

(No abstract available.)

Download PDF file | Close Window

1999

Fine, I. and Jacobs, R.A. (1999) Modeling the combination of motion, stereo, and vergence angle cues to visual depth. Neural Computation, 11, 1297-1330

Three models of visual cue combination were simulated: a weak fusion model, a modified weak model, and a strong model. Their relative strengths and weaknesses are evaluated on the basis of their performances on the tasks of judging the depth and shape of an ellipse. The models differ in the amount of interaction that they permit among the cues of stereo, motion, and vergence angle. Results suggest that the constrained nonlinear interaction of the modified weak model allows better performance than either the linear interaction of the weak model or the unconstrained nonlinear interaction of the strong model. Further examination of the modified weak model revealed that its weighting of motion and stereo cues was dependent on the task, the viewing distance, and, to a lesser degree, the noise model. Although the dependencies were sensible from a computational viewpoint, they were sometimes inconsistent with psychophysical experimental data. In a second set of experiments, the modified weak model was given contradictory motion and stereo information. One cue was informative in the sense that it indicated an ellipse, while the other cue indicated a flat surface. The modified weak model rapidly reweighted its use of stereo and motion cues as a function of each cue's informativeness. Overall, the simulation results suggest that relative to the weak and strong models, the modified weak fusion model is a good candidate model of the combination of motion, stereo, and vergence angle cues, although the results also highlight areas in which this model needs modification or further elaboration.

Download PDF file | Close Window

Jacobs, R.A. (1999) Computational studies of the development of functionally specialized neural modules. Trends in Cognitive Sciences, 3, 31-38

This article reviews three hypotheses about the activity-dependent development of functionally specialized neural modules. These hypotheses state that: (i) a combination of structure-function correspondences plus the use of competition between neural modules leads to functional specializations; (ii) parcellation is due to a combination of neural selectionism, the idea that learning results from a stabilization of some neural connections and the elimination of others, and a locality constraint, which states that connections between nearby neurons are more easily stabilized than those between distant neurons; and (iii) a temporal and spatial modulation of plasticity can induce higher functional development in later-developing parts of the nervous system relative to earlier-developing parts. All three hypotheses have been implemented and evaluated in computational models. Due to limitations of current neuroscientific methodologies, computer simulation provides one of the only tools available for evaluating and refining our large-scale theories of the development of functionally specialized neural modules.

Download PDF file | Close Window

Jacobs, R.A. (1999) Optimal integration of texture and motion cues to depth. Vision Research, 39, 3621-3629

We report the results of a depth-matching experiment in which subjects were asked to adjust the height of an ellipse until it matched the depth of a simulated cylinder defined by texture and motion cues. On one-third of the trials the shape of the cylinder was primarily given by motion information, on one-third of the trials it was given by texture information, and on the remaining trials it was given by both sources of information. Two optimal cue combination models are described where optimality is defined in terms of Bayesian statistics. The parameter values of the models are set based on subjects' responses on trials when either the motion cue or the texture cue was informative. These models provide predictions of subjects' responses on trials when both cues were informative. The results indicate that one of the optimal models provides a good fit to the subjects' data, and the second model provides an exceptional fit. Because the predictions of the optimal models closely match the experimental data, we conclude that observers' cue combination strategies are indeed optimal, at least under the conditions studied here.

Download PDF file | Close Window

Jacobs, R.A. and Fine, I. (1999) Experience-dependent integration of texture and motion cues to depth. Vision Research, 39, 4062-4075

Previous investigators have shown that observers' visual cue combination strategies are remarkably flexible in the sense that these strategies adapt on the basis of the estimated reliabilities of the visual cues. However, these researchers have not addressed how observers acquire these estimated reliabilities. This article studies observers' abilities to learn cue combination strategies. Subjects made depth judgments about simulated cylinders whose shapes were indicated by motion and texture cues. Because the two cues could indicate different shapes, it was possible to design tasks in which one cue provided useful information for making depth judgments, whereas the other cue was irrelevant. The results of Experiment One suggest that observers' cue combination strategies are adaptable as a function of training; subjects adjusted their cue combination rules to use a cue more heavily when the cue was informative on a task versus when the cue was irrelevant. Experiment Two demonstrated that experience-dependent adaptation of cue combination rules is context-sensitive. On trials with presentations of short cylinders, one cue was informative, whereas on trials with presentations of tall cylinders, the other cue was informative. The results suggest that observers can learn multiple cue combination rules, and can learn to apply each rule in the appropriate context. Experiment Three demonstrated a possible limitation on the context-sensitivity of adaptation of cue combination rules. One cue was informative on trials with presentations of cylinders at a left oblique orientation, whereas the other cue was informative on trials with presentations of cylinders at a right oblique orientation. The results indicate that observers did not learn to use different cue combination rules in different contexts under these circumstances. These results are consistent with the hypothesis that observers' visual systems are biased to learn to perceive views of bilaterally symmetric objects that differ solely by a symmetry transformation in the same way. Taken in conjunction with the results of Experiment Two, this means that the visual learning mechanism underlying cue combination adaptation is biased such that some sets of statistics are more easily learned than others.

Download PDF file | Close Window

1997

Jacobs, R.A. (1997) Nature, nurture, and the development of functional specializations: A computational approach. Psychonomic Bulletin and Review, 4, 299-309

The roles assigned to nature and nurture in the acquisition of functional specializations have been modified in recent years due to increasing evidence that experience-dependent processes are more influential in determining a brain region's functional properties than was previously supposed. Consequently, one may study the developmental principles that play a role in the acquisition of functional specializations. This article studies the hypothesis that a combination of structure-function correspondences plus the use of competition between modules leads to functional specializations. This principle has been instantiated in a family of neural network architectures referred to as "mixtures-of-experts" architectures. These architectures are sensitive to structure-function relationships in the sense that they often learn to allocate to each task a network whose structure is well-matched to that task. The viewpoint advocated here represents a middle-ground between nativist and constructivist views of modularity.

Download PDF file | Close Window

Jacobs, R.A. (1997) Bias/Variance analyses of mixtures-of-experts architectures. Neural Computation, 9, 369-383

This article investigates the bias and variance of mixtures-of-experts (ME) architectures. The variance of an ME architecture can be expressed as the sum of two terms: the first term is related to the variances of the expert networks that comprise the architecture; the second term is related to the expert networks' covariances. One goal of this article is to study and quantify a number of properties of ME architectures via the metrics of bias and variance. A second goal is to clarify the relationships between this class of systems and other systems that have recently been proposed. It is shown that, in contrast to systems that produce unbiased experts whose estimation errors are uncorrelated, ME architectures produce biased experts whose estimates are negatively correlated.
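
The role of the covariance term can be seen in a small numeric illustration (toy numbers, not taken from the article): for an average of M expert estimates, the variance of the combined estimate is the sum of all entries of the experts' covariance matrix divided by M squared, so negative covariances lower it.

    # Var(mean of M estimates) = (sum of variances + sum of pairwise covariances) / M^2.
    # Toy numbers: negatively correlated experts yield a lower-variance combination.
    import numpy as np

    def ensemble_variance(cov_matrix):
        m = cov_matrix.shape[0]
        return cov_matrix.sum() / m**2

    var = 1.0
    uncorrelated = np.eye(3) * var
    negatively_correlated = np.full((3, 3), -0.3 * var)
    np.fill_diagonal(negatively_correlated, var)

    print("uncorrelated experts:         ", ensemble_variance(uncorrelated))
    print("negatively correlated experts:", ensemble_variance(negatively_correlated))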

Close Window

1996

Peng, F., Jacobs, R.A., and Tanner, M.A. (1996) Bayesian inference in mixtures-of-experts and hierarchical mixtures-of-experts models with an application to speech recognition. Journal of the American Statistical Association, 91, 953-960

Machine classification of acoustic waveforms as speech events is often difficult due to context-dependencies. A vowel recognition task with multiple speakers is studied in this paper using a class of modular and hierarchical systems referred to as mixtures-of-experts and hierarchical mixtures-of-experts models. The statistical model underlying the systems is a mixture model in which both the mixture coefficients and the mixture components are generalized linear models. A full Bayesian approach is used as a basis of inference and prediction. Computations are performed using Markov chain Monte Carlo methods. A key benefit of this approach is the ability to obtain a sample from the posterior distribution of any functional of the parameters of the given model. In this way, more information is obtained than provided by a point estimate. Also avoided is the need to rely on a normal approximation to the posterior as the basis of inference. This is particularly important in cases where the posterior is skewed or multimodal. Comparisons between a hierarchical mixtures-of-experts model and other pattern classification systems on the vowel recognition task are reported. The results indicate that this model showed good classification performance and provided the additional benefit of allowing the degree of certainty of its classification predictions to be assessed.

Download PDF file | Close Window

1994

Jacobs, R.A. and Kosslyn, S.M. (1994) Encoding shape and spatial relations: The role of receptive field size in coordinating complementary representations. Cognitive Science, 18, 361-386

An effective functional architecture facilitates interactions among subsystems that are often used together. Computer simulations showed that differences in receptive field sizes can promote such organization. When input was filtered through relatively small nonoverlapping receptive fields, artificial neural networks learned to categorize shapes relatively quickly; in contrast, when input was filtered through relatively large overlapping receptive fields, networks learned to encode specific shape exemplars or metric spatial relations relatively quickly. Moreover, when the receptive field sizes were allowed to adapt during learning, networks developed smaller receptive fields when they were trained to categorize shapes or spatial relations, and developed larger receptive fields when they were trained to encode specific exemplars or metric distances. In addition, when pairs of networks were constrained to use input from the same type of receptive fields, networks learned a task faster when they were paired with networks that were trained to perform a compatible type of task. Finally, using a novel modular architecture, networks were not pre-assigned a task, but rather competed to perform the different tasks. Networks with small nonoverlapping receptive fields tended to win the competition for categorical tasks whereas networks with large overlapping receptive fields tended to win the competition for exemplar/metric tasks.

Download PDF file | Close Window

Jordan, M.I. and Jacobs, R.A. (1994) Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6, 181-214

We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIMs). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
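
A stripped-down illustration of the E-step/M-step alternation (hypothetical data; input-independent mixing proportions instead of the article's input-dependent GLIM gating and hierarchy) fits a two-expert mixture of linear regressions.

    # Simplified EM sketch for a mixture of two linear regression "experts" with
    # constant mixing proportions. This only illustrates the E-step/M-step idea,
    # not the hierarchical architecture or gating network of the article.
    import numpy as np

    rng = np.random.default_rng(6)
    n = 400
    x = rng.uniform(-1.0, 1.0, size=n)
    which = rng.integers(0, 2, size=n)
    y = np.where(which == 0, 2 * x + 1, -x + 3) + 0.1 * rng.normal(size=n)

    X = np.stack([x, np.ones(n)], axis=1)            # design matrix [x, 1] per linear expert
    resp = np.zeros((n, 2))                          # heuristic init: split at the median of y
    resp[:, 0] = y < np.median(y)
    resp[:, 1] = 1 - resp[:, 0]
    coef = np.zeros((2, 2))
    mix = np.array([0.5, 0.5])
    sigma = 1.0

    for it in range(100):
        # M-step: weighted least squares per expert, then mixing proportions and noise scale.
        for k in range(2):
            w = resp[:, k]
            coef[k] = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        mix = resp.mean(axis=0)
        pred = X @ coef.T                            # (n, 2) per-expert predictions
        sigma = np.sqrt(np.sum(resp * (y[:, None] - pred) ** 2) / n)
        # E-step: posterior responsibility of each expert for each data point.
        lik = mix * np.exp(-0.5 * ((y[:, None] - pred) / sigma) ** 2) / sigma
        resp = lik / lik.sum(axis=1, keepdims=True)

    print("per-expert (slope, intercept):\n", np.round(coef, 2))
    print("mixing proportions:", np.round(mix, 2))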

Download PDF file | Close Window

1991

Jacobs, R.A., Jordan, M.I., and Barto, A.G. (1991) Task decomposition through competition in a modular connectionist architecture: The what and where vision tasks. Cognitive Science, 15, 219-250

A novel modular connectionist architecture is presented in which the networks composing the architecture compete to learn the training patterns. An outcome of the competition is that different networks learn different training patterns and, thus, learn to compute different functions. The architecture performs task decomposition in the sense that it learns to partition a task into two or more functionally independent tasks and allocates distinct networks to learn each task. In addition, the architecture tends to allocate to each task the network whose topology is most appropriate to that task. The architecture's performance on "what" and "where" vision tasks is presented and compared with the performance of two multilayer networks. Finally, it is noted that function decomposition is an underconstrained problem and, thus, different modular architectures may decompose a function in different ways. A desirable decomposition can be achieved if the architecture is suitably restricted in the types of functions that it can compute. Finding appropriate restrictions is possible through the application of domain knowledge. A strength of the modular architecture is that its structure is well suited for incorporating domain knowledge.

Download PDF file | Close Window

Jacobs, R.A., Jordan, M.I., Nowlan, S.J., and Hinton, G.E. (1991) Adaptive mixtures of local experts. Neural Computation, 3, 79-87

We present a new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases. The new procedure can be viewed either as a modular version of a multilayer supervised network, or as an associative version of competitive learning. It therefore provides a new link between these two apparently different approaches. We demonstrate that the learning procedure divides up a vowel discrimination task into appropriate subtasks, each of which can be solved by a very simple expert network.

Download PDF file | Close Window

1988

Jacobs, R.A. (1988) Increased rates of convergence through learning rate adaptation. Neural Networks, 1, 295-307

While there exist many techniques for finding the parameters that minimize an error function, only those methods that solely perform local computations are used in connectionist networks. The most popular learning algorithm for connectionist networks is the back-propagation procedure, which can be used to update the weights by the method of steepest descent. In this paper, we examine steepest descent and analyze why it can be slow to converge. We then propose four heuristics for achieving faster rates of convergence while adhering to the locality constraint. These heuristics suggest that every weight of a network should be given its own learning rate and that these rates should be allowed to vary over time. Additionally, the heuristics suggest how the learning rates should be adjusted. Two implementations of these heuristics, namely momentum and an algorithm called the delta-bar-delta rule, are studied and simulation results are presented.
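
The per-weight adaptation idea, as the delta-bar-delta rule is commonly stated (increase a weight's learning rate additively when the current gradient agrees in sign with an exponential average of past gradients, decrease it multiplicatively when they disagree), can be sketched as follows with illustrative hyperparameters.

    # Sketch of per-weight adaptive learning rates in the delta-bar-delta style, on an
    # ill-conditioned quadratic error surface. Hyperparameter values are illustrative.
    import numpy as np

    def gradient(w):
        # E(w) = 0.5 * (50*w0^2 + w1^2): very different curvature for the two weights.
        return np.array([50.0 * w[0], w[1]])

    w = np.array([1.0, 1.0])
    rates = np.full(2, 0.01)            # one learning rate per weight
    avg_grad = np.zeros(2)              # exponential average of past gradients ("delta bar")
    kappa, phi, theta = 0.005, 0.2, 0.7

    for step in range(200):
        g = gradient(w)
        agree = g * avg_grad            # sign agreement between current and averaged gradients
        rates = np.where(agree > 0, rates + kappa, rates)        # additive increase
        rates = np.where(agree < 0, rates * (1 - phi), rates)    # multiplicative decrease
        w -= rates * g
        avg_grad = (1 - theta) * g + theta * avg_grad

    print("final weights:", np.round(w, 4), " per-weight rates:", np.round(rates, 4))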

Download PDF file | Close Window
