The seminal paper for modern AI, “Attention Is All You Need” (2017) by Vaswani et al., introduced the Transformer architecture, which has since become the industry standard for processing enormous quantities of data. The paper demonstrated how “attention,” coupled with efficient use of memory and parallelization techniques, could enable machine translation and text generation. Any of us who have used a Generative AI chatbot are the beneficiaries of the transformer approach.
But at what cost?
Transformer models, particularly as seen in the development of Large Language Models, sparked an AI race fueled by ever-larger training datasets. The common belief in AI is that access to and effective use of data are the keys to competitive advantage. In practice, this means companies are continuously seeking more training data, despite concerns about sourcing, bias, security and privacy, and, ultimately, the environmental costs of processing all of it. Some companies have tried to address these data concerns by exploring smaller models, implementing retrieval-based approaches, or creating more efficient architectures. Even so, the current class of AI solutions is fundamentally statistical and computationally intensive. As a result, the appetite for ever more training data continues to grow.
Is there an alternative?
Cognitive computing is one option. These solutions represent a class of technology that seeks to mimic functions of the brain. Humans don’t require extensive exposure to data before forming conclusions or making decisions. We naturally tune out redundant and easily predictable inputs, focusing on the novel or unexpected information needed for reliable decision-making and action-taking. In other words, given our current context and (world) model, we make “predictions” that determine how to direct not only our “attention,” but also our “inattention.”
Cognitive computing systems can replicate this “attention-inattention” mechanism through predictive modeling, anomaly detection, and filtering/selection of input data. Unlike those AI solutions that require extensive training on large datasets, cognitive computing solutions are designed to learn efficiently and incrementally using only the most informative and relevant data. By leveraging predictive filtering, these systems enhance efficiency, reduce computational load, adapt to novel domains with a minimum number of examples, and ensure more relevant data processing.
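To make the mechanism concrete, here is a minimal Python sketch of predictive filtering under simple assumptions: a lightweight running expectation of the input stream plays the role of the system’s “world model,” and only samples whose prediction error exceeds a surprise threshold are passed on to a downstream learner. The class name PredictiveFilter, the threshold value, and the incremental_learner placeholder are illustrative choices, not references to any particular cognitive computing product.

```python
import numpy as np

class PredictiveFilter:
    """Gate incoming samples by prediction error (a toy attention-inattention filter)."""

    def __init__(self, dim: int, alpha: float = 0.1, surprise_threshold: float = 2.0):
        self.prediction = np.zeros(dim)           # current expectation of the input (the "world model")
        self.alpha = alpha                        # how quickly the expectation adapts
        self.surprise_threshold = surprise_threshold

    def observe(self, sample: np.ndarray) -> bool:
        """Return True if the sample is surprising enough to deserve attention."""
        surprise = np.linalg.norm(sample - self.prediction)         # prediction error
        self.prediction += self.alpha * (sample - self.prediction)  # cheap model update on every sample
        return surprise > self.surprise_threshold


def incremental_learner(sample: np.ndarray) -> None:
    """Placeholder for the expensive, incremental learning step (hypothetical)."""
    print(f"learning from novel sample: {np.round(sample, 2)}")


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    gate = PredictiveFilter(dim=3)

    # Mostly redundant inputs near zero, with an occasional surprising spike.
    for step in range(200):
        sample = rng.normal(loc=0.0, scale=0.1, size=3)
        if step % 50 == 49:
            sample += 5.0  # inject a novel event
        if gate.observe(sample):      # redundant samples are filtered out here
            incremental_learner(sample)
```

In this sketch the expectation is updated cheaply on every sample, while the expensive learning step runs only on the surprising ones; swapping the running average for a richer predictive model, or the fixed threshold for a learned novelty score, preserves the same attention-inattention structure.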
We see a future AI landscape that will favor companies that can deliver intelligent, adaptable solutions without the burden of excessive data requirements. Startups that embrace cognitive-inspired learning techniques, informed data ingestion strategies, and efficient architectures will be well-positioned to compete.