Introduction:

Recent advances in the ability of computers to understand language and build functional world models have reset the landscape of computing. The societal impact is likely to be far-reaching, and no small part of it will be a fundamental rethinking of how humans and machines interact.

At the heart of this advance is the transformer architecture and the large language models (LLMs) and large vision models (LVMs) built with it. While language understanding in machines has matured, their ability to plan and reason, as well as to sense and understand the world, needs to keep evolving before machines can be truly transformative to the human endeavor. Fortunately, an emerging class of systems called AI agents, based on LLMs and LVMs, is beginning to show promise here. At Emergence AI, our mission is to advance this frontier of AI and build systems that reach a level of intelligence that truly unlocks the potential of AI to be beneficial to humanity.

LLM-Based Agents:

To understand why the new class of LLM-based AI agents is so promising, it helps to first define what they are.

Agent anatomy.

Agents (their anatomy pictured above) are building blocks which can communicate with each other and with humans in natural language, can control tools, and can perform actions in the digital or physical world.

Agents have some functional capacity to plan, reason, and remember. They are the building blocks upon which scalable, intelligent systems can be built. Such systems, composed of one or more agents, can profoundly reshape our ideas of what computers can do for us, how we use them, and how we interact with them. Examples range from asking a system of agents in natural language to book travel, plan a vacation, or purchase items, to asking it to process claims or to work across various applications on a smartphone or laptop and complete complex tasks for people, all with simple commands issued in natural language.

There has been a lot of research in this space over the last couple of decades. However, the advent of LLM- and LVM-based AI agents promises to advance this field rapidly, improving the capabilities of agents in all the aspects mentioned above. 

We believe that in order for agents to be used to build truly intelligent, scalable systems, they must meet a specific benchmark in the characteristics they possess and the ways they organize together (ref 1). 

The Definition of an Agent

An agent has (ref 1):

1) Autonomy: it is a self-operating, interactive entity with its own state, behavior, and decision-making capabilities.

2) Reactivity and Proactivity: it is both reactive (meaning it can perceive its environment and respond to changes) and proactive (meaning it can take initiative in alignment with its goals).

3) Beliefs, Desires, and Intentions (BDI): it is characterized by its beliefs (information about the world), desires (goals or objectives), and intentions (plans of action).

4) Social Ability & Communication: it has a communication mechanism and can interact with other agents or entities in its environment. This interaction can be highly complex as it may involve negotiation, coordination, and cooperation.

5) Self-Improvement: it has an innate desire to maximize its utility and hence to improve itself on the task that it was created for.

6) Security: it protects itself from being compromised or destroyed. It can be trusted to do no harm to other agents or to the environment it is operating in.

Agents in a system will adhere to an organizational framework called agent-oriented programming (AOP). The comprehensive tenets of this framework can be found here (ref 2).

Our AOP framework is inspired by the pillars of object-oriented programming (OOP, ref 1), which played a significant role in the development and evolution of several popular programming languages such as C++, Java, and Python. Just as these languages are the bedrock of modern software development, we hope that AOP will enable the development of new classes of agents, systems, and agent frameworks.
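To make the agent definition above more concrete, here is a minimal Python sketch of how the six properties might map onto a single class. It is an illustration only and not Emergence's AOP framework: the names SketchAgent, Message, blocked_goals, and success_rate are all hypothetical, goals are modeled as simple belief flags, the security check is a crude deny-list, and the self-improvement hook only records outcomes rather than learning from them.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Message:
    sender: str
    content: str


@dataclass
class SketchAgent:
    """Hypothetical agent used only to illustrate the definition above."""

    name: str
    beliefs: dict[str, Any] = field(default_factory=dict)   # information about the world
    desires: list[str] = field(default_factory=list)        # goals, modeled as belief flags
    intentions: list[str] = field(default_factory=list)     # goals the agent has committed to
    inbox: list[Message] = field(default_factory=list)      # messages from other agents
    blocked_goals: set[str] = field(default_factory=set)    # crude stand-in for a safety policy
    outcomes: list[bool] = field(default_factory=list)      # raw signal for self-improvement

    def perceive(self, observation: dict[str, Any]) -> None:
        # Reactivity: fold new observations into the agent's own state.
        self.beliefs.update(observation)

    def deliberate(self) -> None:
        # Proactivity: commit to every desire that the beliefs say is not yet met.
        for goal in self.desires:
            if not self.beliefs.get(goal, False) and goal not in self.intentions:
                self.intentions.append(goal)

    def act(self) -> Optional[str]:
        # Autonomy: the agent decides for itself what to do next.
        if not self.intentions:
            return None
        goal = self.intentions.pop(0)
        if goal in self.blocked_goals:
            # Security: refuse goals the policy forbids and record the failure.
            self.outcomes.append(False)
            return None
        self.beliefs[goal] = True  # pretend the goal was achieved
        self.outcomes.append(True)
        return goal

    def send(self, other: "SketchAgent", content: str) -> None:
        # Social ability: communicate with another agent.
        other.inbox.append(Message(sender=self.name, content=content))

    def success_rate(self) -> float:
        # Self-improvement placeholder: a metric a learning loop could optimize.
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0
```

A typical cycle in this sketch would be to call perceive with an observation, then deliberate, then act, and finally send to report the result to another agent. A real agent would replace the bodies of deliberate and act with calls to an LLM and to tools.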

We intend to advance the development of autonomous agents as an open-source project so that the community at large can benefit from and contribute to the project. We believe this is the safest and fastest way to advance the field of AI agents. 

Open-Source Releases

Our aim is to unify the technical community around a single definition of an agent and a single framework for agent systems.

Emergence’s first set of releases into the open-source community includes two web agents. The first allows computers to translate natural language commands into actions on web pages (like clicking on links, playing videos, or searching websites). The second is an example of a multi-tool agent system, built on LLMs, in this case for visual math pedagogy. We are also working on a general multi-tool agent framework, whose purpose is to enable tools and LLMs to be integrated into agent systems following our AOP framework.
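As a rough illustration of what translating a natural language command into a web action could look like, the Python sketch below maps a command to a small structured action. Everything in it is an assumption made for illustration: the WebAction type, the parse_command function, and the three action kinds are hypothetical, and the regular expressions are a toy stand-in for the LLM that the actual agent would use to interpret commands.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WebAction:
    kind: str    # "click", "play", or "search" (hypothetical action kinds)
    target: str  # link text, video title, or search query


# Toy patterns standing in for an LLM; they only cover a few simple phrasings.
_PATTERNS = [
    ("click", re.compile(r"^(?:click|open)(?: on)?(?: the)? (.+?)(?: link)?$", re.I)),
    ("play", re.compile(r"^play(?: the)? (.+?)(?: video)?$", re.I)),
    ("search", re.compile(r"^search(?: for)? (.+)$", re.I)),
]


def parse_command(command: str) -> Optional[WebAction]:
    """Return a structured action for the command, or None if it is not understood."""
    text = command.strip().rstrip(".")
    for kind, pattern in _PATTERNS:
        match = pattern.match(text)
        if match:
            return WebAction(kind=kind, target=match.group(1))
    return None
```

For example, parse_command("Click on the Pricing link") yields a click action whose target is "Pricing", and an unrecognized command yields None. Keeping the action as plain structured data means a separate component, such as a browser automation layer, could execute it.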

We are also working on releasing the first version of our self-improvement protocol for agents (refs 2 and 3). We believe that such self-improvement functionality within agents could eventually result in the emergence of truly autonomous agents and systems.

Our longer-term roadmap includes (a) the development of agents powered by LLMs and LVMs that will sit natively in devices like smartphones and laptops, and (b) the creation of agents that will navigate the web and perform a variety of tasks for consumers and enterprises. Of particular interest to us are a variety of back-office document-processing tasks, including claims analysis and domain-specific summarization.

Task- and Domain-Specific LLMs

LLMs and LVMs act as the “brain” of the AI agent. Emergence will power our agents with our own fine-tuned LLMs and LVMs based primarily on open-source pre-trained models such as Mistral and LLaMA-2. The focus on fine-tuning pre-trained models, using them to power agents and other solutions efficiently, safely, and cost-effectively, is another core pillar of Emergence. 

Our domain-specific models can be used by both agent developers and application makers as they scale their AI solutions. Each model is trained on data from its specialized domain and will perform a variety of natural language processing tasks, such as question answering based on custom content and data, rubric-based summarization, video and image recommendation, transcript analysis, and assessment generation.

Voice Computing Platform

Our voice computing platform can accelerate the provisioning of voice interfaces for a variety of enterprises, both for controlling domain-specific tools and for the contextual retrieval of domain-specific information. The platform currently powers AI assistants, available on both laptops and front-of-the-room panels, which control various tools and technologies and provide a voice-based chat interface to the world of information relevant to the education domain.

Historically, there have been two main barriers to the emergence of voice as an interface to computing: the inability of humans to communicate with computers completely naturally, and the inability of computers to perform more than rudimentary actions. Thanks to recent advances in AI, these two barriers have been removed. Thus, a fascinating offshoot of the advance of AI is that voice as an interface has the potential to finally mature, decades after it was envisioned, and Emergence has the existing building blocks to deliver on this promise.

Intelligent Systems

The final set of offerings from Emergence is a set of research, development, and consulting services that allow end customers to develop intelligent systems composed of our agents and LLMs. Drawing on our team’s decades-long experience in building some of the most widely scaled, enterprise-grade AI systems, we can help integrate self-operating solutions to given problems into an enterprise’s ecosystem, automating relevant processes with the most apt in-house or third-party tools.

Coda

There are several themes that we have not highlighted in this brief introductory document. These include our focus on:

  1. memory systems for agents, which are key to systems that learn and improve over time,

  2. the implications of systems of self-improving agents with well-defined goals, 

  3. our robust definitions of and commitment to safety,

  4. and our insight into the fascinating parallels between cognitive systems of mind and artificial systems of agents. 

We intend to develop many of these themes more fully, especially through our ongoing and future collaborations with leading academics. 

Ultimately, Emergence’s mission is to advance AI and create useful, intelligent machines which maximally automate human interactions with computers. 

References

1) The Anatomy of Agents

2) Self Improving Agents

3) Building Narrow Self Improving Agents