In today’s world, we are immersed in an era of continuous technological advances, where artificial intelligence has played a key role. This rapid development of innovative technologies is redefining the way we live, work and interact.
Thanks to this, virtual agents are reaching unprecedented levels of capability; they are able to recognise and analyse speech, understand our intentions, adapt to our preferences and provide us with relevant and contextually appropriate responses.
Before we learn how we got here, let’s address a few basic questions.
What are virtual agents?
A virtual agent is a programme designed to interact with humans in much the same way as a real person; that is, it is able to carry on a conversation.
The presence of virtual agents in our lives is becoming more and more common. We find them in different devices and services, which demonstrates the growing importance of human-machine interaction and the advancement of technology in this field.
For example, virtual assistants such as Apple’s Siri, Amazon’s Alexa and Google Assistant are present in our mobile devices, smart speakers and other connected devices. These virtual agents allow us to perform everyday tasks simply by voice commands, such as asking questions, setting reminders, playing music, searching the internet and controlling smart home devices.
In addition to personal virtual assistants, virtual agents have become an integral part of many companies’ customer services. When we call a help desk, it is common to first interact with a virtual agent before being transferred to a human agent. These virtual agents are programmed to understand and answer common questions, provide relevant information and, in some cases, solve problems efficiently.
The presence of virtual agents on websites not only improves the user experience, but can also help companies save costs by automating certain customer interactions.
To understand how virtual agents work and what their components are, it is important to consider the following concepts:
Artificial intelligence (AI)
- Artificial intelligence (AI): the combination of algorithms and programmes designed to enable machines to perform tasks that previously could only be done by humans. This involves skills such as reasoning, problem solving, pattern recognition, learning and decision making.
- Machine Learning: a branch of AI that focuses on enabling machines to learn and improve automatically through training. This means they can learn to perform tasks without being programmed specifically for them. For example, an animal classifier can be trained to recognise different species. The classifier is not explicitly told how to distinguish the animals; it learns by looking at many labelled images and finding the patterns for itself.
- Deep Learning: an area of machine learning that uses models inspired by the functioning of neurons in the human brain. These models are able to learn complex representations of data or hidden patterns, which makes them effective in more complicated tasks.
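The idea that a classifier learns from examples rather than explicit rules can be sketched in a few lines. The nearest-centroid classifier below is a deliberately minimal illustration on invented 2-D features (the numbers and feature names are made up for this example); real systems use far richer models and data.

```python
# Minimal nearest-centroid classifier: it is never told the rules that
# separate the classes; it only averages labelled examples (training)
# and assigns new points to the closest class centroid (prediction).

def train(examples):
    """examples: list of (features, label). Returns per-class centroids."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2
                                     for a, b in zip(centroids[label], features)))

# Invented 2-D features, e.g. (body length, ear length) in arbitrary units.
data = [((9.0, 1.0), "dog"), ((8.0, 1.2), "dog"),
        ((3.0, 2.5), "cat"), ((3.5, 2.8), "cat")]
centroids = train(data)
print(predict(centroids, (8.5, 1.1)))  # classified from the examples alone
```

The classifier never sees a rule like "dogs are longer than cats"; the decision boundary emerges entirely from the training examples, which is the essence of machine learning.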
Natural Language Processing (NLP)
Natural language processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and human language. Its main goal is to equip machines with the ability to process, understand and interpret words and text, enabling more natural and effective communication between humans and machines.
NLP seeks to bridge the gap between machines and human language, an intrinsically human faculty, and enable machines to understand and use language in a similar way to the way we do. This involves not only recognising words and phrases, but also capturing the meaning, context and nuances of human language, including irony, idioms and jokes, across many different languages.
In the field of machine learning, NLP plays a key role when working with text data. NLP techniques are used to train models that can understand and process human language in an automated way.
The use of NLP techniques in machine learning allows models to learn to perform human language-related tasks without the need to be programmed specifically for each use case. Machine learning algorithms can analyse large volumes of text data and extract relevant patterns and features that allow them to make inferences and decisions based on human language.
Importantly, as natural language processing tasks become more complex, such as machine translation, text summarisation, question answering or generative conversation, the use of NLP is combined with other branches of artificial intelligence, such as deep learning, to achieve more accurate and effective results.
Language representation is fundamental if computers are to understand and process human language effectively. Here, embeddings play a central role by providing a numerical vector representation of natural language.
Embeddings are numeric vectors that are used to represent words, sentences or even whole texts in a vector space. These vectors contain information about the meaning and context of words, allowing the semantic similarity between words to be measured.
The idea behind these vectors is to capture the relationships and associations between words using machine learning techniques. By training a language model with large amounts of text, the model learns to assign numerical vectors to each word so that words that share semantic similarities are represented by nearby vectors in vector space.
One of the advantages of embeddings is that they capture semantic and contextual relationships, meaning that words with similar meanings or that appear in similar contexts are represented by nearby vectors in vector space. For example, embeddings can capture the similarity between the words “dog” and “canine” because they share a similar meaning. Similarly, words that often appear in similar contexts, such as “cat” and “mouse”, may also have close vectors due to their contextual association.
In addition to capturing semantic and contextual similarities, we can perform arithmetic operations on embeddings in vector space to infer relationships between words. For example, if we subtract the vector for the word “man” from the vector for the word “king” and add the vector for the word “woman”, the result is a vector that resembles the vector for the word “queen”. This demonstrates that embeddings can capture analogies and relationships between words through simple vector operations.
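The analogy above can be made concrete with hand-crafted toy vectors. The 2-D “embeddings” below (a royalty axis and a gender axis) are invented purely for illustration; real embeddings have hundreds of dimensions and are learned from text, not written by hand.

```python
# Toy illustration of king - man + woman ≈ queen with invented 2-D vectors:
# dimension 0 ≈ "royalty", dimension 1 ≈ "gender".
import math

vectors = {
    "king":  [1.0,  1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0,  1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.2],   # an unrelated word, as a distractor
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, -1.0 opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Compute king - man + woman component-wise.
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# Nearest remaining word by cosine similarity (inputs excluded, as is usual).
best = max((w for w in vectors if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(target, vectors[w]))
print(best)  # queen
```

With these hand-picked vectors the analogy works out exactly; with real learned embeddings the result is only approximately the “queen” vector, but it is still typically its nearest neighbour.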
Natural language processing (NLP) models can benefit from the use of pre-trained embeddings to initialise their word representation layers. These are numerical vectors that have been trained on large amounts of text in a general language task, such as language modelling. By using these pre-trained embeddings, NLP models can leverage previously acquired linguistic knowledge and improve their performance on specific tasks.
One of the main advantages of pre-trained embeddings is that they help address the challenge of data sparsity in natural language processing tasks. Training language models from scratch requires large amounts of annotated text data, which can be costly and time-consuming. By using pre-trained embeddings, NLP models can benefit from the knowledge gained from large datasets without having to repeat the entire training process.
Moreover, they contain semantic and contextual information about the words, capturing relationships and associations between them. By initialising word representation layers with these embeddings, NLP models can start from a solid foundation and have a richer understanding of the language from the outset. This can result in better performance on tasks such as sentiment analysis, text classification, information extraction, among others.
When working with embeddings in natural language processing or other machine learning tasks, it is common to have a large number of embeddings, each with hundreds or thousands of dimensions. To facilitate storage and efficient handling, vector databases are used.
Vector databases are systems specifically designed to store and perform operations on numerical vectors. These databases offer several important advantages when it comes to working with large volumes of embeddings.
One of the main advantages is scalability. In the field of machine learning and artificial intelligence, the size of the dataset tends to grow as more information is collected. By using vector databases to store embeddings, individual embeddings can be added or removed without any changes to the rest of the stored data. This provides flexibility and makes it possible to handle constantly growing datasets efficiently.
In addition, vector databases offer faster processing of large volumes of data. By leveraging the indexing and optimised search capabilities of these databases, it is possible to efficiently query and retrieve information, even on massive datasets. This is especially beneficial when working with applications that require real-time processing, such as real-time information search or retrieval of similar documents.
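A vector store can be sketched as a thin wrapper around a mapping from identifiers to vectors. The class below is a minimal in-memory toy, not a real database: production systems add persistence and approximate-nearest-neighbour indexing to stay fast at millions of vectors. All identifiers and dimensions here are invented.

```python
# Minimal in-memory sketch of a vector store: embeddings can be added or
# removed independently, without touching the rest of the stored data.

class VectorStore:
    def __init__(self, dim):
        self.dim = dim
        self._vectors = {}          # item id -> embedding

    def add(self, item_id, vector):
        """Insert or overwrite one embedding; only this entry changes."""
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self._vectors[item_id] = list(vector)

    def remove(self, item_id):
        """Delete one embedding; the rest of the store is untouched."""
        self._vectors.pop(item_id, None)

    def __len__(self):
        return len(self._vectors)

store = VectorStore(dim=3)
store.add("doc-1", [0.1, 0.9, 0.0])
store.add("doc-2", [0.8, 0.1, 0.1])
store.remove("doc-1")
print(len(store))  # 1
```

The point of the sketch is the growth model: each `add` or `remove` touches a single entry, so the dataset can grow or shrink continuously without rebuilding the rest of the store.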
One of the most prominent advantages of indexing vectors in databases is the ability to perform semantic searches.
Semantic search is a technique used to find embeddings with similar meaning in a vector space. Since embeddings represent the meaning and context of words or texts, we can take advantage of this representation to measure the semantic similarity between them. In the context of embeddings, measuring semantic similarity translates into measuring distances between vectors.
In a vector space, such as the one generated by embeddings, we can calculate distances between vectors using different measures, such as the Euclidean distance or the cosine distance. These measures allow us to quantify the degree of similarity between two embeddings based on the proximity of their vectors in space.
Let’s imagine that we have a new embedding that is not present in our database. Using semantic search, we can find embeddings stored in the database that have the most similar meaning to this new one. To do this, we compare the vector of the new embedding with the vectors of the existing embeddings, and calculate the distance between them.
By searching for the most similar meaning, what we are really doing is finding those embeddings whose vectors are closest to the vector of the new one in vector space. The smaller the distance between two vectors, the greater the semantic similarity between the embeddings they represent.
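This search can be sketched directly: rank the stored vectors by cosine distance (one minus cosine similarity) to the query, and take the smallest. The three-dimensional vectors below are invented toy values standing in for real embeddings.

```python
# Semantic search sketch: the smallest cosine distance to the query vector
# marks the stored embedding with the most similar meaning.
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical direction, up to 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

# Invented database of word embeddings.
database = {
    "dog":   [0.90, 0.80, 0.10],
    "puppy": [0.85, 0.75, 0.20],
    "car":   [0.10, 0.20, 0.90],
}

query = [0.88, 0.79, 0.15]   # a new embedding not present in the database
ranked = sorted(database, key=lambda w: cosine_distance(query, database[w]))
print(ranked)  # closest meaning first, most distant last
```

A real vector database performs the same ranking, but with an index (rather than a full scan) so that the nearest vectors can be found without comparing the query against every stored embedding.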
This semantic search capability is extremely useful in a variety of applications. For example, in content recommendation systems, we can use it to find elements similar to a specific one, which allows us to offer personalised recommendations based on the meaning and context of the embeddings. It can also be applied in real-time information search, where documents or resources related to a specific topic can be found.
Applications and use cases
In the context of recommender systems, semantic search plays a key role in providing personalised recommendations to users. Take Spotify’s and Netflix’s recommendation systems, for example. These systems analyse a user’s listening or viewing history and use this information to suggest similar songs or movies that might be of interest to the user.
To do this, the recommender system transforms the song just listened to or the movie just watched into an embedding using natural language processing techniques and language models. It then performs a semantic search in a vector database containing embeddings of other songs or movies. By finding the embeddings most similar to the reference song or movie, the system can recommend similar songs or movies to the user.
It is important to note that recommender systems are more complex and use various metrics and techniques to improve the quality of recommendations, such as collaborative filtering, content-based filtering and reinforcement learning. However, semantic search through embeddings is one of the key techniques used in these systems.
Semantic search also finds applications in question answering (Q&A) systems, where the goal is to provide relevant and accurate answers to users’ questions. In this case, the vector database contains embeddings of information related to the topics users may ask about.
When a user poses a question, the Q&A system transforms the question into an embedding using natural language processing techniques and language models. It then performs a semantic search in the vector database to find the embeddings closest to the question.
The results, or retrieved embeddings, represent the context that can help answer the user’s question. Depending on the complexity of the Q&A system, additional techniques can be applied, such as extracting relevant information from the documents associated with the retrieved embeddings, generating answers, or evaluating the quality of the answers obtained.
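The retrieval step of such a Q&A system can be sketched as follows. The `embed` function here is a stand-in that just counts vocabulary words; a real system would call a language model to produce the embedding, and the passages and vocabulary are invented for the example.

```python
# Q&A retrieval sketch: embed the question, then return the stored passage
# whose embedding is closest to it. `embed` is a toy bag-of-words stand-in.
import math

VOCAB = ["paris", "capital", "france", "cheese", "milk"]

def embed(text):
    """Stand-in embedding: count occurrences of each vocabulary word."""
    words = text.lower().split()
    return [float(words.count(v)) for v in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Invented knowledge base: passage -> precomputed embedding.
passages = ["Paris is the capital of France", "Cheese is made from milk"]
index = {p: embed(p) for p in passages}

question = "What is the capital of France ?"
q_vec = embed(question)
context = max(index, key=lambda p: cosine(q_vec, index[p]))
print(context)  # the passage most likely to contain the answer
```

The retrieved passage is then handed to a later stage (extraction or generation) that produces the actual answer, as described above.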
Generative models are a type of deep learning model, trained on large amounts of data, that can generate new examples or samples from an input. Unlike models that simply select one of the existing samples, generative models can produce new samples with slight variations.
Trained with deep learning techniques, these models learn to capture the features and patterns of the training data, so that they can generate new examples that resemble the original data.
A common example of the application of generative models is the generation of realistic images. These models are trained with a large number of images, such as hundreds of millions of photos, and then used to generate new images from an input, which can be a textual description.
For example, the model can be provided with a written description of a desired photo, and the generative model is able to produce an image that matches that description. The generated image is not identical to any specific image in the training set; it is a new image created by the model that nonetheless conforms to the features and styles learned during training.
Generative language models are a type of model used to generate coherent and meaningful sequences of text. These models capture the relationship and patterns of language by training on large amounts of text data, such as words, sentences, conversations, questions and answers, among others.
These models rely on deep learning architectures, such as recurrent neural networks (RNNs) or transformers, to learn the structure and rules of language. They are trained with techniques such as supervised learning or reinforcement learning, depending on the purpose of the model.
Applications of language models
Language models have different purposes and are trained according to what you want to achieve with them. Some examples of common purposes are:
- Word prediction or sentence completion: These models are able to generate words or complete sentences based on the previous context. They can predict the next word in a sentence or generate coherent and continuous text.
- Machine translation: Language models are also used in machine translation, where they are trained to generate text in a target language from text in a source language.
- Text summarisation: These models can generate concise and coherent summaries of longer texts, capturing the main ideas and eliminating redundant information.
- Question answering: Language models are also employed in question-answering systems, where they are trained to generate relevant and accurate answers based on a given question.
- Conversational interaction: Some language models are designed for more natural conversational interactions. They can conduct dialogues, answer questions contextually and maintain a coherent conversation with users.

To train language models, large text datasets are used that span a wide variety of language domains and styles. These datasets can include books, articles, web pages, conversations, social networks and more. The model learns to capture the structure, context and linguistic patterns present in this data.
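Word prediction, the first purpose listed above, can be illustrated with the simplest possible language model: a bigram model that counts which word follows which in a tiny invented corpus. Real language models learn vastly richer patterns from far more text, but the principle, predicting the next word from observed patterns, is the same.

```python
# Toy next-word predictor: a bigram model counts successors of each word
# in a tiny invented corpus, then predicts the most frequent one.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Training: count which word follows which.
successors = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current][nxt] += 1

def predict_next(word):
    """Most frequent word observed after `word` in the corpus."""
    return successors[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```

Chaining such predictions word by word is, in miniature, how a language model generates continuous text; modern models simply replace the frequency table with a deep network conditioned on the whole previous context.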
After delving into the more technical issues, we can say that a virtual agent is an artificial intelligence system, based on natural language processing (NLP) and large language models (LLMs), that is trained to hold a conversation in real time and provide coherent responses based on the knowledge acquired during its training.
LangChain is a framework designed to exploit the full potential of language models in the creation of applications. Its main goal is to allow different language models and tools to be combined, and to connect them to external data sources, in order to enhance the power and usefulness of language models in various applications.
LangChain also offers the ability to connect to external data sources, which extends the scope and quality of the information available to language models. This allows LangChain-based applications to access databases, APIs, websites, document repositories and other relevant data sources to enrich content generation or question answering.
Ultimately, virtual agents are the result of the use of advanced technologies such as embeddings, NLP, generative models and language models, among others.
Thanks to the development of these technologies, they have been widely explored in a variety of use cases and applications, and there is still much to be discovered and developed. These virtual companions have found utility in a variety of fields, such as customer service, home care, healthcare, education, machine translation, content generation and much more. Their versatility and adaptability allow them to be deployed in a wide range of environments and play multiple roles. So we can say that, thanks to their personalised assistance capabilities, virtual agents provide us with a more efficient and satisfying experience in our daily lives, playing an important role in our increasingly digitised world.