What is a large language model (LLM)? – TechTarget Definition – TechTarget

You forgot to provide an Email Address.
This email address doesn’t appear to be valid.
This email address is already registered. Please log in.
You have exceeded the maximum character limit.
Please provide a Corporate Email Address.
Please check the box if you want to proceed.
Please check the box if you want to proceed.
By submitting my Email address I confirm that I have read and accepted the Terms of Use and Declaration of Consent.
A large language model (LLM) is a type of artificial intelligence (AI) algorithm that uses deep learning techniques and massively large data sets to understand, summarize, generate and predict new content. The term generative AI also is closely connected with LLMs, which are, in fact, a type of generative AI that has been specifically architected to help generate text-based content.
Over millennia, humans developed spoken languages to communicate. Language is at the core of all forms of human and technological communications; it provides the words, semantics and grammar needed to convey ideas and concepts. In the AI world, a language model serves a similar purpose, providing a basis to communicate and generate new concepts.
The first AI language models trace their roots to the earliest days of AI. The Eliza language model debuted in 1966 at MIT and is one of the earliest examples of an AI language model. All language models are first trained on a set of data, and then they make use of various techniques to infer relationships and then generate new content based on the trained data. Language models are commonly used in natural language processing (NLP) applications where a user inputs a query in natural language to generate a result.
An LLM is the evolution of the language model concept in AI that dramatically expands the data used for training and inference. In turn, it provides a massive increase in the capabilities of the AI model. While there isn’t a universally accepted figure for how large the data set for training needs to be, an LLM typically has at least one billion or more parameters. Parameters are a machine learning term for the variables present in the model on which it was trained that can be used to infer new content.
This article is part of
Modern LLMs emerged in 2017 and use transformer models, which are neural networks commonly referred to as transformers. With a large number of parameters and the transformer model, LLMs are able to understand and generate accurate responses rapidly, which makes the AI technology broadly applicable across many different domains.
Some LLMs are referred to as foundation models, a term coined by the Stanford Institute for Human-Centered Artificial Intelligence in 2021. A foundation model is so large and impactful that it serves as the foundation for further optimizations and specific use cases.
As AI continues to grow, its place in the business setting becomes increasingly dominant. This is shown through the use of LLMs as well as machine learning tools. In the process of composing and applying machine learning models, research advises that simplicity and consistency should be among the main goals. Identifying the issues that must be solved is also essential, as is comprehending historical data and ensuring accuracy.
The benefits associated with machine learning are often grouped into four categories: efficiency, effectiveness, experience and business evolution. As these continue to emerge, businesses invest in this technology.
LLMs take a complex approach that involves multiple components.
At the foundational layer, an LLM needs to be trained on a large volume — sometimes referred to as a corpus — of data that is typically petabytes in size. The training can take multiple steps, usually starting with an unsupervised learning approach. In that approach, the model is trained on unstructured data and unlabeled data. The benefit of training on unlabeled data is that there is often vastly more data available. At this stage, the model begins to derive relationships between different words and concepts.
The next step for some LLMs is training and fine-tuning with a form of self-supervised learning. Here, some data labeling has occurred, assisting the model to more accurately identify different concepts.
Next, the LLM undertakes deep learning as it goes through the transformer neural network process. The transformer model architecture enables the LLM to understand and recognize the relationships and connections between words and concepts using a self-attention mechanism. That mechanism is able to assign a score, commonly referred to as a weight, to a given item (called a token) in order to determine the relationship.
Once an LLM has been trained, a base exists on which the AI can be used for practical purposes. By querying the LLM with a prompt, the AI model inference can generate a response, which could be an answer to a question, newly generated text, summarized text or a sentiment analysis report.
LLMs have become increasingly popular because they have broad applicability for a range of NLP tasks, including the following:
Among the most common uses for conversational AI is through a chatbot, which can exist in any number of different forms where a user interacts in a query-and-response model. The most widely used LLM-based AI chatbot is ChatGPT, which is developed by OpenAI. ChatGPT currently is based on the GPT-3.5 model, although paying subscribers can use the newer GPT-4 LLM.
There are numerous advantages that LLMs provide to organizations and users:
While there are many advantages to using LLMs, there are also several challenges and limitations:
There is an evolving set of terms to describe the different types of large language models. Among the common types are the following:
Generative AI challenges that businesses should consider
Generative AI ethics: 8 biggest concerns
Generative AI landscape: Potential future trends
History of generative AI innovations spans 9 decades
How to detect AI-generated content
The future of LLMs is still being written by the humans who are developing the technology, though there could be a future in which the LLMs write themselves, too. The next generation of LLMs will not likely be artificial general intelligence or sentient in any sense of the word, but they will continuously improve and get “smarter.”
LLMs will also continue to expand in terms of the business applications they can handle. Their ability to translate content across different contexts will grow further, likely making them more usable by business users with different levels of technical expertise..
LLMs will continue to be trained on ever larger sets of data, and that data will increasingly be better filtered for accuracy and potential bias, partly through the addition of fact-checking capabilities. It’s also likely that LLMs of the future will do a better job than the current generation when it comes to providing attribution and better explanations for how a given result was generated.
Enabling more accurate information through domain-specific LLMs developed for individual industries or functions is another possible direction for the future of large language models. Expanded use of techniques such as reinforcement learning from human feedback, which OpenAI uses to train ChatGPT, could help improve the accuracy of LLMs, too. There’s also a class of LLMs based on the concept known as retrieval-augmented generation — including Google’s Realm (short for Retrieval-Augmented Language Model) — that will enable training and inference on a very specific corpus of data, much like how a user today can specifically search content on a single site.
There’s also ongoing work to optimize the overall size and training time required for LLMs, including development of Meta’s Llama model. Llama 2, which was released in July 2023, has less than half the parameters than GPT-3 has and a fraction of the number GPT-4 contains, though its backers claim it can be more accurate.
On the other hand, the use of large language models could drive new instances of shadow IT in organizations. CIOs will need to implement usage guardrails and provide training to avoid data privacy problems and other issues. LLMs could also create new cybersecurity challenges by enabling attackers to write more persuasive and realistic phishing emails or other malicious communications.
Nonetheless, the future of LLMs likely will remain bright as the technology continues to evolve in ways that help improve human productivity.
Technology writer George Lawton contributed to this article.
A virtual agent — sometimes called an intelligent virtual agent, virtual rep or chatbot — is a software program that uses scripted rules and, increasingly, artificial intelligence (AI) applications to provide automated service or guidance to humans.
Port address translation (PAT) is a type of network address translation (NAT) that maps a network’s private internal IPv4 …
‘Network fabric’ is a general term used to describe underlying data network infrastructure as a whole.
Loose coupling is an approach to interconnecting the components in a system, network or software application so that those …
Triple extortion ransomware is a type of ransomware attack where a cybercriminal extorts their victim multiple times, namely by …
Risk avoidance is the elimination of hazards, activities and exposures that can negatively affect an organization and its assets.
Risk management is the process of identifying, assessing and controlling threats to an organization’s capital, earnings and …
The sharing economy, also known as collaborative consumption or peer-to-peer-based sharing, is a concept that highlights the …
A steering committee comprises a group of high-ranking IT professionals who provide guidance and strategic direction to an …
A learning management system is a software application or web-based technology used to plan, implement and assess a specific …
Gamification is a strategy that integrates entertaining and immersive gaming elements into nongame contexts to enhance engagement…
People analytics, also known as human resources (HR) analytics and talent analytics, is the use of data analysis on candidate and…
Reskilling is the process of teaching an employee new skills to improve proficiency in their current job or move into an advanced…
Omnichannel — also spelled omni-channel — is an approach to sales, marketing and customer support that seeks to provide …
Mindshare, also known as share of mind, is an approach to marketing that involves attempting to make a company, brand or product …
A virtual agent — sometimes called an intelligent virtual agent, virtual rep or chatbot — is a software program that uses …
All Rights Reserved, Copyright 1999 – 2023, TechTarget

Privacy Policy
Cookie Preferences
Do Not Sell or Share My Personal Information


Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top