The AI vocabulary legal professionals actually need

49 terms, defined in plain English with the legal-practice angle included: what the word means, why it shows up in vendor pitches and ethics opinions, and what it implies for confidentiality, verification, or governance. Free, citable, updated as the language moves.

Core concepts

Agentic AI

AI designed to pursue goals through multi-step actions—planning, using software tools, and adjusting course—with limited human direction, rather than answering a single prompt. The umbrella term for systems built around AI agents.

AI agent

A software system that uses an AI model to make decisions and take actions toward a goal—running searches, filling forms, or managing a workflow—without step-by-step human instructions.

Deep learning

A type of machine learning that uses artificial neural networks with many layers to learn complex patterns from large amounts of data. It underlies modern image recognition, speech recognition, and language models.

Foundation model

A large AI model trained on broad data at scale so it can be adapted to many downstream tasks—drafting, translation, coding—rather than built for a single purpose. Large language models are the most prominent examples.

Generative AI

Artificial intelligence that creates new content—text, images, audio, video, or code—by learning patterns from large datasets, rather than only analyzing or classifying existing information.

Large language model (LLM)

A neural network trained on massive text corpora to predict the next token, producing fluent text. The engine behind ChatGPT, Claude, and Gemini.

Machine learning (ML)

A field of computer science in which systems learn patterns from data and improve at tasks through experience, instead of following rules written by hand. The foundation of modern AI, including large language models.

Multimodal AI

AI systems that can process and generate more than one type of data—such as text, images, audio, and video—within a single model. A multimodal model can, for example, describe a photograph or read a scanned document.

Natural language processing (NLP)

The branch of AI concerned with enabling computers to read, interpret, and generate human language. It covers tasks such as translation, summarization, classification, and the text generation performed by large language models.

Transformer

The neural network architecture behind modern language models, introduced by Google researchers in 2017. Its attention mechanism lets a model weigh how every word in a passage relates to every other, enabling fluent long-form text.

How models work

Context window

The maximum amount of text, measured in tokens, an AI model can consider at one time—covering the instructions, any documents provided, and the response. Material beyond the window is invisible to the model.
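
A rough sketch of what the limit means in practice. The 8,000-token window, the space reserved for the answer, and the "about four characters per token" rule of thumb are all assumptions for illustration; real models count tokens with their own tokenizer and publish their own limits.

```python
# Rough illustration only: real models count tokens with their own tokenizer.
CHARS_PER_TOKEN = 4          # assumption: common rule of thumb for English text
CONTEXT_WINDOW = 8_000       # assumption: hypothetical model limit, in tokens
RESERVED_FOR_ANSWER = 1_000  # assumption: room left for the model's response


def estimate_tokens(text: str) -> int:
    """Estimate a token count from character length (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_window(instructions: str, document: str) -> bool:
    """True if instructions, document, and reserved answer space fit the window."""
    used = estimate_tokens(instructions) + estimate_tokens(document)
    return used + RESERVED_FOR_ANSWER <= CONTEXT_WINDOW


instructions = "Summarize the indemnification provisions in this agreement."
short_contract = "x" * 20_000   # about 5,000 estimated tokens
long_contract = "x" * 40_000    # about 10,000 estimated tokens

print(fits_in_window(instructions, short_contract))  # True
print(fits_in_window(instructions, long_contract))   # False
```

The second document does not fit, so a tool would have to truncate it, split it, or summarize it in pieces; whatever falls outside the window cannot inform the answer.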

Embedding

A list of numbers that represents the meaning of a piece of text, so that similar content sits close together mathematically. Embeddings let software search by concept rather than by exact keywords.
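
To make "close together mathematically" concrete, here is a minimal sketch using cosine similarity, a common way to compare embeddings. The three-number vectors are invented for illustration; real embeddings are produced by a model and typically contain hundreds or thousands of numbers.

```python
import math

# Assumption: toy embeddings invented for illustration, not model output.
embeddings = {
    "termination for convenience": [0.9, 0.1, 0.2],
    "right to end the contract early": [0.8, 0.2, 0.3],
    "governing law is New York": [0.1, 0.9, 0.4],
}


def cosine_similarity(a, b):
    """Return a score near 1.0 for vectors pointing the same way, lower otherwise."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = embeddings["termination for convenience"]
for phrase, vector in embeddings.items():
    print(f"{cosine_similarity(query, vector):.2f}  {phrase}")
```

The two termination phrases share no keywords yet score as near neighbors, while the governing-law phrase scores far lower. That is the sense in which embeddings support search by concept.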

Fine-tuning

Additional training applied to an existing AI model using a smaller, specialized dataset so it performs better on a particular domain or task—such as legal drafting conventions—without building a model from scratch.

Inference

The stage at which a trained AI model is put to work—taking an input and producing output. Every chatbot answer is an act of inference; training, by contrast, is when the model learned.

Model weights

The numerical parameters—often billions of them—inside a neural network that encode everything the model learned during training. The weights are the model: whoever holds them can run or modify it.

Reinforcement learning from human feedback (RLHF)

A training method in which people rate a model's answers and those ratings teach it to produce responses humans prefer—more helpful, better aligned with instructions, less harmful. A key step in turning raw language models into usable assistants.

Retrieval-augmented generation (RAG)

A technique that has an AI model answer using documents fetched from a trusted source at question time, instead of relying only on its training. It grounds responses in citable material and reduces—but does not eliminate—hallucination.
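
The retrieve-then-ask pattern can be sketched without calling any model. In this toy version, simple word overlap stands in for the embedding search a real system would use, and the policy snippets, `retrieve`, and `build_prompt` are hypothetical names invented for illustration.

```python
# Assumption: a tiny "trusted source" invented for illustration.
SOURCES = [
    {"id": "Policy 4.2", "text": "Client documents may only be uploaded to approved AI tools."},
    {"id": "Policy 7.1", "text": "All AI-generated citations must be verified against the original authority."},
    {"id": "Policy 9.3", "text": "Vacation requests require two weeks notice."},
]


def words(text):
    """Lowercase the text and split it into a set of words without punctuation."""
    return {w.strip(".,?").lower() for w in text.split()}


def retrieve(question, sources, top_k=2):
    """Rank sources by words shared with the question (stand-in for embedding search)."""
    q = words(question)
    ranked = sorted(sources, key=lambda s: len(q & words(s["text"])), reverse=True)
    return ranked[:top_k]


def build_prompt(question, passages):
    """Assemble the text a model would receive: instructions, sources, question."""
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return (
        "Answer using only the sources below and cite the source id.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )


question = "Must AI-generated citations be verified?"
passages = retrieve(question, SOURCES)
print(build_prompt(question, passages))
```

The model only ever sees the passages that retrieval selected, which is why retrieval quality matters: if the right source is not fetched, the answer cannot be grounded in it.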

System prompt

Standing instructions given to an AI model before any user input, defining its role, rules, tone, and limits. Users of a product usually never see the system prompt steering their conversation.

Temperature

A setting that controls how predictable or varied an AI model's output is. Low temperature makes responses more consistent and conservative; high temperature makes them more varied and creative.
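
Under the hood, temperature usually works by dividing the model's raw scores for each candidate token before they are converted to probabilities. A minimal sketch, with scores invented for illustration:

```python
import math


def next_token_probabilities(scores, temperature):
    """Convert raw scores into probabilities (softmax with temperature scaling)."""
    scaled = [s / temperature for s in scores.values()]
    biggest = max(scaled)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - biggest) for s in scaled]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(scores, exps)}


# Assumption: invented scores for the word after "The court granted the ..."
scores = {"motion": 3.0, "petition": 2.0, "banana": 0.0}

for t in (0.2, 1.0, 2.0):
    probs = next_token_probabilities(scores, t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()})
```

At the low setting nearly all probability lands on the top-scoring word, so repeated runs look alike. At the high setting the unlikely options gain ground, which produces variety and also raises the chance of an odd choice.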

Token

The small unit of text a language model actually reads and writes—a word, part of a word, or punctuation mark. Models measure input size, output length, pricing, and usage limits in tokens.

Training data

The collection of text, images, or other examples a machine-learning model learns from. A model's knowledge, abilities, and biases all derive from what its training data did and did not contain.

Vector database

A database built to store embeddings—numerical representations of meaning—and quickly find the entries most similar to a query. The storage layer behind semantic search and most retrieval-augmented generation systems.

Using AI

API (application programming interface)

A defined way for software programs to communicate with each other. AI providers expose their models through APIs, letting firms and vendors build the models into their own tools and workflows.

Chain-of-thought (CoT)

A technique in which an AI model works through a problem step by step before giving its answer, either because the prompt requests it or because the model was trained to reason this way. It often improves performance on multi-step problems.

Closed model

An AI model whose weights remain private, accessible only through the provider's app or API—as with the leading commercial models. Users send data to the provider's servers and accept its terms to use the model.

Enterprise AI

AI services sold to organizations under commercial terms—typically with administrative controls, security certifications, contractual data protections, and commitments not to train on customer data—as distinct from free or consumer versions of the same tools.

Few-shot prompting

Including a handful of worked examples in a prompt—inputs paired with the desired outputs—so the AI model learns the pattern and applies it to new material. Often markedly improves accuracy and formatting consistency.
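
A few-shot prompt is ordinary text with a repeating pattern. The sketch below assembles one; the clauses, labels, and the `build_few_shot_prompt` helper are invented for illustration.

```python
# Assumption: invented examples; a real prompt would use the firm's own material.
EXAMPLES = [
    ("Tenant shall pay rent on the first of each month.", "Payment obligation"),
    ("This Agreement is governed by the laws of Delaware.", "Governing law"),
    ("Either party may terminate on 30 days written notice.", "Termination"),
]


def build_few_shot_prompt(examples, new_clause):
    """Lay out worked examples, then the new item with its label left blank."""
    lines = ["Classify each contract clause with a short label.", ""]
    for clause, label in examples:
        lines += [f"Clause: {clause}", f"Label: {label}", ""]
    lines += [f"Clause: {new_clause}", "Label:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    EXAMPLES, "Disputes shall be resolved by binding arbitration."
)
print(prompt)
```

Ending the prompt on an unfinished "Label:" invites the model to continue the pattern the examples established, in the same format.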

Hallucination

An AI model's confident production of false information—invented facts, quotes, or citations—presented as though true. A structural byproduct of how language models generate text by probability rather than by looking facts up.

Open model

An AI model whose weights are published for anyone to download, run, and modify—on their own hardware if they choose. Contrasts with closed models accessible only through a provider's service.

Prompt

The input a user gives an AI model—an instruction, question, or document—to which the model responds. The quality and specificity of the prompt strongly influence the quality of the output.

Prompt engineering

The practice of designing and refining the instructions given to AI models to get accurate, useful, and consistent output—specifying role, context, format, examples, and constraints rather than asking open-ended questions.

Zero data retention (ZDR)

A contractual arrangement in which an AI provider does not store a customer's prompts or outputs after processing the request—data is used to generate the response and then deleted rather than logged.

Zero-shot prompting

Asking an AI model to perform a task with instructions alone—no worked examples included in the prompt. The model relies entirely on what it learned in training to understand the request.

Risk and governance

AI audit

A structured review of an AI system or an organization's AI use, testing whether it works as claimed and complies with applicable policies, contracts, and laws—examining accuracy, bias, security, and data handling.

AI disclosure

Informing affected parties—clients, courts, counterparties, or the public—that artificial intelligence was used in producing work or making a decision. Disclosure obligations may arise from court orders, client agreements, regulation, or professional-conduct duties.

AI governance

The framework of policies, roles, processes, and controls an organization uses to direct and oversee its use of artificial intelligence—covering tool approval, acceptable use, risk review, training, and accountability.

AI policy

An organization's written rules for AI use: which tools are approved, what data may be entered, what review AI output requires before use, and who is accountable. The baseline document of any AI governance program.

Data residency

Where data is physically stored and processed, meaning which country's or region's servers hold it. Residency is a major factor in which jurisdictions' laws can reach the data and is a common requirement in client and regulatory commitments.

Human in the loop (HITL)

A design and oversight approach in which a person reviews, corrects, or approves an AI system's output before it takes effect—keeping human judgment between the machine and the consequence.
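
In software, this often takes the form of an approval gate: the step with consequences refuses to run until a named person has signed off. A minimal sketch, with `Draft`, `approve`, and `send_to_client` as hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Draft:
    text: str
    approved_by: Optional[str] = None  # stays None until a person signs off


def approve(draft: Draft, reviewer: str) -> Draft:
    """Record the human reviewer who checked the draft."""
    return Draft(text=draft.text, approved_by=reviewer)


def send_to_client(draft: Draft) -> str:
    """The consequential step: blocked unless a person has approved the draft."""
    if draft.approved_by is None:
        raise PermissionError("AI draft has not been reviewed by a person.")
    return f"Sent (approved by {draft.approved_by})"


ai_draft = Draft(text="Dear client, ...")
try:
    send_to_client(ai_draft)
except PermissionError as err:
    print(err)

print(send_to_client(approve(ai_draft, "A. Attorney")))
```

The gate records who approved but cannot tell whether the review was careful, so the quality of the human check remains a policy and training question.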

Model training on inputs

A provider practice in which the prompts and content users submit are used to train future versions of the AI model. Whether a tool trains on inputs is a central confidentiality question for legal users.

Shadow AI

Employees' use of AI tools without the organization's knowledge or approval—pasting work content into personal chatbot accounts, for example. The AI-era version of shadow IT, and a leading source of unmanaged data risk.

SOC 2

A widely used independent audit report on a service provider's controls, developed by the American Institute of Certified Public Accountants (AICPA). Every report covers security; availability, processing integrity, confidentiality, and privacy are included only when the provider puts them in scope. A standard item in vendor security diligence.

Legal AI applications

AI intake chatbot

A conversational AI assistant on a law firm's website or phone line that greets prospective clients, gathers facts about their matter, screens for fit, collects conflict-check information, and schedules consultations, at any hour and in natural language.

Contract analysis AI

AI tools that read contracts to extract key terms, flag risks and deviations from preferred positions, compare drafts against playbooks, and summarize obligations—across single agreements or entire contract portfolios.

Document automation

Software that assembles legal documents from templates and structured inputs—answers to a questionnaire populate clauses, names, and terms to produce a consistent draft. Increasingly augmented with generative AI for drafting flexibility.
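
The template-plus-answers idea can be shown with Python's standard `string.Template`. The engagement-letter wording and the questionnaire answers are invented for illustration; commercial products add conditional clauses, loops, and formatting on top of the same principle.

```python
from string import Template

# Assumption: a made-up template; $name marks a slot filled from the questionnaire.
ENGAGEMENT_TEMPLATE = Template(
    "This engagement letter is between $firm and $client.\n"
    "Scope of work: $scope.\n"
    "Hourly rate: USD $rate per hour."
)

# Assumption: invented questionnaire answers.
answers = {
    "firm": "Example LLP",
    "client": "Acme Corp.",
    "scope": "review of a commercial lease",
    "rate": "450",
}

# substitute() raises KeyError if any slot has no answer.
letter = ENGAGEMENT_TEMPLATE.substitute(answers)
print(letter)
```

Because `substitute` fails when an answer is missing, an incomplete questionnaire stops the process instead of producing a draft with a silent blank.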

E-discovery AI

AI used in electronic discovery to identify, classify, and prioritize documents in litigation and investigations—finding responsive material, flagging potential privilege, and surfacing key facts across collections far too large for page-by-page human review.

Legal AI

Artificial intelligence applied to legal work—research, document review, drafting, contract analysis, e-discovery, and client intake. Most current legal AI is built on large language models adapted with legal data sources and retrieval.

Legal research AI

AI tools that answer legal questions and find authority by searching case law, statutes, and secondary sources, then generating a synthesized, cited response—typically using retrieval-augmented generation over a curated legal database.

Technology-assisted review (TAR)

An e-discovery workflow in which reviewers code a sample of documents and a machine-learning model learns from that coding to classify or rank the rest of the collection, sharply reducing manual review.
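
The learn-from-coding loop can be sketched with a deliberately simple word-count scorer. This is a toy, not the method any review platform uses (production systems rely on statistical classifiers and validation sampling), and the documents and coding decisions are invented.

```python
from collections import Counter

# Assumption: invented documents; 1 = coded responsive, 0 = coded not responsive.
CODED_SAMPLE = [
    ("pricing agreement with competitor discussed at dinner", 1),
    ("competitor pricing call scheduled for friday", 1),
    ("office holiday party catering menu", 0),
    ("parking garage closed friday for repairs", 0),
]
UNREVIEWED = [
    "lunch menu for the team offsite",
    "follow up on competitor pricing discussion",
    "garage access cards available at reception",
]


def learn_word_weights(coded):
    """Add 1 for each responsive document a word appears in, subtract 1 otherwise."""
    weights = Counter()
    for text, label in coded:
        for word in set(text.split()):
            weights[word] += 1 if label == 1 else -1
    return weights


def score(text, weights):
    """Sum the learned weights of the words in a document (unknown words count 0)."""
    return sum(weights[word] for word in set(text.split()))


weights = learn_word_weights(CODED_SAMPLE)
ranked = sorted(UNREVIEWED, key=lambda doc: score(doc, weights), reverse=True)
for doc in ranked:
    print(score(doc, weights), doc)
```

Reviewers would then start at the top of the ranking, and in many workflows their new coding decisions are fed back in to retrain the model.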

About this glossary

Who is this glossary for?

Lawyers, firm administrators, and legal ops professionals who keep meeting AI vocabulary in vendor pitches, ethics opinions, and CLE sessions. Every definition is written to be quoted: self-contained, jargon-free, and accurate without being academic.

How is this different from a general AI glossary?

Each definition notes the legal-practice angle where one exists: what hallucination means for court filings, why model training on inputs is a confidentiality issue, what zero data retention actually promises. For the binding rules themselves, our companion Legal AI Compliance Tracker covers every state bar opinion and court rule.

Can I reuse these definitions?

Yes, with attribution to legalai.help. The glossary exists to make AI conversations in legal settings precise; quote it in your firm training materials freely.

Educational information, not legal advice. AI terminology and tools change quickly; definitions reflect usage as of the last-updated date. For what bar associations and courts actually require of lawyers using AI, see legalaicompliance.help and consult a licensed attorney in your jurisdiction.