
Computational linguistics – short glossary

By Marie Lebert, version of 14 November 2018.

Here is a short, basic, down-to-earth glossary based on definitions found on Wikipedia. The simpler, the better. It should improve over time.


AI
see artificial intelligence

algorithm
unambiguous specification of how to solve a class of problems; used to perform calculation, data processing and automated reasoning tasks
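To make the definition concrete, here is Euclid's algorithm for the greatest common divisor of two integers. It is a classic textbook example chosen for illustration, not something taken from the original glossary. The sketch is in Python:

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: greatest common divisor of two non-negative integers."""
    # Each step replaces (a, b) with (b, a mod b); the loop stops when b reaches 0.
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(48, 18))  # 6
```

Each step is precise and the procedure always terminates. Those two properties are what make it an algorithm and not just a rule of thumb.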

allomorph
variant form of a morpheme

allophone
one of a set of multiple possible phones (or signs, in sign languages) used to pronounce a single phoneme in a specific natural language

API
see application programming interface

application programming interface / API
set of subroutine definitions, communication protocols and tools for building software applications, provided for example by an operating system, a library or a web service

applied linguistics
branch of linguistics studying language-related real-life problems and solutions in education, psychology, communication research, anthropology, sociology, etc.

AR
see augmented reality

artificial intelligence / AI
intelligence demonstrated by machines; a field founded in 1956 that came to include computational linguistics (originated in the 1950s)

ASR
automatic speech recognition, see speech recognition

augmented reality / AR
technology superimposing a computer-generated image on a user’s view of the real world; “augmentation” of the real-world environment with computer-generated perceptual information (visual, auditory, haptic, olfactory)

automatic speech recognition / ASR
see speech recognition

big data
data sets that are too large or complex for traditional data-processing application software, for example data obtained by mining user-generated content on social media sites and apps

CAT
see computer-assisted translation

character encoding
representation of textual data as bytes with an encoding system, for example ASCII or UTF-8 (an encoding form of the Unicode standard)
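A minimal Python sketch of what encoding and decoding look like in practice. It assumes UTF-8, the most common Unicode encoding on the web, and uses only built-in string and bytes methods:

```python
text = "café"

# Encode the string into bytes with UTF-8: "é" needs two bytes, the other letters one.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)                  # b'caf\xc3\xa9'
print(len(text), len(utf8_bytes))  # 4 5

# Each character has a Unicode code point, usually written U+XXXX.
print([f"U+{ord(ch):04X}" for ch in text])  # ['U+0063', 'U+0061', 'U+0066', 'U+00E9']

# Decoding with the same encoding gives back the original text.
print(utf8_bytes.decode("utf-8") == text)   # True
```

Decoding bytes with the wrong encoding is what produces garbled text such as "cafÃ©".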

chatbot
web or mobile interface used by a human being to ask questions through text, sound or video, and retrieve information from hard-coded answers or from a larger content base using machine learning

CLI
see command-line interface

cloud database
database on a cloud computing platform

CMUdict
see CMU Pronouncing Dictionary

CMU Pronouncing Dictionary / CMUdict
open-source pronouncing dictionary originally created by the Speech Group at Carnegie Mellon University (CMU) for use in speech recognition research
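A minimal Python sketch of reading entries in CMUdict style. The format is an assumption based on the classic plain-text releases: one entry per line, the word in capitals followed by its ARPAbet phones, stress digits (0, 1, 2) on vowels, alternate pronunciations marked WORD(1), and comment lines starting with ";;;". The sample lines are illustrative, not copied from the dictionary.

```python
import re
from collections import defaultdict

# Illustrative lines in the assumed CMUdict layout (not real dictionary data).
SAMPLE = """\
;;; comment lines start with three semicolons
TOMATO  T AH0 M EY1 T OW2
TOMATO(1)  T AH0 M AA1 T OW2
"""

def parse_cmudict(text: str) -> dict[str, list[list[str]]]:
    """Map each word to its list of pronunciations (each a list of ARPAbet phones)."""
    prons: dict[str, list[list[str]]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", word)  # "TOMATO(1)" -> "TOMATO"
        prons[word].append(phones)
    return dict(prons)

def stressed_vowels(phones: list[str]) -> list[str]:
    """Return the vowels carrying primary stress (marked with the digit 1)."""
    return [p for p in phones if p.endswith("1")]

d = parse_cmudict(SAMPLE)
print(len(d["TOMATO"]))                 # 2
print(stressed_vowels(d["TOMATO"][1]))  # ['AA1']
```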

code
system of rules used to convert information (letter, word, sound, image, gesture) into another form or representation for communication and storage

command-line interface / CLI
interface in which the user issues commands to a program in the form of successive lines of text

command shell
command-line interface program to an operating system

compiled language
programming language whose implementations are typically compilers (rather than interpreters)

compiler
program that transforms computer code written in one programming language into another programming language

computational linguistics
branch of linguistics which processes natural languages using computer science and mathematics for analysis and synthesis of language and speech; originated in the 1950s with machine translation; includes applications such as spell and grammar checkers, speech synthesis, speech recognition, virtual assistants and smart speakers

computational science
multidisciplinary field using computing capabilities for science

computational semantics
study of how to automate the construction of meaning representations of natural language expressions and reasoning with them

computer-assisted translation / CAT
language translation in which a human translator uses specific software to support and facilitate the translation process; includes translation memory, language search engines, terminology management, alignment, interactive machine translation and augmented translation

computer linguistics
see computational linguistics

computer science
study of computers (hardware, software, networks, internet) and computing concepts

computer vision
theory behind the artificial systems that extract data from digital images or videos in order to process, analyze and understand such data

content analysis
process of studying digital media (texts, pictures, audio, video) and communication patterns in a systematic manner

conversational interface
interface that uses natural language processing (NLP) and natural language understanding (NLU) to run a conversation with a human being, for example a voice assistant

conversational user interface / CUI
see conversational interface

corpus
see text corpus

corpus linguistics
study of language as expressed in bodies (corpora) of written text; originated in the 1970s to advance discourse analysis

CUI
conversational user interface, see conversational interface

data analysis
process of inspecting, cleaning, transforming and modeling data to find useful information

data manipulation
process of inserting, deleting, modifying and updating data

data mining
process of discovering patterns in large data sets in order to turn raw data into useful information; used for example in machine learning and statistics

data model
abstract model that organizes elements of data and standardizes how they relate to one another

data modeling
process of creating a data model for an information system by applying formal techniques

data processing
collecting, storing, visualizing, searching, querying, analyzing, updating, sharing and transferring data

data science
field that uses statistics, data analysis and machine learning to extract knowledge from data

data set
collection of data

decoding
process of converting code symbols back into information, for example information expressed in a plain natural language

deep learning
family of machine learning methods based on learning data representations (typically with multi-layer artificial neural networks), as opposed to task-specific algorithms

descriptive linguistics
branch of linguistics which analyses and describes how natural language is actually used by a group of people

dictionary
listing in alphabetical order of the lexicon of a natural language (or two or several natural languages), with definitions, usage, etymologies, pronunciations and translations

digital assistant
see virtual assistant

diphone
adjacent pair of phones; in speech synthesis, a recording of the transition between two phones (a diphone) produces better-sounding speech than combining two separately recorded phones
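A minimal Python sketch of splitting a word's phone sequence into diphones. Two things are assumptions for illustration: the ARPAbet-style phones for "cat", and the "sil" marker for silence at the word boundaries.

```python
def diphones(phones: list[str]) -> list[tuple[str, str]]:
    """Split a phone sequence into its adjacent pairs (diphones)."""
    # Pad with a silence marker so the transitions into and out of the word are kept.
    padded = ["sil"] + phones + ["sil"]
    return list(zip(padded, padded[1:]))

print(diphones(["K", "AE", "T"]))
# [('sil', 'K'), ('K', 'AE'), ('AE', 'T'), ('T', 'sil')]
```

A diphone synthesizer needs one recording for each pair that can occur in the language. That is why diphone inventories are much larger than phone inventories.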

discourse analysis
study of language use, whether in written language, vocal language and/or sign language

encoding
process of converting information into code symbols for communication and storage

expert system
system that emulates the decision-making ability of a human expert

general linguistics
see theoretical linguistics

glossary
alphabetical list of terms in a given field with the definition of those terms

grammar
system of rules which allow for the combination of words into sentences; includes morphology (grammar of word forms) and syntax (grammar of sentence structure)

grammatology
study of the history and theory of writing and writing systems

graphematics
see graphemics

grapheme
visual character that is the smallest unit of a writing system in a natural language

graphemics
linguistic study of writing systems and their graphemes

graphical user interface / GUI
interface that allows users to interact with electronic devices through graphical icons and visual indicators

GUI
see graphical user interface

HCI
see human-computer interaction

hot word
word providing hands-free activation of a voice-command device with an integrated virtual assistant; also called wake word

human-computer interaction / HCI
study, design and development of interfaces between users and computers

human language
see natural language

index
list (often alphabetical) or data structure created in order to locate data quickly in a data set
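In text processing, a common kind of index is an inverted index, which maps each word to the documents that contain it. The Python sketch below is simplified for illustration: it splits on whitespace only, and the document IDs are made up.

```python
from collections import defaultdict

def build_index(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each word to the sorted list of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    # Sort the words alphabetically, like the index at the back of a book.
    return {word: sorted(ids) for word, ids in sorted(index.items())}

docs = {"d1": "speech synthesis", "d2": "speech recognition"}
idx = build_index(docs)
print(idx["speech"])  # ['d1', 'd2']
print(list(idx))      # ['recognition', 'speech', 'synthesis']
```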

inference engine
system component that applies logical rules to the knowledge base in order to deduce new information

information system
organized system for collecting, storing, classifying and communicating information

International Phonetic Alphabet / IPA
alphabetical system of phonetic notation based primarily on the Latin alphabet; created by the International Phonetic Association in the late 19th century to standardize the representation of the sounds of spoken language

Internet of Things / IoT
network of physical devices, home appliances, cars, wearable devices and other items embedded with electronics, software, sensors, actuators (movers) and connectivity in order to collect and exchange data

interpreter
linguist who translates speech orally into another language; in computing, computer program that directly executes instructions written in a programming or scripting language, without compiling them first

IoT
see Internet of Things

IPA
see International Phonetic Alphabet

KB
see knowledge base

KBS
see knowledge-based system

kernel
core of an operating system

KM
see knowledge management

knowledge base / KB
technology used to store complex structured and unstructured information used by a computer system

knowledge-based system / KBS
computer system that uses a knowledge base to reason and solve problems

knowledge management / KM
process of creating, using, sharing and managing the knowledge (information) of an organization

language
see natural language, programming language, voice command language

language change
variation over time in the various features of a natural language (phonological, morphological, semantic, syntactic)

language for general purpose / LGP
term used for a general dictionary (called language-for-general-purpose dictionary or LGP dictionary) that provides a description of a natural language in general use

language for specific purpose / LSP
term used for a specialized dictionary (called language-for-specific-purpose dictionary or LSP dictionary) that defines the specialized vocabulary used by experts in a subject field

language science
see linguistics

language usage
manner in which natural language is used by a user or a group of users

lemma
dictionary form used for a set of words, for example “run” for the set of words “run”, “runs”, “ran” and “running”
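A minimal Python sketch of finding lemmas with a lookup table. The tiny table is hypothetical. Real lemmatizers rely on large lexicons, morphological rules and often the part of speech of each word.

```python
# Hypothetical mini lemma table, for illustration only.
LEMMAS = {"runs": "run", "ran": "run", "running": "run", "mice": "mouse"}

def lemmatize(word: str) -> str:
    """Return the dictionary form of a word, or the word itself if it is not listed."""
    w = word.lower()
    return LEMMAS.get(w, w)

print([lemmatize(w) for w in ["Ran", "running", "run", "mice"]])
# ['run', 'run', 'run', 'mouse']
```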

lexical resource
database offering one or several dictionaries (monolingual, bilingual, multilingual)

lexicalization
process of adding items to a lexicon, for example words, set phrases and word patterns

lexicography
practice of compiling, writing and editing general or specialized dictionaries; study of the semantic relationships in the vocabulary (lexicon) of a natural language

lexicon
vocabulary of a user, a language or a branch of knowledge; inventory of lexemes

LGP
see language for general purpose

linguist
specialist studying natural language and other languages (artificial, constructed); in a broader sense, language professional such as a translator, interpreter, copyeditor and/or proofreader

linguistic corpora
collections of linguistic data, either written texts or transcriptions of recorded speech

linguistics
study of language, including language form, language meaning and language in context

linked data
structured data that are interlinked with other data so that they become more useful through semantic queries

Linux
family of free and open-source software operating systems built around the Linux kernel

Linux kernel
open-source Unix-like operating system kernel, first released in 1991 by Linus Torvalds

locale
set of parameters that defines a user’s language (language identifier) and region (region identifier) in a user interface

localization
adaptation of a product, including its translation, to a specific country, region or language community, taking into account its market and customs

LSP
see language for specific purpose

machine intelligence
see artificial intelligence, machine learning

machine learning
field that uses statistical techniques for computer systems to learn from data

machine translation
translation of text or speech from one language to another by a computer program

metadata
data that provide information about other data

morpheme
smallest meaningful unit in a natural language; its form can vary in sound without any change in meaning (see allomorph)

morphology
study of the internal structure of words (formation and composition)

natural intelligence
intelligence displayed by humans (and animals)

natural language
human language, which basically consists of a lexicon (list of words) and a grammar (rules for the combination of words into sentences)

natural language generation / NLG
process of generating natural language from a machine representation system such as a knowledge base

natural language processing / NLP
field that uses computer programs to process large amounts of data pertaining to natural language

natural language understanding / NLU
subfield of natural language processing dealing with machine reading comprehension; applications include search engine optimization, news gathering, text categorization, voice activation, large-scale content analysis, automated customer service and online education

natural-language user interface
computer-human interface in which linguistic components (verbs, phrases, etc.) act as UI (user interface) controls for creating, selecting and modifying data in software applications

network science
study of complex networks (telecommunications, computer, biological, cognitive, semantic and social networks) and of the connections between their elements or actors

NLG
see natural language generation

NLP
see natural language processing

NLU
see natural language understanding

OCR
see optical character recognition

ontology
formal naming and definition of the categories, properties and relations between concepts, data and entities in a domain of discourse

open data
data that are freely available for everyone to use and republish without restrictions from copyrights or patents

operating system / OS
system that manages computer hardware and software resources, and provides common services for computer programs

optical character recognition / OCR
electronic conversion of scans or photographs of text (printed, typed, handwritten) into machine-encoded text

OS
see operating system

paradigm
set of concepts or thought patterns such as theories, research methods, postulates and standards

parsing
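process of analyzing a string of symbols (in a natural language or a computer language) according to the rules of a formal grammar; in linguistics, used on large-scale empirical data to annotate the syntactic and/or semantic structure of sentences and create a parsed corpus (or treebank)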

part of speech / POS
category of words with similar grammatical properties, for example noun, verb, article, adjective, preposition, pronoun, adverb, conjunction, interjection (nine main parts of speech) in English, with 50 to 150 sub-categories for part-of-speech tagging

part-of-speech tagging / POST
process of marking up each word in a text as a particular part of speech, based on both its definition and its context (its relationship with adjacent and related words in a phrase, sentence or paragraph)
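A toy Python sketch of how context helps choose between several possible tags for one word. The mini-lexicon, the tag names and the two context rules are all hypothetical. Real taggers learn such decisions from annotated corpora and use standard tag sets with many more categories.

```python
# Hypothetical mini-lexicon: each word lists its possible tags.
LEXICON = {
    "the": ["DET"], "a": ["DET"], "i": ["PRON"],
    "can": ["MODAL", "NOUN", "VERB"],
    "fish": ["NOUN", "VERB"],
}

def tag(sentence: str) -> list[tuple[str, str]]:
    """Tag each word, using the previous tag to resolve ambiguity (toy rules)."""
    tagged: list[tuple[str, str]] = []
    prev = None
    for word in sentence.lower().split():
        options = LEXICON.get(word, ["NOUN"])  # unknown words default to NOUN
        if prev == "DET" and "NOUN" in options:
            choice = "NOUN"        # after a determiner, prefer a noun
        elif prev == "MODAL" and "VERB" in options:
            choice = "VERB"        # after a modal, prefer a verb
        else:
            choice = options[0]    # otherwise take the first listed tag
        tagged.append((word, choice))
        prev = choice
    return tagged

print(tag("I can fish"))  # [('i', 'PRON'), ('can', 'MODAL'), ('fish', 'VERB')]
print(tag("a can"))       # [('a', 'DET'), ('can', 'NOUN')]
```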

phone
any distinct speech sound, or any distinct speech gesture for sign language

phoneme
unit of sound (or gesture for sign language) that distinguishes one word from another

phonemic transcription
visual representation of phonemes with a phonetic alphabet such as the International Phonetic Alphabet (IPA)

phonetic notation
see phonetic transcription

phonetic transcription
visual representation of speech sounds (phones), usually by using a phonetic alphabet such as the International Phonetic Alphabet (IPA)

phonetics
study of human speech sounds; includes articulatory phonetics (production of speech sounds by the speech organs), auditory phonetics (reception of speech sounds from the ear to the brain) and acoustic phonetics (loudness, amplitude and frequency of speech sounds)

phonology
study of how sounds are used in natural language to convey meaning; includes for example stress (emphasis on a given syllable or word) and intonation (variations in spoken pitch)

phonotactics
branch of phonology that deals with restrictions in a natural language on the permissible combinations of phonemes

phrasebook
collection of ready-made phrases, often in the form of indexed questions and answers, for example phrases along with a translation to learn the basics of a foreign natural language

POS
see part of speech

POST
see part-of-speech tagging

pragmatics
study of the way in which context contributes to meaning

programming
process of designing and building an executable program for a specific computing task

programming language
formal language made of a set of commands, instructions and other syntax used to write computer programs

psycholinguistics
study of the interrelation between linguistic factors and psychological aspects

question answering
system that automatically answers questions asked in a natural language

relational database
database based on the relational model of data (model that manages data as a set of relations)
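A minimal sketch of a relational database using SQLite through Python's standard sqlite3 module. The tables and columns are made up for this example. They show two relations linked by a key and combined with a join.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE abbreviation (abbr TEXT PRIMARY KEY, "
             "term_id INTEGER REFERENCES term(id))")

conn.executemany("INSERT INTO term (id, name) VALUES (?, ?)",
                 [(1, "natural language processing"), (2, "optical character recognition")])
conn.executemany("INSERT INTO abbreviation (abbr, term_id) VALUES (?, ?)",
                 [("NLP", 1), ("OCR", 2)])

# A join combines the two relations through their shared key.
rows = conn.execute(
    "SELECT a.abbr, t.name FROM abbreviation a JOIN term t ON a.term_id = t.id "
    "ORDER BY a.abbr"
).fetchall()
print(rows)
# [('NLP', 'natural language processing'), ('OCR', 'optical character recognition')]
conn.close()
```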

SAMPA
stands for Speech Assessment Methods Phonetic Alphabet; computer-readable phonetic alphabet based on the International Phonetic Alphabet (IPA) and using 7-bit printable ASCII characters
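A minimal Python sketch of converting IPA into SAMPA symbol by symbol. The table covers only a handful of English symbols and is not the full standard. The sketch also ignores multi-character symbols such as affricates, which would need longest-match conversion.

```python
# Partial IPA-to-SAMPA table for some English symbols (not the complete standard).
IPA_TO_SAMPA = {
    "ʃ": "S", "ʒ": "Z", "θ": "T", "ð": "D", "ŋ": "N",
    "ə": "@", "æ": "{", "ɪ": "I", "ʊ": "U", "ɛ": "E",
    "ɔ": "O", "ʌ": "V", "ɑ": "A", "ː": ":", "ˈ": '"',
}

def ipa_to_sampa(ipa: str) -> str:
    """Convert symbol by symbol; characters not in the table are kept as they are."""
    return "".join(IPA_TO_SAMPA.get(ch, ch) for ch in ipa)

print(repr(ipa_to_sampa("ˈθɪŋk")))  # '"TINk'
print(ipa_to_sampa("ʃiː"))          # Si:
```

Every output character is printable 7-bit ASCII, so the transcription can be typed and stored on any system.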

script
program written for a special run-time environment to automate the execution of tasks

scripting language
programming language that supports scripts to automate the execution of tasks

semantics
study of meaning in natural languages and programming languages

semiotics
study of meaning-making signs and sign processes

sentiment analysis
process that uses natural language processing (NLP) and text analysis to identify, extract, quantify and study subjective information (opinions, emotions), for example in users’ reviews and survey responses
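A toy Python sketch of lexicon-based sentiment analysis, one simple approach among several. The word polarities and the negation rule are hypothetical. Real systems use large sentiment lexicons or machine learning models trained on labeled reviews.

```python
# Hypothetical mini-lexicon of word polarities (+1 positive, -1 negative).
POLARITY = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}
NEGATORS = {"not", "never", "no"}

def sentiment_score(text: str) -> int:
    """Sum word polarities, flipping the sign of a word right after a negator."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    score = 0
    for i, word in enumerate(words):
        value = POLARITY.get(word, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    return score

print(sentiment_score("Great app, I love it."))         # 2
print(sentiment_score("Not good, the sound is poor."))  # -2
```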

shell
user interface for access to the services of an operating system; outermost layer (hence its name shell) around the operating system kernel

shell script
program designed to be run by the Unix shell, a command-line interpreter

smart speaker
voice command device with an integrated virtual assistant that offers hands-free activation with the help of a hot word (or wake word)

sociolinguistics
study of the effect of society on the way natural language is used; takes into account ethnicity, gender, age range, education, social status, religion and other factors

sociology of language
study of the effect of natural language on society

speaker recognition
identification of a person from the characteristics of their voice (voice biometrics)

speech
vocal communication using language

speech recognition
process that enables computers to recognize spoken language and translate it into text, for example in the built-in speech recognition software offered by most operating systems; the first systems date back to the 1950s

speech synthesis
artificial production of human speech by a computer program; such software has been included in some operating systems since the 1980s

standard library
library made available across implementations of a programming language

statistics
branch of mathematics dealing with data collection, organization, analysis, interpretation and presentation

stylistics
study of linguistic factors that place a discourse in context

sublanguage
subset of a natural language, a computer language or a relational database

syntagma
elementary constituent segment within a text, for example a phoneme, word, phrase or sentence

syntax
study of language structure (formation and composition of phrases and sentences) in order to describe how structural relations between elements in a sentence (often depicted as a parse tree) contribute to its interpretation; in computing, set of rules that define the combinations of symbols considered to be a correctly structured program

syntax analysis
see parsing

taxonomy
classification of things or concepts into categories, often hierarchical; used for example to improve relevance in vertical search, such as for a web search query

terminology
study of terms (words and compound words) and their use

text corpus
structured set of texts for storage and processing; can be for example a monolingual corpus, a multilingual corpus, a translation corpus (texts and their translations), a parallel corpus (texts aligned with their translations), or a comparable corpus (texts in several languages covering similar contents, without being translations of one another)

text processing
creation and manipulation of electronic text, for example reformatting or content change (search and replace, select and move, etc.)
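A minimal Python sketch of the text processing operations mentioned above: search and replace, reformatting, and searching. It uses the built-in str.replace and the standard re module. The sample sentence is made up and contains a deliberate misspelling to fix.

```python
import re

text = "Speech recogniton and speech synthesis.  Speech recogniton is ASR."

# Search and replace: fix a misspelling everywhere it occurs.
text = text.replace("recogniton", "recognition")

# Reformatting: collapse runs of whitespace into a single space.
text = re.sub(r"\s+", " ", text)

# Case-insensitive search: count whole-word occurrences of "speech".
count = len(re.findall(r"\bspeech\b", text, flags=re.IGNORECASE))
print(text)   # Speech recognition and speech synthesis. Speech recognition is ASR.
print(count)  # 3
```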

theoretical linguistics
study of the nature of language itself and its relation to cognitive processes; includes phonology, morphology, syntax and semantics

thesaurus
listing of words grouped according to similarity of meaning; controlled vocabulary organizing semantic metadata for information storage and retrieval

treebank
see parsing

triphone
sequence of three phones; used in speech recognition and speech synthesis to model a phone in the context of its preceding and following phones in a specific natural language
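A minimal Python sketch of listing the triphones of a word. It assumes ARPAbet-style phones, a "sil" marker for silence at word boundaries, and the "left-center+right" notation used by some speech recognition toolkits such as HTK.

```python
def triphones(phones: list[str]) -> list[str]:
    """Write each phone with its left and right context as left-center+right."""
    padded = ["sil"] + phones + ["sil"]  # silence marks the word boundaries
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]

print(triphones(["K", "AE", "T"]))
# ['sil-K+AE', 'K-AE+T', 'AE-T+sil']
```

The same phone "AE" gets a different model in "cat" (K-AE+T) than in "bad" (B-AE+D). This is how triphones capture the effect of neighboring sounds.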

TTS
text to speech, see speech synthesis

UI
see user interface

Unix
family of multitasking, multiuser computer operating systems launched in the 1970s

Unix shell
command-line interpreter providing a Unix-like command-line user interface

user interface / UI
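space where interaction between humans and machines occurs, and the design field of human-computer interaction dealing with it; can be for example a command-line interface (CLI), a graphical user interface (GUI) or a voice user interface (VUI)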

VCD
see voice command device

virtual assistant
software agent performing tasks and services for a user

virtual reality / VR
replacement of the user’s real-world environment with a computer-generated simulation of a three-dimensional environment that can be accessed with electronic equipment, for example a helmet with a screen or gloves with sensors

vocabulary
set of words for communication and knowledge acquisition; can be for example reading vocabulary, listening vocabulary, speaking vocabulary, writing vocabulary, native language vocabulary, second language vocabulary and foreign language vocabulary

voice command device / VCD
device controlled by the human voice, for example a mobile phone with voice-activated dialing or a remote controller

voice recognition
see speaker recognition, speech recognition

voice tag
short audio phrase used as a command to a voice command device or a voice user interface

voice user interface / VUI
voice/speech platform for computer-human interaction to initiate an automated service or process

VR
see virtual reality

VUI
see voice user interface

wake word
see hot word

wearable device
smart electronic device (with micro-controllers) that can be incorporated into clothing, worn as an accessory (for example a smartwatch or a fitness tracker) or worn on/in the body as an implant

wearable technology
technology behind smart devices and items worn on the body or in the body

web mining
data mining to discover patterns from the web

writing system
conventional method of visually representing verbal communication by converting spoken language into visual symbols for a wider communication across space and time


Copyright © 2018 Marie Lebert
License CC BY-NC-SA version 4.0

Written by marielebert

2018/10/17 at 23:13
