Computational Analysis of Chat Transcripts

Library Assessment Conference 2024 poster presentation

Glossary of Terms

Artificial Intelligence (AI) - A broad set of computer science technologies that enable machines to simulate human behavior.

Corpus - A collection of written or spoken text. Corpora are a key component of NLP research and are often used to train AI and ML models.

HTML Character Entities - Codes that stand in for reserved or special characters in HTML, like &nbsp; (non-breaking space) and &amp; (ampersand).

HTML Tags - Building blocks of HTML, such as <b></b> to bold text.

Large Language Model (LLM) - A specific type of ML model designed for NLP tasks like understanding and generating human language.

Machine Learning (ML) - An application of AI that allows machines to learn from data using patterns and inference.

Named Entities - Specific items mentioned in a text that can be classified into categories, like names, identification numbers, addresses, and emails.

Natural Language Processing (NLP) - A subfield of computer science that uses ML to analyze and interpret natural language.

Personally Identifiable Information - Information that can be used to identify a person.

Regular Expressions - Sequences of characters, written as text strings, that describe a search pattern.

Stop Words - Common, inconsequential words like "a", "of", and "it" that are filtered out from a corpus.

Tokenization - Breaking a larger text down into smaller units like sentences, words, or even syllables to facilitate analysis.
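
Several of the terms above (HTML tags, HTML character entities, regular expressions, personally identifiable information, tokenization, and stop words) often come together when a chat transcript is prepared for analysis. The following is a minimal Python sketch of how they might fit together, using only the standard library. It is not the method used in the poster: the function names, the tiny stop-word list, the "[EMAIL]" placeholder, and the sample message are all hypothetical, and the email pattern is deliberately simple and will miss some valid addresses.

```python
import html
import re

# Hypothetical, deliberately tiny stop-word list; real projects use longer ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "it", "is", "to"}

# Regular expressions (search patterns) used by the sketch.
TAG_RE = re.compile(r"<[^>]+>")                       # HTML tags such as <b> or </b>
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # simplified email pattern
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")       # lowercase word tokens


def clean_message(raw):
    """Strip HTML tags, decode character entities, and mask email addresses."""
    text = TAG_RE.sub(" ", raw)          # replace each tag with a space
    text = html.unescape(text)           # &nbsp; -> non-breaking space, &amp; -> &
    text = EMAIL_RE.sub("[EMAIL]", text)  # mask one kind of PII (assumed placeholder)
    return " ".join(text.split())        # collapse whitespace, including non-breaking spaces


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = TOKEN_RE.findall(text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    # Made-up chat message for illustration only.
    sample = "<b>Hello</b>&nbsp;there, contact jane.doe@example.edu &amp; renew it"
    cleaned = clean_message(sample)
    print(cleaned)
    print(tokenize(cleaned))
```

Tags are removed before entities are decoded so that an escaped angle bracket in a message is not mistaken for markup. A fuller pipeline would likely mask other named entities as well, such as names and identification numbers, which usually calls for an NLP library rather than regular expressions alone.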