Terminology in Federated Learning can be complex and context-specific. The glossary presents clear, concise definitions of key concepts and technical terms to ensure conceptual clarity and promote consistent understanding across disciplines.
Term | Description | Category |
---|---|---|
Active Learning | Model selects most informative data points for labeling. | General ML |
Adverse Event | Unintended medical occurrence during treatment or study. | Clinical/Healthcare |
Algorithmic Fairness | Ensuring machine learning models avoid biased or discriminatory outcomes. | General ML |
Alignment | Arranging DNA, RNA, or protein sequences against each other or a reference to identify regions of similarity. | Bioinformatics |
Allele Frequency | Proportion of a specific allele among all alleles in a population. | Bioinformatics |
Anonymization | Removing or masking personal identifiers from data. | Data & Privacy |
Asynchronous Federated Learning | FL where the server incorporates client updates as they arrive, without synchronized rounds. | Federated Learning |
AutoFE in Federated Learning | Automated feature engineering adapted for federated learning settings. | Federated Learning |
AutoML | Automated process of model selection, training, and tuning. | General ML |
Batch Effect | Systematic differences between data batches, often in biomedical data. | Bioinformatics |
Bias Mitigation | Techniques to reduce bias in machine learning models. | General ML |
Bioinformatics | Application of computational tools to biological data. | Bioinformatics |
Biomarker | Biological molecule indicating a process, condition, or disease. | Clinical/Healthcare |
Blockchain in FL | Using blockchain for secure, transparent model updates in FL. | Federated Learning |
Bootstrapping | Resampling technique for estimating statistics or model performance. | Analytics |
BYOD (Bring Your Own Data) | Participants contribute their own data to collaborative analysis. | Data & Privacy |
Byzantine-Robust Aggregation | Aggregation methods resilient to malicious or faulty clients in FL. | Federated Learning |
Casemix | Measuring clinical activity based on patient characteristics for reimbursement. | Clinical/Healthcare |
Centralized Learning | Model training with all data collected in one location. | General ML |
ChIP-Seq | Technique to analyze protein interactions with DNA. | Bioinformatics |
Client Clustering | Grouping clients with similar data distributions in FL to enhance performance. | Federated Learning |
Clinical Decision Support System (CDSS) | System providing clinicians with knowledge to enhance patient care decisions. | Clinical/Healthcare |
Clinical Trial | Research study to evaluate medical, surgical, or behavioral interventions. | Clinical/Healthcare |
Cohort Study | Observational study following a group over time. | Clinical/Healthcare |
Common Data Model (CDM) | Standardized structure for organizing data to facilitate sharing and analysis. | Data & Privacy |
Communication-Efficient Algorithms | FL algorithms designed to minimize communication overhead. | Federated Learning |
Consent Management | Handling patient permissions for data use and sharing. | Data & Privacy |
Continuous Learning | Model updates as new data arrives, without retraining from scratch. | General ML |
Cross-Silo Federated Learning | FL among organizations (e.g., hospitals) with large datasets. | Federated Learning |
Cross-Validation | Splitting data into folds to assess model performance. | Analytics |
Data Acquisition | Gathering data from various sources for analysis. | Data & Privacy |
Data Anonymization | Removing identifiable information from datasets to protect privacy. | Data & Privacy |
Data Augmentation | Creating new data samples by modifying existing ones. | General ML |
Data Cleaning | Correcting or removing erroneous data to improve quality. | Data & Privacy |
Data Dictionary | Descriptive list of data elements in a system or database. | Data & Privacy |
Data Drift | Change in data distribution over time, affecting model performance. | Analytics |
Data Federation | Providing unified access to data from distributed sources without centralizing it. | Data & Privacy |
Data Governance | Managing data availability, usability, integrity, and security. | Data & Privacy |
Data Harmonization | Standardizing data from multiple sources to a common format. | Data & Privacy |
Data Imputation | Filling in missing data values using statistical methods. | Analytics |
Data Integration | Combining data from different sources into a unified view. | Data & Privacy |
Data Lake | Centralized repository storing raw data in its native format, whether structured or unstructured. | Data & Privacy |
Data Leakage | Information from outside the training dataset unintentionally influencing model training, inflating performance estimates. | Data & Privacy |
Data Lineage | Tracking data origin and transformations throughout its lifecycle. | Data & Privacy |
Data Minimization | Limiting data collection to only what is necessary. | Data & Privacy |
Data Preprocessing | Preparing data for analysis through normalization and encoding. | Analytics |
Data Provenance | Documentation of data origins and processing history. | Data & Privacy |
Data Quality | Measure of data’s accuracy, completeness, and reliability. | Data & Privacy |
Data Stewardship | Overseeing data assets to ensure quality and compliance. | Data & Privacy |
Data Use Agreement (DUA) | Contract governing data sharing and usage between parties. | Data & Privacy |
Data Wrangling | Cleaning and transforming raw data into a usable format. | Analytics |
Deep Learning | Machine learning using neural networks with multiple layers. | General ML |
De-identification | Removing or obscuring personal identifiers from data. | Data & Privacy |
Descriptive Analytics | Examining data to understand past events and trends. | Analytics |
Differential Expression | Identifying genes expressed differently between conditions. | Bioinformatics |
Differential Privacy | Mathematical guarantee limiting how much any single individual's data can affect an output, typically achieved by adding calibrated noise. | Security/Privacy |
Digital Biomarker | Digital data indicating health status or disease progression. | Clinical/Healthcare |
Digital Pathology | Analysis of digitized pathology slides using computational methods. | Clinical/Healthcare |
Digital Twin | Virtual representation of a patient or system for simulation. | Clinical/Healthcare |
Distributed Learning | Model training across multiple locations or devices. | General ML |
DNA Sequencing | Determining the order of nucleotides in DNA. | Bioinformatics |
Edge Computing in FL | Performing FL computations on edge devices to reduce latency. | Federated Learning |
Electronic Health Record (EHR) | Digital record of a patient’s medical history. | Clinical/Healthcare |
Electronic Medical Record (EMR) | Digital version of a patient’s paper chart. | Clinical/Healthcare |
Ensemble Learning | Combining multiple models to improve prediction accuracy. | General ML |
Ethics Board | Committee overseeing ethical aspects of research and data use. | Clinical/Healthcare |
Explainable AI (XAI) | AI systems whose decisions can be understood by humans. | General ML |
Exploratory Data Analysis (EDA) | Summarizing main characteristics of datasets through analysis. | Analytics |
FAIR Principles | Guidelines for making data Findable, Accessible, Interoperable, Reusable. | Data & Privacy |
Federated Analytics | Analyzing distributed data without moving or centralizing it. | Federated Learning |
Federated Averaging (FedAvg) | FL algorithm that forms the global model as a weighted average of locally trained client parameters. | Federated Learning |
Federated Feature Engineering | Feature engineering in FL without sharing raw data. | Federated Learning |
Federated Learning | ML training across decentralized devices or sites that share model updates instead of raw data. | Federated Learning |
Federated One-Shot Analysis | Single-round federated analysis without iterative communication. | Federated Learning |
Federated Query | Querying distributed datasets without centralizing data. | Federated Learning |
FedProx | FL algorithm adding a proximal term to local objectives to stabilize training on non-IID data. | Federated Learning |
FHIR | Standard for electronic healthcare information exchange. | Clinical/Healthcare |
Generalization | Model’s ability to perform well on unseen data. | General ML |
Genome-Wide Association Study (GWAS) | Study associating genetic variants with traits or diseases. | Bioinformatics |
Genotype | Genetic makeup of an organism. | Bioinformatics |
Gradient Leakage | Attack reconstructing training data from shared gradients. | Security/Privacy |
Health Information Exchange (HIE) | Electronic sharing of health-related information among organizations. | Clinical/Healthcare |
HL7 | Standards for transferring clinical and administrative data. | Clinical/Healthcare |
Homomorphic Encryption | Encryption allowing computations on encrypted data without decryption. | Security/Privacy |
Horizontal Federated Learning | FL with same features but different samples across clients. | Federated Learning |
Horizontally Partitioned Data | Data with different rows stored in different locations. | Data & Privacy |
Hyperparameter | Parameter set before training, not learned from data. | General ML |
ICD-10 | International classification system for diseases and health conditions. | Clinical/Healthcare |
Imbalanced Data | Datasets where some classes are underrepresented. | Analytics |
Informed Consent | Patient agreement for data use in research. | Clinical/Healthcare |
Interoperability | Ability of systems to exchange and use information. | Data & Privacy |
k-anonymity | Ensuring each record is indistinguishable from at least k-1 others on its quasi-identifiers. | Security/Privacy |
Key Performance Indicators (KPIs) | Metrics evaluating organizational or activity success. | Analytics |
Label Noise | Incorrect or inconsistent labels in training data. | Analytics |
Label Propagation | Spreading labels from labeled to unlabeled data points. | General ML |
Latency | Delay between input and response in a system. | Analytics |
Local Differential Privacy | Privacy protection applied at the data source before sharing. | Security/Privacy |
Longitudinal Study | Research collecting data from the same subjects over time. | Clinical/Healthcare |
Machine Learning | Enabling computers to learn from data without explicit programming. | General ML |
Medical Imaging | Creating visual representations of the interior of a body. | Clinical/Healthcare |
Membership Inference | Attack determining whether a specific record was used to train a model. | Security/Privacy |
Meta-Learning in Federated Learning | Meta-learning for fast adaptation of FL global models. | Federated Learning |
Metabolomics | Study of chemical processes involving metabolites. | Bioinformatics |
mHealth | Using mobile devices for medicine and public health. | Clinical/Healthcare |
Minimum Data Set | Smallest set of data elements for a specific purpose. | Data & Privacy |
Model Compression | Reducing model size for efficiency. | General ML |
Model Deployment | Integrating machine learning models into production environments. | General ML |
Model Drift | Model performance degrades due to changing data. | Analytics |
Model Evaluation | Assessing model performance using metrics like accuracy. | Analytics |
Model Explainability | Ability to interpret and understand model predictions. | General ML |
Model Personalization | Adapting FL global models to individual client data. | Federated Learning |
Model Poisoning | Malicious client updates degrading FL global models. | Security/Privacy |
Model Selection | Choosing the best machine learning model for a task. | General ML |
Model Training | Teaching machine learning models using data. | General ML |
Multi-Omics | Integrative analysis of multiple omics data types. | Bioinformatics |
Multi-Task Learning | Training models on multiple related tasks simultaneously. | General ML |
Neural Network | Computational model inspired by the human brain. | General ML |
Next-Generation Sequencing (NGS) | High-throughput DNA sequencing technologies. | Bioinformatics |
Non-IID Data | Data not independently and identically distributed across clients. | Federated Learning |
OHDSI | Community developing standards for observational health data. | Clinical/Healthcare |
Omics Data | Large-scale datasets from genomics, proteomics, etc. | Bioinformatics |
One-Shot Federated Learning | FL training global model in a single communication round. | Federated Learning |
Ontology | Structured vocabulary for a domain, enabling data integration. | Bioinformatics |
Overfitting | Model learns training data too well, performs poorly on new data. | General ML |
Patient Cohort | Group of patients sharing common characteristics. | Clinical/Healthcare |
Pathology Informatics | Application of informatics in pathology for data management and analysis. | Clinical/Healthcare |
Patient Similarity Learning | Identifying similar patients for diagnosis or treatment planning. | Clinical/Healthcare |
Personal Health Record (PHR) | Health record managed and controlled by the patient. | Clinical/Healthcare |
Personalized Federated Learning (PFL) | FL customizing models for each client’s data. | Federated Learning |
Personally Identifiable Information (PII) | Data that can identify an individual. | Data & Privacy |
Pharmacogenomics | Study of how genes affect drug response. | Bioinformatics |
Phenotype | Observable characteristics of an organism. | Bioinformatics |
Predictive Analytics | Predicting future events using data analysis. | Analytics |
Prescriptive Analytics | Recommending actions for optimal outcomes using data. | Analytics |
Privacy by Design | Incorporating privacy into system design from the start. | Security/Privacy |
Privacy-Preserving Computation | Computations that protect private data. | Security/Privacy |
Proteomics | Study of the structure and function of proteins. | Bioinformatics |
Pseudonymization | Replacing direct identifiers with artificial ones; re-identification remains possible with additional information. | Security/Privacy |
Quality Assurance (QA) | Ensuring data and processes meet defined quality standards. | Analytics |
Quality Control (QC) | Operational techniques to fulfill quality requirements. | Analytics |
Real-World Data (RWD) | Data collected from routine clinical practice. | Clinical/Healthcare |
Real-World Evidence (RWE) | Clinical evidence from real-world data analysis. | Clinical/Healthcare |
Reproducibility | Ability to obtain consistent results using the same data and methods. | Analytics |
SCAFFOLD | FL algorithm reducing client drift using control variates. | Federated Learning |
Secure Aggregation | Protocol ensuring only aggregated updates are visible to the server. | Security/Privacy |
Secure Enclave | Hardware-based secure area for sensitive computations. | Security/Privacy |
Secure Multi-Party Computation | Cryptographic protocol for private multi-party computations. | Security/Privacy |
Semi-Supervised Learning | ML using both labeled and unlabeled data. | General ML |
SHAP Values | Feature attributions based on Shapley values that explain individual model predictions. | Analytics |
Single-Cell Analysis | Study of gene expression at the single-cell level. | Bioinformatics |
SNOMED CT | Standardized clinical terminology for electronic health records. | Clinical/Healthcare |
Supervised Learning | ML using labeled data to train models. | General ML |
Swarm Learning | Decentralized ML using blockchain for coordination. | Federated Learning |
Synthetic Data | Artificially generated data resembling real data. | Data & Privacy |
Telemedicine | Remote diagnosis and treatment via telecommunications. | Clinical/Healthcare |
Test Data | Dataset for evaluating trained model performance. | Analytics |
Tokenization | Converting sensitive data into non-sensitive tokens. | Security/Privacy |
Training Data | Dataset used to train machine learning models. | Analytics |
Transcriptomics | Study of RNA transcripts produced by the genome. | Bioinformatics |
Transfer Learning | Reusing a pre-trained model for a new task. | General ML |
Trusted Execution Environment (TEE) | Secure area of a processor for sensitive computations. | Security/Privacy |
Underfitting | Model too simple to capture data patterns. | General ML |
Unsupervised Learning | ML finding patterns in unlabeled data. | General ML |
Validation Data | Held-out dataset used to tune hyperparameters and detect overfitting during training. | Analytics |
Variant Calling | Identifying genetic variants from sequence data. | Bioinformatics |
Vertical Federated Learning | FL with different features for the same samples across clients. | Federated Learning |
Vertically Partitioned Data | Data with different columns stored in different locations. | Data & Privacy |
Zero-Knowledge Proof | Proving knowledge of information without revealing it. | Security/Privacy |
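
To make the Federated Averaging (FedAvg) entry more concrete, the aggregation step can be sketched in plain Python. This is a minimal sketch, not a reference implementation: it assumes each client reports a flat list of parameters together with its local sample count, and that the server weights each client by its share of the total samples. The function name `fedavg` and the example values are hypothetical.

```python
from typing import List, Sequence, Tuple


def fedavg(client_updates: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Return the sample-weighted average of client parameter vectors.

    client_updates holds one (parameters, num_samples) pair per client.
    Hypothetical helper for illustration only.
    """
    if not client_updates:
        raise ValueError("need at least one client update")

    total_samples = sum(n for _, n in client_updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_updates[0][0])
    global_params = [0.0] * dim

    for params, num_samples in client_updates:
        if len(params) != dim:
            raise ValueError("all clients must report the same number of parameters")
        # Clients with more local data contribute proportionally more.
        weight = num_samples / total_samples
        for i, value in enumerate(params):
            global_params[i] += weight * value

    return global_params


# Two hypothetical clients: 100 and 300 local samples.
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
global_model = fedavg(updates)
print(global_model)  # [2.5, 3.5]
```

Only the parameter vectors and sample counts reach the aggregation step; the raw training records stay with each client, which is the property the Federated Learning entry describes.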