Federated Learning Glossary

Terminology in federated learning (FL) can be complex and context-specific. This glossary gives concise definitions of key concepts and technical terms to promote consistent understanding across disciplines.

Term	Description	Category
Active Learning	Model selects most informative data points for labeling.	General ML
Adverse Event	Unintended medical occurrence during treatment or study.	Clinical/Healthcare
Algorithmic Fairness	Ensuring machine learning models avoid biased or discriminatory outcomes.	General ML
Alignment	Arranging biological sequences (DNA, RNA, or protein) against each other or a reference to identify regions of similarity.	Bioinformatics
Allele Frequency	Proportion of a specific allele among all alleles in a population.	Bioinformatics
Anonymization	Removing or masking personal identifiers from data.	Data & Privacy
Asynchronous Federated Learning	FL in which clients send model updates at different times, without waiting for a synchronized round.	Federated Learning
AutoFE in Federated Learning	Automated feature engineering adapted for federated learning settings.	Federated Learning
AutoML	Automated process of model selection, training, and tuning.	General ML
Batch Effect	Systematic differences between data batches, often in biomedical data.	Bioinformatics
Bias Mitigation	Techniques to reduce bias in machine learning models.	General ML
Bioinformatics	Application of computational tools to biological data.	Bioinformatics
Biomarker	Biological molecule indicating a process, condition, or disease.	Clinical/Healthcare
Blockchain in FL	Using blockchain for secure, transparent model updates in FL.	Federated Learning
Bootstrapping	Resampling technique for estimating statistics or model performance.	Analytics
BYOD (Bring Your Own Data)	Participants contribute their own data to collaborative analysis.	Data & Privacy
Byzantine-Robust Aggregation	Aggregation methods resilient to malicious or faulty clients in FL.	Federated Learning
Casemix	Measuring clinical activity based on patient characteristics for reimbursement.	Clinical/Healthcare
Centralized Learning	Model training with all data collected in one location.	General ML
ChIP-Seq	Chromatin immunoprecipitation followed by sequencing; identifies where proteins bind to DNA across the genome.	Bioinformatics
Client Clustering	Grouping clients with similar data distributions in FL to enhance performance.	Federated Learning
Clinical Decision Support System (CDSS)	System providing clinicians with knowledge to enhance patient care decisions.	Clinical/Healthcare
Clinical Trial	Research study to evaluate medical, surgical, or behavioral interventions.	Clinical/Healthcare
Cohort Study	Observational study following a group over time.	Clinical/Healthcare
Common Data Model (CDM)	Standardized structure for organizing data to facilitate sharing and analysis.	Data & Privacy
Communication-Efficient Algorithms	FL algorithms designed to minimize communication overhead.	Federated Learning
Consent Management	Handling patient permissions for data use and sharing.	Data & Privacy
Continuous Learning	Model updates as new data arrives, without retraining from scratch.	General ML
Cross-Silo Federated Learning	FL among organizations (e.g., hospitals) with large datasets.	Federated Learning
Cross-Validation	Splitting data into folds to assess model performance.	Analytics
Data Acquisition	Gathering data from various sources for analysis.	Data & Privacy
Data Anonymization	Removing identifiable information from datasets to protect privacy.	Data & Privacy
Data Augmentation	Creating new data samples by modifying existing ones.	General ML
Data Cleaning	Correcting or removing erroneous data to improve quality.	Data & Privacy
Data Dictionary	Descriptive list of data elements in a system or database.	Data & Privacy
Data Drift	Change in data distribution over time, affecting model performance.	Analytics
Data Federation	Sharing data from distributed sources without centralization.	Data & Privacy
Data Governance	Managing data availability, usability, integrity, and security.	Data & Privacy
Data Harmonization	Standardizing data from multiple sources to a common format.	Data & Privacy
Data Imputation	Filling in missing data values using statistical methods.	Analytics
Data Integration	Combining data from different sources into a unified view.	Data & Privacy
Data Lake	Centralized repository for storing raw data in its native format, whether structured, semi-structured, or unstructured.	Data & Privacy
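Data Leakage	Unintended use of information from outside the training dataset when building a model, which inflates performance estimates; also used for unintended exposure of sensitive data.	Data & Privacy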
Data Lineage	Tracking data origin and transformations throughout its lifecycle.	Data & Privacy
Data Minimization	Limiting data collection to only what is necessary.	Data & Privacy
Data Preprocessing	Preparing data for analysis through steps such as cleaning, normalization, and encoding.	Analytics
Data Provenance	Documentation of data origins and processing history.	Data & Privacy
Data Quality	Measure of data’s accuracy, completeness, and reliability.	Data & Privacy
Data Stewardship	Overseeing data assets to ensure quality and compliance.	Data & Privacy
Data Use Agreement (DUA)	Contract governing data sharing and usage between parties.	Data & Privacy
Data Wrangling	Cleaning and transforming raw data into a usable format.	Analytics
Deep Learning	Machine learning using neural networks with multiple layers.	General ML
De-identification	Removing or obscuring personal identifiers from data.	Data & Privacy
Descriptive Analytics	Examining data to understand past events and trends.	Analytics
Differential Expression	Identifying genes expressed differently between conditions.	Bioinformatics
Differential Privacy	Mathematical framework limiting how much any single individual’s data can influence an output, typically by adding calibrated noise.	Security/Privacy
Digital Biomarker	Digital data indicating health status or disease progression.	Clinical/Healthcare
Digital Pathology	Analysis of digitized pathology slides using computational methods.	Clinical/Healthcare
Digital Twin	Virtual representation of a patient or system for simulation.	Clinical/Healthcare
Distributed Learning	Model training across multiple locations or devices.	General ML
DNA Sequencing	Determining the order of nucleotides in DNA.	Bioinformatics
Edge Computing in FL	Performing FL computations on edge devices to reduce latency.	Federated Learning
Electronic Health Record (EHR)	Digital record of a patient’s medical history.	Clinical/Healthcare
Electronic Medical Record (EMR)	Digital version of a patient’s paper chart.	Clinical/Healthcare
Ensemble Learning	Combining multiple models to improve prediction accuracy.	General ML
Ethics Board	Committee overseeing ethical aspects of research and data use.	Clinical/Healthcare
Explainable AI (XAI)	AI systems whose decisions can be understood by humans.	General ML
Exploratory Data Analysis (EDA)	Summarizing main characteristics of datasets through analysis.	Analytics
FAIR Principles	Guidelines for making data Findable, Accessible, Interoperable, Reusable.	Data & Privacy
Federated Analytics	Analyzing distributed data without moving or centralizing it.	Federated Learning
Federated Averaging (FedAvg)	FL algorithm in which the server averages locally trained model parameters, typically weighted by client data size, to update the global model.	Federated Learning
Federated Feature Engineering	Feature engineering in FL without sharing raw data.	Federated Learning
Federated Learning	ML training across decentralized devices or institutions that share model updates rather than raw data.	Federated Learning
Federated One-Shot Analysis	Single-round federated analysis without iterative communication.	Federated Learning
Federated Query	Querying distributed datasets without centralizing data.	Federated Learning
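FedProx	FL algorithm that adds a proximal term to each client’s local objective to limit divergence from the global model on non-IID data.	Federated Learning
FHIR	Fast Healthcare Interoperability Resources; HL7 standard for electronic healthcare information exchange.	Clinical/Healthcare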
Generalization	Model’s ability to perform well on unseen data.	General ML
Genome-Wide Association Study (GWAS)	Study associating genetic variants with traits or diseases.	Bioinformatics
Genotype	Genetic makeup of an organism.	Bioinformatics
Gradient Leakage	Attack reconstructing training data from shared gradients.	Security/Privacy
Health Information Exchange (HIE)	Electronic sharing of health-related information among organizations.	Clinical/Healthcare
HL7	Health Level Seven; family of standards for transferring clinical and administrative data.	Clinical/Healthcare
Homomorphic Encryption	Encryption allowing computations on encrypted data without decryption.	Security/Privacy
Horizontal Federated Learning	FL with same features but different samples across clients.	Federated Learning
Horizontally Partitioned Data	Data with different rows stored in different locations.	Data & Privacy
Hyperparameter	Parameter set before training, not learned from data.	General ML
ICD-10	International Classification of Diseases, 10th Revision; WHO coding system for diseases and health conditions.	Clinical/Healthcare
Imbalanced Data	Datasets where some classes are underrepresented.	Analytics
Informed Consent	Patient agreement for data use in research.	Clinical/Healthcare
Interoperability	Ability of systems to exchange and use information.	Data & Privacy
k-anonymity	Ensuring each record is indistinguishable from at least k-1 others with respect to its quasi-identifiers.	Security/Privacy
Key Performance Indicators (KPIs)	Metrics evaluating organizational or activity success.	Analytics
Label Noise	Incorrect or inconsistent labels in training data.	Analytics
Label Propagation	Spreading labels from labeled to unlabeled data points.	General ML
Latency	Delay between input and response in a system.	Analytics
Local Differential Privacy	Privacy protection applied at the data source before sharing.	Security/Privacy
Longitudinal Study	Research collecting data from the same subjects over time.	Clinical/Healthcare
Machine Learning	Enabling computers to learn from data without explicit programming.	General ML
Medical Imaging	Creating visual representations of the interior of a body.	Clinical/Healthcare
Membership Inference	Attack that determines whether a specific record was used to train a model.	Security/Privacy
Meta-Learning in Federated Learning	Meta-learning for fast adaptation of FL global models.	Federated Learning
Metabolomics	Study of chemical processes involving metabolites.	Bioinformatics
mHealth	Using mobile devices for medicine and public health.	Clinical/Healthcare
Minimum Data Set	Smallest set of data elements for a specific purpose.	Data & Privacy
Model Compression	Reducing model size for efficiency.	General ML
Model Deployment	Integrating machine learning models into production environments.	General ML
Model Drift	Degradation of model performance over time as the underlying data changes.	Analytics
Model Evaluation	Assessing model performance using metrics like accuracy.	Analytics
Model Explainability	Ability to interpret and understand model predictions.	General ML
Model Personalization	Adapting FL global models to individual client data.	Federated Learning
Model Poisoning	Malicious client updates degrading FL global models.	Security/Privacy
Model Selection	Choosing the best machine learning model for a task.	General ML
Model Training	Teaching machine learning models using data.	General ML
Multi-Omics	Integrative analysis of multiple omics data types.	Bioinformatics
Multi-Task Learning	Training models on multiple related tasks simultaneously.	General ML
Neural Network	Computational model inspired by the human brain.	General ML
Next-Generation Sequencing (NGS)	High-throughput DNA sequencing technologies.	Bioinformatics
Non-IID Data	Data not independently and identically distributed across clients.	Federated Learning
OHDSI	Observational Health Data Sciences and Informatics; community developing open standards and tools for observational health data.	Clinical/Healthcare
Omics Data	Large-scale datasets from genomics, proteomics, etc.	Bioinformatics
One-Shot Federated Learning	FL training global model in a single communication round.	Federated Learning
Ontology	Structured vocabulary for a domain, enabling data integration.	Bioinformatics
Overfitting	Model fits the training data too closely, including its noise, and performs poorly on new data.	General ML
Pathology Informatics	Application of informatics in pathology for data management and analysis.	Clinical/Healthcare
Patient Cohort	Group of patients sharing common characteristics.	Clinical/Healthcare
Patient Similarity Learning	Identifying similar patients for diagnosis or treatment planning.	Clinical/Healthcare
Personal Health Record (PHR)	Health record managed and controlled by the patient.	Clinical/Healthcare
Personalized Federated Learning (PFL)	FL customizing models for each client’s data.	Federated Learning
Personally Identifiable Information (PII)	Data that can identify an individual.	Data & Privacy
Pharmacogenomics	Study of how genes affect drug response.	Bioinformatics
Phenotype	Observable characteristics of an organism.	Bioinformatics
Predictive Analytics	Predicting future events using data analysis.	Analytics
Prescriptive Analytics	Recommending actions for optimal outcomes using data.	Analytics
Privacy by Design	Incorporating privacy into system design from the start.	Security/Privacy
Privacy-Preserving Computation	Computations that protect private data.	Security/Privacy
Proteomics	Study of the structure and function of proteins.	Bioinformatics
Pseudonymization	Replacing direct identifiers with artificial ones (pseudonyms); re-identification remains possible using separately held additional information.	Security/Privacy
Quality Assurance (QA)	Ensuring data and processes meet defined quality standards.	Analytics
Quality Control (QC)	Operational techniques to fulfill quality requirements.	Analytics
Real-World Data (RWD)	Data collected from routine clinical practice.	Clinical/Healthcare
Real-World Evidence (RWE)	Clinical evidence from real-world data analysis.	Clinical/Healthcare
Reproducibility	Ability to obtain consistent results using the same data and methods.	Analytics
SCAFFOLD	FL algorithm reducing client drift using control variates.	Federated Learning
Secure Aggregation	Protocol ensuring only aggregated updates are visible to the server.	Security/Privacy
Secure Enclave	Hardware-based secure area for sensitive computations.	Security/Privacy
Secure Multi-Party Computation	Cryptographic protocol for private multi-party computations.	Security/Privacy
Semi-Supervised Learning	ML using both labeled and unlabeled data.	General ML
SHAP Values	Shapley-value-based attributions that quantify each feature’s contribution to an individual model prediction.	Analytics
Single-Cell Analysis	Study of gene expression at the single-cell level.	Bioinformatics
SNOMED CT	Standardized clinical terminology for electronic health records.	Clinical/Healthcare
Supervised Learning	ML using labeled data to train models.	General ML
Swarm Learning	Decentralized ML using blockchain for coordination.	Federated Learning
Synthetic Data	Artificially generated data resembling real data.	Data & Privacy
Telemedicine	Remote diagnosis and treatment via telecommunications.	Clinical/Healthcare
Test Data	Dataset for evaluating trained model performance.	Analytics
Tokenization	Converting sensitive data into non-sensitive tokens.	Security/Privacy
Training Data	Dataset used to train machine learning models.	Analytics
Transcriptomics	Study of RNA transcripts produced by the genome.	Bioinformatics
Transfer Learning	Reusing a pre-trained model for a new task.	General ML
Trusted Execution Environment (TEE)	Secure area of a processor for sensitive computations.	Security/Privacy
Underfitting	Model too simple to capture data patterns.	General ML
Unsupervised Learning	ML finding patterns in unlabeled data.	General ML
Validation Data	Held-out dataset used to tune hyperparameters and select models, helping detect overfitting.	Analytics
Variant Calling	Identifying genetic variants from sequence data.	Bioinformatics
Vertical Federated Learning	FL with different features for the same samples across clients.	Federated Learning
Vertically Partitioned Data	Data with different columns stored in different locations.	Data & Privacy
Zero-Knowledge Proof	Proving knowledge of information without revealing it.	Security/Privacy
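
To make the Federated Averaging (FedAvg) entry more concrete, here is a minimal sketch of the server-side aggregation step in plain Python. It assumes each client sends a flat list of parameters together with its local sample count, and that the server weights each client by its share of the total samples. The function name `fedavg` and its arguments are hypothetical and chosen only for illustration; real FL frameworks handle tensors, communication, and client selection, none of which is shown here.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts.

    Illustrative sketch only: parameters are flat lists of floats, and
    weighting by sample count is an assumption of this example.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client, and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of the same length")
        share = n / total  # this client's fraction of all training samples
        for i, w in enumerate(weights):
            global_weights[i] += share * w
    return global_weights


# Two clients: the second holds three times as much data as the first.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # -> [2.5, 3.5]
```

Only parameter vectors and sample counts reach the server in this sketch; the raw training data stays with each client.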

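The k-anonymity entry in the table can be illustrated with a small check, sketched here under simplifying assumptions: records are Python dictionaries, the caller names the quasi-identifier columns, and a dataset counts as k-anonymous when every combination of quasi-identifier values occurs at least k times. The function name `is_k_anonymous` and the column names in the example are hypothetical.

```python
from collections import Counter
from typing import Hashable, Mapping, Sequence


def is_k_anonymous(records: Sequence[Mapping[str, Hashable]],
                   quasi_identifiers: Sequence[str],
                   k: int) -> bool:
    """Return True if every quasi-identifier combination occurs at least k times."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Count how many records share each combination of quasi-identifier values.
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    # An empty dataset has no groups and is treated as trivially k-anonymous.
    return all(count >= k for count in groups.values())


records = [
    {"age_band": "30-39", "zip3": "101", "diagnosis": "A"},
    {"age_band": "30-39", "zip3": "101", "diagnosis": "B"},
    {"age_band": "40-49", "zip3": "102", "diagnosis": "A"},
]
print(is_k_anonymous(records, ["age_band", "zip3"], 2))  # -> False
```

The third record is the only one in its group, so the example fails for k = 2; generalizing or suppressing its quasi-identifier values would be one way to meet the threshold.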