Ant Colony Client Selection for Federated Learning on Non-IID Healthcare Data

A proposal for combining Ant Colony Optimization with cluster-backbone federated learning to address client heterogeneity in cross-hospital healthcare model training.

Federated learning (FL) has matured into a practical paradigm for training models across hospitals without centralizing patient records (Nguyen et al., 2022). Yet two stubborn obstacles continue to limit its clinical adoption: the statistical heterogeneity of data across institutions (the so-called non-IID problem), and the communication cost of repeatedly synchronizing large models with a fluctuating pool of clients. In this article I argue that Ant Colony Optimization (ACO), a technique long studied in the metaheuristics literature, is a natural fit for the client selection sub-problem in FL, and that pairing it with cluster-backbone aggregation should produce measurable benefits for healthcare workloads.

Why client selection is the bottleneck

In cross-silo healthcare FL, each round typically involves three steps: (i) the server picks a subset of clients, (ii) those clients train locally, and (iii) the server aggregates their updates. This is the FedAvg template (McMahan et al., 2017). When client data distributions diverge (for example, a tertiary cardiac center versus a rural clinic), uniformly random selection wastes rounds on contributors whose gradients pull the global model in conflicting directions; this is the empirical pathology that more recent selection schemes try to fix (Zhang et al., 2021). The result is slower convergence, more communication, and, in privacy-aware deployments, a larger cumulative differential-privacy expenditure, because every additional round consumes budget.
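The three-step round described above can be sketched in a few lines. This is a toy illustration, not the FedAvg reference implementation: the model is a plain list of floats, `local_train` is a hypothetical stand-in for local SGD that simply moves the model toward each client's own optimum, and the client names and sample counts are invented for the example.

```python
import random

def local_train(global_w, client_mean, lr=0.5):
    # Hypothetical stand-in for local SGD: step toward the client's own optimum.
    return [w + lr * (m - w) for w, m in zip(global_w, client_mean)]

def fedavg_round(global_w, clients, k, rng):
    # (i) the server picks k clients uniformly at random
    chosen = rng.sample(sorted(clients), k)
    # (ii) each chosen client trains locally, starting from the global model
    updates = {c: local_train(global_w, clients[c]["mean"]) for c in chosen}
    # (iii) the server averages the updates, weighted by local sample count
    total = sum(clients[c]["n"] for c in chosen)
    new_w = [
        sum(clients[c]["n"] * updates[c][d] for c in chosen) / total
        for d in range(len(global_w))
    ]
    return new_w, chosen

# Invented clients with deliberately divergent local optima (non-IID).
clients = {
    "cardiac_center": {"mean": [2.0, 0.0], "n": 900},
    "rural_clinic": {"mean": [-1.0, 1.0], "n": 100},
    "general_hospital": {"mean": [0.5, 0.5], "n": 500},
}

rng = random.Random(0)
w = [0.0, 0.0]
for _ in range(5):
    w, chosen = fedavg_round(w, clients, k=2, rng=rng)
    print(chosen, [round(x, 3) for x in w])
```

With only two of three clients sampled per round, the global model swings depending on which pair is drawn. That oscillation is the behavior a better selector is meant to dampen.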

Client selection is, formally, a combinatorial optimization problem over a graph whose nodes are institutions and whose edges encode similarity, latency, and trust. This is precisely the setting where ACO has historically excelled. In earlier work on distributed ACO for the Travelling Salesman Problem on a Raspberry Pi cluster (Alobaedy, Khalaf, & Fazea, 2022), pheromone trails were shown to converge robustly even under noisy, heterogeneous compute conditions — properties one wants in a federation of hospitals running heterogeneous hardware.
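For readers less familiar with ACO, the random-proportional rule and evaporation update of the classic Ant System are written below, transposed from graph edges to candidate clients. The transposition is my assumption for this article: \tau_i is the pheromone on client i, \eta_i is a heuristic desirability, \mathcal{C} is the set of clients still available in the round, \alpha and \beta weight pheromone against heuristic, \rho is the evaporation rate, and \Delta\tau_i is the deposit earned by client i.

```latex
\[
p_i = \frac{\tau_i^{\alpha}\,\eta_i^{\beta}}
           {\sum_{j \in \mathcal{C}} \tau_j^{\alpha}\,\eta_j^{\beta}},
\qquad
\tau_i \leftarrow (1-\rho)\,\tau_i + \Delta\tau_i .
\]
```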

A cluster-backbone view of the federation

A second ingredient comes from recent work on federated k-means based on cluster backbones (Deng, Wang, & Alobaedy, 2025). The cluster-backbone idea is that, instead of exchanging raw centroids or full models every round, clients agree on a compact skeleton of representative structures that summarizes the local data geometry. The backbone is cheap to transmit and reveals which clients occupy similar regions of the data manifold.
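To make the notion of "similar regions" concrete, here is a minimal sketch that scores similarity between two clients as the Jaccard overlap of their backbones. Representing a backbone as a set of discretized cell IDs is an assumption made purely for illustration; it is not the representation used by Deng et al., and the hospital names and cell IDs are invented.

```python
def backbone_overlap(a, b):
    """Jaccard similarity between two backbones given as sets of cell IDs."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty backbones carry no evidence of similarity
    return len(a & b) / len(a | b)

# Invented backbones: A and B cover mostly the same cells, C is disjoint.
backbones = {
    "hospital_a": {1, 2, 3, 4},
    "hospital_b": {2, 3, 4, 5},
    "hospital_c": {8, 9},
}

# Pairwise overlap for every unordered pair of clients.
overlap = {
    (p, q): backbone_overlap(backbones[p], backbones[q])
    for p in backbones
    for q in backbones
    if p < q
}
for pair, score in sorted(overlap.items()):
    print(pair, round(score, 2))
```

Any set-similarity measure would serve here; Jaccard is chosen only because it is simple and bounded between 0 and 1.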

That skeleton is exactly the kind of side-information a metaheuristic can exploit. If two hospitals share large portions of their backbone, an ant traversing the federation graph should be biased toward selecting one of them per round rather than both — they are statistically redundant. Conversely, when a backbone region is under-represented in recent rounds, pheromone should accumulate on the clients that cover it, nudging the selection process to restore coverage.
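One way to turn the coverage argument into a number is sketched below. The function `coverage_heuristic` and its smoothing constant `eps` are hypothetical; backbones are again assumed to be sets of cell IDs. A client scores higher when the cells it covers were rarely covered by recently selected clients.

```python
from collections import Counter

def coverage_heuristic(backbone, recent_selected_backbones, eps=1.0):
    """Score a client higher when its backbone cells were rarely covered recently."""
    # Count how often each cell appeared among recently selected clients.
    seen = Counter(cell for bb in recent_selected_backbones for cell in bb)
    if not backbone:
        return 0.0
    # Each cell contributes 1 / (eps + times covered); rare cells count more.
    return sum(1.0 / (eps + seen[cell]) for cell in backbone) / len(backbone)

# Invented history: the last two rounds drew clients covering cells 1 to 5.
recent = [{1, 2, 3, 4}, {2, 3, 4, 5}]

redundant = coverage_heuristic({1, 2, 3, 4}, recent)  # overlaps recent picks
novel = coverage_heuristic({8, 9}, recent)            # covers untouched cells
print(round(redundant, 3), round(novel, 3))
```

In an ACO selector this score could serve as the heuristic term, or as the size of the pheromone deposit, so that under-covered regions attract the next round's ants.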

A concrete proposal

The proposal is to treat each FL round as one iteration of an ACO solver whose objective combines three terms: (1) expected reduction in global loss, estimated from backbone overlap; (2) communication and latency cost, drawn from standard edge-device cost models for federated training; and (3) a privacy-budget penalty that rises as a client is sampled more frequently. Pheromone dynamics can handle the third term naturally, provided selection itself decays a client's trail, as in the local update rule of Ant Colony System (ACS): a client repeatedly chosen then loses attractiveness over time, which is a loose discrete analog of a per-client privacy accountant. Global evaporation alone would not suffice, since deposits on frequently chosen clients would otherwise reinforce them.
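A minimal sketch of such a selector follows. Everything in it is an assumption made for illustration: the class name `AcoClientSelector`, the weights `w_gain`, `w_cost`, and `w_priv`, the ACS-style decay toward a floor `tau_floor` for selected clients, and the invented `gain` and `cost` tables that stand in for backbone-derived loss estimates and a real cost model.

```python
import random

class AcoClientSelector:
    def __init__(self, clients, alpha=1.0, beta=2.0, rho=0.1, xi=0.3,
                 tau_init=1.0, tau_floor=0.1, seed=0):
        self.clients = list(clients)
        self.alpha, self.beta = alpha, beta      # pheromone vs heuristic weight
        self.rho, self.xi = rho, xi              # global evaporation, local decay
        self.tau_floor = tau_floor
        self.tau = {c: tau_init for c in self.clients}
        self.times_selected = {c: 0 for c in self.clients}
        self.rounds = 0
        self.rng = random.Random(seed)

    def heuristic(self, c, gain, cost, w_gain=1.0, w_cost=0.5, w_priv=1.0):
        # Three-term objective: estimated gain, minus cost, minus sampling-rate penalty.
        rate = self.times_selected[c] / max(self.rounds, 1)
        score = w_gain * gain[c] - w_cost * cost[c] - w_priv * rate
        return max(score, 1e-6)  # keep the heuristic strictly positive

    def select(self, k, gain, cost):
        # Draw k distinct clients with probability proportional to tau^alpha * eta^beta.
        pool, chosen = list(self.clients), []
        for _ in range(min(k, len(pool))):
            weights = [
                self.tau[c] ** self.alpha
                * self.heuristic(c, gain, cost) ** self.beta
                for c in pool
            ]
            pick = self.rng.choices(pool, weights=weights, k=1)[0]
            pool.remove(pick)
            chosen.append(pick)
        return chosen

    def update(self, chosen, reward):
        self.rounds += 1
        for c in self.clients:
            # Global evaporation, clamped so no client is starved forever.
            self.tau[c] = max(self.tau[c] * (1.0 - self.rho), self.tau_floor)
        for c in chosen:
            self.times_selected[c] += 1
            self.tau[c] += reward.get(c, 0.0)    # deposit for observed benefit
            # Local decay: being selected pulls the trail toward the floor.
            self.tau[c] = (1.0 - self.xi) * self.tau[c] + self.xi * self.tau_floor

# Invented per-client estimates, standing in for real measurements.
gain = {"a": 0.8, "b": 0.6, "c": 0.3, "d": 0.5}
cost = {"a": 0.2, "b": 0.1, "c": 0.1, "d": 0.4}

selector = AcoClientSelector(gain)
for _ in range(20):
    picked = selector.select(2, gain, cost)
    selector.update(picked, {c: 0.1 * gain[c] for c in picked})
print(selector.times_selected)
```

The local decay step is what ties selection frequency to lost attractiveness. Dropping it leaves only deposits and global evaporation, which tend to concentrate selection on a few high-gain clients.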

The testable claim is this: for non-IID healthcare FL, a backbone-informed ACO selector should require fewer communication rounds to reach a target validation metric than both uniform random selection and gradient-norm-based selection, while keeping the worst-case per-client sampling rate lower. Two mechanisms drive the prediction. First, the backbone gives a low-dimensional proxy for client similarity that avoids sharing raw data, so the heuristic component of ACO is informed rather than blind. Second, pheromone decay encourages sampling diversity without requiring an explicit fairness constraint; diversity emerges from the dynamics.
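Both quantities in the claim are easy to compute from an experiment log. The helpers below are a sketch with hypothetical names; the metric trace and selection log are placeholder values invented to exercise the functions, not experimental results.

```python
def rounds_to_target(metric_per_round, target):
    """1-based index of the first round whose validation metric reaches target."""
    for r, m in enumerate(metric_per_round, start=1):
        if m >= target:
            return r
    return None  # target never reached within the logged rounds

def worst_case_sampling_rate(selection_log, clients):
    """Largest fraction of rounds in which any single client was selected."""
    if not selection_log:
        return 0.0
    most = max(sum(c in chosen for chosen in selection_log) for c in clients)
    return most / len(selection_log)

# Placeholder values only, invented to show the call pattern.
trace = [0.60, 0.71, 0.80, 0.83]
log = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "d"]]

print(rounds_to_target(trace, 0.80))
print(worst_case_sampling_rate(log, ["a", "b", "c", "d"]))
```

Comparing selectors then means reporting both numbers per selector on the same federation and seeds, since a selector can win on rounds while concentrating sampling on a few clients.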

Where this connects to ongoing work

The pieces needed to test this claim already exist in adjacent literatures. Blockchain-anchored FL frameworks for EHR management (Munusamy & Jothi, 2025) provide the auditability layer for logging which clients were selected and on what basis, which matters for clinical governance and for reproducing experiments across institutions. A comprehensive review of machine learning for heart disease prediction (Kumar et al., 2025) points to a tractable, well-understood downstream task, classification of structured heart disease data with classical and ensemble models, on which to measure the selector's effect without confounding it with deep model instability.

Two caveats deserve emphasis. First, ACO introduces hyperparameters (evaporation rate, pheromone-versus-heuristic weighting, and colony size) that themselves need tuning, and naive tuning can erase the gains; the sensitivity of the Ant Colony System (ACS) algorithm to the number of ants alone has been documented empirically (Alobaedy, Khalaf, & Muraina, 2017). Second, the backbone exchange must itself be privacy-budgeted; otherwise the selector leaks distributional information that the FL protocol is meant to hide. Both are tractable engineering problems, but they are the right places to focus skepticism.
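On the first caveat, a sensitivity study at minimum needs an explicit, validated grid over the three hyperparameter families. The sketch below enumerates one; the function names and the specific values are illustrative assumptions, not recommended settings.

```python
from itertools import product

def make_config(rho, alpha, beta, n_ants):
    """Validate one ACO configuration and return it as a dict."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho (evaporation rate) must lie in (0, 1)")
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    if n_ants < 1:
        raise ValueError("n_ants (colony size) must be at least 1")
    return {"rho": rho, "alpha": alpha, "beta": beta, "n_ants": n_ants}

def config_grid(rhos, alphas, betas, ant_counts):
    # Full Cartesian product; every combination is validated on creation.
    return [make_config(r, a, b, n)
            for r, a, b, n in product(rhos, alphas, betas, ant_counts)]

# Illustrative values only.
grid = config_grid(
    rhos=[0.05, 0.1, 0.3],
    alphas=[1.0],
    betas=[1.0, 2.0, 5.0],
    ant_counts=[5, 10, 20],
)
print(len(grid), grid[0])
```

Each configuration would be run with several seeds per selector, so that any reported gain can be separated from tuning luck.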

The broader point is that the FL community tends to treat client selection as either a uniform sampling problem or a reinforcement-learning problem. Population-based metaheuristics, particularly ACO, occupy a useful middle ground: they are explainable, parallelizable, and — as the Raspberry Pi cluster experiments suggested — robust to the kind of heterogeneous, intermittently available compute that real hospital federations actually present.

References

Alobaedy, M. M., Khalaf, A. A., & Fazea, Y. (2022). Distributed multi-ant colony system algorithm using Raspberry Pi cluster for Travelling Salesman Problem. Iraqi Journal of Science, 4067–4078.

Alobaedy, M. M., Khalaf, A. A., & Muraina, I. D. (2017). Analysis of the number of ants in ant colony system algorithm. 2017 5th International Conference on Information and Communication Technology (ICoIC7), 1–5. IEEE.

Deng, Z., Wang, Y., & Alobaedy, M. M. (2025). Federated k-means based on clusters backbone. PLOS One, 20(6), e0326145. https://doi.org/10.1371/journal.pone.0326145

Kumar, R., Garg, S., Kaur, R., Johar, M. G. M., Singh, S., Menon, S. V., Kumar, P., Hadi, A. M., Hasson, S. A., & Lozanović, J. (2025). A comprehensive review of machine learning for heart disease prediction: Challenges, trends, ethical considerations, and future directions. Frontiers in Artificial Intelligence, 8, 1583459. https://doi.org/10.3389/frai.2025.1583459

McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & Agüera y Arcas, B. (2017). Communication-efficient learning of deep networks from decentralized data. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 1273–1282.

Munusamy, S., & Jothi, K. R. (2025). Blockchain-enabled federated learning with edge analytics for secure and efficient electronic health records management. Scientific Reports, 15, 27524. https://doi.org/10.1038/s41598-025-12225-x

Nguyen, D. C., Pham, Q.-V., Pathirana, P. N., Ding, M., Seneviratne, A., Lin, Z., Dobre, O. A., & Hwang, W.-J. (2022). Federated learning for smart healthcare: A survey. ACM Computing Surveys, 55(3), 1–37. https://doi.org/10.1145/3501296

Zhang, W., Wang, X., Zhou, P., Wu, W., & Zhang, X. (2021). Client selection for federated learning with non-IID data in mobile edge computing. IEEE Access, 9, 24462–24474. https://doi.org/10.1109/ACCESS.2021.3056919
