FedOne: Query-Efficient Federated Learning for Black-Box Discrete Prompts
As large language models become more pervasive, organizations are turning to prompt-based learning to tailor models to niche tasks without retraining from scratch. When data lives on user devices or across organizations, federated learning promises to learn shared prompting strategies without exposing raw information. Yet the combination of black-box discrete prompts and federated environments introduces a stubborn bottleneck: each evaluation of a candidate prompt is a costly query to the model, and the space of possible prompts grows combinatorially. FedOne confronts this challenge head-on by prioritizing query efficiency while preserving privacy, robustness, and real-world practicality.
The challenge of black-box discrete prompts in federation
Discrete prompts—sequences of tokens drawn from a fixed vocabulary—are inherently non-differentiable. When you treat a prompt as a decision variable and the underlying model as a black box, traditional gradient-based optimization simply isn’t an option. In a federated setting, each client’s data distribution can differ dramatically, making naive search across the prompt space expensive and potentially unstable. Moreover, frequent communication of prompts or full gradient signals would quickly exhaust bandwidth and raise privacy concerns. The core problem becomes:
- How can the space of discrete prompts be explored efficiently, with as few model queries as possible?
- How can feedback from heterogeneous clients be aggregated without leaking sensitive data?
- How can the learned prompts be made to generalize across non-IID user data and tasks?
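To make the black-box constraint concrete, the sketch below shows what prompt evaluation looks like when only scalar scores come back from the model. Everything here is hypothetical: `black_box_score` stands in for a remote model API, and the keyword-matching rule is a toy scoring function for illustration, not FedOne's.

```python
# Hypothetical stand-in for a black-box LLM API: it returns only a scalar
# score for a (prompt, example) pair -- no gradients, no activations.
def black_box_score(prompt_tokens, example):
    text, label = example
    # Toy rule for illustration: predict "positive" when any prompt token
    # appears in the text; a real deployment would call a model endpoint.
    predicted_positive = any(tok in text.lower() for tok in prompt_tokens)
    return 1.0 if predicted_positive == label else 0.0

def evaluate_prompt(prompt_tokens, dataset):
    """Average reward of a discrete prompt; each call below is one costly query."""
    return sum(black_box_score(prompt_tokens, ex) for ex in dataset) / len(dataset)

data = [("this movie was great", True), ("a terrible, dull plot", False)]
accuracy = evaluate_prompt(["great"], data)
```

Because only the scalar reward is observable, nothing differentiable connects a prompt to its score, which is why the search must proceed by proposing and comparing discrete candidates.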
FedOne: a high-level view of the approach
FedOne adopts a query-conscious strategy that treats prompt optimization as a collaborative search problem across clients. The server maintains a knowledge base of promising prompt candidates and leverages a lightweight surrogate model to forecast which prompts are likely to perform well, given limited feedback. Clients evaluate a carefully chosen subset of prompts on their local data and return compact, privacy-preserving signals. These signals guide the next round of prompt proposals, gradually homing in on high-impact prompts with minimal queries.
“If you want a scalable federation for prompts, the key is to decouple what we try from how we learn and do as much as possible with relative feedback rather than raw data.”
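A single collaborative round can be sketched as follows. This is a minimal, hypothetical rendering of the loop described above (names like `federated_round` are mine, not from the paper): the server samples candidate prompts, each client returns only one scalar reward per candidate, and the server averages the rewards to pick the round's winner.

```python
import random

def propose_candidates(vocab, k, length, rng):
    """Server side: sample k candidate prompts from a discrete vocabulary."""
    return [tuple(rng.choice(vocab) for _ in range(length)) for _ in range(k)]

def federated_round(vocab, client_evals, k=4, length=2, seed=0):
    rng = random.Random(seed)
    candidates = propose_candidates(vocab, k, length, rng)
    # Each client sends back one scalar reward per candidate -- no raw data.
    totals = [0.0] * k
    for local_eval in client_evals:
        for i, prompt in enumerate(candidates):
            totals[i] += local_eval(prompt)
    avg_rewards = [t / len(client_evals) for t in totals]
    best = max(range(k), key=lambda i: avg_rewards[i])
    return candidates[best], avg_rewards[best]
```

In a full system the averaged rewards would also feed a surrogate model that biases the next round's proposals, rather than sampling candidates uniformly as this sketch does.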
Core components that make FedOne work
- Discrete prompt library and encoding: A compact, well-structured search space avoids an intractable explosion in options. Prompts are represented in a way that supports efficient comparison and sampling.
- Surrogate-guided search: A lightweight model (e.g., a probabilistic forecaster) estimates the expected utility of candidate prompts based on prior evaluations, reducing the need for brute-force querying.
- Privacy-preserving evaluation: Clients locally score prompts against their data and send only aggregated rewards or minimal statistics, never raw inputs or model activations.
- Communication-efficient aggregation: Updates are compressed and sparsified, enabling quicker rounds and lower bandwidth without sacrificing convergence.
- Robust aggregation over non-IID data: FedOne accounts for client heterogeneity, ensuring that a few biased clients don’t skew the global prompt direction.
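As an illustration of surrogate-guided search, here is a deliberately tiny surrogate (my own simplification, not FedOne's forecaster): it credits each token with the average reward of the prompts it appeared in, and scores an unseen prompt by the mean of its token estimates.

```python
from collections import defaultdict

class TokenSurrogate:
    """Tiny surrogate: per-token average reward, used to rank unseen prompts
    before spending real model queries on them."""
    def __init__(self, prior=0.5):
        self.sums = defaultdict(float)   # total reward credited to each token
        self.counts = defaultdict(int)   # times each token has been evaluated
        self.prior = prior               # neutral default for unseen tokens

    def update(self, prompt, reward):
        for tok in prompt:
            self.sums[tok] += reward
            self.counts[tok] += 1

    def token_value(self, tok):
        n = self.counts[tok]
        return self.sums[tok] / n if n else self.prior

    def predict(self, prompt):
        return sum(self.token_value(t) for t in prompt) / len(prompt)
```

With a handful of observed (prompt, reward) pairs, such a surrogate can rank a large candidate pool cheaply, so expensive model queries go only to the top of the ranking.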
Driving query efficiency in practice
The essence of FedOne lies in balancing exploration with prudent querying. By prioritizing high-potential prompts early and reusing information from previous rounds, the system avoids wasteful evaluations. A few practical techniques underpin this efficiency:
- Adaptive query budgets allocate more evaluations to regions of the prompt space with uncertain payoff and fewer to well-understood areas.
- Cross-client sharing of qualitative signals allows the server to build a richer signal without exposing data, accelerating convergence.
- Early-stopping rules terminate unproductive prompt trials before expending excessive queries on them.
These design choices collectively reduce the total number of expensive model evaluations per client, cutting the overall query load while maintaining robust performance across tasks and domains.
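An adaptive budget of this kind can be implemented with a standard bandit-style rule. The sketch below uses a UCB bonus (a common choice for such allocation, not necessarily FedOne's exact rule): candidates with few evaluations receive an exploration bonus, so the next query flows toward uncertain regions, and a candidate whose bound falls below the current best can be dropped early.

```python
import math

def ucb_next_candidate(stats, total_queries, c=1.0):
    """stats: list of (mean_reward, n_evals) per candidate prompt.
    Returns the index of the candidate to spend the next model query on."""
    def upper_bound(mean, n):
        if n == 0:
            return float("inf")  # always probe untested candidates once
        # Exploration bonus shrinks as a candidate accumulates evaluations.
        return mean + c * math.sqrt(math.log(max(total_queries, 2)) / n)
    return max(range(len(stats)), key=lambda i: upper_bound(*stats[i]))
```

An early-stopping rule falls out of the same statistics: once a candidate's upper bound is below another candidate's observed mean, further queries on it are provably wasted under the bound's assumptions.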
Practical considerations for teams adopting FedOne
Beyond algorithmic ingenuity, real-world deployment hinges on thoughtful engineering choices. Privacy and security sit at the top of the stack, with strict controls over what gets transmitted and how signals are anonymized. The prompt space must be engineered to reflect domain knowledge, ensuring that the library remains comprehensive yet tractable. Latency, fault tolerance, and versioning of prompts matter when users expect responsive experiences and consistent results across updates. Finally, monitoring and auditing mechanisms help track convergence behavior and detect any drift caused by shifting data distributions.
Evaluation perspectives
Assessing FedOne requires a multi-faceted lens. Key metrics include:
- Query efficiency: total number of model evaluations required to reach a target performance.
- Prompt quality: performance of the final discrete prompts on held-out data and across tasks.
- Communication rounds: how many rounds are needed for convergence under a given privacy and bandwidth budget.
- Robustness: performance stability across non-IID client distributions and varying data sizes.
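The first of these metrics can be read directly off a training log. A small hypothetical helper: given each round's query count and the best reward observed so far, it reports the cumulative queries spent before a target was first reached.

```python
def queries_to_target(history, target):
    """history: list of (queries_this_round, best_reward_so_far) tuples.
    Returns the cumulative query count at the first round where the best
    observed reward reaches the target, or None if it never does."""
    total = 0
    for queries, best_so_far in history:
        total += queries
        if best_so_far >= target:
            return total
    return None
```

Comparing this number across methods at a fixed target performance gives a direct, budget-oriented view of query efficiency.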
What this enables for the future
FedOne points toward a practical pathway for federated prompt learning that respects privacy and scales with real-world constraints. As models evolve, the ability to adapt prompts in a distributed, query-efficient manner will become increasingly valuable for industries ranging from healthcare to financial services, where sensitive data and swift adaptation must coexist. Looking ahead, extensions could explore richer discrete prompt spaces, multi-modal prompts, or hybrid approaches that blend discrete and lightweight continuous refinements without compromising the core federated, black-box, and query-efficient guarantees.
In the end, the promise of FedOne is clear: you can learn powerful prompts across many data silos without overburdening the system with costly queries. A careful balance of exploration, privacy, and collaboration turns a daunting search into a manageable, scalable journey toward more capable, data-respecting language interfaces.