Empirical Analysis of SISA Machine Unlearning:
How Sharding Impacts Model Performance
As machine learning systems increasingly handle sensitive data, the ability to “unlearn” specific information has become essential for privacy compliance (GDPR, CCPA, etc.). The SISA (Sharded, Isolated, Sliced, and Aggregated) framework was proposed to make this process efficient — allowing models to forget data without complete retraining.
In this project, we conducted an empirical evaluation of SISA to understand how dividing data into shards affects model accuracy, stability, and retraining efficiency.
Methodology
We trained and tested three models across datasets of varying complexity:
CNN on MNIST
ResNet on CIFAR-10
UNet on a binary segmentation dataset
Each model was trained under the SISA framework, where the dataset was split into multiple disjoint shards, each training its own constituent model. We then simulated data removal (machine unlearning), retrained the affected constituents, and measured how accuracy changed with different shard counts.
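The core SISA loop we evaluated can be sketched as follows. This is a minimal illustration, not our actual training code: the nearest-centroid classifier stands in for the per-shard CNN/ResNet/UNet models, and the function names are ours. What matters is the structure: disjoint shards, isolated per-shard models, and majority-vote aggregation at inference time.

```python
import numpy as np

def make_shards(X, y, n_shards, seed=0):
    """Randomly partition the dataset into disjoint shards (the S and I of SISA)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    return [(X[part], y[part], part) for part in np.array_split(idx, n_shards)]

class CentroidModel:
    """Toy stand-in for a per-shard network: nearest-class-centroid classifier."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared distance from every sample to every class centroid.
        d = ((X[:, None, :] - self.centroids_[None]) ** 2).sum(-1)
        return self.classes_[d.argmin(axis=1)]

def train_sisa(X, y, n_shards):
    """Train one isolated model per shard; no shard ever sees another's data."""
    shards = make_shards(X, y, n_shards)
    models = [CentroidModel().fit(Xs, ys) for Xs, ys, _ in shards]
    return models, shards

def aggregate_predict(models, X):
    """Majority vote over the per-shard predictions (the A of SISA)."""
    votes = np.stack([m.predict(X) for m in models])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```

Because each constituent only ever sees its own shard, deleting a training point later requires retraining just one constituent rather than the full ensemble.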
Key Findings
Performance drops as shards increase.
More shards mean each sub-model sees less data, reducing its ability to generalize.
Training cost scales non-linearly.
Higher shard counts increase retraining time and resource usage.
Low shard counts are optimal.
A small number of shards (2–5) balances performance and unlearning efficiency.
Unexpected behaviors in complex models.
In some cases (like UNet), small datasets or missing regularization led to unusual accuracy gains after unlearning, hinting at training instability.
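The unlearning step behind these retraining measurements can be sketched as below. This is an illustrative, self-contained sketch (the `fit_centroids` learner is a toy stand-in for a per-shard network, and the function names are ours): a deletion request touches exactly one shard, so only that shard's constituent is retrained while every other constituent is reused as-is.

```python
import numpy as np

def fit_centroids(X, y):
    """Toy per-shard learner: class centroids (stand-in for a CNN/ResNet)."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def unlearn(shard_indices, shard_models, X, y, remove_idx):
    """Honor one deletion request: retrain ONLY the shard that held the point.

    shard_indices : list of disjoint index arrays, one per shard
    shard_models  : list of fitted per-shard models, same order
    """
    for k, idx in enumerate(shard_indices):
        if remove_idx in idx:
            kept = idx[idx != remove_idx]
            shard_indices[k] = kept
            # Retrain this constituent from scratch on the reduced shard.
            shard_models[k] = fit_centroids(X[kept], y[kept])
            break  # shards are disjoint, so at most one is affected
    return shard_indices, shard_models
```

The per-request cost is therefore proportional to one shard's size, which is the efficiency argument for SISA; our findings above show the accuracy and stability price paid for fragmenting the training data this way.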
Takeaway
SISA offers a promising solution for privacy-focused machine learning, but our experiments show that it comes with a cost: accuracy and stability decline as the system becomes more fragmented.
For practical deployments, the key is to find the right shard balance—enough to enable efficient unlearning but few enough to maintain reliable performance.
Looking Ahead
Future work will explore:
Adaptive shard sizing and smarter aggregation methods
Integrating regularization and data augmentation
Combining SISA with differential privacy for enhanced protection