publications | Jary Pomponi

2025

Adaptive Semantic Token Communication for Transformer-based Edge Inference

Alessio Devoto, Jary Pomponi, Mattia Merluzzi, Paolo Di Lorenzo, and Simone Scardapane

arXiv preprint arXiv:2505.17604, 2025

@article{devoto2025adaptive,
  title = {Adaptive Semantic Token Communication for Transformer-based Edge Inference},
  author = {Devoto, Alessio and Pomponi, Jary and Merluzzi, Mattia and Di Lorenzo, Paolo and Scardapane, Simone},
  journal = {arXiv preprint arXiv:2505.17604},
  year = {2025},
  abs = {This paper presents an adaptive framework for edge inference based on a dynamically configurable transformer-powered deep joint source-channel coding (DJSCC) architecture. Motivated by a practical scenario where a resource-constrained edge device engages in goal-oriented semantic communication, such as selectively transmitting essential features for object detection to an edge server, our approach enables efficient task-aware data transmission under varying bandwidth and channel conditions. To achieve this, input data is tokenized into compact, high-level semantic representations, refined by a transformer, and transmitted over noisy wireless channels. As part of the DJSCC pipeline, we employ a semantic token selection mechanism that adaptively compresses informative features into a user-specified number of tokens per sample. These tokens are then further compressed through the JSCC module, enabling a flexible token communication strategy that adjusts both the number of transmitted tokens and their embedding dimensions. We incorporate a resource allocation algorithm based on Lyapunov stochastic optimization to enhance robustness under dynamic network conditions, effectively balancing compression efficiency and task performance. Experimental results demonstrate that our system consistently outperforms existing baselines, highlighting its potential as a strong foundation for AI-native semantic communication in edge intelligence applications.},
}

NACHOS: Neural Architecture Search for Hardware Constrained Early Exit Neural Networks

Matteo Gambella, Jary Pomponi, Simone Scardapane, and Manuel Roveri

IEEE Transactions on Neural Networks and Learning Systems (Under Review), 2025

Early Exit Neural Networks (EENNs) endow a standard Deep Neural Network (DNN) with Early Exit Classifiers (EECs) to provide predictions at intermediate points of the processing when enough confidence in classification is achieved. This can improve both effectiveness and efficiency. Currently, the design of EENNs is carried out manually by experts, a complex and time-consuming task that requires accounting for many aspects, including the correct placement, the thresholding, and the computational overhead of the EECs. For this reason, researchers are exploring the use of Neural Architecture Search (NAS) to automate the design of EENNs. Few comprehensive NAS solutions for EENNs have been proposed in the literature, and a fully automated, joint design strategy taking into consideration both the backbone and the EECs remains an open problem. To this end, this work presents Neural Architecture Search for Hardware Constrained Early Exit Neural Networks (NACHOS), the first NAS framework for the design of optimal EENNs satisfying constraints on the accuracy and the number of Multiply and Accumulate (MAC) operations performed by the EENNs at inference time. In particular, NACHOS jointly designs the backbone and the EECs to select a set of admissible (i.e., respecting the constraints) Pareto-optimal solutions in terms of the best trade-off between accuracy and number of MACs. The results show that the models designed by NACHOS are competitive with state-of-the-art EENNs. Additionally, this work investigates the effectiveness of two novel regularization terms designed for the optimization of the auxiliary classifiers of the EENN.
@article{gambella2024nachos,
  title = {NACHOS: Neural Architecture Search for Hardware Constrained Early Exit Neural Networks},
  author = {Gambella, Matteo and Pomponi, Jary and Scardapane, Simone and Roveri, Manuel},
  year = {2025},
  journal = {IEEE Transactions on Neural Networks and Learning Systems (Under Review)},
  eprint = {2401.13330},
  archiveprefix = {arXiv},
  primaryclass = {cs.LG},
}

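The selection criterion described in the NACHOS abstract, keeping only admissible candidates and then the Pareto-optimal ones with respect to accuracy and MACs, can be illustrated with a short filter. This is a hedged sketch of that criterion only, not the NACHOS search procedure; the `Candidate` type, the `dominates` and `admissible_pareto` helpers, and the example numbers are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    accuracy: float  # higher is better
    macs: int        # multiply-accumulate operations at inference; lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` on both objectives and strictly better on one."""
    return (a.accuracy >= b.accuracy and a.macs <= b.macs
            and (a.accuracy > b.accuracy or a.macs < b.macs))


def admissible_pareto(cands: List[Candidate], min_accuracy: float,
                      max_macs: int) -> List[Candidate]:
    """Keep candidates that satisfy both constraints, then drop the dominated ones."""
    admissible = [c for c in cands if c.accuracy >= min_accuracy and c.macs <= max_macs]
    return [c for c in admissible if not any(dominates(o, c) for o in admissible)]


if __name__ == "__main__":
    pool = [
        Candidate("A", 0.80, 100),
        Candidate("B", 0.85, 150),
        Candidate("C", 0.84, 200),  # dominated by B
        Candidate("D", 0.90, 400),  # violates the MAC budget
    ]
    print([c.name for c in admissible_pareto(pool, min_accuracy=0.80, max_macs=300)])
```

With these made-up numbers, D is discarded by the MAC constraint and C is dominated by B, so only A and B remain as admissible trade-offs.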
Class Incremental Learning with probability dampening and cascaded gated classifier

Jary Pomponi, Alessio Devoto, and Simone Scardapane

Neurocomputing, 2025

Humans are capable of acquiring new knowledge and transferring learned knowledge into different domains, incurring little forgetting. The same ability, called Continual Learning, is challenging to achieve with neural networks, because learning new tasks causes forgetting of previously learned ones. This forgetting can be mitigated by replaying stored samples from past tasks, but a large memory may be needed for long sequences of tasks; moreover, this could lead to overfitting on the saved samples. In this paper, we propose a novel regularisation approach and a novel incremental classifier called, respectively, Margin Dampening and Cascaded Gates. The former combines a constraining loss and a knowledge distillation approach to preserve past learned knowledge while allowing the model to learn new patterns effectively. The latter is a gated incremental classifier that helps the model modify past predictions without directly interfering with them. This is achieved by modifying the output of the model with auxiliary scaling functions. We empirically show that our approach performs well on multiple benchmarks against well-established baselines, and we also study each component of our proposal and how their combinations affect the final results.
@article{pomponi202,
  title = {Class Incremental Learning with probability dampening and cascaded gated classifier},
  journal = {Neurocomputing},
  pages = {130295},
  year = {2025},
  issn = {0925-2312},
  author = {Pomponi, Jary and Devoto, Alessio and Scardapane, Simone},
}

Joint or Disjoint: Mixing Training Regimes for Early-Exit Models

Piotr Kubaty, Bartosz Wójcik, Bartłomiej Tomasz Krzepkowski, Monika Michaluk, Tomasz Trzcinski, Jary Pomponi, and Kamil Adamczewski

In Forty-Second International Conference on Machine Learning (ICML 2025), 2025

Early exits enable the network’s forward pass to terminate early by attaching trainable internal classifiers to the backbone network. Existing early-exit methods typically adopt either a joint training approach, where the backbone and exit heads are trained simultaneously, or a disjoint approach, where the heads are trained separately. However, the implications of this choice are often overlooked, with studies typically adopting one approach without adequate justification. This choice influences training dynamics, and its impact remains largely unexplored. In this paper, we introduce a set of metrics to analyze early-exit training dynamics and guide the choice of training strategy. We demonstrate that conventionally used joint and disjoint regimes yield suboptimal performance. To address these limitations, we propose a mixed training strategy: the backbone is trained first, followed by the training of the entire multi-exit network. Through comprehensive evaluations of training strategies across various architectures, datasets, and early-exit methods, we present the strengths and weaknesses of the early-exit training strategies. In particular, we show consistent improvements in performance and efficiency using the proposed mixed strategy.
@inproceedings{krzepkowski2024joint,
  title = {Joint or Disjoint: Mixing Training Regimes for Early-Exit Models},
  author = {Kubaty, Piotr and W{\'o}jcik, Bartosz and Krzepkowski, Bart{\l}omiej Tomasz and Michaluk, Monika and Trzcinski, Tomasz and Pomponi, Jary and Adamczewski, Kamil},
  year = {2025},
  eprint = {2407.14320},
  booktitle = {Forty-Second International Conference on Machine Learning (ICML 2025)},
}

2024

Conditional computation in neural networks: principles and research trends

Simone Scardapane, Alessandro Baiocchi, Alessio Devoto, Valerio Marsocci, Pasquale Minervini, and Jary Pomponi

2024

This article summarizes principles and ideas from the emerging area of applying conditional computation methods to the design of neural networks. In particular, we focus on neural networks that can dynamically activate or de-activate parts of their computational graph conditionally on their input. Examples include the dynamic selection of, e.g., input tokens, layers (or sets of layers), and sub-modules inside each layer (e.g., channels in a convolutional filter). We first provide a general formalism to describe these techniques in a uniform way. Then, we introduce three notable implementations of these principles: mixture-of-experts (MoEs) networks, token selection mechanisms, and early-exit neural networks. The paper aims to provide a tutorial-like introduction to this growing field. To this end, we analyze the benefits of these modular designs in terms of efficiency, explainability, and transfer learning, with a focus on emerging applicative areas ranging from automated scientific discovery to semantic communication.
@misc{scardapane2024conditional,
  title = {Conditional computation in neural networks: principles and research trends},
  author = {Scardapane, Simone and Baiocchi, Alessandro and Devoto, Alessio and Marsocci, Valerio and Minervini, Pasquale and Pomponi, Jary},
  year = {2024},
}

Adaptive Semantic Token Selection for AI-native Goal-oriented Communications

Alessio Devoto, Simone Petruzzi, Jary Pomponi, Paolo Di Lorenzo, and Simone Scardapane

2024

In this paper, we propose a novel design for AI-native goal-oriented communications, exploiting transformer neural networks under dynamic inference constraints on bandwidth and computation. Transformers have become the standard architecture for pretraining large-scale vision and text models, and preliminary results have shown promising performance also in deep joint source-channel coding (JSCC). Here, we consider a dynamic model where communication happens over a channel with variable latency and bandwidth constraints. Leveraging recent works on conditional computation, we exploit the structure of the transformer blocks and the multihead attention operator to design a trainable semantic token selection mechanism that learns to select relevant tokens (e.g., image patches) from the input signal. This is done dynamically, on a per-input basis, with a rate that can be chosen as an additional input by the user. We show that our model improves over state-of-the-art token selection mechanisms, exhibiting high accuracy for a wide range of latency and bandwidth constraints, without the need for deploying multiple architectures tailored to each constraint. Last, but not least, the proposed token selection mechanism helps extract powerful semantics that are easy to understand and explain, paving the way for interpretable-by-design models for the next generation of AI-native communication systems.
@misc{devoto2024adaptive,
  title = {Adaptive Semantic Token Selection for AI-native Goal-oriented Communications},
  author = {Devoto, Alessio and Petruzzi, Simone and Pomponi, Jary and Di Lorenzo, Paolo and Scardapane, Simone},
  year = {2024},
  eprint = {2405.02330},
  archiveprefix = {arXiv},
  primaryclass = {cs.IT},
}

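The idea of keeping a user-chosen fraction of tokens per input can be sketched as a top-k selection over per-token relevance scores. In the paper the selection is a trainable mechanism built on the transformer blocks and the attention operator; here the scores are plain numbers supplied by the caller, and the `select_tokens` helper and its `rate` parameter are hypothetical, so this only illustrates the budgeted selection step.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_tokens(tokens: Sequence[T], scores: Sequence[float], rate: float) -> List[T]:
    """Keep round(rate * n) of the highest-scoring tokens (at least one), in input order."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    if len(tokens) != len(scores):
        raise ValueError("expected one score per token")
    if not tokens:
        return []
    k = max(1, round(rate * len(tokens)))
    # Indices of the k largest scores, re-sorted so the kept tokens stay in input order.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]


if __name__ == "__main__":
    patches = ["p0", "p1", "p2", "p3"]   # e.g. image patches
    relevance = [0.1, 0.9, 0.4, 0.7]     # hypothetical per-patch scores
    print(select_tokens(patches, relevance, rate=0.5))
```

Because the rate is an argument rather than a property of the model, the same selector serves any bandwidth budget, which mirrors the abstract's point about not deploying one architecture per constraint.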
Adaptive Layer Selection for Efficient Vision Transformer Fine-Tuning

Alessio Devoto, Federico Alvetreti, Jary Pomponi, Paolo Di Lorenzo, Pasquale Minervini, and Simone Scardapane

2024

Recently, foundation models based on Vision Transformers (ViTs) have become widely available. However, their fine-tuning process is highly resource-intensive, which hinders their adoption in several edge or low-energy applications. To this end, in this paper we introduce an efficient fine-tuning method for ViTs called ALaST (Adaptive Layer Selection Fine-Tuning for Vision Transformers) to speed up the fine-tuning process while reducing computational cost, memory load, and training time. Our approach is based on the observation that not all layers are equally critical during fine-tuning, and their importance varies depending on the current mini-batch. Therefore, at each fine-tuning step, we adaptively estimate the importance of all layers and we assign what we call “compute budgets” accordingly. Layers that were allocated lower budgets are either trained with a reduced number of input tokens or kept frozen. Freezing a layer reduces the computational cost and memory usage by preventing updates to its weights, while discarding tokens removes redundant data, speeding up processing and reducing memory requirements. We show that this adaptive compute allocation enables a nearly-optimal schedule for distributing computational resources across layers, resulting in substantial reductions in training time (up to 1.5x), FLOPs (up to 2x), and memory load (up to 2x) compared to traditional full fine-tuning approaches. Additionally, it can be successfully combined with other parameter-efficient fine-tuning methods, such as LoRA.
@misc{devoto2024adaptivf,
  title = {Adaptive Layer Selection for Efficient Vision Transformer Fine-Tuning},
  author = {Devoto, Alessio and Alvetreti, Federico and Pomponi, Jary and Di Lorenzo, Paolo and Minervini, Pasquale and Scardapane, Simone},
  year = {2024},
  eprint = {2408.08670},
}

Goal-Oriented Communications Based on Recursive Early Exit Neural Networks

Jary Pomponi, Mattia Merluzzi, Alessio Devoto, Mateus Pontes Mota, Paolo Di Lorenzo, and Simone Scardapane

In 2024 58th Asilomar Conference on Signals, Systems, and Computers, 2024

This paper presents a novel framework for goal-oriented semantic communications leveraging recursive early exit models. The proposed approach is built on two key components. First, we introduce an innovative early exit strategy that dynamically partitions computations, enabling samples to be offloaded to a server based on layer-wise recursive prediction dynamics that detect samples whose confidence is not increasing fast enough over layers. Second, we develop a Reinforcement Learning-based online optimization framework that jointly determines early exit points, computation splitting, and offloading strategies, while accounting for wireless conditions, inference accuracy, and resource costs. Numerical evaluations in an edge inference scenario demonstrate the method’s adaptability and effectiveness in striking an excellent trade-off between performance, latency, and resource efficiency.
@inproceedings{10942792,
  author = {Pomponi, Jary and Merluzzi, Mattia and Devoto, Alessio and Mota, Mateus Pontes and Di Lorenzo, Paolo and Scardapane, Simone},
  booktitle = {2024 58th Asilomar Conference on Signals, Systems, and Computers},
  title = {Goal-Oriented Communications Based on Recursive Early Exit Neural Networks},
  year = {2024},
  pages = {378--382},
}
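The offloading rule in the abstract, sending a sample to the server when its confidence is not rising fast enough across layers, can be illustrated with a fixed-threshold routine. This is a simplified sketch under stated assumptions: the `route_sample` helper, the fixed `exit_threshold` and `min_gain` values, and the per-layer confidences given as plain numbers are all hypothetical, whereas the paper determines exit points, splitting, and offloading with a reinforcement-learning-based online optimizer.

```python
from typing import Sequence, Tuple


def route_sample(confidences: Sequence[float], exit_threshold: float,
                 min_gain: float) -> Tuple[str, int]:
    """Decide, layer by layer, whether to exit locally, offload, or run the full local model.

    Returns ("exit", i) if layer i is confident enough, ("offload", i) if the
    confidence gain from layer i-1 to layer i falls below `min_gain`, and
    ("final", last_index) if neither happens before the last layer.
    """
    if not confidences:
        raise ValueError("need at least one layer")
    for i, conf in enumerate(confidences):
        if conf >= exit_threshold:
            return ("exit", i)
        if i > 0 and conf - confidences[i - 1] < min_gain:
            return ("offload", i)  # confidence is stalling: hand the sample to the server
    return ("final", len(confidences) - 1)
```

For example, with `exit_threshold=0.9` and `min_gain=0.05`, the confidence trace `[0.3, 0.6, 0.95]` exits locally at layer 2, while `[0.3, 0.32, 0.5]` is offloaded at layer 1 because its confidence barely moves.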

2023

Rearranging Pixels is a Powerful Black-Box Attack for RGB and Infrared Deep Learning Models

Jary Pomponi, Daniele Dántoni, Nicolosi Alessandro, and Simone Scardapane

IEEE Access, 2023

Recent research has found that neural networks for computer vision are vulnerable to several types of external attacks that modify the input of the model, with the malicious intent of producing a misclassification. With the increase in the number of feasible attacks, many defence approaches have been proposed to mitigate the effect of these attacks and protect the models. Research on both attack and defence has mainly focused on RGB images, while other domains, such as the infrared domain, are currently underexplored. In this paper, we propose two attacks, and we evaluate them on multiple datasets and neural network models, showing that they outperform other established attacks in both the RGB and the infrared domains. In addition, we show that our proposal can be used in an adversarial training protocol to produce models that are more robust to both adversarial attacks and natural perturbations that can be applied to input images. Lastly, we study whether a successful attack in one domain can be transferred to an aligned image in another domain, without any further tuning. The code, containing all the files and the configurations used to run the experiments, is available online at https://github.com/jaryP/IR-RGB-domain-attack.
@article{pomponi2023rearranging,
  title = {Rearranging Pixels is a Powerful Black-Box Attack for RGB and Infrared Deep Learning Models},
  author = {Pomponi, Jary and D{\'a}ntoni, Daniele and Alessandro, Nicolosi and Scardapane, Simone},
  journal = {IEEE Access},
  volume = {11},
  pages = {11298--11306},
  year = {2023},
  publisher = {IEEE},
}

Continual learning with invertible generative models

Jary Pomponi, Simone Scardapane, and Aurelio Uncini

Neural Networks, 2023

Catastrophic forgetting (CF) happens whenever a neural network overwrites past knowledge while being trained on new tasks. Common techniques to handle CF include regularization of the weights (using, e.g., their importance on past tasks), and rehearsal strategies, where the network is constantly re-trained on past data. Generative models have also been applied to the latter, to provide an endless source of data. In this paper, we propose a novel method that combines the strengths of regularization and generative-based rehearsal approaches. Our generative model consists of a normalizing flow (NF), a probabilistic and invertible neural network, trained on the internal embeddings of the network. By keeping a single NF throughout the training process, we show that our memory overhead remains constant. In addition, exploiting the invertibility of the NF, we propose a simple approach to regularize the network’s embeddings with respect to past tasks. We show that our method performs favorably with respect to state-of-the-art approaches in the literature, with bounded computational power and memory overheads.
@article{pomponi2023continual,
  title = {Continual learning with invertible generative models},
  author = {Pomponi, Jary and Scardapane, Simone and Uncini, Aurelio},
  journal = {Neural Networks},
  volume = {164},
  pages = {606--616},
  year = {2023},
  publisher = {Elsevier},
}

2022

Centroids Matching: an efficient Continual Learning approach operating in the embedding space

Jary Pomponi, Simone Scardapane, and Aurelio Uncini

Transactions on Machine Learning Research, 2022

Catastrophic forgetting (CF) occurs when a neural network loses the information previously learned while training on a set of samples from a different distribution, i.e., a new task. Existing approaches have achieved remarkable results in mitigating CF, especially in a scenario called task incremental learning. However, this scenario is not realistic, and limited work has been done to achieve good results in more realistic scenarios. In this paper, we propose a novel regularization method called Centroids Matching which, inspired by meta-learning approaches, fights CF by operating in the feature space produced by the neural network, achieving good results while requiring a small memory footprint. Specifically, the approach classifies the samples directly using the feature vectors produced by the neural network, by matching those vectors with the centroids representing the classes from the current task, or from all the tasks up to that point. Centroids Matching is faster than competing baselines, and it can be exploited to efficiently mitigate CF by preserving the distances between the embedding space produced by the model when past tasks were over and the one it currently produces, leading to a method that achieves high accuracy on all the tasks without using an external memory in easy scenarios, or using a small one in more realistic ones. Extensive experiments demonstrate that Centroids Matching achieves accuracy gains on multiple datasets and scenarios.
@article{pomponi2022centroids,
  title = {Centroids Matching: an efficient Continual Learning approach operating in the embedding space},
  author = {Pomponi, Jary and Scardapane, Simone and Uncini, Aurelio},
  journal = {Transactions on Machine Learning Research},
  year = {2022},
}

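The classification rule described above, assigning a sample to the class whose centroid is closest to its embedding, can be sketched in a few lines. This is a hedged illustration, not the authors' code: the embeddings are given as plain vectors rather than produced by a network, Euclidean distance and arithmetic means are assumptions, and the `class_centroids` and `predict` helpers are hypothetical. The regularization that preserves distances across tasks is not shown.

```python
import math
from typing import Dict, Hashable, List, Sequence


def class_centroids(embeddings: Sequence[Sequence[float]],
                    labels: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
    """Mean embedding for each class label."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for emb, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(embedding: Sequence[float], centroids: Dict[Hashable, List[float]]) -> Hashable:
    """Label of the centroid nearest to `embedding` (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(embedding, centroids[label]))


if __name__ == "__main__":
    support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
    cents = class_centroids(support, ["a", "a", "b", "b"])
    print(cents, predict([1.0, 1.0], cents), predict([5.0, 5.0], cents))
```

Since classification reduces to distances between embeddings and centroids, there is no output layer that must grow as new classes arrive; new centroids are simply added.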
Pixle: a fast and effective black-box attack based on rearranging pixels

Jary Pomponi, Simone Scardapane, and Aurelio Uncini

In 2022 International Joint Conference on Neural Networks (IJCNN), 2022

Recent research has found that neural networks are vulnerable to several types of adversarial attacks, where the input samples are modified so that the model produces a wrong prediction. In this paper we focus on black-box adversarial attacks, which can be performed without knowing the inner structure of the attacked model or its training procedure, and we propose a novel attack that successfully attacks a high percentage of samples by rearranging a small number of pixels within the attacked image. We demonstrate that our attack works on a large number of datasets and models, that it requires a small number of iterations, and that the distance between the original sample and the adversarial one is negligible to the human eye.
@inproceedings{9892966,
  author = {Pomponi, Jary and Scardapane, Simone and Uncini, Aurelio},
  booktitle = {2022 International Joint Conference on Neural Networks (IJCNN)},
  title = {Pixle: a fast and effective black-box attack based on rearranging pixels},
  year = {2022},
  pages = {1--7},
}
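The two properties emphasized above, that the attack only moves existing pixels and only needs the model's outputs, can be illustrated with a greedy random search that swaps pixel pairs and keeps a swap when the true-class score drops. This is a hedged sketch, not the Pixle algorithm itself, which rearranges pixels taken from patches with its own search strategy; the `rearrange_attack` helper, its parameters, and the toy scoring function in the usage note are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def rearrange_attack(image: Image, true_score: Callable[[Image], float],
                     max_iters: int = 100, seed: int = 0) -> Tuple[Image, float]:
    """Greedy black-box search that only swaps existing pixels.

    `true_score` returns the model's score for the correct class; it is queried
    as a black box, and no gradients or model internals are used.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    best = [row[:] for row in image]  # work on a copy; the caller's image is untouched
    best_score = true_score(best)
    for _ in range(max_iters):
        candidate = [row[:] for row in best]
        r1, c1 = rng.randrange(height), rng.randrange(width)
        r2, c2 = rng.randrange(height), rng.randrange(width)
        candidate[r1][c1], candidate[r2][c2] = candidate[r2][c2], candidate[r1][c1]
        score = true_score(candidate)
        if score < best_score:  # keep the swap only if it lowers the true-class score
            best, best_score = candidate, score
    return best, best_score
```

As a toy usage, `rearrange_attack([[9.0, 1.0], [1.0, 1.0]], lambda im: im[0][0])` treats the top-left pixel as the "score"; the search moves the bright pixel elsewhere, and the multiset of pixel values stays exactly the same.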

2021

Structured Ensembles: An approach to reduce the memory footprint of ensemble methods

Jary Pomponi, Simone Scardapane, and Aurelio Uncini

Neural Networks, 2021

In this paper, we propose a novel ensembling technique for deep neural networks that drastically reduces the required memory compared to alternative approaches. In particular, we propose to extract multiple sub-networks from a single, untrained neural network by solving an end-to-end optimization task combining differentiable scaling over the original architecture with multiple regularization terms favouring the diversity of the ensemble. Since our proposal aims to detect and extract sub-structures, we call it Structured Ensemble. In an extensive experimental evaluation, we show that our method can achieve higher or comparable accuracy to competing methods while requiring significantly less storage. In addition, we evaluate our ensembles in terms of predictive calibration and uncertainty, showing they compare favourably with the state-of-the-art. Finally, we draw a link with the continual learning literature, and we propose a modification of our framework to handle continuous streams of tasks with a sub-linear memory cost. We compare with a number of alternative strategies to mitigate catastrophic forgetting, highlighting advantages in terms of average accuracy and memory.
@article{pomponi2021structured,
  title = {Structured Ensembles: An approach to reduce the memory footprint of ensemble methods},
  author = {Pomponi, Jary and Scardapane, Simone and Uncini, Aurelio},
  journal = {Neural Networks},
  volume = {144},
  pages = {407--418},
  year = {2021},
  publisher = {Elsevier},
}

A probabilistic re-intepretation of confidence scores in multi-exit models

Jary Pomponi, Simone Scardapane, and Aurelio Uncini

Entropy, 2021

In this paper, we propose a new approach to train a deep neural network with multiple intermediate auxiliary classifiers branching from it. These ‘multi-exit’ models can be used to reduce the inference time by performing early exit on the intermediate branches if the confidence of the prediction is higher than a threshold. They rely on the assumption that not all samples require the same amount of processing to yield a good prediction. We propose a way to jointly train all the branches of a multi-exit model without hyper-parameters, by weighting the predictions from each branch with a trained confidence score. Each confidence score is an approximation of the real one produced by the branch, and it is calculated and regularized while training the rest of the model. We evaluate our proposal on a set of image classification benchmarks, using different neural models and early-exit stopping criteria.
@article{pomponi2021probabilistic,
  title = {A probabilistic re-intepretation of confidence scores in multi-exit models},
  author = {Pomponi, Jary and Scardapane, Simone and Uncini, Aurelio},
  journal = {Entropy},
  volume = {24},
  number = {1},
  pages = {1},
  year = {2021},
  publisher = {MDPI},
}

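The generic inference rule mentioned in the abstract, stopping at the first branch whose confidence exceeds a threshold, can be sketched as follows. As an assumption, confidence here is the maximum softmax probability of each branch, which is only one of several possible stopping criteria; the paper's contribution (trained per-branch confidence scores used to weight the branches during training) is not reproduced, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(branch_logits: Sequence[Sequence[float]],
                       threshold: float) -> Tuple[int, int]:
    """Return (predicted class, exit index) from the first sufficiently confident branch.

    Branches are ordered from shallowest to deepest; the last one always answers.
    """
    if not branch_logits:
        raise ValueError("need at least one branch")
    for i, logits in enumerate(branch_logits[:-1]):
        probs = softmax(logits)
        if max(probs) >= threshold:
            return probs.index(max(probs)), i
    probs = softmax(branch_logits[-1])
    return probs.index(max(probs)), len(branch_logits) - 1
```

With branch logits `[[0.0, 0.1], [0.0, 3.0], [5.0, 0.0]]` and a threshold of 0.9, the first branch is too uncertain (about 0.52), so the sample exits at the second branch with class 1; raising the threshold to 0.99 pushes it to the final branch.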
Bayesian neural networks with maximum mean discrepancy regularization

Jary Pomponi, Simone Scardapane, and Aurelio Uncini

Neurocomputing, 2021

Bayesian Neural Networks (BNNs) are trained to optimize an entire distribution over their weights instead of a single set, which has significant advantages in terms of, e.g., interpretability, multi-task learning, and calibration. Because of the intractability of the resulting optimization problem, most BNNs are either sampled through Monte Carlo methods or trained by minimizing a suitable Evidence Lower BOund (ELBO) on a variational approximation. In this paper, we propose a variant of the latter, wherein we replace the Kullback-Leibler divergence in the ELBO term with a Maximum Mean Discrepancy (MMD) estimator, inspired by recent work in variational inference. After motivating our proposal based on the properties of the MMD term, we proceed to show a number of empirical advantages of the proposed formulation over the state-of-the-art. In particular, our BNNs achieve higher accuracy on multiple benchmarks, including several image classification tasks. In addition, they are more robust to the selection of a prior over the weights, and they are better calibrated. As a second contribution, we provide a new formulation for estimating the uncertainty on a given prediction, showing that it is more robust against adversarial attacks and the injection of noise into the inputs, compared to more classical criteria such as the differential entropy.
@article{pomponi2021bayesian,
  title = {Bayesian neural networks with maximum mean discrepancy regularization},
  author = {Pomponi, Jary and Scardapane, Simone and Uncini, Aurelio},
  journal = {Neurocomputing},
  volume = {453},
  pages = {428--437},
  year = {2021},
  publisher = {Elsevier},
}

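For reference, a standard unbiased estimate of the squared MMD between two sample sets, the kind of quantity the abstract proposes in place of the KL term, can be written in a few lines. The Gaussian (RBF) kernel and its bandwidth are assumptions, the samples are plain vectors (for instance, weights drawn from the variational posterior and from the prior), and the helper names are hypothetical; this is not the paper's training code.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs: Sequence[Vector], ys: Sequence[Vector], bandwidth: float = 1.0) -> float:
    """Unbiased estimate of MMD^2 between the distributions that generated xs and ys."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples in each set")
    # Within-set averages exclude the i == j terms, which is what makes the estimate unbiased.
    k_xx = sum(rbf_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(rbf_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy
```

Because it is unbiased, the estimate can be slightly negative when both sets come from nearly the same distribution; it grows toward `k_xx + k_yy` as the two sets move apart.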
Avalanche: an end-to-end library for continual learning

Vincenzo Lomonaco, Lorenzo Pellegrini, Andrea Cossu, Antonio Carta, Gabriele Graffieti, and 6 more authors

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021

Learning continually from non-stationary data streams is a long-standing goal and a challenging problem in machine learning. Recently, we have witnessed a renewed and fast-growing interest in continual learning, especially within the deep learning community. However, algorithmic solutions are often difficult to re-implement, evaluate and port across different settings, where even results on standard benchmarks are hard to reproduce. In this work, we propose Avalanche, an open-source end-to-end library for continual learning research based on PyTorch. Avalanche is designed to provide a shared and collaborative codebase for fast prototyping, training, and reproducible evaluation of continual learning algorithms.
@inproceedings{lomonaco2021avalanche,
  title = {Avalanche: an end-to-end library for continual learning},
  author = {Lomonaco, Vincenzo and Pellegrini, Lorenzo and Cossu, Andrea and Carta, Antonio and Graffieti, Gabriele and Hayes, Tyler L and De Lange, Matthias and Masana, Marc and Pomponi, Jary and Van de Ven, Gido M and others},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages = {3600--3610},
  year = {2021},
}

2020

Pseudo-rehearsal for continual learning with normalizing flows

Jary Pomponi, Simone Scardapane, and Aurelio Uncini

arXiv preprint arXiv:2007.02443, 2020

Catastrophic forgetting (CF) happens whenever a neural network overwrites past knowledge while being trained on new tasks. Common techniques to handle CF include regularization of the weights (using, e.g., their importance on past tasks), and rehearsal strategies, where the network is constantly re-trained on past data. Generative models have also been applied to the latter, to provide an endless source of data. In this paper, we propose a novel method that combines the strengths of regularization and generative-based rehearsal approaches. Our generative model consists of a normalizing flow (NF), a probabilistic and invertible neural network, trained on the internal embeddings of the network. By keeping a single NF conditioned on the task, we show that our memory overhead remains constant. In addition, exploiting the invertibility of the NF, we propose a simple approach to regularize the network’s embeddings with respect to past tasks. We show that our method performs favorably with respect to state-of-the-art approaches in the literature, with bounded computational power and memory overheads.
@article{pomponi2020pseudo,
  title = {Pseudo-rehearsal for continual learning with normalizing flows},
  author = {Pomponi, Jary and Scardapane, Simone and Uncini, Aurelio},
  journal = {arXiv preprint arXiv:2007.02443},
  year = {2020},
}

DeepRICH: learning deeply Cherenkov detectors

Cristiano Fanelli and Jary Pomponi

Machine Learning: Science and Technology, 2020

Imaging Cherenkov detectors are widely used for particle identification (PID) in nuclear and particle physics experiments, where developing fast reconstruction algorithms is becoming of paramount importance to allow for near-real-time calibration and data quality control, as well as to speed up offline analysis of large amounts of data.
@article{fanelli2020deeprich,
  title = {DeepRICH: learning deeply Cherenkov detectors},
  author = {Fanelli, Cristiano and Pomponi, Jary},
  journal = {Machine Learning: Science and Technology},
  volume = {1},
  number = {1},
  pages = {015010},
  year = {2020},
  publisher = {IOP Publishing},
}

Efficient continual learning in neural networks with embedding regularization

Jary Pomponi, Simone Scardapane, Vincenzo Lomonaco, and Aurelio Uncini

Neurocomputing, 2020

Continual learning of deep neural networks is a key requirement for scaling them up to more complex applicative scenarios and for achieving real lifelong learning of these architectures. Previous approaches to the problem have either progressively increased the size of the networks or tried to regularize the network behavior to keep it consistent with previously observed tasks. In the latter case, it is essential to understand what type of information best represents this past behavior. Common techniques include regularizing the past outputs, gradients, or individual weights. In this work, we propose a new, relatively simple and efficient method to perform continual learning by instead regularizing the network's internal embeddings. To make the approach scalable, we also propose a dynamic sampling strategy to reduce the memory footprint of the required external storage. We show that our method performs favorably with respect to state-of-the-art approaches in the literature, while requiring significantly less memory and computational time. In addition, inspired by recent works, we evaluate the impact of selecting a more flexible model for the activation functions inside the network, evaluating the impact of catastrophic forgetting on the activation functions themselves.
@article{pomponi2020efficient,
  title = {Efficient continual learning in neural networks with embedding regularization},
  author = {Pomponi, Jary and Scardapane, Simone and Lomonaco, Vincenzo and Uncini, Aurelio},
  journal = {Neurocomputing},
  volume = {397},
  pages = {139--148},
  year = {2020},
  publisher = {Elsevier},
}
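The core of the regularizer described above, penalizing changes in the internal embeddings the network produces for samples stored from past tasks, can be sketched as a distance between saved and current embeddings that is added to the task loss. The Euclidean distance, the `weight` factor, and the `embedding_penalty` helper are assumptions for illustration; the paper's exact distance, its dynamic sampling strategy, and the network itself are not reproduced.

```python
import math
from typing import Sequence


def embedding_penalty(stored: Sequence[Sequence[float]],
                      current: Sequence[Sequence[float]],
                      weight: float = 1.0) -> float:
    """Weighted mean Euclidean distance between embeddings saved after a past task
    and the embeddings the current network produces for the same stored inputs."""
    if not stored or len(stored) != len(current):
        raise ValueError("need matching, non-empty lists of embeddings")
    mean_dist = sum(math.dist(s, c) for s, c in zip(stored, current)) / len(stored)
    return weight * mean_dist


# Hypothetical use inside one training step on a new task:
# loss = task_loss + embedding_penalty(saved_embeddings, current_embeddings, weight=0.5)
```

Only the stored embeddings (not whole past models) need to be kept, which is why the memory cost depends on how many samples are retained; that is the quantity the abstract's dynamic sampling strategy aims to reduce.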