Carlo Vitucci's Licentiate Defense
2024/04/15
Carlo Vitucci, PhD student of ARRAY++, successfully defended his Licentiate thesis on 18 April 2024 at 13:15.
Title: “The role of fault management in the embedded system design”
Date and time: April 18th, 2024, at 13:15
Place: Milos, Mälardalen University, Västerås
Faculty Examiner: Assoc. Prof. Fredrik Asplund, Royal Institute of Technology (Sweden)
Grading Committee: Adj. Prof. Kristina Forsberg, Saab (Sweden) and Prof. Zebo Peng, Linköping University (Sweden)
Reserve Committee Member: Prof. Kristina Lundqvist, Mälardalen University (Sweden).
Advisors: Daniel Sundmark, Thomas Nolte and Marcus Jägemar
Abstract
In the last decade, the world of telecommunications has seen the value of services definitively affirmed and the value of connectivity lost. This change of pace in the use of the network (and available hardware resources) has led to continuous, unlimited growth in data traffic, increased income for service providers, and a constant erosion of operators’ income from voice and Short Message Service (SMS) traffic. The change in mobile service consumption is evident to operators. The market today is in the hands of over-the-top (OTT) media content delivery companies (Google, Meta, Netflix, Amazon, etc.). The fifth generation of mobile networks (5G), the latest generation of mobile architecture, is nothing other than the way operators can invest in system infrastructure to participate in the prosperous service business.
With the advent of 5G, the worlds of cloud and telecommunications have found their meeting point, paving the way for new infrastructures and services, such as smart cities, industry 4.0, industry 5.0, and Augmented Reality (AR)/Virtual Reality (VR). People, infrastructures, and devices are connected to provide services that we even struggle to imagine today, but a highly interconnected system requires high levels of reliability and resilience. Hardware reliability has increased since the 1990s. However, it is equally correct to mention that the introduction of new technologies in the nanometer domain and the growing complexity of on-chip systems have made fault management critical to guarantee the quality of the service offered to the customer and the sustainability of the network infrastructure.
In this thesis, our first contribution is a review of the fault management implementation framework for the radio access network (RAN) domain. Our approach introduces a holistic vision of fault management that pays increasingly significant attention to the recovery action, the crucial target of the proposed framework. A second contribution underlines this attention to the recovery target: we revisited the taxonomy of faults in mobile systems to enhance the result of the recovery action, which, in our opinion, must be propagated between the different layers of an embedded system (hardware, firmware, middleware, and software). The practical adoption of the new framework and the new taxonomy allowed us to make a unique contribution to the thesis: the proposal of a new algorithm for managing system memory errors, both temporary (soft) and permanent (hard). The holistic vision of error management we introduced in this thesis involves hardware that proactively manages faults. An efficient implementation of fault management is only possible if the hardware design considers error-handling techniques and methodologies. Another contribution of this thesis is the definition of the fault management requirements for the RAN embedded system hardware design.
Another primary function of the proposed fault management framework is fault prediction. Recognizing error patterns allows the system to react in time, even before the error condition occurs, or to identify the topology of the error and so implement more targeted and, therefore, more efficient recovery actions. The operating temperature is always a critical characteristic of embedded radio access network systems. Base stations must be able to work in very different temperature conditions. However, the working temperature also directly affects the probability of error for the system. In this thesis, we have also contributed a machine-learning algorithm for predicting the working temperature of base stations in radio access networks, a first step towards a more sophisticated implementation of error prevention and prediction.
The PDF of the thesis is available at: http://www.diva-portal.org/smash/get/diva2:1843504/FULLTEXT02.pdf