Scaling the Future: Master Thesis Defense on Scalability in Simulation Environments for Distributed Cyber-Physical Systems

Today, we celebrate that Herman Kelder successfully defended his master thesis “Scalability in System-Level Simulation Environments for Distributed Cyber-Physical Systems”. This work was carried out in the context of the DSE2.0 project, where we address the scientific challenges involved in performing design-space exploration (DSE) for complex distributed cyber-physical systems (dCPS), such as lithography machines. Three key challenges in this context are: 1) automatically modelling the application and platform based on data from the running system, 2) scalable search and pruning algorithms that help navigate large design spaces efficiently, and 3) scalable simulation environments that allow many design points to be efficiently evaluated concurrently.

Herman’s thesis addresses the last of these three challenges. To facilitate scalable and efficient DSE for dCPS, an evaluation environment is proposed, implemented, and evaluated. The research considers key design considerations for developing a distributed evaluation workflow that can dynamically be adapted to enable efficient and scalable exploration of the vast design space of complex, distributed cyber-physical systems. Evaluation of the proposed environment employs a set of system models, representing design points within a DSE process, to assess the solution and its behavior, performance, capability, and applicability in addressing the scalability challenge in the context of DSE for dCPS. During the evaluation, the performance and behavior are investigated in three areas: (i) Simulation Campaign, (ii) Task Management Configuration, and (iii) Parallel Discrete-Event Simulation (PDES). Throughout the evaluation, it is demonstrated that the proposed environment is capable of providing scalable and efficient evaluation of design points in the context of DSE for dCPS. Furthermore, the proposed solution enables designers and researchers to tailor it to their environment through dynamic complex workflows and interactions, workload-level and task-level parallelism, and simulator and compute environment agnosticism.
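To make the idea of a simulation campaign concrete, here is a minimal sketch of workload-level parallelism: independent design points are evaluated concurrently and the best one is selected afterwards. It is not the environment from the thesis. The `evaluate` function, the `(cores, clock_ghz)` design-point encoding, and the fixed amount of work are hypothetical stand-ins for a real simulator run.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(design_point):
    """Hypothetical stand-in for one simulator run.

    A design point is assumed to be a (cores, clock_ghz) tuple; the
    returned number plays the role of a simulated latency.
    """
    cores, clock_ghz = design_point
    work = 120.0  # assumed amount of work, arbitrary units
    return work / (cores * clock_ghz)


def run_campaign(design_points, max_workers=4):
    """Evaluate all design points concurrently and map each to its result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input design points.
        results = list(pool.map(evaluate, design_points))
    return dict(zip(design_points, results))


if __name__ == "__main__":
    points = [(1, 1.0), (2, 1.0), (2, 2.0), (4, 1.5)]
    results = run_campaign(points)
    best = min(results, key=results.get)
    print("best design point:", best, "latency:", results[best])
```

A thread pool is enough when each evaluation mostly waits on an external simulator process; for CPU-bound evaluations in Python, a process pool or a cluster scheduler would be the more natural choice.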

Herman executed his project meticulously and delivered excellent research results, both in terms of concepts and implementation. Thank you very much for your contributions, Herman. We hope to work with you again at some point.

Keynote Address Explores Performance Engineering in Cloud-Connected Cyber-Physical Systems

I had the honor of being invited as keynote speaker at RT-Cloud 2023. The keynote discussed the increasing complexity of cyber-physical systems (CPS) in the Dutch high-tech systems industry and a gradual transition towards microservice architectures and cloud-connected systems. This transition challenges our experience with performance engineering in the CPS domain, as we adapt our methods to embrace new tools and technologies. To make the presentation concrete, I discussed two projects that I am currently working on: a project on performance verification of microservice architectures together with Thales, and a project about performance engineering and service continuity in the compute continuum together with Philips, TU/e, and other TRANSACT partners. I would like to thank Johan Eker and Luca Abeni for the invitation and all participants for their attention and questions.

Advancing Sustainability: Paper Accepted on Estimating Energy Consumption of Applications in the Computing Continuum

I am pleased to announce that the paper “Estimating the Energy Consumption of Applications in the Computing Continuum with iFogSim” was accepted at the International Workshop on Converged Computing (WOCC). The paper is first-authored by Saaedeh Baneshi and is the first publication to come out of the project Energy Labels for Digital Services. Congratulations, Saaedeh!

The paper explains how digital services – applications that often span the entire computing continuum – have become an essential part of our daily lives, but they can have a significant energy cost, raising sustainability concerns. Measuring the energy consumption of such applications is challenging due to the distributed nature of the system and the application. As such, simulation techniques are promising solutions to estimate energy consumption, and several simulators are available for modeling the cloud and fog computing environment. The paper investigates iFogSim’s effectiveness in analyzing the end-to-end energy consumption of applications in the computing continuum through two case studies. We design different scenarios for each case study to map application modules to devices along the continuum, including the Edge-Cloud collaboration architecture, and compare them with the two placement policies native to iFogSim: Cloud-only and Edge-ward policies. We observe iFogSim’s limitations in reporting energy consumption, and improve its ability to report energy consumption from an application’s perspective; this enables additional insight into an application’s energy consumption, thus enhancing the usability of iFogSim in evaluating the end-to-end energy consumption of digital services.
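As a rough illustration of what reporting energy from an application’s perspective can mean, the sketch below estimates per-device energy and attributes a share of it to one application. This is not the method from the paper or iFogSim’s implementation. It assumes a simple linear power model (idle power plus a utilization-proportional part) and assumes that a device’s energy is split in proportion to each application’s share of the utilization. The `Device` class and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    idle_power_w: float  # power drawn at 0% utilization (assumed)
    busy_power_w: float  # power drawn at 100% utilization (assumed)


def device_energy_j(device, utilization, duration_s):
    """Energy of one device under an assumed linear power model."""
    dynamic = (device.busy_power_w - device.idle_power_w) * utilization
    return (device.idle_power_w + dynamic) * duration_s


def application_energy_j(placement, duration_s):
    """Attribute device energy to one application.

    placement: list of (device, total_utilization, app_utilization).
    Assumption: the application is charged the fraction
    app_utilization / total_utilization of each device's energy.
    """
    total = 0.0
    for device, total_util, app_util in placement:
        if total_util <= 0:
            continue  # an idle device is not attributed to the application
        share = app_util / total_util
        total += device_energy_j(device, total_util, duration_s) * share
    return total


if __name__ == "__main__":
    edge = Device("edge-gateway", idle_power_w=2.0, busy_power_w=5.0)
    cloud = Device("cloud-vm-host", idle_power_w=80.0, busy_power_w=100.0)
    placement = [(edge, 0.5, 0.25), (cloud, 0.5, 0.25)]
    print(application_energy_j(placement, duration_s=10.0), "J")
```

Other attribution rules are possible, for example charging idle power to the infrastructure rather than to applications; the choice changes the reported numbers considerably and should be stated alongside any result.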

Inaugural Lecture Explores Managing Complexity of High-Tech Systems

Today, I finally gave my inaugural lecture “Managing Complexity in High-tech Systems” to celebrate my appointment as Endowed Professor at the University of Amsterdam, which happened back in 2019.

The academic ceremony started at 16:00 with a small reception for fellow professors and members of the curatorium. Together, this group walked in a procession into the beautiful auditorium of the University of Amsterdam, where an audience of colleagues, family, and friends were waiting in anticipation. The lecture discussed the challenge of increasing complexity in the high-tech equipment industry and how new (model-based) development methodologies leveraging abstraction, boundedness, and composition are required to address it. I argued that the required innovation should come from collaboration in an innovation chain, where universities, applied research organizations, and industry work together in strategic partnerships. The presentation concluded with a number of concrete examples of what this collaboration could look like, drawn from my education and research at TNO and the University of Amsterdam. The inaugural lecture was followed by a reception full of networking and congratulations. I would like to thank everybody who showed up for the event, physically and online. Together, we created a memory that I will treasure for a lifetime.

If you did not manage to attend the lecture, or see it online, there is a recording available. Pop some popcorn, take a seat, and click the link below:

https://webcolleges.uva.nl/Mediasite/Play/99497b81432a49acb57f0ae7a32050d11d

Optimizing Efficiency and Performance: PhD Thesis Defense on Energy- and Time-aware Scheduling for High-Performance Embedded Systems

Yesterday, I participated in the PhD defense committee of Julius Röder, a PhD student in the Parallel Computing Systems group at the University of Amsterdam. The thesis “Energy- and Time-aware Scheduling for Heterogeneous High-Performance Embedded Systems” addresses the relevant problem of optimizing non-functional behavior, such as timing and energy consumption, of heterogeneous high-performance embedded systems. The goal of this optimization is to reduce energy consumption, thereby also reducing carbon footprint and extending battery life, as well as ensuring that real-time requirements of applications are satisfied, even at high resource utilizations. To this end, the thesis contributes a discussion on setups used for energy measurements, as well as experiments and a statistical analysis that demonstrate the importance of sampling frequency for the accuracy of such measurements. The bulk of the thesis proposes heuristic algorithms, both conventional and based on reinforcement learning, for mapping and scheduling applications modelled as directed acyclic graphs (DAGs) on heterogeneous platforms. The applications are assumed to be available in different versions, with different non-functional behavior, for the different types of processing elements in the heterogeneous architecture, which enables trade-offs between timing and energy. A key strength of the thesis is that theory is combined with a practical component; the scheduling algorithms are implemented and evaluated on a heterogeneous multi-core system, where timing and energy behavior are carefully measured and analyzed.
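The trade-off between timing and energy that multiple task versions enable can be illustrated with a small sketch. This is a generic greedy list scheduler, not one of the algorithms from the thesis. The task names, processing elements, execution times, and energy values are all made up. Each task in the DAG has one version per processing element, and the scheduler picks, task by task, either the version that finishes earliest or the one that uses the least energy.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def schedule(deps, versions, objective="time"):
    """Greedy list scheduling of a DAG with per-task versions.

    deps:     task -> set of predecessor tasks
    versions: task -> list of (processing_element, exec_time, energy)
    objective: "time" picks the earliest-finishing version,
               "energy" picks the lowest-energy version.
    Returns (makespan, total_energy, chosen processing element per task).
    """
    pe_free = {}  # processing element -> time at which it becomes free
    finish = {}   # task -> finish time
    chosen = {}   # task -> processing element
    total_energy = 0.0
    for task in TopologicalSorter(deps).static_order():
        # A task may start only after all of its predecessors have finished.
        ready = max((finish[p] for p in deps.get(task, ())), default=0.0)
        candidates = []
        for pe, exec_time, energy in versions[task]:
            end = max(ready, pe_free.get(pe, 0.0)) + exec_time
            key = (end, energy) if objective == "time" else (energy, end)
            candidates.append((key, pe, end, energy))
        _, pe, end, energy = min(candidates)
        pe_free[pe] = end
        finish[task] = end
        chosen[task] = pe
        total_energy += energy
    return max(finish.values()), total_energy, chosen


if __name__ == "__main__":
    deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    versions = {
        "a": [("cpu", 2, 4), ("gpu", 1, 6)],
        "b": [("cpu", 3, 3), ("gpu", 1, 5)],
        "c": [("cpu", 3, 3), ("gpu", 1, 5)],
        "d": [("cpu", 2, 2), ("gpu", 1, 4)],
    }
    print(schedule(deps, versions, objective="time"))
    print(schedule(deps, versions, objective="energy"))
```

With these made-up numbers, the time-oriented run yields a makespan of 4 at an energy of 20, while the energy-oriented run yields a makespan of 10 at an energy of 12. A real scheduler would additionally have to respect deadlines and search the space between these two extremes.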

In the presence of family, friends, and colleagues, Julius confidently defended his PhD thesis and earned the right to call himself a doctor. Congratulations, Julius, on this great achievement!

Advancing Design Space Exploration: Literature Review Explores Network Delay Models for Distributed Cyber-Physical Systems

Another literature review has been completed in the context of the DSE2.0 research project. William Ford completed his review entitled “Network Delay Model Creation and Validation for Design Space Exploration of Distributed Cyber-Physical Systems”.

Design-space exploration (DSE) in early phases of design of a distributed cyber-physical system (dCPS) requires models. In the DSE2.0 project, we are particularly interested in models that capture the timing behavior of hardware and software, allowing temporal system performance to be evaluated for different design points. One important part of the system to model is the network that connects the subsystems of the CPS. This study reviews previous work in the fields of analytical network modeling, network simulation, and network model validation. In addition, a recommended plan is presented to create and validate such a network model for the DSE2.0 project, based on this previous work. Two main directions are recommended at different levels of abstraction. For the lower level of abstraction, we will make a model using the existing INET framework that models each network element explicitly. At a higher level of abstraction, we will use a latency-rate server to capture the behavior of the network using only two parameters, latency and rate.
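To show how little a latency-rate abstraction needs, the sketch below computes upper bounds on message completion times from just the two parameters. It uses one common formulation of the latency-rate server bound, in which a message finishes no later than max(arrival + latency, previous finish) + size / rate. It is not the model built in the project; the function name, units, and example values are assumptions for illustration.

```python
def lr_finish_bounds(arrivals, sizes, latency, rate):
    """Upper bounds on message completion times for a latency-rate server.

    arrivals: arrival time of each message, in non-decreasing order
    sizes:    size of each message (e.g. bytes)
    latency:  service latency of the server (same time unit as arrivals)
    rate:     guaranteed rate (size units per time unit)
    """
    bounds = []
    previous = float("-inf")
    for arrival, size in zip(arrivals, sizes):
        # Service starts at most `latency` after arrival, or right after
        # the previous message when the server is still busy.
        start = max(arrival + latency, previous)
        previous = start + size / rate
        bounds.append(previous)
    return bounds


if __name__ == "__main__":
    # Assumed units: microseconds and bytes per microsecond.
    print(lr_finish_bounds(arrivals=[0, 1, 10], sizes=[8, 8, 4],
                           latency=2, rate=4))
    # prints [4.0, 6.0, 13.0]
```

The second message arrives while the first is still being served, so its bound is driven by the rate term alone; the third arrives at an idle server and pays the full latency again.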

Having delivered his literature review, William has started his master project to pursue this research along these directions. The team looks forward to working with him.

Driving Innovation and Collaboration: Dutch Real-time Days Event Sparks Ideas for Future Research and Industry Relevance

I recently co-organized a Dutch Real-time Days event together with real-time systems researchers from TU/e and UT. The event was funded through a 4TU.NIRICT Call Community Funding and its goals were to:

1) share and develop new ideas for real-time systems research,
2) stimulate new collaborations, and
3) provide room for networking.

In addition to the four organizers from the Netherlands, Mitra Nasri (TU/e), Geoffrey Nelissen (TU/e), Kuan-Hsun Chen (UT), and myself, four well-established European researchers in the area of real-time systems were invited to the event. Everybody was invited to pitch their current work, ideas for future directions, and appropriate mechanisms to support collaborations. This was followed by brainstorming sessions where these ideas were creatively improved, as well as a working session where some of the ideas were discussed in more detail and made actionable. At the end of the first day, there was a lovely dinner at Restaurant Giornale in Eindhoven, providing further room for discussions and networking.

The outcome of the two days was a mix of technical ideas that can be pursued in future research papers or project proposals, and actions to shape the direction of the academic real-time systems community and further increase its industrial relevance. For example, we agreed to propose that the Technical Community on Real-time Systems (TCRTS) add an award for industry impact/technology transfer, and to propose a special issue on industry challenges/visions in the Journal of Real-time Systems.

TNO-ESI Cloud Continuum Workshop Connects Researchers and Promotes Collaboration in the Netherlands

The TNO-ESI Cloud Continuum workshop, an informal hybrid event that attracted just over twenty participants, took place at ESI on February 21. The goals of this workshop were to: 1) connect applied and academic researchers in the area of cloud continuum in the Netherlands, 2) disseminate research results from ongoing research projects, and 3) identify possibilities for collaboration. Benny Akesson, the organizer of the event, opened the workshop by presenting some drivers for cloud adoption/integration in the high-tech industry, as well as the work done by ESI in the ArchViews and TRANSACT projects related to performance observability. This was followed by four invited speakers from Eindhoven University of Technology and Vrije Universiteit Amsterdam. The topics of the presentations ranged from reference architectures for the cloud continuum, through root-cause analysis in the continuum and modelling and calibration of cyber-physical systems deployed in the continuum, to performance variability of cloud/edge systems. All in all, it was a nice and successful event that showcased parts of the body of work currently going on in this exciting area. Thank you, Matthijs Jansen, Jeroen Voeten, Mahtab Modaber, and Panagiotis Giannakopoulos, for your presentations.

Ensuring Safety, Performance, and Security in Cloud-Enabled CPS: Accepted Paper Presents Thirteen Concepts at IEEE SysCon 2023

Our paper entitled “Thirteen Concepts to Play it Safe with the Cloud” has been accepted at the IEEE International Systems Conference (SysCon), which will take place in Vancouver, Canada on April 17-20, 2023. The paper discusses how edge and cloud technologies have the potential to enhance safety-critical CPS, also in regulated environments. This is only possible when safety, performance, cyber security, and privacy of data are kept at the same level as in on-device only safety-critical CPS. To this end, this paper presents thirteen selected safety and performance concepts for distributed device-edge-cloud CPS solutions. This early result of the TRANSACT project aims to ensure the needed end-to-end performance and safety levels from an end-user perspective, in order to extend the edge and cloud benefits of more rapid innovation and inclusion of value-added services to safety-critical CPS as well.

Literature Review on Scalable System-level Simulation

Herman Kelder has joined the DSE2.0 research project as a master student. DSE2.0 is a project that aims to propose a methodology for design-space exploration of complex distributed cyber-physical systems, like lithography machines manufactured by ASML. One of the great challenges is to improve the scalability to handle the complexity of such systems, a challenge that needs to be addressed both in terms of how the system (performance) is modelled and evaluated (simulated) for a particular design point, as well as how the design points to evaluate are chosen. Herman’s thesis will focus on how to improve the scalability of system-level simulation to allow more design points to be evaluated faster.

One of Herman’s first assignments was to put together a literature review on this topic. The literature review, entitled “Exploring Scalability in System-Level Simulation Environments for Distributed Cyber-Physical Systems”, investigates state-of-the-art scalability techniques for system-level simulation environments, namely Simulation Campaigns, Parallel Discrete Event Simulations (PDES), and Hardware Accelerators. The goal is to address the challenge of scalable Design Space Exploration (DSE) for dCPS, discussing the characteristics, applications, advantages, and limitations of these approaches. The conclusion recommends starting with simulation campaigns, as those provide increased throughput, adapt to the number of tasks and resources, and are already implemented by many state-of-the-art simulators. Nevertheless, further research has to be conducted to define, implement, and test a sophisticated general workflow addressing the diverse sub-challenges of scaling system-level simulation environments for the exploration of industrial-size distributed Cyber-Physical Systems.

We look forward to working with Herman and seeing how his research develops along these directions.