Xiaoyu Chu

Ph.D. student, Vrije Universiteit Amsterdam

Research Focus
System Modeling and Analysis, Operational Data Analytics, Datacenter Ontology, LLMxHPC

Publications
An Empirical Characterization of Outages and Incidents in Public Services for Large Language Models
16th ACM/SPEC International Conference on Performance Engineering (ICPE'25)
Generic and ML Workloads in an HPC Datacenter: Node Energy, Job Failures, and Node-Job Analysis
The 30th International Conference on Parallel and Distributed Systems (ICPADS 2024)
Enabling Operational Data Analytics for Datacenters through Ontologies, Monitoring, and Simulation-based Prediction
The Second Workshop on Serverless, Extreme-Scale, and Sustainable Graph Processing Systems (GraphSys 2024)
How Do ML Jobs Fail in Datacenters? Analysis of a Long-Term Dataset from an HPC Cluster
The 6th Workshop on Hot Topics in Cloud Computing Performance (HotCloudPerf 2023)

Supervised Master Theses
Operational Analysis of OpenAI Services Using Self-Reported Outages and Incidents
'ODAbler': Design and Evaluation of an Operational Data Analytics Framework for Energy-efficient management of Workloads in a Data Centre Simulator OpenDC
Characterization and Modelling of Resource Usage and Energy Consumption in HPC Datacenters by Machine Learning
Data Characterization and Anomaly Detection for HPC Datacenters Using Machine Learning

Supervised Bachelor Theses
Enhancing Operational Data Synthesis and Predictive Analysis in HPC Clusters Using Large Language Models

Talks
Generic and ML Workloads in an HPC Datacenter: Node Energy, Job Failures, and Node-Job Analysis
ICPADS'24, Belgrade, Serbia, October 12, 2024
How Do ML Jobs Fail in Datacenters? Analysis of a Long-Term Dataset from an HPC Cluster
HotCloudPerf'23, Coimbra, Portugal, April 16, 2023