For New Students

We are always looking for new students to join our team.

Is this for You?

If you have ever wondered how to design, develop, and deploy an awesome global system, how to understand and analyze its properties, why your experiments show different results on each evaluation, why debugging distributed systems is so difficult, or what it would take to make your computer 100-1,000,000 times faster, then you are a perfect fit for our research group. You are invited to join.

Topics we work on

Datacenter simulation, Serverless computing, Accelerators (GPU/FPGA), Language virtual machines, Virtual reality, Multiplayer games. A full list of our projects can be found here.

How to Join?

Contact join@atlarge-research.com to join us!

Let's start our collaboration! We offer a diverse selection of bachelor projects, master's thesis projects, stand-alone research projects, and a variety of topics for literature surveys. These projects are part of our cutting-edge research on distributed (eco)systems, storage, and networking.

What Can We Do Together?

Our general philosophy is that we want to help everybody develop to their true potential, and not waste their talent.

Don't worry if you don't have all the skills at the start of the project. That's what we're here for. We have a strong focus on training students like you to become top-notch computer scientists. For example, our team of experts will help you onboard quickly, give you technical advice, help you formulate new concepts, and explore design alternatives with you.

You can make a mark in the scientific world, and at the same time help our research group get real-world impact through a strong scientific and technical contribution. The following publications are the direct result of AtLarge student projects from the last few years:

Thesis Advice

There are a couple of things to take into account when doing a thesis with us. All master and bachelor students, please thoroughly read vu_thesis_template_advice.pdf. There are three things that need to be present in each thesis and are often forgotten:

  1. Please include a declaration of original work in your thesis (https://vu.nl/en/student/examinations/academic-integrity): Put a statement like "I confirm that this thesis work is my own work, is not copied from any other source (person, Internet, or machine), and has not been submitted elsewhere for assessment." You can make a new subsection in the introduction titled "Plagiarism Declaration". If you copied content in the thesis from your own past work, state that clearly in this section.
  2. Make an artifact appendix using the template at https://github.com/ctuning/ck-artifact-evaluation/blob/master/wfe/artifact-evaluation/templates/ae.tex. Please include and fill in this appendix in your thesis. Put a direct link to the GitHub repo in the abstract. Please see examples of how to do this in our bachelor/master thesis highlights.
  3. Make a subsection in the introduction titled "Societal Relevance" and discuss how your work connects to the societal concerns discussed explicitly in the Dutch Computer Systems Manifesto: https://arxiv.org/abs/2206.03259.
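
As a rough illustration of where the three items above could go in a LaTeX thesis, here is a minimal skeleton. The document class, the ordering of the subsections, and the repository URL are placeholders and assumptions; they are not taken from the VU thesis template, which should take precedence.

```latex
\documentclass{article}   % assumption: substitute the class used by your thesis template
\usepackage{hyperref}

\begin{document}

\begin{abstract}
% Item 2: put a direct link to the repository in the abstract (placeholder URL).
Summary of the thesis goes here.
Source code and artifacts: \url{https://github.com/your-user/your-repo}
\end{abstract}

\section{Introduction}

\subsection{Societal Relevance}
% Item 3: connect the work to the concerns raised in the
% Dutch Computer Systems Manifesto (https://arxiv.org/abs/2206.03259).
Discussion of societal relevance goes here.

\subsection{Plagiarism Declaration}
% Item 1: declaration of original work; also mention any reuse of your own past work.
I confirm that this thesis work is my own work, is not copied from any other
source (person, Internet, or machine), and has not been submitted elsewhere
for assessment.

\appendix
\section{Artifact Appendix}
% Item 2: paste and fill in the body of the artifact appendix template (ae.tex) here.

\end{document}
```

The thesis highlights listed below remain the better reference for how completed theses handle these sections.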

Links

Highlights

A small selection of the work we have supervised.

Master Theses

Testing in Kubernetes: A Use Case Study of JetBrains CodeCanvas
Mewbie: Scale Adjustable Benchmark for Microservice Deployments
Understanding Datacenter Scheduler Programming Abstractions: Reference Architecture Design, Scheduler Analysis, and Cost Quantification
DPFS++: Cloudifying the DPU-Powered File System Virtualization Framework
TropoDB: Design, Implementation and Evaluation of an Optimised KV-Store for NVMe Zoned Namespace Devices
Radice: Data-driven Risk Analysis of Sustainable Cloud Infrastructure using Simulation

Bachelor Theses

ShareBench: Performance Characterization of Distributed Resource-Sharing Mechanisms
Labels, Cards, and Simulation-Based Analysis for Energy Efficiency and Sustainability in Data Centers
PorygonCraft: Improving and Measuring the Scalability of Modifiable Virtual Environments using Dynamic Consistency Units
Evaluating Performance Characteristics of the PMDK Persistent Memory Software Stack

Literature Surveys

Multivocal Survey of the Function Management Layer in the Open-Source Serverless Platforms
Survey of Graph Analysis Applications

Research Projects

Honours Projects

Supervised work

An overview of all supervised work.

Master Theses

Understanding Service Reliability of Large Language Models: An Empirical Characterization on Operator and User Reports
Characterizing The Energy Contribution and Energy-Performance Trade-offs of NVMe SSDs in the Linux Storage Stack
BenchFrame: A Framework for Benchmarking Power Monitoring Tools
Testing in Kubernetes: A Use Case Study of JetBrains CodeCanvas
Mewbie: Scale Adjustable Benchmark for Microservice Deployments
Mitigating Cold Start Latency in Serverless Computing through LLM-Driven Optimization
A Reproducible Energy Benchmarking Framework for Big Data Workloads
Investigating Performance Overhead of Distributed Tracing in Microservices and Serverless Systems
Operational Analysis of OpenAI Services Using Self-Reported Outages and Incidents
Performance Characterization Study of NVMe Storage Over TCP
Exploring the Performance of the io_uring Kernel I/O Interface
Erroneous Kubernetes Object Generation using Structure-aware Fuzzing
Enhancing Graph Processing Efficiency in Kubernetes: Towards Application-Aware Scheduling
End-to-End Power Model for the Compute Continuum
Controless: A serverless control plane for Kubernetes
Memory-Efficient WebAssembly Containers
Real-time Scaphandre Energy Metrics Pipeline Integrated with Escheduler
Exploring the Performance of Kubernetes-Deployed Containers
Edgeless: Design and Implementation of Serverless Computing at the Edge for Performance in Precision Agriculture
'ODAbler': Design and Evaluation of an Operational Data Analytics Framework for Energy-efficient management of Workloads in a Data Centre Simulator OpenDC
Characterization and Modelling of Resource Usage and Energy Consumption in HPC Datacenters by Machine Learning
Data Characterization and Anomaly Detection for HPC Datacenters Using Machine Learning
Understanding Datacenter Scheduler Programming Abstractions: Reference Architecture Design, Scheduler Analysis, and Cost Quantification
msF2FS: Design and Implementation of an NVMe ZNS SSD Optimized F2FS File System
DPFS++: Cloudifying the DPU-Powered File System Virtualization Framework
FrogFishDB - A Timeseries Database for in-order timeseries using the TimeTree datastructure on flash SSDs
PMicroProfile: A Micro-Architecture Aware Persistent Memory Profiling Framework
TropoDB: Design, Implementation and Evaluation of an Optimised KV-Store for NVMe Zoned Namespace Devices
Radice: Data-driven Risk Analysis of Sustainable Cloud Infrastructure using Simulation
Healthor: Heterogeneity-aware Flow Control in DLTs to Increase Performance and Decentralization
Modeling and Simulation of the Google TensorFlow Ecosystem
Stored: A Distributed Immutable Blob Store
The Design and Experimental Use of CReB, a Container Registry Benchmark
A Performance-Based Recommender System for Distributed DNN Training
Capelin: Fast Data-Driven Capacity Planning for Cloud Datacenters
The Design, Productization, and Evaluation of a Serverless Workflow-Management System
Experimental Performance Analysis of Graph Analytics Frameworks
POSUM: A Generic Portfolio Scheduler for MapReduce Workloads
Design and Experimental Evaluation of a System based on Dynamic Conits for Scaling Minecraft-like Environments
Workload Characterization and Modeling, and the Design and Evaluation of Cache Policies for Big Data Storage Workloads in the Cloud
ANANKE: a Q-Learning-Based Portfolio Scheduler for Complex Industrial Workflows
Design and Evaluation of a Portfolio Scheduler for Business-Critical Workloads Hosted in Cloud Datacenters

Bachelor Theses

DataViz: A Business Data Visualization System Using LLMs
Network Simulation for AI in the Digital Continuum
A performance analysis of TC for high speed, scalable data center networks
Kavier: Exploring Performance, Sustainability, and Efficiency of LLM Ecosystems under Inference through Cache-Aware Discrete-Event Simulation
Enhancing Operational Data Synthesis and Predictive Analysis in HPC Clusters Using Large Language Models
Exploring Redis Persistence Modes: Introducing AOFUring, an io_uring AOF extension
Optimizing Metadata Handling with vkFS: A Hybrid Key-Value Store File System leveraging RocksDB
Task-in-Pod Scheduling Support for Kubernetes and Apache Spark Stack
ShareBench: Performance Characterization of Distributed Resource-Sharing Mechanisms
Kubeless: A Novel Architecture for Kubernetes' Control Plane
Embedded Domain Specific Language: A Streamlined Approach for Framework Abstraction
Labels, Cards, and Simulation-Based Analysis for Energy Efficiency and Sustainability in Data Centers
Collection and Analysis of Operational Traces from SURFsara Datacenters
Design and Evaluation of a Cloud operated Storage System for Minecraft-like Games
LEGO, but with Servers: Creating the Building Blocks to Design and Simulate Datacenters
A Trace-Based Validation Study of OpenDC
OpenDC Serverless: Design, Implementation and Evaluation of a FaaS Platform Simulator
PorygonCraft: Improving and Measuring the Scalability of Modifiable Virtual Environments using Dynamic Consistency Units
Evaluating Performance Characteristics of the PMDK Persistent Memory Software Stack
A Systematic Design Space Exploration of Datacenter Schedulers
Nebu: A topology-aware deployment system for reliable virtualized multi-cluster environments

Literature Surveys

A Survey on Energy Aware Offloading Strategies in the Compute Continuum
A Systematic Review of GPU Modelling Approaches in Datacenter Simulators
Testing Container Orchestration Systems: A Literature Review
Configuration Management Systems
Energy Consumption and Optimization Strategies of Cloud-Based Big Data and Machine Learning Applications: Current Trends and Future Directions
A survey on flash storage disaggregation: performance and quality of service considerations
A Survey of Energy Measurement Methodologies for Computer Systems
Survey of Function Offloading and Serverless Functions in the Computing Continuum
Fairness, Isolation, Predictability and Performance Management in NVMe Devices: A Survey
Exploring the Performance-Isolation Trade-off for Isolation Mechanisms
Serverless Computing at the Edge in Precise Agriculture
A Survey of Scheduling Algorithms for the Edge
Literature Study: Timeseries Databases
Persistent Memory File Systems: A Survey
Key-Value Stores on Flash Storage Devices: A Survey
Multivocal Survey of the Function Management Layer in the Open-Source Serverless Platforms
Survey of Graph Analysis Applications
Procedural content generation for games: A survey. Mark Hendrikx, Sebastiaan Meijer, Joeri Van Der Velden, Alexandru Iosup (2013)

Research Projects

Energy Consumption of Heuristic Kubernetes Schedulers

Honours Projects

M3SA: Exploring the Performance and Climate Impact of Datacenters by Multi-Model Simulation and Analysis
You do not need fast NVMes for MVEs. PokeSto: A Storage Benchmark for Modifiable Virtual Environments