Computer Science Department, University of Crete
HY-590.45. Modern Topics in Scalable Storage Systems

Course Staff

Name | Email | Office Hours
Instructor: Kostas Magoutis | hy590-45@csd | By appt./E-306
Teaching Assistant: Efthymios Papageorgiou | hy590-45@csd | By appt.

General Information

The course meets on Tue 12-2pm and Fri 12-2pm in E.313 (3rd floor of the CSD building). In certain weeks, we will exceptionally meet on Wed 6-8pm in E.313 instead of Tue 12-2pm; you will be notified in advance on those occasions.

Announcements

24.3.2025 10:00: Your project proposals are due April 1, 2025

20.2.2025 10:00: Note NEW course meeting times: Tue 12-2pm, Fri 12-2pm (with certain exceptions, see under General Information)

14.2.2025 10:00: To join the HY-590.45 mailing list, send an e-mail to majordomo@csd with body subscribe hy590-45-list

10.2.2025 10:00: We will be using the AWS Academy cloud platform for course assignments. You may find our course page here

12.1.2025 10:00: The course will start on Thursday 13/2

1.1.2025 10:00: You are welcome to get in touch with the instructor to discuss course-related issues

Course Description

The explosive growth of information processing services in recent years has created an unprecedented need for storage capacity. Scalable access to storage resources requires a class of distributed systems designed for fast, reliable, and uninterrupted access to storage media (e.g., magnetic disks and tapes) over high-speed networks. This course offers an introduction to scalable storage systems and examines existing design techniques as well as current research problems in the design and implementation of such systems, along with possible solutions.

Some of the advantages of the scalable storage model over direct-attached storage include expandable capacity and performance, as well as improved utilization and sharing of distributed storage resources. The scalable storage systems architect, however, faces a number of challenges. First, the distributed nature of a scalable storage system makes it more complex than direct-attached storage: administration, capacity planning, configuration, backup, and disaster recovery are all complicated at large scale. Second, transferring data over the network requires stronger security and safety guarantees than transferring it over the system I/O bus, and it sometimes requires new, storage-specific network transport protocols. These and other challenges make scalable storage an exciting research area that has made significant advances in recent years.

The core part of the course focuses on the study of scalable storage systems with special emphasis on architectures, design principles for scalable performance, reliability, and availability, the management of data during their lifecycle, application-specific design concepts, ways to reduce implementation cost, storage system capacity planning, and storage outsourcing services.

This course is aimed at graduate students and advanced undergraduates and requires completing a research project. Research project topics will be chosen with the help and guidance of the course staff.

Coursework

Prerequisites

Grading

The final grade depends on class participation, presentation of two research papers, and a research project.

Readings

A number of paper readings are available online. You are expected to read the assigned papers before the beginning of each class.

There is no required textbook for this class. The following textbooks, however, are recommended readings:

Syllabus

Date | Topic | Readings, notes
Thu 13/2 | Course overview | -
Fri 14/2 | Background I | -
Thu 20/2 | Background II | -
Fri 21/2 | Background III | -
Wed 26/2 6pm | Background IV | -
Fri 28/2 | Class will be rescheduled | -
Wed 5/3 6pm | Background V | -
Fri 7/3 | Extending file systems over the network | csdp1178, Sandberg: Design and implementation of the Sun Network Filesystem
Wed 12/3 6pm | NFS (contd.) | Macklem: Not Quite NFS, Soft Cache Consistency for NFS
Fri 14/3 | Distributed coordination | Ongaro: In Search of an Understandable Consensus Algorithm
Tue 18/3 | Raft (contd.) | Visualization
Fri 21/3 | Distributed virtual disks | Petal: Distributed virtual disks
Wed 26/3 6pm | Petal (contd.) | -
Fri 28/3 | Presentations I | csdp1318, csdp1368, csdp1394
Tue 1/4 | Tutorial on AWS Academy Learner Lab (TA) | Project proposals due
Fri 4/4 | Presentations I | csdp1397, csdp1408, csdp1414
Wed 9/4 6pm | Presentations I | csd5880, csd5881, csdp1388
Fri 11/4 | Presentations I | csdp1178, csdp1411, csdp1418
Mon 14/4 - Fri 25/4 | Easter recess | -
Tue 29/4 | Distributed file systems I | Thekkath: Frangipani: A Scalable Distributed File System
Fri 2/5 | Distributed file systems II | Ghemawat: The Google File System
Tue 6/5 | Presentations II | csdp1318, csdp1368, csdp1394
Fri 9/5 | Presentations II | csdp1397, csdp1408, csdp1388
Tue 13/5 | Presentations II | csdp1414, csdp1418
Fri 16/5 | Presentations II | csd5880, csd5881
Tue 20/5 | Application-specific storage systems | Saito: Manageability, availability and performance in Porcupine

Projects HOWTO

Please note the following project guidelines:

Other Resources / Useful links