Project 5

Explore the unknown!

Your goal with this project is to pick a data system that we won’t explore in class projects, and evaluate its usability and performance on a dataset and queries of your choosing. You will be doing this project in groups of two or three.

Examples of data systems you could use include graph databases (Neo4j), streaming systems (Flink, Kafka), key-value stores, Spark, Velox, Polars, Apache DataFusion, Ibis, NumPy, and DuckDB. Be creative here! If you need help picking a data system, or want other feedback, feel free to make a private post.

Your project will be due April 17 at 5pm. We expect you to write up a short report of up to 3 pages on:

  • the system you chose and why you chose it
  • how you set it up
  • what dataset you used and what the query workload was
  • screenshots of the system’s results for at least one of the queries
  • an analysis of the usability and performance of the system for the query workload
    • When do you think the system is useful? When would you use it, and when would you urge others to use it?
    • (bonus) when the system would be a better fit than other systems we’ve looked at in class, such as Postgres (or other relational databases), pandas, or MongoDB.
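
For the performance part of the analysis, one possible starting point is a small timing harness like the sketch below. It is only an illustration, not a required method: the names `time_query`, `summarize`, and `example_query` are hypothetical, and `example_query` is a stand-in that you would replace with a call into the system you chose. The sketch runs a query once as a warm-up, then times several repeated runs and reports the minimum, median, and maximum wall-clock time.

```python
import statistics
import time


def time_query(run_query, warmup=1, repeats=5):
    """Call `run_query` repeatedly and return wall-clock timings in seconds."""
    # Warm-up runs let caches and lazy initialization settle; they are not timed.
    for _ in range(warmup):
        run_query()

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to a few numbers suitable for a report table."""
    return {
        "runs": len(timings),
        "min_s": min(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


def example_query():
    """Hypothetical stand-in workload; replace with a query against your system."""
    return sum(x * x for x in range(100_000))


if __name__ == "__main__":
    print(summarize(time_query(example_query)))
```

Reporting the median alongside the minimum and maximum gives a rough sense of run-to-run variance, which is often worth mentioning when you compare systems.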