Data Science and Systems

Program Requirements

All EECS MEng students should expect to complete four (4) technical courses within the EECS department at the graduate level, the Fung Institute's engineering leadership curriculum, and a capstone project hosted by the EECS department. You must select a project from the list below.

Capstone Design Experience

Data Science in the World (advisor Prof. David Culler)

Description - The Data Science in the World capstone experience combines an introductory course (Data 200A: Principles and Techniques of Data Science) with an advanced Capstone Project class in the Spring term. The Capstone class will focus on research explorations into topics centered on current state and national big-data efforts, such as the National Transportation Data Challenge, which seeks to improve transportation safety; the California Water Data Challenge, which leverages open-source technology and available water information to support decisions around water reliability and resource sustainability; the American Energy Data Challenge, which leverages Green Button data for improved energy systems; the U.S. Obesity Data Challenge; and the Smart Cities Innovation Challenge. Students will work in small teams focused on a topic in a major data challenge, survey the data resources and state of the art, formulate a specific issue they seek to address, engage with researchers in the field, and conduct an independent investigation on the topic resulting in a portfolio report.

2018-19 Capstone Projects

Our department believes that students in the Master of Engineering in Electrical Engineering and Computer Science (EECS) will have a significantly better capstone experience if their projects are followed closely by an EECS professor throughout the academic year. To ensure this, we have asked the faculty in each area in which the Master of Engineering is offered to formulate one or more project ideas for incoming students to choose from.

Project 1

Title - Intelligent Collaborative Radio Networks (advisors Profs. Anant Sahai and John Wawrzynek)

Description - The next generation of radio systems will be agile, intelligent, self-configuring, and collaborative. This project is run in conjunction with the DARPA Spectrum Challenge, and we will have multiple teams working on different aspects of a new software-defined radio system featuring collaborative intelligence. Team members from different backgrounds are welcome, ranging from FPGA-targeted digital design to networking, signal processing, human-computer interaction, machine learning, and game theory.

Project 2

Title - Impact of App Usage on Large-Scale Mobility (advisor Prof. Alex Bayen)

Description - The goal of this project is to use large-scale traffic data (GPS from smartphones, CDRs from cellular access providers, video data, loop data, and other sources), at scale, to infer mobility patterns in large cities. With classical machine learning and optimization techniques, the team will analyze the impact of app usage on congestion and how better routing decisions can be made to reduce it. The analysis will rely on game-theoretic analysis of selfish behavior and convex optimization. The team will have the opportunity to work with two frameworks: Flow, a platform unifying rllab and SUMO on AWS, and Connected Corridors, which enables one to run the Aimsun microsimulator on AWS as well.
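As a flavor of the game-theoretic analysis of selfish behavior mentioned above, consider Pigou's classic two-road example (a textbook illustration, not part of the project's actual codebase): selfish drivers all pile onto the congestion-dependent road, yielding a total cost 4/3 times the social optimum.

```python
# Pigou's example: one unit of traffic travels from A to B over two roads.
# Road 1 cost per driver: x (grows with congestion); Road 2 cost: 1 (fixed).
# Illustrative toy only -- not the project's actual traffic model.

def total_cost(x):
    """Total travel cost when fraction x takes road 1 and 1 - x takes road 2."""
    return x * x + (1 - x) * 1

# Selfish (Nash) equilibrium: every driver takes road 1, since its cost x <= 1.
nash_cost = total_cost(1.0)             # 1.0

# Social optimum: minimize x^2 + (1 - x); derivative 2x - 1 = 0 gives x = 0.5.
opt_cost = total_cost(0.5)              # 0.75

price_of_anarchy = nash_cost / opt_cost  # 4/3, the classic bound for linear costs
```

The gap between the two outcomes is exactly the kind of inefficiency that better routing decisions aim to close.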

Project 3

Title - AI and Blockchain (advisor Prof. Dawn Song)

Description - Blockchains and smart contracts have created exciting new opportunities for building highly resilient applications with integrity, liveness, and correctness guarantees. The goal of this project is to apply secure computation and blockchains to train machine learning models on large-scale data sets while preserving data privacy. By running training and prediction on a blockchain, we can develop private smart contracts that continually learn over time. The project will entail writing private smart contracts that receive one or more datasets and train models in a secure setting. Through the course of this project, we will be using blockchains (e.g. Ethereum), trusted hardware (e.g. Intel SGX), and cryptographic protocols (e.g. secure multi-party computation). We will also explore other directions and questions at the intersection of AI and blockchain, such as using machine learning techniques to improve blockchain technologies.

Project 4

Title - Understanding the Spread of Misinformation and Fake News (advisor Prof. Gireeja Ranade)

Description - While misinformation and rumor-spreading are age-old problems, technology and social media have made it easy for malicious agents to spread propaganda with minimal overhead. Such spread of misinformation can be detrimental to society, and its negative effects have been seen worldwide. This project will explore the spread of misinformation and fact-checking behavior online through measurement and modeling. Students from different backgrounds could contribute to this project, e.g. web-data and social media analysis as well as more theoretical interests in game theory and networks.

Project 5

Title - Design of Neural Nets for Embedded, Mobile, and IoT Applications (advisor Prof. Kurt Keutzer)

Description - Students will choose a target application in the embedded (e.g. autonomous vehicles), IoT (e.g. surveillance), or mobile (e.g. iPhone applications) domain and develop a deep neural net tailored to that application.

Project 6

Title - Second Order Optimization for Neural Network Learning (advisor Prof. Michael Mahoney)

Description - The vast majority of work on optimizing machine learning and deep learning models has focused on so-called first-order methods, which use only gradient information. This is despite the fact that employing curvature information, e.g., in the form of the Hessian, can yield methods with desirable convergence properties for non-convex problems, e.g., escaping saddle points and converging to local minima. The conventional wisdom in machine learning is that second-order methods, which employ Hessian as well as gradient information, can be highly inefficient. This project aims to implement and apply recent theoretical advances in second-order algorithms to state-of-the-art neural network problems. The goals are to demonstrate that these methods are competitive with state-of-the-art first-order methods and that they can overcome shortcomings of first-order methods, e.g., high sensitivity to hyper-parameters such as step size and undesirable behavior near saddle points.
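The step-size sensitivity mentioned above can be seen on even a tiny ill-conditioned quadratic (a minimal sketch, not one of the project's actual algorithms): gradient descent must use a step size small enough for the steepest direction and therefore crawls along the flat one, while a Newton step, which rescales the gradient by the inverse Hessian, reaches the minimum immediately.

```python
import numpy as np

# Ill-conditioned quadratic f(x) = 0.5 x^T A x with minimum at the origin.
# The Hessian A has curvatures 1 and 100, a condition number of 100.
A = np.diag([1.0, 100.0])
x0 = np.array([1.0, 1.0])

grad = lambda x: A @ x  # gradient of the quadratic

# First-order method: the step size must stay below 2/100 to remain stable,
# so progress along the low-curvature direction is slow.
x = x0.copy()
for _ in range(100):
    x = x - 0.01 * grad(x)
gd_dist = np.linalg.norm(x)  # still far from the optimum after 100 steps

# Second-order (Newton) step: x0 - A^{-1} grad(x0) lands exactly at the
# minimum of a quadratic in one step, regardless of conditioning.
x_newton = x0 - np.linalg.solve(A, grad(x0))
newton_dist = np.linalg.norm(x_newton)
```

Real neural-network losses are of course not quadratic, which is exactly why making Hessian-based steps efficient and robust there is a research problem.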

Project 7

Title - Artificial Intelligence for Data Science (advisor Prof. Dawn Song)

Description - More and more data is being collected in all areas, ranging from business activities to smart homes, smart buildings, and smart cities, with the promise of improving decision making and efficiency. However, data analytics today is still a labor-intensive process, requiring significant manual effort at almost every stage of the data science pipeline. As a result, huge volumes of collected data go unutilized due to the lack of analyst resources. Can we make the data science pipeline more automated and reduce the mundane manual labor needed? Can we help analysts be more productive and automatically extract insights from data?

In this project you will help explore new approaches for automated data exploration, model building, and insight extraction, while leveraging limited guidance and feedback from human analysts. The project will employ various techniques including deep learning, reinforcement learning, program synthesis, meta-learning, probabilistic programming, and interpretable machine learning. We aim to build a real-world system to be used by end users, so you will also gain experience building a real working system.

Project 8

Title - Scalable Deep Learning and Reinforcement Learning (advisor Prof. John Canny)

Description - This project is about scaling deep learning and reinforcement learning on clusters of computers. Current-generation distributed learning systems use shared parameter stores, which are often a bottleneck when optimizing complex models. They also have poor error tolerance and a limited repertoire of distributed updates. Next-generation systems use a "shared-nothing" design, which optimizes throughput, provides cheap error tolerance, and supports a much more general set of distributed updates. This design opens the door to richer and more efficient distributed optimization strategies, including Monte Carlo and tree search. This project is designed for a team of four students to extend our current work on shared-nothing distributed ML, focusing on error tolerance and distributed Monte Carlo. We also hope to establish new performance benchmarks for a number of deep learning and reinforcement learning problems. The project includes regular collaboration with one or more industry partners.
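To illustrate the shared-nothing idea in miniature (a single-process simulation under assumed toy data, not the group's actual system), each "worker" below holds its own data shard and model replica; instead of pushing gradients to a central parameter store, workers average gradients peer-to-peer, as an allreduce would, so every replica stays in sync.

```python
import numpy as np

# Toy shared-nothing data-parallel SGD for linear regression.
# Hypothetical setup: 4 workers, each with a private shard of (X, y) data.
rng = np.random.default_rng(0)

true_w = np.array([2.0, -3.0])  # ground-truth weights the workers should recover
n_workers, lr = 4, 0.1

# Partition the data: each worker gets its own shard.
shards = []
for _ in range(n_workers):
    X = rng.normal(size=(50, 2))
    y = X @ true_w
    shards.append((X, y))

w = np.zeros(2)  # model replica; identical on every worker by construction
for _ in range(200):
    # Each worker computes a gradient on its local shard only.
    local_grads = [2 * X.T @ (X @ w - y) / len(y) for X, y in shards]
    # Allreduce step: everyone ends up with the same averaged gradient,
    # so replicas evolve identically with no shared parameter store.
    avg_grad = np.mean(local_grads, axis=0)
    w = w - lr * avg_grad
```

Because no central store is involved, losing a worker costs only its shard's contribution to the average, which is the cheap error tolerance the description refers to.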

Project 9

Title - Visualizing Machine Learning (advisor Prof. John Canny)

Description - Machine Learning (ML) is the method of first resort for many challenges in computing and data science. But there is often a gap between users' conceptual models of ML system behavior and reality. This is particularly acute for deep neural models (DL), whose structure often has no relation to actual or perceived structure in the problem domain. This project is for a team of four students to extend our work on *interactive machine learning tools* and their application in ML and DL. Interactive modeling allows users to manipulate models in real time, in particular to see the effects of hyperparameter choices on models as they are being trained. This exploration helps users gain an understanding of the optimization process and allows them to better align model performance with their needs. It also supports "visual explanations," where, under appropriate conditions, output behaviors can be attributed to patterns of activity in the network. So far, visual explanations have been applied to convolutional networks working on images. In the future we would like to extend the approach to more general networks.

Technical Courses

At least three of your four technical courses should be chosen from the list below. Any remaining technical course should be chosen from your own or another MEng area of concentration within the EECS Department.

Fall 2018

Spring 2019

  • CS C200A, Principles & Techniques of Data Science
  • CS 260A, User Interface Design and Development 
  • CS 267, Parallel Computing
  • CS 282A, Designing, Visualizing and Understanding Deep Neural Networks
  • CS 289A, Introduction to Machine Learning
  • CS 294-144, Blockchain, Cryptoeconomics, and the Future of Technology
  • EECS 227AT, Optimization Models in Engineering
  • EE C227C, Convex Optimization and Approximation

    Note: The courses listed here are not guaranteed to be offered, and the course schedule may change without notice. Refer to the UC Berkeley Course Schedule for further enrollment information.