DigiPen’s BS in Computer Science in Machine Learning program may be new, but our experience in the subject is anything but. Machine learning has been a central component of many DigiPen Research & Development projects for quite some time, and as faculty have built the framework for the new degree program, real-world lessons from those past projects have found their way into the curriculum.
One such lesson stemmed from a research-based project developed for Andretti Autosport, which uses machine learning to predict, in real time, which strategies will produce the best results during IndyCar races. “The strategy in IndyCar is knowing when to go to the pit, which tires to use — hard or soft — and how much fuel to put in the car,” says DigiPen R&D lead software engineer and computer science professor Antoine Abi Chacra. Team managers could enter strategies into the DigiPen-developed application and quickly see how those strategies would play out for their driver, based on real-time data such as the car’s current performance and position, the driver’s performance and behavior in both past and present races, and a host of other factors.
Abi Chacra says one of the most important lessons the R&D team learned from the project is that data is never clean when you receive it. Before machine learning algorithms can interpret data and make predictions from it, the data has to be scrubbed and its anomalies weeded out. “We had to do something like that with Andretti,” Abi Chacra says. “You’d get lap times that were way off. Sometimes they would go to the pit and the flag that notifies everyone that this car is in the pit wouldn’t go up for some reason, so now we have one lap at 41 seconds, another at 40 seconds, and then one lap that took 20 minutes. Your machine learning algorithm should be able to automatically analyze the data using a technique called clustering to detect and remove anomalies like that.”
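To make the clustering idea concrete, here is a minimal sketch, not the R&D team’s actual pipeline. It assumes lap times arrive as a plain list of seconds and uses one of the simplest forms of clustering, single-linkage grouping in one dimension: sorted lap times that sit within a gap threshold of each other join the same cluster, and anything outside the largest cluster is treated as an anomaly. The function names and the `max_gap` value of 5 seconds are hypothetical choices for illustration; real data would also need legitimately slow laps (such as genuine pit laps) handled separately.

```python
def cluster_1d(values, max_gap):
    """Single-linkage clustering in one dimension (hypothetical helper):
    after sorting, a jump larger than max_gap starts a new cluster."""
    if not values:
        return []
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)  # close to the previous value: same cluster
        else:
            clusters.append([v])    # big jump: start a new cluster
    return clusters


def remove_lap_anomalies(lap_times, max_gap=5.0):
    """Keep only laps inside the largest cluster, preserving original order.
    max_gap (seconds) is an assumed threshold, not a tuned value."""
    if not lap_times:
        return []
    main = max(cluster_1d(lap_times, max_gap), key=len)
    low, high = main[0], main[-1]
    return [t for t in lap_times if low <= t <= high]


# A 20-minute "lap" caused by a missing pit flag falls outside the main cluster.
print(remove_lap_anomalies([41.2, 40.8, 1200.0, 40.9, 41.5]))
```

Here the 1,200-second lap lands in its own cluster and is dropped, leaving `[41.2, 40.8, 40.9, 41.5]`. Production-grade approaches such as DBSCAN generalize the same idea to many features at once.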
Abi Chacra is bringing the real-world IndyCar example directly to students, giving them an opportunity to work with the exact same data the R&D team used. “For assignment one, I clean all the data for them, give them the lap times of every driver for 200 laps, and tell them to use the first 50 percent of the data to try and predict the last 50 percent,” Abi Chacra says. “But as the final project, I give them all the data, which has all the garbage in it, and now they have to use multiple machine learning algorithms to clean the data and solve the problem, just like you would in the real world.”
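The shape of that first assignment, training on the first half of a race and predicting the second, can be sketched with a deliberately simple baseline. This is an illustrative assumption, not the course’s reference solution: it fits a straight-line trend (a stand-in for effects such as gradual tire wear) to lap time versus lap number over the first 50 percent of laps, extrapolates it over the remaining laps, and scores the result with mean absolute error. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # a single point yields a flat line
    return slope, mean_y - slope * mean_x


def predict_second_half(lap_times):
    """Fit a trend to the first half of the laps and extrapolate over the rest."""
    half = len(lap_times) // 2
    if half == 0:
        return []
    train_laps = list(range(1, half + 1))  # lap numbers are 1-based
    slope, intercept = fit_line(train_laps, lap_times[:half])
    return [slope * lap + intercept for lap in range(half + 1, len(lap_times) + 1)]


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual lap times."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Synthetic 200-lap race (not real IndyCar data): times creep up 0.01 s per lap.
laps = [40.0 + 0.01 * i for i in range(1, 201)]
predicted = predict_second_half(laps)
print(len(predicted), mean_absolute_error(predicted, laps[100:]))
```

A baseline like this is mainly useful as a yardstick: any more sophisticated model should beat its error on the held-out second half before it is worth the extra complexity.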
In another past project, the DigiPen R&D team was tasked with creating a machine learning application to evaluate security screening methods for large crowds and events, with the goal of balancing the strictness of threat evaluation against how quickly lines move. “If the checkpoint security is too tight and takes a long time, then you get long lines,” DigiPen Chief Technology Officer Samir Abou Samra says. “If it’s too loose, you might have shorter lines, but it’s not secure. The machine learning component of the project was part of finding the sweet spot between the two.”
According to Abou Samra, the major takeaway from the project had less to do with figuring out how to process the data than with communicating with the clients about what the data should even be.
“Our job is to decide, for situation x, what is the best mathematical function that will approximate that situation, translating an analogue situation into math,” Abou Samra says. “And that means you have an input and an output — it’s an equation. But when you’re dealing with security subject matter experts, they’re not computer scientists or mathematicians. Their job is to go, ‘Hmm. That guy. Flag him.’ But if you ask them why or how they determine that, they don’t tell you, ‘It’s because of this, this, and this factor, and then I compute the data and if it comes out to a score above a certain number then I flag them.’ No. They say, ‘It’s based on my experience. That person just seemed agitated.’”
While much of machine learning is about choosing and combining the right algorithms to make predictions, the very first step is often figuring out which data is relevant to feed into them at all.
“Usually, a computer scientist won’t come to you to solve a computational problem. It’s usually someone from a completely different area, and it’s a really good experience to learn how to bridge the gap and find that midpoint between yourself and the subject matter expert. This is something we’re teaching in the machine learning program,” Abou Samra says. “We try to understand how the subject matter expert or client thinks, so we’re then in a better position to ask questions and to lead them towards explaining what kind of logic we want the machine learning to take, and what those inputs are.”
The first cohort of the BS in Computer Science in Machine Learning degree program begins this fall.