New course aims to train the next generation of data scientists

Computer science professor Mitsunori Ogihara and graduate student Jerry Bonnell designed and taught a pilot course in data science last fall that aims to address a growing lack of expertise in the field.

The volume of data collected today is growing astronomically because our computers, phones and smart devices track our every move, purchase and desire.

Yet the number of people who can sift through this data to find useful information remains a meager percentage of the workforce.

The need for more data analysts and data scientists simply outstrips the supply. Thus, colleges and universities must help bridge the growing knowledge gap.

That’s the crux of a forthcoming article in the Institute of Electrical and Electronics Engineers Computer Journal, written by University of Miami computer science professors Yelena Yesha and Mitsunori Ogihara and graduate student Jerry Bonnell. It was also the impetus for the Master of Science in Data Science program, now offered by the Graduate School. And it is why Ogihara and Bonnell designed and piloted the “Data Science for the World” undergraduate course in the Fall 2021 semester for students interested in the field. It will also be offered next fall.

“Many disciplines today, including the sciences, medicine, social sciences and even the humanities, use data for discoveries or the exploration of ideas,” Ogihara said. “Thus, a student in any reasonable undergraduate program today should have some exposure to data science.”

Nick Tsinoremas, founding director of the Institute for Data Science and Computing, and the University’s Vice Provost for Computing and Data Research, agreed.

“We want all students at the University to have more data science training and be more data conscious because it’s our future,” he said. “Making decisions in general today requires being data-aware. So this course is part of our effort as a university to expose our undergraduate students to data science.”

This comes at a time when many colleges and universities are trying to educate students in the language of data. However, unlike other Data Science 101 courses, Ogihara and Bonnell have tailored theirs so that students with little or no knowledge of statistics or computer programming can still benefit.

“We tried to make it accessible, so we don’t assume students have a background in math, programming, or statistics,” Ogihara said.

The two even wrote an online textbook for the course, which opens with a list of real-world examples of data science in practice. Among them: monitoring patient data can help doctors diagnose diseases more accurately, and tracking social media posts can help data scientists explain a shift in public opinion. The resource is being edited for publication and is unique because it uses R, the programming language preferred by many statisticians.

Ogihara and Bonnell chose R because it is well suited to statistical analysis. By incorporating tidyverse, an increasingly popular collection of tools for R, they made it easier for students to learn how to process, wrangle, transform and model data.

“From start to finish, students were touching real data with their assignments,” Bonnell said. “So they could always see the big picture and know they were doing something important.”

For example, the 20 students in the class investigated the 2015 accusation that the New England Patriots deflated footballs during the AFC Championship game to make them easier for quarterback Tom Brady to throw in the cold. They tested whether the average drop in ball pressure could be explained by chance and concluded that it was plausible the observed pressure drops were due to something other than chance. It was one of freshman Eddie Hanlon’s favorite assignments. But he also enjoyed testing whether murder rates are affected by a state’s policy on the death penalty, another task the students completed in the class.
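One common way to test whether a difference in averages could be due to chance is a permutation test. The sketch below is illustrative only: it is written in Python rather than the R used in the course, the pressure-drop numbers are made up for the example, and the function name `permutation_test` is hypothetical. It is not the course's actual assignment or data.

```python
import random


def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often random relabeling produces a difference in
    means at least as large as the one actually observed."""
    rng = random.Random(seed)
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0

    for _ in range(n_shuffles):
        # Shuffle the pooled values, then split them back into two groups
        rng.shuffle(pooled)
        shuffled_a = pooled[:n_a]
        shuffled_b = pooled[n_a:]
        diff = sum(shuffled_a) / len(shuffled_a) - sum(shuffled_b) / len(shuffled_b)
        if diff >= observed:
            at_least_as_large += 1

    # The fraction of shuffles that match or beat the observed difference
    return observed, at_least_as_large / n_shuffles


# Made-up pressure drops in psi (NOT real measurements)
team_a_drops = [1.2, 1.4, 1.1, 1.5, 1.3, 1.6, 1.2, 1.4]
team_b_drops = [0.4, 0.6, 0.5, 0.3]

observed, p_value = permutation_test(team_a_drops, team_b_drops)
print(f"Observed difference in average drop: {observed:.3f} psi")
print(f"Share of shuffles at least as extreme: {p_value:.4f}")
```

A small share of shuffles matching the observed difference suggests that chance alone is an unlikely explanation, which is the kind of conclusion the students reached.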

“We were able to conclude that when the death penalty is applied, the murder rate goes down,” said Hanlon, a finance student with a computer science minor.

Hanlon said he has always been interested in numbers, but the course helped him learn computer programming, which can deepen his analysis. It also taught him some new statistical strategies.

“I’ve never done any programming and had no experience with R software,” he said. “But by the end of the semester, I felt pretty proficient in R.”

Hanlon was so motivated by the course that he spent part of his winter break learning Python, another programming language widely used by data scientists.

“Initially, I was not interested in data science. I just didn’t know enough about the field,” he said. “But I am now. Data science is extremely applicable in many different fields, so I really see it as a career opportunity.”

Caroline Hall, a senior and math major, took the course to improve her R skills for future job opportunities. She had already learned the Java programming language. But the course helped Hall feel so comfortable with R that she has since learned two other tools, SQL and Tableau, which help transform and visualize datasets.

“I now feel confident that I can transform datasets, which means organizing the data and extracting the most useful insights from it,” said Hall, who also has minors in computer science and psychology.

The class also piqued her interest in a career in data science.

“I want to start as a data analyst. But I know they work with data scientists, so I might want to move on to that,” she said.
