Quantitatively and Qualitatively, Data+ and Its Affiliated Programs Are Big Hits

Of all the things that make college students anxious, now you can add ghost cars to the list.

Not haunted, unoccupied moving vehicles, Flying Dutchman style. “Ghost cars” is a term Duke Parking & Transportation (DPT) uses to define cars that enter or leave parking lots when the gates are up, like during a football game or evening event. The gate sensors don’t record them both entering and exiting, which causes problems in keeping an accurate count of the cars using a lot.

A few summers ago, DPT asked a group of students participating in the Data+ program to help predict hourly occupancy in parking facilities so that permit holders could be proactively redirected if a lot was full. Team members analyzed data collected from gate sensors and were flummoxed by results showing negative numbers of cars in certain lots. They presented this finding to DPT, which responded, Yeah, sure. Ghost cars.

A Data+ team presents their findings to the Durham City Council.

Once that was cleared up, it was a simple matter of resetting the counters overnight to correct for the spectral Subarus and phantom Priuses. But at the end of the summer with the project winding down, DPT got anxious: the students’ work was coming along so well, DPT wanted to keep them longer so the students could work on other parking projects. This quandary was also resolved quickly when they hired the team for work study during the school year. Eventually, DPT realized an ongoing need for this type of work and created a data analytics position shared with the Duke Office of Information Technology — a need illuminated by the work of undergraduate students.
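
The counting problem can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual method: the function name, the event format, the sample timestamps and the choice to reset at the start of each calendar day are all assumptions made for the example.

```python
from datetime import datetime


def running_occupancy(events, reset_daily=True):
    """Return the running car count after each gate event.

    events: list of (timestamp, change) pairs sorted by time, where change
    is +1 for a recorded entry and -1 for a recorded exit.
    The event format and the daily reset rule are assumptions for this sketch.
    """
    counts = []
    count = 0
    current_day = None
    for timestamp, change in events:
        if reset_daily and timestamp.date() != current_day:
            count = 0  # assume the lot is empty at the start of each day
            current_day = timestamp.date()
        count += change
        counts.append(count)
    return counts


# Made-up sample data: one car enters while the gates are up (no entry
# is recorded), so its recorded exit pushes the count below zero.
events = [
    (datetime(2024, 9, 7, 8, 0), +1),   # recorded entry
    (datetime(2024, 9, 7, 22, 0), -1),  # recorded exit
    (datetime(2024, 9, 7, 22, 5), -1),  # ghost car: only the exit is recorded
    (datetime(2024, 9, 8, 8, 0), +1),   # recorded entry the next morning
]

print(running_occupancy(events, reset_daily=False))  # [1, 0, -1, 0]
print(running_occupancy(events, reset_daily=True))   # [1, 0, -1, 1]
```

Without a reset, the ghost car's missing entry carries into the next day and the lot appears empty when one car is parked. With the reset, the error stays confined to the day it occurred.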

Stories like this are plentiful in Data+ lore. When the 10-week summer research program for Duke undergraduates began in 2015, it was immediately apparent that students learned valuable real-world lessons, and that campus, governmental and industry partners benefited greatly from the students’ work. Soon, it was clear that Data+ was also beneficial for faculty, who could apply the program’s hands-on research methods in their own work and even their curriculum. And it was great for graduate students who directed projects, because they got team leadership experience that can be difficult to come by. It’s a rare quadruple win.

“You can go to many universities and they’ll teach you the same kind of tools,” says Robert Calderbank, the Charles S. Sydnor Distinguished Professor of Computer Science and founder and director of Duke’s big-data unit, the Rhodes Information Initiative. “Data+ gives you an opportunity to integrate those tools and learn new things in a just-in-time way, so that when you graduate from Duke, you’re actually ready to hit the ground running.”

Since I've never participated in research before, especially not research this independently oriented, the main thing I've gained from this experience is confidence. I have a much better understanding of my own capabilities, and I honestly feel much less intimidated by the idea of pursuing research, not just in data science.

— Donald Pepka, senior, math, political science and creative writing

Broadly Applicable

There is no shortage of Duke undergrads in engineering, computer science, math and statistics interested in exploring data-driven approaches to interdisciplinary challenges. On small Data+ teams, they learn how to marshal, analyze and visualize data while gaining broad exposure to the world of data science. And the variety and quality of opportunities are mind-blowing: developing computational tools to entice children to eat healthy food; creating visualizations to show the effects of urban and agricultural land use on rivers; and even predicting mechanical failures to help the Air Force keep its fleet of F-15E Strike Eagle fighter jets flying. Working on a team with people from different disciplines and mastering the latest data analytics skills — while producing a detailed report of the project’s results for a faculty member or real-world client — are résumé builders in any technical field.

“Not only are partner companies raving about the results our students are producing for them, participants in Data+ continue to land high-profile internships and career opportunities after graduation,” says Ravi Bellamkonda, Vinik Dean of the Pratt School of Engineering. “Data+ is one of the signature programs we’re using to ensure all of our students are comfortable with data since it will be an integral skill for engineers of the future.”

Perhaps the most surprising thing about Data+ is how applicable it is across the humanities. Astrid Giugni, a lecturing fellow of English, was an early adopter of Data+ and has used it to conduct projects ranging from understanding the narrative created by thousands of photojournalistic depictions of Syrian refugees to applying a computational approach to the intellectual history of consumerism. Data analysis is great at combing through massive amounts of text, ancient or modern, as a way to identify primary and secondary themes and contextualize those themes with contemporary math, science and history.

Working in diverse groups is a key to Data+.

“One of the hardest things to do is to make students pay close attention to text,” Giugni says, “because it’s difficult to deal with the nuance of a different time period, a different culture and the history. Working with the students in Data+, I saw that their interest in attentive readings grew even as they worked on computational, large-scale text analysis.”

Following Giugni’s first humanities foray into Data+, she and then-graduate student Jessica Hines A.M.’13, Ph.D.’17 (now on the faculty at Birmingham Southern College) repurposed the Syrian refugee photos as a proxy to place modern-day images in the context of medieval artwork meant to convey pity and compassion. This creative approach seems to have hit home with the students, and it was an “aha” moment for Giugni. “Playing around with a way to make something that was near and dear to our research understandable to students,” she says, “started to give us an opening on how to think about humanistic topics in terms of data.”

Calderbank agrees: “You shouldn’t think about data science as delivering tools to people — it’s not the sense that all you have to do is feed your questions into the machine and have it spit out the answers. Data science is something that enables very different disciplines to have a conversation, and a deeper conversation, and an actionable conversation.”

I learned there’s much more to it than looking at data. It’s also a way of thinking and organizing what you have analyzed to help others to understand it. It’s also a bit of storytelling in a way.

— Jessica Ho, junior, math and neuroscience

Teaching Transformation

Data+ is rare in another way — it represents a potential transformation of pedagogy at the collegiate level. An affiliated summer program of Bass Connections, Data+ was launched to extend learning opportunities into the summer. While Bass Connections project teams take place over nine to 12 months, Data+ allows undergraduates to complete a self-contained project during the summer when they can concentrate solely on the project instead of required coursework. Valerie Ashby, dean of Trinity College of Arts & Sciences, says that there are distinct advantages to both the discrete summer experience and the extension of that data analysis and teamwork into the Trinity curriculum.

“We believe strongly in experiential learning, and we get serious about undergraduate research,” Ashby says. “A 10-week program — set aside during the summer — is an ideal length of time for our students. The beauty of Duke is that we are always creating new ways to achieve excellence in teaching, research and service. We adapt to new or emerging fields, but we also consider who our students are and what they need.”

Paul Bendich A.M.’04, Ph.D.’08, an associate research professor of mathematics who co-directs Data+ with Calderbank, says that students who participate in Data+ become team leaders in other classes, even outside of their major. “I teach a senior-level advanced math for data analysis course in the fall, and I have six or seven Data+ alums in there,” Bendich says. “What’s amazing is just how they take over the space. They have figured out they can’t wait for faculty to make things work.”

After several summers leading Data+ programs, Giugni started implementing team-based data work in her classes. It seemed to her a natural extension of the successes of Data+, and it again paid unexpected benefits to her students. “It has given me a new way of doing term projects in my classes that I think would scale up really nicely,” she says. “I’m importing some of the structure of how teams are organized and how they’re expected to be accountable for their work. And, honestly, it’s been a lifesaver when I’m teaching online. I actually had the students ask me to start the Slack [instant messaging] channel earlier than I planned because they wanted to get to know each other socially — because they’re so isolated.”

Pratt dean Bellamkonda has also used Data+ as a model for refreshing the school’s curriculum with courses such as the First-Year Design Experience. That class uses small teams of students to solve a single semester-long problem, often for a local client. “Data+ is a brilliant approach for introducing students to data science through real-world, hands-on learning,” Bellamkonda says. “When I first learned of the program, I knew that its leaders were on to something.”

Beyond solid technical machine learning skills, I received a greater appreciation for data science as a tool to understand everything from aircraft maintenance to the humanities. Before, I’d never expected that conducting humanities research would teach me how to wield and utilize the most cutting-edge research in machine learning and natural language processing.

— Albert Sun, sophomore, computer science and public policy

Skills for Leaders

While Data+ is a slam dunk for undergraduates, it is equally beneficial for graduate students. Bendich says that the program overcomes some of the aspects of graduate study that can be dispiriting while also providing real leadership training. The traditional model of Ph.D. work can sometimes feel like trudging toward a dissertation on a subject in which doctoral students might feel that their work is not significant. By leading a Data+ team, grad students increase their value as researchers and as future faculty members or private sector employees by learning the significant skill of data analysis.

With every Data+ project they lead, graduate students produce a measurable work output which might lead to authoring a research paper, building a website or securing a recommendation from a client. Bendich also serves as chief scientist of a local data analysis firm, where translating complex science to clients or funders is a necessary skill. Even brilliant graduate students need to learn that skill. Leading Data+ teaches them how to do that.

“It’s really clear that mentoring these projects often gives the graduate students ownership over a thing for the first time,” Bendich says. “When they go out into the world and look for jobs, they’re going to get the question, What experience do you have managing teams? Data+ gives them an experience of managing a team, aligning what the undergraduates are able to do with the questions that a client might have. That’s really important for almost any job, including academic jobs, frankly.”

Following their summer of research, Data+ teams present posters detailing their work.

Bendich and Calderbank also see graduate students as the crucial link to the curriculum transformation that Data+ appears to represent. Extending the kind of success seen in Pratt’s First-Year Design Experience to other schools and departments is not feasible without help from grad students. The two professors’ dream would be to break the old model of giant lecture classes for first-year students with traditional discussion sections and labs led by graduate students. Instead, first-year undergrads would complete work in teams led by grad students.

“If you want to transform the educational experience, you can’t do that by hiring more faculty or more professors of the practice, because you can’t afford it,” Calderbank says. “What you can do is make more effective use of your graduate students. Data+ is a proof of concept model for things that you might try to do.”

As a biology pre-med, I made the mistake of thinking that coding was irrelevant to me. Having to learn Python on my own in a very short amount of time, and quickly turning around and using those skills, taught me that I am capable of flexibility and learning on the job.

— Ellen Mines, senior, biology and philosophy

The Future of Plus

With its proof points readily apparent, the influence of Data+ is spreading fast. Already, the “plus” model of experiential education has expanded to computer science (CS+), information technology (Code+), the humanities (Story+) and other disciplines. It’s no surprise to Larry Carin, Duke’s vice president for research, that the model has mushroomed. “Robert Calderbank and his team have created a program that has shaped hundreds of Duke students, giving them the opportunity to learn by doing with meaningful, hands-on projects,” Carin said. “By emphasizing the additive qualities of real-world experience in education, Robert and Paul broke important new ground in student experiential education.”

Duke’s new Center for Computational Thinking (CCT) will also springboard off some Data+ concepts. The CCT recognizes a need to prepare students for a workforce where computational skills are a key to success. The center provides customized training in computation, modeling, data science and the ethics of emerging technologies.

“The Center for Computational Thinking will build on the Data+ foundation as it delivers an array of expanded experiential learning programs,” says Tracy Futhey, Duke’s vice president for information technology and chief information officer. “Data+ provided a phenomenal demonstration of the value that co-curricular and project-based student engagement could contribute to the students’ education, and to the project stakeholders as well. The impact that Data+ has had on student education through its experiential learning approach, complementing in-class education, has been remarkable.”

Calderbank and Bendich admit that a happy accident of Data+ is its appeal to women and underrepresented minority students. Fifty percent of applicants are women, which they believe shows the broad appeal of the team-based research model. Data+ is a departure from the male-dominated experience some female engineering and computer science students have — an experience that can be part of the reason promising women abandon science, technology, engineering and math (STEM) pathways in favor of humanities and social sciences. “Being part of a team going after a problem is much more attractive,” Calderbank says. “And so we would submit that this way of organizing has the potential to transform the demographics of computing and STEM subjects.”

Dean Ashby is a believer, too. She says that Data+ “opens the door for much-needed diversity” in computer science and data science. “Projects like these will enable some students to discover who they are. And it will build the confidence of those heading into careers in those fields.”

In 2018, partly due to the success of Data+, Duke launched a master’s program in interdisciplinary data science. Calderbank says it is likely an early step toward that type of degree becoming as valuable as a master’s in business administration (M.B.A.), which applies widely across many fields. “I think data science is going to become like that too,” he says. “You can’t work for McKinsey as a consultant and not really know data science.”

Calderbank and Bendich have scaled Data+ from three projects to 30 in just a little more than five years, reaching 150 students per summer. They envision a day when the program is widely available, enabling perhaps 1,000 students to participate each summer on as many as 200 projects. That vision fits nicely with the university’s goals of ensuring that all students have the technical skills they need for a productive life and career.

“What does it mean to be a citizen in the 21st century?” Calderbank asks. “We live in a world that’s increasingly governed by algorithms. And how do you equip people to play the citizen role in the world that’s emerging?

“The story is that unprecedented access to data and computing is transforming every aspect of our world, and we need to equip our students to fully participate.”