Working on fair data for everyone: DataWorks at Georgia Tech

By guest columnist CARL DISALVO, associate professor in the School of Interactive Computing at the Georgia Institute of Technology, with BETSY DISALVO from Georgia Tech, and BEN SHAPIRO from Georgia State University

When people talk about the roles and responsibilities of higher education in the 21st century, those conversations often focus on the challenges of educating students for changing work environments and the increasing role of technology in those environments. Sure, that’s part of what colleges and universities do, but not all of it.

Carl DiSalvo, Betsy DiSalvo, Ben Shapiro (left to right)

Higher education institutions are expanding their missions and offerings to engage and educate learners other than the traditional full-time student, outside the familiar classroom environment. And for some, there is a return to seeing colleges and universities as part of an ecology of social institutions and organizations that make up the fabric of our local democracies. It is within this context that we have created and run DataWorks.

DataWorks is part of the Constellations Center for Equity in Computing at the Georgia Institute of Technology, in the College of Computing. Through DataWorks, we hire and train young adults in entry-level data science skills, such as data cleansing and formatting, using tools ranging from ready-made spreadsheet software to custom scripts in programming languages such as Python.

One of DataWorks’ intentions is to broaden participation in data science. There is a tendency to assume either that the work of data science is done by engineers with advanced degrees or that it is largely unskilled work. Neither assumption is true, and through DataWorks we hope to demonstrate a more pluralistic approach to what data science is or could be. The young adults at the heart of DataWorks come from communities underrepresented in technology. Overwhelmingly, tech jobs go to upper-middle-class cis white males. With DataWorks, we try to counteract that pattern, at least to a small extent. We hope to offer DataWorks employees growth opportunities outside of the program as well.

Customers bring projects to DataWorks, and through these projects employees gain hands-on experience working with data while refining or creating data sets that support the customers’ work. DataWorks employees are full-time, with reasonable pay and benefits that reflect efforts to create fairer and more equitable work practices around entry-level data work.

There are many ways to think about DataWorks. In some ways it is a staff development program. In other respects, it is a platform to teach data science skills outside of the classroom to employees rather than students, and to study how data literacy evolves in the workplace. It’s also an opportunity to explore and experiment with what else a college or university could be, in addition to departments and faculties conducting research and awarding degrees. It’s an opportunity to think about another way that colleges and universities might act as civic institutions.

We created DataWorks because we noticed that local governments, nonprofits, and small businesses wanted to use data, but the data they needed either wasn’t available or wasn’t in accessible formats. So some of our projects have made that data accessible and usable.

For example, in collaboration with the Atlanta-based nonprofit Center for Civic Innovation, we collected 10 years of records from the Zoning Review Board and the Board of Zoning Adjustment and converted the data from static PDF files into structured data sets. Why bother? Well, those records contained information about the voting patterns of those governing and the development patterns in the city, information that remained inaccessible as long as it was locked in PDF files that could not be searched or compared. Now professionals and community members can study such voting and development patterns to gain insights and make decisions that support Atlanta’s communities.

In this way, DataWorks, and by extension Georgia Tech, provides a valuable service to the city of Atlanta and a local nonprofit organization. Unlike the extractive nature of so much academic research and engagement, projects like this one aim to contribute to the societal ecologies in which we are nestled, and to better allocate Georgia Tech’s resources to issues of local importance.

With DataWorks, we also hope to set an example for how data work environments can be fair. Elsewhere in the industry, much data work is done as “gig work”, outside of conventional work structures. For example, to power the artificial intelligence and machine learning that underpin digital services, people often have to do the manual work of tagging images that will then be algorithmically classified and processed. Just as there is a lot of concern about what those algorithms do, there is also a lot to worry about in how the data behind those algorithms is created.

Too often, gigs, especially those involving labeling, cleaning and formatting data, exploit workers. But what if data work environments put the work and growth of employees at the center of the organization? Colleges and universities – especially those that are public – are well positioned to host such environments. If we view our commitment to learning as lifelong, and therefore see our commitment to learners as extending beyond traditional students, then creating and maintaining safe and equitable environments – whether in a classroom or an internship program – is part of our institutional responsibility.

Because DataWorks is an experiment, we don’t know whether it will thrive, or in what form. We’ve been at it for two years now, but we all know it’s been two weird years. One thing is certain: the employees develop skills that will benefit them, regardless of the form DataWorks takes. These include the technical skills of working with data: taking inaccessible or unstructured data and transforming it so that it is usable. They also develop critical perspectives on technology while encountering and managing the frequent limitations and biases of data.

Part of the nature of experiments is that we don’t know the outcome. The not-knowing is essential to research – we learn through experimentation, we find edges, limits and possibilities. Whether or not DataWorks thrives, and in what form, is a contribution in itself and it helps us understand the limits of what a public college or university is or could be. The idea of a public college or university as a means of really serving the community, of redistributing its resources, of being a model for honest work, is hopeful. It is an ambition worth working towards.

Notes to Readers: This column was coordinated by Serve-Learn-Sustain of the Georgia Institute of Technology. This material is based on work supported by the National Science Foundation under Grant No. 1951818.
