Working toward fair data for all: DataWorks at Georgia Tech

By Guest Columnist CARL DISALVO, associate professor in the School of Interactive Computing at Georgia Institute of Technology, with BETSY DISALVO, of Georgia Tech, and BEN SHAPIRO, of Georgia State University

When people talk about the roles and responsibilities of higher education in the 21st century, those conversations often focus on the challenges of educating students for changing work environments and the ever-increasing role of technology in those environments. Certainly, that’s part of what colleges and universities do, but not all of it.

Carl DiSalvo, Betsy DiSalvo, Ben Shapiro (left to right)

Higher education institutions are expanding their mission and offerings to engage and educate learners other than the traditional full-time student, outside the familiar classroom environment. And for some, there’s a return to seeing colleges and universities as part of an ecology of civic institutions and organizations that make up the fabric of our local democracies. It’s within this context that we created and run DataWorks.

DataWorks is part of the Constellations Center for Equity in Computing at the Georgia Institute of Technology, in the College of Computing. Through DataWorks, we hire young adults and train them in entry-level data science skills, such as cleaning and formatting data, using tools ranging from off-the-shelf spreadsheet software to custom scripts in programming languages such as Python.

One intention of DataWorks is broadening participation in data science. There is a tendency to assume the labor of data science is done by engineers with advanced degrees or that it is largely unskilled work. These assumptions are untrue, and through DataWorks we hope to demonstrate a more pluralistic approach to what data science is or could be. The young adults at the core of DataWorks come from communities underrepresented in technology fields. Overwhelmingly, technology jobs go to cis white men who are upper middle class. Through DataWorks, we attempt to counter that pattern, at least in small measure. We also hope to provide pathways to careers for the DataWorks employees beyond the program.

Clients bring projects to DataWorks, and through these projects, employees gain hands-on experience working with data and refine or create data sets that support clients’ work. DataWorks employees work full time, with reasonable pay and benefits that reflect efforts to create fairer and more equitable work practices around entry-level data work.

There are many ways to think about DataWorks. In some respects, it’s a workforce development program. In other respects, it’s a platform to teach data science skills outside of the classroom to workers rather than students, and to study how data literacy develops in the workplace. It’s also a chance to explore and experiment with what else a college or university might be, in addition to departments and faculty that conduct research and confer degrees. It’s a chance to consider another way colleges and universities might embrace being a civic institution.

We created DataWorks because we noticed that local governments, nonprofits, and small businesses wanted to use data, but the data they needed wasn’t available or wasn’t in accessible formats. So, some of our projects have made that data accessible and usable.

For example, working with the Atlanta nonprofit organization Center for Civic Innovation, we took 10 years of records from the Zoning Review Board and the Board of Zoning Adjustment and transformed the data from static PDF files into structured data sets. Why bother? Those records contain information about the voting patterns of those boards and the patterns of development in the city, and as PDF files that information could not be searched or compared. Now, professionals and community members can study such voting and development patterns to derive insights and make decisions that support Atlanta communities.
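A minimal sketch of this kind of transformation, assuming the text has already been extracted from the PDF files. The line layout, case numbers, dates, votes, and field names below are hypothetical, invented for illustration, and are not taken from the actual board records or from DataWorks scripts.

```python
import csv
import io
import re

# Hypothetical lines, as they might look after text is pulled out of a PDF.
# Case numbers, dates, and votes are invented for illustration only.
RAW_LINES = [
    "Case Z-19-045   06/13/2019   APPROVED   Vote: 5-2",
    "case z-19-046 06/13/2019 denied vote: 3-4",
    "Page 2 of 14",  # layout residue that should be skipped
    "Case Z-19-051   07/11/2019   Deferred   Vote: 7-0",
]

# Assumed record layout: case id, meeting date, decision, yes-no vote counts.
RECORD = re.compile(
    r"case\s+(?P<case_id>[a-z]-\d+-\d+)\s+"
    r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})\s+"
    r"(?P<decision>approved|denied|deferred)\s+"
    r"vote:\s*(?P<yes>\d+)-(?P<no>\d+)",
    re.IGNORECASE,
)

FIELDS = ["case_id", "date", "decision", "votes_yes", "votes_no"]


def parse_line(line):
    """Return a cleaned record dict, or None if the line is not a record."""
    match = RECORD.search(line)
    if match is None:
        return None
    return {
        "case_id": match["case_id"].upper(),
        # Normalize MM/DD/YYYY to ISO YYYY-MM-DD so dates sort and compare.
        "date": f"{match['year']}-{match['month']}-{match['day']}",
        "decision": match["decision"].lower(),
        "votes_yes": int(match["yes"]),
        "votes_no": int(match["no"]),
    }


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = [r for r in map(parse_line, RAW_LINES) if r is not None]
csv_text = to_csv(records)
print(csv_text)
```

Once records are in this form, questions such as "how often was a request denied in a given year?" become a filter or a count rather than a manual read through hundreds of pages. Real records would likely be messier than this assumed layout and would need more cleaning and human review.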

In this way, DataWorks, and by extension Georgia Tech, provides a service of value to the City of Atlanta and a local nonprofit. In contrast to the extractive nature of so much academic research and engagement, projects such as these aim to contribute to the civic ecologies we are nested within and to better distribute the resources of Georgia Tech toward matters of local concern.

Through DataWorks we also hope to set an example for how data work environments can be fair. Elsewhere in this sector, much of the data work happens as “gig work,” outside conventional work structures. For instance, powering the artificial intelligence and machine learning that underlie digital services often requires people to do the manual labor of labeling images so they can be classified and processed algorithmically. Just as there’s much to be concerned about with what those algorithms are doing, there’s also much to be concerned about with how the data behind those algorithms is made.

Too often, gig work, especially involving data labeling, cleaning, and formatting, exploits workers. But what if data work environments placed the labor and growth of workers at the center of the organization? Colleges and universities – particularly those that are public – are well poised to host such environments. If we consider our commitment to learning to be lifelong, so that our commitment to learners extends beyond traditional students, then establishing and maintaining safe and just environments – whether in a classroom or an apprenticeship program – is part of our institutional responsibility.

Because DataWorks is an experiment, we don’t know whether it will thrive or in what form. We’ve been at it for two years, but we all know it’s been a weird two years. One thing is certain: the workers are developing skills that will serve them no matter what form DataWorks takes. These are the technical skills of working with data, of taking inaccessible or unstructured data and transforming it so that it’s useful and usable. The workers are also developing critical perspectives on technology as they encounter and manage the frequent limitations and biases of data.

Part of the nature of experiments is that we don’t know the outcome. The not knowing is essential to inquiry – we learn through experiments, we find edges, limits, and potentials. Whether or not DataWorks thrives, and in whatever form, the outcome is a contribution in itself, because it helps us learn the boundaries of what a public college or university is or might be. The idea of a public college or university as a means of truly serving the community, of redistributing its resources, of being a model for fair work, is hopeful. It’s an aspiration worth working towards.

Notes to readers: This column was coordinated by Serve-Learn-Sustain at the Georgia Institute of Technology. This material is based upon work supported by the National Science Foundation under Grant No. 1951818.
