Data Organization and OpenRefine

Good data organization is the foundation of any research project. Most researchers keep their data in spreadsheets, so that is where many research projects start, and it is where this lesson starts too. We tend to organize spreadsheets in ways that are convenient for humans, but computers require data to be organized in particular ways. To use tools that make computation more efficient, such as the programming languages R and Python, we need to structure our data the way those tools expect. In this lesson, you will learn good data entry practices, how to format data tables in spreadsheets, how to avoid common formatting mistakes, and approaches for handling dates in spreadsheets.

Before you can analyze data, you need to clean it. Data cleaning identifies errors and corrects formatting to create consistent data. This step requires care and attention, because without clean data the results of an analysis may be wrong and non-reproducible. OpenRefine is a powerful, free, and open source tool for working with messy data: cleaning it and transforming it from one format into another. This lesson will teach you to use OpenRefine to clean and format data effectively while it automatically tracks any changes that you make. Many people comment that this tool saves them months of work that they would otherwise spend making these edits by hand.

Day 1 will cover Data Organization and OpenRefine Part 1.
Day 2 will cover OpenRefine Part 2.

To register for Day 2 of this series (Friday, September 30th), click here.

Thursday, September 29, 2022
8:30am - 11:00am
Registration has closed.

