Thursday, January 27, 2011

Google Refine

I can't remember exactly where I found this, probably Twitter.

From the website:
Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.

I installed it this afternoon and played around with it a little bit. I was hoping to use it for analysis, but I had misunderstood what it was built for.

Basically, it allows you to clean up sets of data. Say you get an Excel file from a customer and you want to make sure all instances of State match. Usually, you get a few different variations of the same value:

- FL
- Fl
- Florida
- FLORIDA
- florida
- FLOR.

I am sure I could go on and on. With Refine, you can select all of those values and replace them with a single value. Yes, I know you could do the same thing with a single SQL statement, but perhaps you don't have time to create the table and load the data. This is a simple tool that lets you do some basic data cleansing.
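For comparison, here is roughly what that same cleanup looks like done by hand in a few lines of Python. This is only a sketch of the idea, not how Refine works internally. The names VARIANTS and normalize_state are made up for illustration, and the extra "GA" value is there just to show that unknown values pass through untouched.

```python
# Hypothetical set of known spellings, lowercased, taken from the list above.
VARIANTS = {"fl", "florida", "flor."}


def normalize_state(value: str) -> str:
    """Return "FL" for any known Florida variant, else the trimmed input."""
    cleaned = value.strip()
    if cleaned.lower() in VARIANTS:
        return "FL"
    return cleaned


# The six variants from the list above, plus one unrelated value.
raw = ["FL", "Fl", "Florida", "FLORIDA", "florida", "FLOR.", "GA"]
print([normalize_state(v) for v in raw])
# prints ['FL', 'FL', 'FL', 'FL', 'FL', 'FL', 'GA']
```

The catch is that you have to know every variant up front, which is exactly the part a tool like Refine helps with by showing you the distinct values in the column.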

Check out the video to get a better example of how to use it. Cool stuff.



1 comment:

Anonymous said...

I downloaded the application yesterday and tried to load a couple of data mining files. I couldn't get data into the application. It didn't matter whether I used a .xlsx, .csv or .txt (tab delimited) file; nothing happened. The Refine status bar said "Done", but when I clicked the "Next" button, the status bar changed to "Error on the page". I couldn't find any other message or clue as to what could be done to make the upload work.