A key task that any aspiring data-driven organization needs to learn is data wrangling, the process of converting raw data into something truly useful. This practical guide provides business analysts with an overview of various data wrangling techniques and tools, and puts the practice of data wrangling into context by asking, "What are you trying to do and why?"
Wrangling data consumes roughly 50-80% of an analyst's time before any kind of analysis is possible. Written by key executives at Trifacta, this book walks you through the wrangling process by exploring several factors--time, granularity, scope, and structure--that you need to consider as you begin to work with data. You'll gain a shared language and a comprehensive understanding of data wrangling, with an emphasis on recent agile analytic processes used by many of today's data-driven organizations.
Appreciate the importance--and the satisfaction--of wrangling data the right way.
Understand what kind of data is available
Choose which data to use and at what level of detail
Meaningfully combine multiple sources of data
Decide how to distill the results to a size and shape that can drive downstream analysis
About the Authors
Tye Rattenbury is Trifacta's lead data scientist. He holds a Ph.D. in Computer Science from UC Berkeley. Prior to Trifacta, he was a Data Scientist at Facebook and the Director of Data Science Strategy at R/GA.
Joe Hellerstein is Trifacta's Chief Strategy Officer and a Professor of Computer Science at Berkeley. His career in research and industry has focused on data-centric systems and the way they drive computing. In 2010, Fortune Magazine included him in their list of 50 smartest people in technology, and MIT Technology Review magazine included his Bloom language for cloud computing on their TR10 list of the 10 technologies "most likely to change our world".
Jeffrey Heer is Trifacta's Chief Experience Officer and a Professor of Computer Science at the University of Washington, where he directs the Interactive Data Lab. Jeffrey's passion is the design of novel user interfaces for exploring, managing and communicating data. The data visualization tools developed by his lab (D3.js, Protovis, Prefuse) are used by thousands of data enthusiasts around the world. In 2009, Jeffrey was named in MIT Technology Review's list of "Top Innovators under 35".
Sean Kandel is Trifacta's Chief Technical Officer. He completed his Ph.D. at Stanford University, where his research focused on user interfaces for database systems. At Stanford, Sean led development of new tools for data transformation and discovery, such as Data Wrangler. He previously worked as a data analyst at Citadel Investment Group.
Connor Carreras is Trifacta's Manager for Customer Success, Americas, where she helps customers use cutting-edge data wrangling techniques in support of their big data initiatives. Connor brings her prior experience in the data integration space to help customers understand how to adopt self-service data preparation as part of an analytics process. She holds a B.A. from Princeton University.