Feb 03

Python for Data Analysis: Book Review

Python for Data Analysis. Author: Wes McKinney. Publisher: O’Reilly Media, Inc., Sebastopol, CA 95472. ISBN: 978-1-449-31979-3. Copyright 2013, 472 pages. Cover price: $39.99.

No matter what your skill level, you need this book.


Once you have the basics of the Python language down, you need to lift your skills to the next level to do useful work. I wanted to put up a few web sites, which pushed me to learn Django, a web framework written in Python. That leap left major gaps in my conceptual understanding of many Python idioms, which I expect to fill in over time.

Another area where I spend time is data analysis in its various forms. The Python tools and skills needed for this work are quite different from those needed for building web sites. I was very interested when I first saw the Pandas data analysis library, which recommended the use of IPython, another tool that I had not really begun to explore.

In short, the Pandas library and IPython make for a very powerful data manipulation and analysis toolset. There is a fascinating confluence of activity in the Python world, with packages such as NumPy, SciPy and IPython, and now Pandas, statsmodels, scikit-learn and Numba, increasingly supporting the scientific and data analysis communities. While a lot of the tools emphasize matrix (array) operations, which initially put me off, Pandas makes them far more approachable because its structures closely resemble spreadsheets, which in turn resemble matrices once you wrap your head around the concepts.
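To make the spreadsheet comparison concrete, here is a minimal sketch using made-up figures (the table, its labels and its numbers are my own illustration, not an example from the book). It builds a small labeled table, adds a derived column the way you would in a spreadsheet, and then views the same values as a plain 2-D array.

```python
import pandas as pd

# Hypothetical data: the index plays the role of spreadsheet row labels,
# the dictionary keys play the role of column headers.
sales = pd.DataFrame(
    {"units": [10, 4, 7], "price": [2.5, 10.0, 4.0]},
    index=["apples", "melons", "pears"],
)

# A spreadsheet-style derived column: one expression is applied to every row.
sales["revenue"] = sales["units"] * sales["price"]

# The same table viewed as a plain 2-D NumPy array (a matrix of values).
matrix = sales.to_numpy()

print(sales)
print(matrix.shape)
```

The labels are what make the table feel like a spreadsheet; underneath, the values are still a rectangular array with one row per item and one column per field.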

Another major data manipulation capability introduced by Pandas, as explained in the text, is a set of SQL-like operations on in-memory datasets, including joining and summarizing.
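As a rough sketch of what those SQL-like operations look like, assuming two small made-up tables (the column names and values here are hypothetical, not taken from the book), a join followed by a grouped sum can be written like this:

```python
import pandas as pd

# Hypothetical in-memory tables, standing in for two SQL tables.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [20.0, 35.0, 15.0, 50.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["east", "west", "east"],
})

# Roughly: SELECT * FROM orders JOIN customers USING (customer_id)
joined = orders.merge(customers, on="customer_id", how="inner")

# Roughly: SELECT region, SUM(amount) FROM joined GROUP BY region
totals = joined.groupby("region")["amount"].sum()

print(totals)
```

The `how` argument of `merge` selects the join type (`"inner"`, `"left"`, `"right"` or `"outer"`), much like the join keywords in SQL.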

I bought an early access copy of Python for Data Analysis and have kept it up to date since, which is a great feature of O’Reilly early access publications.

The book covers basic prerequisite information on the following:

  • Python
  • NumPy
  • Pandas
  • matplotlib
  • IPython
  • SciPy

The book is excellent, taking the reader from the conceptual issues through to the execution of sophisticated analyses of data sets from a variety of sources. The problems are well documented and the code can be executed with available data (something I have yet to do). Examples include getting and using data from:

  • usa.gov
  • USDA
  • Federal Election Commission
  • Yahoo Finance

Throughout the book, examples and code are presented in a teacher-like style with thorough explanations that go well beyond simple code commentary. Because of the nature of the code being explained, I had a constant stream of aha moments as I began to understand not just the syntax of the code but why it should be used to achieve the desired result. I also learned how to use some of the less obvious Python and Pandas language elements to better effect. In short, the book is helping me become a more fluent coder.

Wes is truly a polymath who understands analysis, advanced math and statistics, and is an awesome coder of the Pandas package itself. To boot, he writes clearly, in a way that can be understood by mere mortals attempting to get up to speed on the concepts embraced and enabled by the tools he has built and assembled. I also find the appendix summary of the Python language to be compact and useful in its own right.

Can’t recommend the book highly enough.


