



Data Analysis Using SQL and Excel
M**Y
Comments from a colleague
Gordon Linoff and I have written three and a half books together. (Four, if we get to count the second edition of Data Mining Techniques as a whole new book; it didn't feel like any less work.) Neither of us has written a book without the other before, so I must admit to a tiny twinge of regret upon first seeing the cover of this one without my name on it next to Gordon's. The feeling passed very quickly as recollections of the authorial life came flooding back: vacations spent at the keyboard instead of in or on the lake, opportunities missed, relationships strained. More importantly, this is a book that only Gordon Linoff could have written. His unique combination of talents and experiences informs every chapter.

I first met Gordon at Thinking Machines Corporation, a now long-defunct manufacturer of parallel supercomputers where we both worked in the late eighties and early nineties. Among other roles, Gordon managed the implementation of a parallel relational database designed to support complex analytical queries on very large databases. The design point for this database was radically different from other relational database systems available at the time in that no trade-offs were made to support transaction processing. The requirements for a system designed to quickly retrieve or update a single record are quite different from the requirements for a system to scan and join huge tables. Jettisoning the requirement to support transaction processing made for a cleaner, more efficient database for analytical processing. This part of Gordon's background means he understands SQL for data analysis literally from the inside out.

Just as a database designed to answer big important questions has a different structure from one designed to process many individual transactions, a book about using databases to answer big important questions requires a different approach to SQL. Many books on SQL are written for database administrators. Others are written for users wishing to prepare simple reports. Still others attempt to introduce some particular dialect of SQL in every detail. This one is written for data analysts, data miners, and anyone who wants to extract maximum information value from large corporate databases. Jettisoning the requirement to address all the disparate types of database user makes this a better, more focused book for the intended audience. In short, this is a book about how to use databases the way we ourselves use them.

Even more important than Gordon's database technology background is his many years as a data mining consultant. This has given him a deep understanding of the kinds of questions businesses need to ask and of the data they are likely to have available to answer them. Years spent exploring corporate databases have given Gordon an intuitive feel for how to approach the kinds of problems that crop up time and again across many different business domains:

* How to take advantage of geographic data. A zip code field looks much richer when you realize that from zip code you can get to latitude and longitude and from latitude and longitude you can get to distance. It looks richer still when you realize that you can use it to join in census bureau data to get at important attributes such as population density, median income, percentage of people on public assistance, and the like.
* How to take advantage of dates. Order dates, ship dates, enrollment dates, birth dates. Corporate data is full of dates. These fields look richer when you understand how to turn dates into tenures, analyze purchases by day of week, and track trends in fulfillment time. They look richer still when you know how to use this data to analyze time-to-event problems such as time to next purchase or expected remaining lifetime of a customer relationship.
* How to build data mining models directly in SQL. This book shows you how to do things in SQL that you probably never imagined possible, including generating association rules for market basket analysis, building regression models, and implementing naïve Bayes classifiers and scorecards.
* How to prepare data for use with data mining tools. Although more can be done using just SQL and Excel than most people realize, eventually you will want to use more specialized data mining tools. These tools need data in a specific format known as a customer signature. This book shows you how to create these data mining extracts.

The book is rich in examples and they all use real data. This point is worth saying more about. Unrealistic datasets lead to unrealistic results. This is frustrating to the student. In real life, the more you know about the business context, the better your data mining results will be. Subject matter expertise gives you a head start. You know what variables ought to be predictive and have good ideas about new ones to derive. Fake data does not reward these good ideas because patterns that should be in the data are missing and patterns that shouldn't be there have been introduced inadvertently. Real data is hard to come by, not least because real data may reveal more than its owners are willing to share about their business operations. As a result, many books and courses make do with artificially constructed datasets. Best of all, the datasets used in the book are all available for download at the companion web site [...]

I reviewed the chapters of this book as they were written. This process was very beneficial to my own use of SQL and Excel. The exercise of thinking about the fairly complex queries used in the examples greatly increased my understanding of how SQL actually works. As a result, I have lost my fear of nested queries, multi-way joins, giant case statements, and other formerly daunting aspects of the language. In well over a decade of collaboration, I have always turned to Gordon for help using SQL to best advantage. Now, I can turn to this book. And you can too.
K**K
What is possible with Excel?
Gordon Linoff is a name I know from his co-authorship, with Michael Berry, of Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. I have read the 2nd edition and rated it highly. The new edition of that excellent book came out this year.

I am a long-time Data Miner, but in almost all instances, my clients have access to specialized, dedicated Data Mining tools, which have tons of powerful features. It's always been a bit of a mystery to me what Data Miners can do with Excel. I thought that the book would have some SQL case studies and some Excel case studies. Upon reflection I feel a bit naive because, as his book makes clear, he is using the two tools in partnership. It is the SQL that is doing the heavy lifting. You can NOT do the kinds of analyses that he describes unless you are using SQL AND Excel. In fact, he is using Excel primarily for basic Data Visualization. What really surprised me is that he does some of the Modeling in SQL. Association Rules was one of the examples. I wouldn't have thought that was practical.

What will you find in the book? In the first three chapters he introduces his main topics: SQL, Excel, and Statistics. His statistics review in this section and elsewhere in the book is quite readable, although very basic. In the next few chapters (4-8), he uses some case studies to illustrate mostly data preparation tasks. Obviously it is mostly SQL material, although he uses Excel charts and Excel functions in these chapters. If you are an Excel expert, but know nothing about SQL, don't expect that you will be able to coast on your Excel knowledge. You will be learning SQL. If you already know SQL, you will find it useful as well, as these are not typical SQL tasks. He is performing analyses. The good news for all readers is that it is well written. In 9 and 10 he gets into modeling, but he does the modeling in SQL. In 11, he covers Regression, and here he uses mostly Excel. Finally, in 12, he discusses how to prep the data to use it in a dedicated Data Mining tool, like the ones that I use, because "although powerful, the combination (of SQL and Excel) has its limits." So clearly SQL is the star, and Excel is the costar.

I would recommend it for those who simply do not have access to a dedicated Data Mining tool, although beware that you will hit those "limits" eventually. I would also recommend it for Data Mining consultants who might encounter a client that does not have access to Data Mining software. It would be a good choice if your organization isn't ready for a large-scale Data Mining project, but you want to stick your toe in the water. Finally, it might assist in putting together a demonstration for a tool-neutral audience. You can't go wrong with Linoff. He is an expert Data Miner, and he is good at writing about technical topics for a general audience.
J**R
Terrific Reference Book
This is an excellent reference on data mining techniques and how to use SQL to pull the data for Excel analysis. However, there are two things about this book that bother me. First, the vocabulary used by the author is quite extensive and usually elegant, but it was sometimes annoying to have to stop and check the dictionary to find out what the author meant. Maybe this is more of a reflection on me than on the author, but it seemed like there were many times where simpler language could have been used. Second, don't expect to learn how to run SQL queries from within Excel. That is outside the scope of this book, so don't be misled by the book's title.