Machine Learning Projects for .NET Developers

www.it-ebooks.info
For your convenience Apress has placed some of the front
matter material after the index. Please use the Bookmarks
and Contents at a Glance links to access them.
Contents at a Glance
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
■ Chapter 1: 256 Shades of Gray
■ Chapter 2: Spam or Ham?
■ Chapter 3: The Joy of Type Providers
■ Chapter 4: Of Bikes and Men
■ Chapter 5: You Are Not a Unique Snowflake
■ Chapter 6: Trees and Forests
■ Chapter 7: A Strange Game
■ Chapter 8: Digits, Revisited
■ Chapter 9: Conclusion
Index
Introduction
If you are holding this book, I have to assume that you are a .NET developer interested in machine learning.
You are probably comfortable with writing applications in C#, most likely line-of-business applications.
Maybe you have encountered F# before, maybe not. And you are very probably curious about machine
learning. The topic is getting more press every day: it has a strong connection to software engineering, yet
it also relies on unfamiliar methods and seemingly abstract mathematical concepts. In short, machine learning
looks like an interesting topic, and a useful skill to learn, but it’s difficult to figure out where to start.
This book is intended as an introduction to machine learning for developers. My main goal in writing
it was to make the topic accessible to a reader who is comfortable writing code, and is not a mathematician.
A taste for mathematics certainly doesn’t hurt, but this book is about learning some of the core concepts
through code by using practical examples that illustrate how and why things work.
But first, what is machine learning? Machine learning is the art of writing computer programs that get
better at performing a task as more data becomes available, without requiring you, the developer, to change
the code.
This is a fairly broad definition, which reflects the fact that machine learning applies to a very broad
range of domains. However, some specific aspects of that definition are worth pointing out more closely.
Machine learning is about writing programs—code that runs in production and performs a task—which
makes it different from statistics, for instance. Machine learning is a cross-disciplinary area, and is a topic
relevant to both the mathematically inclined researcher and the software engineer.
The other interesting piece in that definition is data. Machine learning is about solving practical
problems using the data you have available. Working with data is a key part of machine learning;
understanding your data and learning how to extract useful information from it are quite often more
important than the specific algorithm you will use. For that reason, we will approach machine learning
starting with data. Each chapter will begin with a real dataset, with all its real-world imperfections and
surprises, and a specific problem we want to address. And, starting from there, we will build a solution to the
problem from the ground up, introducing ideas as we need them, in context. As we do so, we will create a
foundation that will help you understand how different ideas work together, and will make it easy later on to
productively use libraries or frameworks, if you need them.
Our exploration will start in the familiar grounds of C# and Visual Studio, but as we progress we will
introduce F#, a .NET language that is particularly suited for machine learning problems. Just like machine
learning, programming in a functional style can be intimidating at first. However, once you get the hang of it,
F# is both simple and extremely productive. If you are a complete F# beginner, this book will walk you through
what you need to know about the language, and you will learn how to use it productively on real-world,
interesting problems.
Along the way, we will explore a whole range of diverse problems, which will give you a sense of the
many places and perhaps unexpected ways that machine learning can make your applications better. We
will explore image recognition, spam filters, a self-learning game, and much more. And, as we take that
journey together, you will see that machine learning is not all that complicated, and that fairly simple models
can produce surprisingly good results. And, last but not least, you will see that machine learning is a lot of
fun! So, without further ado, let’s start hacking on our first machine learning problem.
Chapter 1
256 Shades of Gray
Building a Program to Automatically Recognize
Images of Numbers
If you were to create a list of current hot topics in technology, machine learning would certainly be
somewhere among the top spots. And yet, while the term shows up everywhere, what it means exactly is
often shrouded in confusion. Is it the same thing as “big data,” or perhaps “data science”? How is it different
from statistics? On the surface, machine learning might appear to be an exotic and intimidating specialty
that uses fancy mathematics and algorithms, with little in common with the daily activities of a software
engineer.
In this chapter, and in the rest of this book, my goal will be to demystify machine learning by working
through real-world projects together. We will solve problems step by step, primarily writing code from
the ground up. By taking this approach, we will be able to understand the nuts and bolts of how things
work, illustrating along the way core ideas and methods that are broadly applicable, and giving you a solid
foundation on which to build specialized libraries later on. In our first chapter, we will dive right in with a
classic problem—recognizing hand-written digits—doing a couple of things along the way:
• Establish a methodology applicable across most machine learning problems.
Developing a machine learning model is subtly different from writing standard
line-of-business applications, and it comes with specific challenges. At the end of
this chapter, you will understand the notion of cross-validation, why it matters, and
how to use it.
• Get you to understand how to “think machine learning” and how to look at ML
problems. We will discuss ideas like similarity and distance, which are central
to most algorithms. We will also show that while mathematics is an important
ingredient of machine learning, that aspect tends to be over-emphasized, and some
of the core ideas are actually fairly simple. We will start with a rather straightforward
algorithm and see that it actually works pretty well!
• Know how to approach the problem in C# and F#. We’ll begin with implementing the
solution in C# and then present the equivalent solution in F#, a .NET language that is
uniquely suited for machine learning and data science.
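To make the ideas of distance and validation a little more concrete before we start, here is a minimal
sketch. It is not the implementation developed in this chapter, which is written in C# and F#; Python is
used here only to keep the example short. Everything in it is an assumption made for illustration: the
names manhattan_distance, predict, and accuracy are hypothetical, the 4-pixel "images" are made up, the
choice of distance is just one possibility, and checking against a single held-out set is only the
simplest form of the idea behind cross-validation.

```python
from typing import List, Sequence, Tuple

Image = Sequence[int]          # pixel intensities, 0 (black) to 255 (white)
Example = Tuple[int, Image]    # (label, pixels)


def manhattan_distance(a: Image, b: Image) -> int:
    """Sum of absolute pixel differences; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b))


def predict(training: List[Example], pixels: Image) -> int:
    """Return the label of the training image closest to `pixels`."""
    label, _ = min(training, key=lambda ex: manhattan_distance(ex[1], pixels))
    return label


def accuracy(training: List[Example], validation: List[Example]) -> float:
    """Fraction of held-out examples whose label is predicted correctly."""
    correct = sum(1 for label, pixels in validation
                  if predict(training, pixels) == label)
    return correct / len(validation)


# Tiny made-up "images" of 4 pixels each, purely for illustration.
training = [
    (0, [0, 0, 0, 0]),
    (1, [255, 255, 255, 255]),
]
validation = [
    (0, [10, 0, 5, 0]),
    (1, [250, 240, 255, 255]),
    (1, [100, 90, 110, 120]),   # closer to the dark image, so misclassified
]

print(accuracy(training, validation))
```

The point of the sketch is the separation of roles: the model only ever sees the training examples, and
its quality is measured on examples it has not seen. The third validation image shows why that matters,
since a model can look perfect on the data it was built from and still make mistakes on new data.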
Tackling such a problem head on in the first chapter might sound like a daunting task at first—but
don’t be intimidated! It is a hard problem on the surface, but as you will see, we will be able to create a
pretty effective solution using only fairly simple methods. Besides, where would be the fun in solving trivial
toy problems?