Description
Table of Contents
- Contents
- Intro to Python® for Computer Science and Data Science
- Deitel® Series Page
- Preface
- Python for Computer Science and Data Science Education
- Modular Architecture
- Audiences for the Book
- Key Features
- Chapter Dependencies
- Computing and Data Science Curricula
- Data Science Overlaps with Computer Science
- Jobs Requiring Data Science Skills
- Jupyter Notebooks
- Docker
- Class Tested
- “Flipped Classroom”
- Special Feature: IBM Watson Analytics and Cognitive Computing
- Teaching Approach
- Software Used in the Book
- Python Documentation
- Getting Your Questions Answered
- Student and Instructor Supplements
- Instructor Supplements on Pearson’s Instructor Resource Center
- Instructor Examination Copies
- Keeping in Touch with the Authors
- Acknowledgments
- About the Authors
- About Deitel® & Associates, Inc.
- Before You Begin
- 1 Introduction to Computers and Python
- Objectives
- Outline
- 1.1 Introduction
- 1.2 Hardware and Software
- 1.2.1 Moore’s Law
- 1.2.2 Computer Organization
- Input Unit
- Output Unit
- Memory Unit
- Arithmetic and Logic Unit (ALU)
- Central Processing Unit (CPU)
- Secondary Storage Unit
- Self Check for Section 1.2
- 1.3 Data Hierarchy
- Self Check
- 1.4 Machine Languages, Assembly Languages and High-Level Languages
- Self Check
- 1.5 Introduction to Object Technology
- Self Check for Section 1.5
- 1.6 Operating Systems
- Self Check for Section 1.6
- 1.7 Python
- Self Check
- 1.8 It’s the Libraries!
- 1.8.1 Python Standard Library
- 1.8.2 Data-Science Libraries
- Self Check for Section 1.8
- 1.9 Other Popular Programming Languages
- Self Check
- 1.10 Test-Drive: Using IPython and Jupyter Notebooks
- 1.10.1 Using IPython Interactive Mode as a Calculator
- Entering IPython in Interactive Mode
- Evaluating Expressions
- Exiting Interactive Mode
- Self Check
- 1.10.2 Executing a Python Program Using the IPython Interpreter
- Changing to This Chapter’s Examples Folder
- Executing the Script
- Creating Scripts
- Problems That May Occur at Execution Time
- Self Check
- 1.10.3 Writing and Executing Code in a Jupyter Notebook
- Opening JupyterLab in Your Browser
- Creating a New Jupyter Notebook
- Renaming the Notebook
- Evaluating an Expression
- Adding and Executing Another Cell
- Saving the Notebook
- Notebooks Provided with Each Chapter’s Examples
- Opening and Executing an Existing Notebook
- Closing JupyterLab
- JupyterLab Tips
- More Information on Working with JupyterLab
- Self Check
- 1.11 Internet and World Wide Web
- 1.11.1 Internet: A Network of Networks
- 1.11.2 World Wide Web: Making the Internet User-Friendly
- 1.11.3 The Cloud
- Mashups
- 1.11.4 Internet of Things
- Self Check for Section 1.11
- 1.12 Software Technologies
- Self Check
- 1.13 How Big Is Big Data?
- Self Check
- 1.13.1 Big Data Analytics
- 1.13.2 Data Science and Big Data Are Making a Difference: Use Cases
- 1.14 Case Study—A Big-Data Mobile Application
- 1.15 Intro to Data Science: Artificial Intelligence—at the Intersection of CS and Data Science
- Self Check
- Exercises
- 2 Introduction to Python Programming
- Objectives
- Outline
- 2.1 Introduction
- 2.2 Variables and Assignment Statements
- Self Check
- 2.3 Arithmetic
- Self Check
- 2.4 Function print and an Intro to Single- and Double-Quoted Strings
- Self Check
- 2.5 Triple-Quoted Strings
- Self Check
- 2.6 Getting Input from the User
- Self Check
- 2.7 Decision Making: The if Statement and Comparison Operators
- Self Check
- 2.8 Objects and Dynamic Typing
- Self Check
- 2.9 Intro to Data Science: Basic Descriptive Statistics
- Self Check
- 2.10 Wrap-Up
- Exercises
- 3 Control Statements and Program Development
- Objectives
- Outline
- 3.1 Introduction
- 3.2 Algorithms
- Self Check
- 3.3 Pseudocode
- Self Check
- 3.4 Control Statements
- Self Check
- 3.5 if Statement
- Self Check
- 3.6 if…else and if…elif…else Statements
- Self Check
- 3.7 while Statement
- Self Check
- 3.8 for Statement
- 3.8.1 Iterables, Lists and Iterators
- 3.8.2 Built-In range Function
- Off-By-One Errors
- Self Check
- 3.9 Augmented Assignments
- Self Check
- 3.10 Program Development: Sequence-Controlled Repetition
- 3.10.1 Requirements Statement
- 3.10.2 Pseudocode for the Algorithm
- 3.10.3 Coding the Algorithm in Python
- Execution Phases
- Initialization Phase
- Processing Phase
- Termination Phase
- 3.10.4 Introduction to Formatted Strings
- Self Check
- 3.11 Program Development: Sentinel-Controlled Repetition
- Self Check
- 3.12 Program Development: Nested Control Statements
- Self Check
- 3.13 Built-In Function range: A Deeper Look
- Self Check
- 3.14 Using Type Decimal for Monetary Amounts
- Self Check
- 3.15 break and continue Statements
- 3.16 Boolean Operators and, or and not
- Self Check
- 3.17 Intro to Data Science: Measures of Central Tendency—Mean, Median and Mode
- Self Check
- 3.18 Wrap-Up
- Exercises
- 4 Functions
- Objectives
- Outline
- 4.1 Introduction
- 4.2 Defining Functions
- Self Check
- 4.3 Functions with Multiple Parameters
- Self Check
- 4.4 Random-Number Generation
- Self Check
- 4.5 Case Study: A Game of Chance
- Self Check
- 4.6 Python Standard Library
- Self Check
- 4.7 math Module Functions
- 4.8 Using IPython Tab Completion for Discovery
- Self Check
- 4.9 Default Parameter Values
- Self Check
- 4.10 Keyword Arguments
- Self Check
- 4.11 Arbitrary Argument Lists
- Self Check
- 4.12 Methods: Functions That Belong to Objects
- 4.13 Scope Rules
- Self Check
- 4.14 import: A Deeper Look
- Self Check
- 4.15 Passing Arguments to Functions: A Deeper Look
- Self Check
- 4.16 Function-Call Stack
- Self Check
- 4.17 Functional-Style Programming
- Pure Functions
- 4.18 Intro to Data Science: Measures of Dispersion
- Self Check
- 4.19 Wrap-Up
- Exercises
- 5 Sequences: Lists and Tuples
- Objectives
- Outline
- 5.1 Introduction
- 5.2 Lists
- Self Check
- 5.3 Tuples
- Self Check
- 5.4 Unpacking Sequences
- Self Check
- 5.5 Sequence Slicing
- Self Check
- 5.6 del Statement
- Self Check
- 5.7 Passing Lists to Functions
- Self Check
- 5.8 Sorting Lists
- Self Check
- 5.9 Searching Sequences
- Self Check
- 5.10 Other List Methods
- Self Check
- 5.11 Simulating Stacks with Lists
- Self Check
- 5.12 List Comprehensions
- Self Check
- 5.13 Generator Expressions
- Self Check
- 5.14 Filter, Map and Reduce
- Self Check
- 5.15 Other Sequence Processing Functions
- Self Check
- 5.16 Two-Dimensional Lists
- Self Check
- 5.17 Intro to Data Science: Simulation and Static Visualizations
- 5.17.1 Sample Graphs for 600, 60,000 and 6,000,000 Die Rolls
- Self Check
- 5.17.2 Visualizing Die-Roll Frequencies and Percentages
- Launching IPython for Interactive Matplotlib Development
- Importing the Libraries
- Rolling the Die and Calculating Die Frequencies
- Creating the Initial Bar Plot
- Setting the Window Title and Labeling the x- and y-Axes
- Finalizing the Bar Plot
- Rolling Again and Updating the Bar Plot—Introducing IPython Magics
- Saving Snippets to a File with the %save Magic
- Command-Line Arguments; Displaying a Plot from a Script
- Self Check
- 5.18 Wrap-Up
- Exercises
- Exercises 5.24 through 5.26 are reasonably challenging. Once you’ve done them, you ought to be able to implement many popular card games.
- 6 Dictionaries and Sets
- Objectives
- Outline
- 6.1 Introduction
- 6.2 Dictionaries
- 6.2.1 Creating a Dictionary
- Determining if a Dictionary Is Empty
- Self Check
- 6.2.2 Iterating through a Dictionary
- Self Check
- 6.2.3 Basic Dictionary Operations
- Accessing the Value Associated with a Key
- Updating the Value of an Existing Key–Value Pair
- Adding a New Key–Value Pair
- Removing a Key–Value Pair
- Attempting to Access a Nonexistent Key
- Testing Whether a Dictionary Contains a Specified Key
- Self Check
- 6.2.4 Dictionary Methods keys and values
- Dictionary Views
- Converting Dictionary Keys, Values and Key–Value Pairs to Lists
- Processing Keys in Sorted Order
- Self Check
- 6.2.5 Dictionary Comparisons
- Self Check
- 6.2.6 Example: Dictionary of Student Grades
- 6.2.7 Example: Word Counts
- Python Standard Library Module collections
- Self Check
- 6.2.8 Dictionary Method update
- 6.2.9 Dictionary Comprehensions
- Self Check
- 6.3 Sets
- Self Check
- 6.3.1 Comparing Sets
- Self Check
- 6.3.2 Mathematical Set Operations
- Union
- Intersection
- Difference
- Symmetric Difference
- Disjoint
- Self Check
- 6.3.3 Mutable Set Operators and Methods
- Mutable Mathematical Set Operations
- Methods for Adding and Removing Elements
- Self Check
- 6.3.4 Set Comprehensions
- 6.4 Intro to Data Science: Dynamic Visualizations
- Self Check
- 6.4.1 How Dynamic Visualization Works
- Animation Frames
- Running RollDieDynamic.py
- Sample Executions
- Self Check
- 6.4.2 Implementing a Dynamic Visualization
- Importing the Matplotlib animation Module
- Function update
- Function update: Rolling the Die and Updating the frequencies List
- Function update: Configuring the Bar Plot and Text
- Variables Used to Configure the Graph and Maintain State
- Calling the animation Module’s FuncAnimation Function
- Self Check
- 6.5 Wrap-Up
- Exercises
- 7 Array-Oriented Programming with NumPy
- Objectives
- Outline
- 7.1 Introduction
- Self Check
- 7.2 Creating arrays from Existing Data
- Self Check
- 7.3 array Attributes
- Self Check
- 7.4 Filling arrays with Specific Values
- 7.5 Creating arrays from Ranges
- 7.6 List vs. array Performance: Introducing %timeit
- 7.7 array Operators
- 7.8 NumPy Calculation Methods
- 7.9 Universal Functions
- 7.10 Indexing and Slicing
- 7.11 Views: Shallow Copies
- 7.12 Deep Copies
- 7.13 Reshaping and Transposing
- 7.14 Intro to Data Science: pandas Series and DataFrames
- 7.14.1 pandas Series
- Creating a Series with Default Indices
- Displaying a Series
- Creating a Series with All Elements Having the Same Value
- Accessing a Series’ Elements
- Producing Descriptive Statistics for a Series
- Creating a Series with Custom Indices
- Dictionary Initializers
- Accessing Elements of a Series Via Custom Indices
- Creating a Series of Strings
- Self Check
- 7.14.2 DataFrames
- Creating a DataFrame from a Dictionary
- Customizing a DataFrame’s Indices with the index Attribute
- Accessing a DataFrame’s Columns
- Selecting Rows via the loc and iloc Attributes
- Selecting Rows via Slices and Lists with the loc and iloc Attributes
- Selecting Subsets of the Rows and Columns
- Boolean Indexing
- Accessing a Specific DataFrame Cell by Row and Column
- Descriptive Statistics
- Transposing the DataFrame with the T Attribute
- Sorting by Rows by Their Indices
- Sorting by Column Indices
- Sorting by Column Values
- Copy vs. In-Place Sorting
- Self Check
- 7.15 Wrap-Up
- Exercises
- 8 Strings: A Deeper Look
- Objectives
- Outline
- 8.1 Introduction
- 8.2 Formatting Strings
- 8.2.1 Presentation Types
- Integers
- Characters
- Strings
- Floating-Point and Decimal Values
- Self Check
- 8.2.2 Field Widths and Alignment
- Explicitly Specifying Left and Right Alignment in a Field
- Centering a Value in a Field
- Self Check
- 8.2.3 Numeric Formatting
- Formatting Positive Numbers with Signs
- Using a Space Where a + Sign Would Appear in a Positive Value
- Grouping Digits
- Self Check
- 8.2.4 String’s format Method
- Multiple Placeholders
- Referencing Arguments By Position Number
- Referencing Keyword Arguments
- Self Check
- 8.3 Concatenating and Repeating Strings
- 8.4 Stripping Whitespace from Strings
- 8.5 Changing Character Case
- 8.6 Comparison Operators for Strings
- 8.7 Searching for Substrings
- 8.8 Replacing Substrings
- 8.9 Splitting and Joining Strings
- 8.10 Characters and Character-Testing Methods
- 8.11 Raw Strings
- 8.12 Introduction to Regular Expressions
- 8.12.1 re Module and Function fullmatch
- Matching Literal Characters
- Metacharacters, Character Classes and Quantifiers
- Other Predefined Character Classes
- Custom Character Classes
- * vs. + Quantifier
- Other Quantifiers
- Self Check
- 8.12.2 Replacing Substrings and Splitting Strings
- Function sub—Replacing Patterns
- Function split
- Self Check
- 8.12.3 Other Search Functions; Accessing Matches
- Function search—Finding the First Match Anywhere in a String
- Ignoring Case with the Optional flags Keyword Argument
- Metacharacters That Restrict Matches to the Beginning or End of a String
- Function findall and finditer—Finding All Matches in a String
- Capturing Substrings in a Match
- Self Check
- 8.13 Intro to Data Science: Pandas, Regular Expressions and Data Munging
- Self Check
- 8.14 Wrap-Up
- Exercises
- Regular Expression Exercises
- More Challenging String-Manipulation Exercises
- 9 Files and Exceptions
- Objectives
- Outline
- 9.1 Introduction
- 9.2 Files
- 9.3 Text-File Processing
- 9.3.1 Writing to a Text File: Introducing the with Statement
- The with Statement
- Built-In Function open
- Writing to the File
- Contents of accounts.txt File
- Self Check
- 9.3.2 Reading Data from a Text File
- File Method readlines
- Seeking to a Specific File Position
- Self Check
- 9.4 Updating Text Files
- Self Check
- 9.5 Serialization with JSON
- Self Check
- 9.6 Focus on Security: pickle Serialization and Deserialization
- 9.7 Additional Notes Regarding Files
- Self Check
- 9.8 Handling Exceptions
- 9.8.1 Division by Zero and Invalid Input
- Division By Zero
- Invalid Input
- 9.8.2 try Statements
- try Clause
- except Clause
- else Clause
- Flow of Control for a ZeroDivisionError
- Flow of Control for a ValueError
- Flow of Control for a Successful Division
- Self Check
- 9.8.3 Catching Multiple Exceptions in One except Clause
- 9.8.4 What Exceptions Does a Function or Method Raise?
- 9.8.5 What Code Should Be Placed in a try Suite?
- 9.9 finally Clause
- Self Check
- 9.10 Explicitly Raising an Exception
- Self Check
- 9.11 (Optional) Stack Unwinding and Tracebacks
- Self Check
- 9.12 Intro to Data Science: Working with CSV Files
- 9.12.1 Python Standard Library Module csv
- Writing to a CSV File
- Reading from a CSV File
- Caution: Commas in CSV Data Fields
- Caution: Missing Commas and Extra Commas in CSV Files
- Self Check
- 9.12.2 Reading CSV Files into Pandas DataFrames
- Datasets
- Working with Locally Stored CSV Files
- 9.12.3 Reading the Titanic Disaster Dataset
- Loading the Titanic Dataset via a URL
- Viewing Some of the Rows in the Titanic Dataset
- Customizing the Column Names
- 9.12.4 Simple Data Analysis with the Titanic Disaster Dataset
- 9.12.5 Passenger Age Histogram
- Self Check
- 9.13 Wrap-Up
- Exercises
- 10 Object-Oriented Programming
- Objectives
- Outline
- 10.1 Introduction
- 10.2 Custom Class Account
- 10.2.1 Test-Driving Class Account
- Importing Classes Account and Decimal
- Create an Account Object with a Constructor Expression
- Getting an Account’s Name and Balance
- Depositing Money into an Account
- Account Methods Perform Validation
- Self Check
- 10.2.2 Account Class Definition
- Defining a Class
- Initializing Account Objects: Method __init__
- Method deposit
- 10.2.3 Composition: Object References as Members of Classes
- Self Check
- 10.3 Controlling Access to Attributes
- Self Check
- 10.4 Properties for Data Access
- 10.4.1 Test-Driving Class Time
- Creating a Time Object
- Displaying a Time Object
- Getting an Attribute Via a Property
- Setting the Time
- Setting an Attribute via a Property
- Attempting to Set an Invalid Value
- Self Check
- 10.4.2 Class Time Definition
- Class Time: __init__ Method with Default Parameter Values
- Class Time: hour Read-Write Property
- Class Time: minute and second Read-Write Properties
- Class Time: Method set_time
- Class Time: Special Method __repr__
- Class Time: Special Method __str__
- Self Check
- 10.4.3 Class Time Definition Design Notes
- Interface of a Class
- Attributes Are Always Accessible
- Internal Data Representation
- Evolving a Class’s Implementation Details
- Properties
- Utility Methods
- Module datetime
- Self Check
- 10.5 Simulating “Private” Attributes
- Self Check
- 10.6 Case Study: Card Shuffling and Dealing Simulation
- 10.6.1 Test-Driving Classes Card and DeckOfCards
- Creating, Shuffling and Dealing the Cards
- Dealing Cards
- Class Card’s Other Features
- 10.6.2 Class Card—Introducing Class Attributes
- Class Attributes FACES and SUITS
- Card Method __init__
- Read-Only Properties face, suit and image_name
- Methods That Return String Representations of a Card
- 10.6.3 Class DeckOfCards
- Method __init__
- Method shuffle
- Method deal_card
- Method __str__
- 10.6.4 Displaying Card Images with Matplotlib
- Enable Matplotlib in IPython
- Create the Base Path for Each Image
- Import the Matplotlib Features
- Create the Figure and Axes Objects
- Configure the Axes Objects and Display the Images
- Maximize the Image Sizes
- Shuffle and Re-Deal the Deck
- Self Check
- 10.7 Inheritance: Base Classes and Subclasses
- Self Check
- 10.8 Building an Inheritance Hierarchy; Introducing Polymorphism
- 10.8.1 Base Class CommissionEmployee
- All Classes Inherit Directly or Indirectly from Class object
- Testing Class CommissionEmployee
- Self Check
- 10.8.2 Subclass SalariedCommissionEmployee
- Declaring Class SalariedCommissionEmployee
- Inheriting from Class CommissionEmployee
- Method __init__ and Built-In Function super
- Overriding Method earnings
- Overriding Method __repr__
- Testing Class SalariedCommissionEmployee
- Testing the “is a” Relationship
- Self Check
- 10.8.3 Processing CommissionEmployees and SalariedCommissionEmployees Polymorphically
- Self Check
- 10.8.4 A Note About Object-Based and Object-Oriented Programming
- 10.9 Duck Typing and Polymorphism
- 10.10 Operator Overloading
- Operator Overloading Restrictions
- Complex Numbers
- 10.10.1 Test-Driving Class Complex
- 10.10.2 Class Complex Definition
- Method __init__
- Overloaded + Operator
- Overloaded += Augmented Assignment
- Method __repr__
- Self Check
- 10.11 Exception Class Hierarchy and Custom Exceptions
- 10.12 Named Tuples
- Self Check
- 10.13 A Brief Intro to Python 3.7’s New Data Classes
- 10.13.1 Creating a Card Data Class
- Importing from the dataclasses and typing Modules
- Using the @dataclass Decorator
- Variable Annotations: Class Attributes
- Variable Annotations: Data Attributes
- Defining a Property and Other Methods
- Variable Annotation Notes
- Self Check
- 10.13.2 Using the Card Data Class
- Self Check
- 10.13.3 Data Class Advantages over Named Tuples
- 10.13.4 Data Class Advantages over Traditional Classes
- More Information
- 10.14 Unit Testing with Docstrings and doctest
- Self Check
- 10.15 Namespaces and Scopes
- 10.16 Intro to Data Science: Time Series and Simple Linear Regression
- Self Check
- 10.17 Wrap-Up
- Exercises
- 11 Computer Science Thinking: Recursion, Searching, Sorting and Big O
- Objectives
- Outline
- 11.1 Introduction
- 11.2 Factorials
- 11.3 Recursive Factorial Example
- Self Check
- 11.4 Recursive Fibonacci Series Example
- Self Check
- 11.5 Recursion vs. Iteration
- Self Check
- 11.6 Searching and Sorting
- 11.7 Linear Search
- Self Check
- 11.8 Efficiency of Algorithms: Big O
- Self Check
- 11.9 Binary Search
- Self Check
- 11.9.1 Binary Search Implementation
- Function binary_search
- Function remaining_elements
- Function main
- 11.9.2 Big O of the Binary Search
- 11.10 Sorting Algorithms
- 11.11 Selection Sort
- 11.11.1 Selection Sort Implementation
- Function selection_sort
- Function main
- 11.11.2 Utility Function print_pass
- 11.11.3 Big O of the Selection Sort
- Self Check
- 11.12 Insertion Sort
- 11.12.1 Insertion Sort Implementation
- Function insertion_sort
- 11.12.2 Big O of the Insertion Sort
- Self Check
- 11.13 Merge Sort
- 11.13.1 Merge Sort Implementation
- Function merge_sort
- Recursive Function sort_array
- Function merge
- Function subarray_string
- Function main
- 11.13.2 Big O of the Merge Sort
- Self Check
- 11.14 Big O Summary for This Chapter’s Searching and Sorting Algorithms
- 11.15 Visualizing Algorithms
- 11.15.1 Generator Functions
- yield Statements
- 11.15.2 Implementing the Selection Sort Animation
- import Statements
- update Function That Displays Each Animation Frame
- flash_bars Function That Flashes the Bars About to Be Swapped
- selection_sort Generator Function
- main Function That Launches the Animation
- Sound Utility Functions
- 11.16 Wrap-Up
- Exercises
- 12 Natural Language Processing (NLP)
- Objectives
- Outline
- 12.1 Introduction
- 12.2 TextBlob
- Self Check
- 12.2.1 Create a TextBlob
- Self Check
- 12.2.2 Tokenizing Text into Sentences and Words
- Self Check
- 12.2.3 Parts-of-Speech Tagging
- Self Check
- 12.2.4 Extracting Noun Phrases
- Self Check
- 12.2.5 Sentiment Analysis with TextBlob’s Default Sentiment Analyzer
- Getting the Sentiment of a TextBlob
- Getting the polarity and subjectivity from the Sentiment Object
- Getting the Sentiment of a Sentence
- Self Check
- 12.2.6 Sentiment Analysis with the NaiveBayesAnalyzer
- Self Check
- 12.2.7 Language Detection and Translation
- Self Check
- 12.2.8 Inflection: Pluralization and Singularization
- Self Check
- 12.2.9 Spell Checking and Correction
- Self Check
- 12.2.10 Normalization: Stemming and Lemmatization
- Self Check
- 12.2.11 Word Frequencies
- Self Check
- 12.2.12 Getting Definitions, Synonyms and Antonyms from WordNet
- Getting Definitions
- Getting Synonyms
- Getting Antonyms
- Self Check
- 12.2.13 Deleting Stop Words
- Self Check
- 12.2.14 n-grams
- Self Check
- 12.3 Visualizing Word Frequencies with Bar Charts and Word Clouds
- 12.3.1 Visualizing Word Frequencies with Pandas
- Loading the Data
- Getting the Word Frequencies
- Eliminating the Stop Words
- Sorting the Words by Frequency
- Getting the Top 20 Words
- Convert top20 to a DataFrame
- Visualizing the DataFrame
- 12.3.2 Visualizing Word Frequencies with Word Clouds
- Installing the wordcloud Module
- Loading the Text
- Loading the Mask Image that Specifies the Word Cloud’s Shape
- Configuring the WordCloud Object
- Generating the Word Cloud
- Saving the Word Cloud as an Image File
- Generating a Word Cloud from a Dictionary
- Displaying the Image with Matplotlib
- Self Check
- 12.4 Readability Assessment with Textatistic
- Self Check
- 12.5 Named Entity Recognition with spaCy
- Self Check
- 12.6 Similarity Detection with spaCy
- Self Check
- 12.7 Other NLP Libraries and Tools
- 12.8 Machine Learning and Deep Learning Natural Language Applications
- 12.9 Natural Language Datasets
- 12.10 Wrap-Up
- Exercises
- 13 Data Mining Twitter
- Objectives
- Outline
- 13.1 Introduction
- Self Check
- 13.2 Overview of the Twitter APIs
- Self Check
- 13.3 Creating a Twitter Account
- 13.4 Getting Twitter Credentials—Creating an App
- Self Check
- 13.5 What’s in a Tweet?
- Key Properties of a Tweet Object
- Sample Tweet JSON
- Twitter JSON Object Resources
- Self Check
- 13.6 Tweepy
- 13.7 Authenticating with Twitter Via Tweepy
- Self Check
- 13.8 Getting Information About a Twitter Account
- Self Check
- 13.9 Introduction to Tweepy Cursors: Getting an Account’s Followers and Friends
- 13.9.1 Determining an Account’s Followers
- Creating a Cursor
- Getting Results
- Automatic Paging
- Getting Follower IDs Rather Than Followers
- Self Check
- 13.9.2 Determining Whom an Account Follows
- Self Check
- 13.9.3 Getting a User’s Recent Tweets
- Grabbing Recent Tweets from Your Own Timeline
- Self Check
- 13.10 Searching Recent Tweets
- 13.11 Spotting Trends: Twitter Trends API
- 13.11.1 Places with Trending Topics
- Self Check
- 13.11.2 Getting a List of Trending Topics
- Worldwide Trending Topics
- New York City Trending Topics
- Self Check
- 13.11.3 Create a Word Cloud from Trending Topics
- Self Check
- 13.12 Cleaning/Preprocessing Tweets for Analysis
- Self Check
- 13.13 Twitter Streaming API
- 13.13.1 Creating a Subclass of StreamListener
- Class TweetListener
- Class TweetListener: __init__ Method
- Class TweetListener: on_connect Method
- Class TweetListener: on_status Method
- 13.13.2 Initiating Stream Processing
- Authenticating
- Creating a TweetListener
- Creating a Stream
- Starting the Tweet Stream
- Asynchronous vs. Synchronous Streams
- Other filter Method Parameters
- Twitter Restrictions Note
- Self Check
- 13.14 Tweet Sentiment Analysis
- 13.15 Geocoding and Mapping
- Self Check
- 13.15.1 Getting and Mapping the Tweets
- Get the API Object
- Collections Required By LocationListener
- Creating the LocationListener
- Configure and Start the Stream of Tweets
- Displaying the Location Statistics
- Geocoding the Locations
- Displaying the Bad Location Statistics
- Cleaning the Data
- Creating a Map with Folium
- Creating Popup Markers for the Tweet Locations
- Saving the Map
- Self Check
- 13.15.2 Utility Functions in tweetutilities.py
- get_tweet_content Utility Function
- get_geocodes Utility Function
- Self Check
- 13.15.3 Class LocationListener
- 13.16 Ways to Store Tweets
- 13.17 Twitter and Time Series
- 13.18 Wrap-Up
- Exercises
- 14 IBM Watson and Cognitive Computing
- Outline
- 14.1 Introduction: IBM Watson and Cognitive Computing
- Self Check
- 14.2 IBM Cloud Account and Cloud Console
- Self Check
- 14.3 Watson Services
- Watson Assistant
- Visual Recognition
- Speech to Text
- Text to Speech
- Language Translator
- Natural Language Understanding
- Discovery
- Personality Insights
- Tone Analyzer
- Natural Language Classifier
- Synchronous and Asynchronous Capabilities
- Self Check
- 14.4 Additional Services and Tools
- Watson Studio
- Knowledge Studio
- Machine Learning
- Knowledge Catalog
- Cognos Analytics
- Self Check
- 14.5 Watson Developer Cloud Python SDK
- Modules We’ll Need for Audio Recording and Playback
- SDK Examples
- Self Check
- 14.6 Case Study: Traveler’s Companion Translation App
- Self Check
- 14.6.1 Before You Run the App
- Registering for the Speech to Text Service
- Registering for the Text to Speech Service
- Registering for the Language Translator Service
- Retrieving Your Credentials
- Self Check
- 14.6.2 Test-Driving the App
- Processing the Question
- Processing the Response
- Self Check
- 14.6.3 SimpleLanguageTranslator.py Script Walkthrough
- Importing Watson SDK Classes
- Other Imported Modules
- Main Program: Function run_translator
- Function speech_to_text
- Function translate
- Function text_to_speech
- Function record_audio
- Function play_audio
- Executing the run_translator Function
- Self Check
- 14.7 Watson Resources
- Self Check
- 14.8 Wrap-Up
- Exercises
- 15 Machine Learning: Classification, Regression and Clustering
- Outline
- 15.1 Introduction to Machine Learning
- 15.1.1 Scikit-Learn
- Which Scikit-Learn Estimator Should You Choose for Your Project
- 15.1.2 Types of Machine Learning
- Supervised Machine Learning
- Datasets
- Classification
- Regression
- Unsupervised Machine Learning
- K-Means Clustering and the Iris Dataset
- Big Data and Big Computer Processing Power
- 15.1.3 Datasets Bundled with Scikit-Learn
- 15.1.4 Steps in a Typical Data Science Study
- Self Check
- 15.2 Case Study: Classification with k-Nearest Neighbors and the Digits Dataset, Part 1
- Self Check
- 15.2.1 k-Nearest Neighbors Algorithm
- Hyperparameters and Hyperparameter Tuning
- Self Check
- 15.2.2 Loading the Dataset
- Displaying the Description
- Checking the Sample and Target Sizes
- A Sample Digit Image
- Preparing the Data for Use with Scikit-Learn
- Self Check
- 15.2.3 Visualizing the Data
- Creating the Diagram
- Displaying Each Image and Removing the Axes Labels
- Self Check
- 15.2.4 Splitting the Data for Training and Testing
- Training and Testing Set Sizes
- Self Check
- 15.2.5 Creating the Model
- 15.2.6 Training the Model
- Self Check
- 15.2.7 Predicting Digit Classes
- Self Check
- 15.3 Case Study: Classification with k-Nearest Neighbors and the Digits Dataset, Part 2
- 15.3.1 Metrics for Model Accuracy
- Estimator Method score
- Confusion Matrix
- Classification Report
- Visualizing the Confusion Matrix
- Self Check
- 15.3.2 K-Fold Cross-Validation
- KFold Class
- Using the KFold Object with Function cross_val_score
- Self Check
- 15.3.3 Running Multiple Models to Find the Best One
- Scikit-Learn Estimator Diagram
- Self Check
- 15.3.4 Hyperparameter Tuning
- Self Check
- 15.4 Case Study: Time Series and Simple Linear Regression
- Self Check
- 15.5 Case Study: Multiple Linear Regression with the California Housing Dataset
- 15.5.1 Loading the Dataset
- Loading the Data
- Displaying the Dataset’s Description
- 15.5.2 Exploring the Data with Pandas
- Self Check
- 15.5.3 Visualizing the Features
- Self Check
- 15.5.4 Splitting the Data for Training and Testing
- 15.5.5 Training the Model
- Self Check
- 15.5.6 Testing the Model
- 15.5.7 Visualizing the Expected vs. Predicted Prices
- 15.5.8 Regression Model Metrics
- Self Check
- 15.5.9 Choosing the Best Model
- 15.6 Case Study: Unsupervised Machine Learning, Part 1—Dimensionality Reduction
- Loading the Digits Dataset
- Creating a TSNE Estimator for Dimensionality Reduction
- Transforming the Digits Dataset’s Features into Two Dimensions
- Visualizing the Reduced Data
- Visualizing the Reduced Data with Different Colors for Each Digit
- Self Check
- 15.7 Case Study: Unsupervised Machine Learning, Part 2—k-Means Clustering
- Self Check
- 15.7.1 Loading the Iris Dataset
- Checking the Numbers of Samples, Features and Targets
- 15.7.2 Exploring the Iris Dataset: Descriptive Statistics with Pandas
- 15.7.3 Visualizing the Dataset with a Seaborn pairplot
- Displaying the pairplot in One Color
- Self Check
- 15.7.4 Using a KMeans Estimator
- Creating the Estimator
- Fitting the Model
- Comparing the Computer Cluster Labels to the Iris Dataset’s Target Values
- Self Check
- 15.7.5 Dimensionality Reduction with Principal Component Analysis
- Creating the PCA Object
- Transforming the Iris Dataset’s Features into Two Dimensions
- Visualizing the Reduced Data
- Self Check
- 15.7.6 Choosing the Best Clustering Estimator
- 15.8 Wrap-Up
- Exercises
- 16 Deep Learning
- Objectives
- Outline
- 16.1 Introduction
- Self Check
- 16.1.1 Deep Learning Applications
- 16.1.2 Deep Learning Demos
- 16.1.3 Keras Resources
- 16.2 Keras Built-In Datasets
- 16.3 Custom Anaconda Environments
- Self Check
- 16.4 Neural Networks
- Self Check
- 16.5 Tensors
- Self Check
- 16.6 Convolutional Neural Networks for Vision; Multi-Classification with the MNIST Dataset
- Self Check
- 16.6.1 Loading the MNIST Dataset
- Self Check
- 16.6.2 Data Exploration
- Visualizing Digits
- 16.6.3 Data Preparation
- Reshaping the Image Data
- Normalizing the Image Data
- One-Hot Encoding: Converting the Labels From Integers to Categorical Data
- Self Check
- 16.6.4 Creating the Neural Network
- Adding Layers to the Network
- Convolution
- Adding a Convolution Layer
- Dimensionality of the First Convolution Layer’s Output
- Overfitting
- Adding a Pooling Layer
- Adding Another Convolutional Layer and Pooling Layer
- Flattening the Results
- Adding a Dense Layer to Reduce the Number of Features
- Adding Another Dense Layer to Produce the Final Output
- Printing the Model’s Summary
- Visualizing a Model’s Structure
- Compiling the Model
- Self Check
- 16.6.5 Training and Evaluating the Model
- Evaluating the Model
- Making Predictions
- Locating the Incorrect Predictions
- Visualizing Incorrect Predictions
- Displaying the Probabilities for Several Incorrect Predictions
- Self Check
- 16.6.6 Saving and Loading a Model
- Self Check
- 16.7 Visualizing Neural Network Training with TensorBoard
- Self Check
- 16.8 ConvnetJS: Browser-Based Deep-Learning Training and Visualization
- 16.9 Recurrent Neural Networks for Sequences; Sentiment Analysis with the IMDb Dataset
- Self Check
- 16.9.1 Loading the IMDb Movie Reviews Dataset
- Self Check
- 16.9.2 Data Exploration
- Movie Review Encodings
- Decoding a Movie Review
- 16.9.3 Data Preparation
- Splitting the Test Data into Validation and Test Data
- Self Check
- 16.9.4 Creating the Neural Network
- Adding an Embedding Layer
- Adding an LSTM Layer
- Adding a Dense Output Layer
- Compiling the Model and Displaying the Summary
- Self Check
- 16.9.5 Training and Evaluating the Model
- 16.10 Tuning Deep Learning Models
- Self Check
- 16.11 Convnet Models Pretrained on ImageNet
- 16.12 Reinforcement Learning
- 16.12.1 Deep Q-Learning
- 16.12.2 OpenAI Gym
- 16.13 Wrap-Up
- Exercises
- Convolutional Neural Networks
- Recurrent Neural Networks
- ConvnetJS Visualization
- Convolutional Neural Network Projects and Research
- Recurrent Neural Network Projects and Research
- Automated Deep Learning Project
- Reinforcement Learning Projects and Research
- Generative Deep Learning
- Deep Fakes
- Additional Research
- 17 Big Data: Hadoop, Spark, NoSQL and IoT
- Objectives
- Outline
- 17.1 Introduction
- Self Check for Section 17.1
- 17.2 Relational Databases and Structured Query Language (SQL)
- Self Check
- 17.2.1 A books Database
- Self Check
- 17.2.2 SELECT Queries
- 17.2.3 WHERE Clause
- Pattern Matching: Zero or More Characters
- Pattern Matching: Any Character
- Self Check
- 17.2.4 ORDER BY Clause
- Sorting By Multiple Columns
- Combining the WHERE and ORDER BY Clauses
- Self Check
- 17.2.5 Merging Data from Multiple Tables: INNER JOIN
- Self Check
- 17.2.6 INSERT INTO Statement
- Note Regarding Strings That Contain Single Quotes
- 17.2.7 UPDATE Statement
- 17.2.8 DELETE FROM Statement
- Self Check for Section 17.2
- 17.3 NoSQL and NewSQL Big-Data Databases: A Brief Tour
- 17.3.1 NoSQL Key–Value Databases
- 17.3.2 NoSQL Document Databases
- 17.3.3 NoSQL Columnar Databases
- 17.3.4 NoSQL Graph Databases
- 17.3.5 NewSQL Databases
- Self Check for Section 17.3
- 17.4 Case Study: A MongoDB JSON Document Database
- 17.4.1 Creating the MongoDB Atlas Cluster
- Creating Your First Database User
- Whitelist Your IP Address
- Connect to Your Cluster
- 17.4.2 Streaming Tweets into MongoDB
- Use Tweepy to Authenticate with Twitter
- Loading the Senators’ Data
- Configuring the MongoClient
- Setting up Tweet Stream
- Starting the Tweet Stream
- Class TweetListener
- Counting Tweets for Each Senator
- Show Tweet Counts for Each Senator
- Get the State Locations for Plotting Markers
- Grouping the Tweet Counts by State
- Creating the Map
- Creating a Choropleth to Color the Map
- Creating the Map Markers for Each State
- Displaying the Map
- Self Check for Section 17.4
- 17.5 Hadoop
- 17.5.1 Hadoop Overview
- HDFS, MapReduce and YARN
- Hadoop Ecosystem
- Hadoop Providers
- Hadoop 3
- 17.5.2 Summarizing Word Lengths in Romeo and Juliet via MapReduce
- 17.5.3 Creating an Apache Hadoop Cluster in Microsoft Azure HDInsight
- Creating an HDInsight Hadoop Cluster
- 17.5.4 Hadoop Streaming
- 17.5.5 Implementing the Mapper
- 17.5.6 Implementing the Reducer
- 17.5.7 Preparing to Run the MapReduce Example
- Copying the Script Files to the HDInsight Hadoop Cluster
- Copying RomeoAndJuliet into the Hadoop File System
- 17.5.8 Running the MapReduce Job
- Viewing the Word Counts
- Deleting Your Cluster So You Do Not Incur Charges
- Self Check for Section 17.5
- 17.6 Spark
- 17.6.1 Spark Overview
- History
- Architecture and Components
- Providers
- 17.6.2 Docker and the Jupyter Docker Stacks
- Docker
- Installing Docker
- Jupyter Docker Stacks
- Run Jupyter Docker Stack
- Opening JupyterLab in Your Browser
- Accessing the Docker Container’s Command Line
- Stopping and Restarting a Docker Container
- 17.6.3 Word Count with Spark
- Loading the NLTK Stop Words
- Configuring a SparkContext
- Reading the Text File and Mapping It to Words
- Removing the Stop Words
- Counting Each Remaining Word
- Locating Words with Counts Greater Than or Equal to 60
- Sorting and Displaying the Results
- 17.6.4 Spark Word Count on Microsoft Azure
- Create an Apache Spark Cluster in HDInsight Using the Azure Portal
- Install Libraries into a Cluster
- Copying RomeoAndJuliet.txt to the HDInsight Cluster
- Accessing Jupyter Notebooks in HDInsight
- Uploading the RomeoAndJulietCounter.ipynb Notebook
- Modifying the Notebook to Work with Azure
- Self Check for Section 17.6
- 17.7 Spark Streaming: Counting Twitter Hashtags Using the pyspark-notebook Docker Stack
- 17.7.1 Streaming Tweets to a Socket
- Executing the Script in the Docker Container
- starttweetstream.py import Statements
- Class TweetListener
- Main Application
- 17.7.2 Summarizing Tweet Hashtags; Introducing Spark SQL
- Importing the Libraries
- Utility Function to Get the SparkSession
- Utility Function to Display a Barchart Based on a Spark DataFrame
- Utility Function to Summarize the Top-20 Hashtags So Far
- Getting the SparkContext
- Getting the StreamingContext
- Setting Up a Checkpoint for Maintaining State
- Connecting to the Stream via a Socket
- Tokenizing the Lines of Hashtags
- Mapping the Hashtags to Tuples of Hashtag-Count Pairs
- Totaling the Hashtag Counts So Far
- Specifying the Method to Call for Every RDD
- Starting the Spark Stream
- Self Check for Section 17.7
- 17.8 Internet of Things and Dashboards
- 17.8.1 Publish and Subscribe
- 17.8.2 Visualizing a PubNub Sample Live Stream with a Freeboard Dashboard
- Signing up for Freeboard.io
- Creating a New Dashboard
- Adding a Data Source
- Adding a Pane for the Humidity Sensor
- Adding a Gauge to the Humidity Pane
- Adding a Sparkline to the Humidity Pane
- Completing the Dashboard
- 17.8.3 Simulating an Internet-Connected Thermostat in Python
- Installing Dweepy
- Invoking the simulator.py Script
- Sending Dweets
- 17.8.4 Creating the Dashboard with Freeboard.io
- 17.8.5 Creating a Python PubNub Subscriber
- Message Format
- Importing the Libraries
- List and DataFrame Used for Storing Company Names and Prices
- Class SensorSubscriberCallback
- Function Update
- Configuring the Figure
- Configuring the FuncAnimation and Displaying the Window
- Configuring the PubNub Client
- Subscribing to the Channel
- Ensuring the Figure Remains on the Screen
- Self Check for Section 17.8
- 17.9 Wrap-Up
- Exercises
- SQL and RDBMS Exercises
- NoSQL Database Exercises
- Hadoop Exercises
- Spark Exercises
- IoT and Pub/Sub Exercises
- Platform Exercises
- Other Exercises
- Index
- Symbols
- Numerics
- A
- B
- C
- D
- E
- F
- G
- H
- I
- J
- K
- L
- M
- N
- O
- P
- Q
- R
- S
- T
- U
- V
- W
- X
- Y
- Z