SOLUTION: SIU R Script R in Action Data Analysis and Graphics with R Programming Worksheet

SECOND EDITION
R IN ACTION
Data analysis and graphics with R
Robert I. Kabacoff
MANNING
Praise for the First Edition
Lucid and engaging—this is without doubt the fun way to learn R!
—Amos A. Folarin, University College London
Be prepared to quickly raise the bar with the sheer quality that R can produce.
—Patrick Breen, Rogers Communications Inc.
An excellent introduction and reference on R from the author of the best R website.
—Christopher Williams, University of Idaho
Thorough and readable. A great R companion for the student or researcher.
—Samuel McQuillin, University of South Carolina
Finally, a comprehensive introduction to R for programmers.
—Philipp K. Janert, Author of Gnuplot in Action
Essential reading for anybody moving to R for the first time.
—Charles Malpas, University of Melbourne
One of the quickest routes to R proficiency. You can buy the book on Friday and
have a working program by Monday.
—Elizabeth Ostrowski, Baylor College of Medicine
One usually buys a book to solve the problems they know they have. This book
solves problems you didn’t know you had.
—Carles Fenollosa, Barcelona Supercomputing Center
Clear, precise, and comes with a lot of explanations and examples…the book can
be used by beginners and professionals alike, and even for teaching R!
—Atef Ouni, Tunisian National Institute of Statistics
A great balance of targeted tutorials and in-depth examples.
—Landon Cox, 360VL Inc.
R in Action
SECOND EDITION
Data analysis and graphics with R
ROBERT I. KABACOFF
MANNING
SHELTER ISLAND
For online information and ordering of this and other Manning books, please visit
www.manning.com. The publisher offers discounts on this book when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com
©2015 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by means electronic, mechanical, photocopying, or otherwise, without prior written
permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in the book, and Manning
Publications was aware of a trademark claim, the designations have been printed in initial caps
or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to have
the books we publish printed on acid-free paper, and we exert our best efforts to that end.
Recognizing also our responsibility to conserve the resources of our planet, Manning books are
printed on paper that is at least 15 percent recycled and processed without elemental chlorine.
ISBN: 9781617291388
Printed in the United States of America
Development editor: Jennifer Stout
Copyeditor: Tiffany Taylor
Proofreader: Toma Mulligan
Typesetter: Marija Tudor
Cover designer: Marija Tudor
brief contents

PART 1 GETTING STARTED
1 Introduction to R
2 Creating a dataset
3 Getting started with graphs
4 Basic data management
5 Advanced data management

PART 2 BASIC METHODS
6 Basic graphs
7 Basic statistics

PART 3 INTERMEDIATE METHODS
8 Regression
9 Analysis of variance
10 Power analysis
11 Intermediate graphs
12 Resampling statistics and bootstrapping

PART 4 ADVANCED METHODS
13 Generalized linear models
14 Principal components and factor analysis
15 Time series
16 Cluster analysis
17 Classification
18 Advanced methods for missing data

PART 5 EXPANDING YOUR SKILLS
19 Advanced graphics with ggplot2
20 Advanced programming
21 Creating a package
22 Creating dynamic reports
23 Advanced graphics with the lattice package (online only)

contents

preface
acknowledgments
about this book
about the cover illustration

PART 1 GETTING STARTED

1 Introduction to R
1.1 Why use R?
1.2 Obtaining and installing R
1.3 Working with R
    Getting started; Getting help; The workspace; Input and output
1.4 Packages
    What are packages?; Installing a package; Loading a package; Learning about a package
1.5 Batch processing
1.6 Using output as input: reusing results
1.7 Working with large datasets
1.8 Working through an example
1.9 Summary

2 Creating a dataset
2.1 Understanding datasets
2.2 Data structures
    Vectors; Matrices; Arrays; Data frames; Factors; Lists
2.3 Data input
    Entering data from the keyboard; Importing data from a delimited text file; Importing data from Excel; Importing data from XML; Importing data from the web; Importing data from SPSS; Importing data from SAS; Importing data from Stata; Importing data from NetCDF; Importing data from HDF5; Accessing database management systems (DBMSs); Importing data via Stat/Transfer
2.4 Annotating datasets
    Variable labels; Value labels
2.5 Useful functions for working with data objects
2.6 Summary

3 Getting started with graphs
3.1 Working with graphs
3.2 A simple example
3.3 Graphical parameters
    Symbols and lines; Colors; Text characteristics; Graph and margin dimensions
3.4 Adding text, customized axes, and legends
    Titles; Axes; Reference lines; Legend; Text annotations; Math annotations
3.5 Combining graphs
    Creating a figure arrangement with fine control
3.6 Summary

4 Basic data management
4.1 A working example
4.2 Creating new variables
4.3 Recoding variables
4.4 Renaming variables
4.5 Missing values
    Recoding values to missing; Excluding missing values from analyses
4.6 Date values
    Converting dates to character variables; Going further
4.7 Type conversions
4.8 Sorting data
4.9 Merging datasets
    Adding columns to a data frame; Adding rows to a data frame
4.10 Subsetting datasets
    Selecting (keeping) variables; Excluding (dropping) variables; Selecting observations; The subset() function; Random samples
4.11 Using SQL statements to manipulate data frames
4.12 Summary

5 Advanced data management
5.1 A data-management challenge
5.2 Numerical and character functions
    Mathematical functions; Statistical functions; Probability functions; Character functions; Other useful functions; Applying functions to matrices and data frames
5.3 A solution for the data-management challenge
5.4 Control flow
    Repetition and looping; Conditional execution
5.5 User-written functions
5.6 Aggregation and reshaping
    Transpose; Aggregating data; The reshape2 package
5.7 Summary

PART 2 BASIC METHODS

6 Basic graphs
6.1 Bar plots
    Simple bar plots; Stacked and grouped bar plots; Mean bar plots; Tweaking bar plots; Spinograms
6.2 Pie charts
6.3 Histograms
6.4 Kernel density plots
6.5 Box plots
    Using parallel box plots to compare groups; Violin plots
6.6 Dot plots
6.7 Summary

7 Basic statistics
7.1 Descriptive statistics
    A menagerie of methods; Even more methods; Descriptive statistics by group; Additional methods by group; Visualizing results
7.2 Frequency and contingency tables
    Generating frequency tables; Tests of independence; Measures of association; Visualizing results
7.3 Correlations
    Types of correlations; Testing correlations for significance; Visualizing correlations
7.4 T-tests
    Independent t-test; Dependent t-test; When there are more than two groups
7.5 Nonparametric tests of group differences
    Comparing two groups; Comparing more than two groups
7.6 Visualizing group differences
7.7 Summary

PART 3 INTERMEDIATE METHODS

8 Regression
8.1 The many faces of regression
    Scenarios for using OLS regression; What you need to know
8.2 OLS regression
    Fitting regression models with lm(); Simple linear regression; Polynomial regression; Multiple linear regression; Multiple linear regression with interactions
8.3 Regression diagnostics
    A typical approach; An enhanced approach; Global validation of linear model assumption; Multicollinearity
8.4 Unusual observations
    Outliers; High-leverage points; Influential observations
8.5 Corrective measures
    Deleting observations; Transforming variables; Adding or deleting variables; Trying a different approach
8.6 Selecting the “best” regression model
    Comparing models; Variable selection
8.7 Taking the analysis further
    Cross-validation; Relative importance
8.8 Summary

9 Analysis of variance
9.1 A crash course on terminology
9.2 Fitting ANOVA models
    The aov() function; The order of formula terms
9.3 One-way ANOVA
    Multiple comparisons; Assessing test assumptions
9.4 One-way ANCOVA
    Assessing test assumptions; Visualizing the results
9.5 Two-way factorial ANOVA
9.6 Repeated measures ANOVA
9.7 Multivariate analysis of variance (MANOVA)
    Assessing test assumptions; Robust MANOVA
9.8 ANOVA as regression
9.9 Summary

10 Power analysis
10.1 A quick review of hypothesis testing
10.2 Implementing power analysis with the pwr package
    t-tests; ANOVA; Correlations; Linear models; Tests of proportions; Chi-square tests; Choosing an appropriate effect size in novel situations
10.3 Creating power analysis plots
10.4 Other packages
10.5 Summary

11 Intermediate graphs
11.1 Scatter plots
    Scatter-plot matrices; High-density scatter plots; 3D scatter plots; Spinning 3D scatter plots; Bubble plots
11.2 Line charts
11.3 Corrgrams
11.4 Mosaic plots
11.5 Summary

12 Resampling statistics and bootstrapping
12.1 Permutation tests
12.2 Permutation tests with the coin package
    Independent two-sample and k-sample tests; Independence in contingency tables; Independence between numeric variables; Dependent two-sample and k-sample tests; Going further
12.3 Permutation tests with the lmPerm package
    Simple and polynomial regression; Multiple regression; One-way ANOVA and ANCOVA; Two-way ANOVA
12.4 Additional comments on permutation tests
12.5 Bootstrapping
12.6 Bootstrapping with the boot package
    Bootstrapping a single statistic; Bootstrapping several statistics
12.7 Summary

PART 4 ADVANCED METHODS

13 Generalized linear models
13.1 Generalized linear models and the glm() function
    The glm() function; Supporting functions; Model fit and regression diagnostics
13.2 Logistic regression
    Interpreting the model parameters; Assessing the impact of predictors on the probability of an outcome; Overdispersion; Extensions
13.3 Poisson regression
    Interpreting the model parameters; Overdispersion; Extensions
13.4 Summary

14 Principal components and factor analysis
14.1 Principal components and factor analysis in R
14.2 Principal components
    Selecting the number of components to extract; Extracting principal components; Rotating principal components; Obtaining principal components scores
14.3 Exploratory factor analysis
    Deciding how many common factors to extract; Extracting common factors; Rotating factors; Factor scores; Other EFA-related packages
14.4 Other latent variable models
14.5 Summary

15 Time series
15.1 Creating a time-series object in R
15.2 Smoothing and seasonal decomposition
    Smoothing with simple moving averages; Seasonal decomposition
15.3 Exponential forecasting models
    Simple exponential smoothing; Holt and Holt-Winters exponential smoothing; The ets() function and automated forecasting
15.4 ARIMA forecasting models
    Prerequisite concepts; ARMA and ARIMA models; Automated ARIMA forecasting
15.5 Going further
15.6 Summary

16 Cluster analysis
16.1 Common steps in cluster analysis
16.2 Calculating distances
16.3 Hierarchical cluster analysis
16.4 Partitioning cluster analysis
    K-means clustering; Partitioning around medoids
16.5 Avoiding nonexistent clusters
16.6 Summary

17 Classification
17.1 Preparing the data
17.2 Logistic regression
17.3 Decision trees
    Classical decision trees; Conditional inference trees
17.4 Random forests
17.5 Support vector machines
    Tuning an SVM
17.6 Choosing a best predictive solution
17.7 Using the rattle package for data mining
17.8 Summary

18 Advanced methods for missing data
18.1 Steps in dealing with missing data
18.2 Identifying missing values
18.3 Exploring missing-values patterns
    Tabulating missing values; Exploring missing data visually; Using correlations to explore missing values
18.4 Understanding the sources and impact of missing data
18.5 Rational approaches for dealing with incomplete data
18.6 Complete-case analysis (listwise deletion)
18.7 Multiple imputation
18.8 Other approaches to missing data
    Pairwise deletion; Simple (nonstochastic) imputation
18.9 Summary

PART 5 EXPANDING YOUR SKILLS

19 Advanced graphics with ggplot2
19.1 The four graphics systems in R
19.2 An introduction to the ggplot2 package
19.3 Specifying the plot type with geoms
19.4 Grouping
19.5 Faceting
19.6 Adding smoothed lines
19.7 Modifying the appearance of ggplot2 graphs
    Axes; Legends; Scales; Themes; Multiple graphs per page
19.8 Saving graphs
19.9 Summary

20 Advanced programming
20.1 A review of the language
    Data types; Control structures; Creating functions
20.2 Working with environments
20.3 Object-oriented programming
    Generic functions; Limitations of the S3 model
20.4 Writing efficient code
20.5 Debugging
    Common sources of errors; Debugging tools; Session options that support debugging
20.6 Going further
20.7 Summary

21 Creating a package
21.1 Nonparametric analysis and the npar package
    Comparing groups with the npar package
21.2 Developing the package
    Computing the statistics; Printing the results; Summarizing the results; Plotting the results; Adding sample data to the package
21.3 Creating the package documentation
21.4 Building the package
21.5 Going further
21.6 Summary

22 Creating dynamic reports
22.1 A template approach to reports
22.2 Creating dynamic reports with R and Markdown
22.3 Creating dynamic reports with R and LaTeX
22.4 Creating dynamic reports with R and Open Document
22.5 Creating dynamic reports with R and Microsoft Word
22.6 Summary

afterword Into the rabbit hole
appendix A Graphical user interfaces
appendix B Customizing the startup environment
appendix C Exporting data from R
appendix D Matrix algebra in R
appendix E Packages used in this book
appendix F Working with large datasets
appendix G Updating an R installation
references
index
bonus chapter 23 Advanced graphics with the lattice package (available online at manning.com/RinActionSecondEdition; also available in this eBook)

preface
What is the use of a book, without pictures or conversations?
—Alice, Alice’s Adventures in Wonderland
It’s wondrous, with treasures to satiate desires both subtle and gross; but it’s not
for the timid.
—Q, “Q Who?” Star Trek: The Next Generation
When I began writing this book, I spent quite a bit of time searching for a good quote
to start things off. I ended up with two. R is a wonderfully flexible platform and language for exploring, visualizing, and understanding data. I chose the quote from
Alice’s Adventures in Wonderland to capture the flavor of statistical analysis today—an
interactive process of exploration, visualization, and interpretation.
The second quote reflects the generally held notion that R is difficult to learn.
What I hope to show you is that it doesn’t have to be. R is broad and powerful, with so
many analytic and graphic functions available (more than 50,000 at last count) that it
easily intimidates both novice and experienced users alike. But there is rhyme and reason to the apparent madness. With guidelines and instructions, you can navigate the
tremendous resources available, selecting the tools you need to accomplish your work
with style, elegance, efficiency—and more than a little coolness.
I first encountered R several years ago, when applying for a new statistical consulting position. The prospective employer asked in the pre-interview material if I was
conversant in R. Following the standard advice of recruiters, I immediately said yes,
and set off to learn it. I was an experienced statistician and researcher, had 25 years’
experience as an SAS and SPSS programmer, and was fluent in a half dozen programming languages. How hard could it be? Famous last words.
As I tried to learn the language (as fast as possible, with an interview looming), I
found either tomes on the underlying structure of the language or dense treatises on
specific advanced statistical methods, written by and for subject-matter experts. The
online help was written in a spartan style that was more reference than tutorial. Every
time I thought I had a handle on the overall organization and capabilities of R, I
found something new that made me feel ignorant and small.
To make sense of it all, I approached R as a data scientist. I thought about what it
takes to successfully process, analyze, and understand data, including
  • Accessing the data (getting the data into the application from multiple sources)
  • Cleaning the data (coding missing data, fixing or deleting miscoded data, transforming variables into more useful formats)
  • Annotating the data (in order to remember what each piece rep …