Finished reading the book.
This book is aimed at intermediate to advanced R users who already have some experience working with datasets, and it gives them an opportunity to apply state-of-the-art approaches to data manipulation. It discusses the types of data that R can handle and the operations available for each of those data types. After reading this book, readers will be able to manage their datasets efficiently and check their validity using R, including specialized packages for data management. Readers will also learn about the split-apply-combine strategy, a state-of-the-art approach to data management. The book ends with an introduction to using R with different database software.
Some basic knowledge of R and statistical data is required to understand this book fully. It begins with a very brief introduction to R. It then introduces R objects and their modes and classes, and discusses the different kinds of R objects, such as vectors, factors, data frames, matrices, and lists. Next, it moves on to basic data manipulation, subscripting, and subsetting. It explains what the split-apply-combine strategy is and why it is important in data manipulation, using the plyr package, with which group-wise data manipulation can be implemented efficiently. It then introduces a theoretical framework for reshaping datasets, discusses structural missing values and sampling zeros, and shows how to deal with these missing values during the melting process. Finally, it shows how to use R with databases through the filehash and ff packages, and discusses sqldf for faster data manipulation.
Get the book here – http://goo.gl/HruZEr