I got started sorting data in SQL, with its nice SELECT statements: pick the rows where some variable equals some value, and ask for distinct values when you need them. R confused me. I was delighted to find an R package that allows SQL selects in R, but it can be a bit clumsy at times because of differences in table and object naming. I kept seeing references to dplyr as the modern way to organize data natively in R, so I decided it was time to start learning it, and it is already helping a ton.

I recommend starting with the dplyr vignette. I also found this tutorial, which uses different sample data, to be helpful. I tried out chaining (the mysterious %>% operator I had seen lurking in code occasionally) and it was fantastic. No more weird intermediate variables! The tutorial describes the chaining operator from a different package, magrittr, but dplyr provides it as well (the help file says it was formerly '%.%', but '%>%' has become standard), so it worked fine even though I hadn't separately installed the magrittr package mentioned on that page.

So far the biggest help to me is the distinct() function, which returns the unique combinations of the factors I care about, much like SELECT DISTINCT on a narrowed-down list of columns in SQL. When I subset in base R, I instead get repeated rows of categorical data, because the rows differ only in other variables that I am not currently interested in.
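
To make the chaining and distinct() points concrete, here is a minimal sketch on a made-up data frame. The `obs` table and its columns (`site`, `species`, `height`) are hypothetical and only there for illustration. It assumes a reasonably recent dplyr, where distinct() called with column names returns just those columns.

```r
library(dplyr)

# Hypothetical sample data: categorical columns plus a measurement
# that makes otherwise identical rows differ.
obs <- data.frame(
  site    = c("A", "A", "A", "B", "B"),
  species = c("oak", "oak", "pine", "oak", "oak"),
  height  = c(10.2, 11.5, 8.1, 9.7, 12.3),
  stringsAsFactors = FALSE
)

# Base R subset: keeps one row per matching observation,
# so site/species pairs are repeated.
base_subset <- obs[obs$height > 9, c("site", "species")]

# dplyr chain, roughly equivalent to:
#   SELECT DISTINCT site, species FROM obs WHERE height > 9
combos <- obs %>%
  filter(height > 9) %>%
  distinct(site, species)

print(base_subset)
print(combos)
```

The base R subset comes back with a row for every matching observation, while the chained version collapses them to one row per site/species pair, and no intermediate variable is needed between the filter and the distinct step.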