Random R commands everyone should know
I hate R with a passion! With the arrival of IPython + Statsmodels + Pandas, or even Julia these days, it is becoming painfully obvious that R is the worst option of them all. The syntax sucks, there are 10 ways to do the same thing, and it doesn't scale well. Unfortunately, if you work in data science, it is still unavoidable: a lot of statistical work is still done in R.
I don't think you need to learn the language fluently, but here is a series of commands that will definitely help you get by:
Install A Package
install.packages("ggplot2")
Changing your working directory
setwd("path/to/working/directory")
Run an R script
source("path/to/script.R")
Open plot window from within R script
dev.new()
bp <- boxplot(log(posts$all.count+1) ~ posts$post.hour,
              col="lightblue",
              xaxt="n",
              pch=19,
              xlab="Hour Of Day [0 is 12AM]",
              ylab="log(Likes + Shares + Comments)",
              main="Total Number Of Post Interactions vs Hour Of Day")
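Note that xaxt="n" suppresses the default x axis, so you would normally draw your own afterwards with axis(). Here is a minimal sketch of that, using simulated data in place of the real posts (the Poisson counts are made up, and pdf(file = NULL) is only there so it runs without a display; interactively you would keep dev.new()):

```r
# Simulated stand-in for the posts data: 20 posts for each hour 0..23
set.seed(1)
posts <- data.frame(
  post.hour = rep(0:23, each = 20),
  all.count = rpois(24 * 20, lambda = 30)
)

pdf(file = NULL)  # null device; replace with dev.new() in an interactive session

# Draw the boxplot without the default x axis
bp <- boxplot(log(all.count + 1) ~ post.hour, data = posts,
              col = "lightblue", xaxt = "n",
              xlab = "Hour Of Day [0 is 12AM]",
              ylab = "log(interactions + 1)")

# Boxes sit at x = 1..24, so label them with the hours 0..23
axis(1, at = 1:24, labels = 0:23)

dev.off()
```

boxplot() also returns the computed statistics (here in bp), which is handy if you want the medians or group sizes without reading them off the plot.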
Reading data from a CSV
posts <- read.csv( '/path/to/file.csv', header=TRUE, strip.white=TRUE)
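If you want to see what strip.white actually does, here is a small self-contained sketch. It writes a throwaway CSV to a temp file (the contents are invented) and reads it back; stringsAsFactors=FALSE is set explicitly because the default differs between R versions.

```r
# Write a tiny CSV with stray spaces around the text fields
tmp <- tempfile(fileext = ".csv")
writeLines(c("name, count",
             "alice , 3",
             " bob, 5"), tmp)

# strip.white=TRUE trims leading/trailing whitespace from unquoted text fields
posts <- read.csv(tmp, header = TRUE, strip.white = TRUE,
                  stringsAsFactors = FALSE)

print(posts)
str(posts)   # quick check of column names and types

unlink(tmp)  # clean up the temp file
```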
Converting a Data Frame Column From String to Date
df$date <- as.Date(df$date,format="%y/%m/%d")
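The format string has to match your data exactly: %y is a two-digit year, %Y is a four-digit one. A quick sketch with made-up date strings:

```r
# Made-up date strings for illustration
short_year <- c("14/03/09", "14/12/25")
long_year  <- c("2014/03/09", "2014/12/25")

# %y parses a two-digit year, %Y a four-digit year
d1 <- as.Date(short_year, format = "%y/%m/%d")
d2 <- as.Date(long_year,  format = "%Y/%m/%d")
print(d1)
print(d2)

# A string that does not match the format becomes NA instead of raising an error
bad <- as.Date("2014-03-09", format = "%Y/%m/%d")
print(is.na(bad))
```

Since a mismatch silently gives NA, it is worth checking sum(is.na(df$date)) after converting a real column.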
Calculate the sum of multiple columns in a Data Frame
df$total.interaction <- apply(df[,c(3,4,5)],1,sum,na.rm=TRUE)
Calculate the mean of multiple columns in a Data Frame
df$avg.interaction <- apply(df[,c(3,4,5)],1,mean,na.rm=TRUE)
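For plain sums and means across columns, base R also has rowSums() and rowMeans(), which do the same job as the apply() calls and are usually faster. A sketch on a toy data frame (the column names are made up):

```r
# Toy data frame; columns 3, 4 and 5 hold the interaction counts
df <- data.frame(
  id       = 1:3,
  name     = c("a", "b", "c"),
  likes    = c(10, NA, 3),
  shares   = c(2, 4, 1),
  comments = c(0, 1, NA)
)
cols <- c("likes", "shares", "comments")

# Row-wise sum and mean, ignoring missing values
df$total.interaction <- rowSums(df[, cols], na.rm = TRUE)
df$avg.interaction   <- rowMeans(df[, cols], na.rm = TRUE)

# The apply() version gives the same numbers
via_apply <- apply(df[, cols], 1, sum, na.rm = TRUE)
print(all.equal(unname(via_apply), unname(df$total.interaction)))

print(df)
```

Selecting columns by name instead of position also means the code keeps working if the column order changes.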
Filtering A Data Frame Column On Upper + Lower Bounds
filtered <- df[df$value<=upper_bound & df$value>=lower_bound,]
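One gotcha with this kind of filter is NA: comparing NA to a bound gives NA, and indexing with NA returns a row full of NAs. A small sketch on a toy data frame (the column name value and the bounds are made up) showing the problem and two ways around it:

```r
df <- data.frame(id = 1:5, value = c(3, 8, NA, 15, 10))
lower_bound <- 5
upper_bound <- 10

# Plain logical indexing: the NA comparison sneaks in an all-NA row
with_na <- df[df$value >= lower_bound & df$value <= upper_bound, ]

# which() keeps only the positions that are TRUE, dropping the NA
filtered <- df[which(df$value >= lower_bound & df$value <= upper_bound), ]

# subset() also drops rows where the condition is NA
filtered2 <- subset(df, value >= lower_bound & value <= upper_bound)

print(with_na)
print(filtered)
```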
String Concatenation
paste("Hello", "World")
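Keep in mind that paste() puts a space between its arguments by default, while paste0() joins them with nothing in between. A few variations:

```r
# Default separator is a single space
a <- paste("Hello", "World")

# paste0() adds no separator, so include the space yourself
b <- paste0("Hello", " ", "World")

# Custom separator
dashed <- paste("a", "b", sep = "-")

# collapse joins the elements of one vector into a single string
joined <- paste(c("x", "y", "z"), collapse = ", ")

print(c(a, b, dashed, joined))
```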
Dump the Results of a Linear Regression
print(summary(lm(y ~ x, data=dataset)))
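If you need the numbers rather than the printout, you can pull them out of the fit and its summary. A self-contained sketch using the built-in mtcars dataset (not the data from this post):

```r
# Fit miles-per-gallon against car weight on the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)
s <- summary(fit)
print(s)

# Intercept and slope as a named vector
coefs <- coef(fit)

# R-squared and the full coefficient table
# (estimate, std. error, t value, p value)
r2 <- s$r.squared
coef_table <- s$coefficients

print(coefs)
print(r2)
```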
Number of Rows In A Dataframe
NROW(data)
Number of Columns In A Dataframe
NCOL(data)
Names of dataframe columns
names(data)
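A quick sketch of these inspection commands on the built-in mtcars dataset, plus a few close relatives (dim, str, head) that are handy at the same moment. It also shows how NROW differs from the lowercase nrow on a plain vector:

```r
data <- mtcars   # built-in example dataset

n_rows <- NROW(data)    # number of rows
n_cols <- NCOL(data)    # number of columns
shape  <- dim(data)     # both at once: c(rows, cols)
cols   <- names(data)   # column names

str(data)               # compact overview of column types
print(head(data, 3))    # first three rows

# On a plain vector, nrow() gives NULL while NROW() treats it as one column
v <- 1:5
print(is.null(nrow(v)))
print(NROW(v))
```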
List all variables in the workspace
ls()