First, let's generate a dataset.
groups<-rep(c("group1","group2","group3","group4"),10)
individual<-rep(1:10,each=4)
levels<-rep(c("A","B"),each=20)
response<-c((rnorm(20, mean = 2.3, sd = 1)),rnorm(20, mean = 4.7, sd = 2.3))
#Generating some random normal distributions using rnorm.
#That first number is the length (i.e., how many numbers you want in each set).
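Because rnorm draws fresh random numbers on every run, the values printed in the comments throughout this post will not match yours (or each other). A minimal sketch, assuming you want repeatable numbers: call set.seed before rnorm. The seed value 42 and the names first.draw and second.draw are arbitrary choices for illustration, not part of the original workflow.

```r
# Setting the same seed before each call makes rnorm return the same values.
set.seed(42)  # arbitrary seed, chosen only for this illustration
first.draw <- rnorm(20, mean = 2.3, sd = 1)   # 20 values, mean 2.3, sd 1
set.seed(42)
second.draw <- rnorm(20, mean = 2.3, sd = 1)
length(first.draw)                  # how many numbers were generated
identical(first.draw, second.draw)  # same seed, same draws
```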
(original.data<-data.frame(individual,levels,groups, response))
#So, here's how I have my data to begin with:
# individual levels groups response
#1 1 A group1 3.9294112
#2 1 A group2 1.7416195
#3 1 A group3 3.2474873
# etc. to row 40 (40 observations of 10 individuals with
#4 observations each and 1 observation per group).
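#Use str(original.data) to check that the response variable is numeric and to see how the other columns are stored.
#(In R 4.0.0 and later, data.frame() keeps text columns as character rather than factor by default; either works for the reshaping below.)
str(original.data) #yep!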
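If str shows character or integer columns where you wanted factors, one possible way to convert them is sketched below. It uses a small made-up data frame (check.data, a hypothetical stand-in with the same column names as original.data) so that it runs on its own.

```r
# Stand-in data frame with the same column names as original.data (values are made up).
check.data <- data.frame(individual = rep(1:2, each = 2),
                         levels = rep(c("A", "B"), each = 2),
                         groups = rep(c("group1", "group2"), 2),
                         response = c(2.1, 3.4, 4.8, 5.2))
sapply(check.data, class)  # class of each column before conversion
# Convert the grouping columns to factors; response stays numeric.
factor.columns <- c("individual", "levels", "groups")
check.data[factor.columns] <- lapply(check.data[factor.columns], factor)
str(check.data)
```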
#And here's what I want it to look like:
# individual levels group1 group2 group3 group4
#1 1 A 2.510969 2.6601334 2.968813 4.294844
#2 2 A 3.096240 2.5438054 2.316189 2.561755
#3 3 A 2.641018 1.7682642 1.540212 1.674456
#4 4 A 3.720303 1.7542154 2.829152 3.165893
#5 5 A 2.339273 0.7974890 3.084249 3.243764
#6 6 B 6.960439 8.5711567 5.805974 7.056487
#7 7 B 5.642962 4.5452340 6.922783 5.644429
#8 8 B 4.449354 6.0973501 5.214315 3.128638
#9 9 B 5.358301 5.5186123 6.792033 5.655760
#10 10 B -2.564175 0.8425047 7.772034 5.613354
#Start the reshape2 package.
library(reshape2)
melt.data<-melt(original.data,id.vars=c("levels", "groups", "individual"),
measure.vars=c("response")) #variable measured is response
melt.data #show data
# levels groups individual variable value
#1 A group1 1 response 0.8034790
#2 A group2 1 response 2.7244518
#3 A group3 1 response 2.0623880
#Etc. on to 40 rows.
melt.data$variable<-melt.data$groups
#Set the variable here. "groups" is the factor that we want to make up our four new columns.
melt.data
#Here is what melt.data looks like now:
# levels groups individual variable value
#1 A group1 1 group1 0.8034790
#2 A group2 1 group2 2.7244518
#3 A group3 1 group3 2.0623880
#Etc. on to 40 rows.
data.columns<-dcast(melt.data, individual+levels~variable,mean)
#On the left of ~, list the factors that do NOT become the new columns.
#Because we want groups to be the new columns, you list only individual and levels.
#Then after ~ put "variable", followed by the function (here, mean) to apply to the measurements for each combination of individual and levels.
#In this example, you just have one observation for each group in each individual in each level,
#so it should be the same as your original data.
#If you took more than one observation for each individual in each level and each group,
#and didn't include it in the list with individual and levels,
#using mean would allow you to average across those.
#This might happen if you took multiple observations and hadn't averaged them yet,
#or had an additional factor that you are not examining in this analysis.
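To see the averaging described above in action, here is a small sketch with made-up numbers. The data frame repeat.data is hypothetical: individual 1 has two observations in group1, so mean collapses them into one cell. The arguments fun.aggregate and value.var are spelled out here for clarity.

```r
library(reshape2)
# Made-up long-format data: individual 1 has two observations in group1.
repeat.data <- data.frame(individual = c(1, 1, 1, 2, 2),
                          levels = c("A", "A", "A", "A", "A"),
                          variable = c("group1", "group1", "group2", "group1", "group2"),
                          value = c(2, 4, 5, 3, 6))
# One row per individual/levels combination; repeated observations are averaged.
repeat.wide <- dcast(repeat.data, individual + levels ~ variable,
                     fun.aggregate = mean, value.var = "value")
repeat.wide  # group1 for individual 1 is mean(c(2, 4)), i.e. 3
```

Without an aggregation function, dcast would fall back to counting the observations in each cell (with a message), which is rarely what you want here.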
#Use dcast if you want a data frame as the resulting object;
#use acast if you want a matrix or vector.
#("cast" is the old function from the original reshape package;
#with the old function you had to specify data.frame if you wanted a data frame).
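The difference between dcast and acast is easiest to see side by side. This is a minimal sketch on a made-up long-format data frame (small.long is a hypothetical name), not on the dataset above.

```r
library(reshape2)
# Made-up long-format data: two individuals, two groups, one value each.
small.long <- data.frame(individual = c(1, 1, 2, 2),
                         variable = c("group1", "group2", "group1", "group2"),
                         value = c(2.5, 2.7, 3.1, 2.5))
wide.frame <- dcast(small.long, individual ~ variable, value.var = "value")
wide.matrix <- acast(small.long, individual ~ variable, value.var = "value")
is.data.frame(wide.frame)  # dcast keeps individual as an ordinary column
is.matrix(wide.matrix)     # acast moves individual into the row names
dimnames(wide.matrix)
```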
data.columns
#shows your dataset, now with measurements for each individual in a row,
#with one column each for group1, group2, group3, and group4
#instead of classified by factors in the old "groups" column.
#It's just like we wanted at the beginning!
#Did you start with data in the column form?
#Or do you want to put your data back the way it was
#but now as means for each combination of factors?
#For example, one use I've found for this is to
#put my data in the multi-column form, use na.omit
#to get rid of any individuals with missing data,
#and then convert back to long form.
data.longform<-data.frame(melt(data.columns, id=c("individual","levels")))
data.longform #view the dataset back in a long one-column form.
# individual levels variable value
#1 1 A group1 2.5109692
#2 2 A group1 3.0962398
#3 3 A group1 2.6410184
#4 4 A group1 3.7203027
#Etc. until row 40 (end of data).
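The na.omit use mentioned above could look something like the sketch below. The data frame wide.example is made up for illustration: individual 2 is missing its group2 measurement, so na.omit drops that whole row before the data go back to long form.

```r
library(reshape2)
# Made-up wide-format data: individual 2 has a missing group2 value.
wide.example <- data.frame(individual = 1:3,
                           levels = c("A", "A", "B"),
                           group1 = c(2.5, 3.1, 6.9),
                           group2 = c(2.7, NA, 8.6))
complete.wide <- na.omit(wide.example)  # removes every row that contains an NA
# Back to long form: one row per individual per group.
complete.long <- melt(complete.wide, id.vars = c("individual", "levels"))
complete.long  # individual 2 no longer appears in any group
```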