尘缘 – CHENYUAN

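tidyr's gather() and spread() take the place of reshape2's melt() and dcast(). (Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(), but they are still available.)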

Wide data

Wide data has a column for each variable, which makes it easier for people to read.

#   ozone   wind  temp
# 1 23.62 11.623 65.55
# 2 29.44 10.267 79.10
# 3 59.12  8.942 83.90

Long data

Long-format data has one column naming the variable and another column holding that variable's values. Long data is more convenient for data analysis.
Long-format data isn't necessarily only two columns; in other words, there are different levels of “longness”.

#    variable  value
# 1     ozone 23.615
# 2     ozone 29.444
# 3     ozone 59.115
# 4      wind 11.623
# 5      wind 10.267
# 6      wind  8.942
# 7      temp 65.548
# 8      temp 79.100
# 9     temp 83.903
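As a minimal sketch, the wide table above can be rebuilt as a data frame and turned into the long table with gather(). The object names wide, long and long_with_id are illustrative, and the values are copied from the rounded wide table. Keeping an id column gives one of the less "long" shapes mentioned above.

```r
library(tidyr)

# The three wide rows shown above (values copied from the wide table)
wide <- data.frame(
  ozone = c(23.62, 29.44, 59.12),
  wind  = c(11.623, 10.267, 8.942),
  temp  = c(65.55, 79.10, 83.90)
)

# Fully long: every column goes into one key column and one value column
long <- gather(wide, variable, value, ozone, wind, temp)
long

# Less "long": keep a row identifier as its own column and gather the rest
wide$id <- 1:3
long_with_id <- gather(wide, variable, value, -id)
long_with_id
```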

tidyr has four fundamental data-tidying functions:

gather(): takes multiple columns and gathers them into key-value pairs; it makes “wide” data longer.
spread(): takes two columns (key and value) and spreads them into multiple columns; it makes “long” data wider.
separate(): splits a single column into multiple columns.
unite(): combines multiple columns into a single column.

# install.packages("tidyr")  # run once if tidyr is not yet installed
suppressPackageStartupMessages(library(tidyr))  # tidyr also provides the %>% pipe used below

`gather()`: Reshaping wide format to long format

gather(data, key, value, ..., na.rm = FALSE, convert = FALSE)
# or, with the pipe:
data %>% gather(key, value, ..., na.rm = FALSE, convert = FALSE)

  data: data frame
  key: name of the new column that holds the former column names
  value: name of the new column that holds the values
  ...: columns to gather (prefix a column with - to gather every column except it)
  na.rm: if TRUE, drop rows whose value is missing
  convert: if TRUE, run type.convert() on the key column, so key values such as "2015" become logical, integer, numeric, complex or factor as appropriate
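A small sketch of na.rm and convert, using a hypothetical counts table whose column names are years (counts, site, year and n are illustrative names, not part of airquality):

```r
library(tidyr)

# Hypothetical wide table: one row per site, one column per year
counts <- data.frame(
  site   = c("A", "B"),
  `2015` = c(10, NA),
  `2016` = c(12, 7),
  check.names = FALSE,        # keep "2015" and "2016" as column names
  stringsAsFactors = FALSE
)

# na.rm = TRUE drops the row holding site B's missing 2015 value;
# convert = TRUE runs type.convert() on the key column, so year becomes integer
long_counts <- gather(counts, year, n, -site, na.rm = TRUE, convert = TRUE)
long_counts
str(long_counts$year)
```

Without convert = TRUE, year would stay a character column.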

head(airquality)

##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6

long_DF <- airquality %>% gather(characters, measurements, Ozone:Temp)
# long_DF <- airquality %>% gather(characters, measurements, 1:4)  # same result
# long_DF <- airquality %>% gather(characters, measurements, -Month, -Day)  # same result
# long_DF <- airquality %>% gather(characters, measurements, Ozone, Solar.R, Wind, Temp)  # same result
head(long_DF)

##   Month Day characters measurements
## 1     5   1      Ozone           41
## 2     5   2      Ozone           36
## 3     5   3      Ozone           12
## 4     5   4      Ozone           18
## 5     5   5      Ozone           NA
## 6     5   6      Ozone           28
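A quick way to confirm that the commented-out selections give the same result is to compare them directly. This sketch assumes only tidyr and the built-in airquality data (153 rows).

```r
library(tidyr)

# Four ways to select the same four columns to gather
by_range   <- airquality %>% gather(characters, measurements, Ozone:Temp)
by_index   <- airquality %>% gather(characters, measurements, 1:4)
by_exclude <- airquality %>% gather(characters, measurements, -Month, -Day)
by_name    <- airquality %>% gather(characters, measurements, Ozone, Solar.R, Wind, Temp)

# Each gathered column contributes one row per original row: 153 * 4 = 612
nrow(by_range)

all.equal(by_range, by_index)
all.equal(by_range, by_exclude)
all.equal(by_range, by_name)
```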

`spread()`: Reshaping long format to wide format

spread(data, key, value, fill = NA, convert = FALSE)
# or, with the pipe:
data %>% spread(key, value, fill = NA, convert = FALSE)

  data: data frame
  key: column whose values become the new column names
  value: column whose values fill the new columns
  fill: value substituted when a combination of the other variables and the key column has no value (NA by default)
  convert: if TRUE, run type.convert() on each new column, converting values to logical, integer, numeric, complex or factor as appropriate
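A minimal sketch of fill, using a hypothetical sales table in which store "y" has no row for product "p2" (sales, store, product and units are illustrative names):

```r
library(tidyr)

# Hypothetical long data with one missing store/product combination
sales <- data.frame(
  store   = c("x", "x", "y"),
  product = c("p1", "p2", "p1"),
  units   = c(5, 3, 4),
  stringsAsFactors = FALSE
)

# Default fill = NA: the missing combination becomes NA
spread(sales, product, units)

# fill = 0: the missing combination becomes 0 instead
sales_wide <- spread(sales, product, units, fill = 0)
sales_wide
```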

wide_DF <- long_DF %>% spread(characters, measurements)  # the inverse of gather()
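To check that spread() undoes the earlier gather(), a sketch like the following compares wide_DF with airquality. spread() puts the new columns in alphabetical order of the key, so the sketch reorders rows and columns explicitly before comparing.

```r
library(tidyr)

long_DF <- airquality %>% gather(characters, measurements, Ozone:Temp)
wide_DF <- long_DF %>% spread(characters, measurements)

# Month and Day first, then the key columns in alphabetical order
names(wide_DF)

# Restore the original row order and column order
wide_DF <- wide_DF[order(wide_DF$Month, wide_DF$Day), names(airquality)]

# Values match, but gather() stored all measurements in one double column,
# so Ozone, Solar.R and Temp come back as double rather than integer
all.equal(as.numeric(wide_DF$Ozone), as.numeric(airquality$Ozone))
all.equal(wide_DF$Wind, airquality$Wind)
```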

`unite()`: Merging two variables into one

unite(data, col, ..., sep = "_", remove = TRUE)
# or, with the pipe:
data %>% unite(col, ..., sep = "_", remove = TRUE)

  data: data frame
  col: name of the new merged column
  ...: columns to merge
  sep: separator placed between the merged values (default "_")
  remove: if TRUE, remove the input columns from the output data frame
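A minimal sketch of the default separator and remove = FALSE, using a hypothetical people table (people, first, last and full are illustrative names):

```r
library(tidyr)

people <- data.frame(
  first = c("Ada", "Alan"),
  last  = c("Lovelace", "Turing"),
  stringsAsFactors = FALSE
)

# No sep given: values are joined with "_"
unite(people, full, first, last)

# Custom separator; remove = FALSE keeps first and last alongside full
people_full <- unite(people, full, first, last, sep = " ", remove = FALSE)
people_full
```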

Create some fake data:

set.seed(1)
date <- as.Date('2016-01-01') + 0:14   # 15 consecutive days
hour <- sample(1:24, 15)               # random hour, minute and second
min <- sample(1:60, 15)
second <- sample(1:60, 15)
event <- sample(letters, 15)           # random event label
data <- data.frame(date, hour, min, second, event)
head(data)

##         date hour min second event
## 1 2016-01-01    7  30     29     u
## 2 2016-01-02    9  43     36     a
## 3 2016-01-03   13  58     60     l
## 4 2016-01-04   20  22     11     q
## 5 2016-01-05    5  44     47     p
## 6 2016-01-06   18  52     37     k

# join date and hour with a space, then append min and second with colons
unite_DF <- data %>%
  unite(datehour, date, hour, sep = ' ') %>%
  unite(datetime, datehour, min, second, sep = ':')
head(unite_DF)

##              datetime event
## 1  2016-01-01 7:30:29     u
## 2  2016-01-02 9:43:36     a
## 3 2016-01-03 13:58:60     l
## 4 2016-01-04 20:22:11     q
## 5  2016-01-05 5:44:47     p
## 6 2016-01-06 18:52:37     k

`separate()`: Splitting a single variable into two

separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE)
# or, with the pipe:
data %>% separate(col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE)

  data: data frame
  col: column to split
  into: names of the new columns, as a character vector
  sep: where to split. A character string is treated as a regular expression;
       a number is treated as a character position (positive counts from the left, negative from the right).
       If sep is not given, the default "[^[:alnum:]]+" splits at any run of non-alphanumeric characters
  remove: if TRUE, remove the input column from the output data frame
  convert: if TRUE, run type.convert() on the new columns, converting values to logical, integer, numeric, complex or factor as appropriate
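A sketch of the default sep, convert = TRUE and a numeric sep, using a hypothetical dates table holding dates as character strings:

```r
library(tidyr)

dates <- data.frame(date = c("2016-01-01", "2016-12-31"), stringsAsFactors = FALSE)

# No sep given: the default regex splits at the "-" characters
parts <- separate(dates, date, c("year", "month", "day"))
str(parts)       # new columns are character

# convert = TRUE runs type.convert() on the new columns, giving integers
parts_int <- separate(dates, date, c("year", "month", "day"), convert = TRUE)
str(parts_int)

# A numeric sep splits at a character position: "2016" and "-01-01"
separate(dates, date, c("year", "rest"), sep = 4)
```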

# split at the space, then split the time at the colons (new columns are character)
separate_DF <- unite_DF %>%
  separate(datetime, c('date', 'time'), sep = ' ') %>%
  separate(time, c('hour', 'min', 'second'), sep = ':')
head(separate_DF)

##         date hour min second event
## 1 2016-01-01    7  30     29     u
## 2 2016-01-02    9  43     36     a
## 3 2016-01-03   13  58     60     l
## 4 2016-01-04   20  22     11     q
## 5 2016-01-05    5  44     47     p
## 6 2016-01-06   18  52     37     k
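Because convert defaults to FALSE, the hour, min and second columns of separate_DF are character even though they print like numbers. A sketch that restores the original types, recreating the fake data so it runs on its own:

```r
library(tidyr)

# Recreate the fake data and the united version from above
set.seed(1)
date   <- as.Date('2016-01-01') + 0:14
hour   <- sample(1:24, 15)
min    <- sample(1:60, 15)
second <- sample(1:60, 15)
event  <- sample(letters, 15)
data   <- data.frame(date, hour, min, second, event)

unite_DF <- data %>%
  unite(datehour, date, hour, sep = ' ') %>%
  unite(datetime, datehour, min, second, sep = ':')

# convert = TRUE turns hour, min and second back into integers
separate_DF <- unite_DF %>%
  separate(datetime, c('date', 'time'), sep = ' ') %>%
  separate(time, c('hour', 'min', 'second'), sep = ':', convert = TRUE)

# date is still character; as.Date() restores the Date class
separate_DF$date <- as.Date(separate_DF$date)
str(separate_DF)
```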

