Saturday, March 9, 2013

R: Data Handling 1


Subsetting Data
The subset(…) function returns the cases of a dataframe that meet a logical condition.
Example:
> demo.dat
          Date Lat.degN Lon.degE Actual.SST.degC Bears Lctn
1  2009-02-27       1.5    103.5           28.01     1    A
2  2009-02-27       0.5    103.5           28.00     2    B
3  2009-03-06       1.5    103.5           28.44     3    A
4  2009-03-06       0.5    103.5           28.38     4    B
5  2009-03-13       1.5    103.5           28.34     5    A
.      .             .      .                .       .    .
.      .             .      .                .       .    .     
.      .             .      .                .       .    .         
19 2009-05-01       1.5    103.5           29.75    19    A
20 2009-05-01       0.5    103.5           29.84    20    B
> demo.A <- subset(demo.dat,Lctn=="A")
> demo.A
          Date Lat.degN Lon.degE Actual.SST.degC Bears Lctn
1  2009-02-27       1.5    103.5           28.01     1    A
3  2009-03-06       1.5    103.5           28.44     3    A
5  2009-03-13       1.5    103.5           28.34     5    A
7  2009-03-20       1.5    103.5           28.87     7    A
9  2009-03-27       1.5    103.5           29.20     9    A
11 2009-04-03       1.5    103.5           29.30    11    A
13 2009-04-10       1.5    103.5           29.63    13    A
15 2009-04-17       1.5    103.5           29.79    15    A
17 2009-04-24       1.5    103.5           30.00    17    A
19 2009-05-01       1.5    103.5           29.75    19    A
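
The condition passed to subset(…) can combine several tests, and the select argument keeps only some of the variables. A minimal sketch, assuming a small stand-in dataframe built from the first five rows shown above (the names demo.small, warm.A and warm.A2 are made up for this illustration):

```r
# Hypothetical stand-in for demo.dat: only the first five rows shown above.
demo.small <- data.frame(
  Date = as.Date(c("2009-02-27", "2009-02-27", "2009-03-06",
                   "2009-03-06", "2009-03-13")),
  Lat.degN = c(1.5, 0.5, 1.5, 0.5, 1.5),
  Lon.degE = 103.5,
  Actual.SST.degC = c(28.01, 28.00, 28.44, 28.38, 28.34),
  Bears = 1:5,
  Lctn = c("A", "B", "A", "B", "A")
)

# Compound condition: location A and SST above 28.1 degC.
# select= keeps only the named variables.
warm.A <- subset(demo.small, Lctn == "A" & Actual.SST.degC > 28.1,
                 select = c(Date, Actual.SST.degC))

# The same selection written as a logical index.
# (subset() drops cases where the condition is NA; plain indexing does not.)
keep <- demo.small$Lctn == "A" & demo.small$Actual.SST.degC > 28.1
warm.A2 <- demo.small[keep, c("Date", "Actual.SST.degC")]

print(warm.A)
```

Inside subset(…) the variable names can be used bare, which is why it tends to read more cleanly than the indexed version at the console.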

Indexing
Selecting via indexing [i,j].
Indexing can select whole blocks of cases and variables, or pick them out individually and store them in a new dataframe.
The syntax for indexing is as follows:
           df[i,j]
Where df is the dataframe, i is the index number(s) of the cases (rows) and j is the index number(s) of the variables (columns). Either one can be left out; when that happens, all cases or all variables are selected.
Example:
> demo.dat1 <- demo.dat[, c(1, 4)]  # variables 1 and 4, all cases.
> demo.dat1
          Date Actual.SST.degC
1  2009-02-27            28.01
2  2009-02-27            28.00
3  2009-03-06            28.44
.      .                   .
.      .                   .
.      .                   .
19 2009-05-01            29.75
20 2009-05-01            29.84
> demo.dat2 <- demo.dat[1:5, c(1, 4)]  # cases 1 to 5, variables 1 and 4.
> demo.dat2
         Date Actual.SST.degC
1 2009-02-27            28.01
2 2009-02-27            28.00
3 2009-03-06            28.44
4 2009-03-06            28.38
5 2009-03-13            28.34
> demo.dat3 <- demo.dat[c(1, 7, 9), ]  # cases 1, 7 and 9, all variables.
> demo.dat3
         Date Lat.degN Lon.degE Actual.SST.degC Bears Lctn
1 2009-02-27       1.5    103.5           28.01     1    A
7 2009-03-20       1.5    103.5           28.87     7    A
9 2009-03-27       1.5    103.5           29.20     9    A
> demo.dat4 <- demo.dat[seq(1, nrow(demo.dat), 3), ]  # every 3rd case, starting at the first (1,4,7,…).
> demo.dat4
           Date Lat.degN Lon.degE Actual.SST.degC Bears Lctn
1  2009-02-27       1.5    103.5           28.01     1    A
4  2009-03-06       0.5    103.5           28.38     4    B
7  2009-03-20       1.5    103.5           28.87     7    A
10 2009-03-27       0.5    103.5           29.25    10    B
13 2009-04-10       1.5    103.5           29.63    13    A
16 2009-04-17       0.5    103.5           29.89    16    B
19 2009-05-01       1.5    103.5           29.75    19    A
>
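
Index vectors do not have to be positive numbers. As a sketch, using a made-up five-row stand-in (demo.small and the result names below are not part of the original data set), variables can be picked by name, cases can be dropped with negative indexes, and a logical vector can select cases:

```r
# Hypothetical stand-in built from the first five rows of the demo data.
demo.small <- data.frame(
  Date = as.Date(c("2009-02-27", "2009-02-27", "2009-03-06",
                   "2009-03-06", "2009-03-13")),
  Actual.SST.degC = c(28.01, 28.00, 28.44, 28.38, 28.34),
  Bears = 1:5,
  Lctn = c("A", "B", "A", "B", "A")
)

by.name  <- demo.small[, c("Date", "Bears")]     # variables by name, all cases
dropped  <- demo.small[-c(1, 2), ]               # all cases except 1 and 2
by.logic <- demo.small[demo.small$Bears > 3, ]   # cases where Bears > 3

# Selecting a single variable returns a plain vector by default;
# drop = FALSE keeps the result as a dataframe.
one.col <- demo.small[, "Bears"]
one.df  <- demo.small[, "Bears", drop = FALSE]
```

Selecting by name is usually safer than selecting by position, because the code keeps working if the order of the variables changes.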

Merging Datasets

To merge two data sets, you need a common ID variable in both. The variable does not have to have the same name in each data set.
The standard function for merging data sets is merge(…).

Example:


df3 <- merge(df1, df2, by.x="common.id", by.y="common.id2", all=TRUE)  # assuming the common ID variable in df1 is called "common.id" and the one in df2 is called "common.id2". all=TRUE tells R to keep all the cases in both dataframes.

> demo.dat$id <- paste(demo.dat$Date,demo.dat$Lctn)
> demo.dat

          Date Lat.degN Lon.degE Actual.SST.degC Bears Lctn            id
1  2009-02-27       1.5    103.5           28.01     1    A 2009-02-27  A
2  2009-02-27       0.5    103.5           28.00     2    B 2009-02-27  B
3  2009-03-06       1.5    103.5           28.44     3    A 2009-03-06  A
.      .             .       .               .       .    .         .   
.      .             .       .               .       .    .         .
.      .             .       .               .       .    .         . 
19 2009-05-01       1.5    103.5           29.75    19    A 2009-05-01  A
20 2009-05-01       0.5    103.5           29.84    20    B 2009-05-01  B
For this demonstration, we will subset the demo data, then merge it:

> demo.dat.SST <- demo.dat[c(1:5),c(4,7)]
> demo.dat.SST
  Actual.SST.degC            id
1           28.01 2009-02-27  A
2           28.00 2009-02-27  B
3           28.44 2009-03-06  A
4           28.38 2009-03-06  B
5           28.34 2009-03-13  A
> demo.dat.Bear <- demo.dat[c(1:5),c(5,7)]
> demo.dat.Bear
  Bears            id
1     1 2009-02-27  A
2     2 2009-02-27  B
3     3 2009-03-06  A
4     4 2009-03-06  B
5     5 2009-03-13  A
> demo.dat1 <- merge(demo.dat.SST, demo.dat.Bear, by.x="id", by.y="id", all=TRUE)  # keep all cases.
> demo.dat1
             id Actual.SST.degC Bears
1 2009-02-27  A           28.01     1
2 2009-02-27  B           28.00     2
3 2009-03-06  A           28.44     3
4 2009-03-06  B           28.38     4
5 2009-03-13  A           28.34     5
>
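
In the example above both subsets contain the same cases, so all=TRUE makes no visible difference. A small sketch of what changes when the ID variables have different names and the cases only partly overlap (the dataframes sst and bears and the column name id2 are hypothetical; the values are taken from the first four rows of the demo data):

```r
# Hypothetical dataframes with differently named ID variables.
sst <- data.frame(
  id = c("2009-02-27 A", "2009-02-27 B", "2009-03-06 A"),
  Actual.SST.degC = c(28.01, 28.00, 28.44)
)
bears <- data.frame(
  id2 = c("2009-02-27 B", "2009-03-06 A", "2009-03-06 B"),
  Bears = c(2, 3, 4)
)

# Default: keep only the cases whose ID appears in both dataframes.
inner <- merge(sst, bears, by.x = "id", by.y = "id2")

# all = TRUE: keep every case from both; missing values become NA.
outer <- merge(sst, bears, by.x = "id", by.y = "id2", all = TRUE)

print(inner)
print(outer)
```

The merged ID variable takes its name from by.x. all.x=TRUE or all.y=TRUE can be used instead of all=TRUE to keep the unmatched cases from only one of the two dataframes.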
 
