Subsetting Data
Using the subset(…) function.
Example:
> demo.dat
         Date Lat.degN Lon.degE Actual.SST.degC Bears Lctn
1  2009-02-27      1.5    103.5           28.01     1    A
2  2009-02-27      0.5    103.5           28.00     2    B
3  2009-03-06      1.5    103.5           28.44     3    A
4  2009-03-06      0.5    103.5           28.38     4    B
5  2009-03-13      1.5    103.5           28.34     5    A
.           .        .        .               .     .    .
.           .        .        .               .     .    .
.           .        .        .               .     .    .
19 2009-05-01      1.5    103.5           29.75    19    A
20 2009-05-01      0.5    103.5           29.84    20    B
> demo.A <- subset(demo.dat,Lctn=="A")
> demo.A
         Date Lat.degN Lon.degE Actual.SST.degC Bears Lctn
1  2009-02-27      1.5    103.5           28.01     1    A
3  2009-03-06      1.5    103.5           28.44     3    A
5  2009-03-13      1.5    103.5           28.34     5    A
7  2009-03-20      1.5    103.5           28.87     7    A
9  2009-03-27      1.5    103.5           29.20     9    A
11 2009-04-03      1.5    103.5           29.30    11    A
13 2009-04-10      1.5    103.5           29.63    13    A
15 2009-04-17      1.5    103.5           29.79    15    A
17 2009-04-24      1.5    103.5           30.00    17    A
19 2009-05-01      1.5    103.5           29.75    19    A
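subset(…) also accepts compound conditions, and its select argument picks which variables to keep. The following is a minimal sketch on a small made-up data frame; toy.dat and its values are assumptions for illustration, not the demo data above.

```r
# Hypothetical stand-in for demo.dat: the values are invented for illustration.
toy.dat <- data.frame(
  Date = as.Date(c("2009-02-27", "2009-02-27", "2009-03-06", "2009-03-06")),
  Actual.SST.degC = c(28.0, 27.9, 28.5, 28.4),
  Lctn = c("A", "B", "A", "B")
)

# Keep rows where Lctn is "A" AND the temperature exceeds 28.2;
# select= keeps only the named variables.
toy.A.warm <- subset(toy.dat,
                     Lctn == "A" & Actual.SST.degC > 28.2,
                     select = c(Date, Actual.SST.degC))
print(toy.A.warm)
```

Only cases that satisfy both conditions survive, and the Lctn variable is dropped because it is not listed in select.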
Indexing
Selecting via indexing [i,j].
Use indexing either to select whole blocks of a dataframe, or to pick out individual cases and variables and put them in a new dataframe.
The syntax for indexing is as follows:
df[i,j]
Where df is the dataframe, i is the index number(s) of the cases (rows) and j is the index number(s) of the variables (columns). Either one can be left out; when that happens, all cases or all variables are selected.
Example:
> demo.dat1 <- demo.dat[,c(1,4)] # variables 1 and 4, all cases.
> demo.dat1
         Date Actual.SST.degC
1  2009-02-27           28.01
2  2009-02-27           28.00
3  2009-03-06           28.44
.           .               .
.           .               .
.           .               .
19 2009-05-01           29.75
20 2009-05-01           29.84
> demo.dat2 <- demo.dat[c(1:5),c(1,4)] # cases 1 to 5, variables 1 and 4.
> demo.dat2
        Date Actual.SST.degC
1 2009-02-27           28.01
2 2009-02-27           28.00
3 2009-03-06           28.44
4 2009-03-06           28.38
5 2009-03-13           28.34
> demo.dat3 <- demo.dat[c(1,7,9),] # cases 1, 7 and 9, all variables.
> demo.dat3
        Date Lat.degN Lon.degE Actual.SST.degC Bears Lctn
1 2009-02-27      1.5    103.5           28.01     1    A
7 2009-03-20      1.5    103.5           28.87     7    A
9 2009-03-27      1.5    103.5           29.20     9    A
> demo.dat4 <- demo.dat[seq(1, length(demo.dat[,1]), 3),] # every 3rd case, starting at the first (1,4,7,…).
> demo.dat4
         Date Lat.degN Lon.degE Actual.SST.degC Bears Lctn
1  2009-02-27      1.5    103.5           28.01     1    A
4  2009-03-06      0.5    103.5           28.38     4    B
7  2009-03-20      1.5    103.5           28.87     7    A
10 2009-03-27      0.5    103.5           29.25    10    B
13 2009-04-10      1.5    103.5           29.63    13    A
16 2009-04-17      0.5    103.5           29.89    16    B
19 2009-05-01      1.5    103.5           29.75    19    A
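Besides index numbers, i and j can also be variable names or a logical condition. The sketch below shows a few of these variations on a small made-up data frame; toy.dat and its values are assumptions for illustration only.

```r
# Hypothetical data frame: the values are invented for illustration.
toy.dat <- data.frame(
  Date = as.Date("2009-02-27") + 7 * (0:5),
  Actual.SST.degC = c(28.0, 28.4, 28.3, 28.9, 29.2, 29.3),
  Bears = 1:6
)

# Variables selected by name instead of by position.
by.name <- toy.dat[, c("Date", "Actual.SST.degC")]

# Cases selected by a logical condition: rows where it is TRUE are kept.
warm <- toy.dat[toy.dat$Actual.SST.degC > 28.5, ]

# Every 3rd case; nrow() gives the number of cases directly.
every.third <- toy.dat[seq(1, nrow(toy.dat), by = 3), ]

# drop = FALSE keeps a single selected variable as a data frame
# instead of collapsing it to a plain vector.
one.col <- toy.dat[, "Bears", drop = FALSE]
```

Selecting by name tends to be more robust than selecting by position, since it still works if the order of the variables changes.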
Merging Datasets
To merge 2 data sets, you need a common ID variable in both. They don’t have to have the same name.
The standard function for merging data sets is merge(…).
Example:
df3 <- merge(df1,df2,by.x="common.id",by.y="common.id2",all=TRUE)
# assuming your common id variable in df1 is called "common.id" and the common id in df2 is called "common.id2".
# all=TRUE tells R to keep all the cases in both data.frames, including those without a match in the other.
> demo.dat$id <- paste(demo.dat$Date,demo.dat$Lctn)
> demo.dat
Date Lat.degN Lon.degE Actual.SST.degC Bears Lctn id
1 2009-02-27 1.5 103.5 28.01 1 A 2009-02-27 A
2 2009-02-27 0.5 103.5 28.00 2 B 2009-02-27 B
3 2009-03-06 1.5 103.5 28.44 3 A 2009-03-06 A
. . . . . . . .
. . . . . . . .
. . . . . . . .
19 2009-05-01 1.5 103.5 29.75 19 A 2009-05-01 A
20 2009-05-01 0.5 103.5 29.84 20 B 2009-05-01 B
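Pasting variables together is one way to build a common ID. As an alternative, merge(…) can match on several variables at once through its by argument. A minimal sketch, assuming two made-up data frames (sst and bears are hypothetical names with invented values):

```r
# Hypothetical data frames sharing two key variables, Date and Lctn.
sst <- data.frame(
  Date = c("2009-02-27", "2009-02-27", "2009-03-06"),
  Lctn = c("A", "B", "A"),
  Actual.SST.degC = c(28.0, 27.9, 28.5)
)
bears <- data.frame(
  Date = c("2009-02-27", "2009-02-27", "2009-03-06"),
  Lctn = c("A", "B", "A"),
  Bears = c(1, 2, 3)
)

# Cases are matched where BOTH Date and Lctn agree, so no pasted id is needed.
both <- merge(sst, bears, by = c("Date", "Lctn"))
print(both)
```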
For this demonstration, we will subset the demo data, then merge it:
> demo.dat.SST <- demo.dat[c(1:5),c(4,7)]
> demo.dat.SST
  Actual.SST.degC           id
1           28.01 2009-02-27 A
2           28.00 2009-02-27 B
3           28.44 2009-03-06 A
4           28.38 2009-03-06 B
5           28.34 2009-03-13 A
> demo.dat.Bear <- demo.dat[c(1:5),c(5,7)]
> demo.dat.Bear
  Bears           id
1     1 2009-02-27 A
2     2 2009-02-27 B
3     3 2009-03-06 A
4     4 2009-03-06 B
5     5 2009-03-13 A
> demo.dat1 <- merge(demo.dat.SST,demo.dat.Bear,by.x="id",by.y="id",all=T) # keep all cases.
> demo.dat1
            id Actual.SST.degC Bears
1 2009-02-27 A           28.01     1
2 2009-02-27 B           28.00     2
3 2009-03-06 A           28.44     3
4 2009-03-06 B           28.38     4
5 2009-03-13 A           28.34     5
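In the demonstration above every id appears in both data sets, so all= makes no visible difference. The sketch below uses two made-up data frames whose ID variables have different names and only partly overlap, to show what the all, all.x and default settings do. The names df1, df2, x and y and all values are assumptions for illustration.

```r
# Hypothetical data frames: ids "b" and "c" occur in both,
# "a" only in df1 and "d" only in df2.
df1 <- data.frame(common.id = c("a", "b", "c"), x = c(1, 2, 3))
df2 <- data.frame(common.id2 = c("b", "c", "d"), y = c(20, 30, 40))

# all = TRUE keeps every case from both; gaps are filled with NA.
merged.all <- merge(df1, df2, by.x = "common.id", by.y = "common.id2",
                    all = TRUE)

# Default (all = FALSE) keeps only cases whose id occurs in both.
merged.inner <- merge(df1, df2, by.x = "common.id", by.y = "common.id2")

# all.x = TRUE keeps every case from df1 only.
merged.left <- merge(df1, df2, by.x = "common.id", by.y = "common.id2",
                     all.x = TRUE)

print(merged.all)
```

The ID variable in the result takes its name from the first data frame (here common.id).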