I'm working in R. I have a data frame, `df`, that looks like this:

```
> str(df)
'data.frame':   691200 obs. of  19 variables:
 $ groupname: Factor w/ 8 levels "rowA","rowB",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ location : Factor w/ 96 levels "c1","c10","c11",..: 1 2 3 4 12 23 34 45 56 67 ...
 $ starttime: num  0 0 0 0 0 0 0 0 0 0 ...
 $ inadist  : num  0 0.2 0 0.2 0.6 0 0 0 0 0 ...
 $ smldist  : num  0 2.1 0 1.8 1.2 0 0 0 0 3.3 ...
 $ lardist  : num  0 0 0 0 0 0 0 0 0 1.3 ...
 $ fPhase   : Factor w/ 2 levels "Light","Dark": 2 2 2 2 2 2 2 2 2 2 ...
 $ fCycle   : Factor w/ 6 levels "predark","Cycle 1",..: 1 1 1 1 1 1 1 1 1 1 ...
```

I'd like to add another column, `timepoint`, that gives the `starttime` relative to the beginning of the `fCycle` it is in. For example, `starttime = 1801` would be `timepoint = 1` in `fCycle = 'Cycle 1'`.

What is the best way to create `df$timepoint`?

ETA: toy dataset:

```
starttime fCycle timepoint
1         1      1
2         1      2
3         1      3
4         1      4
5         2      1
6         2      2
7         2      3
8         2      4
9         3      1
10        3      2
11        3      3
12        4      1
13        4      2
14        4      3
15        5      1
16        5      2
17        6      1
18        6      2
19        6      3
20        6      4
```

#### #0

You can combine `rle` with `sequence`. Here is some sample code. Is the output what you were looking for?

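```
library(plyr)  # for arrange()

# sample data laid out like the toy dataset in the question
mydf <- data.frame(
  starttime = 1:20,
  fCycle    = rep(1:6, times = c(4, 4, 3, 3, 2, 4))
)

# sort data in increasing order of cycle and starttime
mydf <- arrange(mydf, fCycle, starttime)

# rle() gives the length of each run of fCycle; sequence() counts 1..n within each run
mydf <- transform(mydf, timepoint = sequence(rle(fCycle)$lengths))
```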
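In the question's `str()` output, `fCycle` is a factor, and `rle()` stops with an error on a factor (it requires an atomic vector, and `is.vector()` is `FALSE` for factors). A minimal base-R sketch, using a made-up 20-row data frame and assuming the level names after `Cycle 1` are `Cycle 2` to `Cycle 5` (the `str()` output elides them), runs `rle()` on the factor's integer codes instead:

```r
# Hypothetical 20-row data; the fCycle level names after "Cycle 1" are assumed
cycle_levels <- c("predark", paste("Cycle", 1:5))
mydf <- data.frame(
  starttime = 1:20,
  fCycle    = factor(rep(cycle_levels, times = c(4, 4, 3, 3, 2, 4)),
                     levels = cycle_levels)
)

# Sort by cycle (order() follows the factor's level order), then by starttime
mydf <- mydf[order(mydf$fCycle, mydf$starttime), ]

# rle() does not accept a factor, so run it on the integer level codes
mydf$timepoint <- sequence(rle(as.integer(mydf$fCycle))$lengths)
```

This counts rows, so it gives the toy dataset's `timepoint` column only when each `starttime` appears once per cycle.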

NOTE: Since there could be identical starttimes within the same fCycle, here is an alternative approach using `rank` and `ddply`:

```
# tied starttimes within an fCycle all get the lowest rank among them
ddply(mydf, .(fCycle), transform, timepoint = rank(starttime, ties.method = "min"))

# tied starttimes within an fCycle get the average of their ranks
ddply(mydf, .(fCycle), transform, timepoint = rank(starttime, ties.method = "average"))
```
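With several locations per `starttime`, `rank(..., ties.method = "min")` leaves gaps: for starttimes 1, 1, 2, 2 it returns 1, 1, 3, 3. If consecutive timepoints are wanted instead, one option (a swapped-in technique, not part of the answer above) is a dense rank computed per cycle with base R's `ave()`. The data frame `mydf2` and the helper `dense_rank()` below are hypothetical:

```r
# Hypothetical data: locations "c1" and "c2" both observed at every starttime,
# so each starttime appears twice within its fCycle
mydf2 <- data.frame(
  location  = rep(c("c1", "c2"), times = 8),
  starttime = rep(1:8, each = 2),
  fCycle    = rep(1:2, each = 8)
)

# Dense rank within a group: the smallest starttime maps to 1, the next
# distinct starttime to 2, and so on, with no gaps after ties
dense_rank <- function(x) match(x, sort(unique(x)))

# ave() applies dense_rank() within each fCycle and returns values in row order
mydf2$timepoint <- ave(mydf2$starttime, mydf2$fCycle, FUN = dense_rank)
```

Because `ave()` writes results back in the original row order, no sorting step is needed.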

#### #1

This is an outline of a solution, because I'm not quite clear on what you're asking. It seems like you want something derived from run-length encoding (RLE), which you can start with the `rle()` function.

1. The `rle()` output gives the length of each run (call this `lengths`).
2. The start position of each run can be calculated with `cumsum(c(1, lengths))` (the last element is one past the end and can be dropped).
3. These start positions can then be repeated with `rep`, once for each element of the run.
4. For each position (`1:n`), subtract the start of its run and add 1, so that each run begins at 1.
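The four steps can be sketched in base R as follows. `runs_to_timepoint()` is a hypothetical helper name, and the sketch assumes the rows are already sorted so each cycle is one contiguous run, with `fCycle` numeric or a factor so that `as.integer()` yields run codes:

```r
# Hypothetical helper; assumes each cycle's rows are contiguous
runs_to_timepoint <- function(cycle) {
  lengths   <- rle(as.integer(cycle))$lengths  # step 1: length of each run
  starts    <- cumsum(c(1, lengths))           # step 2: start of each run, plus n + 1
  starts    <- starts[-length(starts)]         # drop the trailing n + 1
  run_start <- rep(starts, lengths)            # step 3: each run's start, once per element
  seq_along(cycle) - run_start + 1             # step 4: position minus run start, plus 1
}

fCycle <- rep(1:6, times = c(4, 4, 3, 3, 2, 4))  # the toy dataset's fCycle column
runs_to_timepoint(fCycle)
```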

EDIT: There's no need to use `rep` in step 3; it can be replaced by a lookup based on the run lengths.
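One possible reading of this EDIT replaces `rep()` with `findInterval()`, which looks up the run each position falls in from the cumulative run lengths. This is an interpretation, not necessarily what the answer intended; the helper name is hypothetical, and it assumes each cycle's rows are contiguous:

```r
# Hypothetical helper; assumes each cycle's rows are contiguous
runs_to_timepoint_lookup <- function(cycle) {
  lengths <- rle(as.integer(cycle))$lengths
  starts  <- cumsum(c(1, lengths))       # run starts, with n + 1 as a final sentinel
  pos     <- seq_along(cycle)
  run_id  <- findInterval(pos, starts)   # index i with starts[i] <= pos < starts[i + 1]
  pos - starts[run_id] + 1               # position relative to its run's start, 1-based
}

runs_to_timepoint_lookup(rep(1:6, times = c(4, 4, 3, 3, 2, 4)))
```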
