I'm working in R. I have a data frame, df, that looks like this:

> str(exp)
'data.frame':   691200 obs. of  19 variables:
 $ groupname: Factor w/ 8 levels "rowA","rowB",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ location : Factor w/ 96 levels "c1","c10","c11",..: 1 2 3 4 12 23 34 45 56 67 ...
 $ starttime: num  0 0 0 0 0 0 0 0 0 0 ...
 $ inadist  : num  0 0.2 0 0.2 0.6 0 0 0 0 0 ...
 $ smldist  : num  0 2.1 0 1.8 1.2 0 0 0 0 3.3 ...
 $ lardist  : num  0 0 0 0 0 0 0 0 0 1.3 ...
 $ fPhase   : Factor w/ 2 levels "Light","Dark": 2 2 2 2 2 2 2 2 2 2 ...
 $ fCycle   : Factor w/ 6 levels "predark","Cycle 1",..: 1 1 1 1 1 1 1 1 1 1 ...

I'd like to add another column, timepoint, that gives the starttime relative to the beginning of the fCycle it is in. So starttime=1801 would be timepoint=1 for fCycle='Cycle 1'.

What is the best way to create df$timepoint?

ETA toy dataset:

starttime fCycle timepoint
1         1      1
2         1      2
3         1      3
4         1      4
5         2      1
6         2      2
7         2      3
8         2      4
9         3      1
10        3      2
11        3      3
12        4      1
13        4      2
14        4      3
15        5      1
16        5      2
17        6      1
18        6      2
19        6      3
20        6      4

#0

You can combine rle with sequence. Here is some sample code. Is the output what you were looking for?

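library(plyr)  # provides arrange()

# toy data: 20 consecutive starttimes spread over 6 cycles
mydf = data.frame(
  starttime = 1:20,
  fCycle    = c(rep(1:3, each = 4), rep(4:5, each = 3), rep(6, 2))
)

# sort data in increasing order of cycle and starttime
mydf = arrange(mydf, fCycle, starttime)

# rle() gives the length of each run of identical fCycle values;
# sequence() then counts 1..length within each run
mydf = transform(mydf, timepoint = sequence(rle(fCycle)$lengths))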
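
To see why the sort step matters, here is a minimal base R sketch on a made-up vector (the variable names are only for illustration). rle() only sees runs of adjacent equal values, so a cycle that is split across non-adjacent rows restarts the counter:

```r
# Hypothetical cycle labels in which cycle 1 is interrupted by cycle 2
fCycle_unsorted <- c(1, 2, 1, 1)

# rle() finds three runs (lengths 1, 1, 2), so the counter restarts too often
tp_unsorted <- sequence(rle(fCycle_unsorted)[["lengths"]])
print(tp_unsorted)  # 1 1 1 2

# After sorting, each cycle is a single run (lengths 3, 1)
fCycle_sorted <- sort(fCycle_unsorted)
tp_sorted <- sequence(rle(fCycle_sorted)[["lengths"]])
print(tp_sorted)    # 1 2 3 1
```

On a real data frame the rows themselves need to be reordered (as arrange() does above), not just the cycle column.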

NOTE: Since there could be identical starttimes within the same fCycle, here is an alternative approach using rank and ddply:

# tied starttimes within an fCycle all get the lowest rank of the tie
ddply(mydf, .(fCycle), transform, timepoint = rank(starttime, ties.method = 'min'))

# tied starttimes within an fCycle all get the average rank of the tie
ddply(mydf, .(fCycle), transform, timepoint = rank(starttime, ties.method = 'average'))

#1

This is an outline of a solution, because I'm not quite clear on what you're asking. It seems like you're asking for something derived from run length encoding (RLE), which you can start on with the rle() function.

1. The rle() output will give the lengths of each run (assign this to lengths).
2. The offsets where each run starts can be calculated (via cumsum(c(1,lengths)); the final value is one past the end of the data).
3. These can then be repeated (with rep) a sufficient number of times (i.e. once for each item in the run).
4. For each position (1:n) simply subtract the location of the start of the run (and add 1 if the first item of each run should be numbered 1 rather than 0).
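
The four steps can be sketched in base R as follows. The input vector is made up, it is assumed to be sorted so that each cycle forms one contiguous run, and the variable names (run_lengths rather than lengths, to avoid masking the base function) are only illustrative:

```r
# Hypothetical cycle labels, already sorted into contiguous runs
fCycle <- c(1, 1, 1, 1, 2, 2, 2, 3, 3)

# Step 1: length of each run
run_lengths <- rle(fCycle)[["lengths"]]                # 4 3 2

# Step 2: position where each run starts (drop the trailing n + 1)
run_starts <- head(cumsum(c(1, run_lengths)), -1)      # 1 5 8

# Step 3: repeat each start once per item in its run
start_per_row <- rep(run_starts, times = run_lengths)  # 1 1 1 1 5 5 5 8 8

# Step 4: position minus run start, plus 1 so each run begins at 1
timepoint <- seq_along(fCycle) - start_per_row + 1
print(timepoint)                                       # 1 2 3 4 1 2 3 1 2
```

The result is the same as sequence(run_lengths), which performs steps 2 to 4 in a single call.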

EDIT: There's no need to use rep in step 3. It can be a lookup to the lengths.

