I have some time-series data that I'm fitting a loess curve in ggplot2, as seen attached. The data takes the shape of an "S" curve. What I really need to find out is the date where the data starts to level off, which looks to be right around time '550' or '600'
Is there some kind of quantitative way that this can be marked off in the graph?
A link to the dataset: http://dl.dropbox.com/u/75403/stover_data.txt
A dput()
of the dataset:
structure(list(date = c(211L, 213L, 215L, 217L, 218L, 221L, 222L,
223L, 224L, 225L, 226L, 228L, 229L, 230L, 231L, 232L, 233L, 234L,
235L, 236L, 237L, 238L, 239L, 240L, 241L, 242L, 244L, 246L, 247L,
248L, 249L, 250L, 251L, 253L, 254L, 255L, 256L, 258L, 259L, 260L,
261L, 262L, 263L, 264L, 265L, 266L, 267L, 268L, 269L, 270L, 271L,
272L, 273L, 274L, 275L, 276L, 277L, 278L, 279L, 281L, 282L, 283L,
285L, 286L, 287L, 288L, 290L, 291L, 292L, 293L, 294L, 295L, 296L,
297L, 298L, 299L, 300L, 301L, 302L, 304L, 305L, 306L, 307L, 308L,
309L, 310L, 311L, 312L, 313L, 314L, 315L, 316L, 317L, 318L, 319L,
320L, 321L, 322L, 323L, 324L, 325L, 326L, 327L, 328L, 329L, 330L,
331L, 332L, 333L, 334L, 335L, 336L, 337L, 338L, 339L, 340L, 341L,
342L, 343L, 344L, 345L, 346L, 347L, 348L, 349L, 350L, 351L, 352L,
353L, 354L, 355L, 356L, 357L, 358L, 359L, 360L, 361L, 362L, 363L,
364L, 365L, 366L, 367L, 368L, 369L, 370L, 371L, 372L, 373L, 374L,
375L, 376L, 377L, 378L, 379L, 380L, 381L, 382L, 383L, 384L, 385L,
386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L, 394L, 395L, 396L,
397L, 398L, 399L, 400L, 402L, 404L, 405L, 406L, 407L, 408L, 410L,
411L, 413L, 414L, 415L, 416L, 418L, 419L, 420L, 421L, 422L, 423L,
424L, 425L, 426L, 427L, 428L, 429L, 430L, 431L, 432L, 433L, 434L,
435L, 436L, 437L, 438L, 439L, 440L, 441L, 442L, 443L, 444L, 445L,
446L, 447L, 448L, 449L, 450L, 451L, 452L, 453L, 455L, 456L, 457L,
458L, 459L, 460L, 461L, 462L, 463L, 464L, 465L, 466L, 467L, 468L,
469L, 470L, 471L, 472L, 473L, 474L, 475L, 476L, 477L, 478L, 479L,
480L, 481L, 482L, 483L, 484L, 485L, 486L, 487L, 488L, 489L, 490L,
491L, 492L, 493L, 494L, 495L, 496L, 497L, 498L, 499L, 500L, 501L,
502L, 503L, 504L, 505L, 506L, 507L, 508L, 509L, 510L, 511L, 512L,
513L, 514L, 515L, 516L, 517L, 518L, 519L, 520L, 521L, 522L, 523L,
524L, 527L, 528L, 529L, 530L, 531L, 532L, 533L, 534L, 535L, 536L,
537L, 538L, 539L, 540L, 541L, 544L, 545L, 546L, 547L, 548L, 549L,
550L, 551L, 552L, 553L, 554L, 555L, 556L, 557L, 558L, 559L, 560L,
561L, 562L, 563L, 564L, 565L, 566L, 567L, 568L, 569L, 570L, 571L,
572L, 573L, 574L, 575L, 576L, 577L, 578L, 579L, 580L, 581L, 582L,
583L, 587L, 588L, 589L, 590L, 591L, 592L, 593L, 594L, 595L, 596L,
597L, 598L, 599L, 600L, 601L, 602L, 603L, 604L, 605L, 606L, 607L,
608L, 609L, 610L, 611L, 612L, 613L, 614L, 615L, 616L, 617L, 618L,
619L, 620L, 621L, 622L, 623L, 624L, 625L, 626L, 627L, 628L, 629L,
630L, 631L, 632L, 634L, 635L, 636L, 637L, 638L, 639L, 640L, 641L,
642L, 643L, 644L, 645L, 646L, 647L, 648L, 649L, 650L, 651L, 652L,
653L, 654L, 655L, 656L, 657L, 658L, 659L, 660L, 661L, 662L, 663L,
664L, 665L, 666L, 667L, 668L, 669L, 670L, 671L, 672L, 673L, 674L,
675L, 676L, 677L, 678L, 679L, 680L, 681L, 684L, 685L, 686L, 687L,
688L, 689L, 690L, 691L, 692L, 693L, 694L, 695L, 696L, 697L, 698L,
699L, 700L, 701L, 702L, 703L, 704L, 705L, 706L, 707L, 708L, 709L,
710L, 711L, 712L, 713L, 714L, 715L, 716L, 717L, 718L, 719L, 720L,
721L, 722L, 723L, 724L, 725L, 726L, 727L, 728L, 729L, 730L, 731L,
732L, 733L, 734L, 735L, 736L, 737L, 738L, 739L, 740L, 741L, 742L,
743L, 744L, 745L, 746L, 747L, 748L, 749L, 750L, 751L, 752L, 753L,
754L, 755L, 756L, 757L, 758L, 759L, 760L, 761L, 762L, 763L, 764L,
765L, 766L, 767L, 768L, 769L, 770L, 771L, 772L, 773L, 774L, 775L,
776L, 777L, 778L, 781L, 782L, 783L, 784L, 785L, 786L, 787L, 788L,
789L, 790L, 791L, 792L, 793L, 794L, 795L, 796L, 797L, 798L, 799L,
800L, 801L, 802L, 803L, 804L, 805L, 806L, 807L, 808L, 809L, 810L,
811L, 812L, 813L, 814L, 815L, 816L, 817L, 818L, 819L, 820L, 821L,
822L, 823L, 824L, 825L, 826L, 827L, 828L, 829L, 830L, 831L, 832L,
833L, 834L, 835L, 836L, 837L, 838L, 839L, 840L, 841L), org_count = c(2L,
1L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 2L, 5L, 3L, 2L, 1L, 4L, 1L,
1L, 10L, 10L, 4L, 5L, 4L, 1L, 2L, 2L, 1L, 1L, 3L, 1L, 1L, 2L,
1L, 3L, 6L, 4L, 2L, 1L, 3L, 1L, 2L, 4L, 4L, 6L, 3L, 2L, 6L, 12L,
13L, 14L, 8L, 7L, 5L, 11L, 11L, 1L, 40L, 13L, 1L, 2L, 4L, 2L,
5L, 2L, 1L, 2L, 3L, 5L, 1L, 3L, 4L, 1L, 4L, 7L, 12L, 3L, 3L,
2L, 2L, 2L, 2L, 2L, 3L, 4L, 2L, 5L, 6L, 4L, 5L, 6L, 3L, 6L, 4L,
16L, 79L, 61L, 31L, 43L, 40L, 38L, 25L, 22L, 29L, 22L, 5L, 6L,
11L, 6L, 6L, 8L, 7L, 4L, 7L, 11L, 4L, 18L, 10L, 13L, 10L, 8L,
12L, 14L, 11L, 22L, 13L, 16L, 16L, 6L, 5L, 11L, 17L, 11L, 11L,
16L, 15L, 13L, 16L, 15L, 12L, 16L, 14L, 9L, 15L, 18L, 20L, 13L,
15L, 21L, 16L, 6L, 22L, 20L, 13L, 19L, 15L, 23L, 19L, 18L, 21L,
21L, 12L, 15L, 41L, 26L, 14L, 12L, 11L, 11L, 9L, 9L, 8L, 7L,
5L, 2L, 7L, 6L, 2L, 3L, 4L, 2L, 2L, 1L, 7L, 3L, 3L, 4L, 2L, 3L,
1L, 2L, 1L, 2L, 2L, 2L, 6L, 5L, 7L, 8L, 6L, 5L, 8L, 6L, 5L, 5L,
4L, 4L, 8L, 5L, 3L, 6L, 6L, 6L, 6L, 5L, 6L, 4L, 1L, 4L, 2L, 5L,
1L, 2L, 1L, 1L, 1L, 2L, 3L, 5L, 1L, 1L, 3L, 3L, 4L, 3L, 4L, 6L,
6L, 1L, 2L, 3L, 6L, 4L, 7L, 17L, 6L, 5L, 2L, 4L, 6L, 8L, 1L,
3L, 2L, 4L, 4L, 2L, 3L, 4L, 3L, 3L, 7L, 9L, 6L, 14L, 12L, 12L,
6L, 15L, 33L, 19L, 13L, 17L, 12L, 16L, 10L, 7L, 7L, 6L, 20L,
20L, 8L, 14L, 9L, 22L, 21L, 6L, 6L, 8L, 54L, 44L, 22L, 21L, 14L,
13L, 64L, 34L, 26L, 21L, 61L, 43L, 47L, 42L, 37L, 57L, 46L, 38L,
33L, 32L, 51L, 76L, 36L, 31L, 45L, 35L, 27L, 17L, 17L, 12L, 7L,
77L, 69L, 18L, 28L, 37L, 35L, 40L, 47L, 36L, 37L, 33L, 17L, 24L,
13L, 19L, 28L, 22L, 27L, 49L, 37L, 25L, 30L, 35L, 20L, 16L, 20L,
10L, 15L, 67L, 35L, 32L, 28L, 48L, 66L, 76L, 68L, 38L, 16L, 18L,
37L, 29L, 37L, 53L, 31L, 30L, 20L, 48L, 36L, 35L, 31L, 33L, 16L,
13L, 32L, 56L, 47L, 32L, 39L, 20L, 27L, 53L, 62L, 60L, 49L, 41L,
17L, 25L, 26L, 42L, 33L, 48L, 34L, 25L, 24L, 51L, 31L, 44L, 37L,
27L, 17L, 35L, 32L, 34L, 28L, 28L, 28L, 28L, 53L, 48L, 58L, 49L,
25L, 25L, 34L, 33L, 63L, 75L, 112L, 74L, 29L, 36L, 36L, 42L,
42L, 44L, 49L, 16L, 24L, 27L, 47L, 40L, 37L, 33L, 13L, 25L, 31L,
45L, 40L, 53L, 51L, 30L, 41L, 43L, 60L, 46L, 39L, 24L, 39L, 48L,
59L, 43L, 71L, 31L, 21L, 37L, 45L, 41L, 45L, 34L, 19L, 19L, 25L,
45L, 40L, 28L, 33L, 19L, 25L, 25L, 31L, 25L, 29L, 31L, 30L, 27L,
40L, 31L, 25L, 42L, 29L, 18L, 11L, 27L, 34L, 35L, 59L, 32L, 23L,
22L, 29L, 38L, 39L, 35L, 47L, 21L, 16L, 33L, 22L, 15L, 18L, 16L,
20L, 16L, 36L, 44L, 58L, 35L, 21L, 20L, 14L, 55L, 34L, 30L, 40L,
27L, 34L, 31L, 47L, 53L, 42L, 59L, 55L, 41L, 43L, 29L, 26L, 32L,
40L, 33L, 28L, 27L, 47L, 40L, 52L, 48L, 58L, 38L, 35L, 29L, 37L,
19L, 19L, 22L, 15L, 16L, 21L, 31L, 25L, 31L, 23L, 32L, 30L, 80L,
45L, 49L, 32L, 18L, 29L, 35L, 23L, 27L, 21L, 21L, 29L, 43L, 106L,
58L, 117L, 49L, 28L, 24L, 43L, 49L, 34L, 23L, 28L, 16L, 21L,
45L, 37L, 29L, 32L, 26L, 16L, 18L, 26L, 24L, 21L, 18L, 16L, 23L,
10L, 19L, 24L, 29L, 11L, 26L, 15L, 14L, 19L)), .Names = c("date",
"org_count"), class = "data.frame", row.names = c(NA, -599L))
Graph:
Code:
> p<-qplot(date,org_count, data=christi)
> p+stat_smooth(method="loess",size=1.5)
If you are asking for a way of determining the point where the curve is a maximum (i.e. flat), this is the same as finding the point where the slope of the line is at its maximum (from basic calculus).
First, read your data:
christi <- read.table("http://dl.dropbox.com/u/75403/stover_data.txt", sep="\t", header=TRUE)
Next, use loess
to fit a smoothed model:
fit <- loess(org_count~date, data=christi)
Then, predict the values in your range of x-values (with predict.loess
), determine the slope (diff
is close enough`), and find the
x <- 200:800
px <- predict(fit, newdata=x)
px1 <- diff(px)
which.max(px1)
[1] 367
Since the start value of x is 200, this means the curve is flat at position 200+367=567
.
If you wanted to plot this:
par(mfrow=c(1, 2))
plot(x, px, main="loess model")
plot(x[-1], px1, main="diff(loess model)")
abline(v=567, col="red")
It all depends on what you mean by "where the data starts to level off". You need to put this into math. LOESS curves can be really bumpy depending on what bandwidth you use. You might want to modify the line below the comment marked "line A" to specify what you mean. For example, you might want to look at more than just the first previous value. You could, for example, look at the sum of the previous 5 the_diff
values.
library(ggplot2)
christi <- read.table("stover_data.txt",header=TRUE)
the_fit = loess(org_count ~ date, data=christi)
pred = predict(the_fit, christi, se=FALSE) #could change data with larger grid
with_loess <- cbind(christi,pred)
p<-qplot(date,org_count, data=christi)
the_plot <- p+stat_smooth(method="loess",size=1.5)
the_diff <- diff(with_loess$pred)
tolerance <- .1
#line A: the following line is what you want to modify.
vline <- min(with_loess$date[the_diff < -tolerance])
new_plot <- the_plot + geom_vline(xintercept=vline)
for example, you could do the following (replace the last part of the above code starting with the the_diff
line)
the_diff <- diff(with_loess$pred, lag=20)
tolerance <- 1
vline <- min(with_loess$date[the_diff < -tolerance])
new_plot <- the_plot + geom_vline(xintercept=vline)
Also note that you might want to shift the the_diff
vector depending on what you mean by "start to level off" (ie in the future it's going to level off, or it's already starting to level off, etc.). Also remember that the_diff
is a shorter by the length of its lag
argument than your dataset.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With