I have data that consists of values on dates, here represented as timestamps:
json <- structure(list(creation_date = c(1689356840L, 1689366017L, 1689376446L,
1689504809L, 1690366268L, 1690472012L, 1690478723L, 1690701972L,
1690701973L, 1690746389L, 1690746392L, 1690831235L, 1691022236L,
1692097787L, 1692625463L, 1692699453L, 1692828527L, 1692828527L,
1692828550L, 1692829734L, 1692878775L, 1692879861L, 1693058927L,
1693082719L, 1693106370L, 1693115869L, 1693120632L, 1693121481L,
1693137142L, 1693164782L, 1693172783L, 1693175185L, 1693208767L,
1693909005L, 1693914013L, 1693921364L, 1693928549L, 1693930560L,
1693999563L, 1694001298L, 1694003421L, 1694003758L, 1694005624L,
1694015620L, 1694015743L, 1694074163L, 1694153769L, 1694181062L,
1694194729L, 1694194740L, 1694194740L, 1694194744L, 1694209208L,
1694594207L, 1694798845L, 1694798846L, 1694809497L, 1694883734L,
1694901062L, 1695124304L, 1695221997L, 1695222830L, 1695259750L,
1695259766L, 1695260229L, 1695283945L, 1695329985L, 1695330735L,
1695363048L, 1695399875L, 1695403486L, 1695409526L, 1695458043L,
1695464399L, 1695469820L, 1695472988L, 1695479007L, 1695483407L,
1695485270L, 1695485763L, 1695485763L, 1695493324L, 1695505234L,
1695505235L, 1695509011L, 1695517370L, 1695527643L, 1695556326L,
1695567795L, 1695567798L, 1695575998L, 1695576929L, 1695589633L,
1695596154L, 1695607375L, 1695633830L, 1695657647L, 1695658242L,
1695710762L, 1695719101L), cumulative_reputation = c(100L, 110L,
120L, 130L, 140L, 150L, 160L, 170L, 185L, 200L, 185L, 195L, 205L,
215L, 213L, 228L, 230L, 240L, 230L, 240L, 250L, 252L, 262L, 272L,
282L, 292L, 302L, 312L, 322L, 332L, 342L, 352L, 362L, 372L, 382L,
392L, 402L, 412L, 422L, 432L, 442L, 452L, 462L, 472L, 482L, 492L,
502L, 512L, 522L, 512L, 510L, 512L, 522L, 532L, 547L, 557L, 567L,
577L, 587L, 597L, 607L, 617L, 632L, 617L, 627L, 637L, 647L, 657L,
672L, 682L, 692L, 707L, 717L, 727L, 737L, 747L, 745L, 755L, 757L,
747L, 745L, 755L, 753L, 755L, 765L, 775L, 785L, 795L, 805L, 795L,
805L, 815L, 825L, 835L, 845L, 847L, 857L, 867L, 877L, 887L)), row.names = c(670L,
669L, 668L, 667L, 666L, 665L, 664L, 663L, 662L, 661L, 660L, 659L,
658L, 657L, 656L, 655L, 653L, 654L, 652L, 651L, 650L, 649L, 648L,
647L, 646L, 645L, 644L, 643L, 642L, 641L, 640L, 639L, 638L, 637L,
636L, 635L, 634L, 633L, 632L, 631L, 630L, 629L, 628L, 627L, 626L,
625L, 624L, 623L, 622L, 620L, 621L, 619L, 618L, 617L, 616L, 615L,
614L, 613L, 612L, 611L, 610L, 609L, 608L, 607L, 606L, 605L, 604L,
603L, 602L, 601L, 600L, 599L, 598L, 597L, 596L, 595L, 594L, 593L,
592L, 590L, 591L, 589L, 588L, 587L, 586L, 585L, 584L, 583L, 582L,
581L, 580L, 579L, 578L, 577L, 576L, 575L, 574L, 573L, 572L, 571L
), class = "data.frame")
I have fit a model to that data and can now calculate the predicted values:
model <- glm(cumulative_reputation ~ creation_date, data = json, family = poisson(link = "log"))
prediction <- predict(model, json)
How can I find the date (or timestamp) for which the model predicts a specific value?
To find that date, I have calculated values from the last date of measurement (1695719101) to a year into the future (1695719101 + 365 * 24 * 60 * 60) and then searched the resulting values for the one closest to the one I want (cumulative_reputation = 10000).
Is there a way to calculate this date (or timestamp) directly?
Note:
This question is a follow up to this question.
If you have the information in the format that @PBulls provided, i.e.
COEF <- coef(model)
pred <- function(t) exp(COEF[1] + t*COEF[2])
Then you don't need uniroot()
to get the answer, you can get the answer analytically:
target = exp(c1 + c2*t)
log(target) = c1 + c2*t
t = (log(target) - c1)/c2
or in R code,
(log(target) - COEF[1])/COEF[2]
The uniroot()
approach is still handy for more complicated models (e.g. splines) where the math is hard or impossible.
The model gives you an equation for the curve, which you can then use for root-finding against a specific value. For example:
## This is essentially what 'predict(model, type="response")' does here
COEF <- coef(model)
pred <- function(t) exp(COEF[1] + t*COEF[2])
## Find the root for a certain target response
optfn <- function(t, target=10000) pred(t) - target
uniroot(optfn, c(1E9, 5E9))$root |> as.POSIXct(tz="UTC")
#> "2023-12-20 12:07:04 UTC"
Just to confirm that this timestamp is indeed the solution:
predict(model, list(creation_date=1703074025), type="response")
#> 10000
Speaking briefly for the statistical aspect, extrapolating so far beyond the observed data might be questionable.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With