How can I automatically extract the well-fitted linear part of a curve when the R^2 for the whole curve is poor? For example, what I have:
<blockquote>
data.lm
</blockquote>
<pre class="prettyprint"><code>    x y
1   1 1
2   2 8
3   3 3
4   4 4
5   5 5
6   6 6
7   7 7
8   8 5
9   9 2
10 10 7
</code></pre>
<blockquote>
rg.lm <- lm(y~x, data.lm)
rg.lm
</blockquote>
<pre class="prettyprint"><code>Coefficients:
(Intercept)            x  
     3.7333       0.1939  
</code></pre>
<blockquote>
summary(rg.lm)
</blockquote>
<pre class="prettyprint"><code>Residuals:
    Min      1Q  Median      3Q     Max 
-3.4788 -1.1136  0.0061  1.2712  3.8788 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   3.7333     1.6111   2.317   0.0491 *
x             0.1939     0.2597   0.747   0.4765  

Residual standard error: 2.358 on 8 degrees of freedom
Multiple R-squared:  0.06519,   Adjusted R-squared:  -0.05166 
F-statistic: 0.5579 on 1 and 8 DF,  p-value: 0.4765
</code></pre>
What I expect:
<blockquote>
data.lm.ex <- unknown.function(data.lm)
data.lm.ex
</blockquote>
<pre class="prettyprint"><code>  x y
1 3 3
2 4 4
3 5 5
4 6 6
5 7 7
</code></pre>
Another example comes from real data:
<blockquote>
data.lm
</blockquote>
<pre class="prettyprint"><code>   time    OD
1     0 2.175
2    30 2.134
3    60 2.189
4    90 2.141
5   120 2.854
6   150 3.331
7   180 3.642
8   210 4.333
9   240 4.987
10  270 5.093
11  300 4.943
12  330 5.198
13  360 4.804
</code></pre>
<blockquote>
summary(lm(data.lm))$r.squared
</blockquote>
<pre class="prettyprint"><code>[1] 0.8981063
</code></pre>
<blockquote>
summary(lm(data.lm[4:9,]))$r.squared
</blockquote>
<pre class="prettyprint"><code>[1] 0.9886727
</code></pre>
As shown above, rows 4 to 9 give a much higher R^2 than the whole curve. Is there an automatic way to find the interval with the highest R^2 that contains at least a given number of points? A minimum is needed because any 2 points always give R^2 = 1.0.

This should work for the first example (it looks for the longest run of equal consecutive differences in y):
<pre class="prettyprint"><code>a <- cbind(1:10, c(1,8,3:7,5,2,7))
tmp <- rle(diff(a[,2]))           # runs of equal consecutive differences
ml <- max(tmp$lengths)            # length of the longest run
i1 <- which(ml==tmp$lengths)[1]   # index of that run within the rle output
start <- sum(tmp$lengths[seq_len(i1 - 1)]) + 1   # first row of that run in a
a[seq(start, start+ml),]
</code></pre>
Update
<pre class="prettyprint"><code>a <- data.frame(x=c(0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360),
                y=c(2.175, 2.134, 2.189, 2.141, 2.854, 3.331, 3.642, 4.333, 4.987, 5.093, 4.943, 5.198, 4.804))

b <- diff(a[,2])/diff(a[,1])         # slope of each segment between neighbouring points
b.k <- kmeans(b,3)                   # group segments by slope (random starts, so results can vary)
b.i <- which.max(abs(b.k$centers))   # cluster with the steepest mean slope
b.v <- which(b.k$cluster == b.i)

RES <- a[b.v,]
plot(a)
points(RES,pch=15)
abline(coef(lm(y~x,RES)), col="red")
</code></pre>
<img src="https://i.stack.imgur.com/WkNir.png" alt="enter image description here">
A refined version:
<pre class="prettyprint"><code>library(zoo)
a <- data.frame(x=c(0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360),
                y=c(2.175, 2.134, 2.189, 2.141, 2.854, 3.331, 3.642, 4.333, 4.987, 5.093, 4.943, 5.198, 4.804))
f <- function (d) {
  m <- lm(y~x, as.data.frame(d))
  return(coef(m)[2])   # slope of the fit in this window
}
co <- rollapply(a, 3, f, by.column=FALSE)   # slope over each 3-row window
co.cl <- kmeans(co, 2)
# +1 maps each window index to the centre row of that window
b.points <- which(co.cl$cluster == match(max(co.cl$centers), co.cl$centers))+1
RES <- a[b.points,]
plot(a)
points(RES,pch=15,col="red")
abline(lm(y~x,RES),col="blue")
</code></pre>
<img src="https://i.stack.imgur.com/LxqFf.png" alt="an improved version">
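The question asks specifically for the interval with the highest R^2 subject to a minimum number of points. The answer above clusters slopes rather than maximising R^2, so as a complementary sketch (not the answerer's method), a brute-force search over every contiguous window could look like the following. The function name `best_linear_window` and the argument `min_n` are hypothetical, introduced here only for illustration, and the code assumes the rows are already sorted by x.

```r
# Hypothetical helper, not from the original answer: exhaustive search over
# all contiguous row windows that contain at least `min_n` points.
best_linear_window <- function(x, y, min_n = 5) {
  n <- length(x)
  stopifnot(length(y) == n, min_n >= 3, min_n <= n)
  best <- list(start = NA_integer_, end = NA_integer_, r2 = -Inf)
  for (i in seq_len(n - min_n + 1)) {
    for (j in seq(i + min_n - 1, n)) {
      idx <- i:j
      # For a fit with one predictor, R^2 equals the squared Pearson correlation.
      # cor() warns and returns NA when y is constant in the window, so skip those.
      r2 <- suppressWarnings(cor(x[idx], y[idx])^2)
      if (!is.na(r2) && r2 > best$r2) {
        best <- list(start = i, end = j, r2 = r2)
      }
    }
  }
  best
}

# Toy data from the question: rows 3 to 7 lie exactly on a line.
toy <- best_linear_window(1:10, c(1, 8, 3:7, 5, 2, 7), min_n = 5)
toy$start  # 3
toy$end    # 7

# Growth-curve data from the question, requiring at least 6 points.
time <- seq(0, 360, by = 30)
od <- c(2.175, 2.134, 2.189, 2.141, 2.854, 3.331, 3.642,
        4.333, 4.987, 5.093, 4.943, 5.198, 4.804)
growth <- best_linear_window(time, od, min_n = 6)
growth[c("start", "end", "r2")]
```

Because R^2 alone tends to favour short windows, `min_n` is the main lever here: a larger value forces the search toward longer linear stretches. The double loop is quadratic in the number of points, which is fine for curves of this size.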

How to find the linear part of a curve

Q: How do you find the linear range of a curve?

To determine the domain, identify the set of all the x-coordinates on the function's graph. To determine the range, identify the set of all y-coordinates. In addition, ask yourself what are the greatest/least x- and y-values. These values will be your boundary numbers.

Q: What is curvilinear?

While the terms linear and nonlinear have standard definitions in statistics, the term curvilinear does not have a standard meaning. It generally is used to describe a curve that is smooth (no discontinuities) but the underlying mathematical model could be either linear or nonlinear.

Q: Is a linear line a curve?

The formal term to describe a straight line graph is linear, whether or not it goes through the origin, and the relationship between the two variables is called a linear relationship. Similarly, the relationship shown by a curved graph is called non-linear.

Q: How do you find the linear range in Excel?

Select “Linear” under “Trendline Options”. Also select “Display Equation on Chart” and “Display R-squared Value on Chart”. You should now see a dotted line drawn through your data points and a text box next to it with the best-fit linear equation and the R-squared value.

Tags:

r

linear-regression

lm

938

asked Sep 08 '17 03:09

Shanqiao Chen

1 Answer

196

answered Oct 13 '22 15:10

akond
