Different results of lm with same dataset written in two different languages (English and Korean)

Question

The results of lm() differ between two versions of the same dataset (numeric variables plus categorical variables): in one version the categorical labels are written in Korean, in the other in English. Apart from the categorical labels, the numeric variables are exactly the same. What could explain the difference in the results?

#data 
df3 <- repmis::source_DropboxData("df3_v0.1.csv","gg30a74n4ew3zzg",header = TRUE)

# model fitted with the SANJI labels in Korean
out1<-lm(YD~SANJI+TAmin8+TMINup18do6+typ_rain6+DTD9,data=df3)
summary(out1)

# model fitted after recoding the labels to English
df3$SANJI[df3$SANJI=="전북"]<-"JB"
df3$SANJI[df3$SANJI=="충북"]<-"CHB"
df3$SANJI[df3$SANJI=="경북"]<-"KB"
df3$SANJI[df3$SANJI=="전남"]<-"JN"
df3$SANJI2[df3$SANJI2=="고창"]<-"Gochang"
df3$SANJI2[df3$SANJI2=="괴산"]<-"Goesan"
df3$SANJI2[df3$SANJI2=="단양"]<-"Danyang"
df3$SANJI2[df3$SANJI2=="봉화"]<-"Fenghua"
df3$SANJI2[df3$SANJI2=="신안"]<-"Sinan"
df3$SANJI2[df3$SANJI2=="안동"]<-"Andong"
df3$SANJI2[df3$SANJI2=="영광"]<-"younggang"
df3$SANJI2[df3$SANJI2=="영양"]<-"youngyang"
df3$SANJI2[df3$SANJI2=="영주"]<-"youngju"
df3$SANJI2[df3$SANJI2=="예천"]<-"Yecheon"
df3$SANJI2[df3$SANJI2=="의성"]<-"Yusaeng"
df3$SANJI2[df3$SANJI2=="제천"]<-"Jechon"
df3$SANJI2[df3$SANJI2=="진안"]<-"Jinan"
df3$SANJI2[df3$SANJI2=="청송"]<-"Changsong"
df3$SANJI2[df3$SANJI2=="해남"]<-"Haenam"
out2<-lm(YD~SANJI+TAmin8+TMINup18do6+typ_rain6+DTD9,data=df3)
summary(out2)

# output of summary(out1): labels in Korean
#Call:
#lm(formula = YD ~ SANJI + TAmin8 + TMINup18do6 + typ_rain6 + 
#    DTD9, data = df3)

#Residuals:
#    Min      1Q  Median      3Q     Max 
#-98.836 -23.173  -2.261  22.626 111.367 

#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
#(Intercept) 970.33251   84.12479  11.534  < 2e-16 ***
#SANJI전남   -33.75664   12.53277  -2.693 0.008158 ** 
#SANJI전북   -44.17939   11.22274  -3.937 0.000144 ***
#SANJI충북   -44.09285    9.16736  -4.810 4.74e-06 ***
#TAmin8      -25.56618    3.36053  -7.608 9.37e-12 ***
#TMINup18do6   4.58052    0.96528   4.745 6.19e-06 ***
#typ_rain6    -0.19754    0.02862  -6.903 3.23e-10 ***
#DTD9        -16.15975    2.65128  -6.095 1.59e-08 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#Residual standard error: 37.2 on 112 degrees of freedom
#Multiple R-squared:   0.58,    Adjusted R-squared:  0.5538 
#F-statistic:  22.1 on 7 and 112 DF,  p-value: < 2.2e-16


# output of summary(out2): labels in English
#Call:
#lm(formula = YD ~ SANJI + TAmin8 + TMINup18do6 + typ_rain6 + 
#    DTD9, data = df3)

#Residuals:
#    Min      1Q  Median      3Q     Max 
#-98.836 -23.173  -2.261  22.626 111.367 

#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
#(Intercept) 926.23966   84.32621  10.984  < 2e-16 ***
#SANJIJB      -0.08654   12.32752  -0.007    0.994    
#SANJIJN      10.33620   13.09434   0.789    0.432    
#SANJIKB      44.09285    9.16736   4.810 4.74e-06 ***
#TAmin8      -25.56618    3.36053  -7.608 9.37e-12 ***
#TMINup18do6   4.58052    0.96528   4.745 6.19e-06 ***
#typ_rain6    -0.19754    0.02862  -6.903 3.23e-10 ***
#DTD9        -16.15975    2.65128  -6.095 1.59e-08 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#Residual standard error: 37.2 on 112 degrees of freedom
#Multiple R-squared:   0.58,    Adjusted R-squared:  0.5538 
#F-statistic:  22.1 on 7 and 112 DF,  p-value: < 2.2e-16

MrFlick · Accepted Answer

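Your overall model fits are the same; you just have different reference levels for your factor (SANJI). In the Korean run the reference level is 경북 (it is the one level with no coefficient row), while in the English run it is CHB (충북). A different reference level also changes the intercept, but it does not change the estimates for your continuous covariates.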
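As a quick check on the numbers posted in the question, the English-label coefficients can be recovered from the Korean-label ones by shifting everything by the 충북 (CHB) effect. This is only a sketch using values copied from the two summaries; the vector names `kor`, `eng`, `shift` and `derived` are made up for illustration.

```r
# Coefficients copied from the two summaries in the question.
# Korean-label fit: the reference level is the one recoded as "KB".
kor <- c(intercept = 970.33251, JN = -33.75664, JB = -44.17939, CHB = -44.09285)
# English-label fit: the reference level is "CHB".
eng <- c(intercept = 926.23966, JB = -0.08654, JN = 10.33620, KB = 44.09285)

# Moving the reference from KB to CHB shifts each SANJI term by the CHB effect.
shift <- kor[["CHB"]]
derived <- c(
  intercept = kor[["intercept"]] + shift,  # new baseline absorbs the CHB effect
  JB        = kor[["JB"]] - shift,         # JB relative to CHB
  JN        = kor[["JN"]] - shift,         # JN relative to CHB
  KB        = -shift                       # old baseline relative to CHB
)

print(round(derived, 5))
# Agreement up to the rounding of the printed summaries.
print(all.equal(derived[names(eng)], eng, tolerance = 1e-6))
```

The two tables therefore describe the same fitted model, which is also why the residuals, R-squared and F-statistic are identical in both outputs.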

You can use relevel() to force a particular reference level (assuming SANJI is already a factor), or create the factor explicitly with factor() and its levels= argument. Otherwise the levels default to sorted order, and the labels may not sort the same way in the two languages.
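A minimal sketch of both options, using synthetic data rather than the question's df3 (the variables `region`, `x`, `y` and the simulated effects are assumptions for illustration only):

```r
set.seed(1)
# Synthetic stand-in: a four-level region label and one numeric predictor.
region <- rep(c("KB", "JN", "JB", "CHB"), each = 10)
x <- rnorm(40)
effect <- unname(c(KB = 0, JN = 1, JB = -1, CHB = 3)[region])
y <- 5 + 2 * x + effect + rnorm(40)

# Default: the levels are sorted, so "CHB" ends up as the reference level.
fit_default <- lm(y ~ factor(region) + x)

# Option 1: relevel() an existing factor so that "KB" is the reference.
region_kb <- relevel(factor(region), ref = "KB")
fit_kb <- lm(y ~ region_kb + x)

# Option 2: spell out the level order when creating the factor.
region_lv <- factor(region, levels = c("KB", "JN", "JB", "CHB"))
fit_lv <- lm(y ~ region_lv + x)

# Same fitted values and same slope for x; only the intercept and the
# factor coefficients are expressed against a different baseline.
print(all.equal(unname(fitted(fit_default)), unname(fitted(fit_kb))))
print(all.equal(coef(fit_default)[["x"]], coef(fit_kb)[["x"]]))
print(levels(region_kb))
```

Fixing the reference level this way, before recoding the labels or with the same level order in both languages, should make the two coefficient tables line up.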

Tags: r, lm

jacobgreen
