Different results of lm with same dataset written in two different languages (English and Korean)

Tags: r, lm

The results of lm() differ between two versions of the same dataset (numeric variables plus categorical variables): one with the categorical values written in Korean and the other with them written in English. Apart from the labels of the categorical variables, the two versions are identical, and the numeric variables are exactly the same. What could explain the difference in the results?

#data 
df3 <- repmis::source_DropboxData("df3_v0.1.csv","gg30a74n4ew3zzg",header = TRUE)

#the one written in korean 
out1<-lm(YD~SANJI+TAmin8+TMINup18do6+typ_rain6+DTD9,data=df3)
summary(out1)

#the one written in eng 
df3$SANJI[df3$SANJI=="전북"]<-"JB"
df3$SANJI[df3$SANJI=="충북"]<-"CHB"
df3$SANJI[df3$SANJI=="경북"]<-"KB"
df3$SANJI[df3$SANJI=="전남"]<-"JN"
df3$SANJI2[df3$SANJI2=="고창"]<-"Gochang"
df3$SANJI2[df3$SANJI2=="괴산"]<-"Goesan"
df3$SANJI2[df3$SANJI2=="단양"]<-"Danyang"
df3$SANJI2[df3$SANJI2=="봉화"]<-"Fenghua"
df3$SANJI2[df3$SANJI2=="신안"]<-"Sinan"
df3$SANJI2[df3$SANJI2=="안동"]<-"Andong"
df3$SANJI2[df3$SANJI2=="영광"]<-"younggang"
df3$SANJI2[df3$SANJI2=="영양"]<-"youngyang"
df3$SANJI2[df3$SANJI2=="영주"]<-"youngju"
df3$SANJI2[df3$SANJI2=="예천"]<-"Yecheon"
df3$SANJI2[df3$SANJI2=="의성"]<-"Yusaeng"
df3$SANJI2[df3$SANJI2=="제천"]<-"Jechon"
df3$SANJI2[df3$SANJI2=="진안"]<-"Jinan"
df3$SANJI2[df3$SANJI2=="청송"]<-"Changsong"
df3$SANJI2[df3$SANJI2=="해남"]<-"Haenam"
out2<-lm(YD~SANJI+TAmin8+TMINup18do6+typ_rain6+DTD9,data=df3)
summary(out2)

#the one written in korean 
#Call:
#lm(formula = YD ~ SANJI + TAmin8 + TMINup18do6 + typ_rain6 + 
#    DTD9, data = df3)

#Residuals:
#    Min      1Q  Median      3Q     Max 
#-98.836 -23.173  -2.261  22.626 111.367 

#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
#(Intercept) 970.33251   84.12479  11.534  < 2e-16 ***
#SANJI전남   -33.75664   12.53277  -2.693 0.008158 ** 
#SANJI전북   -44.17939   11.22274  -3.937 0.000144 ***
#SANJI충북   -44.09285    9.16736  -4.810 4.74e-06 ***
#TAmin8      -25.56618    3.36053  -7.608 9.37e-12 ***
#TMINup18do6   4.58052    0.96528   4.745 6.19e-06 ***
#typ_rain6    -0.19754    0.02862  -6.903 3.23e-10 ***
#DTD9        -16.15975    2.65128  -6.095 1.59e-08 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#Residual standard error: 37.2 on 112 degrees of freedom
#Multiple R-squared:   0.58,    Adjusted R-squared:  0.5538 
#F-statistic:  22.1 on 7 and 112 DF,  p-value: < 2.2e-16


#the one written in eng 
#Call:
#lm(formula = YD ~ SANJI + TAmin8 + TMINup18do6 + typ_rain6 + 
#    DTD9, data = df3)

#Residuals:
#    Min      1Q  Median      3Q     Max 
#-98.836 -23.173  -2.261  22.626 111.367 

#Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
#(Intercept) 926.23966   84.32621  10.984  < 2e-16 ***
#SANJIJB      -0.08654   12.32752  -0.007    0.994    
#SANJIJN      10.33620   13.09434   0.789    0.432    
#SANJIKB      44.09285    9.16736   4.810 4.74e-06 ***
#TAmin8      -25.56618    3.36053  -7.608 9.37e-12 ***
#TMINup18do6   4.58052    0.96528   4.745 6.19e-06 ***
#typ_rain6    -0.19754    0.02862  -6.903 3.23e-10 ***
#DTD9        -16.15975    2.65128  -6.095 1.59e-08 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#Residual standard error: 37.2 on 112 degrees of freedom
#Multiple R-squared:   0.58,    Adjusted R-squared:  0.5538 
#F-statistic:  22.1 on 7 and 112 DF,  p-value: < 2.2e-16

jacobgreen asked Mar 17 '23 05:03


1 Answer

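Your overall model fits are the same (identical residuals, R-squared and F-statistic); you just have different reference classes for your factor ("SANJI"). With the Korean labels the reference level is 경북 (KB), while with the English labels it is CHB. Having a different reference level also changes your intercept, but it won't change the estimates for your continuous covariates.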
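As a quick sanity check, the two summaries can be reconciled by hand. The sketch below uses only the estimates printed in the question; the vector names (ko, en, ko_as_chb) are made up for illustration. Shifting the Korean-label fit so that CHB becomes the reference should reproduce the English-label coefficients up to rounding in the printed output.

```r
# Estimates copied from the two summaries in the question.
# Korean labels: the reference level is 경북 (KB).
ko <- c(intercept = 970.33251, JN = -33.75664, JB = -44.17939, CHB = -44.09285)
# English labels: the reference level is CHB.
en <- c(intercept = 926.23966, JB = -0.08654, JN = 10.33620, KB = 44.09285)

# Re-express the Korean-label fit relative to CHB:
# the new intercept absorbs the CHB effect, and every other
# level effect is measured as a difference from CHB.
ko_as_chb <- c(
  intercept = unname(ko["intercept"] + ko["CHB"]),
  JB        = unname(ko["JB"] - ko["CHB"]),
  JN        = unname(ko["JN"] - ko["CHB"]),
  KB        = unname(-ko["CHB"])
)

# Largest discrepancy between the re-expressed and the reported estimates.
max_diff <- max(abs(ko_as_chb - en[names(ko_as_chb)]))
print(max_diff)
```

Any remaining discrepancy comes from the five-decimal rounding of the printed summaries, not from a difference between the models.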

You can use relevel() to force a particular reference class (this requires SANJI to already be a factor, so convert a character column with factor() first), or create the factor explicitly with factor() and a levels= argument. Otherwise the levels are sorted alphabetically by default, and the labels may not sort the same way in the two languages.
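A minimal sketch of both options, using synthetic data rather than the asker's df3 (the toy data frame, its columns and the simulated values are all assumptions for illustration). It shows that fixing the level order makes the Korean-label and English-label fits agree, and that relevel() moves the reference class of a default, alphabetically ordered factor.

```r
set.seed(42)

# Hypothetical stand-in data: four regions, Korean labels and matching English codes.
ko <- c("경북", "전남", "전북", "충북")
en <- c("KB", "JN", "JB", "CHB")  # same regions, same order as `ko`

toy <- data.frame(region_ko = rep(ko, each = 10), x = rnorm(40),
                  stringsAsFactors = FALSE)
toy$region_en <- en[match(toy$region_ko, ko)]
toy$y <- 3 + 2 * toy$x + match(toy$region_ko, ko) + rnorm(40)

# Option 1: build the factors with explicit, matching level orders.
toy$f_ko <- factor(toy$region_ko, levels = ko)
toy$f_en <- factor(toy$region_en, levels = en)
fit_ko <- lm(y ~ f_ko + x, data = toy)
fit_en <- lm(y ~ f_en + x, data = toy)
# Same estimates; only the coefficient names differ.
same_coefs <- all.equal(unname(coef(fit_ko)), unname(coef(fit_en)))
print(same_coefs)

# Option 2: start from the default (sorted) levels, then relevel.
toy$f_default <- factor(toy$region_en)         # levels: CHB, JB, JN, KB
fit_default <- lm(y ~ f_default + x, data = toy)
toy$f_kb <- relevel(toy$f_default, ref = "KB")  # KB becomes the reference
fit_relevel <- lm(y ~ f_kb + x, data = toy)

# The intercept now matches the fit whose reference class is 경북 (KB).
print(coef(fit_relevel)[["(Intercept)"]])
print(coef(fit_ko)[["(Intercept)"]])
```

Whichever reference class is chosen, the fitted values and the slope for x stay the same; only the intercept and the factor contrasts are re-expressed.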

MrFlick answered Apr 27 '23 14:04