Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Importing data from geonames.org database into MySQL DB

Tags:

mysql

geonames

Does anyone how to import a geonames.org data into my database? The one i'm trying to import is http://download.geonames.org/export/dump/DO.zip, and my DB its a MySQL db.

like image 590
Luis D Urraca Avatar asked Mar 13 '11 02:03

Luis D Urraca


4 Answers

I found the following by looking in the readme file included in the zip file you linked to in the section called "The main 'GeoName' table has the following fields:"

First create the database and table on your MySQL instance. The type of fields are given in each row of the section I just quoted the title of above.

CREATE DATABASE DO_test;
CREATE TABLE `DO_test`.`DO_table` (
  `geonameid` INT,
  `name` varchar(200),
  `asciiname` varchar(200),
  `alternatenames` varchar(5000),
  `latitude` DECIMAL(10,7),
  `longitude` DECIMAL(10,7),
  `feature class` char(1),
  `feature code` varchar(10),
  `country code` char(2),
  `cc2` char(60),
  `admin1 code` varchar(20),
  `admin2 code` varchar(80),
  `admin3 code` varchar(20),
  `admin4 code` varchar(20),
  `population` bigint,
  `elevation` INT,
  `gtopo30` INT,
  `timezone` varchar(100),
  `modification date` date
)
CHARACTER SET utf8;

After the table is created you can import the data from the file. The fields are delimited by tabs, rows as newlines:

LOAD DATA INFILE '/path/to/your/file/DO.txt' INTO TABLE `DO_test`.`DO_table`;
like image 163
wilbbe01 Avatar answered Nov 18 '22 22:11

wilbbe01


I have made recently a shell script that downloads the latest data from geonames site and imports them into a MySQL database. It is based on the knowledge at GeoNames Forum and saved me a lot of time.

It is in its first version but is fully functional. Maybe it can help.

You can access it at http://codigofuerte.github.com/GeoNames-MySQL-DataImport/

like image 36
Sergio Cruz Avatar answered Nov 18 '22 22:11

Sergio Cruz


For every one in the future : On geonames.org forum in the year 2008, this is "import all geonames dump into MySQL" http://forum.geonames.org/gforum/posts/list/732.page

Also google this : import dump into [postgresql OR SQL server OR MySQL] site:forum.geonames.org

To find more answers even from the year 2006

Edited to provide a synopsis:

In the geoname official read me : http://download.geonames.org/export/dump/. We will find a good description about the dump files and contents of them.

Dump files will be imported to the MySQL datatable directly. for example :

SET character_set_database=utf8;
LOAD DATA INFILE '/home/data/countryInfo.txt' INTO TABLE _geo_countries IGNORE 51 LINES(ISO2,ISO3,ISO_Numeric,FIPSCode,AsciiName,Capital,Area_SqKm,Population,ContinentCode,TLD,CurrencyCode,CurrencyName,PhoneCodes,PostalCodeFormats,PostalCodeRegex,Languages,GeonameID,Neighbours,EquivalentFIPSCodes);
SET character_set_database=default;

be careful about the characterset because if we use the CSV LOAD DATA ready importer of an old phpmyadmin of 2012 we may lose the utf characters even if the collation of columns was set to utf8_general_ci

Currently there are 4 essential datatables : continents, countries(countryInfo.txt), divisions(admin1), cities or locations(geonames)

admin1, 2, 3, 4 dump files are the different levels of internal divisions of countries such as admin 1 which is the states of US or provinces of other countries. admin 2 is more detailed and is the internal divisions of the state or the province. and so on for the 3 and 4

The countries dump files have been listed there contain not only cities but all the locatoins in that country even including a store center. Also there is a huge file as "allCountries.txt" will be more than 1GB after extracting from zip file. If we want only the cities we should choose one of the dump files : cities1000.txt , cities5000.txt , cities15000.txt which the numbers represent the min population of the listed cities. We store cities in the geonames datatable(you may call it geo locations or geo cities).

Before importing *.txt dump files take a few research about the LOAD DATA syntax in the MySQL documentation.

The read me text file(also in the footer of dump page) provides enough description for example :

The main 'geoname' table has the following fields :
---------------------------------------------------
geonameid         : integer id of record in geonames database
name              : name of geographical point (utf8) varchar(200)
asciiname         : name of geographical point in plain ascii characters, varchar(200)
alternatenames    : alternatenames, comma separated varchar(5000)
latitude          : latitude in decimal degrees (wgs84)
longitude         : longitude in decimal degrees (wgs84)
feature class     : see http://www.geonames.org/export/codes.html, char(1)
feature code      : see http://www.geonames.org/export/codes.html, varchar(10)
country code      : ISO-3166 2-letter country code, 2 characters
cc2               : alternate country codes, comma separated, ISO-3166 2-letter country code, 60 characters
admin1 code       : fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20)
admin2 code       : code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80) 
admin3 code       : code for third level administrative division, varchar(20)
admin4 code       : code for fourth level administrative division, varchar(20)
population        : bigint (8 byte int) 
elevation         : in meters, integer
dem               : digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90mx90m) or 30''x30'' (ca 900mx900m) area in meters, integer. srtm processed by cgiar/ciat.
timezone          : the timezone id (see file timeZone.txt) varchar(40)
modification date : date of last modification in yyyy-MM-dd format

Also about the varchar(5000) we should know about the 64kb size of each row in MySQL 5.0 or later: Is a VARCHAR(20000) valid in MySQL?

like image 35
Alix Avatar answered Nov 18 '22 20:11

Alix


This is my note after I imported successfully.

As of writing I was testing with MySQL 5.7.16 on Windows 7. Follow these steps to import:

  1. Download desired data file from the official download page. In my case I chose cities1000.zip because it's much smaller in size (21MB) than the all-inclusive allcountries.zip (1.4GB).

  2. Create the following schema and table according to readme.txt on the download page, where the fields are specified below the text "the main 'geoname' table has the following fields".

    CREATE SCHEMA geonames DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    
    CREATE TABLE geonames.cities1000 (
     id                INT,
     name              VARCHAR(200),
     ascii_name        VARCHAR(200),
     alternate_names   VARCHAR(10000) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
     latitude          DECIMAL(10, 7),
     longitude         DECIMAL(10, 7),
     feature_class     CHAR(1),
     feature_code      VARCHAR(10),
     country_code      CHAR(2),
     cc2               CHAR(60),
     admin1_code       VARCHAR(20),
     admin2_code       VARCHAR(80),
     admin3_code       VARCHAR(20),
     admin4_code       VARCHAR(20),
     population        BIGINT,
     elevation         INT,
     dem               INT,
     timezone          VARCHAR(100),
     modification_date DATE
    )
     CHARACTER SET utf8;
    

    Field names are arbitrary as long as the column size and field types are the same as specified. alternate_names are specially defined with the character set utf8mb4 because the values for this column in the file contain 4-byte unicode characters which are not supported by the character set utf8 of MySQL.

  3. Check the values of these parameters: character_set_client, character_set_results, character_set_connection. 7

    SHOW VARIABLES LIKE '%char%';
    

    If they are not utf8mb4, then change them:

    SET character_set_client = utf8mb4;
    SET character_set_results = utf8mb4;
    SET character_set_connection = utf8mb4;
    
  4. Import data from file using LOAD DATA INFILE ...

    USE geonames;
    
    LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 5.7\\Uploads\\cities1000.txt' INTO TABLE cities1000
    CHARACTER SET utf8mb4 (id, name, ascii_name, alternate_names, latitude, longitude, feature_class, feature_code,
                           country_code, cc2, admin1_code, admin2_code, admin3_code, admin4_code, population, @val1,
                           @val2, timezone, modification_date)
    SET elevation = if(@val1 = '', NULL, @val1), dem = if(@val2 = '', NULL, @val2);
    

    Explanation for the statement:

    • The file should be placed under a designated location by MySQL for importing data from files. You can check the location with SHOW VARIABLES LIKE 'secure_file_priv';. In my case it's C:\ProgramData\MySQL\MySQL Server 5.7\Uploads. In Windows you need to use double slashes to represent one slash in the path. This error would be shown when the path is not given correctly: [HY000][1290] The MySQL server is running with the --secure-file-priv option so it cannot execute this statement.

    • With CHARACTER SET utf8mb4 you're telling MySQL what encoding to expect from the file. When this is not given explicitly, or the column encoding is not utf8mb4, an error prompt like this will be seen: [HY000][1300] Invalid utf8 character string: 'Gorad Safija,SOF,Serdica,Sofi,Sofia,Sofiae,Sofie,Sofii,Sofij,Sof'. 5 In my case I found it's due to the existence of Gothic letters in the alternate names, such as πƒπ‰π†πŒΉπŒ° (id 727011), πŒΊπŒΏπ‚πŒΉπ„πŒΉπŒ±πŒ° (id 3464975), and πŒΊπ‰πŒ½πŒΈπŒ΄π€πŒΈπŒΉπ‰πŒ½ (id 3893894). These letters need to be stored as 4-byte characters (utf8mb4) while my then encoding was utf8 which only supports up to 3-byte characters. 6 You can change column encoding after the table is created:

      ALTER TABLE cities1000 MODIFY alternate_names VARCHAR(10000) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
      

      To check the encoding of a column:

      SELECT character_set_name, COLLATION_NAME FROM information_schema.COLUMNS WHERE table_schema = 'geonames' AND table_name = 'cities1000' AND column_name = 'alternate_names';
      

      To test if the characters can be stored:

      UPDATE cities1000 SET alternate_names = 'πƒπ‰π†πŒΉπŒ°' WHERE id = 1;
      
    • Values for some columns need to be "improved" before they are inserted, such as elevation and dem. They are of type INT and values for them from the file could be empty strings, which can't be stored by an INT type column. So you need to convert those empty strings to null for those columns. The latter part of the statement just serves this purpose. This error would be shown when the values are not property converted first: [HY000][1366] Incorrect integer value: '' for column 'elevation' at row 1. 3, 4

References

  1. http://www.geonames.org/
  2. http://download.geonames.org/export/dump/
  3. https://dev.mysql.com/doc/refman/8.0/en/load-data.html
  4. https://dba.stackexchange.com/a/111044/94778
  5. https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-conversion.html
  6. https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
  7. https://stackoverflow.com/a/35156926/4357087
  8. https://dev.mysql.com/doc/refman/5.7/en/charset-connection.html
like image 30
Ivan Huang Avatar answered Nov 18 '22 21:11

Ivan Huang