Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to read CSV file from GitHub using pandas

Tags:

python

pandas

csv

Im trying to read CSV file thats on github with Python using pandas> i have looked all over the web, and I tried some solution that I found on this website, but they do not work. What am I doing wrong?

I have tried this:

import pandas as pd

url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv'
df = pd.read_csv(url,index_col=0)
#df = pd.read_csv(url)

print(df.head(5))
like image 372
taga Avatar asked Mar 19 '19 11:03

taga


3 Answers

You should provide URL to raw content. Try using this:

import pandas as pd

url = 'https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv'
df = pd.read_csv(url, index_col=0)
print(df.head(5))

Output:

               alpha-2           ...            intermediate-region-code
name                             ...                                    
Afghanistan         AF           ...                                 NaN
Åland Islands       AX           ...                                 NaN
Albania             AL           ...                                 NaN
Algeria             DZ           ...                                 NaN
American Samoa      AS           ...                                 NaN
like image 151
Alderven Avatar answered Oct 19 '22 05:10

Alderven


Add ?raw=true at the end of the GitHub URL to get the raw file link.

In your case,

import pandas as pd

url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv?raw=true'
df = pd.read_csv(url,index_col=0)
print(df.head(5))

Output:

               alpha-2 alpha-3  country-code     iso_3166-2   region  \
name                                                                   
Afghanistan         AF     AFG             4  ISO 3166-2:AF     Asia   
Åland Islands       AX     ALA           248  ISO 3166-2:AX   Europe   
Albania             AL     ALB             8  ISO 3166-2:AL   Europe   
Algeria             DZ     DZA            12  ISO 3166-2:DZ   Africa   
American Samoa      AS     ASM            16  ISO 3166-2:AS  Oceania   

                     sub-region intermediate-region  region-code  \
name                                                               
Afghanistan       Southern Asia                 NaN        142.0   
Åland Islands   Northern Europe                 NaN        150.0   
Albania         Southern Europe                 NaN        150.0   
Algeria         Northern Africa                 NaN          2.0   
American Samoa        Polynesia                 NaN          9.0   

                sub-region-code  intermediate-region-code  
name                                                       
Afghanistan                34.0                       NaN  
Åland Islands             154.0                       NaN  
Albania                    39.0                       NaN  
Algeria                    15.0                       NaN  
American Samoa             61.0                       NaN 

Note: This works only with GitHub links and not with GitLab or Bitbucket links.

like image 17
Krishnakanth Allika Avatar answered Oct 19 '22 05:10

Krishnakanth Allika


You can copy/paste the url and change 2 things:

  1. Remove "blob"
  2. Replace github.com by raw.githubusercontent.com

For instance this link:

https://github.com/mwaskom/seaborn-data/blob/master/iris.csv

Works this way:

import pandas as pd

pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
like image 3
Nicolas Gervais Avatar answered Oct 19 '22 07:10

Nicolas Gervais