Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Read specific columns with pandas or other python module

Tags:

python

pandas

csv

I have a csv file from this webpage. I want to read some of the columns in the downloaded file (the csv version can be downloaded in the upper right corner).

Let's say I want 2 columns:

  • 59 which in the header is star_name
  • 60 which in the header is ra.

However, for some reason the authors of the webpage sometimes decide to move the columns around.

In the end I want something like this, keeping in mind that values can be missing.

data = #read data in a clever way names = data['star_name'] ras = data['ra'] 

This will prevent my program to malfunction when the columns are changed again in the future, if they keep the name correct.

Until now I have tried various ways using the csv module and resently the pandas module. Both without any luck.

EDIT (added two lines + the header of my datafile. Sorry, but it's extremely long.)

# name, mass, mass_error_min, mass_error_max, radius, radius_error_min, radius_error_max, orbital_period, orbital_period_err_min, orbital_period_err_max, semi_major_axis, semi_major_axis_error_min, semi_major_axis_error_max, eccentricity, eccentricity_error_min, eccentricity_error_max, angular_distance, inclination, inclination_error_min, inclination_error_max, tzero_tr, tzero_tr_error_min, tzero_tr_error_max, tzero_tr_sec, tzero_tr_sec_error_min, tzero_tr_sec_error_max, lambda_angle, lambda_angle_error_min, lambda_angle_error_max, impact_parameter, impact_parameter_error_min, impact_parameter_error_max, tzero_vr, tzero_vr_error_min, tzero_vr_error_max, K, K_error_min, K_error_max, temp_calculated, temp_measured, hot_point_lon, albedo, albedo_error_min, albedo_error_max, log_g, publication_status, discovered, updated, omega, omega_error_min, omega_error_max, tperi, tperi_error_min, tperi_error_max, detection_type, mass_detection_type, radius_detection_type, alternate_names, molecules, star_name, ra, dec, mag_v, mag_i, mag_j, mag_h, mag_k, star_distance, star_metallicity, star_mass, star_radius, star_sp_type, star_age, star_teff, star_detected_disc, star_magnetic_field 11 Com b,19.4,1.5,1.5,,,,326.03,0.32,0.32,1.29,0.05,0.05,0.231,0.005,0.005,0.011664,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,2008,2011-12-23,94.8,1.5,1.5,2452899.6,1.6,1.6,Radial Velocity,,,,,11 Com,185.1791667,17.7927778,4.74,,,,,110.6,-0.35,2.7,19.0,G8 III,,4742.0,, 11 UMi b,10.5,2.47,2.47,,,,516.22,3.25,3.25,1.54,0.07,0.07,0.08,0.03,0.03,0.012887,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,2009,2009-08-13,117.63,21.06,21.06,2452861.05,2.06,2.06,Radial Velocity,,,,,11 UMi,229.275,71.8238889,5.02,,,,,119.5,0.04,1.8,24.08,K4III,1.56,4340.0,, 
like image 410
Daniel Thaagaard Andreasen Avatar asked Sep 26 '14 15:09

Daniel Thaagaard Andreasen


People also ask

How do I read certain columns in pandas?

This can be done with the help of the pandas. read_csv() method. We will pass the first parameter as the CSV file and the second parameter the list of specific columns in the keyword usecols. It will return the data of the CSV file of specific columns.


1 Answers

An easy way to do this is using the pandas library like this.

import pandas as pd fields = ['star_name', 'ra']  df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields) # See the keys print df.keys() # See content in 'star_name' print df.star_name 

The problem here was the skipinitialspace which remove the spaces in the header. So ' star_name' becomes 'star_name'

like image 82
Daniel Thaagaard Andreasen Avatar answered Sep 19 '22 00:09

Daniel Thaagaard Andreasen