
Download a .txt file from a URL into a pandas DataFrame

I am having trouble downloading the data from this URL and storing it in a pandas DataFrame. Can anyone help?

url ='http://www2.conectiv.com/cpd/tps/archives/nj/2017/12/20171205NJA1.txt'

I need to store each segment as a row, with the corresponding numbers in separate columns, in this format:

NJAAP, 12/5/2017, 37.63, 36.34, 35.97,..., 38.52
NJAAS, 12/5/2017, 37.63, 36.34, ...        etc

I tried the following method:

import pandas as pd
from urllib.request import urlopen

df = pd.read_csv(url, skiprows=4) 

But I am not getting what I wanted. I am getting this instead:

Segment:NJAAP 12/05/2017 37.63 36.34 35.97 35.76 36.71 39.90 46.36 52.49 56.16 58.41 58.98 59.60 59.58 58.52 57.40 54.34 53.90 53.15 51.44 49.49 46.96 44.12 41.02 38.52
0   Segment:NJAAS 12/05/2017 ...
1   Segment:NJADC 12/05/2017 ...
2   Segment:NJAGN 12/05/2017 ...
3   Segment:NJAGT 12/05/2017 ...

Can someone please help? Thanks

turtle_in_mind asked Jan 30 '23 01:01



1 Answer

read_csv() has options that handle this:

  • header=None - the first row is treated as data, not as column headers.
  • sep='\s+' - columns are split on runs of whitespace instead of a comma. The value is a regular expression.

import pandas as pd

url = 'http://www2.conectiv.com/cpd/tps/archives/nj/2017/12/20171205NJA1.txt'
# raw string avoids Python's invalid-escape warning for \s
df = pd.read_csv(url, skiprows=4, header=None, sep=r'\s+')
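
To try these options without network access, the same call can be run against an inline string. This is only a sketch: the four header lines and the shortened rows below are a hypothetical stand-in for the real file, which has 24 value columns per row.

```python
import io
import pandas as pd

# Hypothetical stand-in for the file: four header lines, then data rows.
# Only three of the 24 value columns are kept, with illustrative values.
sample = """\
header line 1
header line 2
header line 3
header line 4
Segment:NJAAP 12/05/2017 37.63 36.34 35.97
Segment:NJAAS 12/05/2017 37.63 36.34 35.97
"""

# skiprows=4 drops the header lines, header=None keeps the first data row
# as data, and sep=r"\s+" splits each row on runs of whitespace.
df = pd.read_csv(io.StringIO(sample), skiprows=4, header=None, sep=r"\s+")
print(df.shape)
print(df)
```

With header=None the columns are labelled 0, 1, 2, ..., so the segment name is in column 0 and the date in column 1.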

After loading the data, you can change the values in its columns.

For example, this removes the Segment: prefix from the first column:

df[0] = df[0].str.replace('Segment:', '')
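
If the goal is the layout from the question (segment, date, then the values), the cleanup might be extended along these lines. This is a sketch under assumptions: the small frame stands in for the result of read_csv, and the column names segment, date and h1, h2, ... are made up for illustration.

```python
import pandas as pd

# Hypothetical frame shaped like the read_csv result (3 of 24 value columns).
df = pd.DataFrame({
    0: ["Segment:NJAAP", "Segment:NJAAS"],
    1: ["12/05/2017", "12/05/2017"],
    2: [37.63, 37.63],
    3: [36.34, 36.34],
    4: [35.97, 35.97],
})

# Strip the literal prefix; regex=False treats the pattern as plain text.
df[0] = df[0].str.replace("Segment:", "", regex=False)

# Parse the date column (month/day/year, as in the question's output).
df[1] = pd.to_datetime(df[1], format="%m/%d/%Y")

# Assumed column names: "segment", "date", then one label per value column.
df.columns = ["segment", "date"] + [f"h{i}" for i in range(1, df.shape[1] - 1)]
print(df)
```

Parsing the date is optional; leaving column 1 as text keeps the values exactly as they appear in the file.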
furas answered Jan 31 '23 21:01
