Given an input of something like:
"Date 3" "Location A" "some data"
"Date 3" "Location B" "some data"
"Date 3" "Location C" "some data"
"Date 2" "Location A" "some data"
"Date 2" "Location B" "some data"
"Date 1" "Location A" "some data"
"Date 1" "Location C" "some data"
I want to arrange it into columns (ultimately to put it into a spreadsheet) like so:
Location A Location B Location C
Date 3 some data some data some data
Date 2 some data some data None
Date 1 some data None some data
Using the following code, I got it to work when I separated the the date into "Month" and "Day", and treated the Date as an integer, but after a month, they same Day integer is used so it writes over it.
log = [["Location A", "somedata", 3, "Month"],["Location B", "somedata", 3, "Month"],
["Location C", "somedata", 3, "Month"],["Location A", "somedata", 2, "Month"],
["Location B", "somedata", 2, "Month"],["Location A", "somedata", 1, "Month"],
["Location C","somedata",1,"Month"]]
locations = ["Location A","Location B","Location C"]
location = locations
days = []
for location, time, day, month in log:
for i in range(len(days),day):
days.append([i+1] + [None for x in locations])
days[day - 1][1 + locations.index(location)] = time
days[day - 1][0] = month + " " + str(day) # I just hack the date together here
days = [i for i in days if i.count(None) < len(locations)]
locations.insert(0,"Date")
days.insert(0,locations)
days = list(zip(*days))
Which will give me (correctly)
['Date', 'Location A', 'Location B', 'Location C']
['Month 1', 'somedata', None, 'somedata']
['Month 2', 'somedata', 'somedata', None]
['Month 3', 'somedata', 'somedata', 'somedata']
But I want to keep the date together as one string, and move to the next column every time the string changes, not use the day as an integer.
locations = ["A","B","C"]
log = [ ["Date 2", "A", "Time"],["Date 2", "B", "Time"],["Date 2", "C", "Time"],
["Date 1", "A", "Time"],["Date 1", "B", "Time"],["Date 1", "C", "Time"] ]
out = []
j = 0
for index, day in enumerate(log):
date, location, time = day
out.append([date] + [None for x in locations])
if(log[index][0] != log[index-1][0] and index != 0):
j += 1
out[j][1 + locations.index(location)] = location
Using something like this, I can get:
['Date 2', 'A', None, 'C']
['Date 2', 'A', 'B', 'C']
['Date 1', None, None, None]
['Date 1', None, None, None]
['Date 1', None, None, None]
But it fills up too many columns with None, so the data doesn't correspond to the date.
Anyone have any ideas? I'm a beginner and I'm using Python 3.3
Thank you very much in advance.
[Community wiki, because it's really a suggestion for a different approach.]
That operation is often called "pivoting". Libraries like pandas make this very simple, and if you're writing code to do intermediate work for later spreadsheet processing, it can come in very handy.
Something like
import pandas as pd
df = pd.read_csv("source.dat", delim_whitespace=True, header=None)
pivoted = df.pivot(index=0, columns=1, values=2)
pivoted = pivoted.fillna("None")
pivoted.index.name = ""
pivoted.to_csv("final.csv")
produces
>>> !cat final.csv
,Location A,Location B,Location C
Date 1,some data,None,some data
Date 2,some data,some data,None
Date 3,some data,some data,some data
[I should mention that many spreadsheet programs, including the world's most common one, can also do this natively.]
Step-by-step:
First, read the file into a DataFrame (like a spreadsheet page):
>>> df = pd.read_csv("source.dat", delim_whitespace=True, header=None)
>>> df
0 1 2
0 Date 3 Location A some data
1 Date 3 Location B some data
2 Date 3 Location C some data
3 Date 2 Location A some data
4 Date 2 Location B some data
5 Date 1 Location A some data
6 Date 1 Location C some data
[7 rows x 3 columns]
Then use the pivot method to reshape it:
>>> pivoted = df.pivot(index=0, columns=1, values=2)
>>> pivoted
1 Location A Location B Location C
0
Date 1 some data NaN some data
Date 2 some data some data NaN
Date 3 some data some data some data
[3 rows x 3 columns]
pandas uses NaN for missing values, but we can make that "None" if you prefer:
>>> pivoted = pivoted.fillna("None")
>>> pivoted
1 Location A Location B Location C
0
Date 1 some data None some data
Date 2 some data some data None
Date 3 some data some data some data
[3 rows x 3 columns]
You don't seem to want a named index, so let's get rid of it:
>>> pivoted.index.name = ""
>>> pivoted
1 Location A Location B Location C
Date 1 some data None some data
Date 2 some data some data None
Date 3 some data some data some data
[3 rows x 3 columns]
and then we can use to_csv to write it out. (We could also write it directly into an Excel-format workbook if we wanted.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With