Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Trying to rearrange two dimensional list into a different two dimensional list

Given an input of something like:

"Date 3" "Location A" "some data" 
"Date 3" "Location B" "some data" 
"Date 3" "Location C" "some data" 
"Date 2" "Location A" "some data" 
"Date 2" "Location B" "some data" 
"Date 1" "Location A" "some data" 
"Date 1" "Location C" "some data" 

I want to arrange it into columns (ultimately to put it into a spreadsheet) like so:

        Location A    Location B    Location C
Date 3  some data     some data     some data
Date 2  some data     some data     None
Date 1  some data     None          some data

Using the following code, I got it to work when I separated the the date into "Month" and "Day", and treated the Date as an integer, but after a month, they same Day integer is used so it writes over it.

    log = [["Location A", "somedata", 3, "Month"],["Location B", "somedata", 3, "Month"],
       ["Location C", "somedata", 3, "Month"],["Location A", "somedata", 2, "Month"],
       ["Location B", "somedata", 2, "Month"],["Location A", "somedata", 1, "Month"],
       ["Location C","somedata",1,"Month"]]

    locations = ["Location A","Location B","Location C"]

    location = locations
    days = []

    for location, time, day, month in log: 

        for i in range(len(days),day): 
            days.append([i+1] + [None for x in locations])

        days[day - 1][1 + locations.index(location)] = time
        days[day - 1][0] =  month + " " + str(day) # I just hack the date together here

    days = [i for i in days if i.count(None) < len(locations)]

    locations.insert(0,"Date")
    days.insert(0,locations)

    days = list(zip(*days))  

Which will give me (correctly)

['Date', 'Location A', 'Location B', 'Location C']
['Month 1', 'somedata', None, 'somedata']
['Month 2', 'somedata', 'somedata', None]
['Month 3', 'somedata', 'somedata', 'somedata']

But I want to keep the date together as one string, and move to the next column every time the string changes, not use the day as an integer.

locations = ["A","B","C"]

log = [ ["Date 2", "A", "Time"],["Date 2", "B", "Time"],["Date 2", "C", "Time"],
        ["Date 1", "A", "Time"],["Date 1", "B", "Time"],["Date 1", "C", "Time"] ]
out = []
j   = 0

for index, day in enumerate(log):

     date, location, time = day

     out.append([date] + [None for x in locations])

     if(log[index][0] != log[index-1][0] and index != 0):
          j += 1

     out[j][1 + locations.index(location)] = location

Using something like this, I can get:

['Date 2', 'A', None, 'C']
['Date 2', 'A', 'B', 'C']
['Date 1', None, None, None]
['Date 1', None, None, None]
['Date 1', None, None, None]

But it fills up too many columns with None, so the data doesn't correspond to the date.

Anyone have any ideas? I'm a beginner and I'm using Python 3.3

Thank you very much in advance.

like image 681
Charles Clayton Avatar asked Nov 26 '25 07:11

Charles Clayton


1 Answers

[Community wiki, because it's really a suggestion for a different approach.]

That operation is often called "pivoting". Libraries like pandas make this very simple, and if you're writing code to do intermediate work for later spreadsheet processing, it can come in very handy.

Something like

import pandas as pd
df = pd.read_csv("source.dat", delim_whitespace=True, header=None)
pivoted = df.pivot(index=0, columns=1, values=2)
pivoted = pivoted.fillna("None")
pivoted.index.name = ""
pivoted.to_csv("final.csv")

produces

>>> !cat final.csv
,Location A,Location B,Location C
Date 1,some data,None,some data
Date 2,some data,some data,None
Date 3,some data,some data,some data

[I should mention that many spreadsheet programs, including the world's most common one, can also do this natively.]


Step-by-step:

First, read the file into a DataFrame (like a spreadsheet page):

>>> df = pd.read_csv("source.dat", delim_whitespace=True, header=None)
>>> df
        0           1          2
0  Date 3  Location A  some data
1  Date 3  Location B  some data
2  Date 3  Location C  some data
3  Date 2  Location A  some data
4  Date 2  Location B  some data
5  Date 1  Location A  some data
6  Date 1  Location C  some data

[7 rows x 3 columns]

Then use the pivot method to reshape it:

>>> pivoted = df.pivot(index=0, columns=1, values=2)
>>> pivoted
1      Location A Location B Location C
0                                      
Date 1  some data        NaN  some data
Date 2  some data  some data        NaN
Date 3  some data  some data  some data

[3 rows x 3 columns]

pandas uses NaN for missing values, but we can make that "None" if you prefer:

>>> pivoted = pivoted.fillna("None")
>>> pivoted
1      Location A Location B Location C
0                                      
Date 1  some data       None  some data
Date 2  some data  some data       None
Date 3  some data  some data  some data

[3 rows x 3 columns]

You don't seem to want a named index, so let's get rid of it:

>>> pivoted.index.name = ""
>>> pivoted
1      Location A Location B Location C

Date 1  some data       None  some data
Date 2  some data  some data       None
Date 3  some data  some data  some data

[3 rows x 3 columns]

and then we can use to_csv to write it out. (We could also write it directly into an Excel-format workbook if we wanted.)

like image 177
2 revsDSM Avatar answered Nov 28 '25 20:11

2 revsDSM



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!