Python Pandas: pivot only certain columns in the DataFrame while keeping others

Tags: python, pandas, pivot-table

I am trying to re-arrange a DataFrame that I automatically read in from a json using Pandas. I've searched but have had no success.

I have the following JSON (saved as a string for copy/paste convenience) with a list of JSON objects/dictionaries under the key 'values':

json_str = '''{"preferred_timestamp": "internal_timestamp",
    "internal_timestamp": 3606765503.684,
    "stream_name": "ctdpf_j_cspp_instrument",
    "values": [{
        "value_id": "temperature",
        "value": 9.8319
    }, {
        "value_id": "conductivity",
        "value": 3.58847
    }, {
        "value_id": "pressure",
        "value": 22.963
    }, {
        "value_id": "salinity",
        "value": 32.8947
    }]
}'''

I use the function 'json_normalize' in order to load the json into a flattened Pandas dataframe.

>>> from pandas.io.json import json_normalize
>>> import simplejson as json
>>> df = json_normalize(json.loads(json_str), 'values', ['preferred_timestamp', 'stream_name', 'internal_timestamp'])
>>> df
      value      value_id preferred_timestamp  internal_timestamp  \
0   9.83190   temperature  internal_timestamp        3.606766e+09   
1   3.58847  conductivity  internal_timestamp        3.606766e+09   
2  22.96300      pressure  internal_timestamp        3.606766e+09   
3  32.89470      salinity  internal_timestamp        3.606766e+09   

               stream_name  
0  ctdpf_j_cspp_instrument  
1  ctdpf_j_cspp_instrument  
2  ctdpf_j_cspp_instrument  
3  ctdpf_j_cspp_instrument

Here is where I am stuck. I want to take the value and value_id columns and pivot these into new columns based off of value_id.

I want the dataframe to look like the following:

stream_name              preferred_timestamp  internal_timestamp  conductivity  pressure  salinity  temperature    
ctdpf_j_cspp_instrument  internal_timestamp   3.606766e+09        3.58847       22.96300  32.89470  9.83190

I've tried both the pivot and pivot_table Pandas functions and even tried to manually pivot the tables by using 'set_index' and 'stack' but it's not quite how I want it.

>>> df.pivot_table(values='value', index=['stream_name', 'preferred_timestamp', 'internal_timestamp', 'value_id'])
stream_name              preferred_timestamp  internal_timestamp  value_id    
ctdpf_j_cspp_instrument  internal_timestamp   3.606766e+09        conductivity     3.58847
                                                                  pressure        22.96300
                                                                  salinity        32.89470
                                                                  temperature      9.83190
Name: value, dtype: float64

This is close, but it didn't seem to pivot the values in 'value_id' into separate columns.

I also tried pivot:

>>> df.pivot('stream_name', 'value_id', 'value')
value_id                 conductivity  pressure  salinity  temperature
stream_name                                                           
ctdpf_j_cspp_instrument       3.58847    22.963   32.8947       9.8319

Close again, but it lacks the other columns that I want to be associated with this line.

I'm stuck here. Is there an elegant way of doing this or should I split the DataFrames and re-merge them to how I want?

asked Mar 15 '16 18:03

naja

1 Answer

Your first attempt was nearly correct; just pass columns='value_id' instead of including 'value_id' in the index.

# Perform the pivot.
df = df.pivot_table(
    values='value',
    index=['stream_name', 'preferred_timestamp', 'internal_timestamp'],
    columns='value_id'
    )

# Formatting.
df.reset_index(inplace=True)
df.columns.name = None
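
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's data and applies the same pivot. It assumes a recent pandas where json_normalize is available as pd.json_normalize, and it uses the standard-library json module instead of simplejson. The names meta, long_df and wide_df are mine, not from the question.

```python
import json

import pandas as pd

json_str = '''{"preferred_timestamp": "internal_timestamp",
    "internal_timestamp": 3606765503.684,
    "stream_name": "ctdpf_j_cspp_instrument",
    "values": [
        {"value_id": "temperature", "value": 9.8319},
        {"value_id": "conductivity", "value": 3.58847},
        {"value_id": "pressure", "value": 22.963},
        {"value_id": "salinity", "value": 32.8947}
    ]
}'''

# Columns that should be kept as-is (hypothetical helper name).
meta = ['stream_name', 'preferred_timestamp', 'internal_timestamp']

# One row per entry in "values", with the meta fields repeated on each row.
long_df = pd.json_normalize(json.loads(json_str), 'values', meta)

# Keep the meta columns in the index and spread value_id across the columns,
# then turn the index back into ordinary columns and drop the 'value_id' label.
wide_df = (
    long_df.pivot_table(values='value', index=meta, columns='value_id')
    .reset_index()
    .rename_axis(columns=None)
)

print(wide_df)
```

Chaining reset_index() and rename_axis(columns=None) should be equivalent to the two formatting lines above; it just avoids modifying the frame in place.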

This isn't an issue in your example data, but keep in mind that pivot_table will aggregate values if multiple values are pivoted to the same position (taking the mean by default).
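
To make the aggregation caveat concrete, here is a small sketch with made-up data (dup_df and its values are hypothetical, not from the question) in which 'temperature' appears twice for the same stream. pivot_table averages the duplicates unless you pass a different aggfunc, whereas pivot refuses to reshape them.

```python
import pandas as pd

# Hypothetical data: two temperature readings land in the same cell.
dup_df = pd.DataFrame({
    'stream_name': ['s1', 's1', 's1'],
    'value_id': ['temperature', 'temperature', 'pressure'],
    'value': [10.0, 12.0, 22.0],
})

# Default behaviour: duplicates are combined with the mean.
averaged = dup_df.pivot_table(
    values='value', index='stream_name', columns='value_id'
)

# Pick a different aggregation explicitly, e.g. keep the last reading.
latest = dup_df.pivot_table(
    values='value', index='stream_name', columns='value_id', aggfunc='last'
)

# pivot does no aggregation and raises on duplicate index/column pairs.
try:
    dup_df.pivot(index='stream_name', columns='value_id', values='value')
    pivot_raised = False
except ValueError:
    pivot_raised = True

print(averaged)
print(latest)
print(pivot_raised)
```

If silently averaged values would be a problem for your data, checking for duplicates first (or using pivot so that they raise) is probably the safer route.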

answered Oct 05 '22 06:10

root
