Pandas pivot table ValueError: Index contains duplicate entries, cannot reshape

I have a dataframe as shown below (top 3 rows):

Sample_Name Sample_ID   Sample_Type IS  Component_Name  IS_Name Component_Group_Name    Outlier_Reasons Actual_Concentration    Area    Height  Retention_Time  Width_at_50_pct Used    Calculated_Concentration    Accuracy
Index                                                               
1   20170824_ELN147926_HexLacCer_Plasma_A-1-1   NaN Unknown True    GluCer(d18:1/12:0)_LCB_264.3    NaN NaN NaN 0.1 2.733532e+06    5.963840e+05    2.963911    0.068676    True    NaN NaN
2   20170824_ELN147926_HexLacCer_Plasma_A-1-1   NaN Unknown True    GluCer(d18:1/17:0)_LCB_264.3    NaN NaN NaN 0.1 2.945190e+06    5.597470e+05    2.745026    0.068086    True    NaN NaN
3   20170824_ELN147926_HexLacCer_Plasma_A-1-1   NaN Unknown False   GluCer(d18:1/16:0)_LCB_264.3    GluCer(d18:1/17:0)_LCB_264.3    NaN NaN NaN 3.993535e+06    8.912731e+05    2.791991    0.059864    True    125.927659773487    NaN

When trying to generate a pivot table:

pivoted_report_conc = raw_report.pivot(index = "Sample_Name", columns = 'Component_Name', values = "Calculated_Concentration")

I get the following error:

ValueError: Index contains duplicate entries, cannot reshape

I tried resetting the index but it did not help. I couldn't find any duplicate values in the "Index" column. Could someone please help identify the problem here?

The expected output would be a reshaped dataframe with only the unique component names as columns and respective concentrations for each sample name:

Sample_Name    GluCer(d18:1/12:0)_LCB_264.3    GluCer(d18:1/17:0)_LCB_264.3    GluCer(d18:1/16:0)_LCB_264.3
20170824_ELN147926_HexLacCer_Plasma_A-1-1    NaN    NaN    125.927659773487

To clarify, I am not looking to aggregate the data, just reshape it.

How do you solve "Index contains duplicate entries, cannot reshape"?

You can avoid this by keeping the default index (the row number) and, when setting the index from key columns such as "id", "date" and "location", passing append=True instead of relying on the default, which overwrites the existing index. Once this is done, the index still contains the default row number alongside the key columns, so every index entry is unique.
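
A minimal sketch of that idea, assuming a hypothetical frame in which "id", "date" and "location" repeat for some rows (the data and the names overwritten and appended are illustrative, not taken from the original post):

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 share the same id/date/location.
df = pd.DataFrame({
    "id": [1, 1, 2],
    "date": ["2017-08-24", "2017-08-24", "2017-08-25"],
    "location": ["A", "A", "B"],
    "value": [10, 20, 30],
})

keys = ["id", "date", "location"]

# Default behaviour: the key columns replace the row-number index.
overwritten = df.set_index(keys)

# append=True keeps the row number as the first index level.
appended = df.set_index(keys, append=True)

print(overwritten.index.is_unique)  # False
print(appended.index.is_unique)     # True
```

Keeping the row number in the index means a later unstack() leaves each original row on its own line, which may or may not be the shape you want.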

What is the difference between the pivot() and pivot_table() functions?

Basically, the pivot_table() function is a generalization of the pivot() function that allows aggregation of values, for example by passing len as the aggregation function to count the rows behind each cell. pivot() only works, or makes sense, if you need to reshape a table and show values without any aggregation, which requires each index/column pair to occur at most once.
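
The contrast can be sketched on a small hypothetical frame (the column names sample, component and conc are made up for illustration) in which the pair ("a", "x") occurs twice:

```python
import pandas as pd

df = pd.DataFrame({
    "sample": ["a", "a", "a", "b"],
    "component": ["x", "x", "y", "x"],
    "conc": [1.0, 3.0, 5.0, 7.0],
})

# pivot() has no way to combine the two ("a", "x") rows, so it raises.
try:
    df.pivot(index="sample", columns="component", values="conc")
    pivot_error = None
except ValueError as err:
    pivot_error = str(err)

# pivot_table() aggregates them instead (mean is the default aggfunc).
table = df.pivot_table(index="sample", columns="component",
                       values="conc", aggfunc="mean")

print(pivot_error)
print(table)
```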

How do you use pivot in pandas?

To use the pivot method in pandas, you need to specify three parameters. index: which column should be used to identify and order your rows vertically. columns: which column should be used to create the new columns in the reshaped DataFrame. values: which column supplies the values that fill the cells.
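
As a small illustration, assuming made-up data in which every (Sample_Name, Component_Name) pair appears exactly once, the three parameters map onto the question's columns like this:

```python
import pandas as pd

# Hypothetical long-format data with no repeated pairs.
long_df = pd.DataFrame({
    "Sample_Name": ["s1", "s1", "s2", "s2"],
    "Component_Name": ["x", "y", "x", "y"],
    "Calculated_Concentration": [1.0, 2.0, 3.0, 4.0],
})

# One row per sample, one column per component.
wide = long_df.pivot(index="Sample_Name",
                     columns="Component_Name",
                     values="Calculated_Concentration")
print(wide)
```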

You can use groupby() and unstack() to get around the error you're seeing with pivot().

Here's some example data, with a few edge cases added and some columns removed or their values simplified to keep the example minimal (MCVE):

# df
      Sample_Name  Sample_ID     IS Component_Name Calculated_Concentration Outlier_Reasons
Index                                                                    
1             foo        NaN   True              x                  NaN              NaN  
1             foo        NaN   True              y                  NaN              NaN 
2             foo        NaN   False             z            125.92766              NaN 
2             bar        NaN   False             x                 1.00              NaN  
2             bar        NaN   False             y                 2.00              NaN  
2             bar        NaN   False             z                  NaN              NaN  

(df.groupby(['Sample_Name','Component_Name'])
   .Calculated_Concentration
   .first()
   .unstack()
)

Output:

Component_Name    x   y          z
Sample_Name                       
bar             1.0 2.0        NaN
foo             NaN NaN  125.92766
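
To see which rows trigger the error in the first place, one option is to look for repeated (Sample_Name, Component_Name) pairs before reshaping. The sketch below uses hypothetical data, not the original report. Note that first() returns the first non-null value in each group, so any further values for a repeated pair are silently dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the pair ("foo", "x") appears twice.
df = pd.DataFrame({
    "Sample_Name": ["foo", "foo", "foo", "bar"],
    "Component_Name": ["x", "x", "y", "x"],
    "Calculated_Concentration": [np.nan, 125.9, 2.0, 1.0],
})

keys = ["Sample_Name", "Component_Name"]

# keep=False marks every row of a repeated pair, not just the later ones.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# first() takes the first non-null value per pair, then unstack() widens.
wide = df.groupby(keys)["Calculated_Concentration"].first().unstack()
print(wide)
```

If dupes comes back empty, pivot() should work on those two key columns as-is.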

You should be able to accomplish what you are looking to do by using the pandas.pivot_table() function, which is described in the pandas documentation.

With your dataframe stored as df use the following code:

import pandas as pd
df = pd.read_table('table_from_which_to_read')

new_df = pd.pivot_table(df, index=['Sample_Name'], columns='Component_Name', values='Calculated_Concentration')

By default pivot_table() takes the mean of the concentration values that share a sample and component. If you want something other than the mean, you will need to change the aggfunc parameter.
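
As a rough illustration of how aggfunc changes the result, here is a sketch on hypothetical data where sample "s1" has two values for component "x" (the helper reshape is only there to avoid repeating the call):

```python
import pandas as pd

df = pd.DataFrame({
    "Sample_Name": ["s1", "s1", "s2"],
    "Component_Name": ["x", "x", "x"],
    "Calculated_Concentration": [1.0, 3.0, 7.0],
})

def reshape(aggfunc):
    """Pivot the hypothetical frame with the given aggregation."""
    return pd.pivot_table(df,
                          index="Sample_Name",
                          columns="Component_Name",
                          values="Calculated_Concentration",
                          aggfunc=aggfunc)

by_mean = reshape("mean")    # average of the repeated values
by_max = reshape("max")      # largest of the repeated values
by_first = reshape("first")  # first non-null value, no arithmetic
```

Using "first" is the closest to a pure reshape, but it still discards the other values of a repeated pair.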

EDIT

Since you don't want to aggregate over the values, you can reshape the data by using the set_index method on your DataFrame (see the pandas documentation for DataFrame.set_index).

import pandas as pd
df = pd.DataFrame({'NonUniqueLabel':['Item1','Item1','Item1','Item2'],
     'SemiUniqueValue':['X','Y','Z','X'], 'Value':[1.0,100,5,None]})

new_df = df.set_index(['NonUniqueLabel','SemiUniqueValue'])

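The resulting table has a MultiIndex built from the two label columns. It is still in long form; to get one column per SemiUniqueValue, as in the expected output, call unstack() on the Value column, which works as long as each label pair is unique.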
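
A sketch of the full round trip, using the column names from the DataFrame definition above; the unstack() step is an addition for illustration and assumes each (NonUniqueLabel, SemiUniqueValue) pair occurs only once:

```python
import pandas as pd

df = pd.DataFrame({
    "NonUniqueLabel": ["Item1", "Item1", "Item1", "Item2"],
    "SemiUniqueValue": ["X", "Y", "Z", "X"],
    "Value": [1.0, 100, 5, None],
})

# Long form with a two-level index.
indexed = df.set_index(["NonUniqueLabel", "SemiUniqueValue"])

# Move the inner index level into columns; missing pairs become NaN.
wide = indexed["Value"].unstack()
print(wide)
```

If a label pair is repeated, unstack() raises the same "Index contains duplicate entries" error as pivot(), so the duplicates have to be dealt with first.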

Tags:

python

pandas

kkhatri99

2 Answers

andrew_reece

J-Eubanks
