
ValueError: cannot reindex from a duplicate axis using isin with pandas

I am trying to sort zip codes into various files, but I keep getting

ValueError: cannot reindex from a duplicate axis

I've read through other answers on Stack Overflow, but I haven't been able to figure out why it's duplicating an axis.

import csv
import pandas as pd
from pandas import DataFrame as df
fp = '/Users/User/Development/zipcodes/file.csv'
file1 = open(fp, 'rb').read()
df = pd.read_csv(fp, sep=',')

df = df[['VIN', 'Reg Name', 'Reg Address', 'Reg City', 'Reg ST', 'ZIP',
         'ZIP', 'Catagory', 'Phone', 'First Name', 'Last Name', 'Reg NFS',
         'MGVW', 'Make', 'Veh Model','E Mfr', 'Engine Model', 'CY2010',
         'CY2011', 'CY2012', 'CY2013', 'CY2014', 'CY2015', 'Std Cnt', 
        ]]
#reader.head(1)
df.head(1)
zipBlue = [65355, 65350, 65345, 65326, 65335, 64788, 64780, 64777, 64743,
64742, 64739, 64735, 64723, 64722, 64720]

The script also defines zipGreen, zipRed, zipYellow, and zipLightBlue, which I did not include in this example.

def IsInSort():
    blue = df[df.ZIP.isin(zipBlue)]
    green = df[df.ZIP.isin(zipGreen)]
    red = df[df.ZIP.isin(zipRed)]
    yellow = df[df.ZIP.isin(zipYellow)]
    LightBlue = df[df.ZIP.isin(zipLightBlue)]
def SaveSortedZips():
    blue.to_csv('sortedBlue.csv')
    green.to_csv('sortedGreen.csv')
    red.to_csv('sortedRed.csv')
    yellow.to_csv('sortedYellow.csv')
    LightBlue.to_csv('SortedLightBlue.csv')
IsInSort()
SaveSortedZips()

   1864         # trying to reindex on an axis with duplicates
   1865         if not self.is_unique and len(indexer):
-> 1866             raise ValueError("cannot reindex from a duplicate axis")
   1867
   1868     def reindex(self, target, method=None, level=None, limit=None):

ValueError: cannot reindex from a duplicate axis

icomefromchaos asked Dec 19 '22 03:12



1 Answer

I'm pretty sure your problem is caused by your column selection:

df = df[['VIN', 'Reg Name', 'Reg Address', 'Reg City', 'Reg ST', 'ZIP',
         'ZIP', 'Catagory', 'Phone', 'First Name', 'Last Name', 'Reg NFS',
         'MGVW', 'Make', 'Veh Model','E Mfr', 'Engine Model', 'CY2010',
         'CY2011', 'CY2012', 'CY2013', 'CY2014', 'CY2015', 'Std Cnt', 
        ]]

'ZIP' is in there twice. Removing one of them should solve the problem.
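As a minimal sketch of how to spot and remove the duplicate, assuming a tiny made-up frame in place of the real CSV (the values and the names raw, dup, clean, and zip_blue are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical miniature of the registration data; the values are made up.
raw = pd.DataFrame({
    "VIN": ["A1", "B2", "C3"],
    "ZIP": [65355, 64788, 10001],
})

# Selecting 'ZIP' twice yields two columns that share one label.
dup = raw[["VIN", "ZIP", "ZIP"]]
duplicated_labels = dup.columns[dup.columns.duplicated()].tolist()
print(duplicated_labels)  # ['ZIP']

# With a duplicated label, dup["ZIP"] is a DataFrame rather than a Series,
# so isin() would produce a frame-shaped mask instead of a row mask.
print(type(dup["ZIP"]).__name__)  # DataFrame

# Keep only the first occurrence of each column label.
clean = dup.loc[:, ~dup.columns.duplicated()]

# Row filtering now behaves as intended.
zip_blue = [65355, 65350, 64788]
blue = clean[clean["ZIP"].isin(zip_blue)]
print(blue["VIN"].tolist())  # ['A1', 'B2']
```

In the question's code the simpler fix is to delete the second 'ZIP' from the list; the columns.duplicated() check is mainly useful when the duplicate is harder to see.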

The error ValueError: cannot reindex from a duplicate axis is one of those very cryptic pandas errors that does not tell you what actually went wrong.

It is often caused by two columns having the same name, either before the operation or in an intermediate result that pandas builds internally during it.
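A small illustration of duplicate labels appearing as the result of an operation, assuming two made-up frames (left, right, and their values are hypothetical). The exact wording of the message depends on the pandas version, so the sketch only relies on a ValueError being raised:

```python
import pandas as pd

# Two hypothetical frames that both carry a 'ZIP' column.
left = pd.DataFrame({"VIN": ["A1", "B2"], "ZIP": [65355, 64788]})
right = pd.DataFrame({"ZIP": [65355, 64788], "Make": ["X", "Y"]})

# A column-wise concat keeps both 'ZIP' columns, so labels are no longer unique.
combined = pd.concat([left, right], axis=1)
print(combined.columns.is_unique)  # False

# Reindexing along the duplicated axis is what pandas refuses to do.
try:
    combined.reindex(columns=["VIN", "ZIP", "Make"])
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

Checking columns.is_unique (and index.is_unique) right before the failing line is a quick way to find out which axis carries the duplicate.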

firelynx answered Dec 24 '22 03:12
