
ValueError: Length of values does not match length of index | Pandas DataFrame.unique()

I am trying to build a new dataset, or overwrite the columns of the current one, so that each column holds only its unique values. Here is an example of what I am trying to get:

        A B
       -----
    0|  1 1
    1|  2 5
    2|  1 5
    3|  7 9
    4|  7 9
    5|  8 9

    Wanted Result    Not Wanted Result
        A B              A B
       -----            -----
    0|  1 1          0|  1 1
    1|  2 5          1|  2 5
    2|  7 9          2|
    3|  8            3|  7 9
                     4|
                     5|  8

I don't really care about the index, but it seems to be the problem. My code so far is pretty simple. I tried two approaches: one that builds a new DataFrame and one that doesn't.

    # With a new DataFrame
    def UniqueResults(dataframe):
        df = pd.DataFrame()
        for col in dataframe:
            S = pd.Series(dataframe[col].unique())
            df[col] = S.values
        return df

    # Without a new DataFrame
    def UniqueResults(dataframe):
        for col in dataframe:
            dataframe[col] = dataframe[col].unique()
        return dataframe

I get the error "Length of values does not match length of index" both times.

Mayeul sgc asked Feb 22 '17 03:02


1 Answer

The error comes up when you try to assign a list or NumPy array to a data frame column and its length differs from the number of rows. It can be reproduced as follows:

A data frame of four rows:

df = pd.DataFrame({'A': [1,2,3,4]}) 

Now trying to assign a list/array of two elements to it:

df['B'] = [3,4]   # or df['B'] = np.array([3,4]) 

Both raise:

ValueError: Length of values does not match length of index

This is because the data frame has four rows, but the list or array has only two elements.
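As a minimal sketch of the reproduction above, the snippet below compares the two lengths before assigning and catches the resulting exception. The variable names `values`, `lengths_match` and `error_message` are only illustrative, and the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4]})
values = [3, 4]

# Comparing the lengths up front makes the mismatch explicit.
lengths_match = len(values) == len(df.index)

try:
    # A plain list carries no index, so pandas requires one value per row.
    df["B"] = values
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(lengths_match)
print(error_message)
```

Checking the lengths first is usually clearer than relying on the exception, but the `try`/`except` shows where the error in the question originates.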

Workaround (use with caution): convert the list/array to a pandas Series first. On assignment, pandas aligns the Series with the data frame's index, and any index labels missing from the Series are filled with NaN:

    df['B'] = pd.Series([3,4])

    df
    #    A    B
    # 0  1  3.0
    # 1  2  4.0
    # 2  3  NaN   # NaN because the value at index 2 and 3 doesn't exist in the Series
    # 3  4  NaN
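One reason for the "use with caution" note is that the Series is matched to the data frame by index label, not by position. The sketch below assumes a data frame with a non-default index (the labels 10 to 13 are made up for illustration) to show how the values can silently disappear.

```python
import pandas as pd

# Hypothetical frame whose index does not start at 0.
df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=[10, 11, 12, 13])

# The Series has labels 0 and 1, none of which exist in df.index,
# so every row of the new column ends up as NaN.
df["B"] = pd.Series([3, 4])
all_missing = bool(df["B"].isna().all())

# Giving the Series matching labels places the values where intended.
df["C"] = pd.Series([3, 4], index=df.index[:2])

print(df)
```

No error is raised in either case, so a mismatch in labels can go unnoticed unless you check for unexpected NaN values.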

For your specific problem, if you don't care about the index or the correspondence of values between columns, you can drop the duplicates in each column and then reset that column's index:

    df.apply(lambda col: col.drop_duplicates().reset_index(drop=True))

    #    A    B
    # 0  1  1.0
    # 1  2  5.0
    # 2  7  9.0
    # 3  8  NaN
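For comparison, here is a sketch of a different route to the same result that stays closer to the loop in the question: wrap each column's `unique()` output in a Series and let the DataFrame constructor pad the shorter columns with NaN. The function name `unique_per_column` is hypothetical, and the data is the example from the question.

```python
import pandas as pd

def unique_per_column(dataframe):
    # Each Series gets its own 0..n-1 index; the constructor aligns them
    # on the union of those indexes and fills the gaps with NaN.
    return pd.DataFrame(
        {col: pd.Series(dataframe[col].unique()) for col in dataframe}
    )

df = pd.DataFrame({"A": [1, 2, 1, 7, 7, 8], "B": [1, 5, 5, 9, 9, 9]})
result = unique_per_column(df)
print(result)
```

As with the `apply` version, rows of the result no longer correspond to rows of the original data, and a column padded with NaN is converted to float.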
Psidom answered Oct 12 '22 07:10