Value Error when Slicing in Pandas

Tags: python, pandas

I have a DataFrame on which I would like to use the 'str.contains()' method. I thought I had found out how to do this when I read "pandas + dataframe - select by partial string". However, I keep getting a ValueError.

My DataFrame is as follows:

ID,ENROLLMENT_DATE,TRAINER_MANAGING,TRAINER_OPERATOR,FIRST_VISIT_DATE
1536D,12-Feb-12,"06DA1B3-Lebanon NH",,15-Feb-12
F15D,18-May-12,"06405B2-Lebanon NH",,25-Jul-12
8096,8-Aug-12,"0643D38-Hanover NH","0643D38-Hanover NH",25-Jun-12
A036,1-Apr-12,"06CB8CF-Hanover NH","06CB8CF-Hanover NH",9-Aug-12
8944,19-Feb-12,"06D26AD-Hanover NH",,4-Feb-12
1004E,8-Jun-12,"06388B2-Lebanon NH",,24-Dec-11
11795,3-Jul-12,"0649597-White River VT","0649597-White River VT",30-Mar-12
30D7,11-Nov-12,"06D95A3-Hanover NH","06D95A3-Hanover NH",30-Nov-11
3AE2,21-Feb-12,"06405B2-Lebanon NH",,26-Oct-12
B0FE,17-Feb-12,"06D1B9D-Hartland VT",,16-Feb-12
127A1,11-Dec-11,"064456E-Hanover NH","064456E-Hanover NH",11-Nov-12
161FF,20-Feb-12,"0643D38-Hanover NH","0643D38-Hanover NH",3-Jul-12
A036,30-Nov-11,"063B208-Randolph VT","063B208-Randolph VT",
475B,25-Sep-12,"06D26AD-Hanover NH",,5-Nov-12
151A3,7-Mar-12,"06388B2-Lebanon NH",,16-Nov-12
CA62,3-Jan-12,,,
D31B,18-Dec-11,"06405B2-Lebanon NH",,9-Jan-12
20F5,8-Jul-12,"0669C50-Randolph VT",,3-Feb-12
8096,19-Dec-11,"0649597-White River VT","0649597-White River VT",9-Apr-12
14E48,1-Aug-12,"06D3206-Hanover NH",,
177F8,20-Aug-12,"063B208-Randolph VT","063B208-Randolph VT",5-May-12
553E,11-Oct-12,"06D95A3-Hanover NH","06D95A3-Hanover NH",8-Mar-12
12D5F,18-Jul-12,"0649597-White River VT","0649597-White River VT",2-Nov-12
C6DC,13-Apr-12,"06388B2-Lebanon NH",,
11795,27-Feb-12,"0643D38-Hanover NH","0643D38-Hanover NH",19-Jun-12
17B43,11-Aug-12,,,22-Oct-12
A036,11-Aug-12,"06D3206-Hanover NH",,19-Jun-12

Then I run the following code:

import pandas

test = pandas.read_csv('testcsv.csv')
test[test.TRAINER_MANAGING.str.contains('Han', na=False)]

and I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-e0c4624c9346> in <module>()
----> 1 test[test.TRAINER_MANAGING.str.contains('Han', na=False)]

.virtualenvs/ipython/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1958 
   1959             # also raises Exception if object array with NA values
-> 1960             if com._is_bool_indexer(key):
   1961                 key = np.asarray(key, dtype=bool)
   1962             return self._getitem_array(key)

.virtualenvs/ipython/lib/python2.7/site-packages/pandas/core/common.pyc in _is_bool_indexer(key)
    685         if not lib.is_bool_array(key):
    686             if isnull(key).any():
--> 687                 raise ValueError('cannot index with vector containing '
    688                                  'NA / NaN values')
    689             return False

ValueError: cannot index with vector containing NA / NaN values

I feel like I am missing something simple. Any help would be appreciated.

asked Feb 06 '13 14:02

BigHandsome

1 Answer

Your string search still returns NaN values, whereas boolean indexing accepts only boolean values. It appears that 'na=False' is not working (in this case?); I can replicate it on my machine with the latest released pandas version.

You can work around it by first applying the .fillna() function to the result, like this:

test[test.TRAINER_MANAGING.str.contains('Han').fillna(False)]

Which returns:

       ID ENROLLMENT_DATE    TRAINER_MANAGING    TRAINER_OPERATOR FIRST_VISIT_DATE
2    8096        8-Aug-12  0643D38-Hanover NH  0643D38-Hanover NH        25-Jun-12
3    A036        1-Apr-12  06CB8CF-Hanover NH  06CB8CF-Hanover NH         9-Aug-12
4    8944       19-Feb-12  06D26AD-Hanover NH                 NaN         4-Feb-12
7    30D7       11-Nov-12  06D95A3-Hanover NH  06D95A3-Hanover NH        30-Nov-11
10  127A1       11-Dec-11  064456E-Hanover NH  064456E-Hanover NH        11-Nov-12
11  161FF       20-Feb-12  0643D38-Hanover NH  0643D38-Hanover NH         3-Jul-12
13   475B       25-Sep-12  06D26AD-Hanover NH                 NaN         5-Nov-12
19  14E48        1-Aug-12  06D3206-Hanover NH                 NaN              NaN
21   553E       11-Oct-12  06D95A3-Hanover NH  06D95A3-Hanover NH         8-Mar-12
24  11795       27-Feb-12  0643D38-Hanover NH  0643D38-Hanover NH        19-Jun-12
26   A036       11-Aug-12  06D3206-Hanover NH                 NaN        19-Jun-12

I have never used the str.contains function before, so I'm not sure whether it is working correctly. We should open an issue on GitHub if it should work as in your example.
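To see the workaround in isolation, here is a minimal sketch on a three-row stand-in for the CSV (the rows are copied from the question; the variable names raw, mask, mask_na and hanover are only for illustration). It assumes a reasonably recent pandas; the extra .astype(bool) is my own addition so that the mask has a real boolean dtype regardless of version.

```python
import pandas

# Small stand-in for the CSV in the question (three rows, one missing trainer).
test = pandas.DataFrame({
    "ID": ["1536D", "8096", "CA62"],
    "TRAINER_MANAGING": ["06DA1B3-Lebanon NH", "0643D38-Hanover NH", float("nan")],
})

# Without na=..., a missing trainer may come back as NaN instead of True/False.
raw = test.TRAINER_MANAGING.str.contains("Han")

# Replace any missing entries with False and force a real boolean dtype.
mask = raw.fillna(False).astype(bool)

# Boolean indexing now only sees True/False values.
hanover = test[mask]
print(hanover)

# Passing na=False asks str.contains to fill missing entries itself.
mask_na = test.TRAINER_MANAGING.str.contains("Han", na=False)
```

On the pandas versions I am aware of today, mask_na should match mask, so the original 'na=False' call may simply work after an upgrade; treat that as an assumption about your installed version rather than a guarantee.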

answered Nov 02 '22 23:11

Rutger Kassies
