
Python pandas CSV: filter a column by the first N characters of its values

Tags:

python

pandas

csv

I'm using pandas to work with a huge CSV file. I have a Python script that takes some arguments as filter criteria, one of which is a string representing a series of digits (e.g. 83351828), and exports the result to a new CSV file. What I want is to filter this column by the first 4 characters of its values.

Here is my code:

  elif devicePool == '' and css == '' and dirNumber != '' and routePartition == '':
        df = pd.concat(( [chunk[chunk['Directory Number 1'][0:4] == dirNumber] for chunk in pd.read_csv(sourceFile, iterator=True, chunksize=10**4)]))

As you can see I used "[0:4]" but it's not working.

def main(argv):
    inputfile = ''
    outputfile = ''
    devicePool = ''
    css = ''
    dirNumber = ''
    routePartition = ''
    try:
        opts, args = getopt.getopt(argv,"hi:o:p:c:n:r:",["ifile=","ofile=", "dpool=", "css=", "dnumber=", "route="])
    except getopt.GetoptError:
        print('test.py -i <inputfile> -o <outputfile> -p <devicepool> -c <CSS> -n <directorynumber> -r <routepartition>')
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print('test.py -i <inputfile> -o <outputfile> -p <devicepool> -c <CSS> -n <directorynumber> -r <routepartition>')
            sys.exit()
        elif opt in ("-i", "--ifile"):
            inputfile = arg
        elif opt in ("-o", "--ofile"):
            outputfile = arg
        elif opt in ("-p", "--dpool"):
            devicePool = arg
        elif opt in ("-c", "--css"):
            css = arg
        elif opt in ("-n", "--dnumber"):
            dirNumber = arg
        elif opt in ("-r", "--route"):
            routePartition = arg

    read_CSV(inputfile, outputfile, devicePool, css, dirNumber, routePartition)

Here is the error message:

pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

ryuzak1 asked Jun 22 '26 17:06



1 Answer

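I think you need to index through the str accessor to get the first 4 characters of each value. A plain [0:4] on the column slices the first four rows rather than the first four characters, so the resulting boolean Series no longer matches the chunk's index, which is what the error reports. The leading 0 in the slice can also be omitted: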

chunk['Directory Number 1'].str[:4]

If the values are not strings (read_csv parses an all-digit column as integers by default), convert them first with Series.astype:

chunk['Directory Number 1'].astype(str).str[:4]
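Putting this together with the chunked read from the question, a minimal sketch could look like the following. The helper name filter_by_prefix, its parameters, and the sample data (including the "Device Pool" column) are assumptions made for illustration; only the 'Directory Number 1' column name and the chunked read_csv pattern come from the question.

```python
import io

import pandas as pd


def filter_by_prefix(source, column, prefix, chunksize=10**4):
    """Return the rows whose `column` value starts with `prefix`.

    Hypothetical helper: the file is read in chunks so that a large
    CSV never has to fit in memory all at once.
    """
    n = len(prefix)
    matches = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # .str[:n] slices each value's characters; the mask keeps the
        # chunk's own index, so it aligns with the chunk.
        mask = chunk[column].astype(str).str[:n] == prefix
        matches.append(chunk[mask])
    return pd.concat(matches, ignore_index=True)


# Made-up sample data standing in for the real source file.
sample = io.StringIO(
    "Directory Number 1,Device Pool\n"
    "83351828,DP_A\n"
    "83359999,DP_B\n"
    "91110000,DP_A\n"
)

result = filter_by_prefix(sample, "Directory Number 1", "8335", chunksize=2)
print(result)
```

Note that astype(str) works on the values as pandas has already parsed them, so leading zeros in the file are lost for an integer column. If that matters, passing dtype={'Directory Number 1': str} to read_csv keeps the original text.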
jezrael answered Jun 25 '26 06:06



