I'm using pandas to work with a huge CSV file. Basically, I have a Python script that takes some arguments as filter criteria. One of them is a string representing a series of digits (e.g. 83351828), and the script then exports the result to a new CSV file. What I want to do is filter this column by its first 4 characters.
elif devicePool == '' and css == '' and dirNumber != '' and routePartition == '':
    df = pd.concat(([chunk[chunk['Directory Number 1'][0:4] == dirNumber] for chunk in pd.read_csv(sourceFile, iterator=True, chunksize=10**4)]))
As you can see, I used [0:4], but it's not working.
import getopt
import sys


def main(argv):
    inputfile = ''
    outputfile = ''
    devicePool = ''
    css = ''
    dirNumber = ''
    routePartition = ''
    try:
        opts, args = getopt.getopt(argv, "hi:o:p:c:n:r:", ["ifile=", "ofile=", "dpool=", "css=", "dnumber=", "route="])
    except getopt.GetoptError:
        print('test.py -i <inputfile> -o <outputfile> -p <devicepool> -c <CSS> -n <directorynumber> -r <routepartition>')
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print('test.py -i <inputfile> -o <outputfile> -p <devicepool> -c <CSS> -n <directorynumber> -r <routepartition>')
            sys.exit()
        elif opt in ("-i", "--ifile"):
            inputfile = arg
        elif opt in ("-o", "--ofile"):
            outputfile = arg
        elif opt in ("-p", "--dpool"):
            devicePool = arg
        elif opt in ("-c", "--css"):
            css = arg
        elif opt in ("-n", "--dnumber"):
            dirNumber = arg
        elif opt in ("-r", "--route"):
            routePartition = arg
    # read_CSV (not shown) applies the filters and writes the output file
    read_CSV(inputfile, outputfile, devicePool, css, dirNumber, routePartition)


if __name__ == "__main__":
    main(sys.argv[1:])
This is the error I get:
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
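The error can be reproduced with a small, made-up frame (the six values below are hypothetical, not taken from the real file). Slicing the Series with [0:4] selects rows, not characters, so the boolean mask covers only the first rows of the chunk and pandas cannot align it with the rest:

```python
import warnings

import pandas as pd

# Hypothetical 6-row chunk standing in for one chunk of the real CSV.
chunk = pd.DataFrame({"Directory Number 1": [
    "83351828", "83359999", "12345678", "83350000", "99990000", "83351111"]})

# [0:4] slices ROWS of the Series, so this mask covers only the first rows.
mask = chunk["Directory Number 1"][0:4] == "8335"
print(len(mask), "mask entries for", len(chunk), "rows")

error_name = None
with warnings.catch_warnings():
    # pandas may warn that the mask "will be reindexed" before failing; silence it.
    warnings.simplefilter("ignore", UserWarning)
    try:
        chunk[mask]
    except Exception as exc:  # IndexingError; its module path differs across pandas versions
        error_name = type(exc).__name__
print(error_name)
```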
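I think you need the .str accessor to get the first 4 characters, and the 0 can be omitted. Calling [0:4] directly on the Series slices its first 4 rows, not the first 4 characters of each value, so the resulting boolean mask is shorter than the chunk and cannot be aligned with it: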
chunk['Directory Number 1'].str[:4]
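A minimal sketch of the .str approach on a hypothetical three-row chunk, assuming the column already holds strings and dir_number stands for the 4-digit value passed with -n:

```python
import pandas as pd

# Hypothetical chunk whose column is already text.
chunk = pd.DataFrame({"Directory Number 1": ["83351828", "12345678", "83359999"]})
dir_number = "8335"  # hypothetical value of the -n argument

prefix = chunk["Directory Number 1"].str[:4]  # first 4 characters of every value
mask = prefix == dir_number                   # one boolean per row, same index as chunk
filtered = chunk[mask]
print(filtered)
```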
If the values are not strings (for example, because pandas parsed the digits as numbers), convert them first with Series.astype:
chunk['Directory Number 1'].astype(str).str[:4]
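Putting it together, here is a sketch of the chunked filter from the question with the fix applied. The helper name filter_by_prefix and the sample CSV are hypothetical; pandas reads the digit column as integers here, which is why astype(str) is needed:

```python
import io

import pandas as pd


def filter_by_prefix(source, dir_number, chunksize=10**4):
    """Hypothetical helper: keep rows whose 'Directory Number 1' starts with dir_number."""
    pieces = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # The column is parsed as int64, so convert to str before slicing characters.
        prefix = chunk["Directory Number 1"].astype(str).str[:4]
        pieces.append(chunk[prefix == dir_number])
    return pd.concat(pieces)


csv_text = """Directory Number 1,Device
83351828,A
12345678,B
83359999,C
"""
# chunksize=2 forces two chunks so the per-chunk masks are exercised.
result = filter_by_prefix(io.StringIO(csv_text), "8335", chunksize=2)
print(result)
```

To export, call result.to_csv(outputfile, index=False). Two possible variations: .str.startswith(dir_number) avoids hard-coding the length 4, and passing dtype={'Directory Number 1': str} to read_csv keeps the column as text from the start, which also preserves any leading zeros that integer parsing would drop.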