I am aware of the skiprows argument of pd.read_csv, which lets you pass a list with the indices of the rows to skip. However, what I have is the indices of the rows I want to keep.
Say my CSV file looks like this, but with millions of rows:
   A  B
0  1  2
1  3  4
2  5  6
3  7  8
4  9  0
The only indices I would like to load are 2 and 3, so:
index_list = [2,3]
The input to skiprows would then be [0,1,4]. However, I only have [2,3] available.
I am trying something like:
pd.read_csv(path, skiprows = ~index_list)
but it doesn't work, since ~ is not defined for a Python list. Any suggestions?
Thanks, I appreciate all the help.
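You can pass a callable such as a lambda as the skiprows argument. pandas calls it with each line number of the file (starting at 0, which is the header line) and skips the line whenever it returns True. For example: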
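rows_to_keep = [2, 3]
lines_to_keep = {0} | {i + 1 for i in rows_to_keep}  # keep the header (line 0); data row i is on line i + 1
pd.read_csv(path, skiprows=lambda x: x not in lines_to_keep)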
You can read more about it in the pandas read_csv documentation.
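Note that skiprows counts physical file lines, with the header on line 0, so data row i sits on line i + 1. A minimal sketch to check this on the sample table, assuming the file is comma-separated with the row labels in an unnamed first column (the way DataFrame.to_csv writes it). The in-memory io.StringIO buffer and the csv_text name are illustrative stand-ins for the real file path.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: the sample table as DataFrame.to_csv would write it
csv_text = """,A,B
0,1,2
1,3,4
2,5,6
3,7,8
4,9,0
"""

rows_to_keep = [2, 3]

# skiprows sees file line numbers: line 0 is the header, so data row i is on line i + 1
lines_to_keep = {0} | {i + 1 for i in rows_to_keep}

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,  # use the first column as the row labels
    skiprows=lambda line: line not in lines_to_keep,  # True means "skip this line"
)
print(df)
```

Using a set for lines_to_keep keeps each membership test constant-time, which matters when the callable is invoked once per line of a file with millions of rows.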
I think you would need to find the number of lines first, like this.
with open(path) as f:
    num_lines = sum(1 for _ in f)
Then you would need to exclude every line except the header (line 0) and the lines holding the rows in index_list (data row i is on line i + 1):
to_exclude = [i for i in range(1, num_lines) if i - 1 not in index_list]  # line 0 (header) is always kept
and then load your data:
pd.read_csv(path, skiprows = to_exclude)
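Putting the line-counting approach together, a minimal runnable sketch under the same assumptions (comma-separated file, one header line, row labels in the first column). The read_selected_rows helper and the temporary sample file are hypothetical, for illustration only. This version reads the file twice, once to count lines and once to parse, which the callable version avoids.

```python
import os
import tempfile

import pandas as pd


def read_selected_rows(path, index_list):
    """Hypothetical helper: read only the data rows whose 0-based positions are in index_list."""
    # First pass: count the lines in the file (header included)
    with open(path) as f:
        num_lines = sum(1 for _ in f)

    keep = set(index_list)  # set membership is O(1), unlike a list
    # Line 0 is the header and is always kept; data row i sits on line i + 1
    to_exclude = [i for i in range(1, num_lines) if i - 1 not in keep]

    # Second pass: let pandas skip every excluded line
    return pd.read_csv(path, index_col=0, skiprows=to_exclude)


# Hypothetical sample file matching the table in the question
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(",A,B\n0,1,2\n1,3,4\n2,5,6\n3,7,8\n4,9,0\n")
    path = tmp.name

try:
    df = read_selected_rows(path, [2, 3])
    print(df)
finally:
    os.remove(path)  # clean up the temporary file
```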