
Using str in split in pandas

Here is some dummy data I have created for my question. I have two questions regarding this:

  1. Why does split need the str accessor in the first expression but not in the second?
  2. Why does [0] pick up the first row in the first expression but the first element of each row in the second?

import pandas as pd

chess_data = pd.DataFrame({"winner": ['A:1','A:2','A:3','A:4','B:1','B:2']})

chess_data.winner.str.split(":")[0]
['A', '1']

chess_data.winner.map(lambda n: n.split(":")[0])
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object
ShubhamA asked Aug 18 '18 19:08


1 Answer

  • chess_data is a dataframe
  • chess_data.winner is a series
  • chess_data.winner.str is an accessor to methods that are string specific and optimized (to a degree)
  • chess_data.winner.str.split is one such method
  • chess_data.winner.map is a different method that takes a dictionary or a callable object. It either calls the callable on each element of the series or looks up each element of the series with the dictionary's get method.
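The two forms of map described in the last bullet can be sketched as follows. This is only an illustration: the shortened series and the lookup dictionary are made-up examples, not part of the original data.

```python
import pandas as pd

# Shortened, hypothetical version of the winner column.
winner = pd.Series(['A:1', 'A:2', 'B:1'], name='winner')

# Callable: invoked once for each element of the series.
by_callable = winner.map(lambda s: s.split(':')[0])

# Dictionary: each element is looked up as a key;
# elements with no matching key come back as NaN.
lookup = {'A:1': 'first A', 'B:1': 'first B'}  # hypothetical mapping
by_dict = winner.map(lookup)

print(by_callable.tolist())  # ['A', 'A', 'B']
print(by_dict.tolist())      # ['first A', nan, 'first B']
```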

With chess_data.winner.str.split, pandas still loops over the elements internally and applies a kind of str.split to each one. map is a cruder way of doing the same thing.
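One practical difference between the two approaches shows up with missing values. The sketch below assumes a series containing a NaN, which the original data does not have: the str accessor passes the NaN through, while a plain lambda in map fails because a float has no split method.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a missing value, stored as plain Python objects.
winner = pd.Series(['A:1', np.nan, 'B:2'], dtype=object)

# The string accessor skips the missing value and keeps it as NaN.
via_accessor = winner.str.split(':').str[0]
print(via_accessor.tolist())  # ['A', nan, 'B']

# map calls the lambda on every element, including the float NaN.
try:
    winner.map(lambda s: s.split(':')[0])
    map_failed = False
except AttributeError:
    map_failed = True

print(map_failed)  # True
```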


With your data.

chess_data.winner.str.split(':')

0    [A, 1]
1    [A, 2]
2    [A, 3]
3    [A, 4]
4    [B, 1]
5    [B, 2]
Name: winner, dtype: object

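The result is a series of lists, so indexing it with [0] selects the row labeled 0, which is the list ['A', '1']. In order to get the first element of each list instead, you'll want to use the string accessor again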

chess_data.winner.str.split(':').str[0]

0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object
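To make the difference between the two kinds of [0] concrete, here is a small sketch using the data from the question. The variable names parts, first_row and first_elems are only illustrative.

```python
import pandas as pd

chess_data = pd.DataFrame({"winner": ['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2']})

# A series whose values are lists such as ['A', '1'].
parts = chess_data.winner.str.split(':')

# Indexing the series itself: looks up the row with index label 0.
first_row = parts[0]

# Indexing through the str accessor: applied to each list in turn.
first_elems = parts.str[0]

print(first_row)             # ['A', '1']
print(first_elems.tolist())  # ['A', 'A', 'A', 'A', 'B', 'B']
```

Inside the lambda passed to map, [0] is ordinary Python list indexing on the result of a single split, which is why it behaves like parts.str[0] rather than parts[0].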

This is equivalent to what you did with map

chess_data.winner.map(lambda x: x.split(':')[0])

You could have also used a comprehension

chess_data.assign(new_col=[x.split(':')[0] for x in chess_data.winner])

  winner new_col
0    A:1       A
1    A:2       A
2    A:3       A
3    A:4       A
4    B:1       B
5    B:2       B
piRSquared answered Oct 05 '22 12:10