Pandas split after month day time from rest of string

Question

I am working with a pandas dataframe. I am trying to split a column after the date and time from the rest of the string.

df
   data
0  Oct 22 12:56:52 server1
1  Oct 22 12:56:52 server2
2  Oct 22 12:56:53 server2
3  Oct 22 12:56:54 server2
4  Oct 22 12:56:56 comp2

Desired output:

df
   date              machine
0  Oct 22 12:56:52   server1
1  Oct 22 12:56:52   server2
2  Oct 22 12:56:53   server2
3  Oct 22 12:56:54   server2
4  Oct 22 12:56:56   comp2

If I try something like df["data"].str.extract('^(.*? [0-9]{2}) (.*)$') it just strips everything after the 22(day)

Quang Hoang · Accepted Answer

You can also pass the exact form of date ime:

df['data'].str.extract('(\w* \d* \d*:\d*:\d*) (.*)')

output:

                 0        1
0  Oct 22 12:56:52  server1
1  Oct 22 12:56:52  server2
2  Oct 22 12:56:53  server2
3  Oct 22 12:56:54  server2
4  Oct 22 12:56:56    comp2

Erfan · Answer

Using positive lookbehind to split on {semicolon}{two numbers}{space}:

Details:

(?<=) is positive lookbehind (check if anything is before the string)
:\d{2} is pattern: {semicolon}{two numbers}
\s is white space

Conclusion: we split on a whitespace but only if its preceeded by the pattern described above.

s = df['data'].str.split('(?<=:\d{2})\s')

df['date'] = s.str[0]
df['machine'] = s.str[1]
df = df.drop(columns='data')

Or as piRSquared & jezrael suggest in the comments, in a one-liner:

df['date'], df['machine'] = zip(*df.pop('data').str.split('(?<=:\d{2})\s'))

Output

              date  machine
0  Oct 22 12:56:52  server1
1  Oct 22 12:56:52  server2
2  Oct 22 12:56:53  server2
3  Oct 22 12:56:54  server2
4  Oct 22 12:56:56    comp2

piRSquared · Answer

comprehension

This depends on the data format always being 15 characters.
Also, since we're going to have to drop a column 'data' anyway, I thought it would be better to simply create a dataframe from scratch.

pd.DataFrame([[s[:15], s[16:]] for s in df.data], columns=['date', 'machine'])

              date  machine
0  Oct 22 12:56:52  server1
1  Oct 22 12:56:52  server2
2  Oct 22 12:56:53  server2
3  Oct 22 12:56:54  server2
4  Oct 22 12:56:56    comp2

`rsplit`

Depends on 'machine' name never having spaces.

This works because the string accessor provided by pandas.Series.str is an iterable and can be used in an assignment statement similar to x, y = (1, 2)

Also note that I unapologetically took the idea to use pop in this instance from @jezrael

df['date'], df['machine'] = df.pop('data').str.rsplit(n=1).str

df

              date  machine
0  Oct 22 12:56:52  server1
1  Oct 22 12:56:52  server2
2  Oct 22 12:56:53  server2
3  Oct 22 12:56:54  server2
4  Oct 22 12:56:56    comp2

Pandas split after month day time from rest of string

Tags:

python

regex

python-3.x

pandas

sectechguy

3 Answers

Quang Hoang

Erfan

comprehension

`rsplit`

piRSquared

Recent Activity

Donate For Us

Pandas split after month day time from rest of string

Tags:

python

regex

python-3.x

pandas

sectechguy

3 Answers

Quang Hoang

Erfan

comprehension

rsplit

piRSquared

Related questions

Recent Activity

Donate For Us

`rsplit`