How do you use a regex in a form?

A regex usually comes within this form / abc /, where the search pattern is delimited by two slash characters /. At the end we can specify a flag with these values (we can also combine them each other): g (global) does not return after the first match, restarting the subsequent searches from the end of the previous match.

Can you match non-printable characters in regex?

Notice that you can match also non-printable characters like tabs , new-lines , carriage returns . We are learning how to construct a regex but forgetting a fundamental concept: flags. A regex usually comes within this form / abc /, where the search pattern is delimited by two slash characters /.

Is there a way to separate file names from paths?

Many frameworks (e.g. .NET/Python) have methods for separating file names from paths without needing to manually search for the '/' character. This is great because the tools are typically platform-independent. Yes, but he hasn't specified language yet. If it was Python, I would suggest os.path.dirname and os.path.basename .

How to find a directory that starts with / in Linux?

A directory can start with / when it is absolute path and directory name when it's relative. Hence, look for / with zero or one occurrence. / (?P<filepath> (?P<root> [/]?) (?P<rest_of_the_path>.+))/ Step-2: Try to find the first directory. Next, a directory and its child is always separated by /. And a directory name can be anything except /.

Regex for parsing directory and filename

Tags:

regex

parsing

Try this:

^(.+)\/([^\/]+)$

EDIT: escaped the forward slash to prevent problems when copy/pasting the Regex

In languages that support regular expressions with non-capturing groups:

((?:[^/]*/)*)(.*)

I'll explain the gnarly regex by exploding it...

(
  (?:
    [^/]*
    /
  )
  *
)
(.*)

What the parts mean:

(  -- capture group 1 starts
  (?:  -- non-capturing group starts
    [^/]*  -- greedily match as many non-directory separators as possible
    /  -- match a single directory-separator character
  )  -- non-capturing group ends
  *  -- repeat the non-capturing group zero-or-more times
)  -- capture group 1 ends
(.*)  -- capture all remaining characters in group 2

Example

To test the regular expression, I used the following Perl script...

#!/usr/bin/perl -w

use strict;
use warnings;

sub test {
  my $str = shift;
  my $testname = shift;

  $str =~ m#((?:[^/]*/)*)(.*)#;

  print "$str -- $testname\n";
  print "  1: $1\n";
  print "  2: $2\n\n";
}

test('/var/log/xyz/10032008.log', 'absolute path');
test('var/log/xyz/10032008.log', 'relative path');
test('10032008.log', 'filename-only');
test('/10032008.log', 'file directly under root');

The output of the script...

/var/log/xyz/10032008.log -- absolute path
  1: /var/log/xyz/
  2: 10032008.log

var/log/xyz/10032008.log -- relative path
  1: var/log/xyz/
  2: 10032008.log

10032008.log -- filename-only
  1:
  2: 10032008.log

/10032008.log -- file directly under root
  1: /
  2: 10032008.log

Most languages have path parsing functions that will give you this already. If you have the ability, I'd recommend using what comes to you for free out-of-the-box.

Assuming / is the path delimiter...

^(.*/)([^/]*)$

The first group will be whatever the directory/path info is, the second will be the filename. For example:

/foo/bar/baz.log: "/foo/bar/" is the path, "baz.log" is the file
foo/bar.log: "foo/" is the path, "bar.log" is the file
/foo/bar: "/foo/" is the path, "bar" is the file
/foo/bar/: "/foo/bar/" is the path and there is no file.

What language? and why use regex for this simple task?

If you must:

^(.*)/([^/]*)$

gives you the two parts you wanted. You might need to quote the parentheses:

^\(.*\)/\([^/]*\)$

depending on your preferred language syntax.

But I suggest you just use your language's string search function that finds the last "/" character, and split the string on that index.

Reasoning:

I did a little research through trial and error method. Found out that all the values that are available in keyboard are eligible to be a file or directory except '/' in *nux machine.

I used touch command to create file for following characters and it created a file.

(Comma separated values below)
'!', '@', '#', '$', "'", '%', '^', '&', '*', '(', ')', ' ', '"', '\', '-', ',', '[', ']', '{', '}', '`', '~', '>', '<', '=', '+', ';', ':', '|'

It failed only when I tried creating '/' (because it's root directory) and filename container / because it file separator.

And it changed the modified time of current dir . when I did touch .. However, file.log is possible.

And of course, a-z, A-Z, 0-9, - (hypen), _ (underscore) should work.

Outcome

So, by the above reasoning we know that a file name or directory name can contain anything except / forward slash. So, our regex will be derived by what will not be present in the file name/directory name.

/(?:(?P<dir>(?:[/]?)(?:[^\/]+/)+)(?P<filename>[^/]+))/

Step by Step regexp creation process

Pattern Explanation

Step-1: Start with matching `root` directory

A directory can start with / when it is absolute path and directory name when it's relative. Hence, look for / with zero or one occurrence.

/(?P<filepath>(?P<root>[/]?)(?P<rest_of_the_path>.+))/

enter image description here

Step-2: Try to find the first directory.

Next, a directory and its child is always separated by /. And a directory name can be anything except /. Let's match /var/ first then.

/(?P<filepath>(?P<first_directory>(?P<root>[/]?)[^\/]+/)(?P<rest_of_the_path>.+))/

enter image description here

Step-3: Get full directory path for the file

Next, let's match all directories

/(?P<filepath>(?P<dir>(?P<root>[/]?)(?P<single_dir>[^\/]+/)+)(?P<rest_of_the_path>.+))/

enter image description here

Here, single_dir is yz/ because, first it matched var/, then it found next occurrence of same pattern i.e. log/, then it found the next occurrence of same pattern yz/. So, it showed the last occurrence of pattern.

Step-4: Match filename and clean up

Now, we know that we're never going to use the groups like single_dir, filepath, root. Hence let's clean that up.

Let's keep them as groups however don't capture those groups.

And rest_of_the_path is just the filename! So, rename it. And a file will not have / in its name, so it's better to keep [^/]

/(?:(?P<dir>(?:[/]?)(?:[^\/]+/)+)(?P<filename>[^/]+))/

This brings us to the final result. Of course, there are several other ways you can do it. I am just mentioning one of the ways here.

enter image description here

Regex Rules used above are listed here

^ means string starts with
(?P<dir>pattern) means capture group by group name. We have two groups with group name dir and file
(?:pattern) means don't consider this group or non-capturing group.
? means match zero or one. + means match one or more [^\/] means matches any char except forward slash (/)

[/]? means if it is absolute path then it can start with / otherwise it won't. So, match zero or one occurrence of /.

[^\/]+/ means one or more characters which aren't forward slash (/) which is followed by a forward slash (/). This will match var/ or xyz/. One directory at a time.

What about this?

[/]{0,1}([^/]+[/])*([^/]*)

Deterministic :

((/)|())([^/]+/)*([^/]*)

Strict :

^[/]{0,1}([^/]+[/])*([^/]*)$
^((/)|())([^/]+/)*([^/]*)$

A very late answer, but hope this will help

^(.+?)/([\w]+\.log)$

This uses lazy check for /, and I just modified the accepted answer

http://regex101.com/r/gV2xB7/1

Related questions
                            
                                How can I exclude some characters from a class?
                            
                                How can I match on, but exclude a regex pattern?
                            
                                Atom Editor: RegEx replace to uppercase/lowercase
                            
                                How to filter multiple words in Android Studio logcat
                            
                                Dynamic vs Inline RegExp performance in JavaScript
                            
                                What is '?-mix' in a Ruby Regular Expression
                            
                                How can I validate US Social Security Number?
                            
                                Replace chars if not match
                            
                                How to use awk variables in regular expressions?
                            
                                PHP ereg vs. preg
                            
                                Visual Studio Find and Replace Regular Expressions help
                            
                                Why doesn’t the alternation (pipe) operator ( | ) in JavaScript regular expressions give me two matches?
                            
                                Regex not equal to string
                            
                                What does [\S\s]* mean in regex in PHP?
                            
                                Escaping special characters in Java Regular Expressions
                            
                                javascript split string by space, but ignore space in quotes (notice not to split by the colon too)
                            
                                Java equivalent to PHP's preg_replace_callback
                            
                                Javascript reg ex to match whole word only, bound only by whitespace
                            
                                How do you "debug" a regular expression with sed?
                            
                                Replace a word with multiple lines using sed?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Regex for parsing directory and filename

Tags:

regex

parsing

Example

Reasoning:

Outcome

Step by Step regexp creation process

Pattern Explanation

Step-1: Start with matching root directory

Step-2: Try to find the first directory.

Step-3: Get full directory path for the file

Step-4: Match filename and clean up

Regex Rules used above are listed here

Related questions

Recent Activity

Donate For Us

Step-1: Start with matching `root` directory