 

Using perl to split a line that may contain whitespace

Okay, so I'm using Perl to read in a file that contains some general configuration data. The data is organized under headers according to what it means. An example follows:

[vars]

# This is how we define a variable!
$var = 10;
$str = "Hello thar!";


# This section contains flags which can be used to modify module behavior
# All modules read this file and if they understand any of the flags, use them
[flags] 
  Verbose =       true; # Notice the errant whitespace!

[path]
WinPath = default; # Keyword which loads the standard PATH as defined by the operating system. Extend it by appending additional values.
LinuxPath = default;

Goal: Taking the first line, "$var = 10;", as an example, I'd like to use Perl's split function to create an array whose elements are the strings "$var" and "10". Another example:

    Verbose    =         true;
    # Should become [Verbose, true], i.e. with no surrounding whitespace

This is needed because I will be outputting these values to a new file (which a different piece of C++ code will read) to instantiate dictionary objects. Just to give you a little taste of what it might look like (just making it up as I go along):

define new dictionary
name: [flags]
# Start defining keys => values
new key name: Verbose
new value val: true
# End dictionary

Oh, and here is the code I currently have along with what it is doing (incorrectly):

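sub makeref
{
    # Splits the current line in $_ on "=" (the caller sets $_, e.g. in a while (<$fh>) loop)
    my @line = split /=/; # For "    Verbose    =         true;" this produces ("    Verbose    ", "         true;")
    return @line;
}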

To answer one question: I'm not using Config::Simple because I originally did not know what my configuration file would look like, only what I wanted it to do. I made it up as I went along, going with whatever seemed sensible to me, and used Perl to parse the file.

The problem is that I have some C++ code that will load the information in the config file, but since parsing in C or C++ is :( I decided to use Perl. It's also a good learning exercise for me, since I am new to the language. So that's the thing: this Perl code is not really a part of my application; it just makes it easier for the C++ code to read the information. And it is more readable (both the config file and the generated file). Thanks for the feedback, it really helped.

Tommy Fisk asked Jun 18 '10 07:06




4 Answers

If you're doing this parsing as a learning exercise, that's fine. However, CPAN has several modules that will do a lot of the work for you.

use Config::Simple;
Config::Simple->import_from( 'some_config_file.txt', \my %conf );
FMc answered Oct 19 '22 19:10


split splits on a regular expression, so you can simply include the optional whitespace around the = sign in the pattern:

split (/\s*=\s*/, $line);
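A quick check of what this split produces on one of the question's sample lines. It assumes the line has already been read into `$line`; note that the leading indentation and the trailing semicolon survive, which is why some trimming is still needed:

```perl
use strict;
use warnings;

my $line = "    Verbose    =         true;";

# The pattern consumes the "=" together with any whitespace on either side of it
my @pair = split /\s*=\s*/, $line;

print join('|', @pair), "\n";   # prints "    Verbose|true;"
```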

You obviously do not want to remove all whitespace, or a line such as $str = "Hello thar!"; would come out like this (note the space missing inside the string):

$str="Hellothar!";

Removing whitespace only from the beginning and end of the line is probably sufficient:

$line =~ s/^\s*(.*?)\s*$/$1/;

A simpler alternative with two statements:

$line =~ s/^\s+//;
$line =~ s/\s+$//;
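Combining the trimming above with the split gives a minimal per-line parser. `parse_line` is a hypothetical name, not part of any module; the sketch assumes `#` always starts a comment (so values cannot contain `#`), that each statement ends with an optional `;`, and that section headers such as `[flags]` are handled elsewhere. The split limit of 2 keeps any later `=` inside the value:

```perl
use strict;
use warnings;

# Hypothetical helper: turn one "key = value;" line into (key, value).
# Assumes '#' always starts a comment and ';' optionally ends a statement.
sub parse_line {
    my ($line) = @_;
    $line =~ s/#.*//;       # drop a trailing comment
    $line =~ s/^\s+//;      # trim leading whitespace
    $line =~ s/\s+$//;      # trim trailing whitespace (and the newline)
    $line =~ s/;$//;        # drop the terminating semicolon
    return if $line eq '';  # blank or comment-only line: empty list
    # Limit of 2 keeps any later '=' as part of the value
    return split /\s*=\s*/, $line, 2;
}

my @verbose  = parse_line("  Verbose =       true; # Notice the errant whitespace!\n");
my @greeting = parse_line(q{$str = "Hello thar!";});

print "[$verbose[0], $verbose[1]]\n";   # prints [Verbose, true]
print "[$greeting[0], $greeting[1]]\n"; # prints [$str, "Hello thar!"]
```

Because only the ends of the line and the whitespace around the first `=` are touched, the space inside `"Hello thar!"` is preserved.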
Svante answered Oct 19 '22 18:10


Seems like you've got it. Strip the whitespace before splitting.

sub makeref
{
    s/\s+//g;               # removes ALL whitespace from $_, including inside quoted values
    my @line = split /=/;   # gets ("Verbose", "true;")
    return @line;
}
Daniel Quinlan answered Oct 19 '22 19:10


This code does the trick (and is more efficient without reversing).

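for (@line) {
    # $_ is aliased to each field, so these substitutions edit @line in place
    s/^\s+//;   # trim leading whitespace
    s/\s+$//;   # trim trailing whitespace
}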
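For completeness, a sketch of the whole conversion the question describes: track the current `[section]`, split each `key = value;` line and trim the fields with the loop above, then emit the made-up dictionary format from the question. The sample input, the in-memory filehandle, and the exact output layout are illustrative assumptions; it also assumes every non-header line is a well-formed `key = value;` pair that appears after some section header:

```perl
use strict;
use warnings;

# Sample input, abbreviated from the question's config file
my $config = <<'END';
[flags]
  Verbose =       true; # Notice the errant whitespace!

[path]
WinPath = default;
LinuxPath = default;
END

open my $in, '<', \$config or die "Cannot open in-memory config: $!";

my $output     = '';
my $in_section = 0;
while (my $line = <$in>) {
    $line =~ s/#.*//;          # strip comments
    $line =~ s/^\s+|\s+$//g;   # trim both ends
    next if $line eq '';

    if ($line =~ /^(\[[^\]]+\])$/) {
        # A new section: close the previous dictionary, then open a new one
        $output .= "# End dictionary\n" if $in_section;
        $output .= "define new dictionary\nname: $1\n# Start defining keys => values\n";
        $in_section = 1;
        next;
    }

    $line =~ s/;$//;                        # drop the terminating semicolon
    my @fields = split /=/, $line, 2;       # split first...
    for (@fields) { s/^\s+//; s/\s+$//; }   # ...then trim each field in place
    $output .= "new key name: $fields[0]\nnew value val: $fields[1]\n";
}
$output .= "# End dictionary\n" if $in_section;
close $in;

print $output;
```

In the real script the here-doc would be replaced by opening the actual config file, and `$output` would be written to the file the C++ code reads.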
Tommy Fisk answered Oct 19 '22 18:10