I need to pull numbers from a string and put them into a list. There are some rules to this, however, such as identifying whether each extracted number is an Integer or a Float.
The task sounds simple enough, but I am finding myself more and more confused as time goes by and could really do with some guidance.
Take the following test string as an example:
There are test values: P7 45.826.53.91.7, .5, 66.. 4 and 5.40.3.
The rules to follow when parsing the string are as follows:
Numbers cannot be preceded by a letter.
If it finds a number that is not followed by a decimal point, then the number is an Integer.
If it finds a number that is followed by a decimal point, then the number is a Float, eg 5.
~ If more numbers follow the decimal point then the number is still a Float, eg 5.40
~ A further decimal point should then break up the number, eg 5.40.3 becomes (5.40 Float) and (3 Float)
If a letter, for example, follows a decimal point, eg 3.H, then still add 3. as a Float to the list (even if technically it is not valid).
Example 1
To make this a little clearer, taking the test string quoted above, the desired output should be as follows:
From the image above, light blue illustrates Float numbers and pale red illustrates single Integers (note also how Floats joined together are split into separate Floats).
- 45.826 (Float)
- 53.91 (Float)
- 7 (Integer)
- 5 (Integer)
- 66 . (Float)
- 4 (Integer)
- 5.40 (Float)
- 3 . (Float)
Note there are deliberate spaces between 66 . and 3 . above due to the way the numbers were formatted.
Example 2:
Anoth3r Te5.t string .4 abc 8.1Q 123.45.67.8.9
- 4 (Integer)
- 8.1 (Float)
- 123.45 (Float)
- 67.8 (Float)
- 9 (Integer)
To give a better idea, I created a new project whilst testing which looks like this:
Now onto the actual task. I thought maybe I could read each character from the string and identify what are valid numbers as per the rules above, and then pull them into a list.
To my ability, this was the best I could manage:
The code is as follows:
unit Unit1;
{$mode objfpc}{$H+}
interface
uses
Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls;
type
TForm1 = class(TForm)
btnParseString: TButton;
edtTestString: TEdit;
Label1: TLabel;
Label2: TLabel;
Label3: TLabel;
lstDesiredOutput: TListBox;
lstActualOutput: TListBox;
procedure btnParseStringClick(Sender: TObject);
private
FDone: Boolean;
FIdx: Integer;
procedure ParseString(const Str: string; var OutValue, OutKind: string);
public
{ public declarations }
end;
var
Form1: TForm1;
implementation
{$R *.lfm}
{ TForm1 }
procedure TForm1.ParseString(const Str: string; var OutValue, OutKind: string);
var
CH1, CH2: Char;
begin
Inc(FIdx);
CH1 := Str[FIdx];
case CH1 of
'0'..'9': // Found a number
begin
CH2 := Str[FIdx - 1];
if not (CH2 in ['A'..'Z']) then
begin
OutKind := 'Integer';
// Try to determine float...
//while (CH1 in ['0'..'9', '.']) do
//begin
// case Str[FIdx] of
// '.':
// begin
// CH2 := Str[FIdx + 1];
// if not (CH2 in ['0'..'9']) then
// begin
// OutKind := 'Float';
// //Inc(FIdx);
// end;
// end;
// end;
//end;
end;
OutValue := Str[FIdx];
end;
end;
FDone := FIdx = Length(Str);
end;
procedure TForm1.btnParseStringClick(Sender: TObject);
var
S, SKind: string;
begin
lstActualOutput.Items.Clear;
FDone := False;
FIdx := 0;
repeat
ParseString(edtTestString.Text, S, SKind);
if (S <> '') and (SKind <> '') then
begin
lstActualOutput.Items.Add(S + ' (' + SKind + ')');
end;
until
FDone = True;
end;
end.
It clearly doesn't give the desired output (the failed code has been commented out). My approach is likely wrong, but I feel I only need to make a few changes here and there for a working solution.
At this point I am rather confused and quite lost, despite thinking the answer is close. The task is becoming increasingly infuriating and I would really appreciate some help.
EDIT 1
Here I got a little closer, as there are no longer duplicate numbers, but the result is still clearly wrong.
unit Unit1;
{$mode objfpc}{$H+}
interface
uses
Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls;
type
TForm1 = class(TForm)
btnParseString: TButton;
edtTestString: TEdit;
Label1: TLabel;
Label2: TLabel;
Label3: TLabel;
lstDesiredOutput: TListBox;
lstActualOutput: TListBox;
procedure btnParseStringClick(Sender: TObject);
private
FDone: Boolean;
FIdx: Integer;
procedure ParseString(const Str: string; var OutValue, OutKind: string);
public
{ public declarations }
end;
var
Form1: TForm1;
implementation
{$R *.lfm}
{ TForm1 }
// Prepare to pull hair out!
procedure TForm1.ParseString(const Str: string; var OutValue, OutKind: string);
var
CH1, CH2: Char;
begin
Inc(FIdx);
CH1 := Str[FIdx];
case CH1 of
'0'..'9': // Found the start of a new number
begin
CH1 := Str[FIdx];
// make sure previous character is not a letter
CH2 := Str[FIdx - 1];
if not (CH2 in ['A'..'Z']) then
begin
OutKind := 'Integer';
// Try to determine float...
//while (CH1 in ['0'..'9', '.']) do
//begin
// OutKind := 'Float';
// case Str[FIdx] of
// '.':
// begin
// CH2 := Str[FIdx + 1];
// if not (CH2 in ['0'..'9']) then
// begin
// OutKind := 'Float';
// Break;
// end;
// end;
// end;
// Inc(FIdx);
// CH1 := Str[FIdx];
//end;
end;
OutValue := Str[FIdx];
end;
end;
OutValue := Str[FIdx];
FDone := Str[FIdx] = #0;
end;
procedure TForm1.btnParseStringClick(Sender: TObject);
var
S, SKind: string;
begin
lstActualOutput.Items.Clear;
FDone := False;
FIdx := 0;
repeat
ParseString(edtTestString.Text, S, SKind);
if (S <> '') and (SKind <> '') then
begin
lstActualOutput.Items.Add(S + ' (' + SKind + ')');
end;
until
FDone = True;
end;
end.
My question is how can I extract numbers from a string, add them to a list and determine if the number is integer or float?
The left pale green listbox (desired output) shows what the results should be, the right pale blue listbox (actual output) shows what we actually got.
Please advise. Thanks.
Note: I re-added the Delphi tag as I do use XE7, so please don't remove it. Although this particular problem is in Lazarus, my eventual solution should work for both XE7 and Lazarus.
Your rules are rather complex, so you could try building a finite state machine (FSM), also known as a deterministic finite automaton (DFA).
Every character causes a transition between states.
For example, when you are in the state "integer started" and meet a space character, you yield the integer value and the FSM goes into the state "anything wanted".
If you are in the state "integer started" and meet '.', the FSM goes into the state "float or integer list started", and so on.
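The transitions described here can be sketched as a small console program. This is only a minimal illustration, not the answerer's own code: the names `ExtractNumbers`, `TScanState`, `Emit`, `IsDigit` and `IsLetter` are hypothetical, and it assumes that a run of characters starting with a letter swallows any digits inside it until a non-alphanumeric character is met (so "P7" and "P71" both yield nothing).

```pascal
program NumberFsmSketch;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils;

type
  // Hypothetical state names, chosen for this sketch only
  TScanState = (ssIdle, ssWord, ssInt, ssFrac);

function IsDigit(Ch: Char): Boolean;
begin
  Result := (Ch >= '0') and (Ch <= '9');
end;

function IsLetter(Ch: Char): Boolean;
begin
  Result := ((Ch >= 'A') and (Ch <= 'Z')) or ((Ch >= 'a') and (Ch <= 'z'));
end;

procedure ExtractNumbers(const S: string; Dest: TStrings);
var
  State: TScanState;
  Token: string;
  Ch: Char;
  I: Integer;

  // Store the collected token, then choose the next state from the
  // character that terminated the number
  procedure Emit(const Kind: string);
  begin
    Dest.Add(Token + ' (' + Kind + ')');
    Token := '';
    if IsLetter(Ch) then State := ssWord else State := ssIdle;
  end;

begin
  State := ssIdle;
  Token := '';
  // One extra iteration feeds a #0 sentinel so a pending number is flushed
  for I := 1 to Length(S) + 1 do
  begin
    if I <= Length(S) then Ch := S[I] else Ch := #0;
    case State of
      ssIdle:
        if IsDigit(Ch) then
        begin
          Token := Ch;
          State := ssInt;
        end
        else if IsLetter(Ch) then
          State := ssWord;
      ssWord: // assumption: digits inside a word are ignored
        if not (IsDigit(Ch) or IsLetter(Ch)) then
          State := ssIdle;
      ssInt:
        if IsDigit(Ch) then
          Token := Token + Ch
        else if Ch = '.' then
        begin
          Token := Token + Ch;
          State := ssFrac;
        end
        else
          Emit('Integer');
      ssFrac: // a second '.' or any other non-digit ends the float
        if IsDigit(Ch) then
          Token := Token + Ch
        else
          Emit('Float');
    end;
  end;
end;

var
  Found: TStringList;
  I: Integer;
begin
  Found := TStringList.Create;
  try
    ExtractNumbers('There are test values: P7 45.826.53.91.7, .5, 66.. 4 and 5.40.3.', Found);
    for I := 0 to Found.Count - 1 do
      WriteLn(Found[I]);
  finally
    Found.Free;
  end;
end.
```

The sentinel iteration avoids a separate "flush what is pending" step after the loop. The plain unit names `Classes, SysUtils` assume either Free Pascal or a Delphi project with the default unit scope names.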
There are so many basic errors in your code I decided to correct your homework, as it were. This is still not a good way to do it, but at least the basic errors are removed. Take care to read the comments!
procedure TForm1.ParseString(const Str: string; var OutValue,
OutKind: string);
//var
// CH1, CH2: Char; <<<<<<<<<<<<<<<< Don't need these
begin
(*************************************************
* *
* This only corrects the 'silly' errors. It is *
* NOT being passed off as GOOD code! *
* *
*************************************************)
Inc(FIdx);
// CH1 := Str[FIdx]; <<<<<<<<<<<<<<<<<< Not needed but OK to use. I removed them because they seemed to cause confusion...
OutKind := 'None';
OutValue := '';
try
case Str[FIdx] of
'0'..'9': // Found the start of a new number
begin
// CH1 := Str[FIdx]; <<<<<<<<<<<<<<<<<<<< Not needed
// make sure previous character is not a letter
// >>>>>>>>>>> make sure we are not at beginning of file
if FIdx > 1 then
begin
//CH2 := Str[FIdx - 1];
if (Str[FIdx - 1] in ['A'..'Z', 'a'..'z']) then // <<<<< don't forget lower case!
begin
exit; // <<<<<<<<<<<<<<
end;
end;
// else we have a digit and it is not preceded by a letter, so must be at least integer
OutKind := 'Integer';
// <<<<<<<<<<<<<<<<<<<<< WHAT WE HAVE SO FAR >>>>>>>>>>>>>>
OutValue := Str[FIdx];
// <<<<<<<<<<<<< Carry on...
inc( FIdx );
// Try to determine float...
while (Fidx <= Length( Str )) and (Str[ FIdx ] in ['0'..'9', '.']) do // <<<<< not CH1!
begin
OutValue := Outvalue + Str[FIdx]; //<<<<<<<<<<<<<<<<<<<<<< Note you were storing just 1 char. EVER!
//>>>>>>>>>>>>>>>>>>>>>>>>> OutKind := 'Float'; ***** NO! *****
case Str[FIdx] of
'.':
begin
OutKind := 'Float';
// now just copy any remaining integers - that is all rules ask for
inc( FIdx );
while (Fidx <= Length( Str )) and (Str[ FIdx ] in ['0'..'9']) do // <<<<< note '.' excluded here!
begin
OutValue := Outvalue + Str[FIdx];
inc( FIdx );
end;
exit;
end;
// >>>>>>>>>>>>>>>>>>> all the rest is unnecessary
//CH2 := Str[FIdx + 1];
// if not (CH2 in ['0'..'9']) then
// begin
// OutKind := 'Float';
// Break;
// end;
// end;
// end;
// Inc(FIdx);
// CH1 := Str[FIdx];
//end;
end;
inc( fIdx );
end;
end;
end;
// OutValue := Str[FIdx]; <<<<<<<<<<<<<<<<<<<<< NO! Only ever gives 1 char!
// FDone := Str[FIdx] = #0; <<<<<<<<<<<<<<<<<<< NO! #0 does NOT terminate Delphi strings
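finally // <<<<<<<<<<<<<<< Try.. finally clause added to make sure FDone is always evaluated.
// <<<<<<<<<< Note there are better ways!
// >= rather than >: the next call starts with Inc(FIdx), so stopping at the last index avoids reading Str[Length(Str) + 1]
if FIdx >= Length( Str ) then
begin
FDone := TRUE;
end;
end;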
end;
You have got answers and comments that suggest using a state machine, and I support that fully. From the code you show in Edit1, I see that you still did not implement a state machine. From the comments I guess you don't know how to do that, so to push you in that direction here's one approach:
Define the states you need to work with:
type
TReadState = (ReadingIdle, ReadingText, ReadingInt, ReadingFloat);
// ReadingIdle, initial state or if no other state applies
// ReadingText, needed to deal with strings that includes digits (P7..)
// ReadingInt, state that collects the characters that form an integer
// ReadingFloat, state that collects characters that form a float
Then define the skeleton of your state machine. To keep it as easy as possible I chose a straightforward procedural approach, with one main procedure and four subprocedures, one for each state.
procedure ParseString(const s: string; strings: TStrings);
var
ix: integer;
ch: Char;
len: integer;
str, // to collect characters which form a value
res: string; // holds a final value if not empty
State: TReadState;
// subprocedures, one for each state (headers only here; the full bodies given below take the place of these four lines)
procedure DoReadingIdle(ch: char; var str, res: string);
procedure DoReadingText(ch: char; var str, res: string);
procedure DoReadingInt(ch: char; var str, res: string);
procedure DoReadingFloat(ch: char; var str, res: string);
begin
State := ReadingIdle;
len := Length(s);
res := '';
str := '';
ix := 1;
repeat
ch := s[ix];
case State of
ReadingIdle: DoReadingIdle(ch, str, res);
ReadingText: DoReadingText(ch, str, res);
ReadingInt: DoReadingInt(ch, str, res);
ReadingFloat: DoReadingFloat(ch, str, res);
end;
if res <> '' then
begin
strings.Add(res);
res := '';
end;
inc(ix);
until ix > len;
// if State is either ReadingInt or ReadingFloat, the input string
// ended with a digit as final character of an integer, resp. float,
// and we have a pending value to add to the list
case State of
ReadingInt: strings.Add(str + ' (integer)');
ReadingFloat: strings.Add(str + ' (float)');
end;
end;
That is the skeleton. The main logic is in the four state procedures.
procedure DoReadingIdle(ch: char; var str, res: string);
begin
case ch of
'0'..'9': begin
str := ch;
State := ReadingInt;
end;
' ','.': begin
str := '';
// no state change
end
else begin
str := ch;
State := ReadingText;
end;
end;
end;
procedure DoReadingText(ch: char; var str, res: string);
begin
case ch of
' ','.': begin // terminates ReadingText state
str := '';
State := ReadingIdle;
end
else begin
str := str + ch;
// no state change
end;
end;
end;
procedure DoReadingInt(ch: char; var str, res: string);
begin
case ch of
'0'..'9': begin
str := str + ch;
end;
'.': begin // ok, seems we are reading a float
str := str + ch;
State := ReadingFloat; // change state
end;
' ',',': begin // end of int reading, set res
res := str + ' (integer)';
str := '';
State := ReadingIdle;
end;
end;
end;
procedure DoReadingFloat(ch: char; var str, res: string);
begin
case ch of
'0'..'9': begin
str := str + ch;
end;
' ','.',',': begin // end of float reading, set res
res := str + ' (float)';
str := '';
State := ReadingIdle;
end;
end;
end;
The state procedures should be self-explanatory, but just ask if something is unclear.
Both your test strings result in the values listed as you specified. One of your rules was a little bit ambiguous and my interpretation might be wrong.
numbers cannot be preceded by a letter
The example you provided is "P7", and in your code you only checked the immediately preceding character. But what if it read "P71"? I interpreted it so that "1" should be omitted just like the "7", even though the character before "1" is "7". This is the main reason for the ReadingText state, which ends only on a space or period.
Here's a solution using regex. I implemented it in Delphi (tested in 10.1, but it should also work with XE8). I'm sure you can adapt it for Lazarus; I'm just not sure which regex libraries work over there. The regex pattern uses alternation to match numbers as integers or floats following your rules:
Integer:
(\b\d+(?![.\d]))
(the lookbehind (?<![[:alnum:]]) can be used in place of \b instead)
Float:
(\b\d+(?:\.\d+)?)
(the lookbehind (?<![[:alnum:]]) can be used in place of \b instead)
A simple console application looks like
program Test;
{$APPTYPE CONSOLE}
uses
System.SysUtils, RegularExpressions;
procedure ParseString(const Input: string);
var
Match: TMatch;
begin
WriteLn('---start---');
Match := TRegex.Match(Input, '(\b\d+(?![.\d]))|(\b\d+(?:\.\d+)?)');
while Match.Success do
begin
if Match.Groups[1].Value <> '' then
writeln(Match.Groups[1].Value + '(Integer)')
else
writeln(Match.Groups[2].Value + '(Float)');
Match := Match.NextMatch;
end;
WriteLn('---end---');
end;
begin
ParseString('There are test values: P7 45.826.53.91.7, .5, 66.. 4 and 5.40.3.');
ParseString('Anoth3r Te5.t string .4 abc 8.1Q 123.45.67.8.9');
ReadLn;
end.
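The patterns in this answer start with `\b`, and `(?<![[:alnum:]])` is listed next to them as an alternative. The difference between the two only shows when a digit follows an underscore. The sketch below is a hedged illustration, assuming Delphi's `System.RegularExpressions` unit (it is not meant for Lazarus as written); the inputs `'_7'` and `'P7'` are made up for the demonstration.

```pascal
program BoundaryDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

begin
  // '_' counts as a word character, so \b finds no boundary between '_' and '7'
  WriteLn(TRegEx.IsMatch('_7', '\b\d+'));
  // '_' is not alphanumeric, so the negative lookbehind lets the digit match
  WriteLn(TRegEx.IsMatch('_7', '(?<![[:alnum:]])\d+'));
  // a letter directly in front blocks both variants
  WriteLn(TRegEx.IsMatch('P7', '\b\d+'));
  WriteLn(TRegEx.IsMatch('P7', '(?<![[:alnum:]])\d+'));
  ReadLn;
end.
```

If underscores never appear directly before numbers in the input, the two variants should behave the same for the rules in the question, so `\b` is the simpler choice.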