
Removing backslash (escape character) from a string

Tags: ruby

I am trying to work on my own JSON parser. I have an input string that I want to tokenize:

input = "{ \"foo\": \"bar\", \"num\": 3}"

How do I remove the escape character \ so that it is not a part of my tokens?

Currently, my solution using delete works:

tokens = input.delete('\\"').split("")

=> ["{", " ", "f", "o", "o", ":", " ", "b", "a", "r", ",", " ", "n", "u", "m", ":", " ", "3", "}"]

However, when I try to use gsub, it fails to find any \".

tokens = input.gsub('\\"', '').split("")

=> ["{", " ", "\"", "f", "o", "o", "\"", ":", " ", "\"", "b", "a", "r", "\"", ",", " ", "\"", "n", "u", "m", "\"", ":", " ", "3", "}"]

I have two questions:

1. Why does gsub not work in this case?

2. How do I remove the backslash (escape) character? I currently have to remove the backslash character with the quotes to make this work.

Huy asked Jul 03 '14 02:07



3 Answers

When you write:

input = "{ \"foo\": \"bar\", \"num\": 3}"

The actual string stored in input is:

{ "foo": "bar", "num": 3}

The escape \" here is interpreted by the Ruby parser so that it can distinguish the boundaries of the string (the leftmost and rightmost ") from an ordinary " character inside the string (the escaped ones).
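As a quick check of the point above, the same content can be written as a single-quoted literal, where " needs no escaping. This is only an illustrative sketch; the variable name same is not from the question.

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

# The same content as a single-quoted literal, where " needs no escaping.
same = '{ "foo": "bar", "num": 3}'

puts input == same          # true: both literals produce identical strings
puts input.include?("\\")   # false: the string holds no backslash character
puts input.count('"')       # 6: the quotes themselves are real characters
```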

String#delete deletes a set of characters specified by its first parameter, rather than a pattern. Every character that appears in the first parameter is removed. So by writing

input.delete('\\"')

You got a string with every character of that set removed from input (here, all the quotes), rather than a string with only the \" sequences removed. That is not what you want here, and it may cause unexpected behavior later.
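A minimal sketch of the character-set behavior, using a made-up sample string rather than the JSON input. The order of the characters in the argument does not matter, which would not be true of a substring pattern.

```ruby
text = "hello world"

# delete treats its argument as a set of characters, not as a substring:
# every "l" and every "o" is removed, wherever it appears.
removed = text.delete("lo")

puts removed                       # he wrd
puts removed == text.delete("ol")  # true: the order within the set is irrelevant
```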

String#gsub, however, substitutes a pattern (either a regular expression or a plain string).

input.gsub('\\"', '')

means: find every \" (two characters in sequence) and replace it with the empty string. Since there is no \ in input, nothing is replaced. What you actually need is:

input.gsub('"', '')
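Putting the two gsub calls side by side may help. This sketch reuses the input from the question; the names unchanged and stripped are only illustrative.

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

# '\\"' is a two-character pattern: a backslash followed by a quote.
# That sequence never occurs in input, so gsub returns an unchanged copy.
unchanged = input.gsub('\\"', '')

# '"' is a one-character pattern and matches every quote in the string.
stripped = input.gsub('"', '')

puts unchanged == input  # true
puts stripped            # { foo: bar, num: 3}
```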
Arie Xiao answered Oct 16 '22 07:10



You do not have backslashes in your string. You have quotes in your string, which need to be escaped when placed in a double-quoted string. Look:

input = "{ \"foo\": \"bar\", \"num\": 3}"
puts input
# => { "foo": "bar", "num": 3}

You are removing - phantoms.

input.delete('\\"')

will delete any characters in its argument. Thus, you delete any non-existent backslashes, and also delete all quotes. Without quotes, the default display method (inspect) will not need to escape anything.

input.gsub('\\"', '')

will try to delete the sequence \", which does not exist, so gsub ends up doing nothing.

Make sure you understand the difference between a string's representation (puts input.inspect) and its content (puts input); the backslashes are artifacts of the representation.
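To make that difference concrete, here is a small sketch that prints both forms and checks each one for a backslash. Note that inspect returns a new string, which is where the backslashes live.

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

puts input          # content:        { "foo": "bar", "num": 3}
puts input.inspect  # representation: "{ \"foo\": \"bar\", \"num\": 3}"

# The backslashes exist only in the inspect output, not in input itself.
puts input.include?("\\")          # false
puts input.inspect.include?("\\")  # true
```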

That said, I have to echo emaillenin: writing a correct JSON parser is not simple, and you can't do it with regular expressions (or at least, not with regular regular expressions; it might be possible with Oniguruma). It needs a proper parser like treetop or rex/racc, since it has a lot of corner cases that are easy to miss (chief among them being, ironically, escaped characters).
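For the tokenizing step only (not full parsing), one possible starting point is Ruby's standard StringScanner. This is a rough sketch under stated assumptions: the tokenize helper and the token tags are hypothetical, string escapes are kept raw rather than decoded, and nothing here validates JSON structure.

```ruby
require 'strscan'

# Hypothetical helper: splits JSON text into coarse [type, text] tokens.
# Sketch only: escapes inside strings are kept raw, and nesting is not checked.
def tokenize(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next # whitespace between tokens is dropped
    elsif (str = scanner.scan(/"(?:[^"\\]|\\.)*"/))
      tokens << [:string, str[1..-2]] # strip the surrounding quotes
    elsif (num = scanner.scan(/-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/))
      tokens << [:number, num]
    elsif (word = scanner.scan(/true|false|null/))
      tokens << [:literal, word]
    elsif (punct = scanner.scan(/[{}\[\]:,]/))
      tokens << [:punct, punct]
    else
      raise ArgumentError, "unexpected character at position #{scanner.pos}"
    end
  end
  tokens
end

input = "{ \"foo\": \"bar\", \"num\": 3}"
tokens = tokenize(input)
p tokens
```

With this approach the quotes delimit string tokens instead of being deleted up front, so a quote or colon inside a string value does not get confused with structure. Turning the token list into nested hashes and arrays would still need a real parser.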

Amadan answered Oct 16 '22 07:10



input.gsub(/[\"]/,"") will also work.
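In that regex, neither the character class nor the backslash appears to be required, since " has no special meaning in a Ruby regular expression. A short sketch comparing the variants (the variable names are only illustrative):

```ruby
input = "{ \"foo\": \"bar\", \"num\": 3}"

with_class  = input.gsub(/[\"]/, "")  # the form shown above
plain_regex = input.gsub(/"/, "")     # no character class, no backslash
with_delete = input.delete('"')       # character-set deletion

puts with_class == plain_regex && plain_regex == with_delete  # true
puts with_class                                               # { foo: bar, num: 3}
```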

Dan answered Oct 16 '22 09:10
