Is there a function to decode a percent-encoded UTF-8 string, such as one submitted from a form?

Tags: forms, unicode, cgi, rebol, rebol3

I want to store some data submitted through an HTML form, using a Rebol CGI script. My form looks like this:

<form action="test.cgi" method="post" >

     Input:

     <input type="text" name="field"/>
     <input type="submit" value="Submit" />

</form>

But for Unicode characters, such as Chinese ones, I receive the data in percent-encoded form, for instance %E4%BA%BA.

(This is for the Chinese character "人" ... its UTF-8 form as a Rebol binary literal is #{E4BABA})

Is there a function in the system, or an existing library, that can decode this directly? dehex does not currently appear to cover this case. At the moment I decode it manually by removing the percent signs and constructing the corresponding binary, like this:

data: to-string read system/ports/input
print data

;-- this prints "field=%E4%BA%BA"

k-v: parse data "="
print k-v

;-- this prints ["field" "%E4%BA%BA"]

v: append insert replace/all k-v/2 "%" "" "#{" "}"
print v

;-- This prints "#{E4BABA}" ... a string!, not binary!
;-- LOAD will help construct the corresponding binary
;-- then TO-STRING will decode that binary from UTF-8 to character codepoints

write %test.txt to-string load v


asked Aug 20 '13 09:08

Wayne Cui

2 Answers

I have a library called AltWebForm that en/decodes percent-encoded web form data:

do http://reb4.me/r3/altwebform
load-webform "field=%E4%BA%BA"

The library is described here: Rebol and Web Forms.

answered Oct 25 '22 02:10

rgchris

Looks to be related to ticket #1986, where it is discussed whether this is a "bug" or the Internet changing out from under its own spec:

Have DEHEX convert UTF-8 sequences from browsers as Unicode.

If you have specific experience on what has become standard in Chinese, and want to weigh in, that would be valuable.

Just as an aside, the specific case above could alternatively have been handled with PARSE:

key-value: {field=%E4%BA%BA}

utf8-bytes: copy #{}

either parse key-value [
    copy field-name to {=}
    skip
    some [
        and {%}
        copy enhexed-byte 3 skip (
            append utf8-bytes dehex enhexed-byte
        )
    ]
] [
    print [field-name {is} to string! utf8-bytes]
] [
    print {Malformed input.}
]

That will output:

field is 人

With some comments included:

key-value: {field=%E4%BA%BA}

;-- Generate empty binary value by copying an empty binary literal     
utf8-bytes: copy #{}

either parse key-value [

    ;-- grab field-name as the chars right up to the equals sign
    copy field-name to {=}

    ;-- skip the equal sign as we went up to it, without moving "past" it
    skip

    ;-- apply the enclosed rule SOME (non-zero) number of times
    some [
        ;-- match a percent sign as the immediate next symbol, without
        ;-- advancing the parse position
        and {%}

        ;-- grab the next three chars, starting with %, into enhexed-byte
        copy enhexed-byte 3 skip (

            ;-- If we get to this point in the match rule, this parenthesized
            ;-- expression lets us evaluate non-dialected Rebol code to 
            ;-- append the dehexed byte to our utf8 binary
            append utf8-bytes dehex enhexed-byte
        )
    ]
] [
    print [field-name {is} to string! utf8-bytes]
] [
    print {Malformed input.}
]
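The rule above only accepts a value made up entirely of %XX triples. As a rough sketch of how the same idea might be extended to values that mix literal characters with escapes, something like the following could work. It assumes Rebol 3, where TO STRING! decodes a binary as UTF-8. The function name decode-percent and its local words are hypothetical, and it uses DEBASE/BASE rather than DEHEX to turn each pair of hex digits into a byte, which is a substitution made for this sketch and not part of the code above:

```rebol
decode-percent: func [
    {Sketch: decode a percent-encoded form value as UTF-8 (hypothetical helper).}
    text [string!]
    /local bytes hex-digit hex ch
] [
    bytes: copy #{}
    hex-digit: charset "0123456789ABCDEFabcdef"

    parse text [
        any [
            ;-- %XX: convert the two hex digits to a single byte
            "%" copy hex 2 hex-digit (append bytes debase/base hex 16)

            ;-- form encoding sends spaces as plus signs
            | "+" (append bytes #{20})

            ;-- anything else (including a stray %) is kept as its UTF-8 bytes
            | copy ch skip (append bytes to binary! ch)
        ]
    ]

    ;-- reinterpret the collected bytes as a UTF-8 string
    to string! bytes
]

print decode-percent "%E4%BA%BA"
```

Under those assumptions the last line should print 人, and a value such as "a+b%20c" should come back as "a b c". Collecting everything into one binary before converting matters because a multi-byte character only becomes valid UTF-8 once all of its bytes are present.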

(Note also that "simple parse" is getting the axe in favor of enhancements to SPLIT. So writing code like parse data "=" can now be expressed instead as split data "=", or other cool variants if you check them out...samples are in the ticket.)

answered Oct 25 '22 02:10

HostileFork says dont trust SE
