Why does findstr not handle case properly (in some circumstances)?

Tags: regex, windows, batch-file, cmd, findstr

While writing some recent scripts in cmd.exe, I needed to use findstr with regular expressions; the customer required standard cmd.exe commands (no GnuWin32, Cygwin, VBS or PowerShell).

I just wanted to know if a variable contained any upper-case characters and attempted to use:

> set myvar=abc
> echo %myvar%|findstr /r "[A-Z]"
abc
> echo %errorlevel%
0

When %myvar% is set to abc, that actually outputs the string and sets errorlevel to 0, saying that a match was found.

However, the full-list variant:

> echo %myvar%|findstr /r "[ABCDEFGHIJKLMNOPQRSTUVWXYZ]"
> echo %errorlevel%
1

does not output the line and it correctly sets errorlevel to 1.

In addition:

> echo %myvar%|findstr /r "^[A-Z]*$"
> echo %errorlevel%
1

also works as expected.

I'm obviously missing something here even if it's only the fact that findstr is somehow broken.

Why does the first (range) regex not work in this case?

And yet more weirdness:

> echo %myvar%|findstr /r "[A-Z]"
abc
> echo %myvar%|findstr /r "[A-Z][A-Z]"
abc
> echo %myvar%|findstr /r "[A-Z][A-Z][A-Z]"
> echo %myvar%|findstr /r "[A]"

The last two above also do not output the string!!

377

asked Apr 14 '10 07:04

paxdiablo

2 Answers

I believe this is mostly a horrible design flaw.

We all expect the ranges to collate based on the ASCII code value. But they don't. Instead, the ranges are based on a collation sequence that nearly matches the default sequence used by SORT. EDIT - The exact collation sequence used by FINDSTR is now available at https://stackoverflow.com/a/20159191/1012053 under the section titled Regex character class ranges [x-y].

I prepared a text file containing one line for each extended ASCII character from 1 - 255, excluding 10 (LF), 13 (CR), and 26 (EOF on Windows). On each line I have the character, followed by a space, followed by the decimal code for the character. I then ran the file through SORT and captured the output in a sortedChars.txt file.
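The sorting step described above can be sketched as follows. This is only an illustration: chars.txt is an assumed name for the unsorted input file, which the answer does not name.

```batch
rem chars.txt is a hypothetical name for the unsorted file; the answer does not say what it was called.
rem SORT is run with its default collation, and /o writes the result to sortedChars.txt.
sort chars.txt /o sortedChars.txt
```

In the FINDSTR commands that follow, /n prefixes each matching line with its line number in sortedChars.txt, /r treats the search string as a regular expression, and /c: supplies that search string.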

I now can easily test any regex range against this sorted file and demonstrate how the range is determined by a collation sequence that is nearly the same as SORT.

>findstr /nrc:"^[0-9]" sortedChars.txt
137:0 048
138:½ 171
139:¼ 172
140:1 049
141:2 050
142:² 253
143:3 051
144:4 052
145:5 053
146:6 054
147:7 055
148:8 056
149:9 057

The results are not quite what we expected in that chars 171, 172 and 253 are thrown in the mix. But the results make perfect sense. The line number prefix corresponds to the SORT collation sequence, and you can see that the range exactly matches according to the SORT sequence.

Here is another range test that exactly follows the SORT sequence:

>findstr /nrc:"^[!-=]" sortedChars.txt
34:! 033
35:" 034
36:# 035
37:$ 036
38:% 037
39:& 038
40:( 040
41:) 041
42:* 042
43:, 044
44:. 046
45:/ 047
46:: 058
47:; 059
48:? 063
49:@ 064
50:[ 091
51:\ 092
52:] 093
53:^ 094
54:_ 095
55:` 096
56:{ 123
57:| 124
58:} 125
59:~ 126
60:¡ 173
61:¿ 168
62:¢ 155
63:£ 156
64:¥ 157
65:₧ 158
66:+ 043
67:∙ 249
68:< 060
69:= 061

There is one small anomaly with alpha characters. Character "a" sorts between "A" and "Z" yet it does not match [A-Z]. "z" sorts after "Z", yet it matches [A-Z]. There is a corresponding problem with [a-z]. "A" sorts before "a", yet it matches [a-z]. "Z" sorts between "a" and "z", yet it does not match [a-z].
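This anomaly would also account for the results in the question: in "abc", the letters "b" and "c" fall inside [A-Z] while "a" does not, so [A-Z] and [A-Z][A-Z] find a match (on "b" and "bc") but [A-Z][A-Z][A-Z] cannot. A minimal sketch to probe each letter on its own, assuming a system whose collation behaves like the one shown in this answer:

```batch
@echo off
rem Test each letter of the question's value "abc" separately against [A-Z].
rem On a system matching the tables in this answer, "a" is expected to fall
rem outside the range while "b" and "c" fall inside it.
for %%C in (a b c) do (
    echo %%C|findstr /r "[A-Z]" >nul && (echo %%C matches [A-Z]) || (echo %%C does not match [A-Z])
)
```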

Here are the [A-Z] results:

>findstr /nrc:"^[A-Z]" sortedChars.txt
151:A 065
153:â 131
154:ä 132
155:à 133
156:å 134
157:Ä 142
158:Å 143
159:á 160
160:ª 166
161:æ 145
162:Æ 146
163:B 066
164:b 098
165:C 067
166:c 099
167:Ç 128
168:ç 135
169:D 068
170:d 100
171:E 069
172:e 101
173:é 130
174:ê 136
175:ë 137
176:è 138
177:É 144
178:F 070
179:f 102
180:ƒ 159
181:G 071
182:g 103
183:H 072
184:h 104
185:I 073
186:i 105
187:ï 139
188:î 140
189:ì 141
190:í 161
191:J 074
192:j 106
193:K 075
194:k 107
195:L 076
196:l 108
197:M 077
198:m 109
199:N 078
200:n 110
201:ñ 164
202:Ñ 165
203:ⁿ 252
204:O 079
205:o 111
206:ô 147
207:ö 148
208:ò 149
209:Ö 153
210:ó 162
211:º 167
212:P 080
213:p 112
214:Q 081
215:q 113
216:R 082
217:r 114
218:S 083
219:s 115
220:ß 225
221:T 084
222:t 116
223:U 085
224:u 117
225:û 150
226:ù 151
227:ú 163
228:ü 129
229:Ü 154
230:V 086
231:v 118
232:W 087
233:w 119
234:X 088
235:x 120
236:Y 089
237:y 121
238:ÿ 152
239:Z 090
240:z 122

And the [a-z] results:

>findstr /nrc:"^[a-z]" sortedChars.txt
151:A 065
152:a 097
153:â 131
154:ä 132
155:à 133
156:å 134
157:Ä 142
158:Å 143
159:á 160
160:ª 166
161:æ 145
162:Æ 146
163:B 066
164:b 098
165:C 067
166:c 099
167:Ç 128
168:ç 135
169:D 068
170:d 100
171:E 069
172:e 101
173:é 130
174:ê 136
175:ë 137
176:è 138
177:É 144
178:F 070
179:f 102
180:ƒ 159
181:G 071
182:g 103
183:H 072
184:h 104
185:I 073
186:i 105
187:ï 139
188:î 140
189:ì 141
190:í 161
191:J 074
192:j 106
193:K 075
194:k 107
195:L 076
196:l 108
197:M 077
198:m 109
199:N 078
200:n 110
201:ñ 164
202:Ñ 165
203:ⁿ 252
204:O 079
205:o 111
206:ô 147
207:ö 148
208:ò 149
209:Ö 153
210:ó 162
211:º 167
212:P 080
213:p 112
214:Q 081
215:q 113
216:R 082
217:r 114
218:S 083
219:s 115
220:ß 225
221:T 084
222:t 116
223:U 085
224:u 117
225:û 150
226:ù 151
227:ú 163
228:ü 129
229:Ü 154
230:V 086
231:v 118
232:W 087
233:w 119
234:X 088
235:x 120
236:Y 089
237:y 121
238:ÿ 152
240:z 122

Sort sorts upper case before lower case. (EDIT - I just read the help for SORT and learned that it does not differentiate between upper and lower case. The fact that my SORT output consistently put upper before lower is probably a result of the order of the input.) But regex apparently sorts lower case before upper case. All of the following ranges fail to match any characters.

>findstr /nrc:"^[A-a]" sortedChars.txt

>findstr /nrc:"^[B-b]" sortedChars.txt

>findstr /nrc:"^[C-c]" sortedChars.txt

>findstr /nrc:"^[D-d]" sortedChars.txt

Reversing the order finds the characters.

>findstr /nrc:"^[a-A]" sortedChars.txt
151:A 065
152:a 097

>findstr /nrc:"^[b-B]" sortedChars.txt
163:B 066
164:b 098

>findstr /nrc:"^[c-C]" sortedChars.txt
165:C 067
166:c 099

>findstr /nrc:"^[d-D]" sortedChars.txt
169:D 068
170:d 100

There are additional characters that regex sorts differently than SORT, but I haven't got a precise list.
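For the original goal of detecting upper-case letters in a variable, one option is to avoid ranges entirely and list the letters, which is the form the question already shows returning errorlevel 1 for "abc". A minimal sketch, reusing the question's myvar variable:

```batch
@echo off
setlocal
set "myvar=abc"

rem Explicit letter list instead of the [A-Z] range, so collation order plays no part.
rem FINDSTR is case sensitive unless /i is given.
echo %myvar%|findstr /r "[ABCDEFGHIJKLMNOPQRSTUVWXYZ]" >nul
if errorlevel 1 (echo no upper-case letter found) else (echo upper-case letter found)
```

Redirecting to nul keeps FINDSTR from echoing the matched line, leaving only the errorlevel test. Values containing characters that are special to cmd.exe, such as & or |, would need extra care before being echoed into a pipe.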

107

answered Sep 18 '22 08:09

dbenham

So if you want to match:

only numbers : FindStr /R "^[0123-9]*$"
octal : FindStr /R "^[0123-7]*$"
hexadecimal : FindStr /R "^[0123-9aAb-Cd-EfF]*$"
alpha with no accent : FindStr /R "^[aAb-Cd-EfFg-Ij-NoOp-St-Uv-YzZ]*$"
alphanumeric : FindStr /R "^[0123-9aAb-Cd-EfFg-Ij-NoOp-St-Uv-YzZ]*$"
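These split ranges appear designed to step around the extra characters seen in the range listings of the other answer: ½ and ¼ collate between 0 and 1, and ² between 2 and 3, so 0, 1 and 2 are listed individually and only 3-9 is written as a range. The letter patterns break their ranges in the same way wherever accented characters sit. A minimal sketch of using the digits pattern in a script, where value is a hypothetical variable name chosen for this illustration:

```batch
@echo off
setlocal
rem "value" is a hypothetical variable used only for this example.
set "value=12345"

rem ^ and $ anchor the pattern so every character on the line must be in the class.
rem There is no space before the pipe, so ECHO does not add a trailing space to the line.
echo %value%|findstr /r "^[0123-9]*$" >nul
if errorlevel 1 (echo not all digits) else (echo all digits)
```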

answered Sep 20 '22 08:09

JLGautier
