I am fetching song lyrics through an API and converting the lyrics string into an array of words. I am seeing some unusual behavior from preg_replace. When I debugged with var_dump, it reported a length of 10 for the string "you", which tells me that something might be wrong with the string. After that, preg_replace gives an unexpected result.
This is my code:
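$source = get_chart_lyrics_data("madonna", "frozen");
$pieces = explode("\n", $source);
$lyrics = array();
for ($i = 0; $i < count($pieces); $i++) {
    // Only lines after index 10 are processed.
    if ($i > 10) {
        $words = explode(" ", $pieces[$i]);
        foreach ($words as $_word) {
            if ($_word == "") {
                continue;
            }
            var_dump($_word);
            $word = strtolower($_word);
            var_dump($word);
            $word = trim($word);
            var_dump($word);
            // Keep only ASCII letters and spaces.
            $word = preg_replace("/[^A-Za-z ]/", '', $word);
            var_dump($word);
            // Initialise the counter first to avoid an undefined-index notice.
            if (!isset($lyrics[$word])) {
                $lyrics[$word] = 0;
            }
            $lyrics[$word]++;
        }
    }
}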
These are the first four lines the code outputs:
string(10) "You"
string(10) "you"
string(10) "you"
string(8) "lyricyou"
Why does var_dump report a length of 10 for "you"? And why does preg_replace behave like that?
Thanks.
The PHP var_dump() function prints structured information about the variables passed to it, including each one's type and value; it does not return anything. With var_dump(), you can inspect strings, ints, floats, arrays and objects. For a string, the number in parentheses is its length in bytes, not the number of visible characters.
The var_dump() function dumps information about one or more variables; the information includes the type and value of each variable. Syntax: var_dump(var1, var2, ...);
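To make the description of var_dump() concrete, here is a minimal sketch. The variable names and sample values are made up for illustration. The point to notice is that the number shown for a string is its length in bytes, so bytes you cannot see on screen still count.

```php
<?php
// Hypothetical sample values, chosen only for illustration.
$plain  = "you";
$padded = "you\t\r";       // a tab and a carriage return are invisible on screen
$accent = "caf\xc3\xa9";   // "café" in UTF-8: the "é" occupies two bytes

var_dump($plain);          // reports string(3)
var_dump($padded);         // reports string(5), although only "you" is visible
var_dump($accent);         // reports string(5), although there are four characters
var_dump(42, 1.5, true);   // several variables can be passed in one call

// strlen() returns the same byte counts that var_dump() reports.
$lengths = array(strlen($plain), strlen($padded), strlen($accent));
```

A length that is larger than the visible text, as in the question, therefore suggests extra bytes in the string rather than a fault in var_dump() itself.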
The likeliest answer is that the string contains non-printable characters beyond "you". To figure out what exactly it contains, you'll have to look at the raw bytes. Do this with echo bin2hex($word); which outputs a string like 666f6f..., where every 2 characters are one byte in hexadecimal notation. You may make that more readable with something like:
echo join(' ', str_split(bin2hex($word), 2));
// 66 6f 6f ...
Now use your favourite ASCII/Unicode table (depending on the encoding of the string) to figure out what individual characters those represent and where you got them from.
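As a sketch of that inspection step, assume purely for illustration that the 10-byte word is "<Lyric>You", that is, an XML tag glued to the first word. This guess is consistent with the lengths in the question (7 + 3 = 10 bytes, and "lyricyou" is 8), but the output shown does not confirm it, so treat the input below as hypothetical.

```php
<?php
// Hypothetical input: an XML tag glued to the word. This is an assumption,
// not something the question's output confirms.
$word = "<Lyric>You";

// One byte per pair of hex digits, separated by spaces.
$hex = join(' ', str_split(bin2hex($word), 2));
echo $hex, "\n";                   // 3c 4c 79 72 69 63 3e 59 6f 75

// Escape markup so a browser displays the tag instead of hiding it.
echo htmlspecialchars($word), "\n"; // &lt;Lyric&gt;You

// The same steps as in the question, applied to this input.
$cleaned = preg_replace("/[^A-Za-z ]/", '', trim(strtolower($word)));
echo $cleaned, "\n";               // lyricyou
```

If the hex dump of the real data does show a tag like this, one option would be to call strip_tags() on the text before splitting it into words, so that the regular expression only ever sees the lyrics.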
Perhaps your string is encoded in UTF-16, in which case you should see a telltale 00 as every second byte, one next to each ASCII character.
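For comparison, this is roughly what that UTF-16 pattern looks like in a hex dump. The input is written out by hand as UTF-16LE bytes for illustration, and the $looksUtf16 check is only a rough heuristic introduced here, not a reliable encoding detector.

```php
<?php
// "you" written out by hand as UTF-16LE bytes (illustrative input only).
$utf16 = "y\x00o\x00u\x00";

$hex = join(' ', str_split(bin2hex($utf16), 2));
echo $hex, "\n"; // 79 00 6f 00 75 00

// Rough heuristic: an even number of bytes where every second byte is NUL.
$looksUtf16 = strlen($utf16) % 2 === 0
    && preg_match('/^(?:.\x00)+$/s', $utf16) === 1;

var_dump($looksUtf16); // bool(true)
```

Note that "you" in UTF-16 is 6 bytes, or 8 with a byte order mark, so this pattern is something to recognise in a dump rather than an explanation of a length of 10.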