
Java - removing \u0000 from a String

I'm using the Twitter API, and the following string is causing me trouble: Proyecto de ingeniera comercial, actual Profesora de matemáticas \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000Enseña Chile
I want to store it in PostgreSQL, but \u0000 is not accepted, so I want to remove it.
I tried string = string.replaceAll("\\u0000", ""); but it doesn't work. This is the code I'm running:

String json = TwitterObjectFactory.getRawJSON(user);
System.out.println(json);
json = json.replaceAll("\\u0000", "");
System.out.println(json);

The output (only the part that matters)

Proyecto de ingeniera comercial, actual Profesora de matemáticas \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000Enseña Chile
Proyecto de ingeniera comercial, actual Profesora de matemáticas \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000Enseña Chile

If I put that part in a String literal in Java, the replacement works, but if I read it from a text file or directly from Twitter, it doesn't.
So my question is: how do I remove \u0000 from a string?
By the way, the full string is this:

{"utc_offset":null,"friends_count":83,"profile_image_url_https":"https://pbs.twimg.com/profile_images/2636139584/3a8455cd94045fa6980402add14796a9_normal.jpeg","listed_count":1,"profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","default_profile_image":false,"favourites_count":0,"description":"Proyecto de ingeniera comercial, actual Profesora de matemáticas \u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000Enseña Chile","created_at":"Sat May 28 14:24:06 +0000 2011","is_translator":false,"profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","protected":false,"screen_name":"Fsquadritto","id_str":"306825274","profile_link_color":"0084B4","is_translation_enabled":false,"id":306825274,"geo_enabled":false,"profile_background_color":"C0DEED","lang":"es","profile_sidebar_border_color":"C0DEED","profile_location":null,"profile_text_color":"333333","verified":false,"profile_image_url":"http://pbs.twimg.com/profile_images/2636139584/3a8455cd94045fa6980402add14796a9_normal.jpeg","time_zone":null,"url":null,"contributors_enabled":false,"profile_background_tile":false,"entities":{"description":{"urls":[]}},"statuses_count":2,"follow_request_sent":false,"followers_count":36,"profile_use_background_image":true,"default_profile":true,"following":false,"name":"Fiorella Squadritto","location":"","profile_sidebar_fill_color":"DDEEF6","notifications":false,"status":{"in_reply_to_status_id_str":null,"in_reply_to_status_id":null,"possibly_sensitive":false,"coordinates":null,"created_at":"Fri Oct 12 17:40:35 +0000 2012","truncated":false,"in_reply_to_user_id_str":null,"source":"<a href=\"http://instagram.com\" 
rel=\"nofollow\">Instagram<\/a>","retweet_count":1,"retweeted":false,"geo":null,"in_reply_to_screen_name":null,"entities":{"urls":[{"display_url":"instagr.am/p/QsOQxTNfvQ/","indices":[49,69],"expanded_url":"http://instagr.am/p/QsOQxTNfvQ/","url":"http://t.co/GKziME7N"}],"hashtags":[{"indices":[24,34],"text":"eduinnova"}],"user_mentions":[{"indices":[35,47],"screen_name":"ensenachile","id_str":"57099132","name":"Enseña Chile","id":57099132}],"symbols":[]},"id_str":"256811615171792896","in_reply_to_user_id":null,"favorite_count":1,"id":256811615171792896,"text":"Amando las matemáticas! #eduinnova @ensenachile  http://t.co/GKziME7N","place":null,"contributors":null,"lang":"es","favorited":false}}
FeanDoe asked Mar 11 '15 14:03




2 Answers

string = string.replace("\u0000", ""); // removes NUL chars
string = string.replace("\\u0000", ""); // removes backslash+u0000

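Unicode \u escapes are translated at the Java source level, before the compiler tokenizes the code. So "\u0000" in source is the NUL character itself, while "\\u0000" is a backslash followed by the text u0000. For instance, the keyword class can be written as:

public \u0063lass C {

Also, you do not need a regex here: replace takes a literal string.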
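
To make the difference between the two literals concrete, here is a minimal sketch. The class and method names (NulEscapeDemo, stripNulChars, stripEscapedNul) are made up for illustration and are not part of the Twitter API or any library.

```java
public class NulEscapeDemo {
    // Removes real NUL characters (code point 0) from the input.
    static String stripNulChars(String s) {
        return s.replace("\u0000", "");
    }

    // Removes the six-character text: a backslash, the letter u, then 0000.
    static String stripEscapedNul(String s) {
        return s.replace("\\u0000", "");
    }

    public static void main(String[] args) {
        String withNulChar = "a\u0000b";     // 3 chars: a, NUL, b
        String withEscapeText = "a\\u0000b"; // 8 chars: a, backslash, u, 0, 0, 0, 0, b

        System.out.println(withNulChar.length());    // prints 3
        System.out.println(withEscapeText.length()); // prints 8

        System.out.println(stripNulChars(withNulChar));      // prints ab
        System.out.println(stripEscapedNul(withEscapeText)); // prints ab

        // Using the wrong variant leaves the escaped text untouched.
        System.out.println(stripNulChars(withEscapeText).length()); // prints 8
    }
}
```

The raw JSON in the question appears to contain the escaped text rather than real NUL characters, which would explain why the second variant is the one that applies there.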

Joop Eggen answered Oct 03 '22 10:10


The first argument to replaceAll is a regular expression, and the Java regex engine understands \uNNNN escapes, so

json.replaceAll("\\u0000", "")

will search for the regular expression \u0000, which matches instances of the Unicode NUL character (U+0000), not instances of the literal six-character text \u0000. If you want to match the literal text \u0000, you need the regular expression \\u0000, which in turn means the Java string literal "\\\\u0000":

json.replaceAll("\\\\u0000", "")

Or, more simply, use replace (whose first argument is a literal string rather than a regex) instead of replaceAll:

json.replace("\\u0000", "")
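
The three calls above can be compared side by side. This is only a sketch: the class and method names (ReplaceAllDemo, regexNul, regexEscapedText, literalEscapedText) are hypothetical, and the short sample input stands in for the real JSON.

```java
public class ReplaceAllDemo {
    // Regex made of backslash-u-0000: the regex engine reads it as the NUL character.
    static String regexNul(String s) {
        return s.replaceAll("\\u0000", "");
    }

    // Regex with an escaped backslash: matches the literal text backslash-u-0000.
    static String regexEscapedText(String s) {
        return s.replaceAll("\\\\u0000", "");
    }

    // No regex at all: replace treats its first argument as a plain string.
    static String literalEscapedText(String s) {
        return s.replace("\\u0000", "");
    }

    public static void main(String[] args) {
        // Stand-in for the raw JSON: contains the escaped text, not a real NUL.
        String json = "matem\\u0000\\u0000Ense";

        System.out.println(regexNul(json).equals(json));  // prints true (nothing removed)
        System.out.println(regexEscapedText(json));       // prints matemEnse
        System.out.println(literalEscapedText(json));     // prints matemEnse
    }
}
```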
Ian Roberts answered Oct 03 '22 11:10