I know C and am getting into Java, and I am confused about its approach to arrays and strings. It is totally different from arrays and strings in C. Please help me understand the actual differences between C and Java (for strings and arrays).
An array is a data structure consisting of a collection of elements (values or variables), each identified by at least one array index or key. Depending on the language, array types may overlap (or be identified with) other data types that describe aggregates of values, such as lists and strings.
An array is a collection of similar types of data. For example, if we want to store the names of 100 people, we can create an array of the String type that can store 100 names:
String[] array = new String[100];
This array cannot store more than 100 names.
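Some languages, such as PHP, distinguish three kinds of arrays: indexed arrays, multidimensional arrays, and associative arrays. Of these, C and Java provide only indexed arrays (which can be nested to form multidimensional ones) as a language feature.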
An array is a linear data structure that collects elements of the same data type and stores them in contiguous and adjacent memory locations. Arrays work on an index system starting from 0 to (n-1), where n is the size of the array.
Arrays in C are simply syntactic sugar to access contiguous memory spaces, or - vulgarizing it shamelessly here - a variant of a pointer notation. To avoid allocating big chunks of contiguous memory and avoid having to reallocate your memory yourself manipulating data of variable size, you then resort to implementations of common Computer Science Data Structure concepts (for instance, a linked list, which uses a pointer to indicate the memory address of the next element in a series).
You can substitute pointer arithmetic with array notations in C, and vice versa.
The following prints the 2 elements of an array, each accessed with both notations:
#include <stdio.h>
int main(int ac, char **av) {
    char arr[2] = {'a', 'b'};
    printf("0:%c 0:%c 1:%c 1:%c\n", arr[0], *arr, arr[1], *(arr + 1));
    return (0);
}
The same works with int variables. Notice that no modification is needed to accommodate the size of an integer, because pointer arithmetic is scaled by the size of the pointed-to type: arr + 1 points to the next int, not to the next byte.
#include <stdio.h>
int main(int ac, char **av) {
    int arr[2] = {42, -42};
    /* arr + 1 already advances by sizeof(int) bytes */
    printf("0:%d 0:%d 1:%d 1:%d\n", arr[0], *arr, arr[1], *(arr + 1));
    return (0);
}
(To obtain the size in bytes of a given data type, use sizeof.)
Here I assume you want to know about the conventional C-string implementation, and not one provided by a 3rd-party library.
Strings in C are simply arrays of characters. The main reason is practical: since you often need to manipulate strings and print them to a stream, a contiguous memory space is a natural and easy implementation. However, as you need to know where that memory space ends so you do not inadvertently access something forbidden, C relies on the concept of a "NUL-terminated string": a string of N characters is actually an array of N + 1 characters ending with a trailing '\0' character, which is the de facto marker to look for when you want to reach the end of a string.
A straightforward declaration would be:
char *test = "my test";
which points to the same sequence of characters as is stored in:
char test[8] = { 'm', 'y', ' ', 't', 'e', 's', 't', '\0' };
(Notice the trailing '\0')
However, you have to realize that the two declarations do not behave the same way. With the pointer declaration, "my test" is a string literal in static storage, and that is the memory you are directly pointing to. Modifying a string literal is undefined behavior, so you will encounter issues when trying to change it (the array declaration, by contrast, holds its own modifiable copy).
For instance, this would blow up in your face (following the pointer declaration):
test[4] = 'H'; /* expect a violent complaint here */
So, to have a dynamically allocated string you can actually modify, you can declare it simply as:
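#include <stdio.h>
#include <stdlib.h> /* free() */
#include <string.h> /* strdup() */
int main(int ac, char **av) {
    char *test = strdup("my test"); /* heap-allocated, modifiable copy */
    if (test == NULL)
        return (1);
    printf("%s\n", test);
    free(test); /* memory obtained through strdup() must be released */
    return (0);
}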
Here strdup() is a POSIX function (part of the C standard library since C23) that allocates memory for your string and copies the characters into it. Or you can allocate memory yourself with malloc() and copy the characters manually or with a function like strcpy().
This particular string is thus mutable, and you are free to modify its content (in the end it is just a dynamically allocated array of characters, allocated with malloc()).
If you need to change the length of this string (add characters to it or remove them), you will need to be wary of the allocated memory every time. For instance, calling strcat() without first reallocating enough additional memory writes past the end of the buffer, which is undefined behavior. Some functions, however, will take care of the allocation for you.
C strings do NOT support Unicode by default. You need to manage code points yourself, or consider using a 3rd-party library.
Arrays in Java are very close to their C parent (to the point that we even have a method for efficient array-to-array-copy support using a bare-bone native implementation: System.arraycopy()). They represent contiguous memory spaces.
However, they wrap these bare-bone arrays within an object (which keeps track of the size/length of the array for you).
Java arrays can have their content modified, but like their C counterpart, you will need to allocate more memory when trying to expand them (except that you do it indirectly, and will usually allocate a complete new array and copy into it instead of doing a realloc() like in C).
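To illustrate the point about expanding a Java array, here is a minimal sketch. The class name ArrayGrowth and the helper append() are made up for this example; only System.arraycopy() and Arrays.toString() come from the standard library.

```java
import java.util.Arrays;

class ArrayGrowth {
    // Hypothetical helper: returns a NEW array holding the old elements plus `value`.
    // The original array is left untouched, since a Java array cannot be resized.
    static int[] append(int[] src, int value) {
        int[] bigger = new int[src.length + 1];          // allocate a complete new array
        System.arraycopy(src, 0, bigger, 0, src.length); // native array-to-array copy
        bigger[src.length] = value;
        return bigger;
    }

    public static void main(String[] args) {
        int[] arr = {42, -42};
        int[] grown = append(arr, 7);
        System.out.println(arr.length);             // 2: the array object knows its own length
        System.out.println(Arrays.toString(grown)); // [42, -42, 7]
    }
}
```

In everyday code you would more likely reach for Arrays.copyOf() or an ArrayList, which do this allocate-and-copy step for you.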
Strings in Java are immutable, meaning they cannot be changed once initialized, and operations on a String actually create new String instances. Look up StringBuilder and StringBuffer for efficient string manipulation on an existing instance, and be aware of their internal implementation details (especially pre-setting the capacity of your buffer to avoid frequent re-allocations).
For instance, the following code produces a third String instance out of someString and "another string":
String myNewStr = someString + "another string";
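A slightly fuller sketch of the same idea, next to the StringBuilder alternative. The class name StringDemo, the helper join(), and the initial capacity of 64 are arbitrary choices made for this illustration.

```java
class StringDemo {
    // Hypothetical helper: joins words with single spaces using one StringBuilder,
    // instead of creating a new String on every concatenation.
    static String join(String[] words) {
        StringBuilder sb = new StringBuilder(64); // pre-set capacity (a guess for this example)
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(words[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String someString = "my test";
        String myNewStr = someString + "another string"; // a new instance is created
        System.out.println(someString); // my test (the original is unchanged)
        System.out.println(myNewStr);   // my testanother string
        System.out.println(join(new String[] {"my", "test"})); // my test
    }
}
```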
In the underlying implementation, the Java String* classes also use an array of characters, like their C parent.
This implies that they use more memory than the bare-bone C implementation, as you have the overhead of your instance.
Not only that, they can use a lot more memory because the Java String class provides Unicode support by default: text is exposed as 16-bit UTF-16 code units, and a single code point may take one or two chars (handling this is not a trivial thing to do in C, in comparison).
On the other hand, notice that unless performance is a concern, you don't need to worry about threading, memory, or implementing functions that look for trailing '\0' characters.
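To make the Unicode remark concrete, here is a small sketch showing that the number of chars and the number of code points in a String can differ. The class name CodePointDemo and the helper codePoints() are invented for this example; length() and codePointCount() are standard String methods.

```java
class CodePointDemo {
    // Hypothetical helper: number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, written as its two UTF-16 surrogate chars.
        String s = "a\uD83D\uDE00";
        System.out.println(s.length());    // 3 chars (UTF-16 code units)
        System.out.println(codePoints(s)); // 2 code points
    }
}
```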
A lot more could be said and researched. Your question is fairly broad at the moment, but I'll be glad to edit if you add sub-questions in your comments.
Also, maybe this could help:
In C, a string is typically just an array of (or a pointer to) chars, terminated with a NUL (\0) character. You can process a string as you would process any array.
In Java, however, strings are not arrays. Java strings are instances (objects) of the java.lang.String class. They represent character data, but the internal implementation is not exposed to the programmer. You cannot treat them as arrays, although, if required, you can extract string data as an array of bytes or chars (methods getBytes and getChars). Note also that Java chars are always 16 bits, while chars in C are typically (not always) 8 bits.
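As a rough illustration of extracting string data as an array, the sketch below copies a String into a char array, changes one element, and builds a new String from it. The class name StringToArray and the helper replaceAt() are hypothetical. It uses toCharArray() rather than getChars() for brevity, and UTF-8 is just one possible choice of encoding for getBytes().

```java
import java.nio.charset.StandardCharsets;

class StringToArray {
    // Hypothetical helper: returns a new String with the char at `index` replaced.
    static String replaceAt(String s, int index, char c) {
        char[] chars = s.toCharArray(); // a copy; the String itself is not exposed
        chars[index] = c;
        return new String(chars);
    }

    public static void main(String[] args) {
        String test = "my test";
        String changed = replaceAt(test, 4, 'H');
        System.out.println(test);    // my test (unchanged)
        System.out.println(changed); // my tHst

        byte[] utf8 = test.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);    // 7 bytes, no trailing '\0'
        System.out.println(Character.SIZE); // 16 bits per Java char
    }
}
```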