Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Segmentation fault occur while reading content from file in object C++

In my code first I stored the name and mobile number in one object after that I write that object into one text file using fstream.write() method. It successfully works, but when I read that written content into another object and call the display method then it displays data correctly, but it gives me segmentation fault after printing data. Here is my code -

#include<iostream>
#include<fstream>
using namespace std;
class Telephone
{
private:
    string name="a";
    int phno=123;
public:
    void getTelephoneData()
    {
        cout<<"Enter Name:";
        cin>>name;
        cout<<"Enter Phone Number:";
        cin>>phno;
    }
    void displayData()
    {
        cout<<"Name\t\tPhone no"<<endl;
        cout<<name<<"\t\t"<<phno<<endl;
    }

    void getData() {
        Telephone temp;
        ifstream ifs("Sample.txt",ios::in|ios::binary);
        ifs.read((char*)&temp,sizeof(temp));
        temp.displayData();
    }   
};
int main()
{
    Telephone t1;
    t1.getTelephoneData();
    cout<<"----Writing Data to file------"<<endl;
    ofstream ofs("Sample.txt",ios::out|ios::binary);
    ofs.write((char*)&t1,sizeof(t1));
    ofs.close();
    t1.getData();
}

Please help me where I'm wrong. Thanks in advance...!

like image 428
Rohit Suthar Avatar asked Jan 02 '23 18:01

Rohit Suthar


2 Answers

So, before I give you a solution, let's briefly talk about what is going on here:

ofs.write((char*)&t1,sizeof(t1));

What you are doing is casting t1 to a pointer to char, and saying 'write to ofs the memory representation of t1, as is'. So we have to ask ourselves: what is this memory representation of t1?

  1. You are storing a (implementation defined, most probably 4 byte) integer
  2. You are also storing a complex std::string object.

Writing the 4 byte integer might be OK. It is definitely not portable (big-endian vs little endian), and you might end up with the wrong int if the file is read on a platform with different endianness.

Writing the std::string is definitely not OK. Strings are complex object, and they most often allocate storage on the heap (although there is such a thing as small string optimization). What this means is that you're going to serialize a pointer to a dynamically allocated object. This will never work, as reading the pointer back would point to some location in memory that you have absolutely no control of. This is a great example of undefined behavior. Anything goes, and anything might happen with your program, including 'appearing to work correct' despite deeply seated problems. In your specific example, because the Telephone object that was created is still in memory, what you get is 2 pointers to the same dynamically allocated memory. When your temp object goes out of scope, it deletes that memory.

When you return to your main function, when t1 goes out of scope, it tries to delete the same memory again.

Serializing any kind of pointers is a big no-no. If your object internals consist of pointers, you need to make a custom solution of how those pointers will be stored in your stream, and later read to construct a new object. A common solution is to store them 'as if' they were stored by value, and later, when reading the object from storage, allocate memory dynamically and put the contents of the object in the same memory. This will obviously not work if you are trying to serialize the case where multiple objects point to the same address in memory: if you try to apply this solution, you would end up with multiple copies of your original object.

Fortunately, for the case of a std::string this problem is easily solved, as strings have overloaded operator<< and operator>>, and you don't need to implement anything to make them work.

edit: Just using operator<< and operator>> won't work for std::string, explained a bit later why.

How to make it work:

There are many possible solutions, and I'm going to share one here. The basic idea is that you should serialize every member of your Telephone structure individually, and rely on the fact that every member know how to serialize itself. I am going to ignore the problem of cross-endianness compatibility, to make the answer a bit briefer, but if you care about cross platform compatibility, you should think about it.

My basic approach is to override operator<< and operator>> for the class telephone.

I declare two free functions, that are friends of the Telephone class. This would allow them to poke at the internals of different telephone objects to serialize their members.

class Telephone { 
   friend ostream& operator<<(ostream& os, const Telephone& telephone);
   friend istream& operator>>(istream& is, Telephone& telephone);
   // ... 
};

edit: I initially had the code for serializing the strings wrong, so my comment that it's fairly straightforward is plain-out wrong

The code for implementing the functions has a surprising twist. Because operator>> for strings stops reading from the stream when encountering a whitespace, having a name that is not a single word, or with special characters would not work, and put the stream in a state of error, failing to read the phone number. To go around the problem, i followed the example by @Michael Veksler and stored the length of the string explicitly. My implementation looks as follows:

ostream& operator<<(ostream& os, const Telephone& telephone)
{
    const size_t nameSize = telephone.name.size();
    os << nameSize;
    os.write(telephone.name.data(), nameSize);
    os << telephone.phno;
    return os;
}

istream& operator>>(istream& is, Telephone& telephone)
{
    size_t nameSize = 0;
    is >> nameSize;
    telephone.name.resize(nameSize);
    is.read(&telephone.name[0], nameSize);
    is >> telephone.phno;
    return is;
}

Please note, that you must make sure that the data you write matches the data you're going to later try to read. If you store a different amount of information, or the arguments are in the wrong order, you will not end up with a valid object. If you later do any kind of modifications to the Telephone class, by adding new fields that you would want saved, you'll need to modify both functions.

To support names with spaces in them, the way you read the names from cin should be modified as well. One way would be to use std::getline(std::cin, name); instead of cin >> name

Finally, how you should serialize and deserialize from those streams: Don't use the ostream::write() and istream::read() functions - use instead the operator<< and operator>> that we have overriden.

void getData() {
    Telephone temp;
    ifstream ifs("Sample.txt",ios::in|ios::binary);
    ifs >> temp;
    temp.displayData();
} 

void storeData(const Telephone& telephone) {
    ofstream ofs("Sample.txt",ios::out|ios::binary);
    ofs << telephone;
}
like image 194
divinas Avatar answered Jan 04 '23 16:01

divinas


problem

You can't simply dump std::string objects into a file. Notice that std::string is defined as

std::basic_string<char, std::char_traits<char>, std::allocator<char>>

When std::string can't avoid it, it uses std::allocator<char> to allocate heap memory for the string. By writing your Telephone object with ofs.write((char*)&t1,sizeof(t1)), you are also writing the std::string it contains as a collection of bits. Some of these std::string bits can be pointers obtained from std::allocator. These pointers point to heap memory which contains the characters of the string.

By calling ofs.write() the program writes the pointers, but not the characters. Then when the string is read with ifs.read(), it has pointers to unallocated heap, that does not contain the characters. Even if it did point to a valid heap, by some miracle, it would still not contain the characters it should have. Sometimes you may be lucky, and the program will not crash because the strings were short enough to avoid heap allocation, but it is completely unreliable.

solution

You must write your own serialization code for this class, rather than rely on ofs.write(). There are several ways to do it. First of all, you can use boost serialization. You can simply follow the examples in the linked tutorial and make serialization work for you.

Another option is to do everything from scratch yourself. Of course, it is better to use existing code (like boost), but it could be a good learning experience to implement it yourself. By implementing yourself, you can better understand what boost might do under the hood:

void writeData(std::ostream & out) const {
    unsigned size = name.size();
    out.write((char*)&size, sizeof(size));
    out.write(name.data(), size);

    out.write((char*)&phno, sizeof(phno));
}   

Then in getData read it in the same order. Of course, you have to allocate the string dynamically to the correct size, and then fill it with ifs.read().

Unlike with operator<< for strings, this technique works well for any type of string. It will work well with strings that contain any character, including spaces and null characters (\0). The operator>> technique won't work with strings that have spaces, such as first-name last-name combinations, since it stops at white-spaces.


Note that there are specialized allocators that make serialization/deserialization trivial. Such an allocator can allocate the string from inside a pre-allocated buffer, and use fancy pointers (such as boost::interprocess::offset_ptr) into this buffer. It is then possible to simply dump the whole buffer and reread it later with no hassle. For some reason, this approach is not commonly used.


Critical:

Security is an issue. If the data is not under your control, then it can be used to hack your system. In my serialization example, the worse that can be is to run out of memory. Running out of memory could be an attack vector for denial of service, or worse. Maybe you should limit the max-size of the string, and manage the error.

Another thing to consider is interoperability across systems. Not all systems represent int or long the same way. For example, on 64-bit linux, long is 8 bytes, while on MS-Windows it is 4 bytes. The simplest solution is to use out<<size<<' ' to write the size, but make sure to use C locale, otherwise four digit lengths can have commas or dots in them, which will spoil the parsing.

like image 29
Michael Veksler Avatar answered Jan 04 '23 17:01

Michael Veksler