
Create a pandas DataFrame from multiple dicts [duplicate]

Tags:

pandas

I'm new to pandas and this is my first question on Stack Overflow. I'm trying to do some analytics with pandas.

I have some text files containing data records that I want to process. Each line of a file is one record whose fields sit at fixed positions and have a fixed length in characters. A single file contains several kinds of records, and all of them share the first field: two characters that identify the record type. For example:

Some file:

```
01Jhon      Smith     555-1234
03Cow            Bos primigenius taurus        00401
01Jannette  Jhonson           00100000000
...
```

```
field     start  length
type          1       2   *common to all records, e.g. 01 = person, 03 = animal
name          3      10
surname      13      10
phone        23       8
credit       31      11
(the rest of the line is filled with spaces)
```
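The table gives 1-based start columns plus lengths. As a small worked example, here is a sketch (the names `PERSON_LAYOUT` and `to_colspecs` are hypothetical) that turns those rows into 0-based, half-open `(start, end)` spans, which is the form both Python string slicing and the `colspecs` argument of `pandas.read_fwf` use:

```python
# Person-record layout copied from the table above:
# (field name, 1-based start column, length in characters).
PERSON_LAYOUT = [
    ("type", 1, 2),
    ("name", 3, 10),
    ("surname", 13, 10),
    ("phone", 23, 8),
    ("credit", 31, 11),
]


def to_colspecs(layout):
    """Convert 1-based (start, length) rows into 0-based, half-open
    (start, end) spans suitable for slicing: line[start:end]."""
    return [(start - 1, start - 1 + length) for _, start, length in layout]


print(to_colspecs(PERSON_LAYOUT))
# [(0, 2), (2, 12), (12, 22), (22, 30), (30, 41)]
```

The spans abut each other (each `end` equals the next `start`), which is a quick sanity check that the table has no gaps or overlaps.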

I'm writing some code to convert each record into a dictionary:

```python
person1 = {'type': 1, 'name': 'Jhon', 'surname': 'Smith', 'phone': '555-1234'}
person2 = {'type': 1, 'name': 'Jannette', 'surname': 'Jhonson', 'credit': 1000000.00}
animal1 = {'type': 3, 'cname': 'cow', 'sciname': 'Bos....', 'legs': 4, 'tails': 1}
```

If a field is empty (filled with spaces), it is left out of the dictionary.
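A minimal sketch of such a converter for person (type 01) records, assuming the column positions from the table above. `PERSON_FIELDS`, `CONVERTERS`, and `parse_record` are hypothetical names, and the credit conversion is an assumption: it treats the 11-digit field as having two implied decimal places, which is what turning `00100000000` into `1000000.00` in the dicts above suggests.

```python
# Field positions for a person (type 01) record as 0-based, half-open
# (start, end) slices, derived from the 1-based table in the question.
PERSON_FIELDS = [
    ("type", 0, 2),
    ("name", 2, 12),
    ("surname", 12, 22),
    ("phone", 22, 30),
    ("credit", 30, 41),
]

# Assumed conversions: the type code as an int, and credit as a number with
# two implied decimal places. Fields not listed here stay strings.
CONVERTERS = {
    "type": int,
    "credit": lambda s: int(s) / 100,
}


def parse_record(line, fields=PERSON_FIELDS, converters=CONVERTERS):
    """Slice one fixed-width line into a dict, leaving out blank fields."""
    record = {}
    for name, start, end in fields:
        raw = line[start:end].strip()
        if not raw:  # all spaces, or past the end of a short line
            continue
        convert = converters.get(name, str)
        record[name] = convert(raw)
    return record


print(parse_record("01Jhon      Smith     555-1234"))
# {'type': 1, 'name': 'Jhon', 'surname': 'Smith', 'phone': '555-1234'}
```

Because slicing past the end of a string just returns `''`, a line that stops after the phone field simply produces no `credit` key.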

For each kind of record I want to create a pandas DataFrame that uses the dict keys as column names. I've tried pandas.DataFrame.from_dict() without success.

So here is my question: is there a way to do this with pandas so that the dict keys become column names? And is there a standard method for dealing with this kind of file?

Asked Jul 19 '13 by tinproject


1 Answer

To make a DataFrame from dictionaries, you can pass a list of dictionaries:

```python
>>> import pandas as pd
>>> person1 = {'type': 1, 'name': 'Jhon', 'surname': 'Smith', 'phone': '555-1234'}
>>> person2 = {'type': 1, 'name': 'Jannette', 'surname': 'Jhonson', 'credit': 1000000.00}
>>> animal1 = {'type': 3, 'cname': 'cow', 'sciname': 'Bos....', 'legs': 4, 'tails': 1}
>>> pd.DataFrame([person1])
   name     phone surname  type
0  Jhon  555-1234   Smith     1
>>> pd.DataFrame([person1, person2])
    credit      name     phone  surname  type
0      NaN      Jhon  555-1234    Smith     1
1  1000000  Jannette       NaN  Jhonson     1
>>> pd.DataFrame.from_dict([person1, person2])
    credit      name     phone  surname  type
0      NaN      Jhon  555-1234    Smith     1
1  1000000  Jannette       NaN  Jhonson     1
```

(This output comes from an older pandas release, which sorted the columns alphabetically; recent versions keep the columns in the order in which the keys first appear.)
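The question doesn't say how `from_dict()` failed, so this is only a guess at one likely cause. With its default `orient="columns"`, `DataFrame.from_dict` expects a mapping of column name to values, so a single flat dict of scalars such as `person1` raises a `ValueError`. If the records are instead keyed by some ID (the labels `"person1"` and `"person2"` below are hypothetical), `orient="index"` turns each inner dict into a row:

```python
import pandas as pd

person1 = {"type": 1, "name": "Jhon", "surname": "Smith", "phone": "555-1234"}
person2 = {"type": 1, "name": "Jannette", "surname": "Jhonson", "credit": 1000000.00}

# Default orient="columns" treats the dict as {column: values}; a flat dict
# of scalars has no row index to go with it, so pandas raises ValueError.
try:
    pd.DataFrame.from_dict(person1)
except ValueError as exc:
    print("from_dict(person1) failed:", exc)

# orient="index": outer keys (hypothetical record IDs) become row labels,
# inner dict keys become column names; missing fields are filled with NaN.
records = {"person1": person1, "person2": person2}
df = pd.DataFrame.from_dict(records, orient="index")
print(df)
```

Passing a plain list of dicts, as above, is usually simpler; the keyed form is mainly useful when each record already has a natural identifier to use as the row label.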

For the more fundamental issue of two differently formatted record types intermixed in one file, and assuming the files aren't so big that we can't read them into memory, I'd use StringIO to build an object that behaves like a file but contains only the lines we want, and then pass it to read_fwf (read fixed-width file). For example:

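```python
from io import StringIO  # Python 3; on Python 2 this was: from StringIO import StringIO


def get_filelike_object(filename, line_prefix):
    """Return an in-memory file-like object holding only the lines of
    `filename` that start with `line_prefix` (e.g. "01")."""
    s = StringIO()
    with open(filename, "r") as fp:
        for line in fp:
            if line.startswith(line_prefix):
                s.write(line)
    s.seek(0)  # rewind so the reader starts at the first kept line
    return s
```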

and then

```python
>>> type01 = get_filelike_object("animal.dat", "01")
>>> df = pd.read_fwf(type01, names="type name surname phone credit".split(),
...                  widths=[2, 10, 10, 8, 11], header=None)
>>> df
   type      name  surname     phone     credit
0     1      Jhon    Smith  555-1234        NaN
1     1  Jannette  Jhonson       NaN  100000000
```

should work. Of course, you could also split the records by type before pandas ever sees them, which might be easiest of all.
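One way to sketch that "split first" approach, assuming the whole file fits in memory: read it once, bucket the lines by their two-character prefix, and hand each bucket to `pandas.read_fwf` with explicit `colspecs`. Unlike calling `get_filelike_object` once per record type, this reads the file only once. `split_by_type`, `read_group`, and the `PERSON_*` constants are hypothetical names, and `dtype=str` is a choice made here so that codes like `01` and zero-padded numbers are not turned into integers.

```python
import io
from collections import defaultdict

import pandas as pd

# Assumed layout for person (type 01) records, as 0-based half-open spans.
PERSON_COLSPECS = [(0, 2), (2, 12), (12, 22), (22, 30), (30, 41)]
PERSON_NAMES = ["type", "name", "surname", "phone", "credit"]


def split_by_type(lines):
    """Group raw lines by their two-character record-type prefix, in one pass."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():  # skip blank lines
            groups[line[:2]].append(line)
    return dict(groups)


def read_group(group_lines, colspecs, names):
    """Parse one group of same-type lines with read_fwf.

    dtype=str keeps codes such as "01" and the zero-padded credit as text;
    blank fields still come back as NaN.
    """
    buf = io.StringIO("\n".join(group_lines) + "\n")
    return pd.read_fwf(buf, colspecs=colspecs, names=names, header=None, dtype=str)


sample = [
    "01Jhon      Smith     555-1234\n",
    "03Cow            Bos primigenius taurus        00401\n",
    "01Jannette  Jhonson           00100000000\n",
]

groups = split_by_type(sample)
people = read_group(groups["01"], PERSON_COLSPECS, PERSON_NAMES)
print(sorted(groups))  # ['01', '03']
print(people)
```

With a real file, pass the open file object instead of `sample` (for example `with open("animal.dat") as fp: groups = split_by_type(fp)`), then call `read_group` once per record type with that type's own layout.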

Answered Sep 28 '22 by DSM