I am trying to insert data from a Pandas DataFrame into an existing Django model, Agency, that uses a SQLite backend. However, following the answers on How to write a Pandas Dataframe to Django model and Saving a Pandas DataFrame to a Django Model leads to the whole SQLite table being replaced, which breaks the Django code. Specifically, the Django auto-generated id primary key column is replaced by index, which causes errors when rendering templates (no such column: agency.id).
Here is the code and the result of using Pandas to_sql on the SQLite table, agency.
In models.py:
from django.db import models

class Agency(models.Model):
    name = models.CharField(max_length=128)
In myapp/management/commands/populate.py:
import pandas as pd
from django.core.management.base import BaseCommand
from sqlalchemy import create_engine

class Command(BaseCommand):
    def handle(self, *args, **options):
        # Open a SQLAlchemy connection to the Django database
        from django.conf import settings
        database_name = settings.DATABASES['default']['NAME']
        database_url = 'sqlite:///{}'.format(database_name)
        engine = create_engine(database_url, echo=False)
        # Insert data
        agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})
        agencies.to_sql("agency", con=engine, if_exists="replace")
Calling python manage.py populate successfully adds the three agencies to the table:
index name
0 Agency 1
1 Agency 2
2 Agency 3
However, doing so has changed the DDL of the table from:
CREATE TABLE "agency" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "name" varchar(128) NOT NULL)
to:
CREATE TABLE agency (
"index" BIGINT,
name TEXT
);
CREATE INDEX ix_agency_index ON agency ("index")
How can I add the DataFrame to the model managed by Django and keep the Django ORM intact?
Pandas can be used in the following Django scenarios: visualizing tabular data to ensure ORM queries are correct; gaining speed improvements on reporting dashboards; answering stakeholders' queries quickly and effortlessly.
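As a sketch of the first scenario, an ORM queryset can be loaded into a DataFrame by passing Agency.objects.values() to the DataFrame constructor. The example below substitutes a plain list of dicts for the queryset result, since the shape is identical and this keeps the snippet runnable outside a Django project (the id/name fields are illustrative):

```python
import pandas as pd

# Agency.objects.values() returns dict-like rows such as
# [{'id': 1, 'name': 'Agency 1'}, ...]; a plain list of dicts
# stands in for the queryset here so the example is self-contained.
rows = [
    {"id": 1, "name": "Agency 1"},
    {"id": 2, "name": "Agency 2"},
]
df = pd.DataFrame(rows)
print(df)

# Inside a real project this would be:
# df = pd.DataFrame(Agency.objects.values())
```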
To answer your question, with the migrations framework introduced in Django 1.7, in order to add a new field to a model you can simply add that field to your model, create a migration with ./manage.py makemigrations, and then run ./manage.py migrate; the new field will be added to your DB.
To answer my own question, as I import data using Pandas into Django quite often nowadays: the mistake I was making was using Pandas' built-in SQLAlchemy support (to_sql), which was replacing the underlying database table definition. In the context above, you can instead use the Django ORM to connect and insert the data:
import pandas as pd
from django.core.management.base import BaseCommand

from myapp.models import Agency

class Command(BaseCommand):
    def handle(self, *args, **options):
        # Process data with Pandas
        agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})
        # Iterate over the DataFrame and create your objects
        for row in agencies.itertuples():
            Agency.objects.create(name=row.name)
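For reference, itertuples yields one named tuple per row, with attributes matching the DataFrame columns (plus an Index field), which is why attribute access like row.name works in the loop above:

```python
import pandas as pd

agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})

# Each row is a named tuple, e.g. Pandas(Index=0, name='Agency 1'),
# so column values are available as attributes.
names = [row.name for row in agencies.itertuples()]
print(names)
```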
However, you may often want to import data using an external script rather than a management command, as above, or Django's shell. In this case you must first set up the Django ORM by calling django.setup():
import os, sys
import django
import pandas as pd
sys.path.append('../..') # add path to project root dir
os.environ["DJANGO_SETTINGS_MODULE"] = "myproject.settings"
# for more sophisticated setups, if you need to change connection settings (e.g. when using django-environ):
#os.environ["DATABASE_URL"] = "postgres://myuser:mypassword@localhost:54324/mydb"
# Connect to Django ORM
django.setup()
# process data
from myapp.models import Agency
Agency.objects.create(name='MyAgency')
Here I have exported my settings module, myproject.settings, to the DJANGO_SETTINGS_MODULE environment variable so that django.setup() can pick up the project settings.
Depending on where you run the script from, you may need to add the project root to the system path so Django can find the settings module. In this case, I run my script two directories below my project root.
You can modify any settings before calling django.setup(), which is useful if your script needs to connect to the DB differently than what is configured in settings, for example when running a script locally against Django/Postgres Docker containers. Note that the commented-out DATABASE_URL line in the example above assumes django-environ is used to specify the DB settings.
For those looking for a more performant and up-to-date solution, I would suggest using Manager.bulk_create: instantiate the Django model instances, but do not save them individually.
model_instances = [Agency(name=agency.name) for agency in agencies.itertuples()]
Agency.objects.bulk_create(model_instances)
Note that bulk_create does not run signals or custom save() methods, so if you have custom saving logic or signal hooks for the Agency model, they will not be triggered. The full list of caveats is in the documentation.
Documentation: https://docs.djangoproject.com/en/3.0/ref/models/querysets/#bulk-create
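For large DataFrames, bulk_create also accepts a batch_size argument that splits the insert into several smaller queries, which matters on SQLite where the number of variables per statement is limited. Conceptually, batch_size chunks the instance list; the sketch below shows that chunking in plain Python, with the actual Django call left as a comment since it needs a configured project:

```python
# Sketch of how a list of instances is split into batches;
# Django does this internally when batch_size is passed to bulk_create.
def chunked(items, batch_size):
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

names = ["Agency %d" % i for i in range(1, 6)]
batches = list(chunked(names, 2))
print(batches)

# With a configured Django project this would be:
# model_instances = [Agency(name=n) for n in names]
# Agency.objects.bulk_create(model_instances, batch_size=500)
```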