Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using a Repo in an Ecto migration

I've got an Ecto migration where I'd like to modify some columns but also migrate some data. For example:

import Ecto.Query

defmodule MyApp.Repo.Migrations.AddStatus do
  alter table(:foo) do
    add(:status, :text)
  end

  foos = from(f in MyApp.Foo, where: ...)
         |> MyApp.Repo.all

  Enum.each(foos, fn(foo) ->
    # There's then some complex logic here to work 
    # out how to set the status based on other attributes of `foo`
  end)
end

Now, the problem here is that by calling MyApp.Repo.all the migration essentially uses a separate database connection to the one which the alter table... statement used (EDIT: This assumption is false, see the accepted answer). Consequently, no status column so the whole migration blows up! Note, we're using a postgres database so DDL statements are transactional.

I could do this as two separate migrations or a mix task to set the data, leaving the migration only for the schema change, but would prefer not to in order to ensure data consistency.

Any thoughts on how to use the same database connection for MyApp.Repo queries in this manner?

EDIT: Note, I'm working on a small set of data and downtime is acceptable in my use case. See José's response below for some good advice when this is not the case.

like image 243
seddy Avatar asked Jan 22 '18 12:01

seddy


People also ask

What is an ecto migration?

Migration behaviour (Ecto SQL v3. 8.3) Migrations are used to modify your database schema over time. This module provides many helpers for migrating the database, allowing developers to use Elixir to alter their storage in a way that is database independent.

What is ecto elixir?

Ecto is an official Elixir project providing a database wrapper and integrated query language. With Ecto we're able to create migrations, define schemas, insert and update records, and query them. Changesets. In order to insert, update or delete data from the database, Ecto. Repo.


2 Answers

You can execute the current pending changes in a migration by calling Ecto.Migration.flush/0. Any code after that will have the status field available.

defmodule MyApp.Repo.Migrations.AddStatus do
  alter table(:foo) do
    add(:status, :text)
  end

  flush()

  foos = from(f in MyApp.Foo, where: ...)
         |> MyApp.Repo.all

  ...
end
like image 80
Dogbert Avatar answered Sep 29 '22 11:09

Dogbert


Generally speaking, doing data migration and changing the DDL at the same time is a bad practice. If this is a live system and the migration takes long, you may generate a lot of contention for a long period of time.

If your application is still processing requests, new entries can be added while the data is processed, and those won't be processed!

There are different ways to tackle this, depending on the use case, but they usually require a step by step approach. For example, if you are adding a new column:

  • the first step is to introduce the new column in the database and make sure that the new column is populated when new entries are created. This step is only to guarantee that all future entries will be correctly populated.

  • then a second step is to migrate the old data

  • finally, you can put the new data live as you can assume all of the entries are properly populated

like image 36
José Valim Avatar answered Sep 29 '22 11:09

José Valim