I've got an Ecto migration where I'd like to modify some columns but also migrate some data. For example:
defmodule MyApp.Repo.Migrations.AddStatus do
  use Ecto.Migration
  import Ecto.Query

  def change do
    alter table(:foo) do
      add(:status, :text)
    end

    foos =
      from(f in MyApp.Foo, where: ...)
      |> MyApp.Repo.all()

    Enum.each(foos, fn foo ->
      # There's then some complex logic here to work
      # out how to set the status based on other attributes of `foo`
    end)
  end
end
Now, the problem here is that by calling MyApp.Repo.all
the migration essentially uses a separate database connection to the one which the alter table...
statement used (EDIT: This assumption is false, see the accepted answer). Consequently, the status
column doesn't exist yet when the query runs, so the whole migration blows up! Note, we're using a postgres
database so DDL statements are transactional.
I could do this as two separate migrations or a mix
task to set the data, leaving the migration only for the schema change, but would prefer not to in order to ensure data consistency.
Any thoughts on how to use the same database connection for MyApp.Repo
queries in this manner?
EDIT: Note, I'm working on a small set of data and downtime is acceptable in my use case. See José's response below for some good advice when this is not the case.
You can execute the current pending changes in a migration by calling Ecto.Migration.flush/0. Any code after that call will have the status column available.
defmodule MyApp.Repo.Migrations.AddStatus do
  use Ecto.Migration
  import Ecto.Query

  def change do
    alter table(:foo) do
      add(:status, :text)
    end

    flush()

    foos =
      from(f in MyApp.Foo, where: ...)
      |> MyApp.Repo.all()

    ...
  end
end
Generally speaking, doing data migration and changing the DDL at the same time is a bad practice. If this is a live system and the migration takes long, you may generate a lot of contention for a long period of time.
If your application is still processing requests, new entries can be added while the data is processed, and those won't be processed!
There are different ways to tackle this, depending on the use case, but they usually require a step-by-step approach. For example, if you are adding a new column:
the first step is to introduce the new column in the database and make sure that the new column is populated when new entries are created. This step is only to guarantee that all future entries will be correctly populated.
then a second step is to migrate the old data
finally, you can put the new data live as you can assume all of the entries are properly populated
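The backfill step above can be sketched as its own data-only migration that updates rows in small batches. This is only a sketch, not the answer's prescribed code: the table name (`foo`), the placeholder value (`'legacy'`), and the batch size of 1000 are assumptions, and the raw UPDATE stands in for whatever real status logic applies.

```elixir
defmodule MyApp.Repo.Migrations.BackfillFooStatus do
  use Ecto.Migration

  # Run outside the automatic migration transaction so each batch
  # commits on its own and locks are held only briefly.
  @disable_ddl_transaction true
  @disable_migration_lock true

  # Hypothetical backfill: pick up to 1000 unpopulated rows per pass.
  @batch_sql """
  UPDATE foo SET status = 'legacy'
  WHERE id IN (SELECT id FROM foo WHERE status IS NULL LIMIT 1000)
  """

  def up do
    backfill()
  end

  def down, do: :ok

  defp backfill do
    # repo/0 is provided by Ecto.Migration; query!/1 comes from the
    # SQL adapter and returns the number of rows touched.
    %{num_rows: n} = repo().query!(@batch_sql)
    if n > 0, do: backfill()
  end
end
```

Because the DDL transaction is disabled, each batch commits independently: a failure partway through leaves the already-processed rows populated rather than rolling everything back, and the migration can simply be re-run.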