I am new to dbt and am trying to build a Docker image in which I can run dbt commands directly. I have a file (envs.sh) that exports my environment variables; it looks like:
export DB_HOST="secret"
export DB_PWD="evenabiggersecret"
My packages.yml looks like:
packages:
  - package: fishtown-analytics/dbt_utils
    version: 0.6.2
My Dockerfile looks like:
FROM fishtownanalytics/dbt:0.19.0b1
# Define working directory
WORKDIR /usr/app/profile/
ENV DBT_DIR /usr/app
ENV DBT_PROFILES_DIR /usr/app
# Copy the dbt project into the image
COPY ./dbt ${DBT_DIR}
# Load env variables and install packages
COPY envs.sh envs.sh
RUN . ./envs.sh \
&& dbt deps # Exporting envs to avoid profile not found errors when install deps
However, when I run dbt run inside the Docker container I get the error:
'dbt_utils' is undefined. Running dbt deps manually fixes the issue, and dbt run then succeeds. Am I missing something in the way I install the dependencies during the build?
Update:
In other words, running dbt deps when building the Docker image seems to have no effect. So I have to run it manually (when I do docker run for example) before I can start doing my workflows. This issue does not happen when I use a Python image (not the image from fishtown-analytics)
The base image in the Dockerfile (fishtownanalytics/dbt:0.19.0b1) declares /usr/app as a VOLUME, so any changes made to that directory by later build steps are discarded (see the Dockerfile reference notes on VOLUME). Because the working directory is under /usr/app, the modules downloaded and installed by the RUN dbt deps command are thrown away rather than added to the final image. The Python image has no such VOLUME declaration, so it does not show the same problem.
To get around this you can change the working directory to something other than the declared volume name (e.g., /usr/dbt).
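As a rough sketch of that change, the Dockerfile from the question could be kept as it is with only the paths swapped. The /usr/dbt path is just the example name used above (any directory outside the declared volume should do), and the layout of ./dbt is assumed to be the same as in the question:

```dockerfile
FROM fishtownanalytics/dbt:0.19.0b1

# /usr/dbt is an example path; the point is that it lies outside
# the /usr/app volume declared by the base image.
WORKDIR /usr/dbt/profile/
ENV DBT_DIR=/usr/dbt
ENV DBT_PROFILES_DIR=/usr/dbt

# Copy the dbt project into the image
COPY ./dbt ${DBT_DIR}

# Export the env variables for this build step, then install packages.
# The installed modules now land outside the volume and stay in the image.
COPY envs.sh envs.sh
RUN . ./envs.sh \
    && dbt deps
```

Variables exported inside a RUN step exist only for that step, so envs.sh still has to be sourced (or the variables passed in) when the container runs.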
Running dbt deps is a necessary step in preparing your dbt environment, so you should feel fine invoking dbt deps in the Dockerfile prior to dbt run.
I think, however, that your intention is getting lost in the RUN instruction on the last line: either convert that last RUN command to a CMD instruction, or run RUN dbt deps by itself beforehand. (See this question for more detail on the differences between RUN and CMD.)
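To make the two options concrete, here is a hedged sketch showing both in one file. RUN executes while the image is being built, whereas CMD only records the default command for when a container starts. The /usr/dbt path and the dbt deps && dbt run command line are illustrative assumptions, and ENTRYPOINT is cleared in case the base image defines one that would otherwise receive the CMD as arguments:

```dockerfile
FROM fishtownanalytics/dbt:0.19.0b1

WORKDIR /usr/dbt/profile/
ENV DBT_PROFILES_DIR=/usr/dbt
COPY ./dbt /usr/dbt
COPY envs.sh envs.sh

# Option 1: install packages at build time, as its own RUN step.
RUN . ./envs.sh && dbt deps

# Option 2: (re)install packages every time the container starts.
# Clearing ENTRYPOINT is a precaution so the shell command below
# runs as written instead of being passed to another executable.
ENTRYPOINT []
CMD ["sh", "-c", ". ./envs.sh && dbt deps && dbt run"]
```

Only one of the two options is strictly needed; keeping both just means the packages are baked into the image and refreshed again at start-up.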
And, for what it's worth: dbt Cloud, the hosted SaaS build environment for dbt, also runs dbt deps as one of its standard steps for all dbt build jobs -- meaning executing at run time, every time, similar to Docker's CMD.