With Scala, using sbt for builds and git for version control, what would be a good way of organizing your team's code when it outgrows being a single project? At some point, you start thinking about splitting your code into separate libraries or projects, and importing between them as necessary. How would you organize things for that? Or would you avoid the temptation and just manage all packages under a single sbt and git "project"?
Points of interest being: (feel free to change)
CI server.
SbtNativePackager, to package your stuff for production without too much pain.
In addition, would you use some sort of "local sbt/Maven team repository", and what may need to be done to accomplish that? Hopefully, this is not necessary though.
Thanks!
I use the following lines in the sand:
I try to consider the final deployables when making divisions that make sense. For example, if my system foosys has foosys-frontend and foosys-backend deployables, where foosys-frontend does HTML templating and foosys-backend talks to the database and the two communicate via a REST API, then I'll have those as separate projects, and a foosys-core project for common code. foosys-core isn't allowed to depend on the HTML templating library (because foosys-backend doesn't want that), nor on the ORM library (because foosys-frontend doesn't want that). But I don't worry about separating the code that works with the REST library from the "core domain objects", because both foosys-frontend and foosys-backend use the REST code.
Now suppose I add a new foosys-reports deployable, which accesses the database to do some reports. Then I'll probably create a foosys-database project, depending on foosys-core, to hold shared code used by both foosys-backend and foosys-reports. And since foosys-reports doesn't use the REST library, I should probably also split out foosys-rest from foosys-core. So I end up with a foosys-core library, two more library projects that depend on it (foosys-database and foosys-rest), and the three deployable projects (foosys-reports depending on foosys-database, foosys-frontend depending on foosys-rest, and foosys-backend depending on both).
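As a rough illustration, that layout could be written as one multi-project build.sbt along these lines. This is a minimal sketch: the directory names, the Scala version and the root aggregate are my assumptions, and the actual library dependencies (ORM, REST, templating) are left as comments.

```scala
// build.sbt (sketch; directory names and Scala version are assumptions)
ThisBuild / scalaVersion := "2.13.12"

// Code shared by all three deployables. No ORM, REST or templating libraries here.
lazy val foosysCore = (project in file("foosys-core"))
  .settings(name := "foosys-core")

// Shared by backend and reports; the ORM dependency would be declared here.
lazy val foosysDatabase = (project in file("foosys-database"))
  .dependsOn(foosysCore)
  .settings(name := "foosys-database")

// Shared by frontend and backend; the REST library dependency would be declared here.
lazy val foosysRest = (project in file("foosys-rest"))
  .dependsOn(foosysCore)
  .settings(name := "foosys-rest")

// The three deployables.
lazy val foosysReports = (project in file("foosys-reports"))
  .dependsOn(foosysDatabase)
  .settings(name := "foosys-reports")

lazy val foosysFrontend = (project in file("foosys-frontend"))
  .dependsOn(foosysRest) // the HTML templating dependency would be declared here
  .settings(name := "foosys-frontend")

lazy val foosysBackend = (project in file("foosys-backend"))
  .dependsOn(foosysDatabase, foosysRest)
  .settings(name := "foosys-backend")

// Aggregate so that `sbt test` at the top level runs every project.
lazy val root = (project in file("."))
  .aggregate(foosysCore, foosysDatabase, foosysRest,
             foosysReports, foosysFrontend, foosysBackend)
  .settings(publish / skip := true)
```

Because dependsOn is transitive, the deployables see foosys-core without listing it, and the compiler enforces the "not allowed to depend on" rules: foosys-frontend simply has no ORM on its classpath.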
You'll notice that this means there's one code project for every combination of deployables where that code might be used. Code that goes in all three deployables goes in foosys-core. Code that goes in just one deployable goes in that deployable's project. Code that goes in two of the three deployables goes in foosys-rest or foosys-database. If we wanted to have some code that was part of the foosys-frontend and foosys-reports deployables, but not the foosys-backend deployable, we'd have to create another project for that code. In theory this means an exponential blowup in the number of projects as we add more deployables. In practice I've found it's not too problematic: most theoretically possible combinations don't actually make sense, so as long as we only create new projects when we actually have code to put in them, it's OK. And if we end up with a couple of classes in foosys-core that aren't actually used in every single deployable, it's not the end of the world.
Tests are best understood in this view as another kind of deployable. So I would have a separate foosys-test project containing common code that was used for tests for all three deployable projects (depending on foosys-core), and perhaps a foosys-database-test project (depending on foosys-test and foosys-database) for test helper code (e.g. database integration test setup code) that was common between foosys-backend and foosys-reports. Ultimately we might end up with a full parallel hierarchy of -test projects.
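A sketch of how the -test projects might be wired in sbt, under a couple of assumptions: the helper code lives in the helper project's main sources, and consumers pull it in on their test classpath only. The val names are illustrative (note that a val called test would clash with sbt's own test key), and foosys-rest is omitted for brevity.

```scala
// build.sbt fragment (sketch; val and directory names are assumptions)
lazy val foosysCore = (project in file("foosys-core"))

lazy val foosysDatabase = (project in file("foosys-database"))
  .dependsOn(foosysCore)

// Test helpers common to all deployables.
lazy val foosysTest = (project in file("foosys-test"))
  .dependsOn(foosysCore)

// Database-specific test helpers, e.g. integration test setup code.
lazy val foosysDatabaseTest = (project in file("foosys-database-test"))
  .dependsOn(foosysTest, foosysDatabase)

// "test->compile": the Test configuration of this project sees the
// compiled main classes of the helper project, and nothing leaks into
// the deployable itself.
lazy val foosysBackend = (project in file("foosys-backend"))
  .dependsOn(foosysDatabase, foosysDatabaseTest % "test->compile")

lazy val foosysReports = (project in file("foosys-reports"))
  .dependsOn(foosysDatabase, foosysDatabaseTest % "test->compile")
```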
Code in different repositories is necessarily versioned independently, so in some sense this is a vacuous definition. But I think you should move on to separate git repositories only when you have to (analogy with this post: you should only use Hadoop when your data is too big to use anything friendlier). Once your code is in multiple git repositories, you have to manually update the dependencies between them (on a dev machine you can use -SNAPSHOT dependencies and IDE support to work as though the versions were still in sync, but you have to manually update this every time you resync with master, so it adds friction to development). Since you're doing releases and updating the dependency asynchronously, you have to adopt and enforce something like semantic versioning, so that people know when it's safe to update the dependency on foocorp-utils and when it isn't. You have to publish changelogs, and have an early-warning CI build, and a more thorough code review process. All this is because the feedback cycle is a lot longer; if you break something in a downstream project, you won't know about this until they update their dependency on foocorp-utils, months or even years later (yes, years: I have witnessed this, and in an 80-person startup, not a megacorp). So you need process to prevent that, and everything becomes correspondingly less agile.
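To make the versioning part concrete, the build of a separately released foocorp-utils might declare its scheme roughly as below. This is only a sketch: the organization and version number are made up, and versionScheme assumes sbt 1.4 or later.

```scala
// build.sbt of the foocorp-utils repository (sketch)
ThisBuild / organization  := "com.foocorp"        // hypothetical organization
ThisBuild / version       := "2.3.0"              // hypothetical release number
ThisBuild / versionScheme := Some("semver-spec")  // declares semantic versioning (sbt 1.4+)

lazy val foocorpUtils = (project in file("."))
  .settings(name := "foocorp-utils")

// A downstream repository then pins a released version in its own build.sbt:
//   libraryDependencies += "com.foocorp" %% "foocorp-utils" % "2.3.0"
// and someone has to bump that number by hand after every upstream release.
```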
Valid reasons to do this include:
I would use a team Maven repository, probably Nexus. Actually I'd recommend this even before you get to the multi-project stage. It's very easy to run (just a Java app), and you can proxy your external dependencies through it, meaning you have a reliable source for your dependency jars and your builds will be reproducible even if one of your upstream dependencies disappears.
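On the sbt side, pointing a build at such a repository takes only a few settings. A sketch, where the host name is a placeholder and the repository paths assume a Nexus 3 install with its default maven-public, maven-releases and maven-snapshots repositories:

```scala
// build.sbt fragment (sketch; the URL and repository names are placeholders)
val nexus = "https://nexus.example.com/repository/"

// Resolve dependencies through the team repository's proxy group.
resolvers += "team-proxy" at (nexus + "maven-public/")

// Publish snapshots and releases to different hosted repositories.
publishTo := {
  if (isSnapshot.value) Some("team-snapshots" at (nexus + "maven-snapshots/"))
  else Some("team-releases" at (nexus + "maven-releases/"))
}

// Keep the username and password out of the build definition.
credentials += Credentials(Path.userHome / ".sbt" / ".credentials")
```

These bare settings apply to the root project only; in a multi-project build they would go into a shared settings sequence applied to each project that publishes.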
I intend to write up my ways of team working as a blog post, but in the meantime I'm happy to answer any further questions.
I'm a little late here, but my 2 cents.
Most Scala projects, and most other projects I've worked on in past jobs, have ended up with a very similar structure, usually reached by consensus with other team members (which helps to validate the decision). The only main philosophical difference has been whether to separate projects by technical infrastructure layers or by business modules. Examples below:
This can be very convenient and easy to manage by business area, and you can then deploy single modules as needed. You can also later decide to separate out the modules into separate APIs if needed (with a shared code base still in utils and core). The disadvantage here is that the approach can make the number of projects swell.
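A by-business-module layout of this kind might look roughly as follows in build.sbt. Only utils, core and inventory are names taken from this answer; the orders module and the dependency of core on utils are assumptions for illustration.

```scala
// build.sbt fragment (sketch of the by-business-module split)
lazy val utils = (project in file("utils"))

lazy val core = (project in file("core"))
  .dependsOn(utils) // assumption: core builds on utils

// One project per business area, each holding its own services and persistence code.
lazy val inventory = (project in file("inventory"))
  .dependsOn(core)

lazy val orders = (project in file("orders")) // hypothetical second module
  .dependsOn(core)
```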
In this approach all the logic and services for all areas are in the services project, and likewise for the database. So the code for, say, the inventory is split between the database and services projects. This allows separating by traditional technical tiers. This can be much faster for smaller projects.
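For comparison, the by-tier split could be sketched like this. The project names come from the description above; the direction of the dependency and the package names in the comments are assumptions.

```scala
// build.sbt fragment (sketch of the by-technical-tier split)

// All persistence code for every business area,
// e.g. a hypothetical com.example.inventory.db package.
lazy val database = (project in file("database"))

// All business logic for every area,
// e.g. a hypothetical com.example.inventory.service package.
lazy val services = (project in file("services"))
  .dependsOn(database) // assumption: services call down into the database tier
```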
Personally, I prefer the more modular separation in option 1. It's more scalable and generally feels simpler when making code changes.
-K