 

Git environment setup. Advice needed

Tags: git, mysql, lamp

Background info:

  • We are currently 3 web programmers (good, real-life friends, no distrust issues).
  • Each programmer SSHes into the single Linux server, where the code resides, under their own username with sudo powers.
  • We all work on different files at the same time. We sometimes ask "Are you in the file __?". Since we use Vim, we can tell whether a file is already open.
  • Our development code (no production yet) resides in /var/www/
  • Our remote repo is hosted on bitbucket.
  • I am *very* new to Git. I used Subversion before, but I was basically spoon-fed instructions and told exactly what to type to sync up code and commit.
  • I read about half of Scott Chacon's Pro Git, and that's the extent of my Git knowledge.
  • In case it matters, we run Ubuntu 11.04, Apache 2.2.17, and Git 1.7.4.1.

So Jan Hudec gave me some advice in the previous question. He told me that it is good practice to do the following:

  • Each developer has their own repo on their local computer.
  • Let the /var/www/ be the repo on the server. Set the .git folder to permission 770.

That would mean each developer's computer needs its own LAMP stack (or at least Apache, PHP, MySQL, and Python installed).

The code is mostly JavaScript and PHP files, so it's not a big deal to clone. But how do we manage the database locally?

In this case we only have two tables, and it's simple to recreate the entire database locally (at least for testing). But in the future, when the database gets too big, should we just log in remotely to the MySQL database on the server, or should we keep "sample" data for development and testing purposes?

asked Jan 26 '12 by hobbes3


2 Answers

What you're doing is transitioning from "everybody works together in one environment" to "everybody has their own development environment". The major benefit is that you won't be stepping on each other's toes.

Another benefit is a heterogeneous development environment: if everyone develops on the same machine, the software tends to become dependent on that one setup, because developers are lazy. If everyone develops in a different environment, even with slightly different versions of the same software, they'll be forced to write more robust code to deal with that.

The main drawback, as you've noticed, is setting up the environment is harder. In particular, making sure the database works.

First, each developer should have their own database. This doesn't mean they all need their own database server (though that's good for heterogeneity), but they should have their own database instance which they control.

Second, you should have a schema, not just whatever happens to be in the database, and it should live in a version-controlled file.

Third, setting up a fresh database should be automatic. This lets developers set up a clean database with no hassle.
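The "automatic fresh database" step could be a short script like the following sketch. It uses sqlite3 as a stand-in for MySQL (so it runs anywhere), and the schema and table names are made up for illustration; in practice the schema string would live in a version-controlled file such as a hypothetical schema.sql.

```python
# Sketch: create a clean database from a version-controlled schema.
# sqlite3 stands in for MySQL; table names are illustrative only.
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE payments (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    amount  REAL NOT NULL
);
"""

def create_fresh_db(path=":memory:"):
    """Return a connection to a brand-new database built from SCHEMA."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Because it takes one function call, every developer (and every test run) can start from a known-clean state instead of inheriting whatever the last person left behind.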

Fourth, you'll need to get interesting test data into that database. Here's where things get interesting...

You have several routes to do that.

First is to make a dump of an existing database which contains realistic data, sanitized of course. This is easy, and provides realistic data, but it is very brittle. Developers will have to hunt around to find interesting data to do their testing. That data may change in the next dump, breaking their tests. Or it just might not exist at all.

Second is to write "test fixtures". Basically each test populates the database with the test data it needs. This has the benefit of allowing the developer to get precisely the data they want, and know precisely the state the database is in. The drawbacks are that it can be very time consuming, and often the data is too clean. The data will not contain all the gritty real data that can cause real bugs.
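A test fixture in this style might look like the sketch below; again sqlite3 stands in for MySQL, and the table and test names are hypothetical. The point is that the test itself creates exactly the rows it depends on.

```python
# Sketch of a test fixture: each test builds precisely the data it needs,
# so the database state is fully known. sqlite3 stands in for MySQL.
import sqlite3

def setup_fixture(conn):
    """Populate the database with exactly the rows this test needs."""
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO users (id, name, age) VALUES (?, ?, ?)",
        [(1, "Alice", 30), (2, "Bob", 17)],
    )

def test_minors_are_filtered():
    conn = sqlite3.connect(":memory:")  # throwaway database per test
    setup_fixture(conn)                 # known, precise starting state
    rows = conn.execute("SELECT name FROM users WHERE age >= 18").fetchall()
    assert rows == [("Alice",)]
```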

Third is to not access the database at all and instead "mock" all the database calls. You trick all the methods which normally query a database into instead returning testing data. This is much like writing test fixtures, and has most of the same drawbacks and benefits, but it's FAR more invasive. It will be difficult to do unless your system has been designed to do it. It also never actually tests if your database calls work.
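For comparison, mocking in Python might look like this sketch using the standard library's unittest.mock; the function under test and its `db.query` interface are hypothetical. No database is touched at all.

```python
# Sketch of mocking: the database call is intercepted and returns
# canned rows, so no real database is involved. The db.query interface
# here is hypothetical.
from unittest.mock import Mock

def active_user_names(db):
    """Code under test: normally this would query a real database."""
    return [row[0] for row in db.query("SELECT name FROM users WHERE active = 1")]

db = Mock()
db.query.return_value = [("Alice",), ("Bob",)]  # canned "rows"

assert active_user_names(db) == ["Alice", "Bob"]
db.query.assert_called_once()  # we can also verify how it was called
```

Notice that the SQL string is never executed, which is exactly the drawback described above: a mocked test can pass even if the query itself is wrong.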

Finally, you can build up a set of libraries which generate semi-random data for you. I call this "The Sims Technique" after the video game where you create fake families, torture them and then throw them away.

For example, let's say you have a User object which needs a name, an age, a Payment object and a Session object. To test a User you might want users with different names, ages, abilities to pay and login statuses. To control all that, you need to generate test data for names, ages, Payments and Sessions.

So you write a function to generate names and one to generate ages; these can be as simple as picking randomly from a list. Then you write one to make a Payment object and one to make a Session object. By default, all the attributes will be random, but valid... unless you specify otherwise. For example...

# Generate a random login session, but guarantee that it's logged in.
session = Session.sim(logged_in=True)

Then you can use this to put together an interesting User.

# A user who is logged in but has an invalid Visa card.
# Their name and age will be random but valid.
user = User.sim(
    session=Session.sim(logged_in=True),
    payment=Payment.sim(invalid=True, type="Visa"),
)

This has all the advantages of test fixtures, but since some of the data is unpredictable it has some of the advantages of real data. Adding "interesting" data to your default sim and rand functions will have wide ranging repercussions. For example, adding a Unicode name to random_name will likely discover all sorts of interesting bugs! It unfortunately is expensive and time consuming to build up.
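A trimmed-down runnable version of this technique might look like the following (Session is omitted for brevity, and all class and attribute names are illustrative): every attribute defaults to something random but valid, and any of them can be pinned down by the caller.

```python
# Sketch of the "Sims Technique": sim() constructors produce objects
# with random-but-valid attributes, overridable per test.
import random

def sim_name():
    # Adding "interesting" names (Unicode, very long, etc.) here will
    # exercise every test that uses a simmed User.
    return random.choice(["Alice", "Bob", "Chandra", "Zoë"])

def sim_age():
    return random.randint(1, 99)

class Payment:
    def __init__(self, type, invalid):
        self.type, self.invalid = type, invalid

    @classmethod
    def sim(cls, type=None, invalid=None):
        return cls(
            type=type if type is not None else random.choice(["Visa", "MasterCard"]),
            invalid=invalid if invalid is not None else False,
        )

class User:
    def __init__(self, name, age, payment):
        self.name, self.age, self.payment = name, age, payment

    @classmethod
    def sim(cls, name=None, age=None, payment=None):
        return cls(
            name=name or sim_name(),
            age=age if age is not None else sim_age(),
            payment=payment or Payment.sim(),
        )

# A user with an invalid Visa card; name and age stay random but valid.
user = User.sim(payment=Payment.sim(invalid=True, type="Visa"))
```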

There you have it. Unfortunately there's no easy answer to the database problem, but I implore you to not simply copy the production database as it's a losing proposition in the long run. You'll likely do a hybrid of all the choices: copying, fixtures, mocking, semi-random data.

answered Sep 24 '22 by Schwern


A few options, in order of increasing complexity:

  • You all connect to the live master DB, read/write permissions. This is risky, but I guess you're already doing it. Make sure you have backups!
  • Use test fixtures to populate a local test DB and just use it. Not sure what tools there are for this in the PHP world.
  • Copy (mysqldump) the master database and import it into your local machines' MySQL instances, then set up your dev environments to connect to your local MySQL. Repeat the dump/import as necessary.
  • Set up one-way replication from the master to your local instances.
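The dump/import option above could be wrapped in a small script along these lines. The mysqldump/mysql invocations are standard, but the host and database names are made up; in real use you'd also pass credentials.

```python
# Sketch of the dump/import cycle: pull a dump from the master and load
# it into the local MySQL instance. Host/db names are hypothetical.
import subprocess

def dump_command(remote_host, db):
    # --single-transaction takes a consistent snapshot of InnoDB tables
    # without locking the master while the dump runs.
    return ["mysqldump", "-h", remote_host, "--single-transaction", db]

def import_command(db):
    return ["mysql", db]

def refresh_local_db(remote_host="dbmaster", db="myapp"):
    """Dump the master database, then load the dump into local MySQL."""
    with open("master.sql", "w") as out:
        subprocess.run(dump_command(remote_host, db), stdout=out, check=True)
    with open("master.sql") as dump:
        subprocess.run(import_command(db), stdin=dump, check=True)
```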

Optionally, set up a read-only user on the main DB, and configure your app to let you switch to a read-only connection to the real master DB in case you can't wait for that next copy of the master data.
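That switch could be as simple as an environment flag choosing between connection configs, as in this sketch; the flag name, hostnames, and usernames are all hypothetical.

```python
# Sketch: default to the local dev database, but allow switching to a
# read-only account on the master via an environment flag. All names
# here are hypothetical.
import os

def db_config():
    if os.environ.get("USE_MASTER_RO") == "1":
        # Read-only user on the master: safe for peeking at fresh data,
        # with no way to corrupt it from a dev box.
        return {"host": "dbmaster", "user": "readonly", "write": False}
    return {"host": "localhost", "user": "dev", "write": True}
```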

answered Sep 21 '22 by rkb