
Should we be using Faker in Rails Factories?

I love Faker. I use it in my seeds.rb all the time to populate my dev environment with real-ish looking data.

I've also just started using Factory Girl, which saves a lot of time too, but when I sleuth around the web for code examples I don't see much evidence of people combining the two.

Q. Is there a good reason why people don't use Faker in a factory?

My feeling is that by doing so I'd increase the robustness of my tests by seeding random - but predictable - data each time, which hopefully would increase the chances of a bug popping up.

But perhaps that's incorrect and there is either no benefit over hard coding a factory or I'm not seeing a potential pitfall. Is there a good reason why these two gems should or shouldn't be combined?

Huw asked Jan 23 '16 11:01


3 Answers

Some people argue against it, as in the following quoted passage:

DO NOT USE RANDOM ATTRIBUTE VALUES

One common pattern is to use a fake data library (like Faker or Forgery) to generate random values on the fly. This may seem attractive for names, email addresses or telephone numbers, but it serves no real purpose. Creating unique values is simple enough with sequences:

FactoryGirl.define do
  sequence(:title) { |n| "Example title #{n}" }

  factory :post do
    title
  end
end

FactoryGirl.create(:post).title # => 'Example title 1'

Your randomised data might at some stage trigger unexpected results in your tests, making your factories frustrating to work with. Any value that might affect your test outcome in some way would have to be overridden, meaning:

  • Over time, you will discover new attributes that cause your test to fail sometimes. This is a frustrating process, since tests might fail only once in every ten or hundred runs – depending on how many attributes and possible values there are, and which combination triggers the bug.
  • You will have to list every such random attribute in every test to override it, which is silly. So, you create non-random factories, thereby negating any benefit of the original randomness.

One might argue, as Henrik Nyh does, that random values help you discover bugs. While possible, that obviously means you have a bigger problem: holes in your test suite. In the worst case scenario the bug still goes undetected; in the best case scenario you get a cryptic error message that disappears the next time you run the test, making it hard to debug. True, a cryptic error is better than no error, but randomised factories remain a poor substitute for proper unit tests, code review and TDD to prevent these problems.

Randomised factories are therefore not only not worth the effort, they even give you false confidence in your tests, which is worse than having no tests at all.

But there's nothing stopping you from doing it if you want to; just do it.

Oh, and recent versions of FactoryGirl offer an even easier way to inline a sequence; that quote was written for an older version.
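For illustration, an inline sequence is declared directly inside the factory instead of as a separate global sequence. This is a minimal sketch that reuses the :post factory and title attribute from the quote above; it assumes a Post model exists and that FactoryGirl is already loaded by your test setup, and the file name is only a guess at a typical layout.

```ruby
# spec/factories/posts.rb (hypothetical file name)
FactoryGirl.define do
  factory :post do
    # Inline sequence: n increments each time the factory is used,
    # so every post gets a distinct, predictable title.
    sequence(:title) { |n| "Example title #{n}" }
  end
end
```

The values stay unique without being random, which is the property the quoted article is arguing for.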

jrochkind answered Oct 11 '22 15:10


It's up to you.

In my opinion it is a very good idea to have random data in tests; it has always helped me discover bugs and corner cases I hadn't thought about.

I have never regretted having random data. All the points described by @jrochkind are valid (and you should read that answer before reading this one), but it's also true that you can (and should) put this inside the RSpec.configure block in your spec_helper.rb:

config.before(:all)  { Faker::Config.random = Random.new(config.seed) }

This gives you repeatable tests with repeatable data as well: rerunning with the same RSpec seed regenerates the same Faker values. If you don't do that, you have all the problems described in the other answer.
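The idea behind that line can be sketched in plain Ruby without Faker. In this minimal illustration, FIRST_NAMES and names_for are hypothetical stand-ins for a Faker generator, and the seed is hard-coded where RSpec would supply config.seed. The only point is that two generators built from the same seed produce the same values.

```ruby
# Hypothetical stand-in for a Faker-style data source.
FIRST_NAMES = %w[Alice Bob Carol Dave Erin Frank].freeze

# Builds a fresh generator from the seed and draws `count` names from it.
def names_for(seed, count = 5)
  rng = Random.new(seed)
  Array.new(count) { FIRST_NAMES.sample(random: rng) }
end

seed = 1234 # assumption: in a real suite this would be RSpec's config.seed

first_run  = names_for(seed)
second_run = names_for(seed)

# Same seed, same "random" data, so a failing run can be replayed.
puts first_run == second_run
```

With a setup like this, a failure caused by a particular generated value can be reproduced by rerunning with the seed from the failing run.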

coorasse answered Oct 11 '22 15:10


I like to use Faker and usually do so when working with larger code bases. I see the following advantages and disadvantages when using Faker with Factory Girl:

Possible disadvantages:

  • It is a bit harder to reproduce the exact same test scenario (though RSpec mitigates this by printing the random number generator seed on every run, which lets you reproduce the exact same run)
  • Generating data costs a little performance

Possible advantages:

  • Displayed data is usually easier for humans to read. When creating test data manually, people tend to take all kinds of shortcuts to avoid the tedium.
  • Building factories with Faker for tests also gives you the means to generate nice demo data for presentations.
  • You may stumble on edge-case bugs when the tests are run many times.

aef answered Oct 11 '22 15:10