Example for non-iid data

Tags:

I have read some papers regarding to non-iid data. Based on Wikipedia, I know what iid (independent and identical distributed) data is but am still confused about non-iid. I did some research but could not find a clear definition and example of it. Can someone help me on this?

811

asked Oct 24 '12 21:10

Long Thai

Video Answer

2 Answers

From wikipedia iid:

"Independent and identically distributed" implies an element in the sequence is independent of the random variables that came before it. In this way, an IID sequence is different from a Markov sequence, where the probability distribution for the nth random variable is a function of the previous random variable in the sequence (for a first order Markov sequence).

As a simple synthetic example, assume you have a special dice with 6 faces. If the last time the face value is 1, next time you throw it, you will still get a face value of 1 with 0.5 probability and a face value of 2,3,4,5,6 each with 0.1 probability. However, if the last time the face value is not 1, you get equal probability of each face. E.g.,

p(face(0) = k) = 1/6, k = 1,2,3,4,5,6  -- > initial probability at time 0. 
p(face(t) = 1| face(t-1) = 1) = 0.5, p(face(t) = 1| face(t-1) != 1) = 1/6
p(face(t) = 2| face(t-1) = 1) = 0.1, p(face(t) = 1| face(t-1) != 1) = 1/6
p(face(t) = 3| face(t-1) = 1) = 0.1, p(face(t) = 1| face(t-1) != 1) = 1/6
p(face(t) = 4| face(t-1) = 1) = 0.1, p(face(t) = 1| face(t-1) != 1) = 1/6
p(face(t) = 5| face(t-1) = 1) = 0.1, p(face(t) = 1| face(t-1) != 1) = 1/6
p(face(t) = 6| face(t-1) = 1) = 0.1, p(face(t) = 1| face(t-1) != 1) = 1/6
face(t) stands for the face value of t-th throw.

This is an example when the probability distribution for the nth random variable (the result of the nth throw) is a function of the previous random variable in the sequence.

I see Non-identical and Non-independent (e.g, Markovian) data in some machine learning scenarios, which can be thought of as non-iid examples.

Online learning with streaming data, when the distribution of the incoming examples changes over time: the examples are not identically distributed. Suppose you have a learning module for predicting the click-thru-rate of online-ads, the distribution of query terms coming from the users are changing during the year dependent on seasonal trending. The query terms in summer and in Christmas season should have different distribution.
Active learning, where labels for specific data are requested by the learner: the independence assumption is also violated.
Learning / making inference with graphical models. Variables are connected thru dependence relations.

185

answered Sep 22 '22 04:09

greeness

In a very hand-wavy way (since I assume you've read the technical definition), i.i.d. means if you have a bunch of values, then all permutations of those values have equal probability. So if I have 3,6,7 then the probability of this is equal to the probability of 7,6,3 is equal to 6,7,3 etc. That is, each value has no dependence on other values in the sequence.

As a counter example, imagine the sequence x where each element x_i is either one higher or one lower than the preceding element, with a 50-50 chance as to which of these happens. Then one possible sequence is 1,2,3,2,3,4,3,2. It should be clear that there are some permutations of this sequence that are not equiprobable: in particular, sequences starting 1,4,... have probability zero. You can instead consider pairs of the form x_i | x_i-1 to be iid if you wish.

answered Sep 22 '22 04:09

Ben Allison

Related questions
                            
                                Switching user in Fabric
                            
                                Display Images Inline via CSS
                            
                                Why is this regex using lookbehinds invalid in R?
                            
                                how to use customs fonts in emails or email templates
                            
                                Optimizing Execution Plans for Parameterized T-SQL Queries Containing Window Functions
                            
                                Avoiding the infamous "eval(parse())" construct
                            
                                How do I keep Resharper from massively indenting lambdas?
                            
                                Where do I test my queries for my RDF written in SPARQL
                            
                                Difference between KeyStore and KeyManager/TrustManager
                            
                                JSON "POST" to Flask View doesn't work
                            
                                Keyboard-only column block selection in GVim Win32, or why does Ctrl-Q not emulate Ctrl-V when mswin.vim is included?
                            
                                Difference between git branch --set-upstream-to vs git remote add origin

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With