I am new to Python and TensorFlow. I recently started working through the TensorFlow examples and came across this one: https://www.tensorflow.org/versions/r0.10/tutorials/wide_and_deep/index.html
I got the error TypeError: argument of type 'float' is not iterable, and I believe the problem is with the following line of code:
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
(income_bracket is the label column of the census dataset; '>50K' is one of the two possible label values and the other is '<=50K'. The dataset is read into df_train. The explanation the documentation gives for doing the above is: "Since the task is a binary classification problem, we'll construct a label column named "label" whose value is 1 if the income is over 50K, and 0 otherwise.")
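To illustrate what I expect that line to do on clean string values, here is a small standalone sketch (my own, not from the tutorial):
import pandas as pd
# On well-formed rows the mapping is simply string containment -> bool -> int.
labels = pd.Series(['<=50K', '>50K', '<=50K'])
print(labels.apply(lambda x: '>50K' in x).astype(int).tolist())  # prints [0, 1, 0]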
If anyone could explain to me what exactly is happening and how I should fix it, that would be great. I tried Python 2.7 and Python 3.4, and I don't think the problem is with the language version. Also, if anyone is aware of good tutorials for someone who is new to TensorFlow and pandas, please share the links.
Complete program:
import pandas as pd
import urllib
import tempfile
import tensorflow as tf
gender = tf.contrib.layers.sparse_column_with_keys(column_name="gender", keys=["female", "male"])
race = tf.contrib.layers.sparse_column_with_keys(column_name="race", keys=["Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White"])
education = tf.contrib.layers.sparse_column_with_hash_bucket("education", hash_bucket_size=1000)
marital_status = tf.contrib.layers.sparse_column_with_hash_bucket("marital_status", hash_bucket_size=100)
relationship = tf.contrib.layers.sparse_column_with_hash_bucket("relationship", hash_bucket_size=100)
workclass = tf.contrib.layers.sparse_column_with_hash_bucket("workclass", hash_bucket_size=100)
occupation = tf.contrib.layers.sparse_column_with_hash_bucket("occupation", hash_bucket_size=1000)
native_country = tf.contrib.layers.sparse_column_with_hash_bucket("native_country", hash_bucket_size=1000)
age = tf.contrib.layers.real_valued_column("age")
age_buckets = tf.contrib.layers.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
education_num = tf.contrib.layers.real_valued_column("education_num")
capital_gain = tf.contrib.layers.real_valued_column("capital_gain")
capital_loss = tf.contrib.layers.real_valued_column("capital_loss")
hours_per_week = tf.contrib.layers.real_valued_column("hours_per_week")
wide_columns = [
    gender, native_country, education, occupation, workclass, marital_status, relationship, age_buckets,
    tf.contrib.layers.crossed_column([education, occupation], hash_bucket_size=int(1e4)),
    tf.contrib.layers.crossed_column([native_country, occupation], hash_bucket_size=int(1e4)),
    tf.contrib.layers.crossed_column([age_buckets, race, occupation], hash_bucket_size=int(1e6))]
deep_columns = [
    tf.contrib.layers.embedding_column(workclass, dimension=8),
    tf.contrib.layers.embedding_column(education, dimension=8),
    tf.contrib.layers.embedding_column(marital_status, dimension=8),
    tf.contrib.layers.embedding_column(gender, dimension=8),
    tf.contrib.layers.embedding_column(relationship, dimension=8),
    tf.contrib.layers.embedding_column(race, dimension=8),
    tf.contrib.layers.embedding_column(native_country, dimension=8),
    tf.contrib.layers.embedding_column(occupation, dimension=8),
    age, education_num, capital_gain, capital_loss, hours_per_week]
model_dir = tempfile.mkdtemp()
m = tf.contrib.learn.DNNLinearCombinedClassifier(
    model_dir=model_dir,
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num",
"marital_status", "occupation", "relationship", "race", "gender",
"capital_gain", "capital_loss", "hours_per_week", "native_country", "income_bracket"]
LABEL_COLUMN = 'label'
CATEGORICAL_COLUMNS = ["workclass", "education", "marital_status", "occupation", "relationship", "race", "gender", "native_country"]
CONTINUOUS_COLUMNS = ["age", "education_num", "capital_gain", "capital_loss", "hours_per_week"]
train_file = tempfile.NamedTemporaryFile()
test_file = tempfile.NamedTemporaryFile()
urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", train_file.name)
urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test", test_file.name)
df_train = pd.read_csv(train_file, names=COLUMNS, skipinitialspace=True)
df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1)
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
df_test[LABEL_COLUMN] = (df_test['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
def input_fn(df):
    # Continuous features become dense constant tensors.
    continuous_cols = {k: tf.constant(df[k].values)
                       for k in CONTINUOUS_COLUMNS}
    # Categorical features become SparseTensors of shape [num_rows, 1].
    categorical_cols = {k: tf.SparseTensor(
        indices=[[i, 0] for i in range(df[k].size)],
        values=df[k].values,
        shape=[df[k].size, 1])
        for k in CATEGORICAL_COLUMNS}
    # Merge the two dicts (list(...) keeps this working on both Python 2 and 3).
    feature_cols = dict(list(continuous_cols.items()) + list(categorical_cols.items()))
    label = tf.constant(df[LABEL_COLUMN].values)
    return feature_cols, label
def train_input_fn():
    return input_fn(df_train)

def eval_input_fn():
    return input_fn(df_test)
m.fit(input_fn=train_input_fn, steps=200)
results = m.evaluate(input_fn=eval_input_fn, steps=1)
for key in sorted(results):
    print("%s: %s" % (key, results[key]))
Thank you
PS: Full stack trace for the error
Traceback (most recent call last):
File "/home/jaspreet/PycharmProjects/TicTacTensorFlow/census.py", line 73, in <module>
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
File "/usr/lib/python2.7/dist-packages/pandas/core/series.py", line 2023, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "inference.pyx", line 920, in pandas.lib.map_infer (pandas/lib.c:44780)
File "/home/jaspreet/PycharmProjects/TicTacTensorFlow/census.py", line 73, in <lambda>
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
TypeError: argument of type 'float' is not iterable
As you can see when you inspect the test data (adult.test), the first row ends up with NaN in the income_bracket field.
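That NaN is what breaks the lambda: pandas stores missing values as floats, and the in operator needs an iterable (such as a string) on its right-hand side. A minimal sketch reproducing the same TypeError:
x = float('nan')   # how pandas represents the missing income_bracket value
'>50K' in x        # TypeError: argument of type 'float' is not iterable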
I further checked that this is the only row containing NaN by doing:
ib = df_test["income_bracket"]
t = type('12')
for idx, i in enumerate(ib):
    if type(i) != t:
        print idx, type(i)
RESULT: 0 <type 'float'>
So you may just skip this row by:
df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1)
The program works verbatim with the latest version of pandas, i.e., 0.18.1
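Alternatively, a defensive variant (my own sketch, not part of the original tutorial) is to make the label construction itself NaN-safe with pandas' vectorized string matching, where na=False maps missing values to label 0:
# Assumes the same df_test and LABEL_COLUMN as above.
df_test[LABEL_COLUMN] = df_test['income_bracket'].str.contains('>50K', regex=False, na=False).astype(int)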