
Mesos slaves reject all Marathon jobs with persistent volumes; claims no space available

Tags: mesos, marathon

I'm trying to use the persistent volumes support for Mesos, and am having a tremendously difficult time getting it to work.

I've configured each of my slaves, as follows, and have confirmed that they've successfully rebooted using this new config:

/etc/mesos-slave/resources

[
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk1" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk2" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk3" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk4" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk5" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "MOUNT",
        "mount" : { "root" : "/mnt/disk6" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "MOUNT",
        "mount" : { "root" : "/mnt/disk7" }
      }
    }
  }
]

The master's state output shows that the slaves have unreserved resources, including disk. For example (full response here):

{
  ...
  "slaves": [{
    "id": "c5e59876-5157-463f-b31e-16b34d6ffc72-S8",
    "pid": "slave(1)@172.30.31.55:5051",
    "hostname": "redacted47.redacted.com",
    "registered_time": 1458810586.61153,
    "resources": {
      "cpus": 32,
      "disk": 29360128,
      "mem": 256651,
      "ports": "[31000-32000]"
    },
    "used_resources": {
      "cpus": 1,
      "disk": 0,
      "mem": 128,
      "ports": "[31282-31282]"
    },
    "offered_resources": {
      "cpus": 0,
      "disk": 0,
      "mem": 0
    },
    "reserved_resources": {},
    "unreserved_resources": {
      "cpus": 32,
      "disk": 29360128,
      "mem": 256651,
      "ports": "[31000-32000]"
    },

Whenever I submit a job that requests a persistent volume, all of the slaves reject it, claiming that there are no disk resources available:

Mar 26 17:59:43 redacted47.redacted.com start[30457]: [2016-03-26 17:59:43,606] INFO Offer [2220b6bf-aac2-402b-82e6-8d625284d1a4-O9375]. Considering unreserved resources with roles {*}. Not all basic resources satisfied: cpus SATISFIED (1.0 <= 1.0), mem SATISFIED (128.0 <= 128.0), disk including volumes NOT SATISFIED (1024.0 > 0.0) (mesosphere.mesos.ResourceMatcher$:marathon-akka.actor.default-dispatcher-38)
Mar 26 17:59:43 redacted47.redacted.com start[30457]: [2016-03-26 17:59:43,606] INFO Offer [2220b6bf-aac2-402b-82e6-8d625284d1a4-O9376]. Considering unreserved resources with roles {*}. Not all basic resources satisfied: cpus SATISFIED (1.0 <= 1.0), mem SATISFIED (128.0 <= 128.0), disk including volumes NOT SATISFIED (1024.0 > 0.0) (mesosphere.mesos.ResourceMatcher$:marathon-akka.actor.default-dispatcher-38)
Mar 26 17:59:43 redacted47.redacted.com start[30457]: [2016-03-26 17:59:43,606] INFO Finished processing 2220b6bf-aac2-402b-82e6-8d625284d1a4-O9375. Matched 0 ops after 1 passes. disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; cpus(*) 28.0; mem(*) 226955.0; ports(*) 31000->31085,31087->31364,31366->31940,31942->32000 left. (mesosphere.marathon.core.matcher.manager.impl.OfferMatcherManagerActor:marathon-akka.actor.default-dispatcher-11)
Mar 26 17:59:43 redacted47.redacted.com start[30457]: [2016-03-26 17:59:43,606] INFO Offer [2220b6bf-aac2-402b-82e6-8d625284d1a4-O9379]. Considering unreserved resources with roles {*}. Not all basic resources satisfied: cpus SATISFIED (1.0 <= 1.0), mem SATISFIED (128.0 <= 128.0), disk including volumes NOT SATISFIED (1024.0 > 0.0) (mesosphere.mesos.ResourceMatcher$:marathon-akka.actor.default-dispatcher-38)

If I post a request to create a volume directly against the Mesos master, it rejects the request with "Insufficient disk resources", as follows:

# curl -v -i \
    -u "marathon:$(cat /etc/marathon/.secret)" \
    -d slaveId=c5e59876-5157-463f-b31e-16b34d6ffc72-S8 \
    -d volumes='[
      {
        "name": "disk",
        "type": "SCALAR",
        "scalar": { "value": 512 },
        "role": "foo",
        "reservation": {
          "principal": "marathon"
        },
        "disk": {
          "persistence": {
            "id" : "very-persist"
          },
          "volume": {
            "mode": "RW",
            "container_path": "such-path"
          }
        }
      }
    ]' \
    -X POST http://localhost:5050/master/create-volumes; echo
* About to connect() to localhost port 5050 (#0)
*   Trying ::1...
* Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 5050 (#0)
* Server auth using Basic with user 'marathon'
> POST /master/create-volumes HTTP/1.1
> Authorization: Basic redacted
> User-Agent: curl/7.29.0
> Host: localhost:5050
> Accept: */*
> Content-Length: 481
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 481 out of 481 bytes
< HTTP/1.1 409 Conflict
HTTP/1.1 409 Conflict
< Date: Thu, 24 Mar 2016 09:50:36 GMT
Date: Thu, 24 Mar 2016 09:50:36 GMT
< Content-Length: 53
Content-Length: 53
<
* Connection #0 to host localhost left intact
Invalid CREATE Operation: Insufficient disk resources

I'm at my wits' end. I don't know what I'm doing wrong, and I'm trying my best to follow the documentation. Any hint as to what the problem might be would be greatly appreciated.

I'm running:

  • Mesos 0.28.0
  • Marathon 1.0.0RC1

I'm following the instructions from the following resources, as best as I can:

  • https://mesosphere.github.io/marathon/docs/persistent-volumes.html
  • http://mesos.apache.org/documentation/latest/persistent-volume/
  • http://mesos.apache.org/documentation/latest/multiple-disk/

Thank you for reading!

asked Oct 31 '22 06:10 by Tim Harper


1 Answer

First, thank you for providing such a nicely documented issue!

Your problem here seems to be the following:

a) There is no root disk resource available. Once you specify a disk resource manually, as you did, Mesos stops detecting the root disk automatically. You can add a root disk resource as described here, which should solve your problem.

b) Your "Create Volume" HTTP request above only considers root disk resources (which you don't have, for the reason explained above). If you want to use a non-root disk, you need to set the source field, as very briefly mentioned here.
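To make both points concrete, here is a minimal sketch in Python that builds the two JSON fragments. It assumes that a root disk is expressed as a disk resource with no "disk"/"source" section, and that a volume on a non-root disk carries the same "source" object as the slave's resource entry. The root disk size (ROOT_DISK_MB) and the helper names are hypothetical, and I have not run this against Mesos 0.28.0. Note also that the total in your state output, 29360128, is exactly 7 x 4194304, which is consistent with the root disk contributing nothing.

```python
import json

# Hypothetical root disk size in MB; pick a value that fits your machine.
ROOT_DISK_MB = 10240

def root_disk(size_mb):
    """Disk resource without a "disk" section (assumed to denote the root disk)."""
    return {"name": "disk", "type": "SCALAR", "scalar": {"value": size_mb}}

def path_disk(root, size_mb):
    """Non-root PATH disk, in the same shape as the entries in the question."""
    return {
        "name": "disk",
        "type": "SCALAR",
        "scalar": {"value": size_mb},
        "disk": {"source": {"type": "PATH", "path": {"root": root}}},
    }

def volume_on(source, size_mb, role, principal, persistence_id, container_path):
    """Volume request that names the non-root disk through its "source" field."""
    return {
        "name": "disk",
        "type": "SCALAR",
        "scalar": {"value": size_mb},
        "role": role,
        "reservation": {"principal": principal},
        "disk": {
            "source": source,
            "persistence": {"id": persistence_id},
            "volume": {"mode": "RW", "container_path": container_path},
        },
    }

# (a) Prepend a root disk entry to the existing resource list.
existing = [path_disk("/mnt/disk1", 4194304)]
resources = [root_disk(ROOT_DISK_MB)] + existing

# (b) Build a volume that targets /mnt/disk1 instead of the root disk.
volume = volume_on(
    existing[0]["disk"]["source"], 512, "foo", "marathon", "very-persist", "such-path"
)

print(json.dumps(resources, indent=2))
print(json.dumps([volume], indent=2))
```

The first printed list is what would go into /etc/mesos-slave/resources (with your other six disks appended), and the second is the value for the volumes parameter of the create-volumes request.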

BTW, any feedback on how the documentation can be improved is welcome (I will add a short note about this issue, but feedback from users is very helpful)! Feel free to contribute here!

Hope this was helpful!

answered Jan 04 '23 13:01 by js84