There are plenty of multipart/form-data file upload solutions out there, but I have not been able to find a free standing one for Scala.
Play2 has this functionality as part of the framework and Spray also supports multipart form data. Unfortunately both these appear to be fairly integrated into the rest of the toolsets (I may be wrong here).
My server has been developed using Finagle (which does not currently support multipart form data), and if possible I would like to use a free standing lib or 'roll my own' solution.
This is a typical multipart/form-data message:
--*****org.apache.cordova.formBoundary
Content-Disposition: form-data; name="value1"

First parameter content
--*****org.apache.cordova.formBoundary
Content-Disposition: form-data; name="value2"

Second parameter content
--*****org.apache.cordova.formBoundary
Content-Disposition: form-data; name="file"; filename="image.jpg"
Content-Type: image/jpeg

$%^&#$%^%#$
--*****org.apache.cordova.formBoundary--
In this example, *****org.apache.cordova.formBoundary is the form boundary, so the multipart upload contains two text parameters and one image (I truncated the image data for clarity).
If someone who knows Scala better than me can give me a bit of a rundown on how to approach parsing this content, I will be very grateful.
To start with, I thought I would quickly split the content in three doing:
data.split("\\Q--*****org.apache.cordova.formBoundary\\E") foreach println
But execution is notably slow (update - this was due to warm-up time). Is there a more efficient way to split the parts? My strategy is to split the content into parts, and then split the parts into sub-parts. Is this a crappy approach? I've seen similar problems solved with state machines. What is a good functional approach? Keep in mind, I'm trying to learn a proper approach to Scala while solving the problem.
Update:
I really thought a solution to this problem would be a line or two in Scala. If someone stumbles over this question with a slick solution, please take the time to jot it down. From my understanding one could parse this message using pattern matching, parser combinators, extractors or simply splitting the string. I'm trying to find the best way to solve this kind of problem, as a project I'm working on involves a lot of natural language parsing, and I need to write my own custom parsing tools. I'm getting a good understanding of Scala, but nothing beats the advice of an expert.
It's not just about solving the problem, it's about finding the best (and hopefully simplest) possible way to solve this type of problem.
I'm curious about how slow your "notably slow" actually is. I wrote the following simple little function to generate fake messages:
def generateFakeMessage(n: Int) = {
  // Fixed seed so that every run generates the same message
  val rand = new scala.util.Random(1L)
  val maxLines = 100
  val maxLength = 100
  // One part per i: delimiter line, header, blank line, then random printable body lines
  (1 to n).map(i =>
    "--*****org.apache.cordova.formBoundary\n" +
    "Content-Disposition: form-data; name=\"value%d\"\n\n".format(i) +
    (0 to rand.nextInt(maxLines)).map(_ =>
      (0 to rand.nextInt(maxLength)).map(_ => rand.nextPrintableChar()).mkString
    ).mkString("\n")
  ).mkString("\n") + "\n--*****org.apache.cordova.formBoundary--"
}
Next I created a reasonably large message to use for testing:
val data = generateFakeMessage(10000)
It ends up containing a little over half a million lines. Then I tried your regular expression:
data.split("\\Q--*****org.apache.cordova.formBoundary\\E").size
And it returns more or less instantaneously. You could probably tune the regular expression a bit, and there are cleaner approaches you could use if your data were an Iterable[String] over the lines of the message, but I don't think you're going to get better performance from a hand-rolled state machine for parsing one big String.
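To give an idea of what a line-based approach might look like, here is a minimal sketch that folds over the lines and groups them into parts. The names LineGrouping and groupParts are made up for illustration, and it assumes the lines arrive without their line terminators and that the boundary is passed without its leading "--".

```scala
object LineGrouping {
  // Groups the lines of a multipart message into one Vector[String] per part.
  // `boundary` is the bare boundary token, without the leading "--".
  def groupParts(lines: Iterator[String], boundary: String): Vector[Vector[String]] = {
    val delimiter = "--" + boundary
    val closing   = delimiter + "--"

    // Fold state: (finished parts, lines of the part currently being read, if any)
    val start = (Vector.empty[Vector[String]], Option.empty[Vector[String]])

    val (done, current) =
      lines.takeWhile(_ != closing).foldLeft(start) {
        // A delimiter line closes the open part (if any) and opens a new one
        case ((parts, open), line) if line == delimiter =>
          (parts ++ open.toList, Some(Vector.empty[String]))
        // Any other line belongs to the part that is currently open
        case ((parts, Some(acc)), line) =>
          (parts, Some(acc :+ line))
        // Lines before the first delimiter (the preamble) are ignored
        case ((parts, None), _) =>
          (parts, None)
      }

    // The last part is still open when the closing delimiter is reached
    done ++ current.toList
  }
}
```

You could feed it something like data.split("\n").iterator. Each resulting part still holds its header lines, the blank separator line and the body lines, so splitting those apart would be a second step.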
As a starting point, this question has two answers worth reading: one using a state machine, and the other using parser combinators. I'd pay particular attention to the answer using parser combinators, since these provide a very easy way to build up this sort of parser. The syntax provided in Daniel's answer should adapt very easily to your situation.
Further, you can provide more specific mappings into Scala for your particular grammar if you require. Where Daniel has:
def field = (fieldName <~ ":") ~ fieldBody <~ CRLF ^^ { case name ~ body => name -> body }
you can replace this with an alternation pattern over multiple fields (contentType|contentDisposition|....) and map each of these individually into your Scala objects.
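As a rough illustration of mapping individual header fields into Scala objects, here is a sketch that uses plain regex extractors and pattern matching rather than parser combinators (those live in the separate scala-parser-combinators module, so this version sticks to the standard library). The Header hierarchy and parseHeader are hypothetical names, and the patterns only cover the two headers shown in the question.

```scala
object HeaderMapping {
  // A small ADT for the header fields we care about
  sealed trait Header
  final case class ContentDisposition(name: String, filename: Option[String]) extends Header
  final case class ContentType(mediaType: String) extends Header
  final case class Other(name: String, value: String) extends Header

  // Each regex must match the whole line when used as an extractor
  private val Disposition =
    """(?i)Content-Disposition:\s*form-data;\s*name="([^"]*)"(?:;\s*filename="([^"]*)")?""".r
  private val Type    = """(?i)Content-Type:\s*(.+)""".r
  private val Generic = """([^:]+):\s*(.*)""".r

  // Tries the specific patterns first, then falls back to a generic "name: value"
  def parseHeader(line: String): Option[Header] = line match {
    case Disposition(name, filename) =>
      // An unmatched optional group comes back as null, hence Option(...)
      Some(ContentDisposition(name, Option(filename)))
    case Type(mediaType) =>
      Some(ContentType(mediaType.trim))
    case Generic(name, value) =>
      Some(Other(name.trim, value.trim))
    case _ =>
      None
  }
}
```

The order of the cases plays the role of the alternation: the first pattern that matches the whole line wins, and anything unrecognised but well-formed ends up as Other.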
Apologies for not having the time to write a more detailed solution here, but this should hopefully point you in the right direction!
I think that your solution:
data.split("\\Q--*****org.apache.cordova.formBoundary\\E") foreach println
which is O(n) in complexity, is the best and the simplest you can get. As Travis previously said, this manipulation is not slow. As always with a multipart HTTP form, you will have to parse it one way or another, and doing better than O(n) seems tricky.
Moreover, since split gives you an Array[String], which supports the usual Scala collection operations, it is well suited to any further matching or processing.
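To make the split-then-sub-split idea concrete, here is a minimal sketch using only the standard library. The names MultipartSplit, Part and parse are made up for illustration. It assumes the whole message is already a String, that every part has at least one header line, and that bodies are text (a real binary upload would need to be handled as bytes rather than as a String).

```scala
import java.util.regex.Pattern

object MultipartSplit {
  // One decoded part: header name -> value, plus the raw body text
  final case class Part(headers: Map[String, String], body: String)

  // The line break right after a delimiter is not part of the content
  private def stripLeadingBreak(s: String): String =
    s.stripPrefix("\r\n").stripPrefix("\n")

  // The line break right before the next delimiter belongs to the delimiter
  private def stripTrailingBreak(s: String): String =
    s.stripSuffix("\n").stripSuffix("\r")

  // Turns "Name: value" lines into a map, skipping lines without a colon
  private def parseHeaders(block: String): Map[String, String] =
    block.split("\r?\n").toList
      .map(_.split(":", 2))
      .collect { case Array(name, value) => name.trim -> value.trim }
      .toMap

  // `boundary` is the bare boundary token, without the leading "--"
  def parse(data: String, boundary: String): List[Part] =
    data.split(Pattern.quote("--" + boundary)).toList
      .drop(1)                        // whatever precedes the first delimiter
      .takeWhile(!_.startsWith("--")) // stop at the closing delimiter
      .map { raw =>
        val chunk = stripTrailingBreak(stripLeadingBreak(raw))
        // Headers and body are separated by the first blank line
        chunk.split("\r?\n\r?\n", 2) match {
          case Array(head, body) => Part(parseHeaders(head), body)
          case Array(head)       => Part(parseHeaders(head), "")
        }
      }
}
```

Calling MultipartSplit.parse(data, "*****org.apache.cordova.formBoundary") on the message from the question should give one Part per form field. Pattern.quote does the same job as the \\Q...\\E wrapping in the original one-liner.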