Pandas-js is an experimental library mimicking the Python pandas API in JavaScript. The Python pandas library is built on top of NumPy for its data storage. Pandas-js mirrors this structure by building on top of Immutable.js.
pandas runs operations on a single machine, whereas PySpark distributes them across multiple machines. If you are working on a machine learning application with larger datasets, PySpark may be the better fit; it is often claimed to process operations many times (up to 100x) faster than pandas, though the actual speedup depends on the data size and the cluster.
This wiki will summarize and compare many pandas-like JavaScript libraries.
In general, you should check out the d3 JavaScript library. d3 is a very useful "Swiss Army knife" for handling data in JavaScript, just as pandas is for Python. You may see d3 used frequently like pandas, even though d3 is not exactly a DataFrame/pandas replacement (i.e. d3 doesn't have the same API, and it doesn't have Series / DataFrame objects that behave as they do in pandas).
Ahmed's answer explains how d3 can be used to achieve some DataFrame functionality, and some of the libraries below were inspired by things like LearnJsData, which uses d3 and lodash.
As for DataFrame-style data transformation (splitting, joining, group by, etc.), here is a quick list of some of the JavaScript libraries.
Note that some libraries target Node.js (server-side JavaScript), some are browser-compatible (client-side JavaScript), and some are written in TypeScript. Use the option that's right for you.
danfo-js (dfd): has a basic DataFrame-type data structure, with the ability to plot directly. pandas is built on top of numpy; likewise, danfo-js is built on tensorflow-js.
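To make "DataFrame-style data transformation" concrete, here is a minimal plain-JavaScript sketch of a group by followed by a join on arrays of row objects. It does not use any of the libraries listed here; the sample data and the helper names groupBy and innerJoin are hypothetical and only illustrate the kind of work these libraries take off your hands.

```javascript
// Plain-JavaScript sketch of two DataFrame-style operations on arrays of
// row objects. The data and the helpers (groupBy, innerJoin) are
// hypothetical and not taken from any library mentioned in this thread.
const sales = [
  { region: 'north', amount: 10 },
  { region: 'south', amount: 5 },
  { region: 'north', amount: 7 },
];
const managers = [
  { region: 'north', manager: 'Ann' },
  { region: 'south', manager: 'Bob' },
];

// Group rows by a key column, collecting the rows of each group in a Map.
function groupBy(rows, key) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row[key])) groups.set(row[key], []);
    groups.get(row[key]).push(row);
  }
  return groups;
}

// Inner join: emit one merged row per matching (left, right) pair on `key`.
function innerJoin(left, right, key) {
  const index = groupBy(right, key);
  return left.flatMap((l) =>
    (index.get(l[key]) || []).map((r) => ({ ...l, ...r })));
}

// Aggregate each group into one summary row (sum of `amount`).
const totals = [...groupBy(sales, 'region')].map(([region, rows]) => ({
  region,
  total: rows.reduce((sum, r) => sum + r.amount, 0),
}));

const joined = innerJoin(totals, managers, 'region');
console.log(joined);
```

A real library adds typed columns, indexes and missing-value handling on top of this, which is where the choices in this list differ.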
Then after coming to this question, checking other answers here and doing more searching, I found options like:
JS alternative to the IPython/Jupyter "notebooks"
recline (from Rufus' answer)
(js-data-mongodb, js-data-redis, js-data-cloud-datastore), sorting, filtering, etc.
Here are the criteria we used to consider the above choices:
Jupyter (interactive notebooks), etc.
I've been working on a data wrangling library for JavaScript called data-forge. It's inspired by LINQ and Pandas.
It can be installed like this:
npm install --save data-forge
Your example would work like this:
var csvData = "Source,col1,col2,col3\n" +
    "foo,1,2,3\n" +
    "bar,3,4,5\n";
var dataForge = require('data-forge');
var dataFrame =
    dataForge.fromCSV(csvData)
        .parseInts([ "col1", "col2", "col3" ])
        ;
If your data was in a CSV file you could load it like this:
var dataFrame = dataForge.readFileSync(fileName)
    .parseCSV()
    .parseInts([ "col1", "col2", "col3" ])
    ;
You can use the select method to transform rows.
You can extract a column using getSeries, then use the select method to transform values in that column.
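As a rough illustration of what a row-wise transform and a column transform mean, here is a plain-JavaScript sketch in which ordinary arrays stand in for the data-frame and the series. This is not data-forge code: the helpers select and getColumn are hypothetical and only mimic the idea on the example data from above.

```javascript
// Sketch only: plain arrays stand in for a data-frame and a series.
// `select` and `getColumn` are hypothetical helpers, not data-forge calls.
const rows = [
  { Source: 'foo', col1: 1, col2: 2, col3: 3 },
  { Source: 'bar', col1: 3, col2: 4, col3: 5 },
];

// Row-wise transform: build a new value from each input element.
const select = (data, fn) => data.map(fn);

// Column extraction: pull one field out of every row.
const getColumn = (data, name) => data.map((row) => row[name]);

// Transform rows: add a `sum` field computed from the three numeric columns.
const withSum = select(rows, (row) => ({
  ...row,
  sum: row.col1 + row.col2 + row.col3,
}));

// Transform one column: extract col1, then double every value in it.
const doubledCol1 = select(getColumn(rows, 'col1'), (v) => v * 2);

console.log(withSum, doubledCol1);
```

Neither helper mutates its input; each call returns a new array, so transforms can be chained.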
You get your data back out of the data-frame like this:
var data = dataFrame.toArray();
To average a column:
var avg = dataFrame.getSeries("col1").average();
There is much more you can do with this.
You can find more documentation on npm.
Caveat: The following is applicable only to d3 v3, and not the latest d3 v4!
I am partial to d3.js, and while it won't be a total replacement for Pandas, if you spend some time learning its paradigm, it should be able to take care of all your data wrangling for you. (And if you wind up wanting to display results in the browser, it's ideally suited to that.)
Example. My CSV file data.csv:
name,age,color
Mickey,65,black
Donald,58,white
Pluto,64,orange
In the same directory, create an index.html containing the following:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<title>My D3 demo</title>
<script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>
</head>
<body>
<script charset="utf-8" src="demo.js"></script>
</body>
</html>
and also a demo.js file containing the following:
d3.csv('/data.csv',
       // How to format each row. Since the CSV file has a header, `row` will be
       // an object with keys derived from the header.
       function(row) {
         return {name : row.name, age : +row.age, color : row.color};
       },
       // Callback to run once all data's loaded and ready.
       function(data) {
         // Log the data to the JavaScript console
         console.log(data);
         // Compute some interesting results
         var averageAge = data.reduce(function(prev, curr) {
           return prev + curr.age;
         }, 0) / data.length;
         // Also, display it
         var ulSelection = d3.select('body').append('ul');
         var valuesSelection =
             ulSelection.selectAll('li').data(data).enter().append('li').text(
                 function(d) { return d.age; });
         var totalSelection =
             ulSelection.append('li').text('Average: ' + averageAge);
       });
In the directory, run python -m SimpleHTTPServer 8181 (with Python 3, the equivalent is python3 -m http.server 8181), and open http://localhost:8181 in your browser to see a simple listing of the ages and their average.
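If you want to try the row-formatting and averaging logic without a browser or a web server, the following sketch reproduces just that part in plain JavaScript. It is only an approximation: parseCsv is a hypothetical, naive helper (it does not handle quoted fields) that stands in for d3.csv, and the CSV text is inlined rather than fetched.

```javascript
// Sketch only: a naive CSV parser (no quoted fields) standing in for d3.csv.
// `parseCsv` is a hypothetical helper and is not part of d3.
const csvText =
  'name,age,color\nMickey,65,black\nDonald,58,white\nPluto,64,orange\n';

function parseCsv(text, formatRow) {
  const [header, ...lines] = text.trim().split('\n');
  const keys = header.split(',');
  return lines.map((line) => {
    const cells = line.split(',');
    const row = {};
    keys.forEach((key, i) => { row[key] = cells[i]; });
    // Every cell is still a string here; formatRow decides the final types.
    return formatRow(row);
  });
}

// Same row formatter as in demo.js: unary plus turns the age string into a number.
const data = parseCsv(csvText, (row) => ({
  name: row.name, age: +row.age, color: row.color,
}));

// Same reduce-based average as in demo.js.
const averageAge =
  data.reduce((prev, curr) => prev + curr.age, 0) / data.length;
console.log(data, averageAge);
```

Without the `+row.age` coercion the reduce would concatenate strings instead of adding numbers, which is why the row formatter matters.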
The simple d3 example above shows a few relevant features of d3:
Below is an example using Python's numpy and pandas:
```
import numpy as np
import pandas as pd
data_frame = pd.DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], [1, 2, 3, 4])
data_frame[5] = np.random.randint(1, 50, 5)
print(data_frame.loc[['C', 'D'], [2, 3]])
# axis=1 drops a column; axis=0 would drop a row
data_frame.drop(5, axis=1, inplace=True)
print(data_frame)
```
The same can be achieved in JavaScript (note that numjs works only with Node.js), although D3.js has much more advanced options for working with data files. Both numjs and pandas-js are still works in progress.
import np from 'numjs';
import { DataFrame } from 'pandas-js';
const df = new DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], [1, 2, 3, 4])
// df
/*
1 2 3 4
A 0.023126 1.078130 -0.521409 -1.480726
B 0.920194 -0.201019 0.028180 0.558041
C -0.650564 -0.505693 -0.533010 0.441858
D -0.973549 0.095626 -1.302843 1.109872
E -0.989123 -1.382969 -1.682573 -0.637132
*/
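The pandas snippet above also selects by label (loc) and drops a column, which the JavaScript snippet does not show. As a library-free sketch of those two steps, the following uses a tiny labelled table built from plain objects. The helpers loc and dropColumn are hypothetical and are not pandas-js or numjs functions, and the values are deterministic rather than random so the result is easy to check.

```javascript
// Sketch only: a tiny labelled table as plain objects. `loc` and `dropColumn`
// are hypothetical helpers, not pandas-js or numjs functions.
const index = ['A', 'B', 'C', 'D', 'E'];
const columns = [1, 2, 3, 4];
// Deterministic values (row position * 10 + column position) instead of randn.
const values = index.map((_, r) => columns.map((_, c) => r * 10 + c));

const table = { index, columns, values };

// Label-based selection, similar in spirit to data_frame.loc[rows, cols].
function loc(t, rowLabels, colLabels) {
  const rowPos = rowLabels.map((label) => t.index.indexOf(label));
  const colPos = colLabels.map((label) => t.columns.indexOf(label));
  return {
    index: rowLabels,
    columns: colLabels,
    values: rowPos.map((r) => colPos.map((c) => t.values[r][c])),
  };
}

// Return a new table without one column, like drop(label, axis=1).
function dropColumn(t, label) {
  const keep = t.columns.filter((c) => c !== label);
  return loc(t, t.index, keep);
}

const subset = loc(table, ['C', 'D'], [2, 3]);
const narrower = dropColumn(table, 4);
console.log(subset.values, narrower.columns);
```

Unlike the pandas call with inplace=True, dropColumn here returns a new table and leaves the original untouched.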
@neversaint your wait is over. Say welcome to Danfo.js, a pandas-like JavaScript library built on tensorflow.js that supports tensors out of the box. This means you can convert Danfo data structures to tensors. You can also do groupby, merging, joining, plotting and other data processing.