Very cool effort. That said, and it's probably because of the kind of work that I do, but I have almost never found the four challenges to be any kind of a problem for me. Although I do think there is some kind of contradiction there. Plotting (exploratory data analysis ("EDA"), really) is all about distilling key insights and finding features hidden in data. But you have to have some kind of intuition about where the needle in the haystack is. IME, throwing up a ton of plots and being able to scrub around in them never seems to provide much insight. It's also very fast; usually the feedback loop is like "make a plot, go away and think about it for an hour, decide what plot I need to make next, repeat". If there is too much data on the screen it defeats the point of EDA a little bit.
For me, matplotlib still reigns supreme. Rather than a fancy new visualization framework, I'd love for matplotlib to just be improved (admittedly, fastplotlib covers a different set of needs than what matplotlib does... but the author named it what they named it, so they have invited comparison. ;-) ).
Two things for me at least that would go a long way:
1) Better 3D plotting. It sucks, it's slow, it's basically unusable, although I do like how it looks most of the time. I mainly use PyVista now, but it sure would be nice to have the power of PyVista in a matplotlib subplot with a style consistent with the rest of matplotlib.
2) Some kind of WYSIWYG editor that will let you propagate changes back into your plot easily. It's faster and easier to adjust your plot layout visually rather than in code. I'd love to be able to make a plot, open up a WYSIWYG editor, lay things out a bit, and have those changes propagate back to code so that I can save it for all time.
(If these features already exist I'll be ecstatic ;-) )
I have to agree with your point about EDA. The library is neat, but even the example of covariance matrix animation is a bit contrived.
Every pixel has a covariance with every other pixel, so sliding through the rows of the covariance matrix generates as many faces on the right as there are pixels in a photograph of a face. However, the pixels that strongly co-vary will produce very similar right-side "face" pictures. To get a sense of how many different behaviours there are, one would look for eigenvectors of this covariance matrix. And then 10 or so static eigenvectors of the covariance matrix (eigenfaces [1]) would be much more informative than thousands of animated faces displayed in the example.
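For reference, the eigenfaces computation being described is only a few lines of numpy (a minimal sketch with placeholder data):

```python
import numpy as np

# faces: (n_samples, n_pixels) matrix of flattened 32x32 face images
faces = np.random.rand(400, 32 * 32)  # placeholder data

# Center the data and form the pixel-by-pixel covariance matrix
centered = faces - faces.mean(axis=0)
cov = np.cov(centered, rowvar=False)  # shape (1024, 1024)

# Eigenvectors of the covariance matrix, sorted by descending eigenvalue;
# the top ~10, reshaped back to 32x32, are the "eigenfaces"
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigenfaces = eigvecs[:, order[:10]].T.reshape(10, 32, 32)
```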
Sometimes a big interactive visualisation can be a sign of not having a concrete goal or not knowing how to properly summarise. After all, that's the purpose of a figure: to highlight insights, not to look for ways to display the entire dataset. Pictures that try to display the whole dataset end up shifting the job of exploratory analysis into a visual space and leaving it for somebody else.
Hi, one of the other devs here. As the poster below pointed out what you're missing is that in this case we know that an eigendecomposition or PCA will be useful. However if you're working on matrix decomposition algorithms like us, or if you're trying to design new forms of summary matrices because a covariance matrix isn't informative for your type of data then these types of visualizations are useful. We broadly work on designing new forms of matrix decomposition algorithms so it's very useful to look at the matrices and then try to determine what types of decompositions we want to do.
Ok, different libraries have different use cases; the type of data we work with absolutely necessitates dynamic visualization. You wouldn't view a video with imshow, would you?
Every time I've needed to scrub through something in time like that, dumping a ton of frames to disk using imshow has been good enough. Usually, the limiting factor is how quickly I can generate a single frame.
It's hard for me to imagine what you're doing that necessitates such fancy tools, but I'm definitely interested to learn! My failure of imagination is just that.
The example from the article with the subtitle "Large-scale calcium imaging dataset with corresponding behavior and down-stream analysis" is a good example. We have brain imaging video that is acquired simultaneously with behavioral video data. It is absolutely essential to view the raw video at 30-60Hz.
Aren't you missing the entire point of exploratory data analysis? Eigenfaces are an example of what you can come up with as the end product of your data exploration, after you've tried many ways of looking at the data and determined that eigenfaces are useful.
Your whole third paragraph seems to be criticizing the core purpose of exploratory data analysis as though one should always be able to skip directly to the next phase of having a standardized representation. When entering a new problem domain, somebody needs to actually look at the data in a somewhat raw form. Using the strengths of the human vision system to get a rough idea of what the typical data looks like and the frequency and character of outliers isn't dumping the job of exploratory data analysis onto the reader, it's how the job actually gets done in the first place.
> Using the strengths of the human vision system to get a rough idea of what the typical data looks like and the frequency and character of outliers isn't dumping the job of exploratory data analysis onto the reader, it's how the job actually gets done in the first place.
Yup, this is a good summary of the intent. We also have to remember that the eigenfaces dataset is a very clean/toy data example. Real datasets never look this good, and just going straight to an eigendecomp or PCA isn't informative without first taking a look at things. Often you may want to do something other than an eigendecomp or PCA; get an idea of your data first and then think about what to do to it.
Edit: the point of that example was to show that visually we can judge what the covariance matrix is producing in the "image space". Sometimes a covariance matrix isn't even the right type of statistic to compute from your data and interactively looking at your data in different ways can help.
As a whole, of course you have a point - big visualisations when done properly should help with data exploration. However, from my experience they rarely (but not never) do. I think it's specific to the type of data you work with and the visualisation you employ. Let me give an example.
Imagine we have some big data - like an OMIC dataset about chromatin modification differences between smokers and non-smokers. Genomes are large, so one way to visualise might be a manhattan plot (mentioned here in another comment). Let's (hypothetically) say the pattern in the data is that chromatin in the vicinity of genes related to membrane functioning has more open chromatin marks in smokers compared to non-smokers. A manhattan plot will not tell us that. And in order to be able to detect that in our visualisation we had to already know what we were looking for in the first place.
My point in this example is the following: in order to detect that we would have to know what to visualise first (i.e. visualise the genes related to membrane function separately from the rest). But then when we are looking for these kinds of associations - the visualisation becomes unnecessary. We can capture the comparison of interest with a single number (i.e. average difference between smokers vs non-smokers within this group of genes). And then we can test all kinds of associations by running a script with a for-loop in order to check all possible groups of genes we care about and return a number for each. It's much faster than visualisation. And then after this type of EDA is done, the picture would be produced as a result, displaying the effect and highlighting the insights.
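Something like this loop is what I mean by capturing each comparison as a single number (an illustrative sketch; all names, shapes, and groups here are hypothetical):

```python
import numpy as np

def group_effect(smokers, nonsmokers, gene_idx):
    """One number per gene group: mean smoker-vs-nonsmoker difference
    within that group of genes."""
    return smokers[:, gene_idx].mean() - nonsmokers[:, gene_idx].mean()

# Placeholder data: samples x genes matrices, plus candidate gene groups
smokers = np.random.rand(50, 20000)
nonsmokers = np.random.rand(60, 20000)
gene_groups = {"membrane": np.arange(0, 300), "ribosome": np.arange(300, 500)}

# One summary number per group -- much faster to scan than thousands of plots
effects = {name: group_effect(smokers, nonsmokers, idx)
           for name, idx in gene_groups.items()}
```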
I understand your point about visualisation being an indistinguishable part of EDA. But the example I provided above is much closer to my lived experience.
Yeah, I agree with the general sentiment of what you're saying.
Re: wtallis, I think my original complaint about EDA per se is indeed off the mark.
Certainly creating a 20x20 grid of live-updating GPU plots and visualizations is a form of EDA, but it seems to suggest a complete lack of intuition about the problem you're solving. Like you're just going spelunking in a data set to see what you can find... and that's all you've got; no hypothesis, no nothing. I think if you're able to form even the meagerest of hypotheses, you should be able to eliminate most of these visualizations and focus on something much, much simpler.
I guess this tool purports to eliminate some of this, but there is also a degree of time-wasting involved in setting up all these visualizations. If you do more thinking up front, you can zero in on a smaller and more targeted subset of experiments. Simpler EDA tools may suffice. If you can prove your point with a single line or scatter plot (or number?), that's really the best case scenario.
Eigendecomposition of the covariance matrix, essentially PCA, is probably the first non-trivial step in the analysis of any dataset. The idea in the comment above seems to be that it's more useful to combine some basic knowledge of statistics with simpler visualisation techniques, rather than to quickly generate thousands of shallower plots. Being able to generate thousands of plot is useful, of course, but I would agree that promoting good data-analysis culture is more beneficial.
> Eigendecomposition of the covariance matrix, essentially PCA, is probably the first non-trivial step in the analysis of any dataset
For a sufficiently narrow definition of "dataset", perhaps. I don't think it's the obvious step one when you want to start understanding a time series dataset, for example. (Fourier transform would be a more likely step two, after step one of actually look at some of your data.)
For me, one of the most annoying things in my workflow is when I'm waiting for the software to catch up. If I'm making a plot, there's a lot of little tweaks I want to do to visually extract the maximum amount of information from a dataset. For example, if I'm making a histogram, I may want to adjust the number of bins, change to log scale, set min/max to remove outliers, and change the plot size on page. For the sake of the argument, let's say I'm working with a set of 8 slices of the dataset, so I need to regenerate 8 plots every time I make a tweak. My workflow is:

1. Code the initial plots with default settings.
2. Run numpy to process the data.
3. Run matplotlib to display the data.
4. Look at the results.
5. Make tweaks to the code, then circle back to step 2.

In that cycle, "wait for matplotlib to finish generating the plots" can often be one of the longest parts, and critically it's the vast majority of the cumulative time that I'm waiting rather than actively doing something. Drawing plots should be near instantaneous; there's an entire industry devoted to drawing complicated graphics in 16ms or less, I shouldn't need to wait >100ms for a single 2d grid with some dots and lines on it.
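To make that concrete, each iteration of the cycle boils down to something like this (a matplotlib sketch with placeholder data; the four knobs at the top are the ones that keep changing):

```python
import numpy as np
import matplotlib.pyplot as plt

slices = [np.random.lognormal(size=100_000) for _ in range(8)]  # placeholder

# The knobs that get tweaked on every iteration of the cycle
bins, log_scale, lo, hi = 50, True, 0.0, 20.0

fig, axes = plt.subplots(2, 4, figsize=(16, 6))
for ax, data in zip(axes.flat, slices):
    clipped = data[(data >= lo) & (data <= hi)]  # drop outliers
    ax.hist(clipped, bins=bins, log=log_scale)
plt.show()
```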
Matplotlib is okay, but there's definitely room for improvement, so why not go for that improvement?
I think this varies a lot depending on what you're doing.
I agree 100% that matplotlib is really slow and should be made to run as fast as humanly possible. I would add a (3) to my list above: optimize matplotlib!
OTOH, at least for what I'm doing, the code that runs to generate the data that gets plotted dominates the runtime 99% of the time.
For me, adjusting plots is usually the time waster. Hence point (2) above. I'd love to be able to make the tweaks using a WYSIWYG editor and have my plotting script dynamically updated. The bins, the log scale, the font, the dpi, etc, etc.
I think with your 8 slices examples above: my (2) and (3) would cover your bases. In your view, is the rest of matplotlib really so bad that it needs to be burnt to the ground for progress to be made?
Yeah, I'd love it if mpl could be optimized. I do think that it has a lot of weird design decisions that could justify burning it down and starting from scratch (e.g. weird mix of stateful and stateless api), but I've already learned most of its common quirks so I selfishly don't care anymore, and my only significant complaint is that I want it to be faster :)
edit: regarding runtime, I'm sure this varies a lot based on usecase, but for my usual usecase I store a mostly-processed dataset, so the additional processing before drawing the data is usually minimal.
I'd be curious to hear more about your EDA workflow.
What I want for EDA is a tool that lets me quickly toggle between common views of the dataset. I run through the same analysis over and over again; I don't want to type the same commands repeatedly. I have my own heuristics for which views I want, and I want a platform that lets me write functions that express those heuristics. I want to build the intelligence into the tool instead of having to remember a bunch of commands to type on each dataframe.
For manipulating the plot, I want a low-code UI that lets me point and click the operations I want to use to transform the dataframe. The low-code UI should also emit python code to do the same operations (so you aren't tied to a low-code system, you just use it as a faster way to generate code than typing).
I have built the start of this for my open source datatable UX called Buckaroo. But it's for tables, not for plotting. The approach could be adapted to plotting. Happy to collaborate.
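The "write your heuristics as functions" part might look roughly like this (an illustrative sketch, not Buckaroo's actual API):

```python
import pandas as pd

# Registry of named views; each is a function from DataFrame to DataFrame
VIEWS = {}

def view(name):
    def register(fn):
        VIEWS[name] = fn
        return fn
    return register

@view("nulls")
def null_summary(df: pd.DataFrame) -> pd.DataFrame:
    return df.isna().sum().to_frame("n_null")

@view("numeric")
def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    return df.describe().T

def show(df, name):
    # Toggling between views is a dict lookup, not retyped commands
    return VIEWS[name](df)
```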
At least I usually prefer to do the EDA plotting by writing and editing code. This is a lot more flexible. It's relatively rare to need interactivity other than zooming and panning.
The differing approaches can probably be seen in some API choices, although the fastplotlib API is a lot more ergonomic than many others. Having to index the figure or prefixing plots with add_ are minor things, and probably preferable for application development, but for fast-iteration EDA they will start to irritate fast. The "mlab" API of matplotlib violates all sorts of software development principles, but it's very convenient for exploratory use.
Matplotlib's performance (especially with interaction and animation) and its clunky interaction APIs are definite pain points, and a faster library with better interaction support for EDA would be very welcome. Something like an mlab-type wrapper would probably be easy to implement for fastplotlib.
And to bikeshed a bit, I don't love the default black background. It's against usual conventions, difficult for publication and a bit harder to read when used to white.
Writing and editing code is a lot more flexible, but it gets repetitive, and I have written the same stuff so many times. It's all ad hoc; it fixes the problem at the time, then it gets thrown away with the notebook, only to be written again soon.
As an example, I frequently want to run analytics on a dataframe. More complex summary stats. So you write a couple of functions, and have two for loops, iterating over columns and functions. This works for a bit. It's easy to add functions to the list. Then a function throws an error, and you're trying to figure out where you are in two nested for loops.
Or, especially for pandas, you want to separate functions to depend on the same expensive pre-calc. You could pass the existing dict of computed measures so you can reuse that expensive calculation... Now you have to worry about the ordering of functions.
So you could put all of your measures into one big function, but that isn't reusable. So you write your big function over and over.
I built a small dag library that handles this, and lets you specify that your analysis requires keys and provides keys, then the DAG of functions is ordered for you.
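The core idea is a small topological sort over requires/provides sets, roughly like this (a simplified sketch, not the actual library code):

```python
# Order analysis functions so each runs after its requirements are provided.
def order_measures(measures):
    """measures: list of (fn, requires: set, provides: set).
    Returns the functions in a dependency-respecting order."""
    ordered, available = [], set()
    pending = list(measures)
    while pending:
        runnable = [m for m in pending if m[1] <= available]
        if not runnable:
            missing = [m[1] - available for m in pending]
            raise ValueError(f"unsatisfiable requirements: {missing}")
        for m in runnable:
            ordered.append(m[0])
            available |= m[2]   # this function's outputs become available
            pending.remove(m)
    return ordered
```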
I work with R and not python, so some things might not apply, but this:
> [...] it fixes the problem at the time, then it gets thrown away with the notebook only to be written again soon.
Is one of the reasons I stopped using notebooks.
One solution to your problem might be to create a simple executable script that, when called on the file of your dataset in a shell, would produce the visualisation you need. If it's an interactive visualisation then I would create a library or otherwise a re-usable piece of code that can be sourced. It takes some time but ends up saving more time in the end.
If you have custom-made things you have to check on your data tables, then likely no library will solve your problem without you doing some additional work on top.
And for these:
> Or, especially for pandas, you want to separate functions to depend on the same expensive pre-calc. [...] Now you have to worry about the ordering of functions.
I save expensive outputs to intermediate files, and manage dependencies with a very simple build-system called redo [1][2].
For larger datasets, real scripts are a better idea. I expect my stuff to work with datasets up to about 1 GB; caching is easy to layer on and would speed up work for larger datasets, but my code assumes the data fits in memory. It would be easier to add caching than to make sure I don't load an entire dataset into memory. (I don't serialize the entire dataframe to the browser, though.)
Usually I write scripts that use function memoization cache (to disk) for expensive operations. Recently I've also used Marimo sometimes, which has great support for modules (no reloading hacks), can memoize to disk and has deterministic state.
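With joblib, for example, the disk cache is a one-decorator affair (a sketch; the path and column names are made up):

```python
import pandas as pd
from joblib import Memory

memory = Memory("./.cache", verbose=0)

@memory.cache  # re-runs only when the (hashed) inputs change
def expensive_precalc(path):
    df = pd.read_parquet(path)
    return df.groupby("group").agg(["mean", "std"])
```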
I agree with you sfpotter, very interesting. Looks in some ways similar to PyQtGraph regarding real time plotting.
I agree with you regarding matplotlib, although I find a lot of faults/frustration in using it. Both your points on 3D plotting and a WYSIWYG editor would be extremely nice, and as far as I know nothing exists in python ticking these boxes. For 3D I typically default to Matlab as I've found it to be the most responsive/easy to use. I've not found anything directly like a WYSIWYG editor. Stata is the closest but I deplore it; R to some extent has it, but if I'm generating multiple plots it doesn't always work out.
I'm surprised by what you said about "EDA". I find the opposite, a shotgun approach, exploring a vast number of plots with various stratifications gives me better insight. I've explored plotting across multiple languages (R,python,julia,stata) and not found one that meets all my needs.
The biggest issue I often face is that I have 1000 plots I want to generate that are all from separate data groups and could all be plotted in parallel, but most plotting libraries have holds/issues with distribution/parallelization. The closest I've found: I'll often build up a plot in python using a Jupyter notebook. Once I'm done, I'll create a function taking all the needed data and saving a plot out, then either manually or with the help of LLMs convert it to julia, which I've found to be much faster at loading and processing large amounts of data. Then I can loop it using julia's "distributed" package. It's less than ideal (threaded access would be great, rather than having to distribute the data), but I've yet to find something that works. I'd love a simple 2D EDA plotting library that has basic plots like lines, histograms (1d/2d), scatter plots, etc., has basic colorings and alpha values, and is able to handle large amounts (thousands to millions of points) of static data, plotting and saving to disk in parallel. I've debated writing my own library but I have other priorities currently, maybe once I finish my PhD.
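For the pure-python route, one pattern that does work is the headless Agg backend plus one process per figure, since matplotlib is not thread-safe but is fine across processes (a sketch with placeholder data):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; safe to use from worker processes
import matplotlib.pyplot as plt
import numpy as np
from multiprocessing import Pool

def save_plot(args):
    name, x, y = args
    fig, ax = plt.subplots()
    ax.plot(x, y, alpha=0.7)
    fig.savefig(f"{name}.png", dpi=150)
    plt.close(fig)  # figures accumulate in memory otherwise

if __name__ == "__main__":
    jobs = [(f"group_{i}", np.arange(1000), np.random.rand(1000))
            for i in range(1000)]
    with Pool() as pool:  # one figure per worker process
        pool.map(save_plot, jobs)
```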
For point (2), have you tried the perspective-viewer library? You can make edits in the UI and then use the "debug view" to copy and paste the new configuration back into your code.
I work on solving 3D problems: numerical methods for PDEs in R^3, computational geometry, computational mechanics, graphics, etc. Being able to make nice 3D plots is super important for this. I agree it's not always necessary, and when a 2D plot suffices, that's the way to go, but that doesn't obviate my need for 3D plots.
3D plots might be neat if there was some widespread way of displaying them. Unfortunately we can only make 2D projections of 3D plots on our computer screens and pieces of paper.
Shameless plug: I'm actively working on a similar project, Datoviz [1], a C/C++ library with thin Python bindings (ctypes). It supports both 2D and 3D but is currently less mature and feature-complete than fastplotlib. It is also lower level (high-level capabilities will soon be provided by VisPy 2.0 which will be built on top of Datoviz, among other possible backends).
My focus is primarily on raw performance, visual quality, and scalability for large datasets—millions, tens of millions of points, or even more.
I have always admired your datoviz library from afar and check the vispy2/vispy2-sandbox libraries on GitHub every few months to check up on it. When do you think 'soon' is?? Really looking forward to it!
Thanks! The code is currently managed by Nicolas Rougier in a GitHub repository that will be made public next week. This repository hosts the "graphics server protocol" (GSP), an intermediate layer between Datoviz and the future high-level plotting API. For the latter, we’ll need community feedback to shape an API philosophy that aligns with VisPy users' needs—let's aim to publish a write-up this month.
Implementing the API on top of GSP should be relatively straightforward, as the core graphics-related mechanisms are handled by GSP/Datoviz. We've created a Slack channel for discussions—contact me privately if you'd like to join.
I'm certain the host-side heavy lifting is done by numpy, which is a python wrapper around Fortran and C. The visualization heavy lifting is done by pygfx/wgpu-py; wgpu-py binds wgpu-native through its C API. I think wgpu-py compiles to WASM to run in the browser. More and more packages are taking this route.
In fastplotlib, at the end of the day everything is wgpu under the hood, and as the other poster correctly pointed out, numpy wraps Fortran and C.
Seems like a nice library, but I have a hard time seeing myself using it over plotly. The plotly express API is just so simple and easy. For example, here's the docs for the histogram plot: https://plotly.com/python/histograms/
This code (the first example from those docs) gives you a fully interactive, performant histogram plot:
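```python
import plotly.express as px

df = px.data.tips()  # sample dataset bundled with plotly
fig = px.histogram(df, x="total_bill")
fig.show()
```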
Different use cases :) Plotly doesn't give the performance and interactive tools required for many neuroscience visualizations. We also focus more on the primitive graphics and, at least not yet, on the more complex "composite" graphics built with primitives like histograms.
I appreciate the warning and if it's not by claude I apologize, but I do think we should be allowed to express scepticism if things posted are just AI slop (and if we have to fear getting banned or what-have-you as a consequence I genuinely think that's worse for HN long term than the alternative).
Every two weeks or so I peruse github looking for something like this and I have to say this looks really promising. In statistical genetics we make really big scatterplots called Manhattan plots https://en.wikipedia.org/wiki/Manhattan_plot and we have to use all this highly specialized software to visualize at different scales (for a sense of what this looks like: https://my.locuszoom.org/gwas/236887/). Excited to try this out
Hey! This sounds like a really interesting use case. If you run into any issues or need help with the visualization, please don't hesitate to post an issue on the repo. We can also think about adding an example demo of a manhattan plot to help too!
If you’re working in R with ggplot2, you could also consider the `ggrastr` package, specifically, `ggrastr::geom_point_rast`
Have you tried ManimGL?
https://github.com/3b1b/manim/releases
Super awesome, and you can make it into an MCP for Cursor.
I always thought it was interesting that my modern CPU takes ages to plot 100,000 or so points in R or Python (ggplot2, seaborn, plotnine, etc) and yet somehow my 486DX 50Mhz could pump out all those pixels to play Doom interactively and smoothly.
How does it compare to HoloViz? [1]
I followed one of their online workshops, and it feels really powerful, although it is a bit confusing which part of it does what (it's basically 6 or 7 projects put together under an umbrella)
[1] https://holoviz.org/
> powered by WGPU, a cross-platform graphics API that targets Vulkan (Linux), Metal (Mac), and DX12 (Windows).
The fact that they are using WGPU, which appears to be a Python-native implementation of WebGPU, suggests an interesting possible extended use case. As a few other comments suggest, if one knows that the data is available on a machine in a cluster rather than on the local machine of a user, it might make sense to start up a server, expose a port, and pass along the data over http to be rendered in a browser. That would make it shareable across the lab. The limit would be the data bandwidth over http (e.g. for the 3 million point case), but it seems like for simpler cases it would be very useful.
That would lead to an interesting exercise of defining a protocol for transferring plot points over http in such a way that they could be handed over to the browser WebGPU interface efficiently. Perhaps an even more efficient representation is possible with some pre-processing on the server side?
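A first cut of the server side could be as simple as shipping raw float32 buffers that the browser can copy straight into a GPUBuffer without per-point parsing (a sketch; the endpoint and framing are made up):

```python
import numpy as np
from http.server import BaseHTTPRequestHandler, HTTPServer

points = np.random.rand(3_000_000, 2).astype(np.float32)  # placeholder data

class PointsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Raw little-endian float32 pairs: the browser can hand these bytes
        # to WebGPU directly, with no JSON parsing or per-point copying
        payload = np.ascontiguousarray(points).tobytes()
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

HTTPServer(("", 8000), PointsHandler).serve_forever()
```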
> the data is available on a machine in a cluster rather than on the local machine of a user
jupyter-rfb lets you do remote rendering for this: render to a remote frame buffer and send a jpeg byte stream over. We and a number of our scientific users use it like this. https://fastplotlib.org/ver/dev/user_guide/faq.html#what-fra...
> defining a protocol for transferring plot points
This sounds more like GSP, which Cyrille Rossant (who's made some posts here) works on, it has a slightly different kind of use case.
What is GSP in this context? Searching "Python GSP" brings up the Generalized Sequence Pattern (GSP) algorithm [1] and Graph Signal Processing [2], neither of which seems to be a protocol. I also found "Generic Signaling Protocol" and "Global Sequence Protocol", which also don't seem relevant. Forgive me if GSP is some well-known thing which I am just not familiar with.
1. https://github.com/jacksonpradolima/gsp-py
2. https://pygsp.readthedocs.io/en/stable/
Graphics Server Protocol
Forgive me for doing this, but I used an LLM to find that. They’re exceptionally useful for disambiguation tasks like this. Knowing what an acronym refers to is very useful for next token prediction, so they’re quite good at it. It’s usually trivial to figure out if they’re hallucinating with a search engine.
[1] https://news.ycombinator.com/item?id=43335769
I don't think it's ready yet and I think it might be private at the moment, Cyrille can comment more on it.
But if I understand correctly it's a protocol for serializing graphical objects, pretty neat idea.
What you describe sounds a bit like Graphistry:
https://pygraphistry.readthedocs.io/en/latest/performance.ht...
WGPU is a Rust thing more than a Python thing.
Fair, I was looking at the wgpu-py [1] page but only skimmed it. It does indeed look like a wrapper over wgpu-native [2] which is written in Rust.
1. https://github.com/pygfx/wgpu-py
2. https://github.com/gfx-rs/wgpu-native
Do you have any numbers for the rough number of datapoints that can be handled? I'm curious if this enables plotting many millions of datapoints in a scatterplot for example.
Yes! The number of data points can range into the millions. Quite honestly, the quality of your GPU would be the limiting factor here. I will say, however, that for most use cases an integrated GPU is sufficient. For reference, we have plotted upwards of 3 million points on a mid-range integrated GPU from 2017.
I will work on adding somewhere in our docs some metrics for this kind of thing (I think it could be helpful for many).
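For anyone wanting to try, a few-million-point scatter is only a handful of lines; this sketch follows the style of the repo's README examples (API details may differ by version, so check the docs):

```python
import numpy as np
import fastplotlib as fpl

# 3 million random 2D points as float32 (a GPU-friendly dtype)
xy = np.random.rand(3_000_000, 2).astype(np.float32)

figure = fpl.Figure()
figure[0, 0].add_scatter(xy)
figure.show()

fpl.loop.run()  # start the render loop when running as a script
```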
>I will work on adding somewhere in our docs some metrics for this kind of thing (I think it could be helpful for many).
Certainly! A comparison of performance with specialized tools for large point clouds would be very interesting (like cloudcompare and potree).
I have watched recordings of your recent presentation and decided to finally give it a try last week. My goal is to create some interactive network visualizations - like letting you click/box-select nodes and edges to highlight subgraphs - which sounds possible with the callbacks and selectors.
Haven't had the time to get very far yet, but will gladly contribute an example once I figure something out. Some of the ideas I want to eventually get to are rendering shadertoys (interactively?) into an fpl subplot (haven't looked at the code at all, but might be doable), eventually running those interactively in the browser, and doing the network layout on the GPU with compute shaders (out of scope for fpl).
Hi! I've seen some of your work on wgpu-py! Definitely let us know if you need help or have ideas, if you're on the main branch we recently merged a PR that allows events to be bidirectional.
Sounds really compelling.
But it doesn't seem to answer how it works in Jupyter notebooks, or if it does at all. Is the GPU acceleration done "client-side" (JavaScript?) or "server-side" (in the kernel?) or is there an option for both?
Because I've used supposedly fast visualization libraries in Google Colab before, but instead of updating at 30 fps, it takes 2 seconds to update after a click, because after the new image is rendered it has to be transmitted via the Jupyter connector and network and that can turn out to be really slow.
Fastplotlib definitely works in Jupyterlab through jupyter-rfb https://github.com/vispy/jupyter_rfb
I believe the performance is pretty decent, especially if you run the kernel locally
Their docs also cover this as mentioned by @clewis7 below: https://www.fastplotlib.org/ver/dev/user_guide/faq.html#what...
Thanks Ivo!
Just to add on: colab is weird and not performant. This PR outlines our attempts to get jupyter-rfb working on colab: https://github.com/vispy/jupyter_rfb/pull/77
Thanks. Yeah I've been baffled as to why just interactive Matplotlib with a Colab kernel is so slow. The Colab CPU is fast (enough), the network is fast, I haven't been able to figure out where the bottleneck is either.
Is google colab slower than an equivalently powerful kernel running on a remote jupyter kernel? Are you running into network problems, or is it something specific to colab?
Looks very interesting. Does it allow plotting lines of varying thickness?
This looks super cool! Looking forward to trying it.
I think a killer feature of these gpu-plotting libraries would be if they could take torch/jax cuda arrays directly and not require a (slow) transfer over cpu.
Thanks! That is a great question and one that we've been battling with as well. As far as we know, this is not possible due to the way different contexts are set up on the GPU: https://github.com/pygfx/pygfx/issues/510
tinygrad which I haven't used seems torch-like and has a WGPU backend: https://github.com/tinygrad/tinygrad
Yeah, I remember looking into it myself as well, and not finding any easy path. A shame.... Maybe there's a hard way to do it though :)
I've been looking into this issue with Datoviz [1] following a user request. It turns out there may be a way to achieve it using Vulkan [2] (which Datoviz is based on) and CuPy's UnownedMemory [3]. I wrote a simple proof of concept using only Vulkan and CuPy.
I'm now working on a way for users to wrap a Datoviz GPU buffer as a CuPy array that directly references the Datoviz-managed GPU memory. This should, in principle, enable efficient GPU-based array operations on GPU data without any transfers.
[1] https://datoviz.org/
[2] https://registry.khronos.org/vulkan/specs/latest/man/html/VK...
[3] https://docs.cupy.dev/en/latest/reference/generated/cupy.cud...
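The CuPy half of that proof of concept is roughly the following; obtaining `raw_ptr` from Vulkan external memory (and getting it into CUDA's address space) is the hard, platform-specific part and is elided here:

```python
import cupy as cp

def wrap_device_pointer(raw_ptr: int, nbytes: int, shape, dtype):
    """Wrap an externally-owned CUDA device pointer (e.g. Vulkan memory
    exported and imported into CUDA) as a CuPy array, without copying."""
    # owner=None: CuPy will not attempt to free this memory itself
    mem = cp.cuda.UnownedMemory(raw_ptr, nbytes, owner=None)
    memptr = cp.cuda.MemoryPointer(mem, offset=0)
    return cp.ndarray(shape, dtype=dtype, memptr=memptr)
```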
This looks cool, thanks! Makes me wonder if there's any way to do that with WGPU, since WGPU interfaces with Vulkan. Probably not easy even if possible, I'm guessing.
WGPU has security protections since it's designed for the browser so I'm guessing it's impossible.
Indeed, it doesn't seem to be possible at the moment, see e.g. https://github.com/gfx-rs/wgpu/issues/4067
Wow. So are you saying that you can have some array on the GPU that you set up with python via CuPy, then you call to the web browser and give it the pointer address for that GPU array, and the browser through WASM/WebGPU can access that same array? That sounds like a huge browser security hole.
Yeah, the security issue is why I'm pretty sure you can't do it on WGPU, but Vulkan and CuPy can run fully locally, so they don't have the same security concern.
Exactly, this is the sort of thing you can more easily do on desktop than in a web browser.
Would it be possible to leverage the python array api standard? Or is that more suited for just computations?
Really nice post introducing your library.
When would you reach for a different library instead of fastplotlib?
How does this deal with really large datasets? Are you doing any type of downsampling?
How does this work with pandas? I didn't see it as a requirement in setup.py
Does this work in Jupyter notebooks? What about marimo?
Thanks!
> When would you reach for a different library instead of fastplotlib?
Use the best tool for your usecase; we're focused on GPU accelerated interactive visualization. Our use cases broadly are developing ML algorithms, user-end ML Ops tools, and looking at live data coming off of scientific instruments.
> How does this deal with really large datasets? Are you doing any type of downsampling?
Depends on your hardware, see https://fastplotlib.org/ver/dev/user_guide/faq.html#do-i-nee...
> How does this work with pandas? I didn't see it as a requirement in setup.py
If you pass in numpy-like types that use the buffer protocol it should work; we also want to support direct dataframe input in the future: https://github.com/fastplotlib/fastplotlib/issues/395
There are more low-level priorities in the meantime.
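So for now, getting a dataframe in means converting columns to numpy yourself, e.g. (a usage sketch; the column names are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(10_000), "y": np.random.rand(10_000)})

# numpy arrays expose the buffer protocol, so this is what you'd hand
# to the plotting calls today
xy = np.column_stack([df["x"].to_numpy(),
                      df["y"].to_numpy()]).astype(np.float32)
```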
> Does this work in Jupyter notebooks? What about marimo?
Jupyter yes, via jupyter-rfb, see our repo: https://github.com/fastplotlib/fastplotlib?tab=readme-ov-fil...
Looking forward to checking out your library, thanks for sharing it with the world.
I’ve been using kst-plot for live streaming data from instruments and interactive plots. It’s fast and I haven’t found any limit on the amount of data it can plot. Development has basically stopped - the product is done, feature complete, and works perfectly! It is used by European and Canadian space agencies. It might be interesting for you to see how they solved or approached some of the same problems you have solved or will eventually tackle!
That would be preposterous if it wasn't so hilariously false:
> These days, having a GPU is practically a prerequisite to doing science, and visualization is no exception.
It becomes really funny when they go on to this, as if it was a big deal:
> Depicted below is an example of plotting 3 million points
Anybody who has ever used C or Fortran knows that a modern CPU can easily churn through "3 million points" at more than 30 frames per second, using just one thread. It's not a particularly impressive feat; three million points is the size of a mid-resolution picture, and you can zoom in and out of those trivially in real-time using a CPU (and you could do that 20 years ago, as well). Maybe the stated slowness of fastplotlib comes from the unholy mix of rust and python?
Now, besides this rant, I think that fastplotlib is fantastic and, as an (unwilling) user of Python for data science, it's a godsend. It's just that the hype of that website sits wrong with me. All the demos show things that could be done much more easily and just as fast when I was a teenager. The big feat, and a really big one at that, is that you can access this sort of performance from python. I love it, in a way, because it makes my life easier now; but it feels like a self-inflicted problem was solved in a very roundabout way.
>> Depicted below is an example of plotting 3 million points
> Anybody who has ever used C or Fortran knows that a modern CPU can easily churn through "3 million points" at more than 30 frames per second, using just one thread. It's not a particularly impressive feat; three million points is the size of a mid-resolution picture, and you can zoom in and out of those trivially in real-time using a CPU (and you could do that 20 years ago, as well). Maybe the stated slowness of fastplotlib comes from the unholy mix of rust and python?
That's a misrepresentation though: it's 3 million points across sine waves, something like 1000 sine waves with 3000 points in each. If you look at the zoomed-in image, the sine waves are spaced out significantly, so if you represented this as an image it would be at least a factor of 10 larger. Even that is likely a significant underestimate, since you also need to connect the points within each sine wave.
The comparison case would be to take a vector graphic (e.g. SVG) with 1000 sine wave lines, open it in a viewer (written in C or Fortran if you want), and try zooming in and out quickly.
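For concreteness, the dataset being discussed is something like this (shapes inferred from the description above):

```python
import numpy as np

n_lines, n_points = 1000, 3000  # ~3 million vertices total
x = np.linspace(0, 2 * np.pi, n_points, dtype=np.float32)

# One sine wave per row, each offset vertically so the lines stay spaced apart
lines = np.stack([np.sin((i % 10 + 1) * x) + 3 * i for i in range(n_lines)])
```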
Thanks, and the purpose was to show what's possible on modest hardware that most people have. We have created gigabytes of graphics that live on the gpu for more complex use cases and they remain performant, but you need a gaming gpu.
But why do you want to fit the whole dataset in memory? If the dataset is stored in a tiled and multi-scaled representation you need to only grab the part of it that is needed to fit your screen (which is a constant, small amount of data, even if the dataset is arbitrarily large).
If you insist on fitting the entire thing in memory, it may be better to do so in plain RAM, which nowadays is humongous even in "modest" systems.
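The tiled approach amounts to fetching only the tiles under the viewport, e.g. (a generic sketch, not tied to any particular library):

```python
import math

def visible_tiles(viewport, tile_size, zoom_level):
    """Return the (col, row) tile indices intersecting the viewport at a
    given zoom level -- the only data a tiled viewer needs to fetch."""
    x0, y0, x1, y1 = viewport   # world coordinates
    scale = 2 ** zoom_level     # higher level = coarser, larger tiles
    step = tile_size * scale
    return [(c, r)
            for c in range(math.floor(x0 / step), math.ceil(x1 / step))
            for r in range(math.floor(y0 / step), math.ceil(y1 / step))]
```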
Nice, I'd be interested to know which method it uses for drawing lines (which is hard [0]).
[0] https://mattdesl.svbtle.com/drawing-lines-is-hard
Almar made blog posts about the line shader he wrote!
https://almarklein.org/triangletricks.html
https://almarklein.org/line_rendering.html
A big shader refactor was done in this PR: https://github.com/pygfx/pygfx/pull/628
Thank you!
I know 3D is in the roadmap. Once the basic functionality is in place, it would be great to also consider integrating molecular visualization or at least provide enough fast primitives to simplify the integration of molecular visualization tools with this library.
We are definitely looking forward to adding more 3D graphics in the future, and this sounds really cool. Would you mind posting an issue on the repo? I think this is something we would want to have on the roadmap or at least an open issue to plan out how we could do this. Thanks!
I’m often working with a windows desktop and a remote Linux box on which I have my data & code. I’d like to plot “locally” on my desktop workstation from the remote host. This usually either means using X11 (slow) or some sort of web-based library like plotly. Does fastplotlib offer any easy solution here?
This is exactly why we use jupyter-rfb, I often have large datasets on a remote cluster computer and we perform remote rendering.
see: https://fastplotlib.org/ver/dev/user_guide/faq.html#what-fra...
I'm in the same boat as the person you replied to, but have zero experience with remote plotting other than doing static plots in a remote session in the interactive window provided by VS Code's python extension. Would this also work there, or would I have to start using jupyter notebooks?
Non-jupyter notebook implementations have their quirks; eventually we hope to make a more universal jupyter-rfb kind of library, perhaps using anywidget. Anywidget is awesome: https://github.com/manzt/anywidget
People have used fastplotlib and jupyter-rfb in vscode, but it can be troublesome and we don't currently have the resources to figure out exactly why.
Alright, thanks. I don't particularly like notebooks, but this might be a reason to give them another go.
Sometimes I wish these plotting libraries were more portable beyond Python only. I was looking for something similar for Ruby just a while ago but the install instructions seemed out of date and unsupported on Windows.
Any sufficiently advanced plotting library with an api that can be called externally becomes indistinguishable from a GUI toolkit: https://www.gnu.org/software/guile/docs/guile-tut/tutorial.h...
Not sure if that is the right tutorial, but many years ago in the guile 1.x days I wrote a local visualizer for the data from a particle physics accelerator entirely in Guile and Gnuplot. It was very MVC and used guile as the controller and Gnuplot as the viewer.
Was it stupid? Yes. Did it work better than all the other tools I had at the time? Also yes.
I do not know ruby, but sometimes that's an opportunity to try and make one that others will also find useful :)
Very cool to see imgui empowering so many different things.
We love imgui! Big thanks to the imgui devs, and Pascal Thomet who maintains the python bindings for imgui-bundle, and https://github.com/panxinmiao who made an Imgui Renderer for wgpu-py!
Imgui is awesome! Thanks for mentioning imgui-bundle—I hadn’t heard of it before, but it looks great! [1]
[1] https://github.com/pthom/imgui_bundle
Looks very interesting for interactive visualization. I like the animation interface. Also love imgui, glad to see it here. I wish I had better plotting tools for publication quality images (though, honestly I'm pretty happy with matplotlib).
Thanks! Yup, our focus is not publication figures; matplotlib and seaborn cover that space pretty well.
Very interesting and promising package.
I especially like that there is a PyQt interface which might provide an alternative to another great package: pyqtgraph[0].
[0] https://github.com/pyqtgraph/pyqtgraph
Thanks! I used pyqtgraph for many years and love what can be done with it. We started off wanting to build something like it, but based on WGPU and not bound to Qt.
Thank you for your interest! We have taken a lot of inspiration from pyqtgraph and really like their library.
Is it possible to put the interactive plots on your website? Or is this a Jupyter-notebook-only tool?
See here: https://www.fastplotlib.org/ver/dev/user_guide/faq.html#what...
We are hoping for pyodide integration soon, which would allow fastplotlib to be run strictly in the browser!
Thanks. That will be very cool.
In the browser it's Jupyter-only for now; you can use Voila to make a server-based application using Jupyter: https://github.com/voila-dashboards/voila
As Caitlin pointed out below, pyodide is a future goal.
This is very nice. But I'm thinking more along the lines of: can I embed a single interactive widget in a blog post?
Not today, it requires wgpu-py to support running on WASM / pyodide, which it doesn't yet (unfortunately)
One of the big bottlenecks of plotting libraries is simply the time it takes to import the library. I’ve seen matplotlib being slow to import, and in Julia they even have a “time to first plot” metric. I’d be curious to see how this library compares.
I think one nice thing that we have tried to do is limit super heavy dependencies and also separate optional dependencies to streamline things.
The quickest install would be `pip install fastplotlib`. This would be if you were interested in just having the barebones (no imgui or notebook) for desktop viz using something like glfw.
We can think about adding some kind of import-time metrics to our docs.
Almar did some work on speeding up imports a year ago: https://github.com/fastplotlib/fastplotlib/pull/431
but we haven't benchmarked it yet
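In the meantime, a quick way to get a rough number yourself (stdlib only; run it in a fresh interpreter so nothing is already cached in sys.modules):

```python
import time

t0 = time.perf_counter()
import fastplotlib  # cold import; nothing cached yet in this process
t1 = time.perf_counter()

print(f"import fastplotlib took {t1 - t0:.2f} s")
```

`python -X importtime -c "import fastplotlib"` gives a per-module breakdown if you want to see where the time goes.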
https://archive.md/G3wj6
Yeah, many browsers have WebGPU turned off by default, so you're stuck with WASM (WASM SIMD if you're lucky).
Hopefully both are implemented.
This library builds upon pygfx and wgpu-py. Unfortunately, the latter doesn't support running on WASM, pyscript or pyodide yet, but there's an issue about it:
https://github.com/pygfx/wgpu-py/issues/407
PRs welcome though :-)
GPU all the things! GPU-accelerated Tableau would be incredible.
I’m not making neuroscience visualizations. I’m working with line graphs, rather, and would like to animate based on ~10000 points. I’m looking to convert these visuals to video for YouTube, in HD at 60fps using the HEVC/H.265 codec. I took a quick look at the documentation to see if this is possible and didn’t see anything. Is this sort of rendering supported, or will it be?
I previously tried this with matplotlib and it took 20-30 minutes to make a single rendering, because matplotlib only uses a single CPU core and doesn’t support GPU acceleration. I also tried Manim, but I couldn’t get an actual video file out of it, and OpenGL seems to be a bit complicated to work with (I went and worked on other things, though I should ask around about the video file output). Anyway, I’m excited about the prospect of a GPU-accelerated dataviz tool that utilizes Vulkan, and I hope this library can cover my use case.
Rendering frames and saving them to disk can be done with rendercanvas but we haven't exposed this in fastplotlib yet: https://github.com/pygfx/rendercanvas/issues/49
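In the meantime, a hypothetical sketch of what that pipeline could look like: grab each frame as an RGB array and feed it to imageio's ffmpeg writer. `render_frame` here is a made-up stand-in for whatever offscreen frame-grabbing hook ends up being exposed.

```python
import imageio.v2 as imageio
import numpy as np

def render_frame(i: int) -> np.ndarray:
    # Hypothetical placeholder: swap in a real offscreen render call.
    # It must return an (H, W, 3) uint8 RGB array for frame i.
    return np.zeros((1080, 1920, 3), dtype=np.uint8)

# Requires the imageio-ffmpeg plugin; codec="libx265" requests HEVC/H.265
writer = imageio.get_writer("out.mp4", fps=60, codec="libx265")
for i in range(600):  # 10 seconds at 60 fps
    writer.append_data(render_frame(i))
writer.close()
```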
Another tool that requires precise control over memory layout, bandwidth, performance… using Python.
... using Python... itself leveraging NumPy, C, the GPU...
> `sine_wave.colors[::3] = "red"`
I never knew I needed this until now
We offer a lot of ways to slice colors, set cmaps and cmap transforms, they are really useful in neuroscience:
https://fastplotlib.org/ver/dev/_gallery/line/line_colorslic...
https://fastplotlib.org/ver/dev/_gallery/line/line_cmap_more...
https://fastplotlib.org/ver/dev/_gallery/line/line_cmap.html...
And with collections if you want to go crazy: https://fastplotlib.org/ver/dev/_gallery/line_collection/lin...
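A rough sketch of what this looks like in code, assuming the current `Figure`/`add_line` API (see the gallery links above for the canonical examples):

```python
import numpy as np
import fastplotlib as fpl

xs = np.linspace(0, 4 * np.pi, 1000)
data = np.column_stack([xs, np.sin(xs)])

fig = fpl.Figure()
sine_wave = fig[0, 0].add_line(data)

sine_wave.cmap = "jet"           # map a colormap along the line
sine_wave.colors[::3] = "red"    # then slice: every 3rd point red
sine_wave.colors[:100] = "cyan"  # or a contiguous run

fig.show()  # in a notebook this displays the canvas widget
```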
Very cool effort. That said (and it's probably because of the kind of work that I do), I have almost never found the four challenges to be any kind of a problem for me. Although I do think there is some kind of contradiction there. Plotting (exploratory data analysis ("EDA"), really) is all about distilling key insights and finding features hidden in data. But you have to have some kind of intuition about where the needle in the haystack is. IME, throwing up a ton of plots and being able to scrub around in them never seems to provide much insight. It's also rarely speed-limited: usually the feedback loop is "make a plot, go away and think about it for an hour, decide what plot I need to make next, repeat". If there is too much data on the screen it defeats the point of EDA a little bit.
For me, matplotlib still reigns supreme. Rather than a fancy new visualization framework, I'd love for matplotlib to just be improved (admittedly, fastplotlib covers a different set of needs than what matplotlib does... but the author named it what they named it, so they have invited comparison. ;-) ).
Two things for me at least that would go a long way:
1) Better 3D plotting. It sucks, it's slow, it's basically unusable, although I do like how it looks most of the time. I mainly use PyVista now but it sure would be nice to have the power of a PyVista in a matplotlib subplot with a style consistent with the rest of matplotlib.
2) Some kind of WYSIWYG editor that will let you propagate changes back into your plot easily. It's faster and easier to adjust your plot layout visually rather than in code. I'd love to be able to make a plot, open up a WYSIWYG editor, lay things out a bit, and have those changes propagate back to code so that I can save it for all time.
(If these features already exist I'll be ecstatic ;-) )
I have to agree with your point about EDA. The library is neat, but even the example of covariance matrix animation is a bit contrived.
Every pixel has a covariance with every other pixel, so sliding through the rows of the covariance matrix generates as many faces on the right as there are pixels in a photograph of a face. However, the pixels that strongly co-vary will produce very similar right-side "face" pictures. To get a sense of how many different behaviours there are, one would look for eigenvectors of this covariance matrix. And then 10 or so static eigenvectors of the covariance matrix (eigenfaces [1]) would be much more informative than the thousands of animated faces displayed in the example.
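For concreteness, a minimal numpy sketch of the eigenfaces computation, with random data standing in for real face images:

```python
import numpy as np

faces = np.random.rand(400, 64 * 64)  # stand-in for 400 flattened 64x64 faces
X = faces - faces.mean(axis=0)        # center each pixel

# The SVD of the centered data gives the eigenvectors of the pixel
# covariance matrix without forming the full 4096x4096 matrix.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
eigenfaces = Vt[:10].reshape(10, 64, 64)  # top 10 eigenfaces, as images
```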
Sometimes a big interactive visualisation can be a sign of not having a concrete goal or not knowing how to properly summarise. After all, that's the purpose of a figure - to highlight insights, not to look for ways to display the entire dataset. And pictures that try to display the whole dataset end up shifting the job of exploratory analysis into a visual space, leaving it for somebody else.
Though of course there are exceptions.
[1]: https://en.wikipedia.org/wiki/Eigenface
Hi, one of the other devs here. As the poster below pointed out, what you're missing is that in this case we know that an eigendecomposition or PCA will be useful. However, if you're working on matrix decomposition algorithms like us, or if you're trying to design new forms of summary matrices because a covariance matrix isn't informative for your type of data, then these types of visualizations are useful. We broadly work on designing new forms of matrix decomposition algorithms, so it's very useful to look at the matrices and then try to determine what types of decompositions we want to do.
I've also worked on designing new matrix decompositions, and I've never found the need for anything but `imshow`...
Ok, different libraries have different use cases; the type of data we work with absolutely necessitates dynamic visualization. You wouldn't view a video with imshow, would you?
Every time I've needed to scrub through something in time like that, dumping a ton of frames to disk using imshow has been good enough. Usually, the limiting factor is how quickly I can generate a single frame.
It's hard for me to imagine what you're doing that necessitates such fancy tools, but I'm definitely interested to learn! My failure of imagination is just that.
The example from the article with the subtitle "Large-scale calcium imaging dataset with corresponding behavior and down-stream analysis" is a good example. We have brain imaging video that is acquired simultaneously with behavioral video data. It is absolutely essential to view the raw video at 30-60Hz.
Aren't you missing the entire point of exploratory data analysis? Eigenfaces are an example of what you can come up with as the end product of your data exploration, after you've tried many ways of looking at the data and determined that eigenfaces are useful.
Your whole third paragraph seems to be criticizing the core purpose of exploratory data analysis as though one should always be able to skip directly to the next phase of having a standardized representation. When entering a new problem domain, somebody needs to actually look at the data in a somewhat raw form. Using the strengths of the human vision system to get a rough idea of what the typical data looks like and the frequency and character of outliers isn't dumping the job of exploratory data analysis onto the reader, it's how the job actually gets done in the first place.
> Using the strengths of the human vision system to get a rough idea of what the typical data looks like and the frequency and character of outliers isn't dumping the job of exploratory data analysis onto the reader, it's how the job actually gets done in the first place.
Yup, this is a good summary of the intent. We also have to remember that the eigenfaces dataset is a very clean/toy data example. Real datasets never look this good, and just going straight to an eigendecomp or PCA isn't informative without first taking a look at things. Often you may want to do something other than an eigendecomp or PCA: get an idea of your data first and then think about what to do to it.
Edit: the point of that example was to show that visually we can judge what the covariance matrix is producing in the "image space". Sometimes a covariance matrix isn't even the right type of statistic to compute from your data and interactively looking at your data in different ways can help.
As a whole, of course you have a point - big visualisations when done properly should help with data exploration. However, from my experience they rarely (but not never) do. I think it's specific to the type of data you work with and the visualisation you employ. Let me give an example.
Imagine we have some big data - like an omics dataset about chromatin modification differences between smokers and non-smokers. Genomes are large, so one way to visualise might be to do a manhattan plot (mentioned here in another comment). Let's (hypothetically) say the pattern in the data is that chromatin in the vicinity of genes related to membrane functioning has more open chromatin marks in smokers compared to non-smokers. A manhattan plot will not tell us that. And in order to be able to detect that in our visualisation, we had to already know what we were looking for in the first place.
My point in this example is the following: in order to detect that we would have to know what to visualise first (i.e. visualise the genes related to membrane function separately from the rest). But then when we are looking for these kinds of associations - the visualisation becomes unnecessary. We can capture the comparison of interest with a single number (i.e. average difference between smokers vs non-smokers within this group of genes). And then we can test all kinds of associations by running a script with a for-loop in order to check all possible groups of genes we care about and return a number for each. It's much faster than visualisation. And then after this type of EDA is done, the picture would be produced as a result, displaying the effect and highlighting the insights.
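To make the for-loop idea concrete, a hypothetical pandas sketch (the `gene` and `smoker_diff` columns are made up for illustration):

```python
import pandas as pd

def score_gene_sets(df: pd.DataFrame, gene_sets: dict) -> pd.Series:
    # df: one row per gene, with a precomputed smoker-vs-non-smoker
    # difference in a "smoker_diff" column (hypothetical names).
    # gene_sets: maps a group name to a list of gene IDs.
    scores = {}
    for name, genes in gene_sets.items():
        # one number per group of genes, instead of one picture
        scores[name] = df.loc[df["gene"].isin(genes), "smoker_diff"].mean()
    return pd.Series(scores).sort_values()
```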
I understand your point about visualisation being an indistinguishable part of EDA. But the example I provided above is much closer to my lived experience.
Yeah, I agree with the general sentiment of what you're saying.
Re: wtallis, I think my original complaint about EDA per se is indeed off the mark.
Certainly creating a 20x20 grid of live-updating GPU plots and visualizations is a form of EDA, but it seems to suggest a complete lack of intuition about the problem you're solving. Like you're just going spelunking in a data set to see what you can find... and that's all you've got; no hypothesis, no nothing. I think if you're able to form even the meagerest of hypotheses, you should be able to eliminate most of these visualizations and focus on something much, much simpler.
I guess this tool purports to eliminate some of this, but there is also a degree of time-wasting involved in setting up all these visualizations. If you do more thinking up front, you can zero in on a smaller and more targeted subset of experiments. Simpler EDA tools may suffice. If you can prove your point with a single line or scatter plot (or number?), that's really the best case scenario.
Eigendecomposition of the covariance matrix, essentially PCA, is probably the first non-trivial step in the analysis of any dataset. The idea in the comment above seems to be that it's more useful to combine some basic knowledge of statistics with simpler visualisation techniques, rather than to quickly generate thousands of shallower plots. Being able to generate thousands of plots is useful, of course, but I would agree that promoting good data-analysis culture is more beneficial.
> Eigendecomposition of the covariance matrix, essentially PCA, is probably the first non-trivial step in the analysis of any dataset
For a sufficiently narrow definition of "dataset", perhaps. I don't think it's the obvious step one when you want to start understanding a time series dataset, for example. (A Fourier transform would be a more likely step two, after step one of actually looking at some of your data.)
I agree, but: the technique of “singular spectrum analysis” is pretty much PCA applied to a covariance matrix resulting from time-lagging the original time series. (https://en.wikipedia.org/wiki/Singular_spectrum_analysis)
So this is not unheard of for time series analysis.
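A minimal numpy sketch of that connection: embed the series into lagged windows and eigendecompose their covariance.

```python
import numpy as np

x = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.3 * np.random.randn(1000)
L = 50                                              # window (lag) length
X = np.lib.stride_tricks.sliding_window_view(x, L)  # (K, L) trajectory matrix
K = X.shape[0]

C = X.T @ X / K                       # (L, L) lag-covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)  # ascending eigenvalues
# The eigenvectors belonging to the largest eigenvalues (last columns)
# capture the dominant oscillatory components of the series.
```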
Exactly that's a good example!
For me, one of the most annoying things in my workflow is waiting for the software to catch up. If I'm making a plot, there are a lot of little tweaks I want to make to visually extract the maximum amount of information from a dataset. For example, if I'm making a histogram, I may want to adjust the number of bins, change to log scale, set min/max to remove outliers, and change the plot size on the page. For the sake of the argument, let's say I'm working with a set of 8 slices of the dataset, so I need to regenerate 8 plots every time I make a tweak. My workflow is: (1) code the initial plots with default settings, (2) run numpy to process the data, (3) run matplotlib to display the data, (4) look at the results, (5) make tweaks to the code, and (6) circle back to step 2. In that cycle, "wait for matplotlib to finish generating the plots" can often be one of the longest parts, and critically it's the vast majority of the cumulative time where I'm waiting rather than actively doing something. Drawing plots should be near instantaneous; there's an entire industry devoted to drawing complicated graphics in 16ms or less, so I shouldn't need to wait >100ms for a single 2D grid with some dots and lines on it.
Matplotlib is okay, but there's definitely room for improvement, so why not go for that improvement?
I think this varies a lot depending on what you're doing.
I agree 100% that matplotlib is really slow and should be made to run as fast as humanly possible. I would add a (3) to my list above: optimize matplotlib!
OTOH, at least for what I'm doing, the code that runs to generate the data that gets plotted dominates the runtime 99% of the time.
For me, adjusting plots is usually the time waster. Hence point (2) above. I'd love to be able to make the tweaks using a WYSIWYG editor and have my plotting script dynamically updated. The bins, the log scale, the font, the dpi, etc, etc.
I think with your 8 slices examples above: my (2) and (3) would cover your bases. In your view, is the rest of matplotlib really so bad that it needs to be burnt to the ground for progress to be made?
Yeah, I'd love it if mpl could be optimized. I do think that it has a lot of weird design decisions that could justify burning it down and starting from scratch (e.g. weird mix of stateful and stateless api), but I've already learned most of its common quirks so I selfishly don't care anymore, and my only significant complaint is that I want it to be faster :)
edit: regarding runtime, I'm sure this varies a lot based on usecase, but for my usual usecase I store a mostly-processed dataset, so the additional processing before drawing the data is usually minimal.
I'd be curious to hear more about your EDA workflow.
What I want for EDA is a tool that lets me quickly toggle between common views of the dataset. I run through the same analysis over and over again; I don't want to type the same commands repeatedly. I have my own heuristics for which views I want, and I want a platform that lets me write functions that express those heuristics. I want to build the intelligence into the tool instead of having to remember a bunch of commands to type on each dataframe.
For manipulating the plot, I want a low-code UI that lets me point and click the operations I want to use to transform the dataframe. The low-code UI should also emit Python code to do the same operations (so you aren't tied to a low-code system; you just use it as a faster way to generate code than typing).
I have built the start of this for my open source datatable UX called Buckaroo. But it's for tables, not for plotting. The approach could be adapted to plotting. Happy to collaborate.
I, at least, usually prefer to do EDA plotting by writing and editing code. This is a lot more flexible. It's relatively rare to need interactivity other than zooming and panning.
The differing approaches can probably be seen in some API choices, although the fastplotlib API is a lot more ergonomic than many others. Having to index the figure or prefix plot calls with add_ are minor things, and probably preferable for application development, but for fast-iteration EDA they will start to irritate fast. The "mlab" API of matplotlib violates all sorts of software development principles, but it's very convenient for exploratory use.
Matplotlib's performance, especially with interaction and animation, and clunky interaction APIs are definite pain points, and a faster and better interaction supporting library for EDA would be very welcome. Something like a mlab-type wrapper would probably be easy to implement for fastplotlib.
And to bikeshed a bit, I don't love the default black background. It's against usual conventions, difficult for publication, and a bit harder to read if you're used to white.
Writing and editing code is a lot more flexible, but it gets repetitive, and I have written the same stuff so many times. It's all ad hoc: it fixes the problem at the time, then it gets thrown away with the notebook only to be written again soon.
As an example, I frequently want to run analytics on a dataframe. More complex summary stats. So you write a couple of functions, and have two for loops, iterating over columns and functions. This works for a bit. It's easy to add functions to the list. Then a function throws an error, and you're trying to figure out where you are in two nested for loops.
Or, especially for pandas, you want separate functions to depend on the same expensive pre-calc. You could pass along the existing dict of computed measures so you can reuse that expensive calculation... Now you have to worry about the ordering of functions.
So you could put all of your measures into one big function, but that isn't reusable. So you write your big function over and over.
I built a small dag library that handles this, and lets you specify that your analysis requires keys and provides keys, then the DAG of functions is ordered for you.
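A toy sketch of the requires/provides idea (not my actual library, just the shape of it; graphlib is stdlib in Python 3.9+):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def analysis(requires=(), provides=()):
    """Tag a function with the keys it needs and the keys it produces."""
    def wrap(fn):
        fn.requires, fn.provides = tuple(requires), tuple(provides)
        return fn
    return wrap

@analysis(provides=("summary",))
def expensive_precalc(state):
    state["summary"] = state["df"].describe()  # assumes a pandas df in state

@analysis(requires=("summary",), provides=("outliers",))
def find_outliers(state):
    state["outliers"] = state["summary"].loc["std"]

def run_all(funcs, state):
    produced_by = {key: f for f in funcs for key in f.provides}
    # map each function to the set of functions producing its inputs
    graph = {f: {produced_by[key] for key in f.requires} for f in funcs}
    for f in TopologicalSorter(graph).static_order():
        f(state)

# run_all([find_outliers, expensive_precalc], {"df": df}) runs the
# precalc first even though it is listed second.
```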
How do other people approach these issues?
I work with R and not python, so some things might not apply, but this:
> [...] it fixes the problem at the time, then it gets thrown away with the notebook only to be written again soon.
Is one of the reasons I stopped using notebooks.
One solution to your problem might be to create a simple executable script that, when called on the file of your dataset in a shell, would produce the visualisation you need. If it's an interactive visualisation then I would create a library or otherwise a re-usable piece of code that can be sourced. It takes some time but ends up saving more time in the end.
If you have custom-made things you have to check on your data tables, then likely no library will solve your problem without you doing some additional work on top.
And for these:
> Or, especially for pandas, you want to separate functions to depend on the same expensive pre-calc. [...] Now you have to worry about the ordering of functions.
I save expensive outputs to intermediate files, and manage dependencies with a very simple build-system called redo [1][2].
[1]: http://www.goredo.cypherpunks.su
[2]: http://karolis.koncevicius.lt/posts/using_redo_to_manage_r_d...
Thanks. I see how redo works.
For larger datasets, real scripts are a better idea. I expect my stuff to work with datasets up to about 1 GB; caching is easy to layer on and would speed up work for larger datasets, but my code assumes the data fits in memory. It would be easier to add caching than to make sure I don't load an entire dataset into memory. (I don't serialize the entire dataframe to the browser, though.)
Usually I write scripts that use a function memoization cache (to disk) for expensive operations. Recently I've also sometimes used Marimo, which has great support for modules (no reloading hacks), can memoize to disk, and has deterministic state.
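For the disk memoization, something like joblib's Memory (assuming joblib is available) covers the common case:

```python
from joblib import Memory
import pandas as pd

memory = Memory("./.cache", verbose=0)

@memory.cache
def expensive_precalc(path):
    # Re-runs only when the arguments (or the function body) change;
    # otherwise the result is loaded back from ./.cache on disk.
    return pd.read_csv(path).describe()
```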
I agree with you sfpotter, very interesting. Looks in some ways similar to PyQtGraph regarding real time plotting.
I agree with you regarding matplotlib, although I find a lot of faults/frustrations in using it. Both your points, on 3D plotting and a WYSIWYG editor, would be extremely nice, and as far as I know nothing exists in Python ticking those boxes. For 3D I typically default to Matlab, as I've found it to be the most responsive/easy to use. I've not found anything directly like a WYSIWYG editor. Stata is the closest but I deplore it; R to some extent has it, but if I'm generating multiple plots it doesn't always work out.
I'm surprised by what you said about "EDA". I find the opposite: a shotgun approach, exploring a vast number of plots with various stratifications, gives me better insight. I've explored plotting across multiple languages (R, Python, Julia, Stata) and not found one that meets all my needs.
The biggest issue I often face is that I have 1000 plots I want to generate that are all from separate data groups and could all be plotted in parallel, but most plotting libraries have holds/issues with distribution/parallelization. The closest I've found is that I'll often build up a plot in Python using a Jupyter notebook. Once I'm done I'll create a function taking all the needed data and saving a plot out, then either manually or with the help of LLMs convert it to Julia, which I've found to be much faster at loading and processing large amounts of data. Then I can loop it using Julia's "Distributed" package. It's less than ideal (threaded access would be great, rather than having to distribute the data), but I've yet to find something that works. I'd love a simple 2D EDA plotting library that has basic plots like lines, histograms (1D/2D), scatter plots, etc., has basic colorings and alpha values, and is able to handle large amounts (thousands to millions of points) of static data, plotting and saving to disk in parallel. I've debated writing my own library but I have other priorities currently; maybe once I finish my PhD.
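For reference, the kind of batch loop I mean, sketched in Python with matplotlib's Agg backend and a process pool (processes rather than threads, since matplotlib isn't thread-safe):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; set before importing pyplot
import matplotlib.pyplot as plt
import numpy as np
from multiprocessing import Pool

def make_plot(i):
    rng = np.random.default_rng(i)  # stand-in for loading data group i
    fig, ax = plt.subplots()
    ax.hist(rng.normal(size=10_000), bins=50)
    fig.savefig(f"plot_{i:04d}.png", dpi=150)
    plt.close(fig)  # free the figure's memory in the worker

if __name__ == "__main__":
    with Pool() as pool:
        pool.map(make_plot, range(1000))
```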
Interested to hear what your PhD is in.
I agree on refining matplotlib; we all need it to be better at resource handling and lower in memory use, as it often gets boggy quickly.
For point (2), have you tried the perspective-viewer library? You can make edits in the UI and then use the "debug view" to copy and paste the new configuration back into your code.
https://perspective.finos.org/
My hot take is that 3D plotting feels bad because 3D plots are bad. You can usually find some alternative way of representing the data.
I work on solving 3D problems: numerical methods for PDEs in R^3, computational geometry, computational mechanics, graphics, etc. Being able to make nice 3D plots is super important for this. I agree it's not always necessary, and when a 2D plot suffices, that's the way to go, but that doesn't obviate my need for 3D plots.
3D plots might be neat if there was some widespread way of displaying them. Unfortunately we can only make 2D projections of 3D plots on our computer screens and pieces of paper.
Maybe VR will change that at some point. :shrug:
This is the correct take. There are almost always better ways to plot three dimensional data than trying to project 3D geometry to 2D.
Shameless plug: I'm actively working on a similar project, Datoviz [1], a C/C++ library with thin Python bindings (ctypes). It supports both 2D and 3D but is currently less mature and feature-complete than fastplotlib. It is also lower level (high-level capabilities will soon be provided by VisPy 2.0 which will be built on top of Datoviz, among other possible backends).
My focus is primarily on raw performance, visual quality, and scalability for large datasets—millions, tens of millions of points, or even more.
[1] https://datoviz.org/
Cool to see you on here Cyrille, I've been following your work (and Nicolas's) for a long time. Thanks for all the cool stuff you've been doing!
I have always admired your datoviz library from afar and check the vispy2/vispy2-sandbox libraries on GitHub every few months to check up on it. When do you think 'soon' is?? Really looking forward to it!
Thanks! The code is currently managed by Nicolas Rougier in a GitHub repository that will be made public next week. This repository hosts the "graphics server protocol" (GSP), an intermediate layer between Datoviz and the future high-level plotting API. For the latter, we’ll need community feedback to shape an API philosophy that aligns with VisPy users' needs—let's aim to publish a write-up this month.
Implementing the API on top of GSP should be relatively straightforward, as the core graphics-related mechanisms are handled by GSP/Datoviz. We've created a Slack channel for discussions—contact me privately if you'd like to join.
"Fast" is a bold claim, given the complete lack of benchmarks and the fact that it's written entirely in Python...
I'm certain the host-side heavy lifting is done by numpy, which is a Python wrapper around Fortran and C. The visualization heavy lifting is done by pygfx/wgpu-py, and wgpu-py binds to native code through a C API. I think wgpu-py compiles to WASM to run in the browser. More and more packages are taking this route.
[1] https://github.com/pygfx/pygfx
[2] https://github.com/pygfx/wgpu-py
In fastplotlib, at the end of the day, everything is wgpu under the hood; and as the other poster correctly pointed out, numpy wraps Fortran and C.
Seems like a nice library, but I have a hard time seeing myself using it over plotly. The plotly express API is just so simple and easy. For example, here's the docs for the histogram plot: https://plotly.com/python/histograms/
This code gives you a fully interactive, and performant, histogram plot:
```python
import plotly.express as px

df = px.data.tips()
fig = px.histogram(df, x="total_bill")
fig.show()
```
Different use cases :) Plotly doesn't give the performance and interactive tools required for many neuroscience visualizations. We also focus more on the primitive graphics and not (at least not yet) on the more complex "composite" graphics built from primitives, like histograms.
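For the curious, a rough sketch of what a do-it-yourself histogram looks like today, assuming the current `add_line` API: bin with numpy, then draw the outline with a line primitive.

```python
import numpy as np
import fastplotlib as fpl

values = np.random.normal(size=100_000)
counts, edges = np.histogram(values, bins=50)

# Turn bin edges/counts into a step-shaped outline for the line primitive
xs = np.repeat(edges, 2)[1:-1]
ys = np.repeat(counts, 2).astype(np.float32)

fig = fpl.Figure()
fig[0, 0].add_line(np.column_stack([xs, ys]))
fig.show()
```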
[flagged]
Please stop.
[flagged]
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."
https://news.ycombinator.com/newsguidelines.html
I appreciate the warning, and if it's not by Claude I apologize, but I do think we should be allowed to express scepticism when things posted are just AI slop (and if we have to fear getting banned or what-have-you as a consequence, I genuinely think that's worse for HN long-term than the alternative).
Don't worry, we wouldn't ban anyone for this. I agree with you that it's a grey area and will take time to work out.
If the skepticism is based on nothing but vibes, such commentary is functionally equivalent to something the site guidelines already ask you to avoid.
I dunno why you'd say this, neither of us are fans of LLMs and most of this was written before LLMs were a thing :)
Maybe Claude was trained on your code. You should take it as a compliment.
asked Claude, it said it didn’t do it :)