Loading and processing data from particle simulations in Aspherix and LIGGGHTS

Introduction

When performing bulk solids simulations or simulations of fluid-solid systems using methods like DEM or CFD-DEM, data needs to be extracted from the simulation output files for further processing. LIGGGHTS and Aspherix write their output in variants of the VTK file format, which is primarily intended for visualization with ParaView.

In this document, I want to share a few snippets for loading simulation data that I developed during my PhD thesis, and show how to calculate simple quantities from this data in Python.

Setting up a Miniconda environment for pyvista

I can recommend using the PyVista library for reading data into Python for processing. Getting pyvista and vtk to play nicely together, however, can be a challenge. Miniconda helps here by making sure that all required dependencies are installed in compatible versions:

conda create -n postproc python pyvista pandas scipy notebook meshio ipykernel seaborn -c conda-forge
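
After activating the environment with conda activate postproc, a quick import check confirms that pyvista and vtk ended up in compatible versions. This is just a minimal sketch; the printed versions depend on your installation:

# quick smoke test inside the activated environment
import pyvista as pv
import vtk

print("pyvista", pv.__version__)
print("vtk", vtk.VTK_VERSION)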

Loading single data files

No matter whether we want to analyze a single output file or an entire series, a function for reading single files is necessary.

LIGGGHTS and legacy Aspherix

When using dump custom/vtk with a file ending of .vtp or .vtk, pyvista can read the dataset directly.

import pyvista as pv

my_data = pv.read("dump/dump.100000.vtp")
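
Before any further processing, it can be useful to check which per-particle arrays the dump actually contains. A minimal sketch using pyvista's standard attributes (the file name is just the example from above):

import pyvista as pv

my_data = pv.read("dump/dump.100000.vtp")

# list the point data arrays written by the dump command
print(my_data.point_data.keys())
# number of particles in this dump
print(my_data.n_points)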

Aspherix

Aspherix’ output_settings writes each dump as a collection of files that need to be read and combined.

import pandas as pd
import pyvista as pv
import xml.etree.ElementTree as ET
from pathlib import Path


def read_vtm(path_to_vtm, combine_blocks=True):
    path_to_vtm = Path(path_to_vtm)
    # the .vtm file is an XML index listing the files of the individual blocks
    root_node = ET.parse(str(path_to_vtm)).getroot()
    # collect the file names of all blocks belonging to the 'Particles' group
    targets = [
        str(path_to_vtm.parent / x.attrib["file"])
        for x in root_node.findall(".//*[@name='Particles']/DataSet")
        if "file" in x.attrib
    ]
    uncombined = [pv.read(x) for x in targets]
    if combine_blocks:
        # merge the blocks pairwise until only a single dataset is left
        while len(uncombined) > 1:
            tmp = []
            odd_one = False
            while len(uncombined) > 1:
                tmp.append([uncombined.pop(), uncombined.pop()])
            if len(uncombined) == 1:
                # keep the unpaired block for the next round
                odd_one = uncombined.pop()
            uncombined = [x.merge(y) for x, y in tmp]
            if odd_one:
                uncombined.append(odd_one)
        return uncombined[0]
    return uncombined

my_data = read_vtm("dump/dump.10000.vtm")
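
Depending on what else the .vtm file references (walls, geometry, ...), pyvista can also read the multi-block file directly, and the resulting MultiBlock can be merged with combine(). This is only a sketch that keeps all blocks rather than just the 'Particles' group, so the function above is preferable when other meshes are written to the same dump:

import pyvista as pv

# pv.read() on a .vtm index returns a MultiBlock
blocks = pv.read("dump/dump.10000.vtm")

# merge every block into a single dataset (no filtering for 'Particles')
combined = blocks.combine()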

Converting PyVista data into pandas DataFrames

While pyvista has lots of methods for processing data, I am more familiar with pandas and typically convert the data directly into a pandas DataFrame.

pyvista's own text representation, however, already gives an easy way to get an overview of the data:

my_data
Output from pyvista's __repr__

I wrote a function that performs this conversion:

import pandas as pd

def pv_to_df(pv_data, fields_to_copy=["x", "y", "z", "radius", "id"]):
    # particle coordinates are stored in the points array,
    # all other fields are available as point data arrays
    XYZ_POS = {"x": 0, "y": 1, "z": 2}
    df = pd.DataFrame()
    for field in fields_to_copy:
        if field in XYZ_POS:
            df[field] = pv_data.points[:, XYZ_POS[field]]
            continue
        df[field] = pv_data[field]
    return df

This allows for direct analysis, for example:

import matplotlib.pyplot as plt

df = pv_to_df(my_data, fields_to_copy=["x", "y", "z", "radius", "id", 'f_Temp[0]'])

fig, ax = plt.subplots(ncols=2)
df.hist('z', ax=ax[0], density=True, bins=20)
df.hist('f_Temp[0]', ax=ax[1], density=True, bins=20)
Example histogram

Extracting time series from dumps

More often than not, the change of a quantity over time is of interest. We can simply apply the above function to a series of files in a dump folder:

import numpy as np

# extract the timestep number from a file name such as dump.100000.vtp
get_timestep = lambda p: str(p).split(".")[-2]

def order_dumps(filelist, timestep=None):
    df = pd.DataFrame({
        'timestep': np.array([get_timestep(x) for x in filelist], dtype=int),
        'path': filelist,
    })
    if timestep:
        # convert step numbers to physical time
        df['time'] = df['timestep'] * timestep
    return df.sort_values('timestep')
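
As a quick sanity check, the lambda can be applied to a single dump path (the path below is just an example) to confirm that it extracts the step number:

from pathlib import Path

# hypothetical example path; the second-to-last dot-separated field is the step
get_timestep(Path("post/dump.100000.vtp"))  # returns '100000'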

Small simulations: load all data

For small simulations, loading all data into a single dataframe is rather convenient:

from pathlib import Path
def read_data(x):
    df = pv_to_df(
        pv.read(x.path),
        fields_to_copy=["x", "y", "z", "radius", "id", 'f_Temp[0]']
    )
    df['time'] = x.time
    return df

files = order_dumps(list(Path("./post").glob("dump.*.vtp")), timestep=1e-4)
particle_data = pd.concat([read_data(x) for x in files.itertuples()])

Then, grouping by time or not, functions such as mean(), min(), max() or quantile(q) can be used:

fig, (ax1, ax2) = plt.subplots(ncols=2, sharex=True)

particle_data.groupby('time').mean()['f_Temp[0]'].plot(ax=ax1)
particle_data.groupby('time').mean()['z'].plot(ax=ax2)
ax1.set(ylabel='mean temperature')
ax2.set(ylabel='mean z coordinate')
fig.tight_layout()
Plots side by side of averaged quantities

Large simulations: iterate over all timesteps

For larger simulations, where not all data fits into memory at the same time, one may read one file at a time, process the data, and then collect the results:

from pathlib import Path
def process_step(x):
    df = pv_to_df(
        pv.read(x.path),
        fields_to_copy=["x", "y", "z", "radius", "id", 'f_Temp[0]']
    )

    this_result = pd.DataFrame({
        'time': [x.time],
        'mean_temperature': [df['f_Temp[0]'].mean()],
        'mean_z': [df['z'].mean()],
    })
    return this_result

files = order_dumps(list(Path("./post").glob("dump.*.vtp")), timestep=1e-4)
results = pd.concat([process_step(x) for x in files.itertuples()]).reset_index(drop=True)

This is equivalent to the computation performed above, but without the need to keep all of the particle data in memory simultaneously.
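
The collected per-timestep results can then be plotted in the same way as before; a short sketch using the results DataFrame from above:

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(ncols=2, sharex=True)
results.plot(x='time', y='mean_temperature', ax=ax1, legend=False)
results.plot(x='time', y='mean_z', ax=ax2, legend=False)
ax1.set(ylabel='mean temperature')
ax2.set(ylabel='mean z coordinate')
fig.tight_layout()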