Loading Data

The main purpose of swiftsimio is to load data. This section introduces four main objects:

  • swiftsimio.metadata.objects.SWIFTUnits, responsible for creating a correspondence between the SWIFT units and unyt objects.

  • swiftsimio.metadata.objects.SWIFTMetadata, responsible for loading any required information from the SWIFT headers into python-readable data.

  • swiftsimio.reader.SWIFTDataset, responsible for holding all particle data, and keeping track of the above two objects.

  • swiftsimio.reader.SWIFTParticleTypeMetadata, responsible for cataloguing metadata for an individual particle type, such as gas, including which particle fields are present.

To get started, first locate any SWIFT data that you wish to analyse. If you don’t have any handy, you can always download our test cosmological volume at:

http://virgodb.cosma.dur.ac.uk/swift-webstorage/IOExamples/cosmo_volume_example.hdf5

with associated halo catalogue at

http://virgodb.cosma.dur.ac.uk/swift-webstorage/IOExamples/cosmo_volume_example.properties

which is needed should you wish to use the velociraptor integration library in the visualisation examples.
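If you prefer to script the download, the following is a minimal sketch using only the Python standard library. The helper name fetch_if_missing is hypothetical and not part of swiftsimio; the URLs are the ones given above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Location and file names copied from the text above.
BASE_URL = "http://virgodb.cosma.dur.ac.uk/swift-webstorage/IOExamples/"
EXAMPLE_FILES = [
    "cosmo_volume_example.hdf5",
    "cosmo_volume_example.properties",
]


def fetch_if_missing(url, destination):
    """Download url to destination unless that file already exists.

    Hypothetical helper for illustration; not part of swiftsimio.
    """
    destination = Path(destination)
    if not destination.exists():
        urlretrieve(url, str(destination))
    return destination


# Example usage (this accesses the network, so it is left commented out):
# for name in EXAMPLE_FILES:
#     fetch_if_missing(BASE_URL + name, name)
```

Skipping files that already exist means the script can be re-run without downloading the snapshot again.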

To create your first instance of swiftsimio.reader.SWIFTDataset, you can use the helper function swiftsimio.load as follows:

from swiftsimio import load

# Of course, replace this path with your own snapshot should you be using
# custom data.
data = load("cosmo_volume_example.hdf5")

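The type of data is now swiftsimio.reader.SWIFTDataset. Have a quick look around this dataset in an IPython shell or a Jupyter notebook, and you will see that it contains several sub-objects, including data.metadata and one object per particle type, such as data.gas and data.dark_matter. These are described in the sections below.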

Using metadata

Let’s begin by reading some useful metadata straight out of our data.metadata object. For instance, we may want to know the box-size of our simulation:

meta = data.metadata
boxsize = meta.boxsize

print(boxsize)

This will output [142.24751067 142.24751067 142.24751067] Mpc. Note the units that are attached: having units attached to every quantity is one of the key advantages of using swiftsimio. Converting between units is straightforward; for instance, if we want the box size in kiloparsecs:

boxsize.convert_to_units("kpc")

print(boxsize)

This now outputs [142247.5106242 142247.5106242 142247.5106242] kpc, because convert_to_units modifies the array in place. Neat! This is all thanks to our tight integration with unyt. If you have more complex units, it is often useful to specify them in terms of unyt objects as follows:

import unyt

new_units = unyt.cm * unyt.Mpc / unyt.kpc
new_units.simplify()

boxsize.convert_to_units(new_units)

In general, we suggest using unyt unit objects rather than strings. You can find more information about unyt on the unyt documentation website.

There is lots of metadata available through this object; the best way to see this is by exploring the object using dir() in an interactive shell, but as a summary:

  • Metadata from the snapshot is exposed as attributes of this object, for example meta.hydro_scheme.

  • The numbers of particles of different types are available through meta.n_{gas,dark_matter,stars,black_holes}.

  • Several pre-LaTeXed strings describing the configuration state of the code are available, such as meta.hydro_info and meta.compiler_info.

  • Several snapshot-wide parameters, such as meta.a (current scale factor), meta.t (current time), meta.z (current redshift), meta.run_name (the name of this run, specified in the SWIFT parameter file), and meta.snapshot_date (a datetime object describing when the snapshot was written to disk).

  • If you have astropy installed, you can also use the meta.cosmology object, which is an astropy.cosmology.w0waCDM instance.
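As a rough sketch of how the attributes listed above might be gathered into a quick summary, the helper below collects whichever of them exist on a metadata object. The function name summarise_metadata and the stand-in object are hypothetical; the stand-in carries made-up values so that the snippet runs without a snapshot.

```python
from types import SimpleNamespace

# Attribute names taken from the summary above.
SUMMARY_ATTRIBUTES = [
    "run_name", "boxsize", "a", "z", "t",
    "n_gas", "n_dark_matter", "n_stars", "n_black_holes",
]


def summarise_metadata(meta, names=SUMMARY_ATTRIBUTES):
    """Return {name: value} for each listed attribute present on meta."""
    return {name: getattr(meta, name) for name in names if hasattr(meta, name)}


# Stand-in object with made-up values; with real data, pass data.metadata.
fake_meta = SimpleNamespace(run_name="demo", z=0.0, a=1.0, n_gas=10)

for name, value in summarise_metadata(fake_meta).items():
    print(f"{name:>15}: {value}")
```

Attributes that are missing (for instance, particle counts for types not in the snapshot) are simply skipped rather than raising an error.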

Reading particle data

To find out what is present in our snapshot, we can start by displaying the dataset itself. For example, in an interactive shell:

data

prints the available particle types (or, more generally, groups):

SWIFT dataset at cosmo_volume_example.hdf5.
Available groups: gas, dark_matter, stars, black_holes

The properties available for a particle type can be similarly printed:

data.dark_matter

gives:

SWIFT dataset at cosmo_volume_example.hdf5.
Available fields: coordinates, masses, particle_ids, velocities

In interpreters that support it, the available fields (and other attributes, such as functions) can be listed using tab completion. For example, typing >>> data.dark_matter. at the command prompt and pressing tab twice gives:

data.dark_matter.coordinates                  data.dark_matter.masses
data.dark_matter.filename                     data.dark_matter.metadata
data.dark_matter.particle_ids                 data.dark_matter.generate_empty_properties()
data.dark_matter.group                        data.dark_matter.units
data.dark_matter.group_metadata               data.dark_matter.velocities
data.dark_matter.group_name

The available fields can also be accessed programmatically using the instance of swiftsimio.reader.SWIFTMetadata, data.metadata, which contains several instances of swiftsimio.reader.SWIFTParticleTypeMetadata describing what kinds of fields are present for gas or dark matter:

# Note that gas_properties is an instance of SWIFTParticleTypeMetadata
print(data.metadata.gas_properties.field_names)

This will print a large list, like

['coordinates',
'densities',
...
'temperatures',
'velocities']
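Because field_names is available from the metadata alone, a script can check that the fields it needs exist before any particle data is read. The following is a minimal sketch; missing_fields is a hypothetical helper, and it assumes that other particle types follow the same <group>_properties naming as gas_properties above. The stand-in dataset exists only so the snippet runs without a snapshot.

```python
from types import SimpleNamespace


def missing_fields(dataset, group, required):
    """Return the names in required that the given group does not provide.

    Looks up dataset.metadata.<group>_properties.field_names, following the
    gas_properties example above (the naming for other groups is assumed).
    """
    properties = getattr(dataset.metadata, f"{group}_properties")
    available = set(properties.field_names)
    return [name for name in required if name not in available]


# Stand-in dataset with made-up contents; with real data, pass data itself.
fake_data = SimpleNamespace(
    metadata=SimpleNamespace(
        gas_properties=SimpleNamespace(
            field_names=["coordinates", "densities", "temperatures", "velocities"]
        )
    )
)

print(missing_fields(fake_data, "gas", ["coordinates", "metallicities"]))
# prints ['metallicities']
```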

These individual attributes can be accessed through the object-oriented interface, for instance,

x_gas = data.gas.coordinates
rho_gas = data.gas.densities
x_dm = data.dark_matter.coordinates

Only at this point is any particle data actually read from the snapshot. So far we have read just three arrays into memory, in this case corresponding to /PartType0/Coordinates, /PartType0/Densities, and /PartType1/Coordinates.

This allows you to be quite lazy when writing scripts: you do not have to start the file with, for instance, a with h5py.File(...) as handle: block that reads all of the data at once. You can simply access data whenever you need it through this predictable interface.
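To make the idea of reading on first access concrete, here is a toy illustration of the lazy-loading pattern. It is not swiftsimio's implementation: the LazyGroup class and its reader argument are hypothetical, and it assumes that an array, once read, is kept in memory and reused.

```python
class LazyGroup:
    """Toy stand-in for a particle group; not swiftsimio's implementation."""

    def __init__(self, reader):
        # reader is any callable mapping a field name to its data.
        self._reader = reader
        self._cache = {}
        self.read_log = []  # records which fields were actually read

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for fields.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._cache:
            self.read_log.append(name)
            self._cache[name] = self._reader(name)
        return self._cache[name]


gas = LazyGroup(lambda name: f"<array for {name}>")
print(gas.read_log)  # prints []: nothing has been read yet

x = gas.coordinates  # first access triggers the read
x_again = gas.coordinates  # second access is served from the cache
print(gas.read_log)  # prints ['coordinates']
```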

Just like the boxsize, these carry symbolic unyt units,

print(x_gas.units)

will output Mpc. We can again convert these to whatever units we like. For instance, should we wish to convert our gas densities to solar masses per cubic megaparsec,

new_density_units = unyt.Solar_Mass / unyt.Mpc**3

rho_gas.convert_to_units(new_density_units)

print(rho_gas.units.latex_repr)

which will output '\\frac{M_\\odot}{\\rm{Mpc}^{3}}'. This is a LaTeX representation of the symbolic units that we just converted our data to. It is very useful for making plots, as it ensures that your data and axis labels always have consistent units.
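One way to use this when plotting is to build the axis label directly from the array, so that the label follows any later unit conversion. The sketch below assumes only the units.latex_repr attribute shown above; axis_label is a hypothetical helper, and the stand-in object carries the LaTeX string quoted above so the snippet runs without a snapshot.

```python
from types import SimpleNamespace


def axis_label(name, array):
    """Build a label of the form 'name [$units$]' from an array's units."""
    return f"{name} [${array.units.latex_repr}$]"


# Stand-in for a loaded array; with real data, pass rho_gas instead.
fake_rho = SimpleNamespace(
    units=SimpleNamespace(latex_repr="\\frac{M_\\odot}{\\rm{Mpc}^{3}}")
)

print(axis_label("Gas density", fake_rho))
# prints Gas density [$\frac{M_\odot}{\rm{Mpc}^{3}}$]
```

The resulting string can then be passed to your plotting library as an axis label.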

Named columns

SWIFT can output custom metadata in SubgridScheme/NamedColumns for multidimensional tables whose columns each carry a distinct quantity. One common example of this is the element mass fractions of gas and stellar particles. These are then placed in an object hierarchy, as follows:

print(data.gas.element_mass_fractions)

This will output: Named columns instance with ['hydrogen', 'helium', 'carbon', 'nitrogen', 'oxygen', 'neon', 'magnesium', 'silicon', 'iron'] available for "Fractions of the particles' masses that are in the given element"

Then, to access individual columns (in this case element abundances):

# Access the silicon abundance
data.gas.element_mass_fractions.silicon
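Since each column is a plain attribute, several columns can be gathered by name with getattr. This is a small sketch only: collect_columns is a hypothetical helper, and the stand-in object holds made-up values so the snippet runs without a snapshot.

```python
from types import SimpleNamespace


def collect_columns(named_columns, names):
    """Return {name: column} for the requested column names."""
    return {name: getattr(named_columns, name) for name in names}


# Stand-in with made-up values; with real data, pass
# data.gas.element_mass_fractions instead.
fake_fractions = SimpleNamespace(hydrogen=[0.75, 0.74], helium=[0.25, 0.25])

light = collect_columns(fake_fractions, ["hydrogen", "helium"])
print(sorted(light))
# prints ['helium', 'hydrogen']
```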

User-defined particle types

It is now possible to add user-defined particle types that are not already present in the swiftsimio metadata. All you need to do is specify the three names (see below) and then the particle datasets that you have provided in SWIFT will be automatically read.

import swiftsimio as sw
import swiftsimio.metadata.particle as swp

swp.particle_name_underscores[6] = "extratype"
swp.particle_name_class[6] = "Extratype"
swp.particle_name_text[6] = "Extratype"

data = sw.load(
    "extra_test.hdf5",
)

Reading from an open file

swiftsimio normally opens and closes the HDF5 snapshot file for each operation. This is convenient for interactive use and avoids leaving files open for long periods of time, but sometimes it might be desirable to minimize the number of file open and close operations.

It is possible to pass an open h5py.File object to swiftsimio.load and swiftsimio.mask in place of the filename. In this case swiftsimio will do all file access through the provided file object. This allows us to read multiple datasets while only opening and closing the file once. For example:

import h5py
import swiftsimio as sw

with h5py.File("cosmo_volume_example.hdf5", "r") as snap_file:
    data = sw.load(snap_file)
    pos = data.dark_matter.coordinates
    vel = data.dark_matter.velocities
    ids = data.dark_matter.particle_ids

This would open the snapshot file, read the dark matter particle positions, velocities and IDs, then close the file.

Reading from a remote file

swiftsimio is able to read from snapshots hosted on a remote server using the hdfstream python module. This is useful if you’re interested in accessing a small part of a larger snapshot: you can read a small region or a subset of particle properties without downloading the whole snapshot.

To open a remote snapshot, you can pass an hdfstream.RemoteFile object to swiftsimio.load and swiftsimio.mask in place of the filename. For example, you can open one of the SWIFT example snapshots with:

import hdfstream
from swiftsimio import load

snap_file = hdfstream.open("cosma", "Tests/SWIFT/IOExamples/ssio_ci_04_2025/EagleSingle.hdf5")
data = load(snap_file)

Here, data will be a swiftsimio.reader.SWIFTDataset. It functions in the same way as described in the Loading Data section above, except that instead of reading data from a local HDF5 file, it requests data from the server.

Opening a snapshot like this only downloads a small amount of metadata. Accessing particle properties, such as coordinates, will trigger another download:

pos = data.dark_matter.coordinates

This will download the dark matter particle coordinates and return an array with units and cosmological factors attached.

To read part of a remote snapshot, we can use swiftsimio's Masking feature as we would with a local snapshot, but passing the remote file to swiftsimio.mask and swiftsimio.load in place of the filename:

import hdfstream
import swiftsimio as sw

snap_file = hdfstream.open("cosma", "Tests/SWIFT/IOExamples/ssio_ci_04_2025/EagleSingle.hdf5")

mask = sw.mask(snap_file)
# The full metadata object is available from within the mask
boxsize = mask.metadata.boxsize
# load_region is a 3x2 list [[xmin, xmax], [ymin, ymax], [zmin, zmax]]
load_region = [[0.0 * b, 0.5 * b] for b in boxsize]

# Constrain the mask
mask.constrain_spatial(load_region)

# Now load the snapshot with this mask
data = sw.load(snap_file, mask=mask)
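The region construction above can be wrapped in a small function that also validates the requested bounds. This is a sketch only: fractional_region is a hypothetical helper, and plain floats stand in for mask.metadata.boxsize so the snippet runs without a snapshot.

```python
def fractional_region(boxsize, lower=0.0, upper=0.5):
    """Return [[min, max], ...] per axis, given as fractions of the box size."""
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("need 0 <= lower < upper <= 1")
    return [[lower * b, upper * b] for b in boxsize]


# Plain floats stand in for mask.metadata.boxsize here.
print(fractional_region([100.0, 100.0, 100.0], 0.25, 0.75))
# prints [[25.0, 75.0], [25.0, 75.0], [25.0, 75.0]]
```

The returned list has the same 3x2 layout as load_region above, so it could be passed to mask.constrain_spatial in the same way.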