Pyro Examples

Datasets

Multi MNIST

This script generates a dataset similar to the Multi-MNIST dataset described in [1].

[1] Eslami, SM Ali, et al. “Attend, infer, repeat: Fast scene understanding with generative models.” Advances in Neural Information Processing Systems. 2016.

imresize(arr, size)[source]
sample_one(canvas_size, mnist)[source]
sample_multi(num_digits, canvas_size, mnist)[source]
mk_dataset(n, mnist, max_digits, canvas_size)[source]
load_mnist(root_path)[source]
load(root_path)[source]
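A minimal sketch of what this kind of generator does conceptually: sample a number of digits per canvas, then a label and position for each. This is purely illustrative (the real helpers compose actual MNIST image arrays; all names below are hypothetical, not the Pyro API):

```python
import random

DIGIT_SIZE = 28  # MNIST digits are 28x28 pixels


def sample_multi_sketch(max_digits, canvas_size, rng):
    """Sketch of sample_multi: choose 0..max_digits digits and a random
    top-left corner for each, so every digit fits on the canvas."""
    num_digits = rng.randint(0, max_digits)
    placements = []
    for _ in range(num_digits):
        label = rng.randint(0, 9)
        x = rng.randint(0, canvas_size - DIGIT_SIZE)
        y = rng.randint(0, canvas_size - DIGIT_SIZE)
        placements.append((label, x, y))
    return placements


def mk_dataset_sketch(n, max_digits=2, canvas_size=50, seed=0):
    """Sketch of mk_dataset: n independent multi-digit samples."""
    rng = random.Random(seed)
    return [sample_multi_sketch(max_digits, canvas_size, rng) for _ in range(n)]
```

In the real script the placements are used to paste (resized) digit images onto a blank canvas; here they are returned directly so the sampling logic is visible.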

BART Ridership

load_bart_od()[source]

Load a dataset of hourly origin-destination ridership counts for every pair of BART stations during the years 2011–2019.

Source: https://www.bart.gov/about/reports/ridership

This function downloads the dataset the first time it is called; on subsequent calls it reads from a locally cached .pkl.bz2 file. It first attempts to download a preprocessed, compressed cache file maintained by the Pyro team, so on a cache hit loading is very fast. On a cache miss it falls back to downloading the original data source and preprocessing it, which requires about 350MB of file transfer, stores a few GB of temporary files, and can take upwards of 30 minutes.

Returns

a dataset dictionary with fields:

  • "stations": a list of strings of station names

  • "start_date": a datetime.datetime for the first observation

  • "counts": a torch.FloatTensor of ridership counts, with shape (num_hours, len(stations), len(stations)).
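Since the first axis of "counts" is indexed by hours elapsed since "start_date", converting an hour index back to a wall-clock timestamp is a one-liner. A small sketch, assuming the hours are contiguous with no gaps (field names taken from the return spec above):

```python
import datetime


def hour_to_timestamp(start_date, hour_index):
    """Map an index along the first axis of dataset["counts"]
    back to a wall-clock datetime."""
    return start_date + datetime.timedelta(hours=hour_index)


# Example: index 25 is 1am on the second day of the series.
start = datetime.datetime(2011, 1, 1)
print(hour_to_timestamp(start, 25))  # 2011-01-02 01:00:00
```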

load_fake_od()[source]

Create a tiny synthetic dataset for smoke testing.

Nextstrain SARS-CoV-2 counts

load_nextstrain_counts(map_location=None) → dict[source]

Load a SARS-CoV-2 dataset.

The original dataset is a preprocessed intermediate metadata.tsv.gz file available via Nextstrain. This file was then aggregated to (month, location, lineage) and (lineage, mutation) bins by the Broad Institute's preprocessing script.
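A toy illustration of the (month, location, lineage) binning described above. This is purely schematic; the real preprocessing script operates on metadata.tsv.gz, and the record field names here are hypothetical:

```python
from collections import Counter


def aggregate_counts(records):
    """Count samples per (month, location, lineage) bin, mirroring
    the aggregation described above."""
    bins = Counter()
    for rec in records:
        month = rec["date"][:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        bins[(month, rec["location"], rec["lineage"])] += 1
    return bins


records = [
    {"date": "2021-06-03", "location": "USA/MA", "lineage": "B.1.617.2"},
    {"date": "2021-06-21", "location": "USA/MA", "lineage": "B.1.617.2"},
    {"date": "2021-07-02", "location": "USA/MA", "lineage": "AY.4"},
]
bins = aggregate_counts(records)
print(bins[("2021-06", "USA/MA", "B.1.617.2")])  # 2
```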

Utilities

class MNIST(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

mirrors = ['https://d2hg8soec8ck9v.cloudfront.net/datasets/mnist/', 'http://yann.lecun.com/exdb/mnist/', 'https://ossci-datasets.s3.amazonaws.com/mnist/']
get_data_loader(dataset_name, data_dir, batch_size=1, dataset_transforms=None, is_training_set=True, shuffle=True)[source]
print_and_log(logger, msg)[source]
get_data_directory(filepath=None)[source]