MultiMAP

MultiMAP.Batch(adata, batch_key='batch', scale=True, embedding=True, seed=0, dimred_func=None, rep_name='X_pca', **kwargs)

Run MultiMAP to correct batch effects within a single AnnData object. Loses the flexibility of individualised dimensionality reduction choices, but doesn’t require a list of separate objects for each batch/dataset to integrate. Runs PCA on a per-batch/dataset basis prior to performing an analysis analogous to Integration(). Adds the appropriate .obsp graphs and, if embedding=True, .obsm['X_multimap'] to the input.

Input

adata: AnnData

The object to process. .X data will be used in the computation.

batch_key: str, optional (default: “batch”)

The .obs column of the input object with the categorical variable defining the batch/dataset grouping to integrate on.

scale: bool, optional (default: True)

Whether to scale the data to N(0,1) on a per-dataset basis prior to computing the cross-dataset PCAs. Improves integration.

embedding: bool, optional (default: True)

Whether to compute the MultiMAP embedding. If False, will just return the graph, which can be used to compute a regular UMAP. This can produce a manifold more quickly, but at the cost of accuracy.

dimred_func: function or None, optional (default: None)

The function to use to compute dimensionality reduction on a per-dataset basis. Must accept an AnnData on input and modify it by inserting its dimensionality reduction into .obsm. If None, scanpy.tl.pca() will be used.

rep_name: str, optional (default: “X_pca”)

The .obsm field that the dimensionality reduction function stores its output under.

All other arguments as described in Integration().
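As a rough sketch of how dimred_func and rep_name fit together, the wrapper below swaps the default PCA for one with a different number of components. It assumes that the MultiMAP and scanpy packages are installed and that adata.obs contains the column named by batch_key. The names pca_30 and run_batch are made up for illustration and are not part of the library.

```python
def pca_30(adata):
    """Hypothetical dimred_func: a 30-component PCA.

    scanpy.tl.pca() modifies the object in place and stores its
    result in .obsm['X_pca'], which is what rep_name must point at.
    """
    import scanpy as sc  # imported lazily; assumed to be installed

    sc.tl.pca(adata, n_comps=30)


def run_batch(adata, batch_key="batch"):
    """Hypothetical wrapper: integrate the batches of one AnnData in place."""
    import MultiMAP  # imported lazily; assumed to be installed

    MultiMAP.Batch(
        adata,
        batch_key=batch_key,
        dimred_func=pca_30,  # used for the per-dataset reductions
        rep_name="X_pca",    # where pca_30 leaves its output
    )
    # With embedding=True (the default) the embedding is added to the input.
    return adata.obsm["X_multimap"]
```

Any function with the same contract (takes an AnnData, writes into .obsm under rep_name) could be passed instead of pca_30.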

MultiMAP.Integration(adatas, use_reps, scale=True, embedding=True, seed=0, **kwargs)

Run MultiMAP to integrate a number of AnnData objects from various multi-omics experiments into a single joint dimensionally reduced space. Returns a joint object with the appropriate graphs in .obsp and, if embedding=True, the resulting embedding in .obsm['X_multimap']. The final object will be a concatenation of the individual ones provided on input, so in the interest of ease of exploration it is recommended to have non-scaled data in .X.

Input

adatas: list of AnnData

The objects to integrate. The .var spaces will be intersected across subsets of the objects to compute shared PCAs, so make sure that you have ample features in common between the objects. .X data will be used for computation.

use_reps: list of str

The .obsm field of each of the corresponding adatas to use as the dimensionality reduction representing the full feature space of that object. Each field needs to be precomputed and present in its object at the time of calling the function.

scale: bool, optional (default: True)

Whether to scale the data to N(0,1) on a per-dataset basis prior to computing the cross-dataset PCAs. Improves integration.

embedding: bool, optional (default: True)

Whether to compute the MultiMAP embedding. If False, will just return the graph, which can be used to compute a regular UMAP. This can produce a manifold more quickly, but at the cost of accuracy.

n_neighbors: int or None, optional (default: None)

The number of neighbours for each node (data point) in the MultiGraph. If None, defaults to 15 times the number of input datasets.

n_components: int (default: 2)

The number of dimensions of the MultiMAP embedding.

seed: int (default: 0)

RNG seed.

strengths: list of float or None (default: None)

The relative contribution of each dataset to the layout of the embedding. The higher the strength, the higher the weighting of that dataset's cross entropy in the layout loss. If provided, needs to be a list with one value between 0 and 1 per dataset; if None, defaults to 0.5 for each dataset.

cardinality: float or None, optional (default: None)

The target sum of the connectivities of each neighbourhood in the MultiGraph. If None, defaults to log2(n_neighbors).
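The defaults described for n_neighbors, strengths and cardinality can be summarised in a few lines. This is only a sketch of the documented rules, not the library's code: the function name resolve_graph_defaults is hypothetical, and it assumes that the cardinality default is derived from n_neighbors after that value's own default has been resolved.

```python
import math


def resolve_graph_defaults(n_datasets, n_neighbors=None,
                           cardinality=None, strengths=None):
    """Fill in None values following the rules stated in the text."""
    if n_neighbors is None:
        # 15 neighbours per input dataset
        n_neighbors = 15 * n_datasets
    if cardinality is None:
        # assumption: uses the resolved n_neighbors
        cardinality = math.log2(n_neighbors)
    if strengths is None:
        strengths = [0.5] * n_datasets
    if len(strengths) != n_datasets:
        raise ValueError("strengths needs one value per dataset")
    if any(not 0.0 <= s <= 1.0 for s in strengths):
        raise ValueError("each strength must lie between 0 and 1")
    return n_neighbors, cardinality, strengths
```

For example, with two datasets and no overrides this gives 30 neighbours, a cardinality of log2(30) and strengths of [0.5, 0.5].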

The following parameter definitions are sourced from UMAP 0.5.1:

n_epochs: int (optional, default None)

The number of training epochs to be used in optimizing the low dimensional embedding. Larger values result in more accurate embeddings. If None is specified a value will be selected based on the size of the input dataset (200 for large datasets, 500 for small).

init: string (optional, default ‘spectral’)

How to initialize the low dimensional embedding. Options are:
  • ‘spectral’: use a spectral embedding of the fuzzy 1-skeleton

  • ‘random’: assign initial embedding positions at random.

  • A numpy array of initial embedding positions.

min_dist: float (optional, default 0.1)

The effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.

spread: float (optional, default 1.0)

The effective scale of embedded points. In combination with min_dist this determines how clustered/clumped the embedded points are.

set_op_mix_ratio: float (optional, default 1.0)

Interpolate between (fuzzy) union and intersection as the set operation used to combine local fuzzy simplicial sets to obtain a global fuzzy simplicial set. Both fuzzy set operations use the product t-norm. The value of this parameter should be between 0.0 and 1.0; a value of 1.0 will use a pure fuzzy union, while 0.0 will use a pure fuzzy intersection.
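The interpolation can be illustrated for a single pair of directed edge weights. The sketch below follows the product t-norm formulation used in UMAP, where the union is the probabilistic sum; the function name combine_memberships is made up here, and the library itself applies the same arithmetic to whole sparse matrices rather than to scalars.

```python
def combine_memberships(w_ij, w_ji, set_op_mix_ratio=1.0):
    """Blend fuzzy union and intersection of two directed edge weights."""
    if not 0.0 <= set_op_mix_ratio <= 1.0:
        raise ValueError("set_op_mix_ratio must be between 0.0 and 1.0")
    intersection = w_ij * w_ji          # product t-norm
    union = w_ij + w_ji - intersection  # matching fuzzy union
    return (set_op_mix_ratio * union
            + (1.0 - set_op_mix_ratio) * intersection)
```

With weights 0.5 and 0.2, a ratio of 1.0 gives the union (about 0.6), a ratio of 0.0 gives the intersection (about 0.1), and intermediate ratios fall linearly in between.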

local_connectivity: int (optional, default 1)

The local connectivity required, i.e. the number of nearest neighbors that should be assumed to be connected at a local level. The higher this value, the more connected the manifold becomes locally. In practice this should be no more than the local intrinsic dimension of the manifold.

a: float (optional, default None)

More specific parameters controlling the embedding. If None these values are set automatically as determined by min_dist and spread.

b: float (optional, default None)

More specific parameters controlling the embedding. If None these values are set automatically as determined by min_dist and spread.
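To make the relationship between a, b, min_dist and spread more concrete: in UMAP, when a and b are left as None they are obtained by a least-squares fit of the smooth curve 1 / (1 + a * d^(2b)) to a piecewise target defined by min_dist and spread. The sketch below only writes out the two curves; the fitting step is omitted, and the function names are illustrative.

```python
import math


def target_curve(d, min_dist=0.1, spread=1.0):
    """Target membership: 1 below min_dist, then exponential decay."""
    if d < min_dist:
        return 1.0
    return math.exp(-(d - min_dist) / spread)


def embedding_curve(d, a, b):
    """Smooth curve parameterised by a and b."""
    return 1.0 / (1.0 + a * d ** (2 * b))
```

Passing explicit a and b values skips the fit and uses embedding_curve with those values directly, which is why they are described as more specific controls than min_dist and spread.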

MultiMAP.TFIDF_LSI(adata, n_comps=50, binarize=True, random_state=0)

Computes LSI based on a TF-IDF transformation of the data. Putative dimensionality reduction for scATAC-seq data prior to MultiMAP. Adds an .obsm['X_lsi'] field to the object it was run on.
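For readers unfamiliar with the transformation, the following is a minimal pure-Python sketch of binarisation followed by TF-IDF weighting on a cells x features matrix. It uses one common textbook variant (term frequency as the share of the cell's total, inverse document frequency as log(1 + n_cells / n_cells_with_feature)); the exact weighting inside TFIDF_LSI may differ, and the LSI step itself (a truncated SVD of the weighted matrix down to n_comps components) is not shown.

```python
import math


def tfidf(counts, binarize=True):
    """TF-IDF weight a cells x features matrix given as a list of rows."""
    if binarize:
        counts = [[1.0 if v > 0 else 0.0 for v in row] for row in counts]
    n_cells = len(counts)
    n_features = len(counts[0]) if counts else 0
    # document frequency: number of cells in which each feature is detected
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_features)]
    # features detected in no cell get a weight of zero
    idf = [math.log(1.0 + n_cells / d) if d else 0.0 for d in df]
    weighted = []
    for row in counts:
        total = sum(row)
        weighted.append(
            [(v / total if total else 0.0) * w for v, w in zip(row, idf)]
        )
    return weighted
```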

Input

adata: AnnData

The object to run TFIDF + LSI on. Will use .X as the input data.

n_comps: int

The number of components to generate. Default: 50

binarize: bool

Whether to binarize the data prior to the computation. Often done during scATAC-seq processing. Default: True

random_state: int

The seed to use for random number generation. Default: 0
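A sketch of how TFIDF_LSI might feed into Integration() for an RNA plus ATAC integration is given below. It rests on several assumptions that are not part of this reference: MultiMAP is installed; rna already has a PCA in .obsm['X_pca']; atac_peaks and atac_genes describe the same cells in the same order (peak counts and gene-level scores respectively); and atac_genes shares enough .var features with rna. The helper names are hypothetical.

```python
def missing_reps(adatas, use_reps):
    """Return (index, rep) pairs whose representation is absent from .obsm."""
    if len(adatas) != len(use_reps):
        raise ValueError("use_reps needs one entry per object in adatas")
    return [
        (i, rep)
        for i, (adata, rep) in enumerate(zip(adatas, use_reps))
        if rep not in adata.obsm
    ]


def integrate_rna_atac(rna, atac_genes, atac_peaks):
    """Hypothetical workflow: LSI on peaks, then joint integration."""
    import MultiMAP  # imported lazily; assumed to be installed

    # Adds .obsm['X_lsi'] to the peak-level object.
    MultiMAP.TFIDF_LSI(atac_peaks, n_comps=50, binarize=True, random_state=0)
    # Assumption: same cells, same order, so the LSI can be carried over.
    atac_genes.obsm["X_lsi"] = atac_peaks.obsm["X_lsi"].copy()

    adatas, use_reps = [rna, atac_genes], ["X_pca", "X_lsi"]
    absent = missing_reps(adatas, use_reps)
    if absent:
        raise KeyError(f"representations not precomputed: {absent}")
    return MultiMAP.Integration(adatas, use_reps)
```

Checking the representations up front gives a clearer error than a failure partway through the integration, since use_reps must already be present when Integration() is called.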

MultiMAP.matrix.MultiMAP(Xs, joint={}, joint_idxs={}, metrics=None, metric_kwds=None, joint_metrics={}, n_neighbors=None, cardinality=None, angular=False, set_op_mix_ratio=1.0, local_connectivity=1.0, n_components=2, spread=1.0, min_dist=None, init='spectral', n_epochs=None, a=None, b=None, strengths=None, random_state=0, verbose=False, graph_only=False)

Run MultiMAP on a collection of dimensionality reduction matrices. Returns a (parameters, neighbor_graph, embedding) tuple, with the embedding omitted if graph_only=True.

Input

Xs: list of np.array

The dimensionality reductions of the datasets to integrate, observations as rows.

>>> Xs = [DR_A, DR_B, DR_C]

joint: dict of np.array

The joint dimensionality reductions generated for all pair combinations of the input datasets. The keys must be two-integer tuples specifying the indices of the two datasets in Xs.

>>> joint = {(0,1):DR_AB, (0,2):DR_AC, (1,2):DR_BC}

graph_only: bool, optional (default: False)

If True, skip producing the embedding and only return the neighbour graph.

All other arguments as described in MultiMAP.Integration().
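Because joint needs one entry per pair of datasets, it can help to generate the expected keys rather than typing them by hand. The sketch below uses only the standard library; both helper names are hypothetical, and it assumes the keys are ordered with the smaller index first, as in the example for joint above.

```python
from itertools import combinations


def joint_keys(n_datasets):
    """All (i, j) index pairs with i < j, one per pair of datasets."""
    return list(combinations(range(n_datasets), 2))


def missing_joint_pairs(Xs, joint):
    """Return the dataset pairs that have no entry in joint."""
    return [pair for pair in joint_keys(len(Xs)) if pair not in joint]
```

For three datasets joint_keys(3) yields (0, 1), (0, 2) and (1, 2), matching the keys in the example, and missing_joint_pairs can be run before calling MultiMAP.matrix.MultiMAP() to catch an incomplete dictionary.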