MultiMAP
- MultiMAP.Batch(adata, batch_key='batch', scale=True, embedding=True, seed=0, dimred_func=None, rep_name='X_pca', **kwargs)
Run MultiMAP to correct batch effects within a single AnnData object. This loses the flexibility of individualised dimensionality reduction choices, but doesn't require a list of separate objects for each batch/dataset to integrate. Runs a dimensionality reduction (PCA by default) on a per-batch/dataset basis prior to performing an analysis analogous to Integration(). Adds the appropriate .obsp graphs and .obsm['X_multimap'] (if instructed) to the input.
Input
- adata: AnnData
The object to process. .X data will be used in the computation.
- batch_key: str, optional (default: 'batch')
The .obs column of the input object with the categorical variable defining the batch/dataset grouping to integrate on.
- scale: bool, optional (default: True)
Whether to scale the data to N(0,1) on a per-dataset basis prior to computing the cross-dataset PCAs. Improves integration.
- embedding: bool, optional (default: True)
Whether to compute the MultiMAP embedding. If False, will just return the graph, which can be used to compute a regular UMAP. This can produce a manifold quicker, but at the cost of accuracy.
- dimred_func: function or None, optional (default: None)
The function to use to compute dimensionality reduction on a per-dataset basis. Must accept an AnnData on input and modify it by inserting its dimensionality reduction into .obsm. If None, scanpy.tl.pca() will be used.
- rep_name: str, optional (default: 'X_pca')
The .obsm field that the dimensionality reduction function stores its output under.
All other arguments as described in Integration().
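As an illustration of the dimred_func contract described above, the following sketch replaces the default reduction with a 30-component PCA and runs Batch() in place. It assumes the MultiMAP and scanpy packages are installed; `pca_30` and `correct_batches` are hypothetical helper names, and 30 components is an arbitrary choice rather than a recommendation.

```python
def pca_30(adata):
    """Per-batch reduction: writes a 30-component PCA to adata.obsm['X_pca']."""
    import scanpy as sc  # lazy import; assumes scanpy is installed

    sc.tl.pca(adata, n_comps=30)


def correct_batches(adata, batch_key="batch"):
    """Run MultiMAP.Batch in place and return the .obsm key of the embedding."""
    # `in` on a pandas DataFrame tests column names, which is what we want here.
    if batch_key not in adata.obs:
        raise KeyError(f"{batch_key!r} is not a column of adata.obs")

    import MultiMAP  # lazy import; assumes the MultiMAP package is installed

    # rep_name must match the .obsm key that dimred_func writes to.
    MultiMAP.Batch(adata, batch_key=batch_key, dimred_func=pca_30, rep_name="X_pca")
    return "X_multimap"
```

Each batch needs more than 30 cells and features for this particular reduction to be computable; any function that writes its result to the .obsm key named by rep_name should fit the same slot.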
- MultiMAP.Integration(adatas, use_reps, scale=True, embedding=True, seed=0, **kwargs)
Run MultiMAP to integrate a number of AnnData objects from various multi-omics experiments into a single joint dimensionally reduced space. Returns a joint object with the resulting embedding stored in .obsm['X_multimap'] (if instructed) and appropriate graphs in .obsp. The final object will be a concatenation of the individual ones provided on input, so in the interest of ease of exploration it is recommended to have non-scaled data in .X.
Input
- adatas: list of AnnData
The objects to integrate. The .var spaces will be intersected across subsets of the objects to compute shared PCAs, so make sure that you have ample features in common between the objects. .X data will be used for computation.
- use_reps: list of str
The .obsm fields for each of the corresponding adatas to use as the dimensionality reduction to represent the full feature space of the object. Needs to be precomputed and present in the object at the time of calling the function.
- scale: bool, optional (default: True)
Whether to scale the data to N(0,1) on a per-dataset basis prior to computing the cross-dataset PCAs. Improves integration.
- embedding: bool, optional (default: True)
Whether to compute the MultiMAP embedding. If False, will just return the graph, which can be used to compute a regular UMAP. This can produce a manifold quicker, but at the cost of accuracy.
- n_neighbors: int or None, optional (default: None)
The number of neighbours for each node (data point) in the MultiGraph. If None, defaults to 15 times the number of input datasets.
- n_components: int (default: 2)
The number of dimensions of the MultiMAP embedding.
- seed: int (default: 0)
RNG seed.
- strengths: list of float or None (default: None)
The relative contribution of each dataset to the layout of the embedding. The higher the strength, the higher the weighting of its cross entropy in the layout loss. If provided, needs to be a list with one value between 0 and 1 per dataset; if None, defaults to 0.5 for each dataset.
- cardinality: float or None, optional (default: None)
The target sum of the connectivities of each neighbourhood in the MultiGraph. If None, defaults to log2(n_neighbors).
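The default rules stated for n_neighbors, strengths and cardinality can be made concrete with a small sketch. `resolve_defaults` is a hypothetical helper that only mirrors the rules as documented here; it is not taken from the library, and it assumes the cardinality default is computed from the already resolved n_neighbors.

```python
import math


def resolve_defaults(n_datasets, n_neighbors=None, cardinality=None, strengths=None):
    """Fill in None values following the documented defaults (illustrative only)."""
    if n_neighbors is None:
        n_neighbors = 15 * n_datasets  # 15 neighbours per input dataset
    if cardinality is None:
        cardinality = math.log2(n_neighbors)  # assumption: uses the resolved value
    if strengths is None:
        strengths = [0.5] * n_datasets  # equal contribution from every dataset
    if len(strengths) != n_datasets:
        raise ValueError("strengths needs one value per dataset")
    if any(not 0 <= s <= 1 for s in strengths):
        raise ValueError("each strength must lie between 0 and 1")
    return n_neighbors, cardinality, strengths


# Two datasets with nothing overridden: 30 neighbours, equal strengths.
n_neighbors, cardinality, strengths = resolve_defaults(2)
```

Passing an explicit n_neighbors changes the cardinality default as well, which is worth keeping in mind when tuning one without the other.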
The following parameter definitions are sourced from UMAP 0.5.1:
- n_epochs: int (optional, default None)
The number of training epochs to be used in optimizing the low dimensional embedding. Larger values result in more accurate embeddings. If None is specified a value will be selected based on the size of the input dataset (200 for large datasets, 500 for small).
- init: string (optional, default 'spectral')
How to initialize the low dimensional embedding. Options are:
  - 'spectral': use a spectral embedding of the fuzzy 1-skeleton
  - 'random': assign initial embedding positions at random.
  - A numpy array of initial embedding positions.
- min_dist: float (optional, default 0.1)
The effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.
- spread: float (optional, default 1.0)
The effective scale of embedded points. In combination with min_dist this determines how clustered/clumped the embedded points are.
- set_op_mix_ratio: float (optional, default 1.0)
Interpolate between (fuzzy) union and intersection as the set operation used to combine local fuzzy simplicial sets to obtain a global fuzzy simplicial set. Both fuzzy set operations use the product t-norm. The value of this parameter should be between 0.0 and 1.0; a value of 1.0 will use a pure fuzzy union, while 0.0 will use a pure fuzzy intersection.
- local_connectivity: int (optional, default 1)
The local connectivity required, i.e. the number of nearest neighbors that should be assumed to be connected at a local level. The higher this value, the more connected the manifold becomes locally. In practice this should be not more than the local intrinsic dimension of the manifold.
- a: float (optional, default None)
More specific parameters controlling the embedding. If None, these values are set automatically as determined by min_dist and spread.
- b: float (optional, default None)
More specific parameters controlling the embedding. If None, these values are set automatically as determined by min_dist and spread.
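The set_op_mix_ratio interpolation described in the list above can be sketched for a single pair of points. Here `w_ij` and `w_ji` are hypothetical names for the two directed membership strengths between points i and j; the library applies the same arithmetic to whole sparse matrices, so this scalar version is a simplification for illustration.

```python
def mix_memberships(w_ij, w_ji, set_op_mix_ratio=1.0):
    """Combine two directed fuzzy memberships into one symmetric weight."""
    product = w_ij * w_ji                # product t-norm: fuzzy intersection
    fuzzy_union = w_ij + w_ji - product  # matching fuzzy union
    return set_op_mix_ratio * fuzzy_union + (1.0 - set_op_mix_ratio) * product


pure_union = mix_memberships(0.8, 0.5, set_op_mix_ratio=1.0)
pure_intersection = mix_memberships(0.8, 0.5, set_op_mix_ratio=0.0)
halfway = mix_memberships(0.8, 0.5, set_op_mix_ratio=0.5)
```

With the default of 1.0 an edge survives if either direction supports it, whereas values closer to 0.0 keep only edges that both points agree on.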
- MultiMAP.TFIDF_LSI(adata, n_comps=50, binarize=True, random_state=0)
Computes LSI based on a TF-IDF transformation of the data. Putative dimensionality reduction for scATAC-seq data prior to MultiMAP. Adds an .obsm['X_lsi'] field to the object it was run on.
Input
- adata: AnnData
The object to run TF-IDF + LSI on. Will use .X as the input data.
- n_comps: int (default: 50)
The number of components to generate.
- binarize: bool (default: True)
Whether to binarize the data prior to the computation. Often done during scATAC-seq processing.
- random_state: int (default: 0)
The seed to use for random number generation.
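One way TFIDF_LSI() might feed into Integration() is sketched below. It assumes MultiMAP and scanpy are installed and that the two objects already share enough .var features for the cross-dataset PCAs; `check_reps` and `integrate_rna_atac` are hypothetical helpers, and a real workflow may need additional preprocessing that is not shown.

```python
def check_reps(adatas, use_reps):
    """Verify the documented precondition: one precomputed .obsm key per object."""
    if len(adatas) != len(use_reps):
        raise ValueError("use_reps needs one entry per object in adatas")
    for i, (adata, rep) in enumerate(zip(adatas, use_reps)):
        if rep not in adata.obsm:
            raise KeyError(f"adatas[{i}].obsm has no {rep!r}; compute it first")


def integrate_rna_atac(rna, atac):
    """Sketch: PCA for RNA, TF-IDF + LSI for ATAC, then joint integration."""
    import scanpy as sc  # lazy imports; both packages are assumed to be installed
    import MultiMAP

    sc.tl.pca(rna)            # writes rna.obsm['X_pca']
    MultiMAP.TFIDF_LSI(atac)  # writes atac.obsm['X_lsi']
    check_reps([rna, atac], ["X_pca", "X_lsi"])
    return MultiMAP.Integration([rna, atac], ["X_pca", "X_lsi"])
```

The explicit check is optional, but it turns a missing representation into an early, readable error rather than a failure partway through the integration.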
- MultiMAP.matrix.MultiMAP(Xs, joint={}, joint_idxs={}, metrics=None, metric_kwds=None, joint_metrics={}, n_neighbors=None, cardinality=None, angular=False, set_op_mix_ratio=1.0, local_connectivity=1.0, n_components=2, spread=1.0, min_dist=None, init='spectral', n_epochs=None, a=None, b=None, strengths=None, random_state=0, verbose=False, graph_only=False)
Run MultiMAP on a collection of dimensionality reduction matrices. Returns a (parameters, neighbor_graph, embedding) tuple, with the embedding optionally skipped if graph_only=True.
Input
- Xs: list of np.array
The dimensionality reductions of the datasets to integrate, observations as rows.
>>> Xs = [DR_A, DR_B, DR_C]
- joint: dict of np.array
The joint dimensionality reductions generated for all pair combinations of the input datasets. The keys are to be two-integer tuples, specifying the indices of the two datasets in Xs.
>>> joint = {(0,1):DR_AB, (0,2):DR_AC, (1,2):DR_BC}
- graph_only: bool, optional (default: False)
If True, skip producing the embedding and only return the neighbour graph.
All other arguments as described in MultiMAP.Integration().
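The key layout expected for joint can be generated rather than typed by hand. In the sketch below, `build_joint` and `run_matrix_multimap` are hypothetical helpers, and `joint_dr` stands for a caller-supplied function returning the joint reduction for datasets i and j; how that reduction is computed and laid out is not specified here. Other arguments such as joint_idxs are left at their defaults, which may not be sufficient for a real run.

```python
from itertools import combinations


def build_joint(n_datasets, joint_dr):
    """Return {(i, j): joint_dr(i, j)} for every pair i < j of dataset indices."""
    return {(i, j): joint_dr(i, j) for i, j in combinations(range(n_datasets), 2)}


def run_matrix_multimap(Xs, joint, graph_only=False):
    """Call the matrix-level entry point; MultiMAP is assumed to be installed."""
    import MultiMAP.matrix  # lazy import

    return MultiMAP.matrix.MultiMAP(Xs, joint=joint, graph_only=graph_only)


# Keys for three datasets, with placeholder strings standing in for the arrays.
keys = list(build_joint(3, lambda i, j: f"DR_{i}{j}"))
```

For three datasets this yields the same three keys as the joint example above, in the same order.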