Main function
This part describes how to use the main function deconvoluting to perform single-cell deconvolution.
Usage
The usage of deconvoluting is as follows:
## nums = deconvoluting(ref, st, cellnames=NULL, genemode, gene.list, var_thresh=0.025, exp_thresh=0.03, hpmode, hp, aver_cell, thre=1e-10, dopar=T, ncores, realtime=F, dir=NULL)
It takes many parameters. Below, these parameters are grouped by function and explained one by one.
1. Necessary data for deconvolution
ref and st are the core data used for deconvolution. They are required as follows:
- ref: the scRNA-seq data that serves as the reference for deconvolution. It is a Matrix (or dgCMatrix) of unprocessed, count-level scRNA-seq data. One row represents one gene and one column represents one cell.
- st: the spatial transcriptomics data to be deconvoluted. It is a Matrix (or dgCMatrix) of unprocessed spatial transcriptomics data, holding the raw counts in each spot. One row represents one gene and one column represents one spot.
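As a minimal sketch of the expected layout (genes in rows; cells or spots in columns), the toy objects below are built from random counts. The gene, cell and spot names and the counts are made up for illustration; only the orientation and the dgCMatrix class reflect the requirements above.

```r
library(Matrix)  # provides the dgCMatrix sparse matrix class

set.seed(1)
genes <- paste0("gene", 1:6)

# Toy reference: 6 genes (rows) x 4 cells (columns) of raw counts.
ref <- as(
  matrix(as.numeric(rpois(24, lambda = 2)), nrow = 6,
         dimnames = list(genes, paste0("cell", 1:4))),
  "CsparseMatrix"
)

# Toy spatial data: the same 6 genes (rows) x 3 spots (columns).
st <- as(
  matrix(as.numeric(rpois(18, lambda = 5)), nrow = 6,
         dimnames = list(genes, paste0("spot", 1:3))),
  "CsparseMatrix"
)

class(ref)  # sparse column-compressed numeric matrix
dim(ref)    # genes x cells
dim(st)     # genes x spots
```

Keeping gene identifiers as row names in both objects matters, since the gene modes described in the next part rely on the intersection of genes in ref and st.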
2. Mode of gene selection
genemode, gene.list, var_thresh and exp_thresh control how genes are handled. genemode determines the mode of gene selection, while gene.list, var_thresh and exp_thresh each apply to a specific mode. Redeconve offers 3 alternative modes of gene selection:
- default: Use the intersection of genes in ref and st, without further treatment.
- customized: Specify the gene list yourself. The parameter gene.list is the list of genes you specify. Note that only genes within the intersection of ref and st will be used.
- filtered: A built-in function, gene.filter, is used to screen genes. This function first takes the intersection of genes in ref and st, then uses two thresholds, var_thresh and exp_thresh, to filter genes. You can customize these two parameters as well.
  - var_thresh considers variance in the reference. Genes whose variance across all cells in the reference does not reach this threshold will be filtered out. The default value is 0.025.
  - exp_thresh considers expression in the spatial transcriptomics data. Genes whose average count across all spots is less than this value will be filtered out. The default value is 0.003.
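The filtering logic described for the filtered mode can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's gene.filter: the function name gene_filter_sketch and the toy matrices are hypothetical, and the built-in function may preprocess the data (for example, normalize it) before applying the thresholds.

```r
# Hypothetical re-implementation of the two-threshold idea described above.
gene_filter_sketch <- function(ref, st, var_thresh, exp_thresh) {
  # Step 1: keep only genes present in both reference and spatial data.
  shared <- intersect(rownames(ref), rownames(st))
  # Step 2: variance of each shared gene across all reference cells.
  ref_var <- apply(as.matrix(ref[shared, , drop = FALSE]), 1, var)
  # Step 3: average count of each shared gene across all spots.
  st_mean <- rowMeans(as.matrix(st[shared, , drop = FALSE]))
  # Step 4: keep genes that reach both thresholds.
  shared[ref_var >= var_thresh & st_mean >= exp_thresh]
}

# Toy data: gene A is missing from st and gene E is missing from ref.
ref <- matrix(c(1, 5, 0,
                2, 2, 2,   # B: constant across cells, zero variance
                0, 3, 6,   # C: variable in the reference
                1, 0, 4),  # D: variable in the reference
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "B", "C", "D"), paste0("cell", 1:3)))
st <- matrix(c(3, 3,
               0, 0,       # C: not expressed in any spot
               2, 1,
               5, 5),
             nrow = 4, byrow = TRUE,
             dimnames = list(c("B", "C", "D", "E"), c("spot1", "spot2")))

kept <- gene_filter_sketch(ref, st, var_thresh = 0.025, exp_thresh = 0.003)
kept
```

In this toy case only gene D survives: A and E are outside the intersection, B fails the variance threshold, and C fails the expression threshold.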
3. Mode of determining hyperparameter
The hyperparameter is our key to single-cell resolution (see Methods for details). Here we again offer 3 modes to determine the hyperparameter:
- default: A hyperparameter is calculated from the number of genes and cells in the reference (see Methods for details).
- customized: Specify the hyperparameter yourself.
- autoselection: Redeconve uses a procedure to select the optimal hyperparameter. In this procedure, a series of hyperparameters is set in the vicinity of the value selected by the default mode. Redeconve performs deconvolution separately with each of these hyperparameters, then returns the result obtained with the best one. See Methods for details about how the best hyperparameter is determined.
- Note that in this procedure, several rounds of deconvolution will be performed, so it may take a long time. Under such circumstances, parallel computing will be beneficial.
4. Parallel computing
Sometimes the reference contains tens of thousands of cells, or the spatial transcriptomics data contains tens of thousands of spots (e.g. when the data is from Slide-seq). In such cases, parallel computing is useful. Redeconve uses the package “doSNOW” for parallel computing (which means there is a progress bar). Related parameters are dopar and ncores.
- dopar determines whether to use parallel computing or not.
- ncores indicates the number of cores to be used in parallel computing. It’s recommended to set this parameter manually rather than use the function detectCores, to avoid underlying errors.
! Important tips for parallel computing: !
- Our underlying algorithm makes use of OpenBLAS, which may itself run in parallel. Therefore, setting system("export OPENBLAS_NUM_THREADS=1") is necessary to avoid underlying errors.
- An error may be reported when the number of threads is too large: Error in socketAccept(socket = socket, blocking = TRUE, open = "a+b",: all connections are in use. If such an error occurs, please reduce the number of cores.
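One caveat about the first tip: system() runs its command in a separate child shell, so a variable exported there may not be visible to the R session itself. A hedged alternative is to set the variable from within R with Sys.setenv, as sketched below. Whether an already-loaded OpenBLAS honors the change is not guaranteed, so exporting OPENBLAS_NUM_THREADS=1 in the shell before starting R is the more conservative option. The value chosen for ncores is only an example.

```r
# Set the variable in the current R process; worker processes started
# afterwards inherit the environment of this session.
Sys.setenv(OPENBLAS_NUM_THREADS = "1")

# Confirm the value that this session now sees.
blas_threads <- Sys.getenv("OPENBLAS_NUM_THREADS")
blas_threads

# Choose the number of cores by hand, as recommended above
# (example value; reduce it if the socket error appears).
ncores <- 4
```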
5. Writing real-time results
Even with parallel computing, some datasets are still time-consuming. Redeconve is able to write results to disk in real time, at the cost of some running speed. Related parameters are realtime and dir.
- realtime determines whether to write the results to disk in real time or not.
- dir indicates the directory to write the results to.
For real-time results, the result of each spot will be written to a separate csv file, whose name is the barcode of the spot.
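Per-spot files like these can be gathered back into one table after a run. The sketch below is an assumption-laden illustration: the internal layout of the csv files is not described here, so it writes stand-in files (cell names in the first column, one abundance column) into a temporary directory and then reads them back. The barcodes, cell names and values are made up; adapt the read.csv call to the actual file layout.

```r
# Stand-in output directory with two fake per-spot files (hypothetical layout).
out_dir <- file.path(tempdir(), "redeconve_realtime_demo")
dir.create(out_dir, showWarnings = FALSE)
fake <- list(AAAC = c(0.5, 1.5), AAAG = c(2, 0))
for (barcode in names(fake)) {
  write.csv(
    data.frame(abundance = fake[[barcode]], row.names = c("cell1", "cell2")),
    file = file.path(out_dir, paste0(barcode, ".csv"))
  )
}

# Collect every csv file; the file name (minus extension) is the spot barcode.
files <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)
per_spot <- lapply(files, function(f) read.csv(f, row.names = 1))

# Bind the per-spot columns into one cells x spots table.
res <- do.call(cbind, per_spot)
colnames(res) <- sub("\\.csv$", "", basename(files))
res
```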
6. Other parameters
The remaining parameters are cellnames, normalize and thre.
- cellnames: You may not want to use all cells in the reference to run deconvolution. In that case, use this parameter to indicate which cells will be used. If you do not specify this parameter, all cells will be used.
- normalize: Redeconve can also be used for bulk RNA-seq deconvolution. When doing this, normalization of the reference is not required. When deconvoluting spatial transcriptomics, normalization is recommended.
- thre: The estimated cell abundance will not be exactly 0. This parameter indicates that any abundance less than this value will be treated as 0. Generally this value does not need to be adjusted, and the result will remain the same within a relatively large range of this value.
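The effect of cellnames and thre can be pictured with a small base R sketch. It only illustrates the behavior described above; how Redeconve applies these parameters internally is an assumption, and all names and values are made up.

```r
# cellnames: restrict the reference to a chosen subset of cells (columns).
ref <- matrix(1:12, nrow = 3,
              dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
cellnames <- c("cell1", "cell3")
ref_used <- ref[, cellnames, drop = FALSE]
colnames(ref_used)

# thre: estimated abundances below the threshold are treated as exactly 0.
abundance <- c(cell1 = 2.3, cell2 = 4e-13, cell3 = 0.7, cell4 = 9e-11)
thre <- 1e-10
abundance[abundance < thre] <- 0
abundance
```

Because the near-zero estimates here sit far below the meaningful ones, any threshold between them would give the same result, which is why this value rarely needs adjusting.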
A demo
Next we will use a demo to give an example of how to use this function.
## load the data
#data(basic)
## check the dimensions of sc and st
#dim(sc)
#dim(st)
## check the number of cells in each cell type
#table(annotations[,2])
## deconvolution
# this may take a long time
#res = deconvoluting(sc,st,genemode="filt",hpmode="def",aver_cell=25,dopar=T,ncores=8)
- sc and st are the reference and the spatial transcriptomics data, respectively.
- Because there are about 20000 genes, genemode is set to "filtered" with the default thresholds for variance and mean expression, which results in #### genes.
- hpmode is set to "default" to improve efficiency.
- dopar is set to TRUE (default value) and ncores is set to 8. You can raise the number of cores to improve efficiency.
- This dataset is from the ST platform, whose spot radius is about 100 μm, so we set aver_cell to 25.
- Because this dataset is not very large, realtime is set to FALSE (default value).
- We want to use all cells in deconvolution, so we do not need to specify cellnames. Also, we do not need to adjust thre.