The Snakemake API

snakemake.snakemake(snakefile, batch=None, cache=None, report=None, report_stylesheet=None, containerize=False, lint=None, generate_unit_tests=None, listrules=False, list_target_rules=False, cores=1, nodes=None, local_cores=1, max_threads=None, resources={}, overwrite_threads=None, overwrite_scatter=None, default_resources=None, overwrite_resources=None, config={}, configfiles=None, config_args=None, workdir=None, targets=None, dryrun=False, touch=False, forcetargets=False, forceall=False, forcerun=[], until=[], omit_from=[], prioritytargets=[], stats=None, printreason=False, printshellcmds=False, debug_dag=False, printdag=False, printrulegraph=False, printfilegraph=False, printd3dag=False, nocolor=False, quiet=False, keepgoing=False, cluster=None, cluster_config=None, cluster_sync=None, drmaa=None, drmaa_log_dir=None, jobname='snakejob.{rulename}.{jobid}.sh', immediate_submit=False, standalone=False, ignore_ambiguity=False, snakemakepath=None, lock=True, unlock=False, cleanup_metadata=None, conda_cleanup_envs=False, cleanup_shadow=False, cleanup_scripts=True, force_incomplete=False, ignore_incomplete=False, list_version_changes=False, list_code_changes=False, list_input_changes=False, list_params_changes=False, list_untracked=False, list_resources=False, summary=False, archive=None, delete_all_output=False, delete_temp_output=False, detailed_summary=False, latency_wait=3, wait_for_files=None, print_compilation=False, debug=False, notemp=False, all_temp=False, keep_remote_local=False, nodeps=False, keep_target_files=False, allowed_rules=None, jobscript=None, greediness=None, no_hooks=False, overwrite_shellcmd=None, updated_files=None, log_handler=[], keep_logger=False, max_jobs_per_second=None, max_status_checks_per_second=100, restart_times=0, attempt=1, verbose=False, force_use_threads=False, use_conda=False, use_singularity=False, use_env_modules=False, singularity_args='', conda_frontend='conda', conda_prefix=None, conda_cleanup_pkgs=None, 
list_conda_envs=False, singularity_prefix=None, shadow_prefix=None, scheduler='ilp', scheduler_ilp_solver=None, conda_create_envs_only=False, mode=0, wrapper_prefix=None, kubernetes=None, container_image=None, tibanna=False, tibanna_sfn=None, google_lifesciences=False, google_lifesciences_regions=None, google_lifesciences_location=None, google_lifesciences_cache=False, tes=None, preemption_default=None, preemptible_rules=None, precommand='', default_remote_provider=None, default_remote_prefix='', tibanna_config=False, assume_shared_fs=True, cluster_status=None, cluster_cancel=None, cluster_cancel_nargs=None, cluster_sidecar=None, export_cwl=None, show_failed_logs=False, keep_incomplete=False, keep_metadata=True, messaging=None, edit_notebook=None, envvars=None, overwrite_groups=None, group_components=None, max_inventory_wait_time=20, execute_subworkflows=True, conda_not_block_search_path_envvars=False, scheduler_solver_path=None, conda_base_path=None, local_groupid='local')

Run snakemake on a given snakefile.

This function provides access to the whole snakemake functionality. It is not thread-safe.
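As a minimal sketch of how this function might be invoked (assuming snakemake is installed and that this function-style API is available; the workflow content and directory names below are illustrative only):

```python
import os
import tempfile
import textwrap

# Write a tiny illustrative Snakefile into a temporary working directory.
workdir = tempfile.mkdtemp()
snakefile = os.path.join(workdir, "Snakefile")
with open(snakefile, "w") as f:
    f.write(textwrap.dedent("""\
        rule all:
            input:
                "hello.txt"

        rule hello:
            output:
                "hello.txt"
            shell:
                "echo hello > {output}"
    """))

try:
    import snakemake
    # Dry-run only: build the DAG and report what would be done.
    # Returns True on success, False otherwise.
    ok = snakemake.snakemake(snakefile, workdir=workdir, dryrun=True, quiet=True)
except Exception:
    # snakemake not installed, or an API that has since changed
    ok = None
```

Because the function is not thread-safe, it should be called from a single thread of the embedding application.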

  • snakefile (str) – the path to the snakefile

  • batch (Batch) – if not None, compute only the partial DAG defined by the given Batch object (default None)

  • report (str) – create an HTML report for a previous run at the given path

  • lint (str) – print lints instead of executing (None, “plain” or “json”, default None)

  • listrules (bool) – list rules (default False)

  • list_target_rules (bool) – list target rules (default False)

  • cores (int) – the number of provided cores (ignored when using cluster support) (default 1)

  • nodes (int) – the number of provided cluster nodes (ignored without cluster support) (default None)

  • local_cores (int) – the number of provided local cores if in cluster mode (ignored without cluster support) (default 1)

  • resources (dict) – provided resources, a dictionary assigning integers to resource names, e.g. {"gpu": 1, "io": 5} (default {})

  • default_resources (DefaultResources) – default values for resources not defined in rules (default None)

  • config (dict) – override values for workflow config

  • workdir (str) – path to the working directory (default None)

  • targets (list) – list of targets, e.g. rule or file names (default None)

  • dryrun (bool) – only dry-run the workflow (default False)

  • touch (bool) – only touch all output files if present (default False)

  • forcetargets (bool) – force given targets to be re-created (default False)

  • forceall (bool) – force all output files to be re-created (default False)

  • forcerun (list) – list of files and rules that shall be re-created/re-executed (default [])

  • execute_subworkflows (bool) – execute subworkflows if present (default True)

  • prioritytargets (list) – list of targets that shall be run with maximum priority (default [])

  • stats (str) – path to file that shall contain stats about the workflow execution (default None)

  • printreason (bool) – print the reason for the execution of each job (default False)

  • printshellcmds (bool) – print the shell command of each job (default False)

  • printdag (bool) – print the DAG in the Graphviz dot language (default False)

  • printrulegraph (bool) – print the graph of rules in the Graphviz dot language (default False)

  • printfilegraph (bool) – print the graph of rules with their input and output files in the Graphviz dot language (default False)

  • printd3dag (bool) – print a D3.js compatible JSON representation of the DAG (default False)

  • nocolor (bool) – do not print colored output (default False)

  • quiet (bool) – do not print any default job information (default False)

  • keepgoing (bool) – keep going upon errors (default False)

  • cluster (str) – submission command of a cluster or batch system to use, e.g. qsub (default None)

  • cluster_config (str,list) – configuration file for cluster options, or list thereof (default None)

  • cluster_sync (str) – blocking cluster submission command (like SGE ‘qsub -sync y’) (default None)

  • drmaa (str) – if not None use DRMAA for cluster support, str specifies native args passed to the cluster when submitting a job

  • drmaa_log_dir (str) – the path to stdout and stderr output of DRMAA jobs (default None)

  • jobname (str) – naming scheme for cluster job scripts (default “snakejob.{rulename}.{jobid}.sh”)

  • immediate_submit (bool) – immediately submit all cluster jobs, regardless of dependencies (default False)

  • standalone (bool) – kill all processes very rudely in case of failure (do not use this if you use this API) (default False) (deprecated)

  • ignore_ambiguity (bool) – ignore ambiguous rules and always take the first possible one (default False)

  • snakemakepath (str) – deprecated parameter whose value is ignored. Do not use.

  • lock (bool) – lock the working directory when executing the workflow (default True)

  • unlock (bool) – just unlock the working directory (default False)

  • cleanup_metadata (list) – just cleanup metadata of given list of output files (default None)

  • drop_metadata (bool) – drop metadata file tracking information after job finishes (--report and --list_x_changes information will be incomplete) (default False)

  • conda_cleanup_envs (bool) – just cleanup unused conda environments (default False)

  • cleanup_shadow (bool) – just cleanup old shadow directories (default False)

  • cleanup_scripts (bool) – delete wrapper scripts used for execution (default True)

  • force_incomplete (bool) – force the re-creation of incomplete files (default False)

  • ignore_incomplete (bool) – ignore incomplete files (default False)

  • list_version_changes (bool) – list output files with changed rule version (default False)

  • list_code_changes (bool) – list output files with changed rule code (default False)

  • list_input_changes (bool) – list output files with changed input files (default False)

  • list_params_changes (bool) – list output files with changed params (default False)

  • list_untracked (bool) – list files in the workdir that are not used in the workflow (default False)

  • summary (bool) – list summary of all output files and their status (default False)

  • archive (str) – archive workflow into the given tarball

  • delete_all_output (bool) – remove all files generated by the workflow (default False)

  • delete_temp_output (bool) – remove all temporary files generated by the workflow (default False)

  • latency_wait (int) – how many seconds to wait for an output file to appear after the execution of a job, e.g. to handle filesystem latency (default 3)

  • wait_for_files (list) – wait for given files to be present before executing the workflow

  • list_resources (bool) – list resources used in the workflow (default False)


  • detailed_summary (bool) – list summary of all input and output files and their status (default False)

  • print_compilation (bool) – print the compilation of the snakefile (default False)

  • debug (bool) – allow use of the debugger within rules (default False)

  • notemp (bool) – ignore temp file flags, e.g. do not delete output files marked as a temp after use (default False)

  • keep_remote_local (bool) – keep local copies of remote files (default False)

  • nodeps (bool) – ignore dependencies (default False)

  • keep_target_files (bool) – do not adjust the paths of given target files relative to the working directory.

  • allowed_rules (set) – restrict allowed rules to the given set. If None or empty, all rules are used.

  • jobscript (str) – path to a custom shell script template for cluster jobs (default None)

  • greediness (float) – set the greediness of scheduling. This value between 0 and 1 determines how carefully jobs are selected for execution. The default value (0.5 if prioritytargets are used, 1.0 otherwise) provides the best speed with still acceptable scheduling quality.

  • overwrite_shellcmd (str) – a shell command that shall be executed instead of those given in the workflow. This is for debugging purposes only.

  • updated_files (list) – a list that will be filled with the files that are updated or created during the workflow execution

  • verbose (bool) – show additional debug output (default False)

  • max_jobs_per_second (int) – maximal number of cluster/drmaa jobs per second, None to impose no limit (default None)

  • restart_times (int) – number of times to restart failing jobs (default 0)

  • attempt (int) – initial value of Job.attempt. This is intended for internal use only (default 1).

  • force_use_threads (bool) – whether to force the use of threads over processes. Helpful if shared memory is full or unavailable (default False)

  • use_conda (bool) – use conda environments for each job (defined with conda directive of rules)

  • use_singularity (bool) – run jobs in singularity containers (if defined with singularity directive)

  • use_env_modules (bool) – load environment modules if defined in rules

  • singularity_args (str) – additional arguments to pass to singularity (default '')

  • conda_prefix (str) – the directory in which conda environments will be created (default None)

  • conda_cleanup_pkgs (snakemake.deployment.conda.CondaCleanupMode) – whether to clean up conda tarballs after env creation (default None), valid values: “tarballs”, “cache”

  • singularity_prefix (str) – the directory to which singularity images will be pulled (default None)

  • shadow_prefix (str) – prefix for shadow directories. The job-specific shadow directories will be created in $SHADOW_PREFIX/shadow/ (default None)

  • conda_create_envs_only (bool) – if specified, only builds the conda environments specified for each job, then exits.

  • list_conda_envs (bool) – list conda environments and their location on disk.

  • mode (snakemake.common.Mode) – execution mode

  • wrapper_prefix (str) – prefix for wrapper script URLs (default None)

  • kubernetes (str) – submit jobs to Kubernetes, using the given namespace.

  • container_image (str) – Docker image to use, e.g., for Kubernetes.

  • default_remote_provider (str) – default remote provider to use instead of local files (e.g. S3, GS)

  • default_remote_prefix (str) – prefix for default remote provider (e.g. name of the bucket).

  • tibanna (bool) – submit jobs to AWS cloud using Tibanna.

  • tibanna_sfn (str) – Step function (Unicorn) name of Tibanna (e.g. tibanna_unicorn_monty). This must be deployed first using the Tibanna CLI.

  • google_lifesciences (bool) – submit jobs to Google Cloud Life Sciences (pipelines API).

  • google_lifesciences_regions (list) – a list of regions (e.g., us-east1)

  • google_lifesciences_location (str) – Life Sciences API location (e.g., us-central1)

  • google_lifesciences_cache (bool) – save a cache of the compressed working directories in Google Cloud Storage for later usage.

  • tes (str) – Execute workflow tasks on GA4GH TES server given by URL.

  • precommand (str) – commands to run on AWS cloud before the snakemake command (e.g. wget, git clone, unzip, etc). Use with --tibanna.

  • preemption_default (int) – set a default number of preemptible instance retries (for Google Life Sciences executor only)

  • preemptible_rules (list) – define custom preemptible instance retries for specific rules (for Google Life Sciences executor only)

  • tibanna_config (list) – additional Tibanna config, e.g. --tibanna-config spot_instance=true subnet=<subnet_id> security_group=<security_group_id>

  • assume_shared_fs (bool) – assume that cluster nodes share a common filesystem (default True).

  • cluster_status (str) – status command for cluster execution. If None, Snakemake will rely on flag files. Otherwise, it expects the command to return “success”, “failure” or “running” when executing with a cluster jobid as a single argument.

  • cluster_cancel (str) – command to cancel multiple job IDs (like SLURM ‘scancel’) (default None)

  • cluster_cancel_nargs (int) – maximal number of job ids to pass to cluster_cancel (default 1000)

  • cluster_sidecar (str) – command that starts a sidecar process, see cluster documentation (default None)

  • export_cwl (str) – Compile workflow to CWL and save to given file

  • log_handler (list) – redirect snakemake output to this list of custom log handlers, each a function that takes a log message dictionary (described below) as its only argument (default [])

  • keep_incomplete (bool) – keep incomplete output files of failed jobs

  • edit_notebook (object) – “notebook.EditMode” object to configure notebook server for interactive editing of a rule notebook. If None, do not edit.

  • scheduler (str) – Select scheduling algorithm (default ilp)

  • scheduler_ilp_solver (str) – Set solver for ilp scheduler.

  • overwrite_groups (dict) – Rule to group assignments (default None)

  • group_components (dict) – Number of connected components given groups shall span before being split up (1 by default if empty)

  • conda_not_block_search_path_envvars (bool) – Do not block search path envvars (R_LIBS, PYTHONPATH, …) when using conda environments.

  • scheduler_solver_path (str) – Path to Snakemake environment (this can be used to e.g. overwrite the search path for the ILP solver used during scheduling).

  • conda_base_path (str) – Path to conda base environment (this can be used to overwrite the search path for conda, mamba, and activate).

  • local_groupid (str) – Local groupid to use as a placeholder for groupid-referring input functions of local jobs (internal use only, default: local).

  • log_handler

    redirect snakemake output to this list of custom log handlers, each a function that takes a log message dictionary (see below) as its only argument (default []). The log message dictionary passed to a handler has the following entries:

    level
        the log level ("info", "error", "debug", "progress", "job_info")

    level="info", "error" or "debug":
        msg
            the log message

    level="progress":
        done
            number of already executed jobs
        total
            number of total jobs

    level="job_info":
        input
            list of input files of a job
        output
            list of output files of a job
        log
            path to log file of a job
        local
            whether a job is executed locally (i.e. ignoring cluster)
        msg
            the job message
        reason
            the job reason
        priority
            the job priority
        threads
            the threads of the job
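As a sketch, a handler that collects error-level messages from this dictionary might look as follows (the function and variable names here are illustrative, not part of the API):

```python
# Illustrative log handler: collects the text of "error"-level messages.
# The message dictionary layout ("level", "msg") follows the description above.
def make_error_collector():
    errors = []

    def handler(msg):
        if msg.get("level") == "error":
            errors.append(msg.get("msg", ""))

    return errors, handler

errors, handler = make_error_collector()

# The handler would be passed via log_handler=[handler] in the call to
# snakemake.snakemake(); here we feed it sample messages directly.
handler({"level": "info", "msg": "workflow started"})
handler({"level": "error", "msg": "rule all failed"})
# errors now holds ["rule all failed"]
```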


Returns

    True if the workflow execution was successful.

Return type

    bool