snakemake package

Subpackages

Submodules

snakemake.api module

snakemake.benchmark module

snakemake.benchmark.BENCHMARK_INTERVAL = 30: Interval (in seconds) between measuring resource usage

snakemake.benchmark.BENCHMARK_INTERVAL_SHORT = 0.5: Interval (in seconds) between resource usage measurements before BENCHMARK_INTERVAL is reached

class snakemake.benchmark.BenchmarkRecord(jobid=None, rule_name=None, wildcards=None, params=None, running_time=None, max_rss=None, max_vms=None, max_uss=None, max_pss=None, io_in=None, io_out=None, cpu_usage=None, cpu_time=None, resources=None, threads=None, input=None)[source]

Bases: object

Record type for benchmark times

cpu_time: CPU usage (user and system) in seconds

cpu_usage: Count of CPU seconds, divide by running time to get mean load estimate

data_collected: Track if data has been collected

first_time: First time when we measured CPU load, for estimating total running time

get_benchmarks(extended_fmt=False)[source]

classmethod get_header(extended_fmt=False)[source]

input: Job input

input_size_mb()[source]

io_in: I/O read in bytes

io_out: I/O written in bytes

jobid: Job ID

max_pss: Maximal PSS in MB

max_rss: Maximal RSS in MB

max_uss: Maximal USS in MB

max_vms: Maximal VMS in MB

mean_load()[source]

params: Job parameters

parse_params()[source]

parse_resources()[source]

parse_wildcards()[source]

prev_time: Previous point when measured CPU load, for estimating total running time

processed_procs: Set of procs that have been skipped

resources: Job resources

rule_name: Rule name

running_time: Running time in seconds

skipped_procs: Set of procs that have been saved

threads: Job threads

timedelta_to_str(x)[source]: Conversion of timedelta to str without fractions of seconds

to_json(extended_fmt)[source]: Return str with the JSON representation of this record

to_tsv(extended_fmt)[source]: Return str with the TSV representation of this record

wildcards: Job wildcards
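
The cpu_usage entry above notes that dividing the count of CPU seconds by the running time gives a mean load estimate, and timedelta_to_str drops fractions of seconds. A minimal standard-library sketch of both calculations follows. The helper names mean_load_estimate and timedelta_without_fraction are hypothetical and are not part of snakemake.benchmark, and returning 0.0 for a zero running time is an assumption.

```python
from datetime import timedelta


def mean_load_estimate(cpu_usage, running_time):
    """Hypothetical helper: CPU seconds divided by wall-clock seconds."""
    # Assumption: report 0.0 instead of dividing by a zero running time.
    if running_time <= 0:
        return 0.0
    return cpu_usage / running_time


def timedelta_without_fraction(delta):
    """Hypothetical helper: format a timedelta with fractional seconds dropped."""
    whole_seconds = int(delta.total_seconds())
    return str(timedelta(seconds=whole_seconds))


print(mean_load_estimate(cpu_usage=90.0, running_time=60.0))
print(timedelta_without_fraction(timedelta(hours=1, minutes=2, seconds=3.75)))
```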

class snakemake.benchmark.BenchmarkTimer(pid, bench_record, interval=30)[source]

Bases: ScheduledPeriodicTimer

Allows easy observation of a given PID for resource usage

bench_record: BenchmarkRecord to write results to

pid: PID of observed process

procs: Cache of processes to keep track of cpu percent

work()[source]: Write statistics

class snakemake.benchmark.DaemonTimer(interval, function, args=None, kwargs=None)[source]

Bases: Thread

A variant of threading.Timer that is daemonized

cancel()[source]: Stop the timer if it hasn’t finished yet.

run()[source]

Method representing the thread’s activity.

You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.
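
A daemonized timer in the spirit of the description above can be sketched with the standard library alone. The class name DaemonTimerSketch is hypothetical, and the body is an assumption modelled on how threading.Timer behaves, not a copy of the snakemake implementation.

```python
import threading


class DaemonTimerSketch(threading.Thread):
    """Hypothetical sketch: call a function once after a delay, on a daemon thread."""

    def __init__(self, interval, function, args=None, kwargs=None):
        # daemon=True means this thread does not keep the interpreter alive.
        super().__init__(daemon=True)
        self.interval = interval
        self.function = function
        self.args = args if args is not None else []
        self.kwargs = kwargs if kwargs is not None else {}
        self.finished = threading.Event()

    def cancel(self):
        """Stop the timer if it has not fired yet."""
        self.finished.set()

    def run(self):
        # Wait for the interval, or return early if cancel() was called.
        self.finished.wait(self.interval)
        if not self.finished.is_set():
            self.function(*self.args, **self.kwargs)
        self.finished.set()


calls = []
timer = DaemonTimerSketch(0.01, calls.append, args=["fired"])
timer.start()
timer.join(2)
print(calls)
```

Calling cancel() before the interval has elapsed sets the event, so run() returns without invoking the function.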

class snakemake.benchmark.ScheduledPeriodicTimer(interval)[source]

Bases: object

Scheduling of periodic events

Up to self._interval, actions are scheduled once per second; beyond that, events are scheduled at gaps of self._interval seconds.

cancel()[source]: Call to cancel any events

start()[source]: Start the periodic timer

work()[source]: Override to perform the action
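
One reading of the scheduling rule described for this class is sketched below as a pure function that lists when events would fire. The helper schedule_offsets is hypothetical, and the exact switch-over point between the one-second gaps and the longer gaps is an assumption; the real class may count calls instead of elapsed seconds.

```python
def schedule_offsets(interval, n_events):
    """Hypothetical helper: seconds from start at which events would fire."""
    offsets = []
    elapsed = 0
    for _ in range(n_events):
        offsets.append(elapsed)
        # Assumption: one-second gaps until `interval` seconds have elapsed,
        # then gaps of `interval` seconds.
        elapsed += 1 if elapsed < interval else interval
    return offsets


print(schedule_offsets(interval=3, n_events=6))
```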

snakemake.benchmark.benchmarked(pid=None, benchmark_record=None, interval=30)[source]

Measure benchmark parameters while within the context manager

Yields a BenchmarkRecord with the results (values are set after leaving context).

If pid is None then the PID of the current process will be used. If benchmark_record is None then a new BenchmarkRecord is created and returned, otherwise, the object passed as this parameter is returned.

Usage:

from snakemake.benchmark import benchmarked

with benchmarked() as bench_result:
    pass  # run the code to be measured here
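
The pattern of yielding a record whose values are only filled in when the context exits can be sketched without Snakemake. RecordSketch and benchmarked_sketch are hypothetical stand-ins that measure wall-clock time only; they are an illustration of the pattern, not the library's implementation.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordSketch:
    """Hypothetical stand-in for a benchmark record with a single field."""

    running_time: Optional[float] = None


@contextmanager
def benchmarked_sketch(record=None):
    # Reuse the caller's record if one is passed, otherwise create a new one.
    record = RecordSketch() if record is None else record
    start = time.perf_counter()
    try:
        yield record
    finally:
        # The value only becomes available once the with-block has exited.
        record.running_time = time.perf_counter() - start


with benchmarked_sketch() as result:
    inside = result.running_time  # still None at this point
    sum(range(100_000))

print(inside, result.running_time is not None)
```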

snakemake.benchmark.print_benchmark_jsonl(records, file_, extended_fmt)[source]: Write benchmark records to the file-like object

snakemake.benchmark.print_benchmark_tsv(records, file_, extended_fmt)[source]: Write benchmark records to the file-like object

snakemake.benchmark.write_benchmark_records(records, path, extended_fmt)[source]: Write benchmark records to file at path

snakemake.checkpoints module

class snakemake.checkpoints.Checkpoint(rule, checkpoints)[source]

Bases: object

checkpoints

get(**wildcards)[source]

rule

class snakemake.checkpoints.CheckpointJob(rule, output)[source]

Bases: object

output

rule

class snakemake.checkpoints.Checkpoints[source]

Bases: object

A namespace for checkpoints so that they can be accessed via dot notation.

register(rule, fallback_name=None)[source]
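
Access via dot notation, as described for this namespace, can be illustrated with setattr. The class CheckpointsSketch and its two-argument register are hypothetical simplifications; the real register takes a rule object and an optional fallback_name.

```python
class CheckpointsSketch:
    """Hypothetical namespace: each registered rule becomes an attribute."""

    def register(self, rule_name, rule):
        # Store the rule under its name so it is reachable as namespace.<name>.
        setattr(self, rule_name, rule)


checkpoints = CheckpointsSketch()
checkpoints.register("split_samples", {"output": "chunks/"})
print(checkpoints.split_samples["output"])
```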

snakemake.cli module

snakemake.cwl module

snakemake.cwl.cwl(path, basedir, input, output, params, wildcards, threads, resources, log, config, rulename, use_singularity, bench_record, jobid, sourcecache_path, runtime_sourcecache_path)[source]: Load cwl from the given basedir + path and execute it.

snakemake.cwl.dag_to_cwl(dag)[source]: Convert a given DAG to a CWL workflow, which is returned as a JSON object.

snakemake.cwl.job_to_cwl(job, dag, outputs, inputs)[source]: Convert a job with its dependencies to a CWL workflow step.

snakemake.dag module

snakemake.decorators module

snakemake.decorators.dec_all_methods(decorator, prefix='test_')[source]

snakemake.exceptions module

exception snakemake.exceptions.AmbiguousRuleException(filename, job_a, job_b, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.AzureFileException(msg, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.CacheMissException[source]: Bases: Exception

exception snakemake.exceptions.CheckSumMismatchException(*args, lineno=None, snakefile=None, rule=None)[source]

Bases: WorkflowError

Should be raised to indicate that the checksum of a file does not match a known hash; typically used for large downloads, etc.

exception snakemake.exceptions.ChildIOException(parent=None, child=None, wildcards=None, lineno=None, snakefile=None, rule=None)[source]: Bases: WorkflowError

exception snakemake.exceptions.CliException(msg)[source]: Bases: Exception

exception snakemake.exceptions.ClusterJobException(job_info, jobid)[source]: Bases: RuleException

exception snakemake.exceptions.CreateCondaEnvironmentException(*args, lineno=None, snakefile=None, rule=None)[source]: Bases: WorkflowError

exception snakemake.exceptions.CreateRuleException(message=None, include=None, lineno=None, snakefile=None, rule=None)[source]: Bases: RuleException

exception snakemake.exceptions.CyclicGraphException(repeatedrule, file, rule=None)[source]: Bases: RuleException

exception snakemake.exceptions.DropboxFileException(msg, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.FTPFileException(msg, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.HTTPFileException(msg, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.IOException(prefix, job, files, include=None, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.IOFileException(msg, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.ImproperOutputException(job, files, include=None, lineno=None, snakefile=None)[source]: Bases: IOException

exception snakemake.exceptions.ImproperShadowException(rule, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.IncompleteCheckpointException(rule, targetfile)[source]: Bases: Exception

exception snakemake.exceptions.IncompleteFilesException(files)[source]: Bases: RuleException

exception snakemake.exceptions.InputFunctionException(msg, wildcards=None, lineno=None, snakefile=None, rule=None)[source]: Bases: WorkflowError

exception snakemake.exceptions.InputOpenException(iofile)[source]: Bases: Exception

exception snakemake.exceptions.LockException[source]: Bases: WorkflowError

exception snakemake.exceptions.LookupError(msg=None, exc=None, query=None, dpath=None)[source]: Bases: WorkflowError

exception snakemake.exceptions.MissingInputException(job, files, include=None, lineno=None, snakefile=None)[source]: Bases: IOException

exception snakemake.exceptions.MissingOutputException(message=None, include=None, lineno=None, snakefile=None, rule=None, jobid='')[source]: Bases: RuleException

exception snakemake.exceptions.MissingOutputFileCachePathException[source]: Bases: Exception

exception snakemake.exceptions.MissingRuleException(file, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.NCBIFileException(msg, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.NoRulesException(lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.PeriodicWildcardError(message=None, include=None, lineno=None, snakefile=None, rule=None)[source]: Bases: RuleException

exception snakemake.exceptions.ProtectedOutputException(job, files, include=None, lineno=None, snakefile=None)[source]: Bases: IOException

exception snakemake.exceptions.RemoteFileException(msg, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.ResourceScopesException(msg, invalid_resources)[source]: Bases: Exception

exception snakemake.exceptions.RuleException(message=None, include=None, lineno=None, snakefile=None, rule=None)[source]

Bases: Exception

Base class for exceptions occurring within the execution or definition of rules.

property messages

exception snakemake.exceptions.S3FileException(msg, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.SFTPFileException(msg, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.SourceFileError(msg)[source]: Bases: WorkflowError

exception snakemake.exceptions.SpawnedJobError[source]: Bases: Exception

exception snakemake.exceptions.TerminatedException[source]: Bases: Exception

exception snakemake.exceptions.UnknownRuleException(name, prefix='', lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.WebDAVFileException(msg, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.WildcardError(*args, lineno=None, snakefile=None, rule=None)[source]: Bases: WorkflowError

exception snakemake.exceptions.XRootDFileException(msg, lineno=None, snakefile=None)[source]: Bases: RuleException

exception snakemake.exceptions.ZenodoFileException(msg, lineno=None, snakefile=None)[source]: Bases: RuleException

snakemake.exceptions.cut_traceback(ex)[source]

snakemake.exceptions.format_error(ex, lineno, linemaps=None, snakefile=None, show_traceback=False, rule=None)[source]

snakemake.exceptions.format_exception_to_string(ex, linemaps=None)[source]

Returns the error message for a given exception as a string.

Arguments:

ex – the exception
linemaps – a dict of dicts that maps, for each snakefile, the compiled lines to source code lines in the snakefile.

snakemake.exceptions.format_traceback(tb, linemaps)[source]

snakemake.exceptions.get_exception_origin(ex, linemaps)[source]

snakemake.exceptions.is_file_not_found_error(exc, considered_files)[source]

snakemake.exceptions.log_verbose_traceback(ex)[source]

snakemake.exceptions.print_exception(ex, linemaps=None)[source]

Print an error message for a given exception.

Arguments:

ex – the exception
linemaps – a dict of dicts that maps, for each snakefile, the compiled lines to source code lines in the snakefile.

snakemake.exceptions.print_exception_warning(ex, linemaps=None, footer_message='')[source]

Print an error message for a given exception using logger warning.

Arguments:

ex – the exception
linemaps – a dict of dicts that maps, for each snakefile, the compiled lines to source code lines in the snakefile.

snakemake.exceptions.update_lineno(ex, linemaps)[source]

snakemake.gui module

snakemake.io module

class snakemake.io.AnnotatedString(value)[source]

Bases: str, AnnotatedStringInterface

property flags: Dict[str, Any]

is_callable()[source]

Return type: bool

new_from(new_value)[source]

class snakemake.io.AnnotatedStringInterface[source]

Bases: ABC

abstract property flags: Dict[str, Any]

abstract is_callable()[source]

Return type: bool

is_flagged(flag)[source]

Return type: bool

class snakemake.io.AttributeGuard(name)[source]: Bases: object

class snakemake.io.ExistsDict(cache)[source]: Bases: dict

class snakemake.io.IOCache(max_wait_time)[source]

Bases: IOCacheStorageInterface

clear()[source]

async collect_mtime(path)[source]

deactivate()[source]

property exists_in_storage

property exists_local

property mtime

async mtime_inventory(jobs, n_workers=8)[source]

property size

snakemake.io.IOFile(file, rule=None)[source]

class snakemake.io.InputFiles(toclone=None, fromdict=None, plainstr=False, strip_constraints=False, custom_map=None)[source]

Bases: Namedlist

property size

property size_files

property size_files_gb

property size_files_kb

property size_files_mb

property size_gb

property size_kb

property size_mb

class snakemake.io.Log(toclone=None, fromdict=None, plainstr=False, strip_constraints=False, custom_map=None)[source]: Bases: Namedlist

class snakemake.io.Namedlist(toclone=None, fromdict=None, plainstr=False, strip_constraints=False, custom_map=None)[source]

Bases: list

A list that additionally provides functions to name items. It is also hashable; however, the hash does not consider the item names.

get(key, default_value=None)[source]

items()[source]

keys()[source]

update(items)[source]
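
The two properties described for Namedlist, named access to list items and a hash that ignores the names, can be sketched as follows. NamedListSketch, its constructor arguments, and set_name are hypothetical and much simpler than the real class.

```python
class NamedListSketch(list):
    """Hypothetical list whose items can also be reached by name."""

    def __init__(self, items=(), names=None):
        super().__init__(items)
        self._names = {}
        for name, index in (names or {}).items():
            self.set_name(name, index)

    def set_name(self, name, index):
        # Remember the position and expose the item as an attribute.
        self._names[name] = index
        setattr(self, name, self[index])

    def keys(self):
        return self._names.keys()

    def items(self):
        return [(name, self[index]) for name, index in self._names.items()]

    def get(self, key, default_value=None):
        return self[self._names[key]] if key in self._names else default_value

    def __hash__(self):
        # Only the list contents contribute to the hash, not the names.
        return hash(tuple(self))


a = NamedListSketch(["x.txt", "y.txt"], names={"first": 0})
b = NamedListSketch(["x.txt", "y.txt"], names={"other": 1})
print(a.first, hash(a) == hash(b))
```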

class snakemake.io.OutputFiles(toclone=None, fromdict=None, plainstr=False, strip_constraints=False, custom_map=None)[source]: Bases: Namedlist

class snakemake.io.Params(toclone=None, fromdict=None, plainstr=False, strip_constraints=False, custom_map=None)[source]: Bases: Namedlist

class snakemake.io.PeriodicityDetector(min_repeat=20, max_repeat=100)[source]

Bases: object

is_periodic(value)[source]: Returns the periodic substring or None if not periodic.
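
A periodicity check of this kind can be sketched with a back-referencing regular expression. PeriodicitySketch is hypothetical; treating min_repeat and max_repeat as the number of additional copies after the first occurrence, and anchoring the match at the end of the string, are assumptions.

```python
import re


class PeriodicitySketch:
    """Hypothetical detector for a substring repeated at the end of a value."""

    def __init__(self, min_repeat=20, max_repeat=100):
        # Assumption: the counts refer to copies following the first occurrence.
        self.regex = re.compile(
            r"(?P<value>.+)(?P=value){%d,%d}$" % (min_repeat, max_repeat)
        )

    def is_periodic(self, value):
        """Return the periodic substring, or None if the value is not periodic."""
        match = self.regex.search(value)
        return match.group("value") if match else None


detector = PeriodicitySketch(min_repeat=3, max_repeat=10)
print(detector.is_periodic("abcabcabcabc"))
print(detector.is_periodic("abcdef"))
```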

class snakemake.io.QueueInfo(queue, finish_sentinel, last_checked=None, finished=False)[source]

Bases: object

consume(wildcards)[source]

finish_sentinel: Any

finished: bool = False

items: List[Any]

last_checked: Optional[float] = None

queue: Queue

update_last_checked()[source]

class snakemake.io.ReportObject(caption, category, subcategory, labels, patterns, htmlindex)

Bases: tuple

caption: Alias for field number 0

category: Alias for field number 1

htmlindex: Alias for field number 5

labels: Alias for field number 3

patterns: Alias for field number 4

subcategory: Alias for field number 2

class snakemake.io.Resources(toclone=None, fromdict=None, plainstr=False, strip_constraints=False, custom_map=None)[source]: Bases: Namedlist

class snakemake.io.Wildcards(toclone=None, fromdict=None, plainstr=False, strip_constraints=False, custom_map=None)[source]: Bases: Namedlist

snakemake.io.ancient(value)[source]: A flag for an input file that shall be considered ancient; i.e. its timestamp shall have no effect on which jobs to run.

snakemake.io.apply_wildcards(pattern, wildcards)[source]

snakemake.io.checkpoint_target(value)[source]

snakemake.io.contains_wildcard(path)[source]

snakemake.io.contains_wildcard_constraints(pattern)[source]

snakemake.io.directory(value)[source]: A flag to specify that output is a directory, rather than a file or named pipe.

snakemake.io.ensure(value, non_empty=False, sha256=None)[source]

snakemake.io.expand(*args, **wildcard_values)[source]

Expand wildcards in given filepatterns.

Arguments:

*args – first arg: filepatterns as a list or one single filepattern; second arg (optional): a function to combine wildcard values (itertools.product by default)
**wildcard_values – the wildcards as keyword arguments with their values as lists. If allow_missing=True is included, wildcards in the filepattern without values will stay unformatted.
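
The combination step described for expand can be sketched with itertools.product and str.format. The function expand_sketch is hypothetical and covers only the basic case: it does not implement allow_missing or any other special handling the real function may have.

```python
import itertools


def expand_sketch(patterns, combinator=itertools.product, **wildcard_values):
    """Hypothetical helper: fill each pattern with every combination of values."""
    if isinstance(patterns, str):
        patterns = [patterns]
    names = list(wildcard_values)
    return [
        pattern.format(**dict(zip(names, combo)))
        for pattern in patterns
        # The combinator decides how the value lists are paired up.
        for combo in combinator(*(wildcard_values[name] for name in names))
    ]


print(expand_sketch("{sample}.{ext}", sample=["a", "b"], ext=["txt", "csv"]))
print(expand_sketch("{sample}.{ext}", zip, sample=["a", "b"], ext=["txt", "csv"]))
```

Passing zip as the combinator pairs the values position by position instead of forming the full product.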

snakemake.io.flag(value, flag_type, flag_value=True)[source]

snakemake.io.from_queue(queue, finish_sentinel=None)[source]

snakemake.io.get_flag_value(value, flag_type)[source]

snakemake.io.get_wildcard_constraints(pattern)[source]

snakemake.io.get_wildcard_names(pattern)[source]

snakemake.io.glob_wildcards(pattern, files=None, followlinks=False)[source]: Glob the values of the wildcards by matching the given pattern to the filesystem. Returns a named tuple with a list of values for each wildcard.
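
The matching step can be illustrated on an explicit list of file names, which avoids touching the filesystem. glob_wildcards_sketch is hypothetical: it turns each {name} into a greedy named regex group and ignores wildcard constraints, which is an assumption about behaviour and not the library's code.

```python
import re
from collections import namedtuple


def glob_wildcards_sketch(pattern, files):
    """Hypothetical helper: collect wildcard values from matching file names."""
    names = []
    regex = ""
    position = 0
    for found in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[position:found.start()])
        name = found.group(1)
        if name in names:
            # A repeated wildcard must match the same text again.
            regex += f"(?P={name})"
        else:
            names.append(name)
            regex += f"(?P<{name}>.+)"
        position = found.end()
    regex += re.escape(pattern[position:]) + "$"

    Wildcards = namedtuple("Wildcards", names)
    result = Wildcards(*[[] for _ in names])
    for file in files:
        match = re.match(regex, file)
        if match:
            for name in names:
                getattr(result, name).append(match.group(name))
    return result


found = glob_wildcards_sketch(
    "data/{sample}_{read}.fastq",
    ["data/a_1.fastq", "data/b_2.fastq", "notes.txt"],
)
print(found.sample, found.read)
```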

snakemake.io.iocache(func)[source]

snakemake.io.is_callable(value)[source]

snakemake.io.is_flagged(value, flag)[source]

Return type: bool
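
The flag helpers listed in this module (flag, is_flagged, get_flag_value, and markers such as temp or protected) all revolve around a string that carries a flags dictionary. A minimal sketch of that idea follows; FlaggedStr and the *_sketch functions are hypothetical and only mirror the signatures shown here.

```python
class FlaggedStr(str):
    """Hypothetical str subclass that carries a dictionary of flags."""

    def __new__(cls, value):
        obj = super().__new__(cls, value)
        obj.flags = {}
        return obj


def flag_sketch(value, flag_type, flag_value=True):
    # Wrap plain strings so that they can carry flags; reuse flagged ones.
    flagged = value if isinstance(value, FlaggedStr) else FlaggedStr(value)
    flagged.flags[flag_type] = flag_value
    return flagged


def is_flagged_sketch(value, flag):
    return isinstance(value, FlaggedStr) and flag in value.flags


def get_flag_value_sketch(value, flag_type):
    return value.flags.get(flag_type) if isinstance(value, FlaggedStr) else None


def temp_sketch(value):
    """Hypothetical marker built on top of flag_sketch."""
    return flag_sketch(value, "temp")


path = temp_sketch("intermediate.bam")
print(path, is_flagged_sketch(path, "temp"), is_flagged_sketch("plain.bam", "temp"))
```

Because the flagged value is still a str, it compares equal to the plain path and can be used wherever a path string is expected.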

snakemake.io.limit(pattern, **wildcards)[source]

Limit wildcards to the given values.

Arguments:

**wildcards – the wildcards as keyword arguments with their values as lists

snakemake.io.local(value)[source]: Mark a file as a local file. This disables the application of a default storage provider.

snakemake.io.lutime(file, times)[source]

snakemake.io.multiext(prefix, *extensions)[source]: Expand a given prefix with multiple extensions (e.g. .txt, .csv, _peaks.bed, …).
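
As a rough illustration of what expanding a prefix with multiple extensions means, the hypothetical multiext_sketch below concatenates the prefix with each extension. It is a sketch of the idea only and does not reproduce any extra handling in the real function.

```python
def multiext_sketch(prefix, *extensions):
    """Hypothetical helper: one path per extension, all sharing the prefix."""
    return [prefix + extension for extension in extensions]


print(multiext_sketch("results/peaks", ".txt", ".csv", "_peaks.bed"))
```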

snakemake.io.pipe(value)[source]

snakemake.io.pretty_print_iofile(iofile)[source]

Return type: str

snakemake.io.protected(value)[source]: A flag for a file that shall be write-protected after creation.

snakemake.io.regex_from_filepattern(filepattern)[source]

async snakemake.io.remove(file, remove_non_empty_dir=False, only_local=False)[source]

snakemake.io.repeat(value, n_repeat)[source]: Flag benchmark records with the number of repeats.

snakemake.io.report(value, caption=None, category=None, subcategory=None, labels=None, patterns=[], htmlindex=None)[source]

Flag an output file or directory for inclusion in reports.

In the case of a directory, files to include can be specified via a glob pattern (default: *).

Arguments:

value – File or directory.
caption – Path to a .rst file with a textual description of the result.
category – Name of the (optional) category in which the result should be displayed in the report.
subcategory – Name of the (optional) subcategory.
labels – Dict of strings (may contain wildcard expressions) that will be used as columns when displaying result tables.
patterns – Wildcard patterns for selecting files if a directory is given (this is used as input for snakemake.io.glob_wildcards). Patterns shall not include the path to the directory itself.

snakemake.io.service(value)[source]

snakemake.io.sourcecache_entry(value, orig_path_or_uri)[source]

snakemake.io.strip_wildcard_constraints(pattern)[source]: Return a string that does not contain any wildcard constraints.
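
Removing constraints from a pattern such as {sample,[a-z]+} can be sketched with a single regular expression substitution. strip_constraints_sketch is hypothetical and assumes that constraints contain no braces of their own (so a constraint like \d{3} is out of scope for this sketch).

```python
import re

# A wildcard is a name, optionally followed by a comma and a constraint.
_WILDCARD = re.compile(r"\{\s*(?P<name>\w+)\s*(,\s*(?P<constraint>[^{}]+))?\}")


def strip_constraints_sketch(pattern):
    """Hypothetical helper: keep only the wildcard names in a pattern."""
    return _WILDCARD.sub(lambda match: "{" + match.group("name") + "}", pattern)


print(strip_constraints_sketch(r"{sample,[a-z]+}_{rep,\d+}.txt"))
```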

snakemake.io.temp(value)[source]: A flag for an input or output file that shall be removed after usage.

snakemake.io.temporary(value)[source]: An alias for temp.

snakemake.io.touch(value)[source]

snakemake.io.unpack(value)[source]

snakemake.io.update_wildcard_constraints(pattern, wildcard_constraints, global_wildcard_constraints)[source]

Update wildcard constraints

Parameters:

pattern (str) – pattern on which to update constraints
wildcard_constraints (dict) – dictionary of wildcard:constraint key-value pairs
global_wildcard_constraints (dict) – dictionary of wildcard:constraint key-value pairs

async snakemake.io.wait_for_files(files, latency_wait=3, wait_for_local=False, ignore_pipe_or_service=False, consider_local=frozenset({}))[source]: Wait for given files to be present in the filesystem.

snakemake.ioflags module

snakemake.ioflags.before_update(value)[source]: Flag an input file to be used as is in storage/on-disk before being updated in a later rule. This flag leads to the input file being considered as not being created by any other job.

snakemake.ioflags.register_in_globals(_globals)[source]

snakemake.ioflags.update(value)[source]: A flag for an output file that shall be updated instead of overwritten.

snakemake.jobs module

snakemake.logging module

class snakemake.logging.ColorizingTextHandler(nocolor=False, stream=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>, mode=None, formatter=None, filter=None)[source]

Bases: StreamHandler

Custom handler that combines colorization and Snakemake-specific formatting.

BLACK = 0

BLUE = 4

BOLD_SEQ = '\x1b[1m'

COLOR_SEQ = '\x1b[%dm'

CYAN = 6

GREEN = 2

MAGENTA = 5

RED = 1

RESET_SEQ = '\x1b[0m'

WHITE = 7

YELLOW = 3

can_color_tty(mode)[source]

Colors are supported when:

1. Terminal is not “dumb”
2. Running in subprocess mode
3. Using a TTY on non-Windows systems

colors = {'CRITICAL': 5, 'DEBUG': 4, 'ERROR': 1, 'INFO': 2, 'WARNING': 3}

decorate(record, message)[source]: Add color to the log message based on its level.

emit(record)[source]: Emit a log message with custom formatting and color.

property is_tty

yellow_info_events = [LogEvent.RUN_INFO, LogEvent.SHELLCMD, LogEvent.JOB_STARTED, None]

class snakemake.logging.DefaultFilter(quiet, debug_dag, dryrun)[source]

Bases: object

filter(record)[source]

class snakemake.logging.DefaultFormatter(quiet, show_failed_logs=False, printshellcmds=False)[source]

Bases: Formatter

format(record)[source]: Override format method to format Snakemake-specific log messages.

format_d3dag(msg)[source]: Format for d3dag log.

format_dag_debug(msg)[source]: Format for dag_debug log.

format_group_error(msg)[source]: Format for group_error log.

format_group_info(msg)[source]: Format for group_info log.

format_host(msg)[source]: Format for host log.

format_info(msg)[source]: Format ‘info’ level messages.

format_job_error(msg)[source]: Format for job_error log.

format_job_finished(msg)[source]: Format for job_finished log.

format_job_info(msg)[source]: Format for job_info log.

format_progress(msg)[source]: Format for progress log.

format_run_info(msg)[source]: Format the run_info log messages.

format_shellcmd(msg)[source]: Format for shellcmd log.

class snakemake.logging.LoggerManager(logger)[source]

Bases: object

cleanup_logfile()[source]

get_logfile()[source]

logfile_hint()[source]

setup(mode, handlers, settings)[source]

setup_logfile()[source]

stop()[source]

snakemake.logging.format_dict(dict_like, omit_keys=None, omit_values=None)[source]

snakemake.logging.format_percentage(done, total)[source]: Format percentage from given fraction while avoiding superfluous precision.
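
One plausible reading of "avoiding superfluous precision" is to add decimal places only when rounding would otherwise show 0% or 100% for a partially finished run. The sketch below follows that reading; format_percentage_sketch and the cap of ten decimals are assumptions, not the library's implementation.

```python
def format_percentage_sketch(done, total):
    """Hypothetical helper: shortest percentage that is not misleading."""
    if done == total:
        return "100%"
    if done == 0:
        return "0%"
    fraction = done / total
    precision = 0
    text = f"{fraction:.{precision}%}"
    # Add decimals only while rounding hides that the work is partly done.
    while precision < 10 and float(text.rstrip("%")) in (0.0, 100.0):
        precision += 1
        text = f"{fraction:.{precision}%}"
    return text


for done, total in [(1, 2), (999, 1000), (1, 1000), (1, 3)]:
    print(done, total, format_percentage_sketch(done, total))
```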

snakemake.logging.format_resource_names(resources, omit_resources=['_cores', '_nodes'])[source]

snakemake.logging.format_resources(dict_like, *, omit_keys={'_cores', '_nodes'}, omit_values=None)

snakemake.logging.format_wildcards(dict_like, omit_keys=None, omit_values=None)

snakemake.logging.get_event_level(record)[source]

Gets snakemake log level from a log record. If there is no snakemake log level, returns the log record’s level name.

Return type: tuple[LogEvent, str]
Parameters: record (logging.LogRecord)
Returns: tuple[LogEvent, str]

snakemake.logging.is_quiet_about(quiet, msg_type)[source]

snakemake.logging.show_logs(logs)[source]: Helper method to show logs.

snakemake.logging.timestamp()[source]: Helper method to format the timestamp.

snakemake.modules module

class snakemake.modules.ModuleInfo(workflow, name, snakefile=None, meta_wrapper=None, config=None, skip_validation=False, replace_prefix=None, prefix=None)[source]

Bases: object

get_rule_whitelist(rules)[source]

get_snakefile()[source]

get_wrapper_tag()[source]

use_rules(rules=None, name_modifier=None, exclude_rules=None, ruleinfo=None, skip_global_report_caption=False)[source]

class snakemake.modules.WorkflowModifier(workflow, parent_modifier=None, globals=None, config=None, base_snakefile=None, skip_configfile=False, skip_validation=False, skip_global_report_caption=False, resolved_rulename_modifier=None, local_rulename_modifier=None, rule_whitelist=None, rule_exclude_list=None, ruleinfo_overwrite=None, allow_rule_overwrite=False, replace_prefix=None, prefix=None, replace_wrapper_tag=None, namespace=None, rule_proxies=None)[source]

Bases: object

inherit_rule_proxies(child_modifier)[source]

modify_path(path, property=None)[source]

modify_rulename(rulename)[source]

modify_wrapper_uri(wrapper_uri, pattern=re.compile('^master/'))[source]

skip_rule(rulename)[source]

snakemake.modules.get_name_modifier_func(rules=None, name_modifier=None, parent_modifier=None)[source]

snakemake.notebook module

class snakemake.notebook.JupyterNotebook(path, cache_path, source, basedir, input_, output, params, wildcards, threads, resources, log, config, rulename, conda_env, conda_base_path, container_img, singularity_args, env_modules, bench_record, jobid, bench_iteration, cleanup_scripts, shadow_dir, is_local)[source]

Bases: ScriptBase

draft()[source]

draft_and_edit(listen)[source]

editable = True

execute_script(fname, edit=None)[source]

abstract get_interpreter_exec()[source]

abstract get_language_name()[source]

insert_preamble_cell(preamble, notebook)[source]

remove_preamble_cell(notebook)[source]

write_script(preamble, fd)[source]

class snakemake.notebook.PythonJupyterNotebook(path, cache_path, source, basedir, input_, output, params, wildcards, threads, resources, log, config, rulename, conda_env, conda_base_path, container_img, singularity_args, env_modules, bench_record, jobid, bench_iteration, cleanup_scripts, shadow_dir, is_local)[source]

Bases: JupyterNotebook

get_interpreter_exec()[source]

get_language_name()[source]

get_preamble()[source]

class snakemake.notebook.RJupyterNotebook(path, cache_path, source, basedir, input_, output, params, wildcards, threads, resources, log, config, rulename, conda_env, conda_base_path, container_img, singularity_args, env_modules, bench_record, jobid, bench_iteration, cleanup_scripts, shadow_dir, is_local)[source]

Bases: JupyterNotebook

get_interpreter_exec()[source]

get_language_name()[source]

get_preamble()[source]

snakemake.notebook.get_cell_sources(source)[source]

snakemake.notebook.get_exec_class(language)[source]

snakemake.notebook.notebook(path, basedir, input, output, params, wildcards, threads, resources, log, config, rulename, conda_env, conda_base_path, container_img, singularity_args, env_modules, bench_record, jobid, bench_iteration, cleanup_scripts, shadow_dir, edit, sourcecache_path, runtime_sourcecache_path)[source]: Load a script from the given basedir + path and execute it.

snakemake.output_index module

class snakemake.output_index.OutputIndex(rules)[source]

Bases: object

Lookup structure for rules that can be queried by the output products they create.

match(targetfile)[source]

Returns all rules that match the given target file, considering only the prefix and suffix up to the first wildcard.

To further verify the match, the returned rules should be checked with Rule.is_producer(targetfile).

Return type: set[Rule]

match_producers(targetfile)[source]

Returns all rules that match and produce the given target file.

Return type: set[Rule]
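
The prefix and suffix matching described for match can be sketched with plain string operations. OutputIndexSketch is hypothetical: it takes the text before the first wildcard as the prefix and the text after the last wildcard as the suffix, stores rule names instead of Rule objects, and returns candidates that would still need a full check, as the description of match recommends.

```python
class OutputIndexSketch:
    """Hypothetical index from output patterns to the rules that declare them."""

    def __init__(self, rule_outputs):
        # rule_outputs maps a rule name to its list of output patterns.
        self._entries = []
        for rule, patterns in rule_outputs.items():
            for pattern in patterns:
                first = pattern.find("{")
                last = pattern.rfind("}")
                if first == -1:
                    prefix, suffix = pattern, ""
                else:
                    prefix, suffix = pattern[:first], pattern[last + 1:]
                self._entries.append((prefix, suffix, rule))

    def match(self, targetfile):
        """Return candidate rules whose prefix and suffix fit the target file."""
        return {
            rule
            for prefix, suffix, rule in self._entries
            if len(targetfile) >= len(prefix) + len(suffix)
            and targetfile.startswith(prefix)
            and targetfile.endswith(suffix)
        }


index = OutputIndexSketch(
    {
        "align": ["mapped/{sample}.bam"],
        "sort": ["sorted/{sample}.bam"],
        "plot": ["plots/{sample}.pdf"],
    }
)
print(index.match("mapped/a.bam"))
```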

snakemake.parser module

snakemake.path_modifier module

class snakemake.path_modifier.PathModifier(replace_prefix, prefix, workflow)[source]

Bases: object

apply_default_storage(path)[source]: Apply the defined default remote provider to the given path and return the updated _IOFile. Asserts that default remote provider is defined.

property modifies_prefixes: bool

modify(path, property=None)[source]

replace_prefix(path, property=None)[source]

snakemake.persistence module

snakemake.profiles module

class snakemake.profiles.ProfileConfigFileParser[source]

Bases: YAMLConfigFileParser

parse(stream)[source]

Parses the keys and values from a config file.

NOTE: For keys that were specified to configargparse as action="store_true" or "store_false", the config file value must be one of: "yes", "no", "on", "off", "true", "false". Otherwise an error will be raised.

Parameters: stream (IO) – A config file input stream (such as an open file object).
Returns: Items where the keys are strings and the values are either strings or lists (e.g. to support config file formats like YAML which allow lists).
Return type: OrderedDict

snakemake.resources module

class snakemake.resources.DefaultResources(args=None, from_other=None, mode='full')[source]

Bases: object

property args

bare_defaults = {'tmpdir': 'system_tmpdir'}

classmethod decode_arg(arg)[source]

defaults = {'disk_mb': 'max(2*input.size_mb, 1000)', 'mem_mb': 'min(max(2*input.size_mb, 1000), 8000)', 'tmpdir': 'system_tmpdir'}

classmethod encode_arg(name, value)[source]

set_resource(name, value)[source]
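
The default expressions listed in defaults can be worked through for a few input sizes. The helper default_resources_sketch is hypothetical and simply evaluates the disk_mb and mem_mb expressions shown above with input.size_mb passed in as a plain number.

```python
def default_resources_sketch(input_size_mb):
    """Hypothetical helper: evaluate the default expressions for one input size."""
    return {
        # 'max(2*input.size_mb, 1000)'
        "disk_mb": max(2 * input_size_mb, 1000),
        # 'min(max(2*input.size_mb, 1000), 8000)'
        "mem_mb": min(max(2 * input_size_mb, 1000), 8000),
    }


for size in (100, 3000, 10000):
    print(size, default_resources_sketch(size))
```

Small inputs are raised to the 1000 MB floor, and mem_mb is capped at 8000 MB while disk_mb keeps growing with the input size.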

class snakemake.resources.GroupResources[source]

Bases: object

classmethod basic_layered(toposorted_jobs, constraints, run_local, additive_resources=None, sortby=None)[source]

Basic implementation of group job resources calculation

Each toposort level is individually sorted into a series of layers, where each layer fits within the constraints. Resource constraints represent a “width” into which the layer must fit. For instance, with a mem_mb constraint of 5G, all the jobs in a single layer must together not consume more than 5G of memory. Any jobs that would exceed this constraint are pushed into a new layer. The overall width for the entire group job is equal to the width of the widest layer.

Additive resources (by default, “runtime”) represent the “height” of the layer. They are not directly constrained, but their value will be determined by the sorting of jobs based on other constraints. Each layer’s height is equal to the height of its tallest job. For instance, a layer containing a 3hr job will have a runtime height of 3 hr. The total height of the entire group job will be the sum of the heights of all the layers.

Note that both height and width are multidimensional, so layer widths will be calculated with respect to every constraint created by the user.

In this implementation, there is no mixing of layers, which may lead to “voids”. For instance, a layer containing a tall, skinny job of 3hr length and 1G mem combined with a short, fat job of 10min length and 20G memory would have a 2hr 50min period where 19G of memory are not used. In practice, this void will be filled by the actual snakemake subprocess, which performs real-time scheduling of jobs as resources become available. But it may lead to overestimation of resource requirements.

To help mitigate against voids, this implementation sorts the jobs within a toposort level before assignment to layers. Jobs are first sorted by their overall width relative to the available constraints, so the fattest jobs will be grouped together on the same layer. Jobs are then sorted by the resources specified in sortby, by default “runtime”, so jobs of similar length will be grouped on the same layer.

Users can help mitigate against voids by grouping jobs of similar resource dimensions. Eclectic groups of various runtimes and resource consumptions will not be estimated as efficiently as groups of homogeneous consumptions.

Parameters:

toposorted_jobs (list of lists of jobs) – Jobs sorted into toposort levels: the jobs in each level only depend on jobs in previous levels.
constraints (dict of str -> int) – Upper limit of resource allowed. Resources without constraints will be treated as infinite
run_local (bool) – True if the group is being run in the local process, rather than being submitted. Relevant for Pipe groups and Service groups
additive_resources (list of str, optional) – Resources to be treated as the “height” of each layer, i.e. to be summed across layers.
sortby (list of str, optional) – Resources by which to sort jobs prior to layer assignment.

Returns:

Total resource requirements of the group job

Return type:

Dict of str -> int,str

Raises:

WorkflowError – Raised if an individual job requires more resources than the constraints allow (chiefly relevant for pipe groups)
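
The width and height bookkeeping described above can be sketched for the simplest case of one constrained resource (mem_mb) and one additive resource (runtime). layered_estimate is hypothetical: jobs are plain dicts, the sort key is a simplification of the sorting described in the text, and ValueError stands in for WorkflowError.

```python
def layered_estimate(toposorted_jobs, mem_limit):
    """Hypothetical helper: estimate group resources with simple layering."""
    layers = []
    for level in toposorted_jobs:
        # Fattest jobs first, then longest, so that similar jobs share a layer.
        ordered = sorted(
            level, key=lambda job: (job["mem_mb"], job["runtime"]), reverse=True
        )
        current, used = [], 0
        for job in ordered:
            if job["mem_mb"] > mem_limit:
                raise ValueError("a single job exceeds the constraint")
            # Jobs that would exceed the constraint start a new layer.
            if current and used + job["mem_mb"] > mem_limit:
                layers.append(current)
                current, used = [], 0
            current.append(job)
            used += job["mem_mb"]
        if current:
            layers.append(current)
    # Width: the widest layer. Height: sum of each layer's tallest job.
    width = max((sum(j["mem_mb"] for j in layer) for layer in layers), default=0)
    height = sum(max(j["runtime"] for j in layer) for layer in layers)
    return {"mem_mb": width, "runtime": height}


levels = [
    [
        {"mem_mb": 3000, "runtime": 180},
        {"mem_mb": 3000, "runtime": 10},
        {"mem_mb": 2000, "runtime": 60},
    ],
    [{"mem_mb": 1000, "runtime": 30}],
]
print(layered_estimate(levels, mem_limit=5000))
```

In this example the first level splits into two layers (3000 MB, then 3000 + 2000 MB), so the runtime adds up across three layers while the memory requirement is that of the widest one.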

class snakemake.resources.ParsedResource(orig_arg, value)[source]

Bases: object

orig_arg: str

value: Any

class snakemake.resources.ResourceScopes(*args, **kwargs)[source]

Bases: UserDict

Index of resource scopes, where each entry is ‘RESOURCE’: ‘SCOPE’

Each resource may be scoped as local, global, or excluded. Any resources not specified are considered global.

classmethod defaults()[source]

property excluded

Resources not submitted to cluster jobs

These resources are used exclusively by the global scheduler. The primary case is for additive resources in GroupJobs such as runtime, which would not be properly handled by the scheduler in the sub-snakemake instance. This scope is not currently intended for use by end-users and is thus not documented.

Return type: set

property globals

Resources tallied across all job and group submissions.

Return type: set

property locals

Resources that are not tallied by the global scheduler when submitting jobs.

Each submitted job or group gets its own pool of the resource, as specified under --resources.

Return type: set
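As an illustration of the scope index described above, here is a minimal stand-in built on collections.UserDict. It does not import Snakemake; the class name ScopeIndex and the helper scope_of are hypothetical, and only the three scope names and the "unspecified means global" rule are taken from the description.

```python
from collections import UserDict

VALID_SCOPES = {"local", "global", "excluded"}


class ScopeIndex(UserDict):
    """Hypothetical stand-in for an index of 'RESOURCE': 'SCOPE' entries."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        invalid = set(self.data.values()) - VALID_SCOPES
        if invalid:
            raise ValueError(f"invalid scopes: {sorted(invalid)}")

    def _with_scope(self, scope):
        return {res for res, s in self.data.items() if s == scope}

    @property
    def locals(self):
        return self._with_scope("local")

    @property
    def globals(self):
        return self._with_scope("global")

    @property
    def excluded(self):
        return self._with_scope("excluded")

    def scope_of(self, resource):
        # Resources that were never listed are considered global.
        return self.data.get(resource, "global")


scopes = ScopeIndex(mem_mb="local", runtime="excluded", gpu="global")
```

Here scopes.locals contains only mem_mb, and scopes.scope_of("disk_mb") falls back to "global" because disk_mb was never listed.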

snakemake.resources.eval_resource_expression(val, threads_arg=True)[source]

snakemake.resources.infer_resources(name, value, resources)[source]: Infer resources from a given one, if possible.

snakemake.resources.is_humanfriendly_resource(value)[source]

snakemake.resources.parse_resources(resources_args, fallback=None)[source]: Parse resources from args.

snakemake.ruleinfo module

class snakemake.ruleinfo.InOutput(paths, kwpaths, modifier)

Bases: tuple

kwpaths: Alias for field number 1

modifier: Alias for field number 2

paths: Alias for field number 0

class snakemake.ruleinfo.RuleInfo(func=None)[source]

Bases: object

apply_modifier(modifier, rulename, prefix_replacables={'benchmark', 'input', 'log', 'output'})[source]: Update this ruleinfo with the given one (used for ‘use rule’ overrides).

ref_attributes = {'func', 'path_modifier'}

snakemake.rules module

class snakemake.rules.Rule(name, workflow, lineno=None, snakefile=None)[source]

Bases: RuleInterface

apply_input_function(func, wildcards, incomplete_checkpoint_func=<function Rule.<lambda>>, raw_exceptions=False, groupid=None, **aux_params)[source]

apply_path_modifier(item, path_modifier, property=None)[source]

property benchmark

check_caching()[source]

check_output_duplicates()[source]: Check Namedlist for duplicate entries and raise a WorkflowError on problems. Does not raise if the entry is empty.

check_wildcards(wildcards)[source]

property conda_env

property container_img

expand_benchmark(wildcards)[source]

expand_conda_env(wildcards, params=None, input=None)[source]

expand_group(wildcards)[source]: Expand the group given wildcards.

expand_input(wildcards, groupid=None)[source]

expand_log(wildcards)[source]

expand_output(wildcards)[source]

expand_params(wildcards, input, output, job, omit_callable=False)[source]

expand_resources(wildcards, input, attempt, skip_evaluation=None)[source]

get_some_product()[source]

static get_wildcard_len(wildcards)[source]

Return the length of the given wildcard values.

Arguments: wildcards – a dict of wildcards

get_wildcards(requested_output, wildcards_dict=None)[source]

Return wildcard dictionary by 1. trying to format the output with the given wildcards and comparing with the requested output 2. matching regular expression output files to the requested concrete ones.

Arguments: requested_output – a concrete filepath
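The two steps named above (format and compare, then regular-expression matching) can be sketched without Snakemake as follows. The function names are hypothetical, and the wildcard syntax is reduced to plain {name} placeholders without constraints.

```python
import re


def pattern_to_regex(pattern):
    """Turn 'mapped/{sample}.bam' into a compiled regex with named groups."""
    parts, seen, last = [], set(), 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        name = m.group(1)
        parts.append(re.escape(pattern[last:m.start()]))
        if name in seen:
            # A repeated wildcard must match the same value again.
            parts.append(f"(?P={name})")
        else:
            parts.append(f"(?P<{name}>.+)")
            seen.add(name)
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    return re.compile("".join(parts) + "$")


def get_wildcards_sketch(output_pattern, requested_output, wildcards_dict=None):
    # Step 1: format the pattern with the given wildcards and compare.
    if wildcards_dict is not None:
        try:
            if output_pattern.format(**wildcards_dict) == requested_output:
                return dict(wildcards_dict)
        except KeyError:
            pass  # not all wildcards were supplied, fall through to matching
    # Step 2: match the pattern as a regular expression against the concrete path.
    match = pattern_to_regex(output_pattern).match(requested_output)
    return match.groupdict() if match else None
```

For example, get_wildcards_sketch("mapped/{sample}.bam", "mapped/A.bam") yields a dict that maps sample to "A", while a path that does not fit the pattern yields None.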

property group

has_products()[source]

has_wildcards()[source]: Return True if rule contains wildcards.

property input

property is_cwl

property is_notebook

is_producer(requested_output)[source]: Returns True if this rule is a producer of the requested output.

property is_run

property is_script

property is_shell

property is_template_engine

property is_wrapper

property lineno

property log

property name

property output

property params

products(include_logfiles=True)[source]

register_wildcards(wildcard_names)[source]

property restart_times

set_input(*input, **kwinput)[source]

Add a list of input files. Recursive lists are flattened.

Arguments: input – the list of input files

set_log(*logs, **kwlogs)[source]

set_output(*output, **kwoutput)[source]

Add a list of output files. Recursive lists are flattened.

After creating the output files, they are checked for duplicates.

Arguments: output – the list of output files

set_params(*params, **kwparams)[source]

set_wildcard_constraints(**kwwildcard_constraints)[source]

property snakefile

update_wildcard_constraints()[source]

property wildcard_constraints

property wildcard_names

class snakemake.rules.RuleProxy(rule)[source]

Bases: object

property benchmark

property input

property log

property output

property params

class snakemake.rules.Ruleorder[source]

Bases: object

add(*rulenames)[source]: Records the order of given rules as rule1 > rule2 > rule3, …

compare(rule1, rule2)[source]: Return whether rule2 has a higher priority than rule1.
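A minimal sketch of the ordering semantics stated in the two docstrings above, using rule names as plain strings. RuleOrderSketch is a hypothetical class, not the Snakemake one, and it only consults the first clause that names both rules.

```python
class RuleOrderSketch:
    """Records clauses such as rule1 > rule2 > rule3 and compares two rules."""

    def __init__(self):
        self.order = []

    def add(self, *rulenames):
        # Earlier names in a clause have higher priority.
        self.order.append(list(rulenames))

    def compare(self, rule1, rule2):
        """Return True if rule2 has a higher priority than rule1."""
        for clause in self.order:
            if rule1 in clause and rule2 in clause:
                return clause.index(rule2) < clause.index(rule1)
        # No clause mentions both rules, so no preference is recorded.
        return False


order = RuleOrderSketch()
order.add("fast_align", "slow_align")
```

In this sketch, order.compare("slow_align", "fast_align") is True because fast_align comes first in the recorded clause.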

snakemake.scheduler module

snakemake.shell module

class snakemake.shell.shell(cmd, *args, iterable=False, read=False, bench_record=None, **kwargs)[source]

Bases: object

classmethod check_output(cmd, **kwargs)[source]

classmethod cleanup()[source]

conda_block_conflicting_envvars = True

classmethod executable(cmd)[source]

classmethod get_executable()[source]

static iter_stdout(proc, cmd, tmpdir)[source]

classmethod kill(jobid)[source]

classmethod prefix(prefix)[source]

classmethod suffix(suffix)[source]

classmethod terminate(jobid)[source]

classmethod win_command_prefix(cmd)[source]: The command prefix used on Windows when specifying an explicit shell executable. This would be “-c” for bash. Note that if no explicit executable is set, commands are executed with Popen(…, shell=True), which uses COMSPEC on Windows, where this prefix is not needed.

snakemake.sourcecache module

class snakemake.sourcecache.GenericSourceFile(path_or_uri)[source]

Bases: SourceFile

get_filename()[source]

get_path_or_uri()[source]

property is_local

is_persistently_cacheable()[source]

class snakemake.sourcecache.GithubFile(repo=None, path=None, tag=None, branch=None, commit=None, host=None)[source]

Bases: HostingProviderFile

get_path_or_uri()[source]

class snakemake.sourcecache.GitlabFile(repo=None, path=None, tag=None, branch=None, commit=None, host=None)[source]

Bases: HostingProviderFile

get_path_or_uri()[source]

class snakemake.sourcecache.HostingProviderFile(repo=None, path=None, tag=None, branch=None, commit=None, host=None)[source]

Bases: SourceFile

Marker for denoting github source files from releases.

get_basedir()[source]

get_filename()[source]

property is_local

is_persistently_cacheable()[source]

join(path)[source]

mtime()[source]

If possible, return mtime of the file. Otherwise, return None.

Return type: Optional[float]

property ref

valid_repo = re.compile('^.+/.+$')

class snakemake.sourcecache.LocalGitFile(repo_path, path, tag=None, ref=None, commit=None)[source]

Bases: SourceFile

get_basedir()[source]

get_filename()[source]

get_path_or_uri()[source]

property is_local

is_persistently_cacheable()[source]

join(path)[source]

property ref

class snakemake.sourcecache.LocalSourceFile(path)[source]

Bases: SourceFile

abspath()[source]

get_filename()[source]

get_path_or_uri()[source]

property is_local

is_persistently_cacheable()[source]

isabs()[source]

mtime()[source]: If possible, return mtime of the file. Otherwise, return None.

simplify_path()[source]

class snakemake.sourcecache.SourceCache(cache_path, runtime_cache_path=None)[source]

Bases: object

cache_whitelist = ['https://raw.githubusercontent.com/snakemake/snakemake-wrappers/\\d+\\.\\d+.\\d+']

exists(source_file)[source]

get_path(source_file)[source]

open(source_file, mode='r')[source]

property runtime_cache_path

class snakemake.sourcecache.SourceFile[source]

Bases: ABC

get_basedir()[source]

get_cache_path()[source]

abstract get_filename()[source]

abstract get_path_or_uri()[source]

abstract property is_local

abstract is_persistently_cacheable()[source]

join(path)[source]

mtime()[source]: If possible, return mtime of the file. Otherwise, return None.

simplify_path()[source]

snakemake.sourcecache.infer_source_file(path_or_uri, basedir=None)[source]

snakemake.spawn_jobs module

snakemake.storage module

snakemake.target_jobs module

snakemake.target_jobs.parse_target_jobs_cli_args(target_jobs_args)[source]

snakemake.utils module

class snakemake.utils.AlwaysQuotedFormatter(quote_func=None, *args, **kwargs)[source]

Bases: QuotedFormatter

Subclass of QuotedFormatter that always quotes.

Usage is identical to QuotedFormatter, except that it always acts like “q” was appended to the format spec, unless u (for unquoted) is appended.

format_field(value, format_spec)[source]

class snakemake.utils.Paramspace(dataframe, filename_params=None, param_sep='~', filename_sep='_', single_wildcard=None)[source]

Bases: object

A wrapper for pandas dataframes that provides helpers for using them as a parameter space in Snakemake.

This is heavily inspired by @soumitrakp's work on JUDI (https://github.com/ncbi/JUDI).

By default, a directory structure with one folder level per parameter is created (e.g. column1~{column1}/column2~{column2}/***).

The exact behavior can be tweaked with four parameters:

filename_params takes a list of column names of the passed dataframe. These names are used to build the filename (separated by ‘_’) in the order in which they are passed. All remaining parameters will be used to generate a directory structure. Example for a data frame with four columns named column1 to column4:

Paramspace(df, filename_params=["column3", "column2"]) ->

column1~{value1}/column4~{value4}/column3~{value3}_column2~{value2}

If filename_params="*", all columns of the dataframe are encoded into the filename instead of parent directories.

param_sep takes a string which is used to join the column name and column value in the generated paths (Default: ‘~’). Example:

Paramspace(df, param_sep=":") ->

column1:{value1}/column2:{value2}/column3:{value3}/column4:{value4}

filename_sep takes a string which is used to join the parameter entries listed in filename_params in the generated paths (Default: ‘_’). Example:

Paramspace(df, filename_params="*", filename_sep="-") ->

column1~{value1}-column2~{value2}-column3~{value3}-column4~{value4}

single_wildcard takes a string that names a single wildcard used to encode all column values, replacing the default behavior of using one wildcard per dataframe column. The value of that wildcard for individual instances of the paramspace is still controlled by the other arguments above. The single_wildcard mechanism can be handy if you want to define a rule that shall be used for multiple paramspaces with different columns.

instance(wildcards)[source]: Obtain instance (dataframe row) with the given wildcard values.

property instance_patterns: Iterator over all instances of the parameter space (dataframe rows), formatted as file patterns of the form column1~{value1}/column2~{value2}/… or of the provided custom pattern.

property wildcard_pattern: Wildcard pattern over all columns of the underlying dataframe of the form column1~{column1}/column2~{column2}/*** or of the provided custom pattern.
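To make the path layout concrete, the following sketch rebuilds the pattern logic described above in plain Python, with rows given as dicts instead of a pandas dataframe. The two functions are hypothetical helpers, not part of snakemake.utils, and single_wildcard is not covered.

```python
def wildcard_pattern(columns, filename_params=None, param_sep="~", filename_sep="_"):
    """Build a pattern such as column1~{column1}/column2~{column2}."""
    if filename_params == "*":
        filename_params = list(columns)
    filename_params = list(filename_params or [])
    # Columns not used in the filename become one directory level each.
    dir_params = [c for c in columns if c not in filename_params]
    dirs = [f"{c}{param_sep}{{{c}}}" for c in dir_params]
    fname = filename_sep.join(f"{c}{param_sep}{{{c}}}" for c in filename_params)
    return "/".join(dirs + ([fname] if fname else []))


def instance_patterns(rows, columns, **kwargs):
    """Fill the pattern with the values of each row (one dict per row)."""
    pattern = wildcard_pattern(columns, **kwargs)
    return [pattern.format(**row) for row in rows]


columns = ["column1", "column2", "column3", "column4"]
print(wildcard_pattern(columns, filename_params=["column3", "column2"]))
```

The printed pattern has the same shape as the filename_params example above: column1 and column4 become directory levels, and column3 and column2 are joined into the filename in the order given.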

class snakemake.utils.QuotedFormatter(quote_func=None, *args, **kwargs)[source]

Bases: Formatter

Subclass of string.Formatter that supports quoting.

Using this formatter, any field can be quoted after formatting by appending “q” to its format string. By default, shell quoting is performed using “shlex.quote”, but you can pass a different quote_func to the constructor. The quote_func simply has to take a string argument and return a new string representing the quoted form of the input string.

Note that if an element after formatting is the empty string, it will not be quoted.

format_field(value, format_spec)[source]
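The quoting behavior described above can be sketched with the standard library alone. QuotingFormatter below is a hypothetical stand-in that follows the description (a trailing “q” in the format spec triggers quoting, empty results are left unquoted); it is not the class shipped in snakemake.utils.

```python
import shlex
import string


class QuotingFormatter(string.Formatter):
    """Formatter that shell-quotes a field when its format spec ends in 'q'."""

    def __init__(self, quote_func=shlex.quote):
        super().__init__()
        self.quote_func = quote_func

    def format_field(self, value, format_spec):
        do_quote = format_spec.endswith("q")
        if do_quote:
            # Strip the marker so the remaining spec is formatted as usual.
            format_spec = format_spec[:-1]
        formatted = super().format_field(value, format_spec)
        if do_quote and formatted != "":
            formatted = self.quote_func(formatted)
        return formatted


fmt = QuotingFormatter()
command = fmt.format("cat {path:q} > {out}", path="my file.txt", out="out.txt")
```

Only the field marked with “q” is quoted, so the path containing a space becomes a single shell argument while out is inserted unchanged.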

snakemake.utils.R(code)[source]

Execute R code.

This is deprecated in favor of the script directive. This function executes the R code given as a string. The function requires rpy2 to be installed.

Parameters: code (str) – R code to be executed

class snakemake.utils.SequenceFormatter(separator=' ', element_formatter=<string.Formatter object>, *args, **kwargs)[source]

Bases: Formatter

string.Formatter subclass with special behavior for sequences.

This class delegates the formatting of individual elements to another formatter object. Non-list objects are formatted by calling the delegate formatter’s “format_field” method. List-like objects (list, tuple, set, frozenset) are formatted by formatting each element of the list according to the specified format spec using the delegate formatter and then joining the resulting strings with a separator (space by default).

format_element(elem, format_spec)[source]

Format a single element

For sequences, this is called once for each element in a sequence. For anything else, it is called on the entire object. It is intended to be overridden in subclasses.

format_field(value, format_spec)[source]
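The delegation scheme described above can be sketched as follows. SeqFormatter is a hypothetical stand-in written against string.Formatter from the standard library, not the Snakemake class itself.

```python
import string


class SeqFormatter(string.Formatter):
    """Formats list-like values element by element and joins the results."""

    def __init__(self, separator=" ", element_formatter=None):
        super().__init__()
        self.separator = separator
        self.element_formatter = element_formatter or string.Formatter()

    def format_element(self, elem, format_spec):
        # Delegate the formatting of a single element to the wrapped formatter.
        return self.element_formatter.format_field(elem, format_spec)

    def format_field(self, value, format_spec):
        if isinstance(value, (list, tuple, set, frozenset)):
            return self.separator.join(
                self.format_element(v, format_spec) for v in value
            )
        return self.format_element(value, format_spec)


merged = SeqFormatter().format(
    "samtools merge {out} {inputs}", out="m.bam", inputs=["a.bam", "b.bam"]
)
```

The list passed as inputs is expanded into space-separated items, while the plain string out is formatted as a whole.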

class snakemake.utils.Unformattable(errormsg='This cannot be used for formatting')[source]: Bases: object

snakemake.utils.argvquote(arg, force=True)[source]: Returns an argument quoted in such a way that CommandLineToArgvW on Windows will return the argument string unchanged. This is the same thing Popen does when supplied with a list of arguments. Arguments in a command line should be separated by spaces; this function does not add these spaces. This implementation follows the suggestions outlined here: https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/

snakemake.utils.available_cpu_count()[source]

Return the number of available virtual or physical CPUs on this system. The number of available CPUs can be smaller than the total number of CPUs when the cpuset(7) mechanism is in use, as is the case on some cluster systems.

Adapted from https://stackoverflow.com/a/1006301/715090
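A minimal sketch of the distinction between available and total CPUs, using only the standard library. The function name usable_cpu_count is hypothetical, and the fallback order is an assumption rather than the logic of available_cpu_count itself.

```python
import os


def usable_cpu_count():
    """Return the number of CPUs this process may run on."""
    try:
        # Respects CPU affinity masks (e.g. set via cpusets); not available on all platforms.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Fall back to the total number of CPUs, or 1 if it cannot be determined.
        return os.cpu_count() or 1
```

On a cluster node where the process is confined to a subset of cores, the affinity-based count can be smaller than os.cpu_count().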

snakemake.utils.cmd_exe_quote(arg)[source]: Quotes an argument in a cmd.exe compliant way.

snakemake.utils.find_bash_on_windows()[source]: Find the path to a usable bash on windows. The first attempt is to look for a bash installed with a git conda package. Alternatively, try bash installed with ‘Git for Windows’.

snakemake.utils.format(_pattern, *args, stepout=1, _quote_all=False, quote_func=None, **kwargs)[source]

Format a pattern in Snakemake style.

This means that keywords embedded in braces are replaced by any variable values that are available in the current namespace.

snakemake.utils.linecount(filename)[source]

Return the number of lines of the given file.

Parameters: filename (str) – the path to the file

snakemake.utils.listfiles(pattern, restriction=None, omit_value=None)[source]

Yield a tuple of existing filepaths for the given pattern.

Wildcard values are yielded as the second tuple item.

Parameters:

pattern (str) – a filepattern. Wildcards are specified in snakemake syntax, e.g. “{id}.txt”
restriction (dict) – restrict to wildcard values given in this dictionary
omit_value (str) – wildcard value to omit

Yields:

tuple – The next file matching the pattern, and the corresponding wildcards object
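The behavior described above (yield existing files that match a wildcard pattern, together with the wildcard values) can be approximated with os.walk and a regular expression. list_matching and its root argument are hypothetical; the sketch assumes each wildcard name occurs once in the pattern, returns wildcards as a plain dict, and does not implement omit_value.

```python
import os
import re


def list_matching(pattern, root=".", restriction=None):
    """Yield (relative_path, wildcards) for files under root matching pattern."""
    parts, last = [], 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[last:m.start()]))
        parts.append(f"(?P<{m.group(1)}>.+)")
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    regex = re.compile("".join(parts) + "$")

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")
            match = regex.match(rel)
            if not match:
                continue
            wildcards = match.groupdict()
            # Keep only files whose wildcard values agree with the restriction.
            if restriction and any(
                wildcards.get(key) != value for key, value in restriction.items()
            ):
                continue
            yield rel, wildcards
```

Calling list_matching("{id}.txt", root=some_dir) on a directory containing a.txt and c.log would yield only a.txt, paired with a dict that maps id to "a".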

snakemake.utils.makedirs(dirnames)[source]: Recursively create the given directory or directories without reporting errors if they are present.

snakemake.utils.min_version(version)[source]: Require minimum snakemake version, raise workflow error if not met.

snakemake.utils.os_sync()[source]: Ensure flush to disk

snakemake.utils.read_job_properties(jobscript, prefix='# properties', pattern=re.compile('# properties = (.*)'))[source]

Read the job properties defined in a snakemake jobscript.

This function is a helper for writing custom wrappers for the snakemake --cluster functionality. Applying this function to a jobscript will return a dict containing information about the job.
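The parsing step can be sketched with the default pattern from the signature above. This stand-in works on the jobscript text rather than a file path, and it assumes the properties are stored as JSON after the prefix; the function name read_properties and the sample script content are hypothetical.

```python
import json
import re

PROPERTIES_LINE = re.compile(r"# properties = (.*)")


def read_properties(jobscript_text):
    """Return the dict encoded in the '# properties = ...' line of a jobscript."""
    for line in jobscript_text.splitlines():
        match = PROPERTIES_LINE.match(line)
        if match:
            # Assumption: the payload after the prefix is a JSON object.
            return json.loads(match.group(1))
    raise ValueError("no properties line found in jobscript")


# Hypothetical jobscript content, for illustration only.
script = '#!/bin/sh\n# properties = {"rule": "align", "threads": 4}\necho run\n'
props = read_properties(script)
```

A custom submission wrapper could then read values such as props["threads"] to build the arguments for the cluster scheduler.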

snakemake.utils.report(text, path, stylesheet=None, defaultenc='utf8', template=None, metadata=None, **files)[source]

Create an HTML report using python docutils.

This is deprecated in favor of the --report flag.

Attention: This function needs Python docutils to be installed for the python installation you use with Snakemake.

All keywords not listed below are interpreted as paths to files that shall be embedded into the document. The keywords will be available as link targets in the text. E.g. append a file as keyword arg via F1=input[0] and put a download link in the text like this:

report('''
==============
Report for ...
==============

Some text. A link to an embedded file: F1_.

Further text.
''', outputpath, F1=input[0])

Instead of specifying each file as a keyword arg, you can also expand
the input of your rule if it is completely named, e.g.:

report('''
Some text...
''', outputpath, **input)

Parameters:

text (str) – The “restructured text” as it is expected by python docutils.
path (str) – The path to the desired output file
stylesheet (str) – An optional path to a CSS file that defines the style of the document. This defaults to <your snakemake install>/report.css. Use the default to get a hint on how to create your own.
defaultenc (str) – The encoding that is reported to the browser for embedded text files, defaults to utf8.
template (str) – An optional path to a docutils HTML template.
metadata (str) – E.g. an optional author name or email address.

snakemake.utils.simplify_path(path)[source]: Return a simplified version of the given path.

snakemake.utils.update_config(config, overwrite_config)[source]

Recursively update dictionary config with overwrite_config in-place.

See https://stackoverflow.com/questions/3232943/update-value-of-a-nested-dictionary-of-varying-depth for details.

Parameters:

config (dict) – dictionary to update
overwrite_config (dict) – dictionary whose items will overwrite those in config
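The recursive in-place update can be sketched as follows. deep_update is a hypothetical name and this is a simplified version of the idea, not necessarily the exact logic of update_config.

```python
from collections.abc import Mapping


def deep_update(config, overwrite_config):
    """Recursively merge overwrite_config into config, modifying config in place."""
    for key, value in overwrite_config.items():
        if isinstance(value, Mapping) and isinstance(config.get(key), Mapping):
            # Both sides are nested dicts: merge them instead of replacing.
            deep_update(config[key], value)
        else:
            config[key] = value
    return config


config = {"a": {"x": 1, "y": 2}, "b": 3}
deep_update(config, {"a": {"y": 20, "z": 30}, "c": 4})
```

After the call, the nested key a.x is preserved, a.y is overwritten, and a.z and c are added, which is the difference from a plain dict.update.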

snakemake.utils.validate(data, schema, set_default=True)[source]

Validate data with JSON schema at given path.

Parameters:

data (object) – data to validate. Can be a config dict or a pandas data frame.
schema (str) – Path to JSON schema used for validation. The schema can also be in YAML format. If validating a pandas data frame, the schema has to describe a row record (i.e., a dict with column names as keys pointing to row values). See https://json-schema.org. The path is interpreted relative to the Snakefile when this function is called.
set_default (bool) – set default values defined in schema. See https://python-jsonschema.readthedocs.io/en/latest/faq/ for more information

snakemake.workflow module

snakemake.wrapper module

snakemake.wrapper.find_extension(source_file, sourcecache)[source]

snakemake.wrapper.get_conda_env(path, prefix=None)[source]

snakemake.wrapper.get_path(path, prefix=None)[source]

snakemake.wrapper.get_script(path, sourcecache, prefix=None)[source]

snakemake.wrapper.is_script(source_file)[source]

snakemake.wrapper.is_url(path)[source]

snakemake.wrapper.wrapper(path, input, output, params, wildcards, threads, resources, log, config, rulename, conda_env, conda_base_path, container_img, singularity_args, env_modules, bench_record, prefix, jobid, bench_iteration, cleanup_scripts, shadow_dir, sourcecache_path, runtime_sourcecache_path)[source]: Load a wrapper from https://github.com/snakemake/snakemake-wrappers under the given path + wrapper.(py|R|Rmd) and execute it.
