Slurm

From Halfface

Revision as of 22:07, 12 January 2016

install Slurm on Fedora 21

Build the Slurm RPMs from the source tarball.

rpmbuild -ta slurm*.tar.bz2

Install the RPMs.

yum -y install munge slurm slurm-plugins slurm-munge

configure munge

dd if=/dev/random bs=1 count=1024 > /etc/munge/munge.key
chmod 0600 /etc/munge/munge.key
chown munge /etc/munge/munge.key
systemctl start munge
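
The key creation steps above can be wrapped in a small helper so the file never exists with loose permissions. This is only a sketch: `make_munge_key` is a hypothetical name, not part of MUNGE, and it reads from /dev/urandom instead of /dev/random so that it never blocks.

```shell
# Sketch: create a 1024-byte key file that only its owner can read.
# make_munge_key is a hypothetical helper, not a MUNGE command.
make_munge_key() {
    local keyfile="$1"
    [ -n "$keyfile" ] || { echo "usage: make_munge_key FILE" >&2; return 2; }
    # Set a strict umask in a subshell so the file is created owner-only
    # and the caller's umask is left untouched.
    (
        umask 077
        # Assumption: /dev/urandom replaces /dev/random used in the steps above.
        dd if=/dev/urandom of="$keyfile" bs=1 count=1024 2>/dev/null
    ) && chmod 0600 "$keyfile"
}

# Example, as root on the real path:
#   make_munge_key /etc/munge/munge.key && chown munge /etc/munge/munge.key
```

The ownership change and `systemctl start munge` are left to the caller because they need root and a munge user.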

test installation

Generate a credential on stdout.

munge -n

Check if a credential can be locally decoded.

munge -n | unmunge

Check if a credential can be remotely decoded.

munge -n | ssh somehost unmunge

Run a quick benchmark.

remunge
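
The local check above can be scripted so it gives a clear pass or fail. A minimal sketch, assuming `munge` and `unmunge` are on the PATH; `check_munge` is a hypothetical helper name.

```shell
# Sketch: report whether a credential can be encoded and decoded locally.
# check_munge is a hypothetical helper, not a MUNGE command.
check_munge() {
    # Return 2 if the tools are not installed at all.
    command -v munge >/dev/null 2>&1 || return 2
    command -v unmunge >/dev/null 2>&1 || return 2
    # unmunge exits non-zero when the credential cannot be decoded.
    if munge -n | unmunge >/dev/null 2>&1; then
        echo "munge: local round trip OK"
    else
        echo "munge: local round trip FAILED" >&2
        return 1
    fi
}
```

A failure here usually points at the munged service or the key file, so checking `systemctl status munge` is a reasonable next step.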

how does it work

Show the running configuration:

scontrol show config

Check the priorities of jobs with:

scontrol show job

job control

Submit a job:

sbatch /tmp/slurm_test_1

List jobs:

squeue

Get job details:

scontrol show job 106

Suspend a job (root only):

scontrol suspend 135

Resume a job (root only):

scontrol resume 135

Kill a job. Users can kill their own jobs; root can kill any job.

scancel 135

Hold a job:

scontrol hold 139

Release a job:

scontrol release 139

List partitions:

sinfo
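
The submit and list commands above can be combined into a helper that submits a script and waits until the job has left the queue. This is a sketch under assumptions: `submit_and_wait` is a hypothetical name, and it relies on `sbatch --parsable` printing only the job ID and on `squeue -h` printing nothing when no job matches.

```shell
# Sketch: submit a job script, then poll squeue until the job is gone.
# submit_and_wait is a hypothetical helper, not a Slurm command.
submit_and_wait() {
    local script="$1" interval="${2:-5}" jobid
    # --parsable prints the job ID, optionally followed by ";cluster".
    jobid=$(sbatch --parsable "$script") || return 1
    jobid=${jobid%%;*}
    echo "submitted job $jobid"
    # -h drops the header; -t limits the listing to states that are still active,
    # so empty output means the job is no longer pending, running or suspended.
    while [ -n "$(squeue -h -j "$jobid" -t PENDING,RUNNING,SUSPENDED,COMPLETING 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "job $jobid is no longer in the queue"
}

# Example:
#   submit_and_wait /tmp/slurm_test_1 10
```

The helper does not say whether the job succeeded; `scontrol show job` with the printed ID gives the final state while the record is still kept.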

example job script

#!/usr/bin/env bash
#SBATCH -p defq
#SBATCH -J simple
sleep 60
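
When the same script is needed for several partitions or job names, it can be generated instead of edited by hand. A minimal sketch: `write_job_script` is a hypothetical helper, and the `-o` line is an addition to the script above that sends output to a file named after the job, with `%j` replaced by the job ID.

```shell
# Sketch: print a job script like the one above for a given partition,
# job name and sleep time. write_job_script is a hypothetical helper.
write_job_script() {
    local partition="${1:-defq}" name="${2:-simple}" seconds="${3:-60}"
    # The heredoc is unquoted so the shell variables are filled in;
    # %j is left as-is for sbatch to expand.
    cat <<EOF
#!/usr/bin/env bash
#SBATCH -p ${partition}
#SBATCH -J ${name}
#SBATCH -o ${name}-%j.out
sleep ${seconds}
EOF
}

# Example:
#   write_job_script defq simple 60 > /tmp/slurm_test_1
#   sbatch /tmp/slurm_test_1
```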

troubleshooting

Return a down or drained node to service:

scontrol update nodename=www state=resume
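
Before resuming a node it helps to see why Slurm took it out of service. The sketch below prints the recorded reason with `sinfo -R` and then runs the resume command above; `resume_node` is a hypothetical helper name.

```shell
# Sketch: show why a node is down or drained, then return it to service.
# resume_node is a hypothetical helper, not a Slurm command.
resume_node() {
    local node="$1"
    [ -n "$node" ] || { echo "usage: resume_node NODENAME" >&2; return 2; }
    # -R lists the reason recorded for unavailable nodes; -n limits it to one node.
    sinfo -R -n "$node"
    scontrol update nodename="$node" state=resume
}

# Example:
#   resume_node www
```

If the node drops out again right after resuming, the reason shown by `sinfo -R` is the place to start looking.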