Slurm

From Halfface
Revision as of 07:37, 26 October 2015 by Ekaanbj

Install Slurm under Fedora 21

# Build Slurm.
rpmbuild -ta slurm*.tar.bz2

# Install the RPMs.
yum -y install munge slurm slurm-plugins slurm-munge

# Configure munge.
dd if=/dev/random bs=1 count=1024 > /etc/munge/munge.key
chmod 0600 /etc/munge/munge.key
chown munge /etc/munge/munge.key
systemctl start munge
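
The key-generation step above can be wrapped in a small helper that also checks the result. This is a minimal sketch, not part of munge itself: the function name `make_munge_key` and its arguments are made up for illustration, and it reads from /dev/urandom by default (an assumption chosen so the call does not block; pass /dev/random as the second argument to match the command above).

```shell
#!/bin/sh
# make_munge_key KEYFILE [SOURCE]
# Hypothetical helper: writes 1024 random bytes to KEYFILE with mode 0600.
# SOURCE defaults to /dev/urandom (assumption; the step above uses /dev/random).
make_munge_key() (
    keyfile=$1
    source_dev=${2:-/dev/urandom}
    # Make sure the file is never created group- or world-readable.
    umask 077
    dd if="$source_dev" of="$keyfile" bs=1 count=1024 2>/dev/null || exit 1
    chmod 0600 "$keyfile" || exit 1
    # Verify the key size before anything relies on it.
    size=$(wc -c < "$keyfile" | tr -d ' ')
    [ "$size" -eq 1024 ]
)
```

For the real key this would be run as root, for example `make_munge_key /etc/munge/munge.key /dev/random`, followed by the `chown munge` and `systemctl start munge` commands from the step above.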

# Change slurm.conf to better suit Fedora 21.
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmdPidFile=/var/run/slurm/slurmd.pid
StateSaveLocation=/var/lib/slurm/slurm.state
SlurmdSpoolDir=/var/lib/slurm/slurm.spool
# Create the corresponding directories.
DIR=/var/run/slurm; mkdir -p $DIR; chown slurm:slurm $DIR; chmod 755 $DIR
# Update the systemd unit files for the Slurm daemons to point to the new location of the PID file.
vim /usr/lib/systemd/system/slurmctld.service /usr/lib/systemd/system/slurmd.service
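
The manual edit of the unit files can also be scripted. The following is a minimal sketch under stated assumptions: `set_pidfile` is a hypothetical helper name, it assumes each unit file already contains a `PIDFile=` line, and it assumes the new path contains no `|` or `&` characters (they would confuse the sed expression).

```shell
#!/bin/sh
# set_pidfile UNIT_FILE NEW_PID_PATH
# Hypothetical helper: rewrites the PIDFile= line of a systemd unit file in
# place and keeps the original as UNIT_FILE.bak.
set_pidfile() (
    unit=$1
    pidfile=$2
    # Refuse to continue if there is nothing to rewrite.
    grep -q '^PIDFile=' "$unit" || { echo "no PIDFile= line in $unit" >&2; exit 1; }
    cp "$unit" "$unit.bak" || exit 1
    # Use '|' as the sed delimiter because the path contains '/'.
    sed "s|^PIDFile=.*|PIDFile=$pidfile|" "$unit.bak" > "$unit"
)
```

It would be called once per daemon, for example `set_pidfile /usr/lib/systemd/system/slurmd.service /var/run/slurm/slurmd.pid`, and followed by `systemctl daemon-reload` so that systemd picks up the change.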

Test installation

# Test the munge installation.

Generate a credential on stdout.

munge -n

Check if a credential can be locally decoded.

munge -n | unmunge

Check if a credential can be remotely decoded.

munge -n | ssh somehost unmunge

Run a quick benchmark.

remunge
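
The local and remote decode checks above can be combined into one helper that reports success through its exit status. This is only a sketch: `check_munge` is a hypothetical name, and the remote case assumes passwordless ssh to the given host, as in the `somehost` example above.

```shell
#!/bin/sh
# check_munge [HOST]
# Hypothetical helper: encodes a credential and decodes it locally, or on
# HOST via ssh when HOST is given. Returns the exit status of unmunge,
# or 2 when munge is not installed.
check_munge() (
    host=$1
    command -v munge >/dev/null 2>&1 || { echo "munge not found" >&2; exit 2; }
    if [ -n "$host" ]; then
        # Remote decode: the credential travels over ssh.
        munge -n | ssh "$host" unmunge >/dev/null
    else
        # Local decode.
        munge -n | unmunge >/dev/null
    fi
)
```

A typical use is `check_munge && echo "local decode ok"` and `check_munge somehost && echo "remote decode ok"`.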
# Show the running configuration.
scontrol show config
# Check the priorities of jobs.
scontrol show job
# Submit a job:
sbatch /tmp/slurm_test_1
# List jobs:
squeue
# Get job details:
scontrol show job 106
# Suspend a job (root only):
scontrol suspend 135
# Resume a job (root only):
scontrol resume 135
# Kill a job. Users can kill their own jobs; root can kill any job.
scancel 135
# Hold a job:
scontrol hold 139
# Release a job:
scontrol release 139
# List partitions:
sinfo
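
The commands above take a job ID that has to be copied from the output of sbatch. A minimal sketch of capturing it in a script follows. The helper name `job_id_from_sbatch` is made up for illustration, and it assumes the default sbatch message format "Submitted batch job <id>".

```shell
#!/bin/sh
# job_id_from_sbatch
# Hypothetical helper: reads sbatch output on stdin and prints the job ID.
# Prints nothing when the input does not look like a submission message.
job_id_from_sbatch() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}
```

For example, `jobid=$(sbatch /tmp/slurm_test_1 | job_id_from_sbatch)` stores the ID, which can then be passed to `scontrol show job "$jobid"` or `scancel "$jobid"`. Slurm releases that support `sbatch --parsable` can print the ID directly instead.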
# Example job script.
#!/usr/bin/env bash
#SBATCH -p defq
#SBATCH -J simple
sleep 60
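
A slightly extended variant of the job script can be generated from the shell, as sketched below. The helper name `write_job_script` is hypothetical, and the `-o` and `-t` directives are additions for illustration that are not in the script above. The partition name `defq` is taken from the example and may differ on other clusters.

```shell
#!/bin/sh
# write_job_script PATH
# Hypothetical helper: writes a variant of the example job script to PATH
# and makes it executable.
write_job_script() {
    cat > "$1" <<'EOF'
#!/usr/bin/env bash
# Partition to run in.
#SBATCH -p defq
# Job name shown by squeue.
#SBATCH -J simple
# Output file; %j expands to the job ID.
#SBATCH -o simple-%j.out
# Wall-clock limit (hh:mm:ss).
#SBATCH -t 00:05:00
echo "job ${SLURM_JOB_ID:-unknown} started on $(hostname)"
sleep 60
EOF
    chmod +x "$1"
}
```

After `write_job_script /tmp/slurm_test_1`, the script can be submitted with `sbatch /tmp/slurm_test_1` as shown earlier.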