Big Days on JUWELS

What Big Days/Big Weeks are:

Big Days follow the tradition of the previous Big Week event series on JUQUEEN. For a Big Day on JUWELS, the system will be reserved exclusively for jobs requesting at least 512 nodes. The Big Day event series is intended to prepare for upcoming Big Week events on JUWELS, including the upcoming Booster module to be installed in 2020. Similar to a Big Day, a Big Week will reserve the system for large production jobs for an entire week. The technical details, e.g. how jobs will be scheduled, are explained below. The Big Day and Big Week events are part of JSC's Exascale application preparation services.

Execution of Big Days:

During a Big Day, between 09:00 and 16:00 local time, the batch and mem192 partitions of the JUWELS system are closed. The GPU and development partitions remain available. Only the large partition is open, and only for jobs requesting at least 512 nodes. During the first four hours (09:00 - 13:00), only jobs with a wall-clock limit of less than 30 minutes are scheduled; after that, longer-running large jobs are scheduled. The time window for short jobs is chosen so that a sufficiently large number of them can be executed. At 16:00, the batch and mem192 partitions are opened again, but large jobs can continue to execute up to their 24-hour wall-clock limit. In order to optimize system utilization and minimize wait times during the Big Day, JSC staff may optimize job placement. In particular, scheduling may not be priority-based.
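The rules above can be summarized as a small helper that tells a user which scheduling window a planned job falls into. This is only an illustrative sketch: the function name `big_day_window` and its output labels are hypothetical, it encodes just the thresholds stated in the text (512 nodes, 30 minutes, 24 hours), and it does not reflect any manual job placement by JSC staff.

```shell
#!/bin/bash
# Hypothetical helper: classify a job request against the Big Day rules.
# $1 = number of nodes requested, $2 = wall-clock limit in minutes.
big_day_window() {
    if [ "$1" -lt 512 ]; then
        echo "ineligible"   # fewer than 512 nodes: not accepted on the large partition
    elif [ "$2" -lt 30 ]; then
        echo "short"        # may be scheduled in the 09:00 - 13:00 window
    elif [ "$2" -le 1440 ]; then
        echo "long"         # scheduled after the short-job window
    else
        echo "ineligible"   # exceeds the 24-hour wall-clock limit
    fi
}
```

For example, `big_day_window 1024 20` prints `short`, while `big_day_window 512 120` prints `long`.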

Frequency of Big Days:

Big Days will initially be scheduled regularly every two weeks. The first Big Days are planned for:

  • Tuesday, 2020-03-10
  • Tuesday, 2020-03-24
  • Tuesday, 2020-04-07

JSC may adjust the schedule, with sufficient prior notice, in order to align with maintenance periods and/or to account for other events (e.g., courses) on the system. The date of the next Big Day can be found in the "message of the day"/MOTD (Highmessage) when logging into a JUWELS login node.

Preparation of Big Days:

To allow JSC to prepare and schedule the Big Days, we kindly ask interested users to submit their jobs at least one day before the start of the Big Day and to get in touch with the technical advisor of their project and with SC support. Please note that this request for early submission does not exclude (re-)submission of jobs during the Big Day itself, but it will enable JSC to anticipate interest and overall participation.

MPI tuning for large jobs:

JSC suggests using the latest installed versions of ParaStation MPI and Intel MPI for large jobs, as both implementations have recently received several improvements that reduce memory consumption at large task counts. Open MPI is currently available in the Devel2019a stage and will be moved to the production stage by the time of the first Big Day. However, there is little experience so far with large-scale Open MPI jobs on JUWELS.

JSC publishes advice on appropriate MPI tuning parameters via the mpi-settings module. This module becomes available after loading the MPI runtime. For ParaStation MPI and Intel MPI, please use the following command to leverage a memory-scalable low-level transport layer:

ml mpi-settings/large-job-mpi
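As a sketch of how this might fit into a Big Day submission, the snippet below writes a minimal Slurm job script that loads the settings module after the MPI runtime. The file name `bigday_job.sh`, the executable `./my_app`, the compiler/MPI module names `Intel` and `ParaStationMPI`, and the chosen node count and time limit are assumptions for illustration; a project account option will typically be needed as well. `ml` is the Lmod shorthand for `module load`.

```shell
#!/bin/bash
# Write a hypothetical Big Day job script to bigday_job.sh.
# Module names, executable and resource values are illustrative assumptions.
cat > bigday_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=large
#SBATCH --nodes=512
#SBATCH --time=00:25:00

# Load the MPI runtime first; mpi-settings only becomes visible afterwards.
module load Intel ParaStationMPI
ml mpi-settings/large-job-mpi

srun ./my_app
EOF
# Submit with: sbatch bigday_job.sh
```

The 25-minute limit keeps the job within the short-job window described above; longer production runs would raise `--time` accordingly.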

Known problems affecting large jobs:

There are currently no known problems that block the execution of large jobs on JUWELS. However, together with Intel, JSC is investigating seemingly random stalls on individual cores that affect scaling efficiency at large core counts. This issue occurs on multiple supercomputers at different sites that use Intel Xeon processors similar to those in JUWELS. So far, three applications have been identified as affected by this problem.