
Hi, I have been trying to run the challenge script on GPU. I am using 1 V100 GPU with 16 GB of memory, have set CUDA_VISIBLE_DEVICES=0, and have set `devices = 'cuda'` in the [script](https://github.com/FETS-AI/Challenge/blob/main/Task_1/FeTS_Challenge.py#L536). However, I keep encountering the following error; any idea what could be going wrong? Thanks!

Note: the script runs fine for `small_split.csv` but **NOT** for `partitioning_1.csv`.

```
Traceback (most recent call last):
  File "FeTS_Challenge.py", line 568, in <module>
    restore_from_checkpoint_folder = restore_from_checkpoint_folder)
  File "/root/github/Challenge/Task_1/fets_challenge/experiment.py", line 286, in run_challenge_experiment
    task_runner = copy(plan).get_task_runner(collaborator_data_loaders[col])
  File "/root/setup/envs/venv/lib/python3.7/site-packages/openfl/federated/plan/plan.py", line 389, in get_task_runner
    self.runner_ = Plan.build(**defaults)
  File "/root/setup/envs/venv/lib/python3.7/site-packages/openfl/federated/plan/plan.py", line 182, in build
    instance = getattr(module, class_name)(**settings)
  File "/root/setup/envs/venv/lib/python3.7/site-packages/openfl/federated/task/runner_fets_challenge.py", line 43, in __init__
    model, optimizer, train_loader, val_loader, scheduler, params = create_pytorch_objects(fets_config_dict, train_csv=train_csv, val_csv=val_csv, device=device)
  File "/root/setup/envs/venv/lib/python3.7/site-packages/GANDLF/compute/generic.py", line 55, in create_pytorch_objects
    ) = get_class_imbalance_weights(parameters["training_data"], parameters)
  File "/root/setup/envs/venv/lib/python3.7/site-packages/GANDLF/utils/tensor.py", line 357, in get_class_imbalance_weights
    loader_type="penalty",
  File "/root/setup/envs/venv/lib/python3.7/site-packages/GANDLF/data/ImagesFromDataFrame.py", line 200, in ImagesFromDataFrame
    subject.load()
  File "/root/setup/envs/venv/lib/python3.7/site-packages/torchio/data/subject.py", line 368, in load
    image.load()
  File "/root/setup/envs/venv/lib/python3.7/site-packages/torchio/data/image.py", line 498, in load
    tensor = torch.cat(tensors)
RuntimeError: [enforce fail at CPUAllocator.cpp:67] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 17856000 bytes. Error code 12 (Cannot allocate memory)
```

I do see the following in the log:

```
Device requested via CUDA_VISIBLE_DEVICES: 0
Total number of CUDA devices: 1
Device finally used: 0
Sending model to aforementioned device
Memory Total : 15.8 GB, Allocated: 0.3 GB, Cached: 0.3 GB
Device - Current: 0 Count: 1 Name: Tesla V100-PCIE-16GB Availability: True
```

Created by ambrish
Thanks! I was using 120 GB of CPU RAM, which it seems was insufficient. 150 GB appears to be adequate.
This is a CPU memory (RAM) error. I get the same issue with 32 GB of RAM, and it cannot be solved even by increasing the hard-disk space available for virtual paging. I can run it with 128 GB of RAM; I am not sure whether 64 GB is enough.
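Since the allocation that fails here happens while TorchIO loads every subject into CPU RAM (note the `DefaultCPUAllocator` in the traceback, not a CUDA allocator), a quick pre-flight check can flag an under-provisioned machine before the experiment starts. Below is a minimal sketch using only the Python standard library on Linux; the 128 GiB threshold is an assumption taken from the numbers reported in this thread, not an official requirement:

```python
import os

def total_ram_gb():
    """Return total physical RAM in GiB (Linux/POSIX only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
    num_pages = os.sysconf("SC_PHYS_PAGES")  # total physical pages
    return page_size * num_pages / (1024 ** 3)

# Assumed threshold based on the reports in this thread
# (120 GB failed, ~128-150 GB worked for partitioning_1.csv).
REQUIRED_GB = 128

ram = total_ram_gb()
print(f"Total RAM: {ram:.1f} GiB")
if ram < REQUIRED_GB:
    print(f"Warning: less than {REQUIRED_GB} GiB of RAM detected; "
          "the full partitioning CSVs may fail with "
          "'Cannot allocate memory' (errno 12).")
```

Running this before launching `FeTS_Challenge.py` gives an early warning instead of a crash deep inside the data-loading stack.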
