I've constructed an AWS Batch queue in the us-east-1 region. However, jobs run through this queue are unable to download files from Synapse after authenticating. The same code works fine if I run it on an EC2 instance directly (not through AWS Batch). Thanks.
Created by mpetljak

Excellent! I've confirmed that it works now. Thanks so much!

@mpetljak We've made some changes to the bucket configuration to support VPC Endpoints. Can you try again to confirm whether this is working now from your Batch job? The VPC id is vpc-063bea2aaa81bcae1.

Have you considered making the bucket requester pays so that you wouldn't have to maintain a whitelist of IP addresses? Thanks.

Ok, thanks @mpetljak.
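For context on the requester-pays suggestion above, a minimal sketch of what such a download looks like from the caller's side, assuming boto3; the bucket and key names here are hypothetical:

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
obj = s3.get_object(
    Bucket="example-requester-pays-bucket",  # hypothetical bucket name
    Key="example/key.txt",                   # hypothetical key
    RequestPayer="requester",                # required opt-in; the caller is billed for the transfer
)
data = obj["Body"].read()
```

With requester pays, access is governed by the caller's IAM credentials rather than by the caller's source IP, which is why it would remove the need for an IP whitelist.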
The traceroute showing no intermediate hops confirms, I think, that the Batch instance is connecting to S3 directly with its private IP via an S3 VPC Endpoint, which results in the 403 because the private IP isn't in the current whitelist. If you can verify the existence of the S3 VPC Endpoint and, if possible, get the VpcId, that would be great; in the meantime I will see what adjustments we can make for this use case generally.
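A minimal sketch of one way to check for the endpoint and its VpcId, assuming boto3 with credentials that allow ec2:DescribeVpcEndpoints (the region is an example from this thread):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.describe_vpc_endpoints(
    Filters=[{"Name": "service-name",
              "Values": ["com.amazonaws.us-east-1.s3"]}]
)
for ep in resp["VpcEndpoints"]:
    # Gateway endpoints for S3 are attached to route tables; any subnet
    # using one of these route tables reaches S3 over a private path.
    print(ep["VpcEndpointId"], ep["VpcId"], ep.get("RouteTableIds", []))
```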
@jordan.kiang FYI, I used the script available at https://docs.opendata.aws/genomics-workflows/orchestration/cromwell/cromwell-overview/ to create my AWS resources. The output from the submit host on EC2 (which can download):

```
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         ip-10-0-128-1.e 0.0.0.0         UG    0      0        0 eth0
10.0.128.0      *               255.255.240.0   U     0      0        0 eth0
instance-data.e *               255.255.255.255 UH    0      0        0 eth0
172.17.0.0      *               255.255.0.0     U     0      0        0 docker0

traceroute to s3.amazonaws.com (52.216.140.222), 30 hops max, 60 byte packets
1 216.182.226.50 (216.182.226.50) 12.844 ms 216.182.231.50 (216.182.231.50) 15.775 ms 216.182.229.186 (216.182.229.186) 38.253 ms
2 100.66.12.250 (100.66.12.250) 20.704 ms 100.65.80.240 (100.65.80.240) 8.094 ms 100.66.8.84 (100.66.8.84) 10.971 ms
3 100.66.38.200 (100.66.38.200) 16.046 ms 100.66.10.134 (100.66.10.134) 18.975 ms 100.66.11.218 (100.66.11.218) 54.628 ms
4 100.66.35.1 (100.66.35.1) 8.192 ms 100.66.34.127 (100.66.34.127) 3.828 ms 100.66.42.248 (100.66.42.248) 14.349 ms
5 100.66.33.185 (100.66.33.185) 7.401 ms 100.66.34.167 (100.66.34.167) 8.154 ms 100.66.33.173 (100.66.33.173) 7.363 ms
6 100.65.71.137 (100.65.71.137) 19.236 ms 100.66.33.149 (100.66.33.149) 3.545 ms 100.65.70.9 (100.65.70.9) 22.771 ms
7 s3-1.amazonaws.com (52.216.140.222) 0.799 ms 0.773 ms 0.743 ms
```
The output from the batch instance (which fails):
```
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         ip-10-0-32-1.ec 0.0.0.0         UG    0      0        0 eth0
10.0.32.0       0.0.0.0         255.255.224.0   U     0      0        0 eth0
instance-data.e 0.0.0.0         255.255.255.255 UH    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

traceroute to s3.amazonaws.com (52.216.138.141), 30 hops max, 60 byte packets
1 * * *
2 * * *
3 * * *
4 * * *
5 * * *
6 * * *
7 * * *
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
```
Hi @mpetljak,

That IP address is included in the IP whitelist covering us-east-1, so that isn't the issue.
Is there an S3 VPC Endpoint configured in this VPC, on the route table used by the subnet with the EC2 instance that the Batch job is running on? I've set up an AWS Batch job using your container image, and I'm able to retrieve an entity in a bucket with the same IP policy, but not if there is an S3 VPC Endpoint configured. When one is present, it causes the routing to go through a private address that's not on the whitelist, resulting in a 403 like this one.
Also, are the EC2 instance you mentioned in [a reply](https://www.synapse.org/#!Synapse:syn2580853/discussion/threadId=6934&replyId=22059) above and the EC2 instance created by AWS Batch running in the same VPC and subnet?
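If it helps, a minimal sketch of checking this from each instance via the EC2 instance metadata service (these are the IMDSv1 paths; newer instance configurations may require an IMDSv2 session token first):

```python
import urllib.request

BASE = "http://169.254.169.254/latest/meta-data"

def md(path):
    # Fetch a metadata value; only reachable from on the instance itself.
    with urllib.request.urlopen(f"{BASE}/{path}", timeout=2) as r:
        return r.read().decode().strip()

mac = md("mac")
print("vpc-id:   ", md(f"network/interfaces/macs/{mac}/vpc-id"))
print("subnet-id:", md(f"network/interfaces/macs/{mac}/subnet-id"))
```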
Could you run the following commands on both the AWS Batch provisioned instance where the job is failing and the EC2 instance where the job runs successfully and show the output?
```
/sbin/route
```
```
# an AWS batch provisioned instance might not have traceroute installed...
sudo yum install traceroute
traceroute s3.amazonaws.com
```
Thanks!

Output is 54.156.213.64. Thanks.

They are still having an issue replicating your error. Could you run the following command, either by submitting it as an AWS Batch job command and getting the output from the logs, or by running it directly on the EC2 instance in the Batch ECS cluster on which the job was failing with the 403 error? That way we could see the public gateway IP that the Batch job ran with and compare it to our whitelist.
```
curl icanhazip.com
```

Our engineering team is currently looking into it. I'll get back to you if they have any more questions. Thanks!

Is it possible that the IP filter that limits downloads of syn18759102 to instances running in the us-east-1 region does not cover spot instances created by AWS Batch? Thanks.

Is the test entity limited to downloading from us-east-1? Thanks.

That worked in AWS Batch and using EC2 directly. Thanks.

Could you try the download with EC2 directly and with AWS Batch, but using this test file from base Synapse storage? The test file is: syn21904837
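For reference, a minimal sketch of submitting the curl check above as a one-off Batch job, assuming boto3 and an existing job queue and job definition (the names below are placeholders, not from this thread):

```python
import boto3

batch = boto3.client("batch", region_name="us-east-1")
resp = batch.submit_job(
    jobName="egress-ip-check",
    jobQueue="your-job-queue",            # placeholder: an existing queue
    jobDefinition="your-job-definition",  # placeholder: any definition whose image has curl
    containerOverrides={"command": ["curl", "-s", "icanhazip.com"]},
)
print(resp["jobId"])  # the egress IP then appears in the job's CloudWatch log stream
```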
Here's the Dockerfile:

```
FROM continuumio/miniconda3:4.6.14

SHELL ["/bin/bash", "-c"]

RUN apt-get update && \
    apt-get install --no-install-recommends -y build-essential dpkg-dev gnupg lsb-release parallel procps && \
    export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
    echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && \
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
    apt-get update && apt-get install -y google-cloud-sdk

RUN pip install --upgrade pip && \
    pip install synapseclient
```

This is an interesting issue. It looks like it could be a bug, so I've filed an issue for our development team. They might want to recreate the issue using your Docker environment; if that is shareable or amenable to uploading to Synapse, they would be most appreciative.

I repeated the same code. Please note that I used Docker to run the identical code on EC2 and AWS Batch. The EC2 download works until it runs out of local disk space. Thanks.

Ok, that looks like a permissions error, but could you repeat the code on an EC2 instance and upload the output?

I uploaded the log. Thanks.

Feel free to upload here: syn21902072

The code is executed on each batch node. Can I email you the contents of the debug output, as it contains signed URLs? Thanks.

Also, is the code you posted executed before running the batch job? It will need to be executed by the script run on every node the batch job recruits.
RE:
```
import synapseclient
syn = synapseclient.Synapse()
syn.login(username, password)
entity = syn.get("syn18759102")
filepath = entity.path
```

Have you tried:
```
syn = synapseclient.Synapse(debug=True)
syn.login() # provide credentials as usual here
```

The code I'm using is:
```
import synapseclient
syn = synapseclient.Synapse()
syn.login(username, password)
entity = syn.get("syn18759102")
filepath = entity.path
```
Synapse prints out a welcome message, so it appears that login succeeded.
This is what is printed to stderr:
```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   475  100   475    0     0   108k      0 --:--:-- --:--:-- --:--:--  115k
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/opt/conda/lib/python3.7/site-packages/synapseclient/client.py", line 603, in get
    return self._getWithEntityBundle(entityBundle=bundle, entity=entity, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/synapseclient/client.py", line 719, in _getWithEntityBundle
    self._download_file_entity(downloadLocation, entity, ifcollision, submission)
  File "/opt/conda/lib/python3.7/site-packages/synapseclient/client.py", line 777, in _download_file_entity
    downloadPath = self._downloadFileHandle(entity.dataFileHandleId, objectId, objectType, downloadPath)
  File "/opt/conda/lib/python3.7/site-packages/synapseclient/client.py", line 1614, in _downloadFileHandle
    expected_md5=fileHandle.get('contentMd5'))
  File "/opt/conda/lib/python3.7/site-packages/synapseclient/client.py", line 1707, in _download_from_URL
    exceptions._raise_for_status(response, verbose=self.debug)
  File "/opt/conda/lib/python3.7/site-packages/synapseclient/core/exceptions.py", line 149, in _raise_for_status
    raise SynapseHTTPError(message, response=response)
synapseclient.core.exceptions.SynapseHTTPError: 403 Client Error: Forbidden
```
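In case it's useful, a small sketch of catching this exception to inspect the failing response, based on the exception type shown in the traceback above (note the URL may be a pre-signed S3 URL, so it shouldn't be posted publicly):

```python
import synapseclient
from synapseclient.core.exceptions import SynapseHTTPError

syn = synapseclient.Synapse(debug=True)
syn.login(username, password)  # credentials as in the snippets above
try:
    entity = syn.get("syn18759102")
except SynapseHTTPError as e:
    # e.response is the underlying HTTP response the client received;
    # the status and URL show which request was refused with the 403
    print(e.response.status_code, e.response.url)
```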
Have you tried with a dummy/test Synapse file? Also, if you could post the actual error, and possibly a way for us to re-create it, that would be most helpful!

We use the Python client and authenticate with a username and password before attempting to download on a batch node. Thanks.

Hi @mpetljak,
Do the individual nodes that Batch sources have a Synapse config file to authenticate with your user details?
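Since Batch nodes are typically ephemeral and won't have a ~/.synapseConfig, one sketch of supplying credentials explicitly on each node (the environment variable names here are hypothetical and would need to be injected via the job definition):

```python
import os
import synapseclient

syn = synapseclient.Synapse()
# SYNAPSE_USER / SYNAPSE_PASS are hypothetical names, not a synapseclient
# convention; they would be set in the Batch job definition's environment
syn.login(os.environ["SYNAPSE_USER"], os.environ["SYNAPSE_PASS"])
entity = syn.get("syn18759102")
```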