
Not sure why everyone is going on about certificate transparency logs when the answer is right there in the user agent. The company is scanning the IPv4 space and came upon your IP and port.


Finding the IP does not mean finding the domain. When doing HTTP request to IP you specify the domain you want to connect to. For example, you can configure your /etc/hosts to have xxxnakedhamsters.google.com pointing to 8.8.8.8 and make the HTTP request, which will cause Google to receive a request for that domain (i.e. the header Host: xxxnakedhamsters.google.com), and it will refuse it or try to redirect to http. Of course this only applies to HTTP, because HTTPS requires a certificate. That's why they're talking about certificates.
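
A rough sketch of what that looks like on the wire, assuming plain HTTP/1.1. The helper name build_request is made up for illustration; the hostname and IP are the ones from the example above. Nothing is sent here, the point is only that the name travels inside the request while the connection itself goes to the IP.

```python
def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request carrying the given Host header."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",        # the only place the domain name appears
        "Connection: close",
        "",                     # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_request("xxxnakedhamsters.google.com")
print(request.decode("ascii"))

# Sending it would mean opening a TCP connection to the IP, not the name
# (left as a comment so the sketch does not touch the network):
#   import socket
#   with socket.create_connection(("8.8.8.8", 80), timeout=5) as sock:
#       sock.sendall(request)
```

A scanner that only knows the IP has nothing meaningful to put in that Host line unless it learned the name some other way.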


But there's no evidence in the OP's post that they have, in fact, discovered the domain. The only thing posted is that there is a GET request to a listening web server.

The OP and all the people talking about certificates are making the same assumption. Namely, that the scanning company discovered the DNS name for the server and tried to connect. When, in fact, they simply iterate through IP address blocks and make GET requests to any listening web servers they find.


I really doubt CloudFlare gives them an IPv4 and they can see all the logs for said IPv4


OP states that the domain was discovered


No they didn't. They said "How did the internet find my subdomain?" They're assuming the internet found their subdomain. They don't provide any evidence that happened, just that they found their IP address.


Depending on the web server's configuration, you very much _can_ find the domain which is configured on an IP address, by attempting to connect to that IP address via HTTPS and seeing what certificate gets served. Here's an example:

https://138.68.161.203/

> Web sites prove their identity via certificates. Firefox does not trust this site because it uses a certificate that is not valid for 138.68.161.203. The certificate is only valid for the following names: exhaust.lewiscollard.com, www.exhaust.lewiscollard.com
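
Roughly the same check in Python, as a sketch using only the standard library. The function names are made up for illustration. One assumption to flag: the certificate chain is still verified here (only the hostname check is turned off), because getpeercert() returns an empty dict when verification is disabled entirely. A self-signed certificate would raise a verification error instead of returning names.

```python
import socket
import ssl


def dns_names(cert: dict) -> list[str]:
    """Pull the DNS entries out of a dict shaped like SSLSocket.getpeercert()."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def names_served_at(ip: str, port: int = 443, timeout: float = 5.0) -> list[str]:
    """Connect to a bare IP over TLS and list the names on the served cert."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # we connect by IP, so skip the name comparison
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        # No server_hostname is passed, so no SNI is sent: the server has to
        # fall back to whatever certificate it serves by default.
        with ctx.wrap_socket(sock) as tls:
            return dns_names(tls.getpeercert())


# Example call (needs network access, so it is not run here):
#   names_served_at("138.68.161.203")
```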


I don't think that does you any good for Cloudflare, though. They will definitely be using SNI.


That doesn't really matter, though. While OP is using Cloudflare, the actual server behind it is still a publicly-accessible IP address that an IPv4 space scanner can easily stumble upon.


I misunderstood, I thought the subdomain was an R2 bucket. If it's just normal Cloudflare proxying to some backend this is probably the most likely answer.

That said, while I think it's not the case here, using Cloudflare doesn't mean the underlying host is accessible, as even on the free tier you can use Cloudflare Tunnels, which I often do.


they only state they are using cloudflare for DNS, they didn't say if they were proxying the connection


Also a valid point. I guess without more details all we can really do is speculate about the exact setup. That said, I do now agree that the most likely answer is "the underlying host was accessible and caught by an IPv4 scanner" since well, that's pretty much what it says anyway.


First thing I’d do for an IP that answers is a reverse lookup, so I expect that’s at least in the list of things they’d try.
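
For reference, a reverse lookup is just a PTR query on a name derived from the IP. A minimal sketch, with 192.0.2.10 as a placeholder documentation address and ptr_name as a made-up helper name:

```python
import ipaddress


def ptr_name(ip: str) -> str:
    """Return the DNS name that a reverse (PTR) lookup for this IP queries."""
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_name("192.0.2.10"))  # 10.2.0.192.in-addr.arpa

# The actual lookup (needs network access, not run here):
#   import socket
#   hostname, aliases, addresses = socket.gethostbyaddr("192.0.2.10")
```

Whether that helps depends on who controls the PTR record. On hosted servers it is often a generic provider name rather than the site's own hostname.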


> When doing HTTP request to IP you specify the domain you want to connect to

No, you make HTTP requests to an IP, not a domain. You convert the domain name to an IP in an earlier step (via a DNS query). You can connect to servers using their raw IPs and open ports all day if you like, which is what's happening here. Yes, servers will (likely) reject the requests by looking at the Host header, but they will still receive the request.


It's rather hilarious that nobody mentioned this in 7 hours. What am I missing?

~5 billion scans in a few hours is nothing for a company with decent resources. OP: in case you didn't follow, they're literally trying every possible IPv4 address and seeing if something exists on standard ports at that address.

I believe it would be harder to find out your domain that way if you were using SNI and only forwarded/served requests that used the correct host. But if you aren't using SNI, your server is probably just responding to any TLS connect request with your subdomain's cert, which will reveal your hostname.
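
For a sense of scale on the "every possible IPv4 address" point, some back-of-the-envelope arithmetic. The full space is 2^32, about 4.3 billion addresses. The probe rate of one million per second is purely an assumed figure for illustration, not anything known about this scanner, and 192.0.2.0/30 is a documentation block used as a stand-in.

```python
import ipaddress

TOTAL_IPV4 = 2 ** 32  # 4,294,967,296 addresses, reserved ranges included


def scan_hours(probes_per_second: float, ports: int = 1) -> float:
    """Hours needed to send one probe per port to every IPv4 address."""
    return TOTAL_IPV4 * ports / probes_per_second / 3600


print(f"{TOTAL_IPV4:,} addresses")
print(f"{scan_hours(1_000_000):.1f} hours at 1M probes/s, one port")

# "Trying every address" in a block is just a loop over it.
for addr in ipaddress.ip_network("192.0.2.0/30"):
    print(addr)
```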


> What am I missing?

That it was in fact mentioned many hours earlier, in more than one top level comment.


I was referring more to the fact that the user agent explicitly contained the answer, rather than suggestions that it was IP scanning. But you're right I do see one comment that mentions that. And many more likely assumed the OP already figured that part out.


The user agent contains a partial answer. IP scanning doesn't give you the actual subdomain, so the question is slightly wrong or there are missing pieces.


Judging by the logs (user agents really) right now in the submission, it's hard to tell if the requests were actually for the domain (since the request headers aren't included) or just for the IP.


Yes, that's the "question is wrong" option I listed.


> What am I missing?

It's very common for people to read only up to the point they feel they can comment, then skip straight to commenting. So, basically, no one read it.


Funny, that'd be so unthinkable for me to do! But you're probably right.


Just the default hostname. It won't reveal all of them or any of the IP addresses of that box. secret-freedom-fighter.ice-cream-shop.example.com could have the same IP as example.com and you'd only know example.com


If you've got one cert with a subject alt name for each host, they'd see them all. If you use SNI and they have different certificates, the domains might still be in Certificate Transparency logs. If a wildcard cert is used, that could help to conceal the exact subdomain.
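
To make the SAN versus wildcard difference concrete, a small sketch. The matching rule is deliberately simplified (a leading "*." stands for exactly one leftmost label), the function names are made up, and the hostnames are the hypothetical ones from the comment above.

```python
def matches(pattern: str, hostname: str) -> bool:
    """Simplified match: a leading '*.' stands for exactly one leftmost label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        first_label, _, rest = hostname.partition(".")
        return bool(first_label) and rest == pattern[2:]
    return pattern == hostname


def names_revealed(san_entries: list[str]) -> list[str]:
    """Concrete hostnames a scanner can read straight off the certificate."""
    return [name for name in san_entries if not name.startswith("*.")]


secret = "secret-freedom-fighter.ice-cream-shop.example.com"
multi_san_cert = ["example.com", secret]          # one cert listing every host
wildcard_cert = ["*.ice-cream-shop.example.com"]  # one cert, pattern only

print(names_revealed(multi_san_cert))     # both names, including the secret one
print(names_revealed(wildcard_cert))      # nothing concrete to read off
print(matches(wildcard_cert[0], secret))  # yet the cert is still valid for it
```

The wildcard still gives away the parent domain, so it narrows what a scanner learns rather than hiding everything.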


Okay. But how did they get the proper host header?


There are a couple easy possibilities depending on server config.

1. Not using SNI, and all https requests just respond with the same cert. (Example, go to https://209.216.230.207/ and you'll get a certificate error. Go to the cert details and you'll see the common name is news.ycombinator.com).

2. http upgrades to https with a redirect to the hostname, not IP address. (Example, go to http://209.216.230.207/ and you get a 301 redirect to https://news.ycombinator.com)
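
The second case is easy to check for automatically: any redirect whose Location header carries an absolute URL hands over the hostname. A minimal sketch, with host_from_location as a made-up helper name:

```python
from urllib.parse import urlsplit
from typing import Optional


def host_from_location(location: str) -> Optional[str]:
    """Return the hostname in a redirect's Location header, if it has one."""
    return urlsplit(location).hostname


print(host_from_location("https://news.ycombinator.com"))  # news.ycombinator.com
print(host_from_location("/login"))                        # None, relative redirect
```

A server that redirects with a relative path, or back to the same IP, leaks nothing this way.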


Could be a number of ways, for example a default TLS cert or a default vhost redirect.

I actually had a job once a few years ago where I was asked to hide a web service from crawlers and so I did some of these things to ensure no info leaked about the real vhost.


I don't think op said that they had the correct host header?


Who says they did?


Also it's Palo Alto. They're not some kiddie scripters. https://en.m.wikipedia.org/wiki/Palo_Alto_Networks


Hm?

They sell you security but provide you with CVEs en masse.

https://www.cybersecuritydive.com/news/palo-alto-networks--h...


Ah yes we all know if you sell a firewall the code has to be 100% bug free unbreakable


Looking at how they earned their 100s of CVEs, script kiddie almost looks like a compliment


Am I Google when I come with the user agent 'google here, no evil'?


That perfectly fits the midwit meme. Lots of people are smart enough to know about transparency logs, but not smart enough to read OP's post and understand the details.


The details aren't there, so it's "assume" rather than "understand".

The only proper response to OP's question is to ask for clarification: is the subdomain pointing to a separate IP? Are the logs vhost-specific or not?

If you don't get the answers, all you can do is to assume, and both assumptions may end up being right or wrong (with varying probability, perhaps).



