Show HN: VisaWhen – Data on US visa issuance backlogs
104 points by underyx on July 1, 2021 | 26 comments
Heya! Not the usual sort of thing to be posted here, but I wanted to show off what I made yesterday. Here's a sample page about H1-B visas issued in Bogota:

<https://visawhen.com/consulates/bogota/h1b>

The code is source-available (not open source) at <https://github.com/underyx/visawhen>. It's my first time choosing a source-available license over MIT, mainly out of fear of existing immigration startups just gobbling this data and code up; frankly I didn't think the implications through though, I just threw a safe license on there.

The way the project works is:

- Use requests-html to find publicly available PDFs from government pages

- Use camelot to parse the PDFs and extract data tables from them

- Since the previous step takes crazy long for my tastes (around 8000 pages at around 5 seconds each) I've used dask to split the work into chunks and parallel-process them across my laptop's CPUs.

- Do data cleanup and processing with pandas, and save all of it to a SQLite file.

- Take data from the SQLite file with next.js and generate a static HTML page for each possible embassy - visa type combination

- The pages use ECharts to visualize data, and Bulma as a CSS framework

- Build and host each commit via Netlify

- But proxy to Netlify from CloudFlare, which I believe has more edge locations in the free plan

- Collect any donations via Ko-Fi

- Use Google Analytics to have a general idea about visitor counts

- Use FullStory session recordings to find out about bugs – I've fixed quite a few and I think I'll probably remove this tracking after a bit of time
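
A rough sketch of the chunking idea from the list above, using only the Python standard library rather than dask. The page count and per-page time are the approximate figures from the post; the worker count of 8 is an assumption, and `split_into_chunks` and `estimate_hours` are hypothetical helper names, not functions from the project.

```python
import math


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous chunks of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size = math.ceil(len(items) / n_chunks)
    if size == 0:
        return []
    return [items[i:i + size] for i in range(0, len(items), size)]


def estimate_hours(n_pages, seconds_per_page, n_workers):
    """Wall-clock estimate assuming perfectly even, overhead-free parallelism."""
    return n_pages * seconds_per_page / n_workers / 3600


pages = list(range(8000))             # stand-in for roughly 8000 PDF pages
chunks = split_into_chunks(pages, 8)  # 8 workers is an assumed CPU count

serial_hours = estimate_hours(8000, 5, 1)
parallel_hours = estimate_hours(8000, 5, 8)
print(f"{len(chunks)} chunks of {len(chunks[0])} pages")
print(f"serial: {serial_hours:.1f} h, 8 workers: {parallel_hours:.1f} h")
```

Each chunk is the unit that would be handed to a worker (a dask task, a process pool, or similar). The estimate ignores scheduling overhead and uneven page complexity, so real runs will likely be slower.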

…and that's where I'm at now. I'm pretty happy about the results. Most pages load in less than 300ms, which is something I care about all too much. More importantly, I've shared the site with some immigration communities I'm part of, and the response has been very positive! Let me know what y'all think.



I downloaded your consulates.sqlite3 file and opened it up in https://datasette.io/ on my laptop. If you do the same (and run "datasette install datasette-vega" to get the charting plugin), this link should work:

http://127.0.0.1:8001/consulates/backlogs?_facet=Post+Slug&_...

Full page screenshot here: https://static.simonwillison.net/static/2021/consulate-backl...

Shows an interesting graph where the number of L1 visa issuances in London drops from around 500 a month to 0 around March/April 2020, eventually climbing to between 19 and 65 per month in the past few months up to today.

This is a really neat dataset, congrats!


That’s super cool, very useful. I would be glad if instead there weren’t absurd delays.


Pretty cool. I once drove from Victoria to Calgary and back to get a US tourist visa because Vancouver had a 30 day wait and Calgary had 3 days.

I lucked out and drove there and back during a chinook, so roads were good, the drive was epic.

I recall I was able to see the bookings available at each consulate, but I think I’d been preliminarily approved, or paid something at that stage.

Nice work on the site, adding any transparency to bureaucratic processes is a positive in my book.


Neat! I’ve built a little site on some similar US visa data, https://visa.ooo


This site is awesome, thanks for sharing. Wish I had something like this a few years ago. The USCIS is nearly impossible to reach by phone, unless you memorize a very specific set of options and get lucky.


Nice website! Seems like this is exactly the missing 'Step 1' from https://visawhen.com/ :)


Maybe alphabetize the list of visa types.


Yeah, or I was thinking they should be sorted by amount issued, most to least popular.


That's not clear from that list.

I was expecting to see it in alphabetical order, but then was confused that it wasn't.

Good job on that site though!


Oh, sorry, I worded that comment confusingly. The list is not sorted at all yet. My intention was to later (this weekend?) get the numbers for visa type popularity and display them next to the type (as well as a description of the visa in human language :D). Then I feel like it'd be a lot more logical to see the list sorted descending by popularity.


Probably best to add some sorting filter then.

Someone interested in waiting times probably just wants to quickly find the visa type they applied for in a long list.

And then there are some who just want to see ranking, i.e. what you said.


Would be cool to see a list of consulates ordered by speed for a given visa class. In the US ATM but have to leave the country to renew the H1B (ugh), but don't care where I do it. The thing has been authorised, just need to do the DS-160 and interview clown parade.


Sqlite to the rescue. Here is a list ordered by delay. https://dpaste.com/6JRADR63T Looks like Guayaquil is a hot ticket!
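
For anyone who wants to try this kind of query themselves, here is a minimal sketch with Python's built-in sqlite3 module. The table layout (`backlogs` with `post_slug`, `visa_class`, `months_behind`) is an assumption made for illustration and may not match the real consulates.sqlite3 schema; the rows are made-up placeholder values, not real backlog data.

```python
import sqlite3

# In-memory database with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE backlogs (post_slug TEXT, visa_class TEXT, months_behind REAL)"
)
conn.executemany(
    "INSERT INTO backlogs VALUES (?, ?, ?)",
    [
        ("post-a", "H1B", 4.0),  # placeholder rows, not real data
        ("post-b", "H1B", 1.5),
        ("post-c", "H1B", 9.0),
        ("post-a", "L1", 2.0),
    ],
)


def posts_by_delay(conn, visa_class):
    """Return (post, months_behind) pairs for one visa class, shortest delay first."""
    rows = conn.execute(
        "SELECT post_slug, months_behind FROM backlogs "
        "WHERE visa_class = ? ORDER BY months_behind ASC",
        (visa_class,),
    )
    return rows.fetchall()


fastest_first = posts_by_delay(conn, "H1B")
for post, delay in fastest_first:
    print(post, delay)
```

Against the real file, you would replace `":memory:"` with the path to the downloaded database and adjust the column names to whatever the schema actually uses.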


Have you considered using GitHub Actions to automate the data scraping? A GitHub Actions workflow is allowed to run for up to 6 hours so even with the PDF processing you are doing it may be enough time to generate the full site.


Yes! I actually had that set up on a previous project (which I just migrated into this one at https://visawhen.com/nvc): https://github.com/underyx/nvc-backlog/blob/main/.github/wor...

This setup was lovely and worked quite well. Except it broke due to an absolutely obscure, insane, hopeless bug: https://github.community/t/cannot-resolve-travel-state-gov-h...


Oh what a pain! I wonder if you could work around that by making HTTP calls to http://IP.OF.TRAVEL.GOV/ and manually setting the host header? No idea if you could get TLS working like that though.
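
A minimal sketch of that idea with Python's standard urllib, building the request without sending it. The IP address is a placeholder from the documentation range (192.0.2.0/24), not the real server address, and `build_request` is a hypothetical helper name.

```python
import urllib.request


def build_request(ip_address, hostname, path="/"):
    """Build an HTTP request aimed at a raw IP, with the Host header set by hand."""
    url = f"http://{ip_address}{path}"
    return urllib.request.Request(url, headers={"Host": hostname})


# Placeholder IP; the real address would have to be looked up elsewhere.
request = build_request("192.0.2.1", "travel.state.gov")
print(request.full_url, request.get_header("Host"))

# Sending is left out on purpose:
# response = urllib.request.urlopen(request, timeout=10)
```

This covers plain HTTP only. Over HTTPS, certificate verification and SNI also depend on the hostname, which this sketch does not address.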


It would be cool when you select a consulate and then select the visa type, if next to the visa type was a brief description of the visa. For those who don't know what all these visa type codes mean.


Yeah, thanks for the note! I was thinking of doing that as well. I left it for later cause it’ll be annoying to gather all the names :D



Very cool. Might be interesting to add some stats on how a particular consulate compares to the average of all backlogs and of those in the same country.


Thanks for sharing details about the architecture and infrastructure behind this. That's pretty neat.


Can you post a link without the <>? It’s not clickable on mobile.


Let me try: https://visawhen.com/consulates/bogota/h1b

And also the repo: https://github.com/underyx/visawhen

They weren’t clickable before I added the < > either, I just thought that might fix it. Maybe I’ll have to ask dang.


Oh that’s strange, hmm.

Edit:

> Before 2020 March, Vancouver issued 3.6 TN visas per month on average. If we naively assume that the number of visa applications didn’t change during COVID, they are now 13.7 months behind expectations.

Wow I knew they were behind but jesus. Pretty neat!
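
One way to read the "months behind" figure in that quote, sketched in Python. This is a guess at the arithmetic, not the site's confirmed formula: take the pre-COVID monthly average as the expected rate, add up the shortfall since then, and divide by that rate. The function name and the numbers in the example are made up for illustration.

```python
def months_behind(baseline_per_month, issued_since):
    """Shortfall versus the baseline rate, expressed in months of baseline issuance.

    baseline_per_month: average monthly issuances before the cutoff.
    issued_since: actual issuances for each month after the cutoff.
    """
    if baseline_per_month <= 0:
        raise ValueError("baseline_per_month must be positive")
    expected = baseline_per_month * len(issued_since)
    shortfall = expected - sum(issued_since)
    return shortfall / baseline_per_month


# Illustrative numbers only: a baseline of 10 per month, then four slow months.
example = months_behind(10, [0, 0, 5, 10])
print(example)  # 2.5
```

With a baseline of 10 per month, four months would normally produce 40 visas; only 15 were issued, so the shortfall of 25 equals 2.5 months of normal output.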



Thank you!



