
gjreda · on June 7, 2024

Grab some baked goods at Hewn (in Evanston), visit the Baha'i temple, and then walk across the street to Gillson Park to wander the Lake Michigan shore and eat your goods from Hewn.

The easternmost portion of Northwestern's campus also has a nice walking/biking path along the lakeshore with a great view looking back towards the Chicago skyline.

Kon-Peki · on June 7, 2024

If you're a tourist, know that you can take the Purple Line elevated train from downtown Chicago. From the end of the train line, it is about 0.5 km east along a lovely tree-lined street with sidewalks. The street turns to cobblestones after a short distance; it is a very wealthy area.

sitkack · on June 7, 2024

Take off your sandals and soak your toes in that crisp, clear water, but only a little past your ankles, or the moment will be spoiled by the crackle of the beach nazis on ATVs: "No SWIMMING! GET OUT OF THE WATER".

contingencies · on June 7, 2024

That's a shame. I recall last summer seeing some dedicated and well-wetsuited people swimming much closer to town, along the concrete shoreline to the north of Ohio Street Beach.

Kon-Peki · on June 7, 2024

That's the difference between the city and the suburbs. Swimming in the city is fine if a lifeguard is on duty.

https://www.chicagoparkdistrict.com/parks-facilities/beaches

Swimming in the Great Lakes is very dangerous, especially if you are not familiar with the behavior of the lakes.

https://glsrp.org/statistics/

torstenvl · on June 8, 2024

Swimming in the Great Lakes is not dangerous at all. They account for 14 drowning deaths per year out of 4500 across the country. That's a quarter of the annual U.S. drownings in the Atlantic despite having more U.S. coastline.

npongratz · on June 7, 2024

I hate Illinois Nazis.

gjreda · on June 7, 2024

The recent changes have gotten absurd

kwhitefoot · on June 7, 2024

So much for the land of the free.

gjreda · on July 6, 2023

https://gregreda.com/

I've mostly written technical, code-centric posts on Python, ML, and data science. Some of my early posts (2013) were wildly popular at the time and hit the top of HN and various subreddits.

I haven't written much recently, but I've been trying to branch outside of technical posts as I felt like my profession had started to become too much of my identity.

The post I'm most proud of:

- https://gregreda.com/2022/11/30/this-ones-for-me/ - Feeling pride and catharsis after years of bad health luck (leukemia, bad bike crash, cardiac arrest).

My most popular posts:

- https://gregreda.com/2013/03/03/web-scraping-101-with-python... - Web scraping tutorial using Python and Beautiful Soup

- http://www.gregreda.com/2015/02/15/web-scraping-finding-the-... - Another web scraping tutorial with Python, but this time for sites that dynamically load content

- https://gregreda.com/2013/10/26/intro-to-pandas-data-structu... - The start of a series of posts on Python's pandas library

- https://gregreda.com/2013/07/15/unix-commands-for-data-scien... - Some useful unix commands for data processing

- https://gregreda.com/2015/08/23/cohort-analysis-with-python/ - Tutorial on doing cohort analysis using Python and pandas

- https://gregreda.com/2017/01/07/freelance-data-science-exper... - My experience as a freelance data scientist

- https://gregreda.com/2018/02/04/hiring-data-scientists/ - My approach to hiring data scientists (though my thoughts on this have evolved over the last five years).

gjreda · on June 1, 2023

Not specific to this model, but beyond the large players (OpenAI, Cohere, etc) are there any free hosted versions of the open(ish) LLMs? Even the smaller 7B parameter ones? I'm prototyping out a project and using OpenAI for now, but it feels like there has to be a hosted alternative somewhere.

I spent some time today exploring HuggingFace's Inference API, but if the model is sufficiently large (> 10 GB), HF requires you to use their commercial offerings.

dzhiurgis · on June 1, 2023

> HF requires you to use their commercial offerings

Some of which are quite affordable ($80 per month). Larger ones can be around $2,000 a month, which is still OK for the prototyping phase. You're basically paying for AWS/GCP infrastructure.

I quite liked the UX of it, very intuitive. My trouble was finding a model that executes out-of-the-box tho. All of the GPT ones crash on startup.

YetAnotherNick · on June 2, 2023

https://chat.lmsys.org/

gjreda · on May 31, 2023

I recently prototyped out a "chat over PDF documents" project.[1] I opted to use LanceDB for vector (embeddings) storage and retrieval and found it really nice to use.

I'm working on using it in a large project now.

[1] - https://github.com/gjreda/scratch-pdf-bot

d4rkp4ttern · on June 1, 2023

Interesting. Curious what you found better about LanceDB compared to other vector DBs like Qdrant, Chroma, or others.

gjreda · on June 1, 2023

I initially built this same "chat with PDFs" prototype with LangChain and qdrant. I then rebuilt it from scratch for the sake of learning and comparison.

Some context: I've been a jack-of-all-trades data scientist / machine learning engineer for the past 15 years (officially titled as an MLE the last four years).

I share that only because I think it plays a role in how I'm typically accustomed to working.

1. I found LangChain to be overkill for this use case. While it might allow some to move more quickly when building, I found it to be cumbersome. My suspicion is this is largely because of my background - I understand how to build much of what's "under the hood" in LangChain. Because of this, I think it felt overly abstracted and I found the docs difficult to navigate and sometimes incomplete.

2. I used Qdrant via their Docker image and it was simple to set up and start using. I didn't try to push the limits with it, so I can't say anything about performance. Because Qdrant runs as an HTTP service, I found that it didn't fit well into my workflow - I'm accustomed to being able to visually inspect my data inside the interpreter, debugging, trying out commands, interacting and experimenting with my results, etc. Again, my suspicion is this is my own bias in how I typically work. Qdrant otherwise seemed very nice.

3. LanceDB felt powerful yet lightweight, and fit well into my workflow. It was far more intuitive for me. It was as if SQLite, the Python data ecosystem, and a vector database had a child and named it LanceDB. Under the hood, it's built on Apache Arrow and integrates nicely with pandas, allowing me to seamlessly go from a LanceDB table on disk, to a pandas dataframe, and into some analysis or investigation of my LanceDB query results. This line [1] is a great example of why I liked it. This feels nicer to me than the world of API params and HTTP requests.

[1] - https://github.com/gjreda/scratch-pdf-bot/blob/main/gpt_pdf_...
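
For anyone unfamiliar with what a vector store does at query time, here is a minimal pure-Python sketch of the retrieval step: score stored embeddings against a query embedding and hand back the closest rows as plain objects you can poke at in the interpreter. This is not LanceDB's API. The function names (`cosine_similarity`, `top_k`), the row layout, and the toy three-dimensional vectors are all assumptions made for illustration; a real setup would use embeddings from a model and an actual index.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=2):
    """Return the k rows most similar to `query`, each with a 'score' field added."""
    scored = [
        {**row, "score": cosine_similarity(query, row["vector"])}
        for row in rows
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:k]


# Hypothetical "table" of text chunks with made-up embeddings.
rows = [
    {"text": "page 1 chunk", "vector": [1.0, 0.0, 0.0]},
    {"text": "page 2 chunk", "vector": [0.0, 1.0, 0.0]},
    {"text": "page 3 chunk", "vector": [0.9, 0.1, 0.0]},
]

results = top_k([1.0, 0.0, 0.0], rows, k=2)
for r in results:
    print(r["text"], round(r["score"], 3))
```

Because the results are ordinary in-process Python objects, they can be inspected, filtered, or loaded into a dataframe directly, which is the workflow difference described above compared with going through an HTTP service.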

d4rkp4ttern · on June 2, 2023

Thank you for elaborating. I concur about LangChain and Qdrant.

With LangChain it was a struggle to figure out what was going on under the hood; I had to pull together multiple pieces from multiple notebooks simply to see what the Conversational Retriever Chain does. And then it was trivial to implement a variant of it myself with all the pieces transparently in one place.

I like the Qdrant interface and docs, and it seems to work well so far. For local testing I used their Python client rather than Docker, and it was seamless to switch to their cloud. My use case doesn't involve pandas (maybe I wanted a breather from years of pandas-wrangling!); I think the OpenAI cookbook repo has examples of using pandas in combination with Qdrant (and many others).
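
To make the "all pieces in one place" point concrete, here is a rough sketch of a conversational retrieval flow in three steps: condense the chat history and follow-up into a standalone question, retrieve chunks for that question, then answer from the retrieved context. This is not LangChain's code and not the variant described above. The function `conversational_answer`, the prompt wording, and the `fake_llm` / `fake_retrieve` stubs are all hypothetical; in practice `llm` would call a real model and `retrieve` would query a vector store.

```python
from typing import Callable, List, Tuple


def conversational_answer(
    question: str,
    chat_history: List[Tuple[str, str]],
    llm: Callable[[str], str],
    retrieve: Callable[[str, int], List[str]],
    k: int = 3,
):
    """Return (answer, standalone_question, retrieved_docs)."""
    # Step 1: fold the history into a standalone question (skipped on the first turn).
    if chat_history:
        history_text = "\n".join(
            f"User: {q}\nAssistant: {a}" for q, a in chat_history
        )
        standalone = llm(
            "Rewrite the follow-up as a standalone question.\n\n"
            f"{history_text}\n\nFollow-up: {question}\nStandalone question:"
        )
    else:
        standalone = question

    # Step 2: fetch the k most relevant chunks for the standalone question.
    docs = retrieve(standalone, k)

    # Step 3: answer using only the retrieved context.
    context = "\n\n".join(docs)
    answer = llm(
        f"Answer using only this context:\n{context}\n\n"
        f"Question: {standalone}\nAnswer:"
    )
    return answer, standalone, docs


# Stubs so the sketch runs without a model or a vector store (assumptions).
prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    prompts.append(prompt)
    return f"response {len(prompts)}"


def fake_retrieve(query: str, k: int) -> List[str]:
    return [f"chunk about {query}"][:k]


history = [("What is this PDF about?", "It is a user manual.")]
answer, standalone, docs = conversational_answer(
    "Who wrote it?", history, fake_llm, fake_retrieve
)
print(standalone, docs, answer)
```

Keeping the two prompts and the retrieval call in one function makes it easy to log each intermediate value, which is the transparency that was hard to get from the abstraction.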

gjreda · on Oct 10, 2022

> 5/1000 colonoscopy patients have complications (some fatal) which is way higher than the base rate for colon cancer.

Can you provide a source for this?

theptip · on Oct 10, 2022

I copied that stat from the link I posted in another comment: https://news.ycombinator.com/item?id=33153494

gjreda · on Oct 6, 2022

At least for Twitter, I think this still happens if you move everyone into a list and only view the list.

alexpotato · on Oct 6, 2022

You can also use search queries like this:

https://twitter.com/search?q=filter%3Afollows%20-filter%3Are...

vorpalhex · on Oct 6, 2022

And unfortunately lists are the only functional way to use Twitter.

gjreda · on June 15, 2022

Expectations are overwhelmingly for 75bps. Prior to yesterday, 50bps was expected.

- https://www.cmegroup.com/trading/interest-rates/countdown-to...

sentirist · on June 15, 2022

Exactly. Not sure what the point of these articles is, given the futures-implied probability.

The probability right now of 225-250 for July is incredible as well.

gjreda · on June 5, 2022

Imatinib (Gleevec) revolutionized treatment for patients with chronic myeloid leukemia (CML). Prior to the drug’s discovery, CML patients generally had seven years to live (possibly less depending on how advanced the cancer was). Now their lifespan mirrors the general population.

I’d highly recommend the book The Philadelphia Chromosome if you’re interested in learning more.

macdaknife · on June 6, 2022

I am currently taking a similar drug for myelofibrosis in a clinical trial, holed up after a stem cell transplant here at MD Anderson Cancer Center. It's called itacitinib, and is supposed to prevent GVHD. It's by the same company that makes Jakafi (brand name). These are expensive drugs for sure, and my doctor is running the clinical trial. Just wanted to chime in from the inside. ;p

moneywoes · on June 6, 2022

Best of luck for your treatment

gjreda · on Dec 13, 2020

Recently wrote some code to scrape a friend's reviews and ratings from Goodreads. Maybe it'll be useful to folks here: https://gregreda.com/2020/11/17/scraping-pages-behind-login-...

gjreda · on Oct 29, 2020

Here's an excerpt from their October 2019 letter to shareholders. TL;DR - they're a public company and the markets told them they needed to keep growing in order to survive.

> For restaurant inventory, we will rapidly expand our recent pilots of putting non-partnered restaurants on the platform. For reasons we’ve discussed many times, we believe non-partnered options are the wrong long-term answer for diners, restaurants and shareholders. It is expensive for everyone, a suboptimal diner experience and rife with operational challenges. With that said, it is extremely efficient and cheap to add non-partnered inventory to our platform and it can at least ensure that all of our current and potential new diners have the option to order from any of their favorite restaurants now, even if it’s not the best solution. By leveraging non-partnered options, we believe we can more than double the number of restaurants on our platform by the end of 2020.

> At the same time, because we know that partnered relationships are critical to the long-term success of this business, we will be investing aggressively in our independent restaurant sales organization to support converting as many of these non-partnered restaurants to partnered relationships as quickly as possible and to take advantage of other innovations in the restaurant space, like virtual restaurants.

https://s2.q4cdn.com/772508021/files/doc_financials/2019/q3/...