Mock Turtles All the Way Down

May 08, 2024

Today's article covers how I'm creating mock hotel data for my travelectable website. I originally started using the Amadeus Developers' API to populate the site, but it has a few issues:

It's quota-restricted. If my site gathers any kind of traffic (granted, that's a big if), I'll have to rate-limit my users or pay to raise the limit without any expected revenue to cover the new expense.
Because it's a third-party aggregator for hotel chains and airlines, its descriptive content - at least in the test environment I'm using - can leave me wanting. Sometimes the hotel descriptions are simply a grouping of nearly disjoint keywords.
The test sandbox limits certain API calls to certain locations. Star ratings are only available for London and New York and, even then, further restricted to certain hotels.
Since the data is sourced from vendors' live systems, I can't control the rate availability. This is a problem for a proof of concept website.

Luckily (?) I have my handy Generative AI tools on hand to help me build a robust, reusable mock data framework quickly (?). It's as simple as typing the following query and sitting back, no?

User: Generate a list of 40 top hotels in New York.  For each hotel, generate a list of 4 room rates with amenities and an estimated room rate.

No.

LLMs, though they're better at understanding natural language than the average computer, are still horribly literal. And since they can keep generating text unabated, a literal misinterpretation becomes incomprehensible by the time they've finished. They're also horrible at counting. And, unless directed specifically, they're shy about being wrong, so they may refuse to answer your question at all (and yet they have no problem lying in completely unrelated areas with unmitigated confidence).

When I tried variations of the above query, I often got a response that I should check various travel sites for rates. This isn't helpful when you're attempting to programmatically generate as much data as possible in one go.

LLMs also suffer from what I'll call "token fatigue," where something like 'estimated room rate: $120' becomes 'intimate room rate: $120' as the output gets longer. As a public service announcement - beware of 500-word essays discussing the root causes of WWII. By the end of the essay, you may be informed that an underlying cause of the war was ill will between the Weimaraner Republic and French pastries.

And, then, sometimes, they'll just stop generating output at all. Lovely.

I spent nearly a day massaging test queries for domestic US vacation spots only to receive answers for Quebec City, Cancun, and...Mongolia. When I asked for 50 destinations, I'd logically get 18 back. Or 37. Or 50, where Savannah was listed twice.

I finally recognized the folly of my original strategy and understood that I didn't need to drink from the firehose in order to succeed. I could build up my data slowly with my own safeguards (say, restricting the total number of locations I query, or no longer asking for hotels once I've reached 40 in a given location).
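Those safeguards might look something like the sketch below. It is only an illustration: `collect_hotels`, `fetch_batch`, and the attempt limit are hypothetical names and values, and `fetch_batch` stands in for whatever function actually sends a prompt to the LLM and returns a list of hotel dictionaries.

```python
from typing import Callable, Dict, List

MAX_HOTELS_PER_LOCATION = 40  # the per-location cap mentioned above
MAX_ATTEMPTS = 10  # assumed ceiling so a stubborn model can't loop forever


def collect_hotels(
    location: str,
    fetch_batch: Callable[[str, List[str]], List[Dict]],
    cap: int = MAX_HOTELS_PER_LOCATION,
    max_attempts: int = MAX_ATTEMPTS,
) -> List[Dict]:
    """Request small batches until `cap` unique hotels are collected."""
    hotels: Dict[str, Dict] = {}  # keyed by normalized name to drop repeats
    for _ in range(max_attempts):
        if len(hotels) >= cap:
            break
        # Pass the names collected so far so the prompt can ask for new ones.
        known_names = [hotel["name"] for hotel in hotels.values()]
        for hotel in fetch_batch(location, known_names):
            key = str(hotel.get("name", "")).strip().casefold()
            if not key or key in hotels:
                continue  # skip nameless or duplicate entries
            hotels[key] = hotel
            if len(hotels) >= cap:
                break  # ignore anything past the cap
    return list(hotels.values())
```

Because the loop counts and de-duplicates on its own, it doesn't matter whether the model returns 18, 37, or a list with Savannah in it twice.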

Asking the LLM to structure the data also seems to make its output more reliable, with the added bonus of letting me write code to store it easily.
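As a rough sketch of that storage-side code, the helper below pulls the structured payload out of a response that may be wrapped in chatty text. The function name `parse_llm_payload` is hypothetical. It tries strict JSON first and falls back to `ast.literal_eval`, since single-quoted keys are valid Python literals but not valid JSON.

```python
import ast
import json
from typing import Any


def parse_llm_payload(text: str) -> Any:
    """Extract and parse the first {...} or [...] payload in an LLM reply."""
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    if not starts:
        raise ValueError("no JSON-like payload found")
    start = min(starts)
    closer = "}" if text[start] == "{" else "]"
    end = text.rfind(closer)
    if end < start:
        raise ValueError("payload is not closed (output may be truncated)")
    snippet = text[start:end + 1]
    try:
        return json.loads(snippet)  # strict JSON with double quotes
    except json.JSONDecodeError:
        try:
            return ast.literal_eval(snippet)  # single-quoted, Python-style
        except (ValueError, SyntaxError) as exc:
            raise ValueError("payload could not be parsed") from exc
```

A reply that stops mid-list raises `ValueError`, which is a convenient signal to retry the query instead of storing half a record.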

Here are four example queries I'm using to build my hotel mock data. I'll curate the list of selected destinations ahead of time (probably 50-100 US domestic and 50 international), but these queries should allow me to build up a rich, plausible data set for the site:

Write a 100-word marketing description for visiting New York. Format it as {'description':<description>}

Provide the lat and long for New York as well as 5 points of interest. Structure your output like this:

{'city':<city>,'state':<state>,'lat':<latitude>,'long':<longitude>,'points_of_interest':[<point_of_interest>,...]}

If the city doesn't have a state (like Washington, D.C.), leave the state blank.

List 10 hotels within 5 miles of New York's lat/long (40.7128,-74.0060). Provide their star ratings, addresses, distance in miles from that lat/long, a 50-word description of the hotel, and an estimated teaser rate per night for both summer and winter seasons. The rate doesn't need to be up-to-date, but estimate the value based on historical rates.  Also account for the rate being off-season or in-season as appropriate.
Format the response as:
[{"name":<name>,"address":<address>,"distance":<distance>,"star_rating":<star_rating>,"description":<description>,"winter_offer_rate":<winter_offer_rate>,"summer_offer_rate":<summer_offer_rate>},...]

For "The Langham Chicago Hotel", provide 4 different room offers. Format it as follows:
[{"room_type":<room_type>,"room_description":<room_description>,"occupancy":<occupancy>,"amenities":[<list_of_amenities>],"winter_rate":<winter_rate>,"summer_rate":<summer_rate>,"cancellation_policy":<cancellation_policy>},...]
Don't worry if the information isn't up-to-date. Provide a best estimate that matches historical information.  For the rates, take into account the seasonality of Chicago, so higher rates aren't present in the off-season.
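Even with a requested format, it seems worth checking each record before storing it, since a long response can drift (a key like 'estimated room rate' turning into 'intimate room rate'). The sketch below checks one record from the hotel-list query against the keys named in that prompt. The function names, the 1-5 star range, and the exact problem messages are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Keys requested in the hotel-list prompt above.
HOTEL_KEYS = {
    "name", "address", "distance", "star_rating",
    "description", "winter_offer_rate", "summer_offer_rate",
}
RATE_KEYS = ("winter_offer_rate", "summer_offer_rate")


def to_number(value: object) -> Optional[float]:
    """Accept 120, 120.0, or strings like '$1,200'; return None otherwise."""
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value.replace("$", "").replace(",", "").strip())
        except ValueError:
            return None
    return None


def hotel_problems(record: Dict, max_miles: float = 5.0) -> List[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems: List[str] = []
    missing = HOTEL_KEYS - set(record)
    unexpected = set(record) - HOTEL_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if unexpected:
        problems.append(f"unexpected keys: {sorted(unexpected)}")
    distance = to_number(record.get("distance"))
    if distance is None or not 0 <= distance <= max_miles:
        problems.append("distance missing or outside the search radius")
    stars = to_number(record.get("star_rating"))
    if stars is None or not 1 <= stars <= 5:
        problems.append("star_rating missing or outside 1-5")
    for key in RATE_KEYS:
        rate = to_number(record.get(key))
        if rate is None or rate <= 0:
            problems.append(f"{key} missing or not a positive number")
    return problems
```

Records that come back with problems can be dropped or re-requested one at a time, which is cheaper than regenerating the whole list. Note that this sketch treats a distance written as "0.8 miles" as invalid; loosening `to_number` is one option if the model tends to append units.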

Using the above queries, I'm able to limit the erratic nature of the responses from the LLM while also being able to generate sufficient data to build up a comprehensive mock data library without needing copious amounts of post-output tinkering. I can do this on individual test searches on my site instead of batching the requests ahead of time in one bundle.
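Generating data on individual searches pairs naturally with saving each result, so a destination is only generated once and stays stable afterwards. Here is a minimal sketch of that idea, assuming a hypothetical `generate` function that runs the queries for a city and a local folder of JSON files as the store (a database would work just as well).

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def cached_hotels(
    city: str,
    generate: Callable[[str], List[Dict]],
    cache_dir: Path,
) -> List[Dict]:
    """Return stored hotels for `city`, generating and saving them on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Build a filesystem-friendly file name, e.g. "New York" -> "new_york".
    slug = "".join(c if c.isalnum() else "_" for c in city.casefold())
    path = cache_dir / f"{slug}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    hotels = generate(city)  # only reached the first time a city is searched
    path.write_text(json.dumps(hotels, indent=2), encoding="utf-8")
    return hotels
```

The first search for a city pays the cost of the LLM calls; every later search reads the same file, so the rates shown on the site don't change between visits.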

The data - specifically the rates and the room descriptions - is unlikely to be "real," but it will have a strong sense of verisimilitude and, more importantly, stability, for the test site.

I was tempted to generate fake hotels as well just to make the entire experience lie within the realm of suspended disbelief, but Meta.ai offered these gems as Chicago-themed suggestions:

Eddie Vedder's Escape
Kanye's Kingdom
Oprah's Oasis
Capone's Castle

and ChatGPT was no better. So, for now, I'll stick with real hotels and pseudo rates.

Until next time, my human and robot friends.
