99 more Bottles for Ethel and a Wee Heavy Brew
Last time, we created our boozeparser to parse csv data in order to feed my initial attempt at a bottle inventory into our database for the Ethel API. We had a set of simple requirements, and we accomplished the first two: 1) should be able to run from the command line and 2) should accept a .csv file as an argument. We also imported our models from Ethel by copying and pasting our models.rs, and added a struct for our Record to aid in reading our csv file as well.
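As a refresher, the Record struct looked roughly like this; the exact field names here are assumptions based on how we end up using it later, not the real ones from the repo:

```rust
use serde::Deserialize;

// Hypothetical shape of the csv Record; the real field names may differ.
#[derive(Debug, Deserialize)]
struct Record {
    name: String,
    category: String,
    sub_categories: String,
    storage_name: String,
    storage_room: String,
    storage_shelf: String,
}
```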
Next: let's pick a way to talk to Ethel over HTTP locally. We could go super simple and use curl via a Rust wrapper. We could also choose a higher-level HTTP client to make some parts (like JSON serialization) easier. I also wouldn't mind finding a client that supports blocking calls: I don't need asynchronous processing here, since most of these items need to be created in order due to the dependencies between records (such as the Bottle table relying upon category, subcategory, and storage).
After poking around and doing some research, plus looking at some documentation for different prospective crates on crates.io, I decided to try reqwest, with the json and blocking features added in our Cargo.toml.
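For reference, the dependency line would look something like this (the version is a placeholder; check crates.io for the current one):

```toml
[dependencies]
reqwest = { version = "0.11", features = ["blocking", "json"] }
```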
I added a GET call using reqwest to the root of the API for Ethel, knowing it would respond with 'hello world', and printed the response in our application to test our call (making sure I had Ethel running locally). I added a new function create_categories to hold our new request (knowing we will probably get some warnings due to not using Results):
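A minimal sketch of that first smoke test, assuming Rocket's default localhost:8000 address (the real URL may differ):

```rust
use reqwest::blocking::Client;

// Smoke test: GET the root of Ethel and print the body.
// The localhost:8000 address assumes Rocket's default; adjust as needed.
fn create_categories(client: &Client) -> Result<(), reqwest::Error> {
    let body = client.get("http://localhost:8000/").send()?.text()?;
    println!("{}", body);
    Ok(())
}
```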
Let's add a call after parsing our csv, then build and run our code:
Success! You can see the body of our response from Ethel yields the expected "Hello, world!" output from the root function we created waaaaaay back in our first experiments with Rocket and Rust Part II. Now we can start actually doing things.
Okay okay... I may have jumped ahead. First, we should make sure our results are properly de-duped into collections to create, now that we have imported our models. Then we can feed each list in dependency order: Categories, then Storage, then Sub Categories, then, finally, Bottles, which rely upon the others. Another edge case I want to discuss is whether I want to worry about pulling existing records to check for existence, or assume a "clean" db when running the parser. Decisions, decisions...
Let's make note of it, then move on for now. My choices are either to pull the entire existing list for comparison or to create "search by name" APIs in Ethel, which do not exist today. I could also add a unique constraint on the name for bottles and catch duplicate issues on create... Okay, I said we would move on for now. Moving on 😅.
Let's focus on how to get our lists of records. I decided to use the HashSet collection type to gather up my individual collections of records. Using HashSet provides an additional advantage over Vec: it handles duplicates for us, so inserting the same entry twice still leaves us with a single copy. This does require some additions to the models we want to create HashSets for, though, as we need to implement Eq, PartialEq, and Hash in order to support the functionality of our chosen collection.
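As a sketch, with a hypothetical NewCategory model standing in for the real ones in models.rs, the derives look like this:

```rust
use serde::Serialize;

// Hypothetical stand-in for a model from models.rs; the real struct has
// more fields. Eq, PartialEq, and Hash are what let HashSet de-dupe it.
#[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize)]
pub struct NewCategory {
    pub name: String,
}
```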
It's been a while since we looked at the entirety of main.rs. Let's see what our code looks like with our new HashSet for categories, and update our create_categories function to actually call our real API!
I made some decisions while writing this section of code: I decided to maintain a HashMap of all the created categories to reference by name later (we will need the real database ids of our categories to create sub-categories, for example), and will try this pattern for all of the models. Let's do the thing!
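A minimal sketch of what that could look like, assuming a POST /categories endpoint that echoes the created record back as JSON (the path and field names are assumptions):

```rust
use std::collections::{HashMap, HashSet};

use reqwest::blocking::Client;
use serde::{Deserialize, Serialize};

#[derive(Serialize, PartialEq, Eq, Hash)]
struct NewCategory {
    name: String,
}

#[derive(Deserialize)]
struct Category {
    id: i32,
    name: String,
}

// POST each new category and collect the created records, keyed by name,
// so later steps can look up the real database ids.
fn create_categories(
    client: Client,
    new_categories: HashSet<NewCategory>,
) -> Result<HashMap<String, Category>, reqwest::Error> {
    let mut created = HashMap::new();
    for new_category in new_categories {
        let category: Category = client
            .post("http://localhost:8000/categories")
            .json(&new_category)
            .send()?
            .json()?;
        created.insert(category.name.clone(), category);
    }
    Ok(created)
}
```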
Success! Alright, now we are going to run into our problem from earlier: how do we make sure not to create categories that already exist in the database? I could just clear the db again... but in the case of a partial failure, it would be really nice to be able to rerun the script and only have to create the things we hadn't previously. I am going to add a call to Ethel to populate my maps.
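Something along these lines, assuming a GET /categories endpoint returning a JSON array (again, the path is an assumption; Client, Category, and HashMap are as in the sketch above):

```rust
// Pull all existing categories and key them by name, so the csv's
// category strings can be matched against what's already in the db.
fn get_categories(client: &Client) -> Result<HashMap<String, Category>, reqwest::Error> {
    let existing: Vec<Category> = client
        .get("http://localhost:8000/categories")
        .send()?
        .json()?;
    Ok(existing.into_iter().map(|c| (c.name.clone(), c)).collect())
}
```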
I am not in love with this function and I may revisit it, but it is super readable, so I will take it. This makes the same kind of HashMap<String, Category> so we can easily look up the categories by the name of the category from the csv, since this is all we have. Then we simply have to add a check to see if the key already exists in our HashMap before adding to our HashSet of categories to create!
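The check itself is tiny; assuming the Record holds the category name in a category field (a hypothetical field name):

```rust
// Only queue a category for creation if we didn't already fetch it.
if !categories.contains_key(&record.category) {
    new_categories.insert(NewCategory {
        name: record.category.clone(),
    });
}
```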
Finally, when the new categories are created, I can extend the HashMap containing my existing categories with the new HashMap of created ones!
categories.extend(create_categories(client.clone(), new_categories)?);
Now we have a model for retrieving existing records. Let's do the same for our other records! While starting to fiddle with my sub category creation, I had another realization. A real face palm moment. If I go down the path of gathering all the information first and then creating records in batches... it means I would need to create a new kind of storage for the structs which reference other tables. For example, the NewSubCategory struct requires an ID for the category.
I have a few choices: I could store the Records I deserialize from the csv, then iterate in memory, doing a pass over the records for each struct. Another option is creating a new way to store the references with the string values temporarily, then doing a lookup once I create the records in a cascading fashion. Finally, I could do all the creation inline while deserializing each Record from the file.
The last one would be the "least code" but also the most likely to crash or run into issues. I am waffling on just doing it this way because, as discussed early in this project, this is only meant to "run once" in order to parse some data into my database, then never be touched again. I think, since I added the check for not recreating records I already have, rerunning it is also low risk, and I am only hurting myself and my machine.
I would never recommend doing something like this in production, for sure, but this is a great example of KISS. There is no need for me to make a really awesome, performant little application right now: I just need to get the damn data in the damn database 😂. So, to be clear, my current proposal is structuring my app like this (sketched after the list):
- Look up existing records to reference ids for new creation and prevent duplication
- Parse the CSV file
- While deserializing each record, create each needed value inline:
- Category
- Sub Category
- Storage
- Bottle
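As a rough sketch of that flow, assuming it lives in a function returning Result<(), Box<dyn Error>>; the ensure_* helpers, get_sub_categories, get_storages, and create_bottle are hypothetical names for functions that look a value up in its map and only create it over HTTP when missing:

```rust
// Look up existing records first, so reruns skip what's already there.
let mut categories = get_categories(&client)?;
let mut sub_categories = get_sub_categories(&client)?; // hypothetical helper
let mut storages = get_storages(&client)?; // hypothetical helper

// Then create everything inline while deserializing each csv Record.
for result in reader.deserialize() {
    let record: Record = result?;
    let category_id = ensure_category(&client, &mut categories, &record)?;
    let sub_category_id =
        ensure_sub_category(&client, &mut sub_categories, &record, category_id)?;
    let storage_id = ensure_storage(&client, &mut storages, &record)?;
    create_bottle(&client, &record, category_id, sub_category_id, storage_id)?;
}
```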
Alright, let's repurpose our category code and start creating the other functions we need! This means we can get rid of our HashSet implementation for uniqueness as well, since we can just check if the HashMap for the individual struct type has the key equivalent to the name of the field on the Record.
While I could probably do some more cleanup, here is my eventual, working result! I did run into a fun problem around what key to use for the storage: I can't just use the name, because the name on the storage is not unique. My shortcut was to cram the name, room, and shelf all together to create a unique hash key and use it. I will also say... I am not proud of having this all crammed in one file 😅.
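The compound key could be as simple as this (the "|" separator is my choice here; anything unlikely to appear in the fields works):

```rust
// Storage names alone aren't unique, so cram name, room, and shelf
// together into one lookup key.
fn storage_key(name: &str, room: &str, shelf: &str) -> String {
    format!("{}|{}|{}", name, room, shelf)
}
```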
I decided to have the script output the items it created, as well as print some more sensible error messages if I ran into an issue looking up a key for some reason (the more verbose error handling was a result of not quite getting my compound hash key for the storage right the first time).
After running the script a couple times (yay, dupe checking!) I have the five records in my test csv in my database! Now when we call the bottles endpoint in Ethel, we get the following response:
Success! Before we feed the entire csv in, I want to test one more thing: a record with two sub categories. I updated my csv file to the following:
Perfect! Everything appears to be working as expected. Let's feed it the real csv now... Okay, that is a lot of terminal output! But everything worked, and now I officially have all 129 bottles from the original bottle list in my database and available to my API.
Stemma Brewing's Wee Heavy Scotch Ale
We had a bit of a cold snap recently in the PNW. Some folks call it "June-uary", the last gasp of Pacific moisture smacking into the coast before the drier, sunny summer. I like this weather; it's part of the reason I stay. It also lets me not feel like a complete weirdo while I decide to drink some heavy, rich beer usually relegated to the darker winter months!
Stemma Brewing in Bellingham, WA has some stellar beers. I am a regular drinker of their coffee oatmeal stout! Recently a local store had a special edition beer from them as a part of their Staff Animation series, Beers with Captain Mark: a wee heavy scotch ale weighing in at 9.8% ABV. It is rich, boozy, and very, very yummy. I am glad it's only a small can!