How Amazon Q Developer helped me to save days of work

A 15-minute love story about accelerating data collection, prompt engineering, and lessons learned.

Introduction

Recently, I had an eye-opening experience with Amazon Q Developer that I'm excited to share. In just 15 minutes, I created a Python script that crawls the entire official Bundesliga DataHub API – a task that would have typically taken one to two full days of work. Our team needed comprehensive sports data to enhance our AI services for content localization. The challenge was to efficiently gather data on seasons, competitions, clubs, players, and official team staff members from the Bundesliga, spanning from 1908 to 2024. This was no small feat, considering the vast amount of data and the complexity of the API structure.

In this blog post, I'll take you through my journey with Amazon Q Developer, from the initial prompt to the final, comprehensive Python script. I'll share the (few) challenges I faced and the key learnings from this experience. Get comfortable for this 15-minute love story between a solution architect and an AI coding assistant.

Why is a Solution Architect collecting sports data?

One of our teams was tasked with a project requiring comprehensive data from the Bundesliga DataHub API. They needed everything from historical seasons dating back to 1908 up to the most recent 2024 data, including details about competitions, clubs, players, and official team staff members. In total, this amounted to 13,498 single API requests – a daunting task. I wanted to help the team by taking the lead on this.

My traditional approach would have involved using tools like Postman, which would require:

  1. Calling the APIs

  2. Copy-pasting the responses

  3. Saving the responses as files and sharing them with the team

Conservatively, I estimated this would take 1-2 full days of focused work – a significant chunk of time, especially when juggling multiple projects and deadlines. Instead of falling into old habits, I turned to Amazon Q Developer. Could it help me automate this process? Would it understand the nuances of the Bundesliga sports data API? Could it handle and understand relationships in data? Most importantly, would it truly save me time, or would I end up spending hours correcting its output?

You might already guess how it ended, but please continue reading as I take you through this experience, step by step, and show you how Amazon Q Developer not only met but exceeded my expectations, proving that sometimes, the right tool can indeed feel like magic.

Pair Programming with an AI Assistant

With a clear understanding of the task at hand, the whole love story began with a simple prompt (sensitive information replaced with placeholders in all examples):

Assist me in writing a Python script that crawls football sports data 
from an API. The API provides several endpoints to access XML feeds 
for seasons, competitions, clubs and players.

The script should start with downloading all available seasons by 
making a GET request to this endpoint: {endpoint url}

The response from Amazon Q was a solid baseline script. Without further instructions, Amazon Q Developer made an educated guess about the potential data structure of the response.
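
To give a sense of what such a baseline looks like, here is a minimal sketch in that spirit. It is not Q's verbatim output; the endpoint URL and the tag names are placeholders of my own:

import requests
import xml.etree.ElementTree as ET

# Placeholder endpoint, not the real DataHub URL
SEASONS_ENDPOINT = "https://example.com/seasons"

def fetch_seasons():
    # Download the seasons feed and parse the XML payload
    response = requests.get(SEASONS_ENDPOINT, timeout=30)
    response.raise_for_status()
    return ET.fromstring(response.content)

if __name__ == "__main__":
    root = fetch_seasons()
    print(f"Fetched seasons feed, root element: {root.tag}")

I helped Q out by providing an example of a response from our seasons endpoint. This is where the real magic began.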

Update the script to parse the following example response from the API:
{example response}

Amazon Q Developer responded with an updated, precise version capable of handling the expected data structure correctly (a hedged sketch of such a parser follows after the list below). I continued to add more instructions to expand the script's capabilities:

  • Processing competitions data

  • Downloading club data for each competition and season

  • Implementing specific folder structures for data organization

  • Processing player and coach data

  • Downloading referee data
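
Here is the parser sketch mentioned above: one possible shape for process_seasons. The element and attribute names (Season, SeasonId, SeasonName) are assumptions for illustration, not the actual DataHub schema; the two-value return mirrors how the final script's main() consumes it:

import xml.etree.ElementTree as ET

def process_seasons(xml_content):
    # Parse the seasons feed into a list of dicts plus the XML root;
    # the tag and attribute names here are assumed, not the real schema
    root = ET.fromstring(xml_content)
    seasons = []
    for season in root.findall(".//Season"):
        seasons.append({
            "id": season.get("SeasonId"),
            "name": season.get("SeasonName"),
        })
    return seasons, root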

The following prompt template was used to download and traverse various data types:

Update the script to also process and download the competitions
data using the competitions endpoint: {endpoint url}.

Here is an example response: 
{example response}

Without further instructions from my side, Amazon Q Developer suggested a way to organize the downloaded files. As the script evolved, I provided additional instructions for better organization of the output:

Do not include a timestamp in the filename. Just overwrite it.
Save the response in the following folder structure 
./out/<seasonId>/<competitionId>
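
A save helper in this spirit might look like the following sketch; the name save_xml_to_file matches the one used in the final script, but the body here is my own illustration:

import os

def save_xml_to_file(xml_content, file_path):
    # Create the nested <seasonId>/<competitionId> folders on demand,
    # then overwrite any existing file (no timestamp in the name)
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    with open(file_path, "wb") as f:
        f.write(xml_content)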

When running the script, I noticed that certain combinations of seasons and competitions resulted in an error payload. Instead of collecting all these error payloads as files in my .out folder, I stopped the script for a moment and gave Amazon Q Developer feedback about this runtime behaviour:

Update the script to skip the processing in any case there is 
a Status node in the response. 

Having a status node in the response is indicating an error like 
missing permissions or unavailable data for the given combination.
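
In code, such a guard can be a simple check before any further processing. A minimal sketch, assuming the error marker is literally an element named Status:

import xml.etree.ElementTree as ET

def has_error_status(xml_content):
    # A Status node in the payload signals an error, e.g. missing
    # permissions or no data for this season/competition combination
    root = ET.fromstring(xml_content)
    return root.find(".//Status") is not None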

As more and more data was crawled, I observed situations that resulted in SSL-related errors. Amazon Q Developer to the rescue: it helped me implement retry logic with exponential backoff based on this simple prompt:

The script is throwing the following error:
requests.exceptions.SSLError: 
HTTPSConnectionPool(host='{}', port=443): Max retries exceeded 
with url: {} (Caused by SSLError(SSLEOFError(8, 
'[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation 
of protocol (_ssl.c:1000)')))

How can I mitigate this?

After implementing the retry logic with exponential backoff to handle the SSL-related errors, the script was complete. I had to remove the feed-processing parts of the script for security and compliance reasons, but this is what Amazon Q Developer created:

import time
import xml.etree.ElementTree as ET

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def fetch_xml_data(url, max_retries=3, backoff_factor=0.3):
    session = requests.Session()

    # Adapter-level retries cover transient HTTP 5xx responses
    retries = Retry(total=max_retries,
                    backoff_factor=backoff_factor,
                    status_forcelist=[500, 502, 503, 504])

    session.mount('https://', HTTPAdapter(max_retries=retries))

    # Loop-level retries with exponential backoff handle SSL errors
    for attempt in range(max_retries + 1):
        try:
            response = session.get(url, timeout=30, verify=False)  # increased timeout; verify=False disables certificate checks (a workaround, not a best practice)
            if response.status_code == 200:
                return response.content
            else:
                print(f"Failed to fetch data from {url}. Status code: {response.status_code}")
                return None
        except requests.exceptions.SSLError as e:
            if attempt < max_retries:
                wait_time = backoff_factor * (2 ** attempt)
                print(f"SSL Error occurred. Retrying in {wait_time:.2f} seconds...")
                time.sleep(wait_time)
            else:
                print(f"Max retries exceeded. Failed to fetch data from {url}.")
                print(f"SSL Error: {str(e)}")
                return None
        except requests.exceptions.RequestException as e:
            print(f"An error occurred while fetching data from {url}: {str(e)}")
            return None

# Main function
def main():
    # Fetch seasons data
    seasons, seasons_root = fetch_and_process_data(SEASONS_ENDPOINT, process_seasons)
    if seasons is None:
        print("Failed to fetch seasons data. Exiting.")
        return

    # Fetch competitions data
    competitions, competitions_root = fetch_and_process_data(COMPETITIONS_ENDPOINT, process_competitions)
    if competitions is None:
        print("Failed to fetch competitions data. Exiting.")
        return

    # Process and save seasons and competitions data
    save_xml_to_file(ET.tostring(seasons_root), f".out/seasons.xml")
    save_xml_to_file(ET.tostring(competitions_root), f".out/competitions.xml")

    # Fetch and save club data for each season and competition
    for season in seasons:
        for competition in competitions:
            club_endpoint = f"..."
            clubs, clubs_root = fetch_and_process_data(club_endpoint, process_clubs)

            if clubs is not None:
                file_path = f".out/{season['id']}/{competition['id']}/clubs.xml"
                save_xml_to_file(ET.tostring(clubs_root), file_path)

                print(f"\nClubs for Season {season['name']} and Competition {competition['name']}:")
                for club in clubs:
                    print(f"  {club['name']} (ID: {club['id']})")
                    print(f"    Short Name: {club['short_name']}")
                    print(f"    Type: {club['type']}")

                    # Fetch and save player data for this club
                    player_endpoint = f"..."
                    players, players_root = fetch_and_process_data(player_endpoint, process_players)

                    if players is not None:
                        player_file_path = f".out/{season['id']}/{competition['id']}/{club['id']}/players.xml"
                        save_xml_to_file(ET.tostring(players_root), player_file_path)

                        print(f"    Players:")
                        for player in players:
                            print(f"      {player['first_name']} {player['last_name']} (ID: {player['id']}, Shirt: {player['shirt_number']})")
                        print(f"    Total players found: {len(players)}")
                    else:
                        print(f"    Skipping player data for this club due to error or unavailable data.")

                    # Fetch and save team officials data for this club
                    officials_endpoint = f"..."
                    officials, officials_root = fetch_and_process_data(officials_endpoint, process_officials)

                    if officials is not None:
                        officials_file_path = f".out/{season['id']}/{competition['id']}/{club['id']}/officials.xml"
                        save_xml_to_file(ET.tostring(officials_root), officials_file_path)

                        print(f"    Team Officials:")
                        for official in officials:
                            print(f"      {official['first_name']} {official['last_name']} (ID: {official['id']}, Function: {official['function']})")
                        print(f"    Total team officials found: {len(officials)}")
                    else:
                        print(f"    Skipping team officials data for this club due to error or unavailable data.")

                    print("  ---")
                print(f"Total clubs found: {len(clubs)}")
            else:
                print(f"Skipping club data for Season {season['name']} and Competition {competition['name']} due to error or unavailable data.")

if __name__ == "__main__":
    main()

I had a timer running in parallel: in just 15 minutes, Amazon Q Developer and I had created a robust Python script capable of requesting, processing, and downloading all relevant data feeds.

The script efficiently handled all 13,498 API requests, saving the responses in a structured format and handling errors across a variety of scenarios.

Lessons Learned

The power of clear communication

What struck me most was the accuracy and completeness of the code generated by Amazon Q Developer. I didn't need to make any manual modifications to the script. It was purely created through our back-and-forth conversation, with Amazon Q understanding the context and requirements perfectly. The quality of the output was directly related to the clarity and specificity of my prompts.

Clear, specific prompts play a crucial role in getting the desired output from Amazon Q Developer. When I provided detailed context and examples of API responses, the quality and accuracy of the generated code improved dramatically. Effective communication skills, combined with domain expertise, make the difference when working with such AI tools.

For instance, when I shared an example of the API response structure, Amazon Q Developer was able to adapt the script precisely to handle the correct data structure. This iterative process of providing examples and getting refined code became the cornerstone of our collaboration.

Update the script to also process and download the competitions
data using the competitions endpoint: {endpoint url}.

Here is an example response: 
{example response}

Amazon Q Developer needed only one or two examples of the data structures the API returns. After that, it correctly guessed most of the data structures for the players, clubs, and referees features added afterwards.

It is iterative development, not a single-prompt shop

The success of this project was largely due to the iterative approach I took with Amazon Q Developer. Instead of expecting a perfect script on the first try, I built the functionality piece by piece, adding features incrementally. This approach allowed for better control and understanding of the process, and it helped in identifying and addressing potential issues early on. Each iteration brought us closer to the final product, with prompts like:

Update the script to process and download club data for every 
competition of every season using the club endpoint: {endpoint url}

Breaking down prompts into smaller, more manageable parts not only resulted in a more robust script but also allowed me to learn and adapt my interaction style with the AI assistant.

Balancing AI assistance with human oversight

While Amazon Q Developer proved to be an incredibly powerful tool, this experience reinforced the importance of maintaining human oversight in the coding process. The AI was excellent at generating code based on my prompts, but it was my responsibility to ensure that the generated code aligned with our overall objectives and met our specific requirements. For example, when the AI made assumptions about error handling, I was able to provide more specific instructions:

Update the script to skip the processing in any case there is 
a Status node in the response. Independent of the status code. 
Having a status node in the response is indicating an error 
like missing permissions or unavailable data for the given combination.

A matter of mindsets

My role shifted from writing code to problem-solving and providing context, allowing for a more strategic approach to development. Although Amazon Q Developer did the hard work of typing, my instructions and how I framed the problem made the difference. It's not about replacing human expertise but augmenting it, allowing us to focus on higher-level problem-solving and creativity. The code generated by Amazon Q Developer was a testament to how well AI can understand context and requirements, provided we as engineers are able to shift our mindset.

As we continue to integrate AI tools into our development processes, it's crucial to cultivate this new mindset that balances technical expertise with the ability to effectively leverage AI capabilities.

Conclusion

In just 15 minutes, Amazon Q Developer proved its value as a powerful assistant, helping me automate a task that would have otherwise taken days. Through clear communication, an iterative approach, and a balance of AI assistance with human oversight, I was able to create a robust Python script to efficiently crawl the Bundesliga DataHub API.

This experience points to a critical shift in how we approach software development. It's no longer just about writing code; it's about how effectively we can leverage AI to augment our capabilities. Amazon Q Developer didn't replace my expertise; it complemented it, enabling me to focus on problem-solving and strategy while the AI handled the heavy lifting.

As AI tools become more integral to our workflows, embracing this shift will not only boost productivity but also free up time for creativity and innovation (or for drinking more coffee with my fellow engineers). The key is to see AI as a partner, working alongside us to push the boundaries of what we can achieve. And when used effectively, it truly can feel like magic.