Dealing with dataset explosions

Hi, I’m joining tables and I wanted to see if you had any ideas on how to prevent the tables from exploding and creating so many rows.

I want to be able to have the continent and the city on the joined table. I also want to be able to have the name of the people on that table but I don’t want to see the explosion that is taking place below. I understand why the explosion is taking place and why I am ending up with so many rows and I also understand that joining tables like I am doing it is not optimal. I just wanted to know if there was any way to work around the explosion and maybe end up with something like what you can see on table 4. Is there any way to prevent the cross joining?

table 1 table 2
Continent Country Name Country City
North America USA James USA DC
North America USA John USA NYC
North America USA Matthew USA LA
Europe UK Nick Canada Liverpool
table 3: tables joined on country
Continent Country Name City
North America USA James DC
North America USA James NYC
North America USA James LA
North America USA John DC
North America USA John NYC
North America USA John LA
North America USA Matthew DC
North America USA Matthew NYC
North America USA Matthew LA
Europe UK Nick Liverpool
table 4
Continent Country Name City
North America USA James DC
North America USA John NYC
North America USA Matthew LA
Europe UK Nick Liverpool

Hello @mateoleon210 - It seems that you need to join the tables using both Country as well as Name column. However, I don’t see the Name column being present in Table 2. In that case, how are you deriving that James belong to DC and not NYC or LA. If there’s any custom logic that can help us to derive that mapping between Name and City then I can try to think through it. Thank you!

1 Like

Hi @mateoleon210 - As a best practice you need a right granularity to associate two tables, which in this case it is not clear whether James Belongs to DC or NYC within USA. Data needs to be prepared accordingly.

I understand that there is no way to determine a mapping between name and city.

My example on table 4 just shows a random arrangement of people and city where there are no explosions.

Is there any way to achieve anything like table 4? where the relation between name and city is irrelevant and there are no explosions?

@mateoleon210 - In case, you are looking for some sort of option for exactly the same context that you shared, then I believe you can create a calculated filed at the Analysis level to generate running counts Name and Country (which is getting generated due to the explosion). Now you can use that field and set a filter to like = 1 to only select a single record for any such multiple combination to hide the explosion.

The expression will look something like:

runningCount({Continent}, [{Continent} ASC], [{Name}, Country])

Although, this is highly un-recommended, since you mentioned that that is the only available scenario having constraints, I just tried to share across my thought process. Hope this helps! Thank you!

1 Like

What’s the purpose of the join? Is it so that you can filter by a city to find users in the country where the city is located?